Runway for Satellite Services Vision
Epic: Expand Platform Engineering to more runtimes
Objective
Outline a vision of how Satellite Services could be integrated with GitLab, and how Satellite Services could be run for gitlab.com (multi-tenant SaaS), Dedicated (single-tenant SaaS), and by self-managed customers, to create early alignment on the effort’s direction.
Throughout this document, Satellite Services refers to services that may run alongside a GitLab instance. These services may provide additional features, or may optionally run external to the monolith to provide scalability. The AI Gateway is an example of a Satellite Service and will be used as a litmus test for suggested solutions.
The work done on CI Runners and GitLab Dedicated served as an inspiration for this doc. We do not want to pass off this prior art as an original idea.
Background
Most of the GitLab platform is implemented in a distributed monolith. However, some functionality is better implemented in satellite services running alongside the monolith. Examples of such functionality are Gitaly, CI Runners, AI Gateway, and glgo.
As we move services closer to customers and target different user entry points (in-IDE, scheduled runs, add-on services), our expectation is that GitLab is going to encounter more scenarios in which we are best able to meet market demand by building satellite services, achieving horizontal scalability, more modularization and less coupling, fault tolerance, and higher development velocity.
Additionally, our long-term organizational goal is to increase the “self-service” capabilities of stream-aligned teams, i.e. allow developers to create and manage the infrastructure of their services, to increase velocity and remove operational teams as bottlenecks.
In order to meet the changing expectations of the market and deliver AI Gateway, a Platform called “Runway” was created in FY24. Runway used Cloud Run as its first underlying runtime, which is a managed runtime offered by GCP. Cloud Run was chosen due to its low barrier to entry and high-level service management capabilities. Runway has attracted several (internal) users and allowed the dev team to turn up additional regions for AI Gateway by themselves.
Recently, Runway’s reliance on Cloud Run has been identified as a limitation for AI Gateway: many customers have regulatory requirements and chose GitLab specifically because it can be self-managed. Using Cloud Run is undesirable or prohibited for these self-managed customers. Future Satellite Services will likely run into this limitation, too.
This vision outlines a future in which GitLab benefits from a managed platform while enabling self-managed customers to meet their regulatory obligations.
Long Term Vision
Approximately 3–5 years in the future
Self-managed and Dedicated customers can enable a feature, such as AI Code Completion, in the Admin menu of their GitLab instance. When doing so, the UI offers three options:
- A “GitLab-managed” option. If selected, the GitLab instance uses GitLab Cloud Connector to connect to an instance operated by GitLab.
- A “GitLab-managed self-hosted” option. Users select a Kubernetes cluster from a drop-down menu, and upon confirming their choices the GitLab instance turns the required backends up or down on the selected Kubernetes cluster.
  ℹ️ Under the hood, the GitLab instance uses the GitLab agent for Kubernetes and Flux to manage Kubernetes workloads. A turn-up typically involves creating appropriate registration and access tokens, storing them in Kubernetes Secrets, and instantiating a Helm chart for Flux to pick up. The workload then calls back to GitLab to finalize the registration process.
- A “self-managed” option. Users are responsible for downloading a Helm chart or container image and running it in any way that works for their organization.
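The turn-up flow described for the “GitLab-managed self-hosted” option could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the `runway-charts` HelmRepository, the values layout, and the locally generated token are all hypothetical, and the `HelmRelease` API version depends on the Flux release installed on the cluster. The sketch only builds the two objects (a Kubernetes Secret and a Flux HelmRelease that reads its values from that Secret); applying them through the GitLab agent for Kubernetes is out of scope.

```python
import json
import secrets


def build_turn_up_manifests(service: str, namespace: str,
                            gitlab_url: str, chart_version: str) -> list[dict]:
    """Return the Secret and HelmRelease objects for one hypothetical turn-up."""
    # Assumption: a real implementation would have GitLab issue this token;
    # a random value stands in for it here.
    registration_token = secrets.token_urlsafe(32)
    secret_name = f"{service}-registration"

    # Helm values stored in a Secret. JSON is valid YAML, so the stdlib
    # json module is enough to serialise the values document.
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name, "namespace": namespace},
        "type": "Opaque",
        "stringData": {
            "values.yaml": json.dumps({
                "gitlab": {"url": gitlab_url,
                           "registrationToken": registration_token},
            }),
        },
    }

    # Flux HelmRelease that picks up the chart and reads values from the Secret.
    helm_release = {
        "apiVersion": "helm.toolkit.fluxcd.io/v2",  # may differ per Flux version
        "kind": "HelmRelease",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "interval": "10m",
            "chart": {"spec": {
                "chart": service,
                "version": chart_version,
                # Hypothetical HelmRepository pointing at the chart registry.
                "sourceRef": {"kind": "HelmRepository", "name": "runway-charts"},
            }},
            "valuesFrom": [{"kind": "Secret", "name": secret_name,
                            "valuesKey": "values.yaml"}],
        },
    }
    return [secret, helm_release]
```

Keeping the tokens in a Secret that the HelmRelease references, rather than inlining them in the release values, means the release object itself carries no credentials.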
gitlab.com uses the GitLab-managed backend services via Cloud Connector. The GitLab-managed backends use the Runway platform, which ensures uniformity, adherence to best practices, integration with common GitLab infrastructure, and safe and secure deployments.
This level of product integration requires buy-in from stage groups and is not something the Runway team can deliver independently. Runway aims to deliver a platform that enables GitLab teams to make product, feature, and monetisation decisions that were previously unavailable or extremely complex to set up, for example consumption-based pricing, services close to users for better UX, data residency, and outage-tolerant services.
Medium Term Vision
Approximately 1–3 years in the future
Runway manifests are stored in several Git repositories owned by the service owners, alongside the Satellite Service’s code. The Runway team provides CI jobs generating a Helm chart based on the Runway manifest and a Helm template owned by the Runway team. The generated Helm charts are pushed to a central, public Package registry operated by the Runway team.
Helm charts have labels along two dimensions: GitLab release and deployment stage. The GitLab release component is used to signal compatibility with a given GitLab release. The deployment stage component is used by GitLab engineers to control the gradual deployment of new versions.
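To illustrate how the two label dimensions might be consumed, here is a minimal sketch of selecting a chart for a given GitLab release and deployment stage. The label fields, the stage names, and the `select_chart` helper are hypothetical; the document does not define a label format. The sketch assumes plain dotted numeric chart versions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChartLabel:
    chart_version: str   # e.g. "1.10.0" (dotted numeric version assumed)
    gitlab_release: str  # GitLab release the chart is compatible with
    stage: str           # hypothetical deployment stage, e.g. "canary"


def _version_key(version: str) -> tuple:
    # Compare versions numerically so that 1.10.0 sorts after 1.4.0.
    return tuple(int(part) for part in version.split("."))


def select_chart(charts: Iterable[ChartLabel], gitlab_release: str,
                 stage: str) -> Optional[ChartLabel]:
    """Return the newest chart matching both label dimensions, or None."""
    matching = [c for c in charts
                if c.gitlab_release == gitlab_release and c.stage == stage]
    if not matching:
        return None
    return max(matching, key=lambda c: _version_key(c.chart_version))


charts = [
    ChartLabel("1.4.0", "17.2", "stable"),
    ChartLabel("1.10.0", "17.2", "stable"),
    ChartLabel("1.11.0", "17.2", "canary"),
    ChartLabel("2.0.0", "17.3", "stable"),
]
```

With this data, a self-managed instance on release 17.2 that only follows the hypothetical `stable` stage would receive chart 1.10.0, while an environment following `canary` would receive 1.11.0.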
Runway allows teams to easily create a GCP project for their service and set up a GKE cluster, complete with GitLab agent for Kubernetes and Flux, with no manual work required. These GCP projects and GKE clusters are used to run the Satellite Service for the GitLab.com SaaS offering.
Large self-managed customers with capable IT departments can replicate the setup using the public container images and Helm charts.
Short Term Vision
End of FY25
By establishing container images and Helm charts as the artifacts to be distributed, we enable large, technically savvy self-managed customers to run Satellite Services themselves. For AI Gateway, this unblocks adoption by several customers that have strict data governance requirements and/or want to use their own large language models.
The Runway team operates a shared GKE cluster that is integrated with GitLab / Flux. Owners of Satellite Services can add their repository and Helm chart to be deployed onto this cluster with moderate manual work. The Runway team owns integration into GitLab infrastructure, such as Vault for secret storage, metrics, dashboards, etc.
The Runway team aims to expand the supported feature set of this GKE cluster to reach feature parity with the existing Cloud Run-based setup, using concrete use cases to set prioritization.