Global Load Balancing

Cloudflare Global Load Balancing

Runway provides global load balancing through Cloudflare to distribute traffic across multiple regions, reducing latency for global users and increasing availability through automatic failover.

Overview

Cloudflare Global Load Balancer sits at the edge of Cloudflare’s network (300+ data centers worldwide) and makes routing decisions at the Point of Presence (PoP) closest to the user. This enables:

Latency-based routing: Traffic is routed to the fastest available origin, based on the latency measured from each Cloudflare PoP (the exact behavior depends on the load balancer settings)
Automatic failover: Unhealthy origins are removed from rotation within seconds
Multi-region distribution: Traffic is spread across every region where the service is deployed

How It Works

┌─────────────────┐
│   User Request  │
└────────┬────────┘
         │
         ▼
┌────────────────────────────┐
│  Cloudflare PoP            │
│  (usually nearest to user) │
└────────┬───────────────────┘
         │
         ▼
┌─────────────────────────────────┐
│  Global Load Balancer           │
│  - Check pool/endpoint health   │
│  - Compare latency              │
│  - Select fastest pool          │
└────────┬────────────────────────┘
         │
         ▼
┌───────────────────────────────────────────────────────────┐
│              Origin Pools (GKE Regional LBs)              │
├───────────────────┬───────────────────┬───────────────────┤
│  GKE US           │  GKE EU           │  GKE Asia         │
│  us-east1         │  europe-west1     │  asia-northeast3  │
└───────────────────┴───────────────────┴───────────────────┘

When a user makes a request, it flows through the following steps:

The request arrives at the nearest Cloudflare PoP
The load balancer checks health status and latency data for all origin pools and endpoints
Traffic is routed to the pool with the lowest latency measured from that PoP, or from that PoP’s region, depending on the load balancer settings
If the selected pool becomes unhealthy, traffic automatically fails over to the next best pool within seconds
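
The selection in the steps above can be sketched in a few lines of Python. This is only an illustration of "lowest latency among healthy pools", not Cloudflare's implementation; the function name `select_pool` and the latency figures are made up for the example.

```python
from typing import Dict, Optional


def select_pool(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the healthy pool with the lowest measured latency, or None."""
    # Pools without a health entry are treated as unhealthy.
    candidates = {name: ms for name, ms in latency_ms.items() if healthy.get(name, False)}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical latencies (ms) as measured from a single PoP.
latency_from_pop = {"us-east1": 95.0, "europe-west1": 18.0, "asia-northeast3": 240.0}
health = {"us-east1": True, "europe-west1": True, "asia-northeast3": True}

print(select_pool(latency_from_pop, health))  # europe-west1

# If the fastest pool becomes unhealthy, the next best pool is chosen.
health["europe-west1"] = False
print(select_pool(latency_from_pop, health))  # us-east1
```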

Configuration

Enabling Global Load Balancing

To enable global load balancing for your workload, add the following to your service entry in the provisioner’s workload inventory file (config/runtimes/[gke|eks]/workloads.yml):

- runway_service_id: my-service
  project_id: 12345678
  cloudflare:
    global_loadbalancer:
      enabled: true

This creates:

Origin pools for each region where your service is deployed
Health monitors to check endpoint availability
A load balancer with dynamic latency-based steering (routes traffic to the origin with lowest latency)

Health Monitor Configuration

Health monitors probe your origins to determine availability and measure latency. Configure monitors based on your requirements:

- runway_service_id: my-service
  project_id: 12345678
  cloudflare:
    global_loadbalancer:
      enabled: true
      monitor:
        protocol: tcp # tcp or https (default: tcp)
        path: /health # health endpoint path (https only, default: /health)
        interval: 10 # seconds between checks (default: 10)
        timeout: 3 # seconds before a check is marked as failed (default: 3)
        consecutive_up: 1 # number of checks needed to mark healthy (default: 1)
        consecutive_down: 1 # number of checks needed to mark unhealthy (default: 1)
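
As a quick reference, the defaults from the comments above can be captured in a small Python sketch. The class `MonitorConfig` and its `problems` method are hypothetical and not part of Runway. Only the protocol check comes from the text; the remaining checks (path format, positive values, timeout shorter than interval) are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MonitorConfig:
    # Defaults mirror the comments in the YAML example above.
    protocol: str = "tcp"
    path: str = "/health"
    interval: int = 10
    timeout: int = 3
    consecutive_up: int = 1
    consecutive_down: int = 1

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means nothing was flagged."""
        found = []
        if self.protocol not in ("tcp", "https"):
            found.append(f"protocol must be tcp or https, got {self.protocol!r}")
        # The checks below are assumed sanity rules, not documented Runway behavior.
        if self.protocol == "https" and not self.path.startswith("/"):
            found.append("path must start with '/'")
        if min(self.interval, self.timeout, self.consecutive_up, self.consecutive_down) < 1:
            found.append("interval, timeout, consecutive_up and consecutive_down must be >= 1")
        if self.timeout >= self.interval:
            found.append("timeout should be shorter than interval")
        return found


print(MonitorConfig().problems())                 # []
print(MonitorConfig(protocol="http").problems())  # one problem about the protocol
```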

Monitor Protocol Options

Protocol	Use Case	Trade-offs
TCP (default)	Simple connectivity check	Fast, but only validates that the cloud provider’s load balancer is reachable; the probes never reach the application
HTTPS	Application-level health	More accurate (detects unhealthy backends), but adds load from health check requests, up to 70 rps per region

TCP monitors only need the regional load balancer to respond on port 443, which Runway already configures. You do not need to change your application. This setting has minimal overhead but is less accurate than an HTTPS monitor, because it only validates network connectivity.

HTTPS monitors query a specific health endpoint on your application. Use this when you need more accurate health status that reflects backend availability. Using this setting means your application will receive health check traffic from Cloudflare.

Protecting Health Endpoints

When using HTTPS monitors, you can block external access to your health endpoint while still allowing Cloudflare monitors. Runway achieves this using Cloudflare rules:

cloudflare:
  global_loadbalancer:
    enabled: true
    monitor:
      protocol: https
      path: /health
      block_health_endpoint: true # blocks public access to /health

All Data Centers Monitoring

By default, health checks run from a subset of Cloudflare regions. To measure latency more comprehensively, enable all_data_centers:

cloudflare:
  global_loadbalancer:
    enabled: true
    monitor:
      all_data_centers: true # probe from all Cloudflare PoPs

Complete Configuration Example

- runway_service_id: my-service
  project_id: 12345678
  groups:
    - gitlab-org/my-team
  cloudflare:
    enabled: true
    global_loadbalancer:
      enabled: true
      monitor:
        protocol: https
        path: /health
        interval: 10
        timeout: 3
        consecutive_up: 2
        consecutive_down: 2
        block_health_endpoint: true

Failover Behavior

When an origin pool becomes unhealthy:

Health monitors detect the failure (based on consecutive_down threshold)
The unhealthy pool is removed from rotation
Traffic automatically shifts to the next lowest-latency healthy pool
When the pool recovers (based on consecutive_up threshold), it rejoins rotation
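
The way consecutive_up and consecutive_down gate these transitions can be illustrated with a small state machine. This is a simplified model written for this page, not Cloudflare's monitor logic; the class name `PoolHealth` is hypothetical.

```python
class PoolHealth:
    """Tracks one pool's health from a stream of probe results."""

    def __init__(self, consecutive_up: int = 1, consecutive_down: int = 1, healthy: bool = True):
        self.consecutive_up = consecutive_up
        self.consecutive_down = consecutive_down
        self.healthy = healthy
        self._streak = 0  # probes in a row that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the pool's resulting health."""
        if probe_ok == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.consecutive_up if probe_ok else self.consecutive_down
        if self._streak >= needed:
            self.healthy = probe_ok  # threshold reached, flip the state
            self._streak = 0
        return self.healthy


pool = PoolHealth(consecutive_up=2, consecutive_down=2)
results = [pool.record(ok) for ok in [False, True, False, False, True, True]]
print(results)  # [True, True, True, False, False, True]
```

With both thresholds set to 2, the single failed probe at the start does not remove the pool; only two failures in a row do, and two successes in a row bring it back.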

If all pools become unhealthy, traffic is sent to a fallback region as a last resort, even though that region is also unhealthy: us-east1 for GKE and us-east-1 for EKS.
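
A minimal sketch of this last-resort rule, assuming a simple mapping from region to health status. The names `FALLBACK_REGION` and `regions_receiving_traffic` are illustrative only.

```python
from typing import Dict, List

# Last-resort regions named in the text above.
FALLBACK_REGION = {"gke": "us-east1", "eks": "us-east-1"}


def regions_receiving_traffic(cloud: str, healthy: Dict[str, bool]) -> List[str]:
    """Healthy regions, or the cloud's fallback region when none are healthy."""
    up = [region for region, ok in healthy.items() if ok]
    return up if up else [FALLBACK_REGION[cloud]]


all_down = {"us-east1": False, "europe-west1": False, "asia-northeast3": False}
print(regions_receiving_traffic("gke", all_down))  # ['us-east1']
```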

Keep in mind that failover can take up to (consecutive_down + 1) × interval seconds to remove an unhealthy pool.
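
As a worked example of that bound, the helper below (a hypothetical name, written only for this page) evaluates the formula for the default settings and for the values used in the complete configuration example.

```python
def worst_case_removal_seconds(interval: int = 10, consecutive_down: int = 1) -> int:
    """Upper bound from the text: (consecutive_down + 1) × interval."""
    return (consecutive_down + 1) * interval


print(worst_case_removal_seconds())                                 # defaults: 20
print(worst_case_removal_seconds(interval=10, consecutive_down=2))  # 30
```

Lowering interval or consecutive_down shortens this window, at the cost of more probe traffic or a higher chance of removing a pool after a transient failure.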

You can adjust consecutive_up, consecutive_down, interval and timeout to tune failover behavior and its sensitivity.

Multi-Region Deployment

Global load balancing works best when the service is deployed to multiple regions. While EKS services can use the global load balancer setting, they currently deploy to only one region and therefore don’t get the same multi-region failover benefits as GKE services.

Currently Supported Regions

Cloud	Regions
GKE	us-east1, europe-west1, asia-northeast3
EKS	us-east-1

Cost Implications

HTTPS health checks from multiple regions increase request volume to your origins
Using all_data_centers: true further increases health check traffic

Support

For questions or issues with global load balancing, contact the Runway team in #f_runway.