Scalability

Runway supports both horizontal and vertical scaling of a service.

Horizontal

By default, Runway will scale up instances to handle all incoming requests. When a service is not receiving any traffic, instances are scaled down to zero.

Minimum instances

The minimum number of instances of a service. To override the default configuration:

# omitted for brevity
spec:
  scalability:
    min_instances: 3

Recommendation: Use this setting if you need to reduce cold start latency for a service.

To learn more, refer to documentation.

Maximum instances

The maximum number of instances of a service. To override the default configuration:

# omitted for brevity
spec:
  scalability:
    max_instances: 3

Recommendation: Use this setting if you need to limit the number of connections to a backing service, such as a database.

To learn more, refer to documentation.

Maximum instance concurrent requests

The maximum number of concurrent requests per instance of the service. To override the default configuration:

# omitted for brevity
spec:
  scalability:
    max_instance_concurrent_requests: 100

Recommendation: Use this setting if you need either to optimize cost efficiency or to limit the concurrency of a service.

To learn more, refer to documentation.
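
The three horizontal settings above can presumably be set together in a single `scalability` block. The following is a minimal sketch under that assumption; the values are illustrative, not recommendations.

```yaml
# omitted for brevity
spec:
  scalability:
    # Keep at least one instance running to reduce cold start latency (illustrative value).
    min_instances: 1
    # Cap the instance count, e.g. to bound connections to a database (illustrative value).
    max_instances: 10
    # Limit the concurrent requests handled by each instance (illustrative value).
    max_instance_concurrent_requests: 100
```

With these values the service would not scale down to zero, and at most 10 × 100 = 1000 requests could be in flight across all instances at once.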

Vertical

By default, Runway provisions CPU and memory resource limits of 1000m and 512Mi, respectively. When a resource limit is exceeded, the instance is terminated.

Note: CPU resources can be defined in millicores, where 1000m equals one full core. If your container needs two full cores to run, use the value 2000m. If your container only needs ¼ of a core, use the value 250m.
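
As a sketch of the note above, a container that needs only a quarter of a core might be configured as follows. This assumes the `cpu` limit accepts millicore values in the same place as the whole-core form (`'2'`) shown in the CPU section.

```yaml
# omitted for brevity
spec:
  resources:
    limits:
      # 250m is a quarter of a core; 2000m would be two full cores.
      cpu: 250m
```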

Memory

The memory limit of an instance. To override the default configuration:

# omitted for brevity
spec:
  resources:
    limits:
      memory: 2G

Recommendation: For right-sizing a service, refer to capacity planning.

To learn more, refer to documentation.

CPU

The CPU limit of an instance. To override the default configuration:

# omitted for brevity
spec:
  resources:
    limits:
      cpu: '2'

Recommendation: For right-sizing a service, refer to capacity planning.

To learn more, refer to documentation.

CPU Boost

Provides additional CPU to an instance during startup. To enable it:

# omitted for brevity
spec:
  resources:
    startup_cpu_boost: true

Recommendation: Use this setting if you need to reduce cold start latency for a service.

To learn more, refer to documentation.
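
The vertical settings above can presumably be combined under one `resources` block. The sketch below assumes that `limits` and `startup_cpu_boost` are sibling keys, as the individual snippets suggest; the values are taken from the examples above and are illustrative only.

```yaml
# omitted for brevity
spec:
  resources:
    # Extra CPU is provided only while the instance starts up.
    startup_cpu_boost: true
    limits:
      # Two full cores, equivalent to 2000m.
      cpu: '2'
      # Raises the memory limit above the 512Mi default.
      memory: 2G
```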