Worker auto scaling
From Manual Tuning to Zero-Config: The AutoScaler That Eliminates Cadence Scaling Headaches
Visualizing the CPU utilization problem
The following Grafana dashboards demonstrate the CPU utilization issue that AutoScaler solves:
CPU Utilization vs CPU Quota
Low CPU utilization (5-15%) despite active workflow processing, leading to incorrect downscaling by compute autoscalers. Note how utilization jumps to the target range (45%) once the worker AutoScaler is enabled.
Worker Instance Count Impact
Worker instance count fluctuations caused by CPU-based autoscaling decisions. Once the AutoScaler is enabled, the instance count decreases by 50%, reducing compute spend.
Golang Sample Demo
📚 Interactive Learning Sample: Try our AutoScaler sample implementation with built-in load generation, real-time metrics collection, and monitoring dashboards. Perfect for understanding how AutoScaler responds to different workload patterns and visualizing poller state changes in real time.
Overview
What AutoScaler does
Cadence Worker AutoScaler automatically adjusts your worker configuration to optimize resource utilization and prevent common scaling issues. Instead of manually tuning poller counts, AutoScaler monitors real-time metrics from your workers and the Cadence service to make intelligent scaling decisions.
The AutoScaler addresses these critical production problems:
- Insufficient throughput capacity: Automatically scales up pollers when task load increases, ensuring workflows are processed without delays
- Poor resource utilization: By adjusting poller counts based on actual task demand rather than static configuration, workers use their allocated CPU resources more efficiently, preventing unnecessary downscaling by compute autoscalers (e.g. AWS EC2 Auto Scaling)
- Manual configuration complexity: Eliminates the need for service owners to understand and tune complex worker parameters
Key benefits
- Zero-configuration scaling: Works out of the box with sensible defaults
- Improved resource efficiency: Automatically scales up when needed, scales down when idle
- Reduced operational overhead: No more manual tuning of poller counts and execution limits
- Production reliability: Prevents scaling-related incidents and workflow processing delays
How to get started
To get started, just add the following to your worker options:
worker.Options{
    ...
    AutoScalerOptions: worker.AutoScalerOptions{
        Enabled: true,
    },
}
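For reference, a minimal end-to-end setup might look like the sketch below. The domain, task list, and service-client wiring are placeholders for whatever your application already uses; only the AutoScalerOptions portion comes from the snippet above.

package workerexample

import (
    "go.uber.org/cadence/.gen/go/cadence/workflowserviceclient"
    "go.uber.org/cadence/worker"
)

// startWorker is a sketch: "your-domain" and "your-task-list" are placeholders,
// and the service client is assumed to be constructed elsewhere in your app.
func startWorker(service workflowserviceclient.Interface) (worker.Worker, error) {
    w := worker.New(service, "your-domain", "your-task-list", worker.Options{
        AutoScalerOptions: worker.AutoScalerOptions{
            Enabled: true,
        },
    })
    // Register your workflows and activities on w before starting it.
    if err := w.Start(); err != nil {
        return nil, err
    }
    return w, nil
}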
⚠️ Note: If enabled, the AutoScaler will ignore these options:
worker.Options{
    ...
    MaxConcurrentActivityTaskPollers: 4,
    MaxConcurrentDecisionTaskPollers: 2,
    ...
}
Compatibility considerations
Poller Count Setup: Before enabling AutoScaler, ensure your initial poller count (PollerInitCount) equals the maximum of your previously configured decision and activity poller counts. This prevents AutoScaler from starting with insufficient polling capacity.
For example:
worker.Options{
    ...
    AutoScalerOptions: worker.AutoScalerOptions{
        Enabled:         true,
        PollerMinCount:  2,
        PollerMaxCount:  8,
        PollerInitCount: 4, // max of the previously configured pollers (4 and 2 above)
    },
}
Scenario: Low CPU Utilization
Problem description
One of the most common production issues with Cadence workers occurs when compute autoscalers incorrectly scale down worker instances due to low CPU utilization. This creates a deceptive situation where workers appear to be underutilized from a resource perspective, but are actually performing critical work.
Here's what typically happens: Cadence workers spend most of their time polling the Cadence service for tasks. This polling activity is lightweight and doesn't consume significant CPU resources, leading to consistently low CPU usage metrics (often 5-15%). Compute autoscalers like Kubernetes HPA (Horizontal Pod Autoscaler) or cloud provider autoscaling groups see these low CPU numbers and interpret them as signals that fewer worker instances are needed.
When the autoscaler reduces the number of worker instances, several problems emerge:
- Reduced polling capacity: Fewer workers means fewer pollers actively checking for new tasks, which can delay task processing
- Cascading delays: As tasks take longer to be picked up, workflow execution times increase, potentially causing timeouts or SLA violations
- Inefficient resource allocation: The remaining workers may actually be quite busy processing tasks, but the polling overhead isn't reflected in CPU metrics
This problem is particularly acute in environments where:
- Workflow tasks are I/O intensive rather than CPU intensive
- Workers handle high volumes of short-duration tasks
- There are periods of variable workload where quick scaling response is crucial
The fundamental issue is that traditional CPU-based autoscaling doesn't account for the unique nature of Cadence worker operations, where "being busy" doesn't necessarily translate to high CPU usage.
How AutoScaler helps
AutoScaler solves the CPU utilization problem by providing intelligent metrics that better represent actual worker utilization. Instead of relying solely on CPU metrics, AutoScaler monitors Cadence-specific signals to make scaling decisions.
The AutoScaler tracks several key indicators:
- Poller utilization: How busy the pollers are, regardless of CPU usage
- Task pickup latency: How quickly tasks are being retrieved from task lists
- Queue depth: The number of pending tasks waiting to be processed
- Worker capacity: The actual capacity of workers to handle more work
When AutoScaler detects that workers are genuinely underutilized (based on Cadence metrics, not just CPU), it can safely reduce poller counts. Conversely, when it detects that workers are busy or task lists are backing up, it increases poller counts to improve task pickup rates.
This approach prevents the common scenario where compute autoscalers scale down workers that appear idle but are actually critical for maintaining workflow performance. AutoScaler provides a more accurate representation of worker utilization that can be used to make better scaling decisions at both the worker configuration level and the compute infrastructure level.
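As a rough mental model of this behavior, the scaling decision can be seen as a feedback loop over how long pollers wait for tasks. The sketch below is purely illustrative; the function, thresholds, and clamping are assumptions, not the Go client's actual algorithm:

package workerexample

import "time"

// nextPollerCount is an illustrative sketch of the feedback idea described
// above, not the Cadence client's real implementation: short poller wait times
// mean the task list is busy, so capacity is added; long wait times mean
// pollers are mostly idle, so capacity is shed. The result is clamped to
// [min, max].
func nextPollerCount(current, min, max int, avgPollerWait, target time.Duration) int {
    switch {
    case avgPollerWait < target/2:
        current++ // tasks are picked up almost immediately: add a poller
    case avgPollerWait > target*2:
        current-- // pollers spend most of their time waiting: remove a poller
    }
    if current < min {
        current = min
    }
    if current > max {
        current = max
    }
    return current
}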
📊 See the problem in action in the visualizations above.
Scenario: Task List Backlogs
Understanding task list imbalances
Task list backlogs occur when some task lists receive more traffic than their allocated pollers can handle, while other task lists remain underutilized. This imbalance is common in production environments with multiple workflows, domains, or variable traffic patterns.
The core problem stems from static poller allocation. Traditional Cadence worker configurations assign a fixed number of pollers to each task list, which works well for predictable workloads but fails when:
- Traffic varies between task lists: Some task lists get heavy traffic while others remain quiet
- Workload patterns change over time: Peak hours create temporary backlogs that resolve slowly
- Multi-domain deployments: Traffic spikes in one domain affect resource availability for others
- Workflow dependencies: Backlogs in upstream task lists cascade to dependent workflows
These imbalances lead to uneven workflow execution times, SLA violations, and inefficient resource utilization. Manual solutions like increasing worker counts or reconfiguring poller allocation are expensive and reactive.
AutoScaler's poller management
AutoScaler solves task list backlogs through dynamic poller management that automatically redistributes polling capacity based on real-time demand.
Key capabilities include:
Automatic backlog detection: AutoScaler monitors task list metrics like queue depth, pickup latency, and task arrival rates to identify developing backlogs before they become critical.
Dynamic reallocation: When a task list needs more capacity, AutoScaler automatically moves pollers from underutilized task lists. This reallocation happens without worker restarts or manual intervention.
Pattern learning: The system learns from historical traffic patterns to anticipate regular peak periods and preemptively allocate resources.
Safety controls: AutoScaler maintains minimum poller counts for each task list and includes safeguards to prevent resource thrashing or system instability.
This approach ensures that polling capacity is always aligned with actual demand, preventing backlogs while maintaining efficient resource utilization across all task lists.
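To make the reallocation idea concrete, here is a deliberately simplified sketch of backlog-proportional distribution. It is an illustration only; the Cadence client's real signals, logic, and safeguards differ:

package workerexample

// redistributePollers is a simplified illustration of backlog-driven
// reallocation, not the Cadence client's actual algorithm. Each task list
// keeps a minimum floor of pollers; whatever remains of the budget is handed
// out in proportion to backlog depth (integer division may leave a few
// pollers unassigned, which a real implementation would also have to place).
func redistributePollers(budget, floor int, backlog map[string]int) map[string]int {
    totalBacklog := 0
    for _, b := range backlog {
        totalBacklog += b
    }
    spare := budget - floor*len(backlog)
    alloc := make(map[string]int, len(backlog))
    for taskList, b := range backlog {
        extra := 0
        if totalBacklog > 0 && spare > 0 {
            extra = spare * b / totalBacklog
        }
        alloc[taskList] = floor + extra
    }
    return alloc
}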
Visualizing task list backlogs
The following dashboard shows how AutoScaler addresses task list imbalances:
Decision Scheduled to Start Latency (p95)
High latency indicates task list backlogs that AutoScaler automatically resolves by redistributing pollers.
Metrics Guide
Key metrics to monitor
Client Dashboards: http://localhost:3000/d/dehkspwgabvuoc/cadence-client
Note: Make sure to select a Domain in Grafana for the dashboards to display data. The dashboards will be empty until a domain is selected from the dropdown.
Monitor these key metrics to understand AutoScaler performance:
Decision Poller Quota
- Description: Track decision poller count over time
- Name: cadence_concurrency_auto_scaler_poller_quota_bucket
- WorkerType: DecisionWorker
- Type: Heatmap
Activity Poller Quota
- Description: Track activity poller count over time
- Name: cadence-concurrency-auto-scaler.poller-quota
- WorkerType: ActivityWorker
- Type: Heatmap
Decision Poller Wait Time
- Description: Track decision poller wait time over time
- Name: cadence-concurrency-auto-scaler.poller-wait-time
- WorkerType: DecisionWorker
- Type: Heatmap
Activity Poller Wait Time
- Description: Track activity poller wait time over time
- Name: cadence-concurrency-auto-scaler.poller-wait-time
- WorkerType: ActivityWorker
- Type: Heatmap
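These client-side metrics are only emitted if the worker is given a metrics scope. The sketch below wires a tally Prometheus reporter into worker.Options; the reporter choice, the /metrics endpoint, and the one-second reporting interval are assumptions, so substitute whatever metrics stack your service already runs (the exported metric names also depend on that configuration).

package workerexample

import (
    "net/http"
    "time"

    "github.com/uber-go/tally"
    promreporter "github.com/uber-go/tally/prometheus"
    "go.uber.org/cadence/worker"
)

// newWorkerOptions is a sketch that attaches a metrics scope to the worker so
// the AutoScaler metrics above are reported. The Prometheus reporter and the
// /metrics handler shown here are assumptions; reuse your existing setup.
func newWorkerOptions() worker.Options {
    reporter := promreporter.NewReporter(promreporter.Options{})
    scope, _ := tally.NewRootScope(tally.ScopeOptions{
        CachedReporter: reporter,
        Separator:      promreporter.DefaultSeparator,
    }, time.Second)

    // Expose the metrics for scraping; plug this into your existing HTTP server.
    http.Handle("/metrics", reporter.HTTPHandler())

    return worker.Options{
        MetricsScope: scope,
        AutoScalerOptions: worker.AutoScalerOptions{
            Enabled: true,
        },
    }
}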