Deployment topology
Overview
Cadence is a highly scalable fault-oblivious stateful code platform. The fault-oblivious code is a next level of abstraction over commonly used techniques to achieve fault tolerance and durability.
A common Cadence-based application consists of a Cadence service, workflow and activity_workers, and external clients. Note that both types of workers as well as external clients are roles and can be collocated in a single application process if necessary.
Cadence Service
At the core of Cadence is a highly scalable multitenant service. The service exposes all of its functionality through a strongly typed gRPC API. A Cadence cluster include multiple services, each of which may run on multiple nodes for scalability and reliablity:
- Front End: which is a stateless service used to handle incoming requests from Workers. It is expected that an external load balancing mechanism is used to distribute load between Front End instances.
- History Service: where the core logic of orchestrating workflow steps and activities is implemented
- Matching Service: matches workflow/activity tasks that need to be executed to workflow/activity workers that are able to execute them. Matching is assigned task for execution by the history service
- Internal Worker Service: implements Cadence workflows and activities for internal requirements such as archiving
- Workers: are effectively the client apps for Cadence. This is where user created workflow and activity logic is executed
Internally it depends on a persistent store. Currently, Apache Cassandra, MySQL, PostgreSQL, CockroachDB (PostgreSQL compatible) and TiDB (MySQL compatible) stores are supported out of the box. For listing workflows using complex predicates, ElasticSearch and OpenSearch cluster can be used.
Cadence service is responsible for keeping workflow state and associated durable timers. It maintains internal queues (called task_lists) which are used to dispatch tasks to external workers.
Cadence service is multitenant. Therefore it is expected that multiple pools of workers implementing different use cases connect to the same service instance. For example, at Uber a single service is used by more than a hundred applications. At the same time some external customers deploy an instance of Cadence service per application. For local development, a local Cadence service instance configured through docker-compose is used.
Workflow Worker
Cadence reuses terminology from workflow automation domain. So fault-oblivious stateful code is called workflow.
The Cadence service does not execute workflow code directly. The workflow code is hosted by an external (from the service point of view) workflow_worker process. These processes receive decision_tasks that contain events that the workflow is expected to handle from the Cadence service, delivers them to the workflow code, and communicates workflow decisions back to the service.
As workflow code is external to the service, it can be implemented in any language that can talk service Thrift API. Currently Java and Go clients are production ready. While Python and C# clients are under development. Let us know if you are interested in contributing a client in your preferred language.
The Cadence service API doesn't impose any specific workflow definition language. So a specific worker can be implemented to execute practically any existing workflow specification. The model the Cadence team chose to support out of the box is based on the idea of durable function. Durable functions are as close as possible to application business logic with minimal plumbing required.
Activity Worker
Workflow fault-oblivious code is immune to infrastructure failures. But it has to communicate with the imperfect external world where failures are common. All communication to the external world is done through activities. Activities are pieces of code that can perform any application-specific action like calling a service, updating a database record, or downloading a file from Amazon S3. Cadence activities are very feature-rich compared to queuing systems. Example features are task routing to specific processes, infinite retries, heartbeats, and unlimited execution time.
Activities are hosted by activity_worker processes that receive activity_tasks from the Cadence service, invoke correspondent activity implementations and report back task completion statuses.
External Clients
Workflow and activity_workers host workflow and activity code. But to create a workflow instance (an execution in Cadence terminology) the StartWorkflowExecution
Cadence service API call should be used. Usually, workflows are started by outside entities like UIs, microservices or CLIs.
These entities can also:
- notify workflows about asynchronous external events in the form of signals
- synchronously query workflow state
- synchronously wait for a workflow completion
- cancel, terminate, restart, and reset workflows
- search for specific workflows using list API