# Cluster Configuration

This section explains what you need to set up a Cadence cluster.

You should understand the basic static configuration of a Cadence cluster.

There are also many other settings, called "Dynamic Configuration", for fine-tuning the cluster. The default values are fine for small clusters.

Cadence’s minimum dependency is a database (Cassandra, or a SQL database such as MySQL/Postgres), which Cadence uses for persistence. All instances in a Cadence cluster are stateless.

For production you also need a metrics server (Prometheus/Statsd/M3/etc.).

Advanced features have further dependencies: ElasticSearch+Kafka if you need the Advanced visibility feature to search workflows, and a blob store such as S3 if you need to enable the archival feature.

# Static configuration

# Understand the basic static configuration

Cadence has many configuration options. These are the most basic ones that you should understand.

| Config name | Explanation | Recommended value |
| --- | --- | --- |
| numHistoryShards | This is the most important setting in the Cadence config. It is fixed for the lifetime of the cluster; the only way to change it is to migrate to another cluster (refer to the Migrate cluster section).<br/><br/>Some facts about it:<br/>1. Each workflow is mapped to a single shard. Within a shard, all workflow creations and updates are serialized.<br/>2. Each shard is owned by exactly one History node, assigned using a consistent hashing ring. Each shard consumes a small amount of memory and CPU for background processing, so a single History node cannot own too many shards. You may need to work out a good range based on your instance size (memory/CPU).<br/>3. Because this value is fixed, you cannot add an unlimited number of nodes to a cluster. When the number of History nodes is close or equal to numHistoryShards, some History nodes will have no shards assigned to them, which wastes resources.<br/><br/>Based on the above, you don’t want too few shards, which limits the maximum size of your cluster. You also don’t want too many, which requires a fairly large initial cluster size.<br/>Also, a production cluster typically starts small and more nodes/hosts are added later. To keep high availability, it’s recommended to start with at least 4 nodes for each service (Frontend/History/Matching). | 1K~16K, depending on the range of cluster sizes you expect to run and the instance size. Typically 2K for SQL-based persistence and 8K for Cassandra-based persistence. |
| ringpop | This config lets all nodes of all services connect to each other. ALL the bootstrap nodes MUST be reachable by ringpop when a service is starting up, within MaxJoinDuration. defaultMaxJoinDuration is 2 minutes.<br/><br/>Bootstrap nodes are not required to be Frontend, History or Matching nodes. In fact, a bootstrap node can run none of them as long as it runs the Ringpop protocol. | For dns mode: the DNS name of the Frontend service is recommended.<br/><br/>For hosts or hostfile mode: a list of Frontend service node addresses. Make sure all the bootstrap nodes are reachable at startup. |
| publicClient | The Cadence Frontend service addresses that internal Cadence components (such as system workflows) need to talk to.<br/><br/>Once connected, all nodes in Ringpop form a ring with identifiers of the services they serve. Ideally Cadence would get the Frontend address from there, but Ringpop doesn’t expose this API yet. | The DNS name of the Frontend service is recommended, so that requests are distributed across all Frontend nodes.<br/><br/>Using localhost+port or a local container IP address+port will not work if that IP/container is not running Frontend. |
| services.NAME.rpc | Configuration of how to listen on network ports and serve traffic.<br/><br/>bindOnLocalHost:true binds on 127.0.0.1 and is mostly for local development. In production you usually have to specify the IP that containers will use, via bindOnIP.<br/><br/>NAME must match the “--services” option in the server startup command. | NAME: use the names recommended in development.yaml. bindOnIP: an IP address that the container will serve traffic on. |
| services.NAME.pprof | Golang profiling service. It binds on the same IP as RPC. | A port on which you want to serve pprof requests. |
| services.NAME.metrics | See the Metrics & Logging section. | |
| clusterMetadata | Cadence cluster configuration.<br/><br/>enableGlobalDomain:true enables the Cadence cross datacenter replication (aka XDC) feature.<br/><br/>failoverVersionIncrement: determines the maximum number of clusters that can be replicated to each other at the same time. For example, 10 is sufficient for most cases.<br/><br/>masterClusterName: the master cluster must be one of the enabled clusters, usually the very first cluster to start. It is only meaningful for internal purposes.<br/><br/>currentClusterName: the name of the current cluster that uses this config file.<br/><br/>clusterInformation: a map from clusterName to that cluster's configuration.<br/><br/>initialFailoverVersion: each cluster must use a different value from 0 to failoverVersionIncrement-1.<br/><br/>rpcName: must be “cadence-frontend”. This can be improved; see the related issue.<br/><br/>rpcAddress: the address used to talk to the Frontend of the cluster for inter-cluster replication.<br/><br/>Note that even if you don’t need XDC replication right now, you should enable XDC from the very beginning if you may want to migrate data stores in the future. You just need to use the same cluster name for both masterClusterName and currentClusterName.<br/><br/>Go to cross dc replication for how to configure replication in production. | As described in the explanation. |
| dcRedirectionPolicy | Allows forwarding Frontend requests from a passive cluster to active clusters. | “selected-apis-forwarding” |
| archival | This is for the history archival feature; skip it if you don’t need it. Go to workflow archival for how to configure archival in production. | N/A |
| blobstore | This is also for the history archival feature. The default Cadence server uses a file-based blob store implementation. | N/A |
| domainDefaults | Default config for each domain. Currently only used by the archival feature. | N/A |
| dynamicConfigClient | Dynamic config is a config manager that lets you change config without restarting servers. It’s a good way for Cadence to keep high availability and make things easy to configure.<br/><br/>The default Cadence server uses FileBasedClientConfig, but you can implement the dynamic config interface if you have a better way to manage it. | Same as the sample development config. |
| persistence | Configuration for the data store / persistence layer.<br/><br/>The values of DefaultStore, VisibilityStore and AdvancedVisibilityStore must be keys of the DataStores map.<br/><br/>DefaultStore is for core Cadence functionality.<br/><br/>VisibilityStore is for the basic visibility feature.<br/><br/>AdvancedVisibilityStore is for advanced visibility.<br/><br/>Go to advanced visibility for detailed configuration of advanced visibility. See the persistence documentation about using different databases for Cadence. | As described in the explanation. |
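
Two of the rules above lend themselves to a quick sanity check before a cluster is created: how many shards each History node would own, and whether the initialFailoverVersion values are valid. The following is only a minimal sketch. The function names are hypothetical (they are not part of Cadence), and the shard split assumes a perfectly even distribution, which a real consistent hashing ring will not achieve, so treat the numbers as a best case.

```python
def shards_per_node(num_history_shards: int, history_nodes: int) -> dict:
    """Idealized even split of shards across History nodes.

    Assumption: shards are spread perfectly evenly. A real consistent
    hashing ring is less even, so these numbers are a best case.
    """
    if num_history_shards <= 0 or history_nodes <= 0:
        raise ValueError("both arguments must be positive")
    base, extra = divmod(num_history_shards, history_nodes)
    return {
        "min_shards_per_node": base,
        "max_shards_per_node": base + (1 if extra else 0),
        # Nodes beyond the shard count can never own a shard.
        "idle_nodes_at_least": max(0, history_nodes - num_history_shards),
    }


def check_failover_versions(increment: int, initial_versions: dict) -> list:
    """Return a list of problems found in initialFailoverVersion values.

    initial_versions maps a cluster name to its initialFailoverVersion.
    Each value must be unique and within 0..increment-1.
    """
    problems = []
    seen = {}
    for cluster, version in initial_versions.items():
        if not 0 <= version < increment:
            problems.append(f"{cluster}: {version} is outside 0..{increment - 1}")
        if version in seen:
            problems.append(f"{cluster}: {version} is already used by {seen[version]}")
        else:
            seen[version] = cluster
    return problems
```

For example, `shards_per_node(2048, 4)` reports 512 shards per node, while `shards_per_node(4, 6)` reports at least 2 idle nodes, which is the wasted-resources case described for numHistoryShards.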

# The full list of static configuration

Starting from v0.21.0, all static configuration is documented in detail in GoDocs.

| Version | GoDocs Link | Github Link |
| --- | --- | --- |
| v0.21.0 | Configuration Docs | Configuration |
| ...other higher versions | ...Replace the version in the URL of v0.21.0 | ...Replace the version in the URL of v0.21.0 |

For earlier versions, you can find all the configurations similarly:

| Version | GoDocs Link | Github Link |
| --- | --- | --- |
| v0.20.0 | Configuration Docs | Configuration |
| v0.19.2 | Configuration Docs | Configuration |
| v0.18.2 | Configuration Docs | Configuration |
| v0.17.0 | Configuration Docs | Configuration |
| ...other lower versions | ...Replace the version in the URL of v0.20.0 | ...Replace the version in the URL of v0.20.0 |

# Dynamic Configuration

Dynamic configuration is for fine tuning a Cadence cluster.

There are many more dynamic configuration options than static ones. Most of the default values are good for small clusters. As a cluster scales up, you may want to tune them for optimal performance.

Starting from v0.21.0 with this change, all dynamic configuration is well documented in GoDocs.

| Version | GoDocs Link | Github Link |
| --- | --- | --- |
| v0.21.0 | Dynamic Configuration Docs | Dynamic Configuration |
| ...other higher versions | ...Replace the version in the URL of v0.21.0 | ...Replace the version in the URL of v0.21.0 |

For earlier versions, you can find all the configurations similarly:

| Version | GoDocs Link | Github Link |
| --- | --- | --- |
| v0.20.0 | Dynamic Configuration Docs | Dynamic Configuration |
| v0.19.2 | Dynamic Configuration Docs | Dynamic Configuration |
| v0.18.2 | Dynamic Configuration Docs | Dynamic Configuration |
| v0.17.0 | Dynamic Configuration Docs | Dynamic Configuration |
| ...other lower versions | ...Replace the version in the URL of v0.20.0 | ...Replace the version in the URL of v0.20.0 |

However, the GoDocs for earlier versions don't contain detailed information, so you need to look keys up in the GoDocs of a newer version.
For example, search for "EnableGlobalDomain" in the Dynamic Configuration comments in v0.21.0 or the Docs of v0.21.0, as the usage of dynamic configuration never changes.

Each entry in the GoDocs lists the following:

  • KeyName is the key that you will use in the dynamicconfig yaml content
  • Default value is the value used when the key is not set
  • Value type indicates the type that the yaml value must have:
    • Int should be an integer like 123
    • Float should be a number like 123.4
    • Duration should be a Golang duration like 10s, 2m, 5h for 10 seconds, 2 minutes and 5 hours
    • Bool should be true or false
    • Map should be a yaml map
  • Allowed filters indicates what kinds of filters you can set as constraints on the dynamic configuration.
    • DomainName means the config can be constrained with domainName
    • N/A means no filters can be set. The config will be global.
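
The value types above can be checked before a dynamicconfig file is deployed. The sketch below is an illustration only: `duration_to_seconds` and `matches_value_type` are hypothetical helpers, not part of Cadence, and the duration parser handles only a simplified subset of what Go's `time.ParseDuration` accepts (no sign, no "µs"). Accepting an integer for a Float key is also an assumption.

```python
import re

# Seconds per unit for the subset of Go duration units handled here.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")


def duration_to_seconds(text: str) -> float:
    """Convert a Go-style duration such as "10s" or "1h30m" to seconds."""
    pos = 0
    total = 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:
            break  # unparseable characters between two parts
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a duration: {text!r}")
    return total


def matches_value_type(value_type: str, value) -> bool:
    """Check a parsed yaml value against a "Value type" from the GoDocs."""
    if value_type == "Bool":
        return isinstance(value, bool)
    if value_type == "Int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if value_type == "Float":
        # Assumption: an integer literal is acceptable for a Float key.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if value_type == "Map":
        return isinstance(value, dict)
    if value_type == "Duration":
        if not isinstance(value, str):
            return False
        try:
            duration_to_seconds(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown value type: {value_type}")
```

With these helpers, `matches_value_type("Duration", "2m")` is true, while `matches_value_type("Duration", "2 minutes")` and `matches_value_type("Int", True)` are false.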

For example, if you want to change the rate limit for the List API, this is the relevant config:

// FrontendVisibilityListMaxQPS is max qps frontend can list open/close workflows
// KeyName: frontend.visibilityListMaxQPS
// Value type: Int
// Default value: 10
// Allowed filters: DomainName
FrontendVisibilityListMaxQPS

Then you can add the config like:

frontend.visibilityListMaxQPS:
  - value: 1000
    constraints:
      domainName: "domainA"
  - value: 2000
    constraints:
      domainName: "domainB"

With this config, domainA can perform up to 1K List operations per second, while domainB can perform up to 2K per second.
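
To make the effect of constraints concrete, here is a minimal sketch of how a value could be chosen for a given domain. It is an illustration under assumptions, not Cadence's actual lookup code: `resolve_value` is a hypothetical function, and the rule "the matching entry with the most constraints wins, otherwise the default applies" is an assumption made for this example.

```python
# The parsed yaml content for frontend.visibilityListMaxQPS from the example.
ENTRIES = [
    {"value": 1000, "constraints": {"domainName": "domainA"}},
    {"value": 2000, "constraints": {"domainName": "domainB"}},
]


def resolve_value(entries, default, **filters):
    """Return the value of the most specific entry matching the filters.

    An entry matches when every one of its constraints equals the
    corresponding filter. Entries without constraints match everything.
    """
    best_value, best_score = default, -1
    for entry in entries:
        constraints = entry.get("constraints") or {}
        if all(filters.get(name) == wanted for name, wanted in constraints.items()):
            if len(constraints) > best_score:
                best_value, best_score = entry["value"], len(constraints)
    return best_value
```

Under these assumptions, `resolve_value(ENTRIES, 10, domainName="domainA")` yields 1000, and a domain with no matching entry falls back to the default of 10.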

NOTE 1: size-related configuration values are in bytes.

NOTE 2: for <frontend,history,matching>.persistenceMaxQPS versus <frontend,history,matching>.persistenceGlobalMaxQPS: persistenceMaxQPS is local to a single node, while persistenceGlobalMaxQPS is global across all nodes. persistenceGlobalMaxQPS takes precedence if it is set greater than zero. By default it is zero, so persistenceMaxQPS is used.
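
The precedence rule in NOTE 2 can be written down as a small function. This is a sketch only: `effective_persistence_limit` is a hypothetical name, and it reports which setting applies without modelling how a global limit is shared between nodes.

```python
def effective_persistence_limit(persistence_max_qps: int,
                                persistence_global_max_qps: int = 0) -> tuple:
    """Return (scope, limit) for the persistence rate limit that applies.

    The global setting wins only when it is greater than zero, which is
    not the case by default.
    """
    if persistence_global_max_qps > 0:
        return ("global", persistence_global_max_qps)
    return ("per-node", persistence_max_qps)
```

The numeric values passed in are whatever your cluster configures; none are implied here.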

# How to update Dynamic Configuration

  • Local docker-compose by mounting a volume: update the cadence section in the docker compose file, and mount the config folder from the host machine as follows:
cadence:
  image: ubercadence/server:master-auto-setup
  ports:
    ...(don't change anything here)
  environment:
    ...(don't change anything here)
    - "DYNAMIC_CONFIG_FILE_PATH=/etc/custom-dynamicconfig/development.yaml"
  volumes:
    - "/Users/<?>/cadence/config/dynamicconfig:/etc/custom-dynamicconfig"
  • Local docker-compose by logging into the container: run docker exec -it docker_cadence_1 /bin/bash to log in to your container, then run vi config/dynamicconfig/development.yaml to make any changes. After changing the config, use docker restart docker_cadence_1 to restart the Cadence instance. Note that you can also use this approach to change static config, but it must be changed through config/config_template.yaml instead of config/docker.yaml, because config/docker.yaml is generated on startup.

  • In a production cluster: follow this example of Helm Chart to deploy Cadence, update the dynamic config here, and restart the cluster.

  • DEBUG: how do you make sure your updates to dynamicconfig are loaded? For example, if you added the following to development.yaml:

frontend.visibilityListMaxQPS:
  - value: 10000

After restarting the Cadence instances, execute a command like the following so that Cadence loads the config (it is lazily loaded on first use): cadence --domain <> workflow list

Then you should see logs like the one below:

cadence_1        | {"level":"info","ts":"2021-05-07T18:43:07.869Z","msg":"First loading dynamic config","service":"cadence-frontend","key":"frontend.visibilityListMaxQPS,domainName:sample,clusterName:primary","value":"10000","default-value":"10","logging-call-at":"config.go:93"}
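
If you prefer to check this automatically rather than by eye, the log line can be parsed. The following is a minimal sketch that assumes the docker-compose output format shown above (a container prefix, a "|" separator, then one JSON object per line). `loaded_value` is a hypothetical helper, and lines that are not JSON will raise an error.

```python
import json

# The sample log line from above, as printed by docker-compose.
LOG_LINE = (
    'cadence_1        | {"level":"info","ts":"2021-05-07T18:43:07.869Z",'
    '"msg":"First loading dynamic config","service":"cadence-frontend",'
    '"key":"frontend.visibilityListMaxQPS,domainName:sample,clusterName:primary",'
    '"value":"10000","default-value":"10","logging-call-at":"config.go:93"}'
)


def loaded_value(line: str, key_name: str):
    """Return the logged value for key_name, or None if the line is unrelated."""
    # Drop the "cadence_1 |" prefix added by docker-compose.
    _, _, payload = line.partition("|")
    record = json.loads(payload)
    if record.get("msg") != "First loading dynamic config":
        return None
    # The logged key is "<KeyName>,<filter>:<value>,...": compare the KeyName only.
    if record.get("key", "").split(",")[0] != key_name:
        return None
    return record.get("value")
```

For the sample line, `loaded_value(LOG_LINE, "frontend.visibilityListMaxQPS")` returns the string "10000", confirming that the new value replaced the default of 10.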

# Other Advanced Features

# Deployment & Release

Kubernetes is the most popular way to deploy a Cadence cluster, and the easiest way is to use the Cadence Helm Charts, which are maintained as a community project.

If you want to deploy Cadence using other technologies, it's recommended to use the Cadence docker images. You can use the official ones, or customize them based on your needs. See the Cadence docker package for how to run the images.

It's always recommended to use the latest release. See the Cadence release pages.

Please subscribe to the project's releases:

Go to https://github.com/uber/cadence -> Click the "Watch" button at the top right -> Custom -> "Release".

Also see how to upgrade a Cadence cluster.