# Cluster Monitoring

# Instructions

Cadence emits metrics from both the server and the client. An example Prometheus metrics configuration:

metrics:
  prometheus:
    timerType: "histogram"
    listenAddress: "0.0.0.0:8001"
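
Once a process is running with a configuration like the one above, a quick way to confirm that it emits metrics is to read its endpoint and list the metric names. The following is a minimal sketch, not part of Cadence: the `/metrics` path, the `localhost` host, and the helper names `metric_names` and `fetch_metric_names` are assumptions, and only the port 8001 comes from the `listenAddress` above.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names found in Prometheus text format."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is either "name{labels} value" or "name value".
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return sorted(names)


def fetch_metric_names(url="http://localhost:8001/metrics", timeout=5):
    """Fetch a metrics endpoint (assumed URL) and list the metric names it exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

Calling `fetch_metric_names()` against a running process should return a non-empty list if metrics are being emitted; adjust the URL to match your own `listenAddress`.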

The rest of the instructions are using local environment as an example.

To have a local server emit metrics to Prometheus, the easiest way is to use docker-compose to start a local Cadence instance.

Before starting docker-compose, make sure to update prometheus_config.yml to add "host.docker.internal:9098" to the scrape list:

global:
  scrape_interval: 5s
  external_labels:
    monitor: 'cadence-monitor'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: # addresses to scrape
          - 'cadence:9090'
          - 'cadence:8000'
          - 'cadence:8001'
          - 'cadence:8002'
          - 'cadence:8003'
          - 'host.docker.internal:9098'

Note: host.docker.internal may not work for some Docker versions.

  • After updating prometheus_config.yml as above, run `docker-compose up` to start the local Cadence instance.

  • Go to the sample repo, build the helloworld sample with `make helloworld`, and run the worker with `./bin/helloworld -m worker`. Then, in another shell, start a workflow with `./bin/helloworld`.

  • Go to the local Prometheus server. On the targets page, you should be able to check that the metrics handlers for client/frontend/matching/history/sysWorker are all healthy.

  • Go to the local Grafana and log in as admin/admin.

  • Configure Prometheus as a data source: use http://host.docker.internal:9090 as the Prometheus URL.

  • Import the Grafana dashboard templates as JSON files.
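
The data source step above can also be scripted against Grafana's HTTP API instead of the UI. The sketch below only builds the request, so it can be inspected before sending. The Grafana address `http://localhost:3000`, the data source name `Prometheus`, and the helper name `build_datasource_request` are assumptions; the Prometheus URL and the admin/admin login come from the steps above.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url="http://localhost:3000",
                             prometheus_url="http://host.docker.internal:9090",
                             user="admin", password="admin"):
    """Build (but do not send) a request that registers Prometheus as a Grafana data source."""
    payload = {
        "name": "Prometheus",      # assumed display name
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",         # Grafana's backend performs the queries
        "isDefault": True,
    }
    # HTTP basic auth with the local login.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(build_datasource_request(), timeout=5)` while the local Grafana is running. Building the request separately keeps the payload easy to review and to adapt to a non-local setup.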

The client-side dashboard looks like this: [screenshot]

And the basic server dashboard: [screenshots]
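
The target health check from the steps above can also be done without the browser, using the Prometheus HTTP API endpoint `/api/v1/targets`. This is a minimal sketch under stated assumptions: Prometheus is reachable at `http://localhost:9090` from the host, and the helper names `unhealthy_targets` and `fetch_targets` are illustrative only.

```python
import json
import urllib.request


def unhealthy_targets(targets_response):
    """Return (scrape_url, last_error) for every active target whose health is not 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in active
        if t.get("health") != "up"
    ]


def fetch_targets(prometheus_url="http://localhost:9090", timeout=5):
    """Fetch the target list from a Prometheus server (assumed local address)."""
    url = prometheus_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the local setup running, `unhealthy_targets(fetch_targets())` returns an empty list when every scrape target is healthy. A target such as `host.docker.internal:9098` shows up in the result if the worker is not running or the host name does not resolve.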

# Grafana dashboard templates

This package contains examples of Cadence dashboards with Prometheus.

  • Cadence-Client is the dashboard for client metrics, plus a few server-side metrics that belong to the client side but have to be emitted by the server (for example, workflow timeout).

  • Cadence-Server-Basic is the basic server dashboard to monitor and navigate the health and status of a Cadence cluster.

  • Apart from the basic server dashboard, it's recommended to set up dashboards for the different components of the Cadence server: Frontend, History, Matching, Worker, Persistence, Archival, etc. Contributions that enrich the existing templates or add new ones are always welcome!

# Periodic tests (Canary) for health check

It's recommended to run periodic tests every hour on your cluster, following this package, to make sure the cluster is healthy.