9 posts tagged with "Announcement"

Safe deployments of Versioned workflows

· 8 min read
Seva Kaloshin
Software Engineer II @ Uber

At Uber, we manage billions of workflows with lifetimes ranging from seconds to years. Over the course of their lifetime, workflow code logic often requires changes. To prevent non-deterministic errors that changes may cause, Cadence offers a Versioning feature. However, the feature's usage is limited because changes are only backward-compatible, but not forward-compatible. This makes potential rollbacks or workflow execution rescheduling unsafe.

To address these issues, we have made recent enhancements to the Versioning API, enabling the safe deployment of versioned workflows by separating code changes from the activation of new logic.

What is a Versioned Workflow?

Cadence reconstructs a workflow's execution history by replaying past events against your workflow code, expecting the exact same outcome every time. If your workflow code changes in an incompatible way, this replaying process can lead to non-deterministic errors.
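The replay check described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: history and workflow code are both reduced to ordered lists of activity names, and `Replay`, `originalCode`, and `changedCode` are hypothetical names that are not part of the Cadence SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNonDeterministic signals that replayed code diverged from recorded history.
var ErrNonDeterministic = errors.New("non-deterministic workflow")

// Replay re-runs workflow code against a recorded history. In this toy model
// both are just ordered lists of scheduled activity names.
func Replay(history []string, code func() []string) error {
	commands := code()
	for i, recorded := range history {
		if i >= len(commands) || commands[i] != recorded {
			return fmt.Errorf("%w: event %d recorded %q", ErrNonDeterministic, i, recorded)
		}
	}
	return nil
}

// originalCode is the code that produced the history (hypothetical example).
func originalCode() []string { return []string{"ActivityA"} }

// changedCode swaps the activity without any versioning (hypothetical example).
func changedCode() []string { return []string{"ActivityC"} }

func main() {
	history := []string{"ActivityA"} // recorded by a worker running originalCode

	fmt.Println(Replay(history, originalCode) == nil) // true: replay matches
	fmt.Println(Replay(history, changedCode) == nil)  // false: code diverged
}
```

The second replay fails because the changed code schedules a different activity than the one recorded, which is the kind of incompatible change that versioning is meant to guard against.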

A versioned workflow uses the Versioning feature to avoid these errors, so developers can safely update their workflow code without breaking existing executions. The key is the workflow.GetVersion function (available in Go and Java). By calling workflow.GetVersion, you mark the points in your code where changes occur. The version chosen on the first execution is recorded, so every later call for the same change ID returns the same version number.

For example, the following code runs ActivityC for new executions while existing executions keep running ActivityA:

v := workflow.GetVersion(ctx, "change-id", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
	err = workflow.ExecuteActivity(ctx, ActivityA, data).Get(ctx, &result1)
} else {
	err = workflow.ExecuteActivity(ctx, ActivityC, data).Get(ctx, &result1)
}
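The record-then-replay behaviour behind this snippet can be sketched as a small stand-alone model. It is a simplification, not the SDK implementation: `History` and this `GetVersion` are hypothetical, the real function takes a `workflow.Context`, and `DefaultVersion` is set to -1 to mirror the constant in the Go client.

```go
package main

import "fmt"

// DefaultVersion mirrors workflow.DefaultVersion, which is -1 in the Go client.
const DefaultVersion = -1

// History is a toy stand-in for one workflow execution's event history.
type History struct {
	Replaying bool           // true when the code is re-run against past events
	Versions  map[string]int // version markers recorded so far, by change ID
}

// GetVersion is a simplified model of the versioning logic, not the SDK function.
func GetVersion(h *History, changeID string, minSupported, maxSupported int) (int, error) {
	v, recorded := h.Versions[changeID]
	switch {
	case recorded:
		// A marker exists: always return the recorded version.
	case h.Replaying:
		// Old history without a marker: the code ran before the change existed.
		v = DefaultVersion
	default:
		// First execution: pick the maximum supported version and record it.
		v = maxSupported
		h.Versions[changeID] = v
	}
	if v < minSupported || v > maxSupported {
		return 0, fmt.Errorf("change %q: version %d is outside [%d, %d]",
			changeID, v, minSupported, maxSupported)
	}
	return v, nil
}

func main() {
	fresh := &History{Versions: map[string]int{}}
	v, _ := GetVersion(fresh, "change-id", DefaultVersion, 1)
	fmt.Println(v) // 1: a new execution takes the ActivityC branch

	old := &History{Replaying: true, Versions: map[string]int{}}
	v, _ = GetVersion(old, "change-id", DefaultVersion, 1)
	fmt.Println(v) // -1: an existing execution stays on ActivityA
}
```

The range check at the end is what makes changes backward-compatible only: code that supports a narrower range than the recorded version cannot continue the execution.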

Deployment flow

Let’s consider an example deployment of a change from workflow code v0.1, where only FooActivity is supported.

// Git tag: v0.1
func MyWorkflow(ctx workflow.Context) error {
	return workflow.ExecuteActivity(ctx, FooActivity).Get(ctx, nil)
}

to workflow code v0.2, which introduces a new BarActivity and utilizes the Versioning feature:

// Git tag: v0.2
func MyWorkflow(ctx workflow.Context) error {
	version := workflow.GetVersion(ctx, "MyChange", workflow.DefaultVersion, 1)
	if version == workflow.DefaultVersion {
		return workflow.ExecuteActivity(ctx, FooActivity).Get(ctx, nil)
	}
	return workflow.ExecuteActivity(ctx, BarActivity).Get(ctx, nil)
}

Before the rollout, only instances of workflow code v0.1 existed:

old-deployment-flow-v0.1.png

Rollouts are typically performed gradually, with new workers replacing previous worker instances one at a time. This means that multiple workers with workflow code v0.1 and v0.2 can exist simultaneously. When a worker is replaced, a running workflow execution is rescheduled to another worker. Thanks to the Versioning feature, a worker with workflow code v0.2 can support a workflow execution started by a worker with workflow code v0.1.

old-deployment-flow-v0.1-v0.2.png

During rollouts, the service should continue to serve production traffic, allowing new workflows to be initiated. If a new worker processes a "Start Workflow Execution" request, it will execute a workflow based on the new version. However, if an old worker handles the request, it will start a workflow based on the old version.

old-deployment-flow-v0.1-v0.2-start-workflow.png

If a rollout is completed successfully, both the new and old workflows will continue to execute simultaneously.

old-deployment-flow-v0.2.png

Versioned Workflow Rescheduling Problem

Workflows typically execute on the same worker on which they started. However, various factors can force a workflow to be rescheduled to a different worker:

  • Worker Shutdown: Occurs when a worker is shut down due to reasons such as rollouts, rollbacks, restarts, or instance crashes.
  • Worker Unavailability: Occurs when a worker is running but loses connection to the server, becoming unavailable.
  • High Traffic Load: Occurs when a worker's sticky cache is fully utilized, preventing further workflow execution and causing the server to reschedule the workflow to another worker.

Rescheduling workflow executions that use a new version becomes unsafe during a rollout, and especially during a rollback:

workflow-rescheduling-problem.png

  • If an old workflow is rescheduled to either an old or a new worker, it generally processes correctly.
  • If a new workflow is rescheduled to an old worker, it will be blocked or even fail (depending on NonDeterministicWorkflowPolicy).

Why did it happen?

The old worker doesn't support the new version and cannot replay its history correctly, which leads to a non-deterministic error. The Versioning API allowed customers to make only backward-compatible changes to workflow code definitions; however, these changes were not forward-compatible.

At the same time, there were no workarounds allowing customers to make these changes forward-compatible, so they couldn't separate code changes from the activation of the new version.

What impact did we have at Uber?

To undo the damage of a rollback, a Cadence customer had to identify every affected workflow, terminate those that had not already failed, and restart them. How much work this took depended on the workflow code and the change involved, but it created a significant on-call burden and could lead to SLO violations and incidents.

Based on customer impact, we introduced changes in the Versioning API, enabling customers to separate code changes from the activation of the new version.

ExecuteWithVersion and ExecuteWithMinVersion

The recent release of the Go SDK (Java support is coming soon) extends the GetVersion function with two new options:

// When it's executed for the first time, it returns 2, instead of 10 
version := workflow.GetVersion(ctx, "changeId", 1, 10, workflow.ExecuteWithVersion(2))

// When it's executed for the first time, it returns 1, instead of 10
version := workflow.GetVersion(ctx, "changeId", 1, 10, workflow.ExecuteWithMinVersion())

These two new options enable customers to choose which version should be returned when GetVersion is executed for the first time, instead of the maximum supported version.

  • ExecuteWithVersion returns a specified value.
  • ExecuteWithMinVersion returns the minimum supported version.
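How the two options change the version chosen on first execution can be sketched with functional options. This is an illustrative model only: `FirstVersion` and the `config` type are hypothetical, the option constructors merely borrow the SDK names, and any validation the SDK performs on the pinned version is not modelled.

```go
package main

import "fmt"

// config captures how the first-execution version is chosen (hypothetical type).
type config struct {
	useMin bool
	pinned *int
}

// Option adjusts the first-execution behaviour, in the style of the new SDK options.
type Option func(*config)

// ExecuteWithVersion pins the version returned on first execution.
func ExecuteWithVersion(v int) Option {
	return func(c *config) { c.pinned = &v }
}

// ExecuteWithMinVersion selects the minimum supported version on first execution.
func ExecuteWithMinVersion() Option {
	return func(c *config) { c.useMin = true }
}

// FirstVersion returns the version a first-time GetVersion call would pick
// in this toy model. Without options it is the maximum supported version.
func FirstVersion(minSupported, maxSupported int, opts ...Option) int {
	var c config
	for _, opt := range opts {
		opt(&c)
	}
	switch {
	case c.pinned != nil:
		return *c.pinned
	case c.useMin:
		return minSupported
	default:
		return maxSupported
	}
}

func main() {
	fmt.Println(FirstVersion(1, 10))                          // 10
	fmt.Println(FirstVersion(1, 10, ExecuteWithVersion(2)))   // 2
	fmt.Println(FirstVersion(1, 10, ExecuteWithMinVersion())) // 1
}
```

The printed values match the comments in the GetVersion calls above: 2 and 1 instead of the default 10.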

Let’s extend the example above and consider the deployment of versioned workflows with new functions:

Deployment of Versioned workflows

Step 0

The initial version remains v0.1

// Git tag: v0.1
// MyWorkflow supports: workflow.DefaultVersion
func MyWorkflow(ctx workflow.Context) error {
	return workflow.ExecuteActivity(ctx, FooActivity).Get(ctx, nil)
}

When a StartWorkflowExecution request is processed, a new workflow execution will have a DefaultVersion of the upcoming change ID.

new-deployment-flow-step-0.png

Step 1

GetVersion is added as in the previous flow; however, workflow.ExecuteWithVersion is passed as well, so new executions still record workflow.DefaultVersion.

// Git tag: v0.2
// MyWorkflow supports: workflow.DefaultVersion and 1
func MyWorkflow(ctx workflow.Context) error {
	// When GetVersion is executed for the first time, workflow.DefaultVersion will be returned
	version := workflow.GetVersion(ctx, "MyChange", workflow.DefaultVersion, 1, workflow.ExecuteWithVersion(workflow.DefaultVersion))

	if version == workflow.DefaultVersion {
		return workflow.ExecuteActivity(ctx, FooActivity).Get(ctx, nil)
	}
	return workflow.ExecuteActivity(ctx, BarActivity).Get(ctx, nil)
}

Worker v0.2 contains the new workflow code definition that supports the new logic. However, when a StartWorkflowExecution request is processed, a new workflow execution will still have the default version of the “MyChange” change ID.

new-deployment-flow-step-1.png

This change enables customers to easily roll back to worker v0.1 without encountering any non-deterministic errors.

Step 2

Once all v0.1 workers have been replaced with v0.2 workers, we can deploy a new worker that starts workflow executions with the new version.

// Git tag: v0.3
// MyWorkflow supports: workflow.DefaultVersion and 1
func MyWorkflow(ctx workflow.Context) error {
	// When GetVersion is executed for the first time, Version #1 will be returned
	version := workflow.GetVersion(ctx, "MyChange", workflow.DefaultVersion, 1)

	if version == workflow.DefaultVersion {
		return workflow.ExecuteActivity(ctx, FooActivity).Get(ctx, nil)
	}
	return workflow.ExecuteActivity(ctx, BarActivity).Get(ctx, nil)
}

Worker v0.3 contains the new workflow code definition that supports the new logic while still supporting the previous logic. Therefore, when a StartWorkflowExecution request is processed, a new workflow execution will have Version #1 of the “MyChange” change ID.

new-deployment-flow-step-2.png

This change enables customers to easily roll back to worker v0.2 without any non-deterministic errors, as both worker versions support "DefaultVersion" and "Version #1" of the “MyChange” change ID.

Step 3

Once v0.3 workers have replaced all v0.2 workers and all workflows with the DefaultVersion of “MyChange” have finished, we can deploy a new worker that starts workflow executions with the new version and no longer supports the previous logic.

// Git tag: v0.4
// MyWorkflow supports: 1
func MyWorkflow(ctx workflow.Context) error {
	// When GetVersion is executed for the first time, Version #1 will be returned.
	// The result is discarded with "_ =" because "_ :=" does not compile in Go.
	_ = workflow.GetVersion(ctx, "MyChange", 1, 1)
	return workflow.ExecuteActivity(ctx, BarActivity).Get(ctx, nil)
}

Worker v0.4 contains the new workflow code definition that supports the new logic but does not support the previous logic. Therefore, when a StartWorkflowExecution request is processed, a new workflow execution will have Version #1 of the “MyChange” change ID.

new-deployment-flow-step-3.png

This change finalizes the safe rollout of the new versioned workflow. At each step, both versions of workers are fully compatible with one another, making rollouts and rollbacks safe.

Differences with the previous deployment flow

The previous deployment flow for versioned workflows included only Steps 0, 2, and 3. Upgrading directly from Step 0 to Step 2 (skipping Step 1) was unsafe because it could not be rolled back safely. The new options make Step 1 possible, which makes the whole deployment process safe.
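The difference between the two flows can be checked with a small model of the four worker builds. This is a sketch under stated assumptions: the `Worker` type and `SafeTogether` are hypothetical, only the version given to newly started executions is considered, and executions with DefaultVersion are assumed to have finished before Step 3, as the text requires.

```go
package main

import "fmt"

// DefaultVersion mirrors workflow.DefaultVersion, which is -1 in the Go client.
const DefaultVersion = -1

// Worker describes one build of MyWorkflow for the "MyChange" change ID.
type Worker struct {
	Tag      string
	Min, Max int // supported version range
	Start    int // version recorded for newly started executions
}

// Supports reports whether the build can replay an execution with version v.
func (w Worker) Supports(v int) bool { return v >= w.Min && v <= w.Max }

// SafeTogether reports whether two builds can run side by side: each must be
// able to replay executions started by the other.
func SafeTogether(a, b Worker) bool {
	return a.Supports(b.Start) && b.Supports(a.Start)
}

var (
	v01 = Worker{Tag: "v0.1", Min: DefaultVersion, Max: DefaultVersion, Start: DefaultVersion}
	v02 = Worker{Tag: "v0.2", Min: DefaultVersion, Max: 1, Start: DefaultVersion}
	v03 = Worker{Tag: "v0.3", Min: DefaultVersion, Max: 1, Start: 1}
	v04 = Worker{Tag: "v0.4", Min: 1, Max: 1, Start: 1}
)

func main() {
	pairs := [][2]Worker{{v01, v02}, {v02, v03}, {v03, v04}, {v01, v03}}
	for _, p := range pairs {
		fmt.Printf("%s <-> %s safe: %t\n", p[0].Tag, p[1].Tag, SafeTogether(p[0], p[1]))
	}
	// Prints:
	// v0.1 <-> v0.2 safe: true
	// v0.2 <-> v0.3 safe: true
	// v0.3 <-> v0.4 safe: true
	// v0.1 <-> v0.3 safe: false
}
```

The last pair corresponds to the previous flow's jump from Step 0 to Step 2: a v0.1 worker cannot replay an execution that a v0.3 worker started with Version #1.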

Conclusion

The new options introduced into GetVersion address gaps in the Versioning logic that previously led to failed workflow executions. This enhancement improves the safety of deploying versioned workflows, allowing for the separation of code changes from the activation of new logic, making the deployment process more predictable. This extension of GetVersion is a significant improvement that opens the way for future optimizations.

Introducing cadence-web v4.0.0

· 5 min read
Adhitya Mamallan
Software Engineer II @ Uber

We are excited to announce the release of cadence-web v4.0.0—a complete rewrite of the Cadence web app. Cadence has always been about empowering developers to manage complex workflows, and with this release, we not only modernize the web interface by embracing today’s cutting-edge technologies but also strengthen the open source community by aligning our tools with the broader trends seen across the industry.

What's new in cadence-web v4.0.0

  • Revamped UI & Experience – A fresh, modern interface designed for better usability and efficiency.
  • Multi-Cluster Support – The UI can now connect to multiple Cadence clusters.
  • Performance Improvements – Faster load times, optimised API calls, and a smoother experience.

Cadence Repositories Have Moved!

· One min read
Josué Alexander Ibarra
Developer Advocate @ Uber

We’re excited to announce that all Cadence GitHub repositories have been consolidated under the cadence-workflow organization! 🎉

Previously, Cadence repositories were distributed across multiple organizations at Uber: uber, uber-go, uber-common. To improve developer cohesiveness and simplify access, the Cadence Core team has migrated all open-source repositories to the cadence-workflow organization.

For example, our main repository has moved from:

👉 uber/cadence

To its new home:

👉 cadence-workflow/cadence

You can find the full list of Cadence repositories here 👉 orgs/cadence-workflow/repositories

Announcement: Cadence Helm Charts v0 Release

· 3 min read
Taylan Isikdemir
Sr. Staff Software Engineer @ Uber

We’ve heard your feedback: deploying Cadence has been a challenge, especially with limited documentation on operational aspects. So far, we’ve only provided a few docker compose files to help you get started on a development machine. However, deploying and managing Cadence at scale requires a deep understanding of underlying services, configurations and their dependencies.

To address these challenges, we’re launching several initiatives to make it easier to deploy and operate Cadence clusters. These include deployment specs for common scenarios, monitoring dashboards, alerts, runbooks, and more comprehensive documentation.

Cadence Community Spotlight Update - July 2023

· 3 min read
Sharan Foga
Director of Operations & Customer Success @ Encube Technologies

Welcome to the latest of our regular monthly Community Spotlight updates that gives you news from in and around the Cadence community!

Please see below for a roundup of the highlights:

Getting Started with Cadence

Are you new to Cadence and want to understand the basic concepts and architecture? Well, we have some great information for you!

Community member Chris Qin has written a short blog post that takes you through the three main components that make up a Cadence application. Please take a look and feel free to give us your comments and feedback.

Thanks Chris for sharing your knowledge and helping others to get started.

Cadence Go Client v1.0 Released

This month saw the release of v1.0 of the Cadence Go Client. This release came directly out of community feedback asking for it, so we are listening and responding to community needs.

Thanks very much to everyone who worked hard to get this release out!

2023 Cadence Community Survey Results

· 4 min read
Ender Demirkaya
Senior Manager at Uber, Cadence. Author of the Software Engineering Handbook

We released a user survey earlier this year to learn about who our users are, how they use Cadence, and how we can help them. It was shared from our Slack workspace, cadenceworkflow.io Blog and LinkedIn. After collecting the feedback, we wanted to share the results with our community. Thank you everyone for filling it out! Your feedback is invaluable and it helps us shape our roadmap for the future.

Here are some highlights in text and you can check out the visuals to get more details:

using.png

job_role.png

Most of the people who replied to our survey were engineers who were already using Cadence, actively evaluating, or migrating from a similar technology. This was exciting to hear! Some of you have contacted us to learn more about benchmarks, scale, and ideal use cases. We will share more guidelines about this but until then, feel free to contact us over our Slack workspace for guidance.

Announcing Cadence OSS office hours and community sync up

· 2 min read
Liang Mei
Engineering Manager @ Uber

Are you a current Cadence user, do you operate Cadence services, or are you interested in learning about workflow technologies and wonder what problems Cadence could solve for you? We would like to talk to you!

Our team has spent a significant amount of time working with users and partner teams at Uber to design, scale, and operate their workflows. This helps our users understand the technology better and smooths their learning curve and ramp-up experience, and at the same time gives us fast, direct feedback so we can improve the developer experience and close feature gaps. As our product and community grow, we would like to extend this practice to our users in the OSS community.

For the first time ever, members of the Cadence team, along with core contributors from the community, will host bi-weekly office hours to answer any questions you have about Cadence or workflow technology in general. We can also dedicate future sessions to specific topics of common interest. Please don’t hesitate to let us know your thoughts.

Please join a session if you would like to talk about any of the following topics:

  1. Understand what Cadence is and why it might be useful for you and your company
  2. Guidance about running Cadence services and workers in production
  3. Workflow design and operation consultation
  4. Product update, future roadmaps as well as collaboration opportunities

Building and maintaining a healthy and growing community is the key to the success of Cadence, and one of the top priorities for our team. We would like to use the office hours as an opportunity to understand and help our customers, seek feedback, and forge partnerships. We look forward to seeing you in one of the meetings.

Upcoming Office Hours

As we have a geo-distributed user base, we are still trying to find a time that works for most people. In the meantime, we will manually schedule the first few meetings until we settle on a fixed schedule. Our next office hours will take place on Thursday, October 21, 2pm-3pm PT/5pm-6pm EST/9pm-10pm GMT. Please join via this zoom link.

Long-term commitment and support for the Cadence project, and its community

· 3 min read
Liang Mei
Engineering Manager @ Uber

Dear valued Cadence users and developers,

Some of you might have read Temporal’s recent announcement about their decision to drop support for the Cadence project. This message caused some confusion in the community, so we would like to take this opportunity to clear things up.

First of all, Uber is committed to the long-term success of the Cadence project. Since its inception 5 years ago, use cases built on Cadence and their scale have grown significantly at Uber. Today, Cadence powers a variety of our most business-critical use cases (some public stories are available here and here). At the same time, the Cadence development team at Uber has enjoyed rapid growth with the product and has been driving innovations of workflow technology across the board, from new features (e.g. graceful failover, workflow shadowing, UI improvements) to better engineering foundations (e.g. gRPC support, multi-tenancy support), all in a backwards compatible manner. Neither Uber’s use nor support of Cadence is going to change with Temporal’s announcement. We have a long list of features and exciting roadmaps ahead of us, and we will share more details in our next meetup in November ‘21. As always we will continue to push the boundaries of scale and reliability as our usage within Uber grows.