Versioning
The definition code of a Cadence workflow must be deterministic because Cadence uses event sourcing to reconstruct the workflow state by replaying the saved history event data on the workflow definition code. This means that any incompatible update to the workflow definition code could cause a non-deterministic issue if not handled correctly.
workflow.GetVersion()
Consider the following workflow definition:
func MyWorkflow(ctx workflow.Context, data string) (string, error) {
ao := workflow.ActivityOptions{
ScheduleToStartTimeout: time.Minute,
StartToCloseTimeout: time.Minute,
}
ctx = workflow.WithActivityOptions(ctx, ao)
var result1 string
err := workflow.ExecuteActivity(ctx, ActivityA, data).Get(ctx, &result1)
if err != nil {
return "", err
}
var result2 string
err = workflow.ExecuteActivity(ctx, ActivityB, result1).Get(ctx, &result2)
return result2, err
}
Now let's say we have replaced ActivityA with ActivityC, and deployed the updated code. If there is an existing workflow_execution that was started by the original version of the workflow code, where ActivityA had already completed and the result was recorded to history, the new version of the workflow code will pick up that workflow_execution and try to resume from there. However, the workflow will fail because the new code expects a result for ActivityC from the history data, but instead it gets the result for ActivityA. This causes the workflow to fail on the non-deterministic error.
Thus we use workflow.GetVersion().
var err error
v := workflow.GetVersion(ctx, "Step1", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
err = workflow.ExecuteActivity(ctx, ActivityA, data).Get(ctx, &result1)
} else {
err = workflow.ExecuteActivity(ctx, ActivityC, data).Get(ctx, &result1)
}
if err != nil {
return "", err
}
var result2 string
err = workflow.ExecuteActivity(ctx, ActivityB, result1).Get(ctx, &result2)
return result2, err
When workflow.GetVersion()
is run for the new workflow_execution, it records a marker in the workflow
history so that all future calls to GetVersion
for this change ID--Step 1
in the example--on this
workflow_execution will always return the given version number, which is 1
in the example.
If you make an additional change, such as replacing ActivityC with ActivityD, you need to add some additional code:
v := workflow.GetVersion(ctx, "Step1", workflow.DefaultVersion, 2)
if v == workflow.DefaultVersion {
err = workflow.ExecuteActivity(ctx, ActivityA, data).Get(ctx, &result1)
} else if v == 1 {
err = workflow.ExecuteActivity(ctx, ActivityC, data).Get(ctx, &result1)
} else {
err = workflow.ExecuteActivity(ctx, ActivityD, data).Get(ctx, &result1)
}
Note that we have changed maxSupported
from 1 to 2. A workflow that had already passed this
GetVersion()
call before it was introduced will return DefaultVersion
. A workflow that was run
with maxSupported
set to 1, will return 1. New workflows will return 2.
After you are sure that all of the workflow_executions prior to version 1 have completed, you can remove the code for that version. It should now look like the following:
v := workflow.GetVersion(ctx, "Step1", 1, 2)
if v == 1 {
err = workflow.ExecuteActivity(ctx, ActivityC, data).Get(ctx, &result1)
} else {
err = workflow.ExecuteActivity(ctx, ActivityD, data).Get(ctx, &result1)
}
You'll note that minSupported
has changed from DefaultVersion
to 1
. If an older version of the
workflow_execution history is replayed on this code, it will fail because the minimum expected version
is 1. After you are sure that all of the workflow_executions for version 1 have completed, then you
can remove 1 so that your code would look like the following:
_ := workflow.GetVersion(ctx, "Step1", 2, 2)
err = workflow.ExecuteActivity(ctx, ActivityD, data).Get(ctx, &result1)
Note that we have preserved the call to GetVersion()
. There are two reasons to preserve this call:
- This ensures that if there is a workflow_execution still running for an older version, it will fail here and not proceed.
- If you need to make additional changes for
Step1
, such as changing ActivityD to ActivityE, you only need to updatemaxVersion
from 2 to 3 and branch from there.
You only need to preserve the first call to GetVersion()
for each changeID
. All subsequent calls to
GetVersion()
with the same change ID are safe to remove. If necessary, you can remove the first
GetVersion()
call, but you need to ensure the following:
- All executions with an older version are completed.
- You can no longer use
Step1
for the changeID. If you need to make changes to that same part in the future, such as change from ActivityD to ActivityE, you would need to use a different changeID likeStep1-fix2
, and start minVersion from DefaultVersion again. The code would look like the following:
v := workflow.GetVersion(ctx, "Step1-fix2", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
err = workflow.ExecuteActivity(ctx, ActivityD, data).Get(ctx, &result1)
} else {
err = workflow.ExecuteActivity(ctx, ActivityE, data).Get(ctx, &result1)
}
Upgrading a workflow is straightforward if you don't need to preserve your currently running
workflow_executions. You can simply terminate all of the currently running workflow_executions and
suspend new ones from being created while you deploy the new version of your workflow code, which does
not use GetVersion()
, and then resume workflow creation. However, that is often not the case, and
you need to take care of the currently running workflow_executions, so using GetVersion()
to update
your code is the method to use.
However, if you want your currently running workflows to proceed based on the current workflow logic,
but you want to ensure new workflows are running on new logic, you can define your workflow as a
new WorkflowType
, and change your start path (calls to StartWorkflow()
) to start the new workflow
type.
Safe Deployment of Versioned Workflows
When deploying workflow changes in production environments, you may want to separate the versioning
of the workflow code from the activation of new logic to ensure ability to roll back changes safely.
The ExecuteWithVersion
and ExecuteWithMinVersion
options provides this capability by allowing you
to control which version is returned when GetVersion()
is executed for the first time.
Step 1: Deploy with Compatibility
Initially, your workflow has the following code:
err = workflow.ExecuteActivity(ctx, FooActivity).Get(ctx, nil)
To safely roll out changes that replace FooActivity
with BarActivity
, you need both versions
of your workflow code to be compatible with each other. Use GetVersion
with the ExecuteWithMinVersion
option:
v := workflow.GetVersion(ctx, "fooToBarChange", workflow.DefaultVersion, 1, workflow.ExecuteWithMinVersion())
// equivalent to
// v := workflow.GetVersion(ctx, "fooToBarChange", workflow.DefaultVersion, 1, workflow.ExecuteWithVersion(workflow.DefaultVersion))
if v == workflow.DefaultVersion {
err = workflow.ExecuteActivity(ctx, FooActivity).Get(ctx, nil)
} else {
err = workflow.ExecuteActivity(ctx, BarActivity).Get(ctx, nil)
}
At this step:
- The previous version of the code supports only
DefaultVersion
- The new version of the code supports both
DefaultVersion
and1
- The new version of the code is not yet activated, so workflows started on the new code will still execute the
FooActivity
(previous version) - This makes it possible to safely roll back your changes if needed, as the previous code supports
DefaultVersion
only
Step 2: Activate New Logic
When the previous version of the code is no longer running, you can activate the new code by removing the ExecuteWithMinVersion
option:
v := workflow.GetVersion(ctx, "fooToBarChange", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
err = workflow.ExecuteActivity(ctx, FooActivity).Get(ctx, nil)
} else {
err = workflow.ExecuteActivity(ctx, BarActivity).Get(ctx, nil)
}
At this step:
- Both versions of the code support both
DefaultVersion
and1
- The new version of the code is activated, so workflows started on the new code will execute the
BarActivity
(new version) - This still allows safe rollback because both versions of the code support both
DefaultVersion
and1
Step 3: Clean Up Old Code
When there are no running workflows using DefaultVersion
, you can remove the corresponding branch:
workflow.GetVersion(ctx, "fooToBarChange", 1, 1)
err = workflow.ExecuteActivity(ctx, BarActivity).Get(ctx, nil)
ExecuteWithMinVersion
and ExecuteWithVersion
options are particularly useful when you want to ensure that
your changes can be safely rolled back if needed, as both versions of the workflow code remain compatible with
each other throughout the deployment process.
Sanity checking
The Cadence client SDK performs a sanity check to help prevent obvious incompatible changes. The sanity check verifies whether a decision made in replay matches the event recorded in history, in the same order. The decision is generated by calling any of the following methods:
- workflow.ExecuteActivity()
- workflow.ExecuteChildWorkflow()
- workflow.NewTimer()
- workflow.Sleep()
- workflow.SideEffect()
- workflow.RequestCancelWorkflow()
- workflow.SignalExternalWorkflow()
- workflow.UpsertSearchAttributes()
Adding, removing, or reordering any of the above methods triggers the sanity check and results in a non-deterministic error.
The sanity check does not perform a thorough check. For example, it does not check on the activity's
input arguments or the timer duration. If the check is enforced on every property, then it becomes
too restricted and harder to maintain the workflow code. For example, if you move your activity code
from one package to another package, that changes the ActivityType
, which technically becomes a different
activity. But, we don't want to fail on that change, so we only check the function name part of the
ActivityType
.