How can I version Cadence workflows? - cadence-workflow

Cadence workflows are required to be deterministic, which means that a workflow is expected to produce the exact same results if it’s executed with the same input parameters.
When I learned the requirement above as a new Cadence user, I wondered how I could maintain workflows in the long run when determinism-breaking changes are required.
An example scenario is where you have a workflow that executes Activity1 and Activity2 consecutively, and then you need to change the order of these activities so that the workflow executes Activity2 before Activity1. There are many other ways to make determinism-breaking changes like this, and I wanted to understand how to handle those changes.
This is especially important in cases where the workflows can run for long durations such as days, weeks, or even months!

This is one of the most common questions a new Cadence developer asks. Cadence workflows are required to be deterministic algorithms. If a workflow algorithm isn't deterministic, Cadence workers risk hitting nondeterministic workflow errors when they try replaying the history (i.e., during worker failure recovery).
There are two ways to solve this problem:
Creating a brand-new workflow: This is the most naive approach for
versioning workflows. The approach is as simple as it sounds: anytime
you need to make a change to your workflow’s algorithm, you make a
copy of your original workflow and edit it the way you want, give it
a new name like MyWorkflow_V2, and start using it for all new instances
going forward. If your workflow is not very long-living, your
existing workflows will “drain out” at some point and you’ll be able
to delete the old version altogether. On the other hand, this
approach can turn into a maintenance nightmare very quickly for
obvious reasons.
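The naive approach amounts to registering the old and new workflows side by side and pointing all new starts at the new name. Below is a minimal plain-Go sketch of that routing; the names and the map-based registry are hypothetical stand-ins, not the real Cadence registration API:

```go
package main

import "fmt"

// registry is a stand-in for a worker's workflow registry: names map to
// implementations. (Names and behaviors here are hypothetical.)
var registry = map[string]func() string{
	"MyWorkflow":    func() string { return "runs Activity1 then Activity2" },
	"MyWorkflow_V2": func() string { return "runs Activity2 then Activity1" },
}

// start launches the workflow registered under the given name.
func start(name string) string {
	return registry[name]()
}

func main() {
	// In-flight instances keep replaying against the old code...
	fmt.Println(start("MyWorkflow"))
	// ...while all new starts reference the new name.
	fmt.Println(start("MyWorkflow_V2"))
}
```

Once no running instance references "MyWorkflow" anymore, the old entry can be deleted, which is exactly the "drain out" step described above.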
Using the GetVersion() API to fork workflow logic: The Cadence client has
a function named GetVersion, which tells you what version of the
workflow is currently running. You can use the information returned
by this function to decide which version of your workflow algorithm
needs to be used. In other words, your workflow has both the old and
new algorithms running side-by-side and you are able to pick the
right version for your workflow instances to ensure that they run
deterministically.
Below is an example of the GetVersion() based approach. Let’s assume you want to change the following line in your workflow:
err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
to
err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
This is a breaking change since it runs the bar activity instead of foo. If you simply make that change without worrying about determinism, your workflows will fail to replay when they need to, and they'll be stuck with a nondeterministic workflow error. The correct way to make this change is to update the workflow as follows:
v := GetVersion(ctx, "fooChange", DefaultVersion, 1)
if v == DefaultVersion {
	err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
} else {
	err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
}
The GetVersion function accepts 4 parameters:
ctx is the standard context object
“fooChange” is a human-readable ChangeID identifying the semantic change you are making in your workflow algorithm that breaks determinism
DefaultVersion is a constant that simply means Version 0; in other
words, the very first version. It's passed as the minSupportedVersion
parameter to the GetVersion function
1 is the maxSupportedVersion that can be handled by your current
workflow code. In this case, our algorithm can support workflow
versions from DefaultVersion to Version 1 (inclusively).
When a new instance of this workflow reaches the GetVersion() call above for the first time, the function will return the maxSupportedVersion parameter so that you can run the latest version of your workflow algorithm. In the meantime, it'll also record that version number in the workflow history (internally known as a Marker Event) so that it is remembered in the future. When replaying this workflow later on, the Cadence client will keep returning the same version number even if you pass a different maxSupportedVersion parameter (i.e., if your workflow has even more versions).
If the GetVersion call is encountered during a history replay and the history doesn’t have a marker event that was logged earlier, the function will return DefaultVersion, with the assumption that the “fooChange” had never existed in the context of this workflow instance.
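The marker-event behavior described in the last two paragraphs can be modeled in a few lines of plain Go. This is a simplified sketch of the semantics, not the real Cadence implementation; the history map stands in for the workflow history:

```go
package main

import "fmt"

// DefaultVersion mirrors the Cadence constant meaning "before this change existed".
const DefaultVersion = -1

// history stands in for the workflow history: it maps a ChangeID to the
// version number recorded in its marker event, if any.
type history map[string]int

// getVersion mimics the decision logic described above: on first execution
// it records and returns maxSupported; on replay it returns the recorded
// marker, or DefaultVersion when no marker was ever written.
func getVersion(h history, changeID string, minSupported, maxSupported int, replaying bool) (int, error) {
	if v, ok := h[changeID]; ok {
		if v < minSupported || v > maxSupported {
			return 0, fmt.Errorf("version %d of %q is outside supported range [%d, %d]", v, changeID, minSupported, maxSupported)
		}
		return v, nil
	}
	if replaying {
		// No marker in history: the change never existed for this instance.
		if minSupported > DefaultVersion {
			return 0, fmt.Errorf("%q: DefaultVersion is no longer supported", changeID)
		}
		return DefaultVersion, nil
	}
	// New execution: record the newest version as a marker event.
	h[changeID] = maxSupported
	return maxSupported, nil
}

func main() {
	h := history{}
	v, _ := getVersion(h, "fooChange", DefaultVersion, 1, false)
	fmt.Println(v) // 1: first execution returns maxSupported and records a marker
	v, _ = getVersion(h, "fooChange", DefaultVersion, 2, true)
	fmt.Println(v) // 1: replay keeps returning the recorded version, not the new max
	v, _ = getVersion(history{}, "fooChange", DefaultVersion, 1, true)
	fmt.Println(v) // -1: replay with no marker falls back to DefaultVersion
}
```

The last branch of the sketch also shows why dropping a version works the way it does: once minSupported moves past a recorded marker, the call errors out instead of returning the retired version.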
In case you need to make one more breaking change in the same step of your workflow, you simply need to change the code above like this:
v := GetVersion(ctx, "fooChange", DefaultVersion, 2) // Note the new max version
if v == DefaultVersion {
	err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
} else if v == 1 {
	err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
} else { // This is the Version 2 logic
	err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
}
When you are comfortable with dropping support for Version 0, you change the code above like this:
v := GetVersion(ctx, "fooChange", 1, 2) // DefaultVersion is no longer supported
if v == 1 {
	err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
} else {
	err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
}
After this change, if your workflow code runs for an old workflow instance with the DefaultVersion version, the Cadence client will raise an error and stop the execution.
Eventually, you'll probably want to get rid of all previous versions and only support the latest one. One option is to remove the GetVersion call and the if statement altogether and keep a single line of code that does the right thing. However, it's actually a better idea to keep the GetVersion() call in there, for two reasons:
GetVersion() gives you a better idea of what went wrong if your
worker attempts to replay the history of an old workflow instance.
Instead of investigating the root cause of a mysterious
nondeterministic workflow error, you’ll know that the failure is
caused by workflow versioning at this location.
If you need to make more breaking changes to the same step of your
workflow algorithm, you’ll be able to reuse the same Change ID and
continue following the same pattern as you did above.
Considering the two reasons mentioned above, you should update your workflow code like the following when it's time to drop support for all old versions:
GetVersion(ctx, "fooChange", 2, 2) // This acts like an assertion to give you a proper error
err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)

Related

Go Unit Test irrelevant error "hostname resolving error"

I am trying to write unit tests for this project.
It appears that I need to refactor a lot, and I'm currently working on it. In order to test the functions in project/api/handlers.go I was trying to write some unit test code, but I always get an error related to DB initialization. The DB is a PostgreSQL Docker container. The error says the given hostname is not valid, yet outside of testing everything works without a problem. Also, for Dockerized PostgreSQL, the container name is used as the hostname, so that shouldn't be a problem.
The error is:
DB connection error: failed to connect to host=postgresdbT user=postgres database=worth2watchdb: hostname resolving error (lookup postgresdbT: no such host)
Process finished with the exit code 1
Anyway, I did a couple of refactors and managed to abstract the functions away from the DB query functions, but this error still occurs and I cannot perform the test. So finally I decided to write a totally blank test within the same package that simply checks against the bcrypt package.
func TestCheckPasswordHash(t *testing.T) {
	ret, err := HashPassword("password")
	assert.Nil(t, err)
	ok := CheckPasswordHash("password", ret)
	if !ok {
		t.Fail()
	}
}
// HashPassword hashes the password with the bcrypt algorithm at the given
// cost and returns the hashed string along with any error.
func HashPassword(password string) (string, error) {
	bytes, err := bcrypt.GenerateFromPassword([]byte(password), 4)
	return string(bytes), err
}

// CheckPasswordHash compares a password with a hash and returns true if they match.
func CheckPasswordHash(password, hash string) bool {
	err := bcrypt.CompareHashAndPassword([]byte(hash), []byte(password))
	return err == nil
}
However, when I try to run only the TestCheckPasswordHash function with go test -run TestCheckPasswordHash ./api, it still gives the same error. By the way, the file is handlers_test.go, the functions are in handlers.go, and the package name is api for both.
There is no contact with any DB-related functions, yet I keep getting the same error again and again. When I run this TestCheckPasswordHash code in another project, or at project/util/util_test.go, it passes as expected.
I don't know what to do; it seems I cannot perform any test in this package unless I figure this out.
Thanks in advance. Best wishes.
I was checking your repo; nice implementation, neat and simple, good job!
I think your issue is in the init function. Please try commenting it out and see if it works for that single test.
It's a bit complex to explain how the init function works without a graph of files as an example, but you can check the official documentation:
https://go.dev/doc/effective_go#init
PS: if this doesn't work, please write me back.
I've found the reason why this error occurred, and now it's solved.
This is partially related to how Docker requires the DB hostname to be declared, but we can do nothing about that, so we need a better approach to work around it.
When the go test command is executed, even for a single function, Go's testing logic evaluates the whole package (not the whole project) in execution order: first variable declarations, then the init() function. If an item from an external package is referenced, the runtime evaluates that package for the item as well.
So even if you are testing only an irrelevant function, keep in mind that, before performing the test, the Go runtime will evaluate the whole package.
To prevent this, I wrapped the package-level variables (even the ones in another package) that directly call the DB and cache. At the initial stage these can be allocated as new, unconnected values; their connections are established later by main or main.init().
Now, prior to testing, all relevant variables (in the same package or external) still get evaluated, but if a DB agent (redis.Client or pgxpool.Pool) is needed, we just create a new agent: the compiler doesn't complain and testing begins. The agent becomes operational only via a function call from main or wherever we want.
This is a better (maybe not the best) practice for more maintainable code: initialization of dependencies can be handled cautiously and in line with functional needs. At least, with a simple refactoring, the problem is solvable.
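The fix described above, package-level handles that are allocated lazily instead of connecting in init(), can be sketched in plain Go with sync.Once. The db type and names here are hypothetical stand-ins for a real pool such as pgxpool.Pool:

```go
package main

import (
	"fmt"
	"sync"
)

// db stands in for a real connection pool (e.g. *pgxpool.Pool); the names
// here are hypothetical, not taken from the project in question.
type db struct{ connected bool }

var (
	conn     *db
	connOnce sync.Once
)

// DB returns the shared handle, connecting only on first use. Because
// nothing dials out at package init time, `go test` can load the package
// without a reachable database.
func DB() *db {
	connOnce.Do(func() {
		conn = &db{connected: true} // real code would dial Postgres here
	})
	return conn
}

func main() {
	// Loading the package does not connect; only this first call does.
	fmt.Println(DB().connected)
}
```

With this shape, a test file in the same package compiles and runs without a database as long as it never calls DB().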

Is it safe to invoke a workflow activity from defer() within the workflow code?

func MyWorkflow(ctx workflow.Context) (retErr error) {
	log := workflow.GetLogger(ctx)
	log.Info("starting my workflow")
	defer func() {
		if retErr != nil {
			DoActivityCleanupError(ctx, ...)
		} else {
			DoActivityCleanupNoError(ctx, ...)
		}
	}()
	err := DoActivityA(ctx, ...)
	if err != nil {
		return err
	}
	...
	err = DoActivityB(ctx, ...)
	if err != nil {
		return err
	}
	return nil
}
Basically there are catch-all activities, ActivityCleanupNoError and ActivityCleanupError, that we want to execute whenever the workflow exits (particularly in the error case, since we don't want to call ActivityCleanupError in every error return).
Does this work with distributed decision making? For example, if ownership of workflow decisions moves from one worker to another, is it going to trigger the defer on the original worker?
Bonus Question: Does the logger enforce logging only once per workflow run? Even if decisions are moved from one worker to another? Do you expect to see the log line appear in both worker's log? or is there magic behind the scene to prevent this from happening?
Yes, it is safe.
But it's quite complicated to understand why. This is how to reach that conclusion:
1. In no-sticky mode (without the sticky cache), the Cadence SDK will always execute the workflow code to make (collect) workflow decisions, and then release all the goroutines/stacks. When releasing them, the defer will be executed, which means the cleanup-activity code path will run; HOWEVER, those decisions will be ignored, so this does not affect correctness.
2. In sticky mode, if the workflow is not closing, the Cadence SDK will be blocked somewhere; if the workflow is actually closing, then the defer will be executed and the decisions will be collected.
3. When the sticky cache (goroutines/stack) is evicted, what happens is exactly the same as in case 1, so it's also safe.
Does the logger enforce logging only once per workflow run? Even if decisions are moved from one worker to another? Do you expect to see the log line appear in both worker's log? or is there magic behind the scene to prevent this from happening?
Each log line will only appear on the worker that actually executed the code while making decisions; in other words, in non-replay mode. That's the only magic :)
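The defer mechanics underneath all this are plain Go: a deferred function runs on every return path and can inspect the named return value, which is what makes a single cleanup hook possible. A minimal sketch with hypothetical names and no Cadence involved:

```go
package main

import (
	"errors"
	"fmt"
)

// run shows the pattern from the question in plain Go: one deferred hook
// observes the named return value on every exit path.
func run(fail bool) (retErr error) {
	defer func() {
		if retErr != nil {
			fmt.Println("cleanup after error:", retErr)
		} else {
			fmt.Println("cleanup, no error")
		}
	}()
	if fail {
		return errors.New("activity A failed")
	}
	return nil
}

func main() {
	run(false) // prints "cleanup, no error"
	run(true)  // prints "cleanup after error: activity A failed"
}
```

In the real workflow the deferred hook would execute activities instead of printing, and, as the answer explains, the SDK decides whether the resulting decisions are collected or ignored.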

NUnit - Is it possible to count total test, total time, total passed test?

Can we count total tests, total time, and total passed tests in NUnit?
Currently I have the following, in which each test is identified as Pass/Fail. I am looking for a method that gives counts:
if (TestContext.CurrentContext.Result.Outcome.Status.ToString() == "Failed")
{
    Log.Info("TestContext.Message = " + TestContext.CurrentContext.Result.Message);
    Log.Info("TestContext.StackTrace = " + TestContext.CurrentContext.Result.StackTrace);
}
else if (TestContext.CurrentContext.Result.Outcome.Status.ToString() == "Passed")
{
    Log.Info("TestContext.Status = " + TestContext.CurrentContext.Result.Outcome.Status.ToString());
}
else
{
    Log.Info("Undefined TestContext.Status = " + TestContext.CurrentContext.Result.Outcome.Status.ToString());
}
As you may have guessed, TestContext is really only useful while tests are running. Trying to get the final results while tests are still running is kind of like trying to get your final hotel bill while you are still using the room. The answer you get is tentative and subject to change, for example, if you eat breakfast, take something from the minibar, watch a movie, etc.
For that reason, it's best to wait till after the tests are over to look at the results. For an individual test case, that would be in the [TearDown] method. For a fixture or SetUpFixture, in the [OneTimeTearDown] method. Even so, if those methods happen to throw an exception, all bets are off!
For the total run, I would use an engine extension rather than putting the code in my tests. You would write a TestListener extension. In it, you would only take action when the entire test run is complete. Then the entire outcome of the test, including all the counts would be available. This is the "correct" approach, but it's also a bit more work than what you are doing. See docs for details.
Another approach is to write a program that processes the test result XML file and gets the info there. This has the advantage of being a separate, straightforward program and not requiring you to know how to write extensions.
Finally, as a workaround, you can use code similar to what you have. However, it may not work in all future releases, because it uses knowledge of internal structures...
Create an assembly-level SetUpFixture with a OneTimeTearDown method. To be assembly-level, it must be outside of any of your namespaces.
In the OneTimeTearDown, access NUnit.Framework.Internal.TestExecutionContext.CurrentContext.CurrentResult. This field is a TestResult and contains everything there is to know about the result of the assembly, including counts of tests that passed, failed etc.
Whatever else you do, do not try to do anything that changes the TestResult. Odds are you'll break something if you do that. :-)
Good luck!

API function to add an Action to an Event or Schedule?

I need to add an Action to a Schedule object that is being created through the API. There are documented interfaces to set almost all the options except the Action. How are Actions attached to these Objects?
When I attempt to programmatically add a new event, read from a separate configuration file, to a Schedule object I get errors stating that the Schedule has already been initialized and that I must construct a new object and add its configuration manually. I can do most of that using the available Schedule API. I can set up everything about the Schedule except the Action code.
The Schedule is used in a Process Model. Looking at the model in the Java editor, I see the code I'm trying to replicate via the API in a function that looks like this:
@Override
@AnyLogicInternalCodegenAPI
public void executeActionOf( EventTimeout _e ) {
    if ( _e == _fuelDeliverySchedule_Action_xjal ) {
        Schedule<Integer> self = this.fuelDeliverySchedule;
        Integer value = fuelDeliverySchedule.getValue();
        logger.info("{} received {} pounds of fuel", this.getName(), this.fuelDeliverySchedule.getValue());
        this.fuelAvailablePounds += fuelDeliverySchedule.getValue();
        _fuelDeliverySchedule_Action_xjal.restartTo( fuelDeliverySchedule.getTimeOfNextValue() );
        return;
    }
    super.executeActionOf( _e );
}
Maybe I can use something like this to create my own action function, but I'm not sure how to make the Scheduled event use it.
Thanks,
Thom
[Edited (expanded/rewrote) 03.11.2014 after more user detail on the context.]
You clarified the context with
When I attempt to programmatically add "a thing that happens", read
from a separate configuration file, to a Schedule object I get errors
stating that the Schedule has already been initialized and that I must
construct a new object and add its configuration manually. I can do
most of that using the available Schedule API. I can set up everything
about the Schedule except the Action code.
(You might want to edit that into the question... In general, it's always good to explain the context for why you're trying to do the thing.)
I think I understand now. I presume that your config file contains scheduling details and, when you say you were trying to "add a thing that happens" (which errored), you meant that you were trying to change the scheduling 'pattern' in the Schedule. So your problem is that, since you couldn't adjust a pre-existing schedule, you had to instantiate (create) your own programmatically, but the Schedule API doesn't allow you to set the action code (as seen on the GUI schedule element).
This is a fairly involved solution so bear with me. I give a brief 'tl;dr'
summary before diving into the detail.
Summary
You can't programmatically code an AnyLogic action (for any element) because that would amount to
dynamically creating a Java class. Solving your problem requires recognising
that the schedule GUI element creates both a Schedule instance and a
timeout event (EventTimeout) instance to trigger the action. You can therefore create these two elements explicitly yourself (the former dynamically). The trick is to reset the timeout event when you replace the Schedule instance (to trigger at the next 'flip' point of the new Schedule).
[Actually, from your wording, I suspect that the action is always the same but, for generality, I show how you could handle it if your config file details might want to change the nature of the action as well as those of the scheduling pattern.]
Detail
The issue is that the GUI element (confusingly) isn't just a Schedule instance
in terms of the code it generates. There is one (with the same name as that of
the GUI element), which just contains the schedule 'pattern' and, as in the API,
has methods to determine when the next on/off period (for an on/off schedule) occurs. (So
it is kind of fancy calendar functionality.) But AnyLogic also generates a
timeout event to actually perform the action; if you look further in the code
generated, you'll see stuff similar to the below (assuming your GUI schedule is called
fuelSchedule, with Java comments added by
me):
// Definition of the timeout event
@AnyLogicInternalCodegenAPI
public EventTimeout _fuelSchedule_Action_xjal = new EventTimeout(this);

// First occurrence time of the event as the next schedule on/off change time
@Override
@AnyLogicInternalCodegenAPI
public double getFirstOccurrenceTime( EventTimeout _e ) {
    if ( _e == _fuelSchedule_Action_xjal ) return fuelSchedule.getTimeOfValue() == time() ? time() : fuelSchedule.getTimeOfNextValue();
    return super.getFirstOccurrenceTime( _e );
}

// After your user action code, the event is rescheduled for the next
// schedule on/off change time
_fuelSchedule_Action_xjal.restartTo( fuelSchedule.getTimeOfNextValue() );
i.e., this creates an event which triggers each time the schedule 'flips', and performs the action specified in the GUI schedule element.
So there is no action to change on the Schedule instance; it's actually related to the EventTimeout instance. However, you can't programmatically change it there (or create a new one dynamically) for the same reason that you can't change the action of any AnyLogic element:
this would effectively be programmatically
creating a Java class definition, which isn't possible without very specialised
Java code. (You can create Java source code in a string and
dynamically run a Java compiler on it to generate a class. However, this is very
'advanced' Java, has lots of potential pitfalls, and I would definitely not
recommend going that route. You would also have to be creating source for a user subclass
of EventTimeout, since you don't know the correct source code for AnyLogic's proprietary EventTimeout class, and this might change per release in any case.)
But you shouldn't need to: there should be a strict set of possible actions that your config file can contain. (They can't be arbitrary Java code snippets, since they have to 'fit in' with the simulation.) So you can do what you want by programmatically creating the Schedule but with a GUI-created timeout event that you adjust accordingly(assuming an off/on schedule here and that there is
only one schedule active at once; obviously tweak this skeleton to your needs
and I haven't completely tested this in AnyLogic):
1. Have an AnyLogic variable activeAction which specifies the current active
action. (I take this as an int here for simplicity, but it's better to use a
Java enum which is the same as an AnyLogic 7 Option List, and can just be
created in raw Java in AnyLogic 6.)
2. Create a variable in the GUI, say called fuelSchedule, of type Schedule but with initial value null. Create a separate timeout event, say called fuelScheduleTrigger, in User Control mode, with action as:
// Perform the appropriate action (dependent on activeAction)
doAppropriateScheduleAction();
// Set the event to retrigger at the next schedule on/off switch time
fuelScheduleTrigger.restartTo(fuelSchedule.getTimeOfNextValue());
(Being in User Control mode, this event isn't yet triggered to initially fire, which is what we want.)
3. Code a set of functions for each of the different action alternatives; let's say
there are only 2 (fuelAction1 and fuelAction2) here as an example. Code
doAppropriateScheduleAction as:
if (activeAction == 1) {
    fuelAction1();
}
else if (activeAction == 2) {
    fuelAction2();
}
4. In your code which reads the config file and gets updated schedule info
(presumably run from a cyclic timeout event or similar), have this replace
fuelSchedule with a new instance with the revised schedule pattern (as you've
been doing), set activeAction appropriately, and then reset the timeout event to
the new fuelSchedule.getTimeOfValue() time:
[set up fuelSchedule and activeAction]
// Reset schedule action to match revised schedule
fuelScheduleTrigger.restartTo(fuelSchedule.getTimeOfNextValue());
I think this works OK in the edge case when the new Schedule had its next 'flip' at the time
you set it up. (If you restart an event to the current time, I think it schedules an event OK at the current time which will occur next if there are no other events also scheduled for the current time; actually, it will definitely occur next if you are using a LIFO simultaneous-time-scheduling regime---see my blog post.)
Alternative & AnyLogic Enhancement
An alternative is to create a 'full' schedule in the GUI with action as earlier. Your config file reading code can replace the underlying Schedule instance and then reset the internal AnyLogic-generated timeout event. However, this is less preferable because you are relying on an internally-named AnyLogic event (which might also change in future AnyLogic releases, breaking your code).
AnyLogic could help this situation by adding a method to the Schedule API that gets the related timeout event; e.g., getActionTriggeringEventTimeout(). Then you would be able to 'properly' restart it and the Schedule API would make much clearer that the Schedule was always associated with an EventTimeout that did the triggering for the action.
Of course, AnyLogic could also go further by changing Schedule to allow scheduling details to be changed dynamically (and internally handling the required updates to the timeout event if it continued to be designed like that), but that's a lot more work and there may be deeper technical reasons why they wanted the schedule pattern to be fixed once the Schedule is initialised.
Any AnyLogic support staff reading?

System.Reactive.Concurrency.DefaultScheduler

In my application I've written all of my Rx code to use Scheduler.Default.
I wanted to know if there's a difference between specifying Scheduler.Default and not specifying a scheduler at all?
What is the strategy employed by System.Reactive.Concurrency.DefaultScheduler?
Rx uses an appropriate strategy dependent on the platform-specific PlatformServices that are loaded; hence you can have a different approach in different cases. The OOB implementation looks at whether Threads are available on your platform, and if so uses Threads and the platform Timer implementation to schedule items; otherwise it uses Tasks. The latter case arises in Windows 8 apps, for example.
You can find a good video about how platform services are implemented from the creator here: http://channel9.msdn.com/Shows/Going+Deep/Bart-De-Smet-Rx-20-RTM-and-RTW
Look here for information about how the built-in operators behave when you do and don't specify a scheduler: http://msdn.microsoft.com/en-us/library/hh242963(v=vs.103).aspx
Yes there is a difference between specifying Scheduler.Default and not specifying a scheduler. Using Scheduler.Default will introduce asynchronous and possibly concurrent behavior, while not supplying a scheduler leaves it up to the discretion of the operator. Some operators will choose to execute synchronously while others will execute asynchronously, while others will choose to jump threads.
It is probably a bad idea (for performance and possibly even correctness since too much concurrency might lead you into a deadlock situation) to supply Scheduler.Default to every Rx operator. If you do not have a specific scheduling requirement, then do not supply a scheduler and let the operator pick what it needs.
For example,
this will complete synchronously:
int result = 0;
Observable.Return(42).Subscribe(v => result = v);
// result == 42
while this will complete asynchronously (and likely on another thread):
int result = 0;
Observable.Return(42, Scheduler.Default).Subscribe(v => result = v);
// result == 0
// some time later
// result == 42