Uber Cadence Workflow Version Change Compatibility

I understand that I cannot make backward incompatible changes to Workflows per:
How to make changes or fixes to Uber Cadence Workflow without breaking determinism?
However, I am not sure what "backward incompatible" means here. Can I simply never deploy new code without using getVersion, period? Or, as long as the history tracking used for restoration is compatible between the two versions, can I update without getVersion? Or can I update without getVersion under certain other conditions?

Generally anything that can change the way the history is generated is considered backwards incompatible. The following changes are backwards compatible:
Changing any activity implementation.
Changing the duration passed to sleep and timer creation functions.
Changing arguments to activities.
Changing activity options and retry policies.
Changing values of variables that don't affect the workflow execution path. For example, a variable that accumulates data which is only used in a query.
We recommend saving a few histories and replaying them as part of your unit tests to catch incompatible changes early.
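For example, a replay test might look roughly like the sketch below. This assumes the Go client (go.uber.org/cadence); MyWorkflow and history.json are placeholders, and the exact replayer API may differ slightly between client versions.

```go
package myworkflow

import (
	"testing"
	"time"

	"go.uber.org/cadence/worker"
	"go.uber.org/cadence/workflow"
	"go.uber.org/zap/zaptest"
)

// MyWorkflow is a stand-in for the real workflow under test.
func MyWorkflow(ctx workflow.Context) error {
	return workflow.Sleep(ctx, time.Minute)
}

func TestReplaySavedHistory(t *testing.T) {
	replayer := worker.NewWorkflowReplayer()
	// Register the current implementation of the workflow being tested.
	replayer.RegisterWorkflow(MyWorkflow)

	// history.json is a history saved from a previous run of this workflow
	// (e.g. exported with the Cadence CLI).
	err := replayer.ReplayWorkflowHistoryFromJSONFile(zaptest.NewLogger(t), "history.json")
	if err != nil {
		t.Fatalf("replay failed: the change is likely backwards incompatible: %v", err)
	}
}
```

If the new code produces decisions that no longer match the recorded history, the replay fails, which signals a backwards-incompatible change that needs getVersion.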

Related

Why is executionStartToCloseTimeoutSeconds required?

When using the Java client to start a workflow in Cadence, "executionStartToCloseTimeoutSeconds" is required on the Workflow. If I have a workflow that can run for an indeterminate amount of time, how do I get around this restriction?
Requiring this value was a mistake. The new version of the platform I'm working on (temporal.io) defaults it to infinity.
an indeterminate amount of time
First of all, I believe an indeterminate amount of time is not the same as an infinite amount of time.
Letting a workflow execution run and grow infinitely is an anti-pattern in Cadence. See Recommendation #5 in this article: https://longquanzheng.github.io/cadence-lab/book/learnings/what-should-be-in-a-workflow-or-an-activity-in-cadence.html
A good timeout value protects your workflow from growing infinitely.
Because letting a workflow run forever is not recommended (it can cause performance issues in both the worker and the server), the original idea was to require the client to provide a timeout value. We didn't provide a default, as it's difficult to pick a reasonable one for all use cases.
A default that is too small would be even worse, because nobody likes a workflow timing out unexpectedly in production, even if the "Reset" command can reopen it.
A default that is too big, as Maxim suggests, is slightly better than one that is too small. But I personally disagree with it, because it lets clients stop thinking about how long the workflow will run and how large the workflow history will grow, which also tends to become a production issue at some point.
The biggest issue I see is that this required option is not friendly: it surfaces as a runtime error rather than a compile-time error. This is probably something we can improve in Cadence -- if the field is required, make it required in the coding experience. Providing a hardcoded fake "infinite" value for edge cases may also make sense.
Back to your question: I would suggest using a fake "infinite" value if the duration really is indeterminate. A good example is in a Cadence system workflow: https://github.com/uber/cadence/blob/11547ee6db5dd306cb507b263381a6ea94c3faf1/service/worker/scanner/workflow.go#L48
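For illustration, here is a sketch of starting a workflow with such a fake "infinite" value using the Go client (the Java client exposes the equivalent option on its WorkflowOptions builder). The ten-year duration, the workflow ID, the task list name, and the workflow name are all arbitrary placeholders:

```go
package example

import (
	"context"
	"time"

	"go.uber.org/cadence/client"
)

// startLongRunningWorkflow starts a workflow whose runtime is indeterminate,
// using a deliberately huge timeout as a stand-in for "infinite".
func startLongRunningWorkflow(ctx context.Context, c client.Client) error {
	opts := client.StartWorkflowOptions{
		ID:       "long-running-workflow", // placeholder workflow ID
		TaskList: "my-task-list",          // placeholder task list
		// Fake "infinite" value: far beyond any realistic runtime,
		// similar in spirit to the Cadence system workflow linked above.
		ExecutionStartToCloseTimeout:    10 * 365 * 24 * time.Hour,
		DecisionTaskStartToCloseTimeout: time.Minute,
	}
	_, err := c.StartWorkflow(ctx, opts, "MyWorkflow") // placeholder workflow name
	return err
}
```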

Reading custom metrics from the last build for custom baseline comparisons

I'm planning to introduce linting into a rather massive code base. Fixing all existing issues beforehand is not possible, so seeing thousands of linter errors at the start is inevitable.
I'd like to record the number of detected errors each time the build runs for master and treat this number as a success / failure threshold. If a new pull request does not exceed the current baseline, its pipeline passes and so the proposed change is good to go. However, if the number of errors increases, I'd like the pipeline to fail, thus preventing the merge.
The functionality I've described boils down to writing values to Azure DevOps as a side effect of a build and reading those values back from the previous build. This looks very similar to comparing code coverage; however, I can't find any docs on how to implement the read/write logic manually.
What pipeline task could I use? What else could I leverage to track a custom metric over a number of builds and compare each value with the previous one? To summarise, my ultimate goal is to gradually lower an arbitrary value from a large number to zero over the course of several months.
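To make the threshold idea concrete, here is a rough sketch of the comparison step I have in mind, independent of where the baseline is persisted (a variable group, a published artifact, or a value fetched from the previous build through the REST API). LINT_ERRORS and LINT_BASELINE are hypothetical environment variables the pipeline would have to populate:

```go
// Fail the step when the current lint error count exceeds the baseline.
// Both environment variables are hypothetical and must be set by the pipeline.
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	current, errCur := strconv.Atoi(os.Getenv("LINT_ERRORS"))
	baseline, errBase := strconv.Atoi(os.Getenv("LINT_BASELINE"))
	if errCur != nil || errBase != nil {
		fmt.Fprintln(os.Stderr, "missing or invalid LINT_ERRORS / LINT_BASELINE")
		os.Exit(2)
	}
	if current > baseline {
		fmt.Printf("lint errors increased: %d > baseline %d\n", current, baseline)
		os.Exit(1) // a non-zero exit code fails the pipeline step
	}
	fmt.Printf("lint errors: %d (baseline %d), OK\n", current, baseline)
}
```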

What do edge-based and level-based mean?

What does "level-based" and "edge-based" mean in general?
I read "In other words, the system's behavior is level-based rather than edge-based" from kubernetes documentation:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/api-conventions.md
Searching with Google, I only found:
http://www.keil.com/forum/9423/edge-based-vs-level-based-interrupt/
Thank you.
It also has a more general definition (at least the way we tend to use it in the documentation). A piece of logic is "level-based" if it only depends on the current state. A piece of logic is "edge-based" if it depends on history/transitions in addition to the current state.
"Level-based" components are more resilient because if they crash, they can come back up and just look at the current state. "Edge-based" components must store the history they rely on (or depend on some other component that stores it), so that when they come back up they can look at the current state and the history. Also, if there is some kind of temporary network partition and an edge-based component misses some of the updates, then it will compute the wrong output.
However, "level-based" components are usually less efficient, because they may need to scan a lot of state in order to compute an output, rather than just reading deltas.
Many components are a mixture of the two.
Simple example: You want to build a component that reports the number of pods in READY state. A level-based implementation would fetch all the pods from etcd (or the API server) and count. An edge-based implementation would do that once at startup, and then just watch for pods entering and exiting READY state.
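A rough sketch of the two styles with client-go; clientset construction and error handling are abbreviated, and isReady is an illustrative helper:

```go
package readiness

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// Level-based: recompute the answer from the full current state every time.
func readyPodsLevelBased(ctx context.Context, cs kubernetes.Interface) (int, error) {
	pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	count := 0
	for i := range pods.Items {
		if isReady(&pods.Items[i]) {
			count++
		}
	}
	return count, nil
}

// Edge-based: maintain a set of ready pods from observed transitions
// (a real implementation would seed it from one initial list). Missed
// watch events make the set drift, which is exactly the fragility
// described above.
func trackReadyPodsEdgeBased(ctx context.Context, cs kubernetes.Interface) error {
	ready := map[string]bool{}
	w, err := cs.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		key := pod.Namespace + "/" + pod.Name
		if ev.Type == watch.Deleted || !isReady(pod) {
			delete(ready, key)
		} else {
			ready[key] = true
		}
		fmt.Printf("ready pods: %d\n", len(ready))
	}
	return nil
}

func isReady(p *corev1.Pod) bool {
	for _, c := range p.Status.Conditions {
		if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}
```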
I'd say they explained it pretty well on the site:
When a new version of an object is POSTed or PUT, the "spec" is updated and available immediately. Over time the system will work to bring the "status" into line with the "spec". The system will drive toward the most recent "spec" regardless of previous versions of that stanza. In other words, if a value is changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system is not required to 'touch base' at 5 before changing the "status" to 3.
So from that statement, we know that "level-based" means a PUT request does not need to be satisfied if the primary goal does not require it; the system is free to skip PUT requests when it sees fit.
This makes me assume that "edge-based" systems require every PUT request to be satisfied, even if some requests could be skipped without altering the final result, since that would be the alternative to skipping requests.
I'm no RESTful developer (you can see that by my account activity). I could not find any other source of information on these terms, so I'm going off the explanation they gave, which seems pretty straightforward.
The Kubernetes API doesn't store a history of all the changes made to an object. The controller that is responsible for that object cannot reliably observe each change; it may only observe the current state of the object.
The term "level" means the current state of the object, and the term "edge" means a transition to a new state. Controllers are "level-based" because they cannot reliably observe the transitions.
From the API conventions:
When a new version of an object is POSTed or PUT, the spec is updated and available immediately. Over time the system will work to bring the status into line with the spec. The system will drive toward the most recent spec regardless of previous versions of that stanza. For example, if a value is changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system is not required to 'touch base' at 5 before changing the status to 3. In other words, the system's behavior is level-based rather than edge-based. This enables robust behavior in the presence of missed intermediate state changes.

Why should I combine code and tests in a single commit?

Let's say I write an atomic change of code. I also write some tests to make sure the change works and keeps working.
I think I should commit the code change together with the tests.
PRO:
This makes sure (as far as possible) that every changeset of the branch results in running code and passing tests.
It documents that these tests "belong to" the code change that I did.
CON:
I may want to back out the change without backing out the tests at some point in the future.
What reasons more are there for doing it one way or the other?
Has either strategy ever bitten you? How exactly?
Somewhat related: Should change to code be committed separately from corresponding change to test suite?
My only rule is to make sure that code builds prior to checking in.
Yes, ideally you want to check in atomic bits of code. However, how do you define atomic? And then what happens if, later, you have to change one of the bits? Your changes are no longer in one commit.
I think I should commit the code change together with the tests.
I never do, but I would, if I used a non-branching SCM.
In branching SCMs (I've worked with mercurial and git) it's a non-issue.
I use a different branch every time I start working on something; I often have local branch commits that just capture outstanding changes (that don't even compile) when I move to something else for a while, or when I just need a backup point before I start a major code change. I know that most of those intermediate commits will never be checked out again when I create them. They don't affect anything, though, and are really useful when needed.
I only merge branches back when I have code that compiles and tests that run successfully on it.

Why Rational Team Concert changes the files' last modified attribute?

I'm having some issues with the installation of Rational Team Concert on my server.
The thing is that when I upload any kind of change to the server, it changes the file's last modified attribute, but it shouldn't.
Is there a way to avoid this behavior?
Thank you in advance!
This is something that we have tried to add to RTC SCM (and we still plan to). However, we found that it needs to be an option on load/update.
There are numerous details and discussions available in this work item on jazz.net.
Regarding the timestamp, setting aside the fact that relying on it in a version control tool isn't always considered a best practice (see "What's the equivalent of use-commit-times for git?"), it is actually a complex issue:
an SCM loader wouldn't use just the timestamp to determine which files have changed (Task 179263)
you can have various requirements for that timestamp (as in Defect 159043, where the timestamp of the modified file on disk should be that of when it was delivered, not when it was accepted). The variable JAZZ_CCM_SKIP_MOD_TIME=true is mentioned there, so check whether it could improve your specific case.
it is all based on the assumption that the timestamp is correctly set by the local workstation, which isn't always true, as illustrated in Task 77201