Why is executionStartToCloseTimeoutSeconds required? - cadence-workflow

When using the Java client to start a workflow in Cadence, "executionStartToCloseTimeoutSeconds" is required on the workflow. If I have a workflow that can run for an indeterminate amount of time, how do I get around this restriction?

Requiring this value was a mistake. The new version of the platform I'm working on (temporal.io) defaults this value to infinity.

an indeterminate amount of time
First of all, I believe an indeterminate amount of time is not the same as an infinite amount of time.
Letting a workflow execution run and grow infinitely is an anti-pattern in Cadence. See Recommendation #5 in this article: https://longquanzheng.github.io/cadence-lab/book/learnings/what-should-be-in-a-workflow-or-an-activity-in-cadence.html
A good timeout value can protect your workflow from growing infinitely.
Letting a workflow run forever is not recommended, because it can cause performance issues in both the worker and the server, so the original idea was to force the client to provide a timeout value. We didn't provide a default because it is difficult to find one that is reasonable for all use cases.
A default that is too small would be even worse, because no one likes a workflow timing out unexpectedly in production, even though you can use the "Reset" command to reopen it.
A default that is too big, as Maxim suggests, is slightly better than one that is too small. But I personally disagree with it, because it lets clients forget to think about how long the workflow will run and how large the workflow history will grow. That will also turn into a production issue at some point.
The biggest issue I see is that this required option is not user-friendly. It should be a compile-time error instead of a runtime error. I think this is something we can improve in Cadence: if a field is required, make it required in the coding experience. Providing a hardcoded fake "infinite" value to help with some edge cases may also make sense.
Back to your question: I would suggest using a fake infinite value if you think the duration is indeterminate. A good example is in the Cadence system workflow: https://github.com/uber/cadence/blob/11547ee6db5dd306cb507b263381a6ea94c3faf1/service/worker/scanner/workflow.go#L48
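To make the "fake infinite" idea concrete, here is a minimal Java sketch. The 20-year figure, the class name, and the helper are assumptions for illustration only, and the timeout field is assumed to be a 32-bit count of seconds. The conversion fails loudly instead of silently overflowing if someone picks a value that is too large.

```java
import java.time.Duration;

// Sketch only: the 20-year figure is an assumed stand-in for "infinite",
// and this class is hypothetical, not part of the Cadence client.
final class FakeInfiniteTimeout {
    // 20 years of 365 days each, expressed as a Duration.
    static final Duration FAKE_INFINITE = Duration.ofDays(20L * 365);

    // Converts to whole seconds; throws ArithmeticException if the
    // result does not fit in an int (assumed width of the timeout field).
    static int toTimeoutSeconds(Duration timeout) {
        return Math.toIntExact(timeout.getSeconds());
    }

    public static void main(String[] args) {
        System.out.println(toTimeoutSeconds(FAKE_INFINITE));
    }
}
```

Twenty years comes to 630,720,000 seconds, which is well under the int maximum of 2,147,483,647. The result would then be passed wherever the client asks for the execution start-to-close timeout.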

Related

How to investigate time required to obtain lock - and why - within a procedure

I am stumped on an issue I am having. The true context is rather complicated, but I can boil it down to these functional points (everything else is not related to the problematic table):
I have a trigger function that contains several SELECTs and then an UPDATE
The update takes an unreasonable amount of time to execute ("unreasonable" meaning more than 1.4s)
The same exact queries when run outside the trigger (for the same rows, parameters, etc.) do not have any issues (i.e. they execute in under 1-2ms)
I am pretty sure that indexes, etc., are working as necessary; i.e. there shouldn't be any issues.
There are no circular triggers
There is one trigger on the destination table, but even with that removed the behavior is the same.
I have done many tests to no avail, but these are pretty meaningful:
when the update is replaced with a SELECT, the response time is fast, as expected
when the update is replaced with a SELECT... FOR UPDATE, the response time is slow, the same as the update
^ this (as well as other things) has led me to believe that the delay may be spent waiting to acquire a lock
No other transactions are really happening on that table. I am truly bewildered.
Server context: This is being run in AWS/RDS on db.m5.xlarge.
What I am looking for is whether there is a way to get some information about locks that are happening mid-transaction or possibly even a history of acquired locks? Or anything else that can give me insight into what is causing the delay that seems so closely related to acquiring a lock on that table.
Unfortunately, just to make everything even more frustrating, I cannot replicate the issue when I attempt to use EXPLAIN in the function body. The only way to do this (that I know of) is to use the EXECUTE... syntax with a query string. That doesn't have a delay, and it's also useless for the trigger.

Reading custom metrics from the last build for custom baseline comparisons

I'm planning to introduce linting into a rather massive code base. Fixing all existing issues beforehand is not possible, so seeing thousands of linter errors at start is inevitable.
I'd like to record the number of detected errors each time the build runs for master and treat this number as a success / failure threshold. If a new pull request does not exceed the current baseline, its pipeline passes and so the proposed change is good to go. However, if the number of errors increases, I'd like the pipeline to fail, thus preventing the merge.
The functionality I’ve described boils down to writing variables to Azure DevOps servers as a side effect of builds, and reading these values back from the previous build. This looks very similar to comparing code coverage; however, I can't seem to find any docs on how to implement the read-write logic manually.
What pipeline task could I use? What else can I leverage to track a custom metric over a number of builds and compare the value with the previous one? To summarise, my ultimate goal is to gradually lower an arbitrary value from a large number to zero over the course of several months.
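The pass/fail rule described above can be sketched in a few lines of Java. This only covers the comparison itself; how the baseline is stored in and read back from Azure DevOps is the open part of the question and is not shown. The class and method names are hypothetical.

```java
// Sketch only: hypothetical helper for the comparison step. It assumes
// the baseline and current error counts have already been obtained.
final class LintBaselineGate {
    // A pull request passes when it does not exceed the baseline.
    static boolean passes(int baselineErrors, int currentErrors) {
        return currentErrors <= baselineErrors;
    }

    // The stored baseline only ever moves down, so it ratchets toward zero.
    static int nextBaseline(int baselineErrors, int currentErrors) {
        return Math.min(baselineErrors, currentErrors);
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: LintBaselineGate <baseline> <current>");
            System.exit(2);
        }
        int baseline = Integer.parseInt(args[0]);
        int current = Integer.parseInt(args[1]);
        // A non-zero exit code is what would make a pipeline step fail.
        System.exit(passes(baseline, current) ? 0 : 1);
    }
}
```

Taking the minimum when updating the baseline on master means an improvement is locked in as soon as it is merged, which fits the goal of lowering the number gradually.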

Selenoid query priority

Is it possible to set a priority for tests in Selenoid?
Problem: there is a suite of more than 20 tests, so at startup it fills the queue. After that, another test is started, and it goes to the end of the queue.
Is there an option to make it run as soon as the browser is freed, without waiting for all the tests to run before it?
No, this is not possible in the current implementation. All incoming requests have equal priority. Two alternatives:
I think such issues should be addressed in the test framework of your choice. For example, for py.test a quick search shows a plugin for ordering your tests: https://github.com/ftobia/pytest-ordering Not sure whether it works.
You could also install Ggr and use different Selenoids and quota names for different tests, but this seems too complicated for your case.

Anylogic: How to set Service delay time depending on the resourceSet being used

Basically I've got a Service which can work with two alternatives of ResourceSets. Let's say, the Service would optimally work with one Doctor and one Nurse, but it is also possible to work with only one Doctor if a Nurse isn't available.
Now, assuming the Doctor works slower without a Nurse, the Service's delay time must depend on the resourceSet being employed at the moment (Doctor+Nurse or Doctor). Any idea how I can program this?
You should also keep in mind that my model has various Services working in parallel in the same way; it's not just a single Service line.
Thanks!
You're using Services but, to me, using the combination of Seize, Delay and Release gives you more flexibility.
What I've done is set the resource choice according to the image below:
It is important to have the nurses prior to the doctors in the first set (for some reason AnyLogic would opt for using only the doctor otherwise, even with a nurse available).
Then, I would write this code:
Which means that if the agent was only able to seize one resource it will take longer (15 is just a random value).
In the delay block, I would set the processing time to agent.processTime
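The rule described above can be sketched in plain Java. This is not the original snippet and it does not use the AnyLogic API: the class, the method, and the 10.0 value for the Doctor+Nurse case are assumptions for illustration. Only the 15 comes from the explanation above, and it is arbitrary there too.

```java
// Sketch only: hypothetical stand-in for the rule "one seized resource
// means a longer processing time". In a real model the count would come
// from the resources the agent actually seized.
final class ProcessTimeRule {
    static final double DOCTOR_ONLY_TIME = 15.0;       // arbitrary value
    static final double DOCTOR_AND_NURSE_TIME = 10.0;  // assumed for illustration

    // Returns the processing time based on how many resources were seized.
    static double processTime(int seizedResources) {
        return seizedResources == 1 ? DOCTOR_ONLY_TIME : DOCTOR_AND_NURSE_TIME;
    }

    public static void main(String[] args) {
        System.out.println(processTime(1)); // doctor only
        System.out.println(processTime(2)); // doctor and nurse
    }
}
```

The value returned here plays the role of agent.processTime, which the delay block then reads.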
The topology I'm using is this:
Obviously this is a workaround and will not work for every case. You can always change the conditions you verify. I couldn't find a way to check which resource set was picked by the seize operation. If you're in a hurry this will do the trick.
Hope that helps,
Luís

Matlab code taking a long time to run

I have a Matlab code (from a journal paper) and I'm trying to re-simulate their data.
I started the code one week ago, and I think it is taking too long to run. Matlab is still busy and using 50% of my CPU.
I was wondering if the process has ended with some errors somewhere in the code. My question is:
When I see no errors, can I be sure that everything is fine with this running process? And can I wait until it is finished?
Is there any way to check which part of the code is being run now (without stopping the execution)?
Or should I stop the program and try something else?
Actually, I don't want to lose this week, and if you think everything is fine, I would wait until the code stops.
(The authors of the paper didn't reply to my question and I don't know how long it should normally take... They just mentioned it may take a long time to simulate the data).
Unfortunately, there is little we can do for you.
When I see no errors, can I be sure that everything is fine with this running process?
That's pretty much the definition of an error. If no error is raised, then it means that the program is still running.
Is there any way to check which part of code is being run now (without stopping the execution)?
Unfortunately no. For long execution times like that, good development practice is to display some information from time to time to inform the end user of the execution status.
However, if the program produces files along the way (for instance at every step in an iterative simulation), you can check on your computer that the files are being produced as expected, and the production rate will more or less tell you the total execution time.
For all your other questions, well, it's up to you to decide what to do (stop it or let it run). Be aware that the execution time can differ significantly from one machine to another, so the time it took on the author's machine may not be really informative to you.
In the future, I would advise you to react sooner than after a week. When you launch code that has a long execution time and see that there is no output within the first hour, you should stop it, modify it so that it regularly displays information, and re-run it. It's better to lose one hour than one week.
Best,