Is there a Re-Try option for cadence or temporal from the frontend for the failed workflows? - cadence-workflow

Is there a retry option for Cadence or Temporal from the front end for failed workflows?

Related

How to synchronize data between multiple Orions

For example, I have built an Orion instance and its database in AWS and in GCP respectively, and I want to synchronize data between the two.
I registered a subscription on each Orion pointing to the other Orion's "/v2/op/notify" endpoint, but when there is a communication failure the failed notification is not retained, so I could not synchronize the data even after the failure was resolved.
Is there any synchronization method that can cope with a period of communication failure caused by an outage of either service?

Pending decision tasks are never picked up for execution and eventually time out in Uber Cadence workflow

What could be the reason for decision tasks not getting picked up for execution in a Cadence cluster? They remain in the pending state and finally time out. I don't see any error logs. How do I debug this?
It's very likely that there is no worker available and actively polling tasks for that tasklist.
The best way to confirm is to click on the tasklist name in the web UI and see which workers are behind the tasklist. Since it's a decision task, you should check the decision handlers for the tasklist.
You can also use CLI to describe the tasklist to give the same information:
cadence tasklist desc --tl <tasklist name>
In some extremely rare cases (I have personally never seen this, but have heard it happened at Uber with a large-scale cluster) the Cadence server loses the task. In that case you can use the CLI to either regenerate the task or reset the workflow to unblock it:
To regenerate task:
cadence adm wf refresh-tasks -w <wf id>
To reset:
cadence wf reset --reset_type LastDecisionCompleted -w <wf id>

Is there a way to configure retries for Azure DevOps pipeline tasks or jobs?

Currently I have a OneBranch DevOps pipeline that fails every now and then while restoring packages. Usually it fails because of some transient error like a socket exception or a timeout. Retrying the job usually fixes the issue.
Is there a way to configure a job or task to retry?
Azure DevOps now supports the retryCountOnTaskFailure setting on a task to do just this.
See this page for further information:
https://learn.microsoft.com/en-us/azure/devops/release-notes/2021/pipelines/sprint-195-update
Update:
Automatic retries for a task have been added and should be available by the time you read this.
It can be used as follows:
- task: <name of task>
  retryCountOnTaskFailure: <max number of retries>
  ...
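For instance, a package-restore step that is retried up to three times might look like the sketch below (the NuGetCommand task and its inputs are only an illustration; substitute your own task):

- task: NuGetCommand@2
  displayName: 'Restore NuGet packages'
  retryCountOnTaskFailure: 3   # retry up to 3 times if the task fails
  inputs:
    command: 'restore'
    restoreSolution: '**/*.sln'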
Here are a few things to note when using retries:
The failing task is retried immediately.
There is no assumption about the idempotency of the task. If the task has side-effects (for instance, if it created an external resource partially), then it may fail the second time it is run.
There is no information about the retry count made available to the task.
A warning is added to the task logs indicating that it has failed before it is retried.
All of the attempts to retry a task are shown in the UI as part of the same task node.
Original answer:
There is no way of doing that with native tasks. However, if you can use a script, you can put such retry logic inside it.
You could do it, for instance, in this way:
n=0
until [ "$n" -ge 5 ]          # try at most 5 times
do
  command && break            # substitute your command here; leave the loop on success
  n=$((n+1))
  sleep 15                    # wait 15 seconds between attempts
done
However, there is no native way of doing this for regular tasks.
Automatically retrying a task is on the roadmap, so this could change in the near future.
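Note that the loop above simply gives up silently after five failed attempts. If you want the step itself to be marked as failed once all retries are exhausted, a variant of the same sketch (assuming a Bash script step) could exit with a non-zero code:

n=0
until command                  # substitute your command here
do
  n=$((n+1))
  if [ "$n" -ge 5 ]; then
    echo "command still failing after 5 attempts"
    exit 1                     # non-zero exit marks the script step as failed
  fi
  sleep 15                     # wait 15 seconds between attempts
done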

Is there a way to manually retry a step in Argo DAG workflow?

The Argo UI shows a "Retry" button for DAG workflows, but when a step fails and I use it, the retry always fails. Is manually retrying a step even supported in Argo?

Workflow Exception - Retry handling - Adobe AEM/CQ

Workflow processes throw a WorkflowException in case of failure. There is a setting in the Web Console, Apache Sling Job Default Queue, where the maximum number of retries on failure is set to 10.
So on failure, the workflow is retried 10 more times. If the workflow has a step such as Version Creation, 10 more versions of the resource are created.
I could think of the following solutions:
Set the maximum retry count on failure to 0 in the Apache Sling Job Default Queue. Is it fine to do this?
Replace the OOTB Version Creation process with a custom process and add a check for retries, probably by saving a flag in the workflow metadata.
Version Creation is only taken as an example here; it could be any other process implementing some other functionality, which would likewise be retried 10 more times on failure. Has anyone faced a similar situation?
It is not advisable to make it zero. Some workflows need to be retried, for example the activation workflow when there were network issues or the publish instances were down. Your setting would completely bypass this safety mechanism.
I would prefer your second method as an alternative. org.apache.sling.event.jobs.Job has getRetryCount().
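A minimal sketch of that second approach, assuming the Granite WorkflowProcess API (the class name and metadata key are purely illustrative, and OSGi service registration is omitted):

import com.adobe.granite.workflow.WorkflowException;
import com.adobe.granite.workflow.WorkflowSession;
import com.adobe.granite.workflow.exec.WorkItem;
import com.adobe.granite.workflow.exec.WorkflowProcess;
import com.adobe.granite.workflow.metadata.MetaDataMap;

public class GuardedVersionCreationProcess implements WorkflowProcess {

    // Illustrative key; any unique name in the workflow metadata would do.
    private static final String DONE_FLAG = "versionAlreadyCreated";

    @Override
    public void execute(WorkItem item, WorkflowSession session, MetaDataMap args)
            throws WorkflowException {
        MetaDataMap wfMeta = item.getWorkflowData().getMetaDataMap();

        // A previous (failed and retried) run already created the version: skip.
        if (wfMeta.get(DONE_FLAG, false)) {
            return;
        }

        // ... create the version of the payload resource here ...

        // Remember that the side effect has happened, so Sling job retries
        // of this step do not create additional versions.
        wfMeta.put(DONE_FLAG, true);
        session.updateWorkflowData(item.getWorkflow(), item.getWorkflowData());
    }
}

The flag is written only after the version has been created, so a retry of a run that failed before that point can still do the work.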