How to resume an Argo workflow only if it is already in a suspended state? - argo-workflows

I can suspend a workflow using argo suspend (Suspend), and I can resume the workflow again with argo resume (Resume).
However, when resuming, argo-workflows makes no check on whether the workflow is actually in a suspended state. How can this be enforced from the client side?
In summary, I only want to resume a workflow if it has already gone into a suspended state. If it has not, I will wait for the workflow to become suspended, and resume it only thereafter.
I tried using workflow.Status.Phase (Status) to check the state of the workflow before resuming; however, the Phase string only has a "Running" value, which does not differentiate between a running workflow and a suspended workflow. (Phase String code)

You need to check the node's type and phase rather than the workflow phase. While the workflow is suspended, the Suspend node has type Suspend and phase Running. Once that step completes, its phase changes to a terminal value such as Succeeded. For example:
curl https://localhost:2746/api/v1/workflows/argo -H "Authorization: $ARGO_TOKEN" -k | jq '.items[].status.nodes[] | select(.type=="Suspend" and .phase=="Running")'
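The same check can be done client-side. Here is a minimal Python sketch (standard library only) of a poll-then-resume loop; the server URL, token handling, and the PUT .../resume route are assumptions based on the Argo server's REST API, so adjust them to your deployment:

```python
import json
import time
import urllib.request

def is_suspended(workflow: dict) -> bool:
    """True if any node of the workflow is a Suspend node currently in phase Running."""
    nodes = workflow.get("status", {}).get("nodes", {})
    return any(
        n.get("type") == "Suspend" and n.get("phase") == "Running"
        for n in nodes.values()
    )

def wait_and_resume(base_url, namespace, name, token, interval=5.0):
    """Poll the workflow until it reaches a suspended state, then resume it."""
    url = f"{base_url}/api/v1/workflows/{namespace}/{name}"
    headers = {"Authorization": token}
    while True:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            wf = json.load(resp)
        if is_suspended(wf):
            break
        time.sleep(interval)
    # Assumed resume route on the Argo server; equivalent to `argo resume`.
    resume = urllib.request.Request(f"{url}/resume", method="PUT", headers=headers)
    urllib.request.urlopen(resume)
```

Polling is the simplest approach; a watch on the workflow resource would avoid the fixed sleep interval if you need lower latency.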

Related

How to revoke a task properly considering its state could happen to shift

I'm developing the service for customer's orders monitoring, using MongoDB as the standalone backend for tracking and storage of celery tasks' state. So far, it works well, refreshing and displaying the state of all the tasks submitted by a current customer, e.g. STARTED, SUCCESS, FAILURE.
The monitoring UI could look something like the following:
+----------+--------------+-------------+--------------------+
|task_id |created_at |status |operation |
|----------|--------------|-------------|--------------------|
|[uuid] |[timestamp] |[STARTED] |[DOWNLOAD] [DELETE] |
|[uuid] |[timestamp] |[SUCCESS] |[DOWNLOAD] [DELETE] |
|[uuid] |[timestamp] |[RECEIVED] |[DOWNLOAD] [DELETE] |
|[uuid] |[timestamp] |[FAILURE] |[DOWNLOAD] [DELETE] |
|... |... |... |... |
+----------+--------------+-------------+--------------------+
Now I want to implement this [DELETE] utility, which means the customer could revoke a task being executed, via an HTTP request. Considering that the state of a task could happen to switch into SUCCESS or FAILURE or another state while the request is in flight due to HTTP latency, is it proper to use app.control.revoke(task_id, terminate=True)?
UPDATED:
Now I configure worker_state_db='/var/run/celery/worker.state.db' in the Celery config file for persistent revokes, and stick to app.control.revoke(..., terminate=True). Is that the right option? I only realized how this revoke command works when I found related answers here:
Which is the best way to programmatically terminate (cancel) a celery task
Revoke a task from celery
Celery Task Custom tracking method
Because the service cannot know the target task's state at the moment the revoke command is broadcast, the following scenarios are possible.
Scenario 1: the target task's state is SUCCESS or FAILURE:
The worker node is already executing another task (say, task aaa) when the terminating revoke arrives, and will restart executing task aaa. So I have to synchronize the REVOKED status of the target task into MongoDB without using the task_revoked signal.
Scenario 2: the target task's state is RECEIVED, STARTED, or another non-final state:
The worker node is executing this target task, and the task_revoked signal should be triggered. But I failed to use the task_revoked signal to synchronize the task status into MongoDB, so I tried to manually update MongoDB in the same way as scenario 1, given a reply received from app.control.revoke(..., terminate=True, reply=True). But I still get the following problem:
[# ERROR/MainProcess] Task handler raised error: Terminated(15)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/billiard/pool.py", line 1774, in _set_terminated
raise Terminated(-(signum or 0))
billiard.exceptions.Terminated: 15
How can I solve this problem? And please correct me if I've still got something wrong.
I have it implemented the way you describe. There are a couple of things to note:
If you send the revoke command and the task is already done (either in SUCCESS or FAILURE status), the revoke command won't do anything and simply reply that it doesn't know the task. (And more importantly it also won't terminate the child worker process.) See the code for yourself.
You have to be cautious when using terminate=True, though. Depending on how you configure Celery, the worker that's executing your task might have already started another task when the revoke command is sent. There's an explicit warning in the documentation.
On the other hand, once the Celery worker starts to process your task and you really need to cancel the processing, revoking and killing the worker process is the only option.
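To make the guard logic above concrete, here is a minimal sketch in plain Python (no Celery required to run it). The state names follow Celery's conventions; revoke stands in for app.control.revoke and mark_revoked_in_db is a hypothetical placeholder for your MongoDB update, both injected so the logic is testable:

```python
# Tasks in these "ready" states are already finished; a terminating
# revoke is a no-op for them, so we keep their final state in the DB.
READY_STATES = {"SUCCESS", "FAILURE", "REVOKED"}

def should_terminate(state: str) -> bool:
    """Only send a terminating revoke while the task may still be running."""
    return state not in READY_STATES

def handle_delete(task_id, state, revoke, mark_revoked_in_db):
    """Sketch of the [DELETE] handler.

    `revoke` stands in for app.control.revoke, and `mark_revoked_in_db`
    for the MongoDB status update. Returns True if a revoke was sent.
    """
    if not should_terminate(state):
        # Task already finished; revoke would do nothing (and would not
        # terminate the child process), so leave the final state intact.
        return False
    revoke(task_id, terminate=True)
    # Record REVOKED ourselves rather than relying on the task_revoked
    # signal reaching our monitoring service.
    mark_revoked_in_db(task_id)
    return True
```

There is still an unavoidable race between reading the state and the revoke being delivered, so the DB update should be treated as best-effort and reconciled against the result backend if exactness matters.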

Force stop Azure App Service Deployment Slot Swap

We are using Azure DevOps to deploy to a staging slot and then swap with production.
When there is an issue swapping, it will keep trying for nearly 30 minutes.
Therefore I would like to put a timeout on the swap task, but if I do that it will stop the task in DevOps while leaving the process running in Azure.
I would like a way to force stop the process through a CLI, API, PowerShell or DevOps task.
Azure CLI doesn't seem to have anything for this.
The Kudu API can delete deployments but doesn't appear able to stop them (https://github.com/projectkudu/kudu/wiki/REST-API#deployment).
I have read that you can stop a process, but using a Linux container App Service, I can't see that option: Azure-Web-sites: How to cancel a deployment?
Is there a way?
Please try using the following command in an Azure PowerShell task to cancel a pending swap:
Invoke-AzResourceAction -ResourceGroupName [resource group name] -ResourceType Microsoft.Web/sites/slots -ResourceName [app name]/[slot name] -Action resetSlotConfig -ApiVersion 2015-07-01
Here is the document and my sample; I added -Force -Confirm:$false at the end of the command:
Update
If any errors occur in the target slot (for example, the production slot) after a slot swap, restore the slots to their pre-swap states by swapping the same two slots immediately.
So we don't need to stop it; just wait for the swap operation to succeed.
When you submit the swap slot request, you will get an HTTP 202 status code. On the portal, when you click the swap button, you will find that the browser keeps requesting the URL from the Location response header to get the status of the swap.
As for when it ends, we can check the swap operation by polling.
If the swap operation takes too long, it is recommended to raise a support ticket and ask an engineer to check the reason.
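The polling described above can be sketched in Python. The 202-while-in-progress behavior and the Location URL come from the answer; the fetch callable is injected (in real use it would be an authenticated HTTP GET) so the loop itself is testable:

```python
import time

def poll_swap(fetch, location_url, interval=5.0, max_attempts=60):
    """Poll the swap operation's Location URL until it stops returning 202.

    `fetch` takes a URL and returns an HTTP status code. Returns the final
    status code, or None if we gave up after max_attempts polls.
    """
    for _ in range(max_attempts):
        status = fetch(location_url)
        if status != 202:  # 202 Accepted means the swap is still running
            return status
        time.sleep(interval)
    return None
```

With a real client you would pass something like `lambda url: requests.get(url, headers=auth_headers).status_code` as `fetch`.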
Previous
You can use AzureAppServiceManage task.
Azure App Service Manage task
Use this task to start, stop, restart, slot swap, Swap with Preview, install site extensions, or enable continuous monitoring for an Azure App Service.
Tips
When you use the REST API to swap slots, you can check the Location header in the response. The initial request returns HTTP 202, and polling the Location URL returns the status of the swap; this is exactly what the portal's browser does after you click the swap button.

How to restart a timed-out/failed/canceled/terminated Cadence workflow

Hi, I have a workflow with 2 activities.
Scenario: one activity has completed; while executing the second, the URL it needs to communicate with is down. By the time that URL is back up, the workflow has timed out. So how can I restart the timed-out workflow?
This question is inspired by a Github issue.
Cadence allows you to easily restart any workflow that is already closed in any status: timed out, failed, canceled, or terminated, and even closed with a success status: completed or continuedAsNew.
You can use the CLI to reset a workflow with the LastDecisionCompleted or FirstDecisionCompleted resetType. LastDecisionCompleted resets to the last decision point, preserving the progress the workflow made before it timed out. FirstDecisionCompleted resets to the beginning of the workflow, effectively restarting it while keeping only the workflow's start parameters:
./cadence workflow reset -w <wid> --reset_type LastDecisionCompleted --reason "some_reason"
If you have many workflows to reset, and your Cadence cluster has the advanced visibility feature enabled so you can search/filter workflows by query, then you can also use the batch-reset command like this:
./cadence workflow reset-batch --query <query> --reset_type LastDecisionCompleted --reason "some_reason"
You can also use the reset API in the client SDK to reset a workflow to a particular point.
Reset is one of Cadence's most powerful operational features. Beyond FirstDecisionCompleted and LastDecisionCompleted, you can easily move a workflow back to any point in time, like using a time machine. There are more resetTypes supported; use "--help" to read the command manual.
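If you script such resets, a small helper that assembles the CLI invocations shown above might look like this Python sketch (the flags mirror the commands in this answer; the CLI binary path is an assumption):

```python
def build_reset_cmd(workflow_id, reset_type="LastDecisionCompleted",
                    reason="some_reason"):
    """Build the argv list for a single-workflow `cadence workflow reset`."""
    return [
        "./cadence", "workflow", "reset",
        "-w", workflow_id,
        "--reset_type", reset_type,
        "--reason", reason,
    ]

def build_batch_reset_cmd(query, reset_type="LastDecisionCompleted",
                          reason="some_reason"):
    """Build the argv list for `cadence workflow reset-batch` over a query."""
    return [
        "./cadence", "workflow", "reset-batch",
        "--query", query,
        "--reset_type", reset_type,
        "--reason", reason,
    ]
```

You would then execute the list with, for example, `subprocess.run(build_reset_cmd("my-wid"), check=True)`.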

Scheduled Job & Balloon tip

I am looking for a way to run a job on a schedule and also alert the user to that running job. Specifically, I am using PowerShell to manage a computer lab scenario, and between sessions I want to refresh the environment, clean off the desktop, reset shortcuts pinned to the task bar for the next session, etc. But I want to warn anyone sitting at the machine that this is about to happen. However, my scripts that use balloon tips very successfully as regular scripts don't work as Scheduled Jobs. They run, and I have verified they run as the user in question by creating a Scheduled Job that writes a text file to the user's desktop. But the balloon tips don't actually appear. Is there some secret to getting this to work, or is this a form of "interaction" that a Scheduled Job just can't do?
I also tried an alternative approach: launching the browser with a web page warning of the impending cleanup. That also didn't work, suggesting there are limits to what can be done as a Scheduled Job.
I would much rather go the very "integrated with the OS" route of the balloon tips, but for the life of me it seems that just isn't an option. So, any other suggestions for providing user info by way of a scheduled job?
Since this runs in Session 0, where GUI interaction doesn't exist, you must resort to some other mechanism.
You say this happens between sessions. You could show your balloon via another "notification script" that is executed from within your Scheduled Job. You have options here. For example:
Add an entry to the registry Run key that self-deletes on run. It shows the popup when the user logs in (on session change?). The entry executes a PowerShell script whose parameters you can craft, e.g. powershell -File notify.ps1 -ArgumentList "Operation bla bla.."
Add a ScheduledTask that doesn't run in Session 0 (a regular task). You only need to register it once. Every subsequent time, run this task from within the main script to show the notification to the user, via schtasks run or the ScheduledTasks module, depending on your system.
Add a ScheduledTask that periodically checks the EventLog for input from your main script. The task would start at logon and subscribe to event log notifications. I don't like this, as the script must run non-stop.
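The second option, triggering a pre-registered interactive scheduled task from the main script, reduces to shelling out to `schtasks /run`. A minimal Python sketch of that plumbing (the task name is a placeholder, and the runner is injected so the logic is testable; in the real script this would be PowerShell or a subprocess.run call):

```python
def build_notify_cmd(task_name):
    """argv for `schtasks /run`, which fires the pre-registered notification task."""
    return ["schtasks", "/run", "/tn", task_name]

def notify_user(task_name, runner):
    """Trigger the interactive notification task.

    `runner` stands in for subprocess.run and returns its result, so the
    caller can check whether the trigger succeeded.
    """
    return runner(build_notify_cmd(task_name))
```

The key point is that the task being triggered was registered to run in the user's interactive session, not Session 0, so its balloon tip is actually visible.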

How to avoid multiple invoking of Background Task for every pending raw push

How can I avoid invoking the background task for every pending raw push delivered to the application from the WNS server when the user returns to a connected state after being temporarily disconnected, in Windows Runtime 8.1?
I just want to invoke my background task once for all pending raw pushes.