Notify completion of argo workflow - event-handling

I have a use case where I am triggering an Argo workflow from a Python application. However, I need a mechanism for the Argo workflow to notify my Python application when the workflow execution is completed. I am already using a pub/sub mechanism in my Python application, so I want my Python app to subscribe to a Redis queue and take action once the workflow publishes a message on this queue announcing its completion.
This is the interaction flow I am looking for
Workflow -> Redis queue -> Python app
Thanks for help

You can use Argo Workflows' exit handler:
https://argoproj.github.io/argo-workflows/variables/#exit-handler
https://github.com/argoproj/argo-workflows/blob/master/examples/exit-handlers.yaml
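For the Redis leg of that flow, the exit-handler step can simply run a small script that publishes to a channel your Python app is subscribed to. Below is a minimal sketch, assuming redis-py and a channel named workflow-events (both are my assumptions, not anything Argo prescribes); the workflow name and status would be passed into the exit-handler container via Argo's {{workflow.name}} and {{workflow.status}} variables.

```python
# Sketch only: assumes redis-py is installed and a Redis instance reachable at
# REDIS_HOST; the channel name "workflow-events" is an arbitrary choice.
import json
import os

import redis

REDIS_HOST = os.environ.get("REDIS_HOST", "redis")
CHANNEL = "workflow-events"


def publish_completion():
    """Run this inside the exit-handler step's container.

    Assumes the workflow injects {{workflow.name}} and {{workflow.status}}
    into the container as environment variables.
    """
    r = redis.Redis(host=REDIS_HOST)
    payload = {
        "workflow": os.environ.get("WORKFLOW_NAME"),
        "status": os.environ.get("WORKFLOW_STATUS"),
    }
    r.publish(CHANNEL, json.dumps(payload))


def listen_for_completions():
    """Run this in the Python application: block on the channel and react."""
    r = redis.Redis(host=REDIS_HOST)
    pubsub = r.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe(CHANNEL)
    for message in pubsub.listen():
        event = json.loads(message["data"])
        print(f"workflow {event['workflow']} finished with status {event['status']}")
        # ... trigger whatever follow-up work the app needs ...
```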

Related

How to run an existing Argo Workflow after completion of another Argo workflow?

So I have an Argo workflow_B that needs to run after the completion of an Argo workflow_A, which is used by another team. Both workflows already exist; I just want to chain them together.
How can I achieve that?
Is it possible to do such thing using exit-handler?
or should I use event-source like Webhook or AWS SNS to do that?
Have you looked at the workflow of workflows pattern?
This works if team A is able to create workflows in team B's namespace:
https://github.com/argoproj/argo-workflows/blob/master/examples/workflow-of-workflows.yaml
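If cross-namespace workflow creation isn't possible, another option is a small external watcher that polls workflow_A and submits workflow_B once it succeeds. A rough sketch with the official Kubernetes Python client; the namespaces, workflow names, and the workflow_b.yaml manifest are placeholders, not anything from your setup:

```python
# Sketch: poll Argo workflow_A's status via the Kubernetes API and submit
# workflow_B once it succeeds. Namespaces, names, and the manifest path are
# hypothetical placeholders.
import time

import yaml
from kubernetes import client, config

GROUP, VERSION, PLURAL = "argoproj.io", "v1alpha1", "workflows"


def wait_for_workflow(api, namespace, name, poll_seconds=30):
    while True:
        wf = api.get_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, name)
        phase = wf.get("status", {}).get("phase")
        if phase in ("Succeeded", "Failed", "Error"):
            return phase
        time.sleep(poll_seconds)


def main():
    config.load_kube_config()          # or load_incluster_config() inside the cluster
    api = client.CustomObjectsApi()

    phase = wait_for_workflow(api, "team-a", "workflow-a")
    if phase != "Succeeded":
        raise SystemExit(f"workflow_A ended with phase {phase}; not starting workflow_B")

    with open("workflow_b.yaml") as f:  # assumed manifest for workflow_B
        manifest = yaml.safe_load(f)
    api.create_namespaced_custom_object(GROUP, VERSION, "team-b", PLURAL, manifest)


if __name__ == "__main__":
    main()
```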

Github Actions Concurrency Queue

Currently we are using GitHub Actions for infrastructure CI.
The infrastructure uses Terraform, and a code change in a module triggers plan and deploy for the changed module only (hence only related modules are updated, e.g. one pod container).
Since an auto-update can be triggered by a push to another GitHub repository, updates can arrive in roughly the same time frame, e.g. Pod A's image and Pod B's image are updated together.
Without any concurrency control in place, since Terraform holds a state lock, one of the actions will fail due to a lock timeout.
After implementing concurrency, two pushes at the same time are fine, since the second one can wait for the first to finish.
Yet if more pushes arrive, GitHub's concurrency only keeps the last push in the queue and cancels the ones already waiting (the in-progress run can still continue). This is logical from a single-application perspective, but since our infra code relies on diff checks, skipping the deployments of a cancelled job actually bypasses a deployment!
Is there a mechanism to queue workflows (or even give a queue wait timeout) on GitHub Actions?
Eventually we wrote our own script in the workflow to wait for previous runs:
Get information on the current run.
Collect previous runs that have not completed.
Wait until they are completed (in a loop).
Once out of the waiting loop, continue with the workflow.
Tutorial on checking status of workflow jobs
https://www.softwaretester.blog/detecting-github-workflow-job-run-status-changes
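For reference, here is roughly what that waiting loop looks like in Python against the GitHub REST API. The workflow file name, polling interval, and queue timeout are assumptions; GITHUB_TOKEN, GITHUB_REPOSITORY, and GITHUB_RUN_ID are variables the Actions runner already provides.

```python
# Sketch of the "wait for previous runs" script using the GitHub REST API.
# The workflow file name and timeout below are illustrative assumptions.
import os
import time

import requests

API = "https://api.github.com"
REPO = os.environ["GITHUB_REPOSITORY"]            # e.g. "my-org/infra"
WORKFLOW_FILE = "deploy.yml"                      # assumed workflow file name
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}
CURRENT_RUN_ID = int(os.environ["GITHUB_RUN_ID"])


def pending_older_runs():
    """Return runs of this workflow that started before us and aren't done."""
    runs = []
    for status in ("queued", "in_progress"):
        resp = requests.get(
            f"{API}/repos/{REPO}/actions/workflows/{WORKFLOW_FILE}/runs",
            headers=HEADERS,
            params={"status": status, "per_page": 100},
        )
        resp.raise_for_status()
        runs.extend(resp.json()["workflow_runs"])
    return [r for r in runs if r["id"] < CURRENT_RUN_ID]


def wait_for_turn(poll_seconds=30, timeout_seconds=1800):
    deadline = time.monotonic() + timeout_seconds
    while pending_older_runs():
        if time.monotonic() > deadline:
            raise SystemExit("gave up waiting for earlier runs to finish")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    wait_for_turn()
    print("no earlier runs pending; continuing with deployment")
```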

What is the current recommended approach to manage/stop a spring-batch job?

We have some Spring Batch jobs that are triggered by Autosys via shell scripts, running as short-lived processes.
Right now there's no way to see what is going on inside the Spring Batch process, so I was exploring ways to view the status and manage (stop) the jobs.
Spring Cloud Data Flow is one of the options I was exploring, but it seems it may not work when jobs are scheduled with Autosys.
What other options can I explore in this regard, and what is the currently recommended approach to manage Spring Batch jobs?
To stop a job, you first need to get the ID of the job execution to stop. This can be done using the JobExplorer API that allows you to explore meta-data that Spring Batch is aware of in the job repository. Once you get the job execution ID, you can stop it by calling the JobOperator#stop method, please refer to the Stopping a job section of the reference documentation.
This is independent of any method you used to launch the job (either manually, or via a scheduler or a graphical tool) and allows you to gracefully stop a job and leave the repository in a consistent state (ready for a restart if needed).

Is there a way to manually retry a step in Argo DAG workflow?

The Argo UI shows a "Retry" button for DAG workflows, but if a step fails and I use it, the retry always fails. Is manual retry even supported in Argo?

Creating a queue system with Argo Workflows

I am trying to figure out how to set up a work queue with Argo. The Argo Workflows are computationally expensive. We need to plan for many simultaneous requests. The workflow items are added to the work queue via HTTP requests.
The flow can be demonstrated like this:
client
=> hasura # user authentication
=> redis # work queue
=> argo events # queue listener
=> argo workflows
=> redis + hasura # inform that workflow has finished
=> client
I have never built a K8s cluster that exceeds its resources. Where do I limit the execution of workflows? Or do Argo Events and Workflows limit these according to the resources in the cluster?
The above example could probably be simplified to the following, but the problem is what happens if the processing queue is full?
client
=> argo events # HTTP request listener
=> argo workflows
Argo Workflows has no concept of a queue, so it has no way of knowing when the queue is full. If you need queue control, that should happen before submitting workflows.
Once the workflows are submitted, there are a number of ways to limit resource usage.
Pod resources - each Workflow step is represented by a Kubernetes Pod. You can set resource requests and limits just like you would with a Pod in a Deployment.
Step parallelism limit - within a Workflow, you can limit the number of steps running concurrently. This can help when a step is particularly resource-intensive.
Workflow parallelism limit - you can limit the number of workflows running concurrently by configuring them to use a semaphore.
There are a number of other performance optimizations like setting Workflow and Pod TTLs and offloading YAML for large Workflows to a DB instead of keeping them on the cluster.
As far as I know, there is no way to set a Workflow limit so that Argo will reject additional Workflow submissions until more resources are available. This is a problem if you're worried about Kubernetes etcd filling up with too many Workflow definitions.
To keep from blowing up etcd, you'll need another app of some kind sitting in front of Argo to queue Workflow submissions until more resources become available.
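A minimal sketch of what such a front-of-Argo app could look like, assuming a Redis list as the queue, the official Kubernetes Python client, and an arbitrary cap of five active Workflows (the queue name, namespace, and cap are all illustrative assumptions, not Argo features):

```python
# Sketch of a queueing app in front of Argo: pop submissions from a Redis list
# and only create a Workflow when fewer than MAX_ACTIVE are still running.
import json
import time

import redis
from kubernetes import client, config

GROUP, VERSION, PLURAL = "argoproj.io", "v1alpha1", "workflows"
NAMESPACE = "argo"                     # assumed namespace
QUEUE = "workflow-submissions"         # assumed Redis list name
MAX_ACTIVE = 5                         # arbitrary cap for illustration


def active_workflow_count(api):
    wfs = api.list_namespaced_custom_object(GROUP, VERSION, NAMESPACE, PLURAL)
    phases = (wf.get("status", {}).get("phase") for wf in wfs["items"])
    return sum(1 for p in phases if p in ("Running", "Pending"))


def main():
    config.load_incluster_config()     # the queue worker runs inside the cluster
    api = client.CustomObjectsApi()
    r = redis.Redis(host="redis")

    while True:
        if active_workflow_count(api) >= MAX_ACTIVE:
            time.sleep(10)
            continue
        item = r.blpop(QUEUE, timeout=10)   # (queue_name, payload) or None
        if item is None:
            continue
        manifest = json.loads(item[1])      # a full Workflow manifest as JSON
        api.create_namespaced_custom_object(GROUP, VERSION, NAMESPACE, PLURAL, manifest)


if __name__ == "__main__":
    main()
```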