I have a 5-step Argo workflow:
step1: create a VM in the cloud
step2: do some work
step3: do some more work
step4: do some further work
step5: delete the VM
All of the above steps are time-consuming, and for whatever reason, a running workflow might be stopped or terminated by issuing the stop/terminate command.
What I want to do is: if the stop/terminate command is issued at any stage before step4 has started, I want to jump directly to step4, so that I can clean up the VM created at step1.
Is there any way to achieve this?
I was imagining it could happen this way:
Suppose I am at step2 when the stop/terminate signal is issued.
The pods running at step2 get a signal that the workflow is going to be stopped.
The pods stop doing their current work and output a special string telling the next steps to skip.
So step3 sees the output from step2, skips its work, and passes it on to step4, and so on.
step5 runs irrespective of the input and deletes the VM.
Please let me know if something like this is achievable.
It sounds like step 5 needs to run regardless, which is what an exit handler is for. Here is an example. The exit handler is executed when you 'stop' the workflow at any step, but is skipped if you 'terminate' the entire workflow.
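A minimal sketch of what that could look like for the workflow above, assuming `onExit` points at a cleanup template (all template names and images below are made up for illustration):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: vm-lifecycle-
spec:
  entrypoint: main
  onExit: delete-vm            # exit handler: runs whenever the workflow ends, including on 'stop'
  templates:
    - name: main
      steps:
        - - name: create-vm    # step1
            template: create-vm
        - - name: do-work      # steps 2-4 would follow the same pattern
            template: do-work
    - name: create-vm
      container:
        image: alpine:3.19
        command: [sh, -c, "echo 'creating VM'"]   # placeholder for the real provisioning logic
    - name: do-work
      container:
        image: alpine:3.19
        command: [sh, -c, "echo 'doing work'"]
    - name: delete-vm          # cleanup for the VM created in step1
      container:
        image: alpine:3.19
        command: [sh, -c, "echo 'deleting VM'"]
```

With this shape, the cleanup no longer needs to be an ordinary step5 at all: it lives in the exit handler, so a stop issued during step2 or step3 still triggers the VM deletion.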
I am new to Spring Batch and have a few questions:
I have a question about restart. Per the documentation, the restart feature is enabled by default. What I am not clear on is whether I need to write any extra code for a restart. If so, I am thinking of adding a scheduled job that looks for failed executions and restarts them?
I understand spring-batch-admin is deprecated. However, we cannot use spring-cloud-data-flow right now. Is there an alternative to monitor and restart jobs on demand?
The restart feature you mention only determines whether a job is restartable or not. It does not mean Spring Batch will restart a failed job for you automatically.
Instead, it provides the following building blocks for developers to achieve this themselves:
JobExplorer to find out the id of the job execution that you want to restart
JobOperator to restart a job execution given a job execution id
Also, a restartable job can only be restarted if its status is FAILED. So if you want to restart a running job that stopped because of a server breakdown, you first have to find that job and update its job execution status, and all of its step execution statuses, to FAILED in order to restart it. (See this for more information.) One solution is to implement a SmartLifecycle that uses the above building blocks to achieve this goal.
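As a rough sketch of how those building blocks could fit together (the class, the job name, and the bean wiring are assumptions for illustration, not part of Spring Batch itself):

```java
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;

// Hypothetical helper: scan recent instances of a job and restart any FAILED execution.
// Assumes jobExplorer and jobOperator are injected as Spring beans.
public class FailedJobRestarter {

    private final JobExplorer jobExplorer;
    private final JobOperator jobOperator;

    public FailedJobRestarter(JobExplorer jobExplorer, JobOperator jobOperator) {
        this.jobExplorer = jobExplorer;
        this.jobOperator = jobOperator;
    }

    public void restartFailedExecutions(String jobName) throws Exception {
        // Look at the 100 most recent instances of this job (an arbitrary window).
        for (JobInstance instance : jobExplorer.getJobInstances(jobName, 0, 100)) {
            for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                if (execution.getStatus() == BatchStatus.FAILED) {
                    // restart() creates a new JobExecution for the same JobInstance.
                    jobOperator.restart(execution.getId());
                }
            }
        }
    }
}
```

You could then call something like `restartFailedExecutions("myJob")` from a scheduled method or from the SmartLifecycle mentioned above.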
I have a job that uses the Kafka Connector Stage to read a Kafka queue and then load into a database. That job runs in Continuous Mode, so it never finishes, since it keeps monitoring the Kafka queue in real time.
For unexpected reasons (say, server issues, job issues, etc.) that job may terminate with a failure. In general, that happens after 300 running hours. So, to keep the job alive, I have to manually check the job status and then do a Reset and Run.
The problem is that several hours can pass between the job termination and my manual Reset and Run, which is critical. So I'm looking for a way to eliminate the manual interaction and reduce that gap by automating the job invocation.
I tried to use Control-M to run the job daily, but with no success: the first day Control-M called the job, it ran fine. But the next day, when Control-M attempted to instantiate the job again, it failed (since the job was already running). Besides, DataStage will never report back to Control-M that the job concluded successfully, since the job's nature won't allow that.
That said, I would like to hear any ideas that could point me in the right direction.
The first thing that came to mind is to create an intermediate Sequence and schedule it in Control-M. This new Sequence would then call the continuous job asynchronously using a command line stage.
For the case where just this one job terminates unexpectedly and you want it restarted as soon as possible, have you considered calling this job from a sequence? The sequence could be set up to run this job in a loop.
The sequence starts the job and waits for it to finish. When the job finishes, the sequence loops and starts the job again. You could add conditions on job exit (for example, if the job aborted, then based on that end status, you could reset the job before re-running it).
This would not handle the condition where the DataStage engine itself was shut down (such as for maintenance or possibly an error), in which case all jobs end, including your new sequence. The same also applies for a server reboot or other situations where someone may have inadvertently stopped your sequence. For those cases (such as a DataStage engine stop) your team would need to have a process in place for jobs/sequences that need to be started following a DataStage or system outage.
For the outage scenario, you could create a monitor script (regardless of whether you run the job solo or from a sequence) that sleeps/loops on 5-10 minute intervals and then checks the status of your job using the dsjob command, and if it is not running, starts that job/sequence (also via the dsjob command). You can decide whether that script starts at DataStage startup, at machine startup, or is run from Control-M or another scheduler.
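A hedged sketch of such a monitor script, assuming a Unix engine tier; the project and job names are placeholders, and the exact `dsjob -jobinfo` output should be verified on your engine version before trusting the status match:

```sh
#!/bin/sh
# Hypothetical monitor for a continuous DataStage job.
# PROJECT and JOB are placeholders for your real project/job names.
PROJECT="MyProject"
JOB="KafkaLoadJob"

while true; do
    # Ask the engine for the current job status.
    STATUS=$(dsjob -jobinfo "$PROJECT" "$JOB" 2>/dev/null | grep "Job Status")
    case "$STATUS" in
        *RUNNING*)
            : # job is alive, nothing to do
            ;;
        *)
            # Reset any failed/aborted state, then start the job again.
            dsjob -run -mode RESET -wait "$PROJECT" "$JOB"
            dsjob -run "$PROJECT" "$JOB"
            ;;
    esac
    sleep 300   # re-check every 5 minutes
done
```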
I am facing an issue with preserving the outcome of my last test execution.
e.g.
1. I have a Test Plan 'Relase2.0' and the assigned configuration is 'Win 10'.
2. Now I run this test plan and can see how many tests have passed, failed, are blocked, or have not run.
3. Now I create a new configuration and assign this Test Plan to the newly added configuration.
4. Now I switch back to my previous configuration from step 1.
When I check the outcomes from the step 2 execution, I see that I have lost them; everything shows as Not Run.
Question: How do I preserve the execution results from step 2?
I have a job in Rundeck with many tasks in it, but when some task fails I have to duplicate the job, remove all the other tasks, save it, and then run this new reduced copy of my original job.
Is there a way to run only specific tasks without having to go through this workaround?
Thanks in advance.
AFAIK there is no way to do that.
As a workaround, you can simply add an option for every step in your Rundeck job. For instance, if you have 3 script steps in your job, you can add 3 options named skip_step_1, skip_step_2, and skip_step_3, then assign true to the ones that finished successfully and false to the one that failed in the first execution. Then, in every script step, you can add a condition that decides whether to run it or not, as in the sketch below.
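For illustration, the guard at the top of each inline script step could look like this (the option name is one of the hypothetical ones above; `@option.NAME@` is Rundeck's token expansion inside inline scripts):

```sh
# Hypothetical guard at the top of script step 2; skip_step_2 is the job
# option defined above. Rundeck replaces @option.skip_step_2@ with the
# option's value before the inline script runs.
if [ "@option.skip_step_2@" = "true" ]; then
    echo "skip_step_2 is true, skipping this step"
    exit 0    # exit 0 so the workflow continues to the next step
fi

# ... the actual work of step 2 goes here ...
```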
A similar feature request has already been proposed to the Rundeck team:
Optionally execute workflow step based on job options
I created a simple process with the Application Lab interface in Bluemix Workload Scheduler. I ran my process, but the step didn't proceed and remained in Queued status.
How can I get the step to proceed?
I executed the process with "Run now". The process doesn't have any triggers.
The step remains in Queued status.
Process information:
There is only one step. The step is "ping www.ibm.com".
The process doesn't have a trigger; it is an on-demand process.
There might be a problem with the agent, as I can successfully run a simple workload process without any issues. If you are using the Workload Automation Agent that was created for you, then you will need to open a support ticket to have the Workload team look at that agent.
Reviewing your question, I think that a process submitted to the Workload Scheduler service should be a process that will complete: a ping command like the one you are trying to submit will never complete unless it is 'killed' using CTRL+C (or called with the [-c count] option).
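For example, a bounded ping terminates on its own after the requested number of echo requests, so the step can complete:

```sh
# Four echo requests, then ping exits and the step can finish.
ping -c 4 www.ibm.com
```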