SpringBatch seems to be lacking the metadata for the job definition in the database.
In order to create a job instance in the database, the only thing it considers is jobName and jobParamter, "JobInstance createJobInstance(String jobName, JobParameters jobParameters);"
But,the object model of Job is rich enough to consider steps and listeners. So, if i create a new version of the existing job, by adding few additional steps, spring batch does not distinguish it from the previous version. Hence, if i ran the previous version today and run the updated version, spring batch does not run the updated version, as it feels that previous run was successful. At present, it seems like, the version number of the job, should be part of the name. Is this correct understanding ?
You are correct that the framework identifies each job instance by a unique combination of job name and (identifying) job parameters.
In general, if a job fails, you should be able to re-run with the same parameters to restart the failed instance. However, you cannot restart a completed instance. From the documentation:
JobInstance can be restarted multiple times in case of execution failure and it's lifecycle ends with first successful execution. Trying to execute an existing JobIntance that has already completed successfully will result in error. Error will be raised also for an attempt to restart a failed JobInstance if the Job is not restartable.
So you're right that the same job name and identifying parameters cannot be run multiple times. The design framework prevents this, regardless of what the business steps job performs. Again, ignoring what your job actually does, here's how it would work:
1) jobName=myJob, parm1=foo , parm2=bar -> runs and fails (assume some exception)
2) jobName=myJob, parm1=foo , parm2=bar -> restarts failed instance and completes
3) jobName=myJob, parm1=foo , parm2=bar -> fails on startup (as expected)
4) jobName=myJob, parm1=foobar, parm2=bar -> new params, runs and completes
The "best practices" we use are the following:
Each job instance (usually defined by run-date or filename we are processing) must define a unique set of parameters (otherwise it will fail per the framework design)
Jobs that run multiple times a day but just scan a work table or something use an incrementer to pass a integer parameter, which we increase by 1 upon each successful completion
Any failed job instances must be either restarted or abandoned before pushing code changes that affect the the job will function
Related
I am new to spring-batch, got few questions:-
I have got a question about the restart. As per documentation, the restart feature is enabled by default. What I am not clear is do I need to do any extra code for a restart? If so, I am thinking of adding a scheduled job that looks at failed processes and restarts them?
I understand spring-batch-admin is deprecated. However, we cannot use spring-cloud-data-flow right now. Is there any other alternative to monitor and restart jobs on demand?
The restart that you mention only means if a job is restartable or not .It doesn't mean Spring Batch will help you to restart the failed job automatically.
Instead, it provides the following building blocks for developers for achieving this task on their own :
JobExplorer to find out the id of the job execution that you want to restart
JobOperator to restart a job execution given a job execution id
Also , a restartable job can only be restarted if its status is FAILED. So if you want to restart a running job that was stop running because of the server breakdown , you have to first find out this running job and update its job execution status and all of its task execution status to FAILED first in order to restart it. (See this for more information). One of the solution is to implement a SmartLifecycle which use the above building blocks to achieve this goal.
I have a job that uses the Kafka Connector Stage in order to read a Kafka queue and then load into the database. That job runs in Continuous Mode, which it has no time to conclude, since it keeps monitoring the Kafka queue in real time.
For unexpected reasons (say, server issues, job issues etc) that job may terminate with failure. In general, that happens after 300 running hours of that job. So, in order to keep the job alive I have to manually look to the job status and then to do a Reset and Run, in order to keep the job running.
The problem is that between the job termination and my manual Reset and Run can pass several hours, which is critical. So I'm looking for a way to eliminate the manual interaction and to reduce that gap by automating the job invocation.
I tried to use Control-M to daily run the job, but with no success: The first day the Control-M called the job, it ran it fine. But in the next day, when the Control-M did an attempt to instantiate the job again it failed (since it was already running). Besides, the Datastage will never tell back Control-M that a job was successfully concluded, since the job's nature won't allow that.
Said that, I would like to hear ideas from you that can light me up.
The first thing that came in mind is to create a intermediate Sequence and then schedule it in Control-M. Then, this new Sequence would call the continuous job asynchronously by using command line stage.
For the case where just this one job terminates unexpectedly and you want it to be restarted as soon as possible, have you considered calling this job from a sequence? The sequence could be setup to loop running this job.
Thus sequence starts job and waits for it to finish. When job finishes, the sequence will then loop and start the job again. You could have added conditions on job exit (for example, if the job aborted, then based on that job end status, you could reset the job before re-running it.
This would not handle the condition where the DataStage engine itself was shut down (such as for maintenance or possibly an error) in which case all jobs end including your new sequence. The same also applies for a server reboot or other situations where someone may have inadvertently stopped your sequence. For those cases (such as DataStage engine stop) your team would need to have process in place for jobs/sequences that need to be started up following a DataStage or System outage.
For the outage scenario, you could create a monitor script (regardless of whether running the job solo or from sequence) that sleeps/loops on 5-10 minute intervals and then checks the status of your job using dsjob command, and if not running can start that job/sequence (also via dsjob command). You can decide whether that script startup would occur at DataSTage startup, machine startup, or run it from Control M or other scheduler.
I am writing Spring batch for chunk processing.
I have Single Job and Single Step. Within that Step I have Chunks which are dynamically sized.
SingleJob -> SingleStep -> chunk_1..chunk_2..chunk_3...so on
Following is a case I am trying to implement,
If Today I ran a Job and only chunk_2 failed and rest of chunks ran successfully. Now Tomorrow I want to run/restart ONLY failed chunks i.e. in this case chunk_2. (I don't want to run whole Job/Step/Other successfully completed Chunks)
I see Spring batch allow to store metadata and using that it helps to restart Jobs. but I did not get if it is possible to restart specific chunk as discuss above.
Am I missing any concept or if it is possible then any pseudo code/theoretical explanation or reference will help.
I appreciate your response
That's how Spring Batch works in a restart scenario, it will continue where it left off in the previous failed run.
So in your example, if the in the first run chunk1 has been correctly processed and chunk2 failed, the next job execution will restart at chunk2.
I'm running a composed task with three child tasks.
Composed task definition:
composed-task-runner --graph='task1 && task2 && task3'
Launch command
task launch my-composed-task --properties "app.composed-task-runner.composed-task-arguments=arg1=a.txt arg2=test"
Scenario 1:
when the composed task runs without any error, the arguments are passed to all child tasks.
Scenario 2:
when the second child task fails and if the job is restarted , the composed task arguments are passed to second child task but not to third child task
Scenario 3 :
when the first and second tasks succeed and third child task fails and if the job is restarted , the composed task arguments are now passed to third child task.
Observation:
After a task failure and restart, the composed-task-arguments are passed only to the failed task and not to the tasks after that.
How the arguments are retrieved in the composed-task after job restart ? what could be the reason for this behavior ?
Version used :
Spring cloud local server - 1.7.3 , Spring boot - 2.0.4 , Spring cloud starter task - 2.0.0
The issue that you are experiencing is that SCDF is not storing the properties specified at launch time.
This issue is being tracked here: https://github.com/spring-cloud/spring-cloud-dataflow/issues/2807 and is scheduled to be fixed in SCDF 2.0.0
[Detail]
So when the job is restarted these properties are not submitted (since they are not currently stored) to the new CTR launch.
And thus subsequent tasks (after the failed task succeeds) will not have the properties set for them.
The reason that the failed job still has this value is that the arguments are stored in the batch-step-execution-context for that step.
[Work Around until Issue is resolved]
Instead of restarting the job, launch the CTR task definition using the properties (so long as they are the same).
When we run multiple concurrent jobs with different parameters, how can we control (stop, restart) the appropriate jobs? Our internal code provides the jobExecution object, but under the covers The jobOperator uses the job name to get the job instance.
In our case all of the jobs are from "do-stuff.xml" (okay, it's sanitized and not very original). After looking at the spring-batch source code, our concern is that if there is more then one job running and we stop a job it will take the most recently submitted job and stop it.
The JobOperator will allow you to fetch all running executions of the job using getRunningExecutions(String jobName). You should be able to iterate over that list to find the one you want. Then, just call stop(long executionId) on the one you want.
Alternatively, we've also implemented listeners (both at step and chunk level) to check an outage status table. When we want to implement a system-wide outage, we add the outage there and have our listener throw an exception to bring our jobs down. once the outage is lifted, all "failed" executions may be restarted.