I have multiple batch steps, but currently records start flowing through all of them concurrently. I want the processing to be sequential, i.e. batch_step2 should start only after batch_step1 has executed completely.
How can I identify that batch_step1 has finished processing before starting batch_step2, and so on?
I am not sure what you want to achieve with this logic. Batch processing is meant to process individual records through each batch step: every record passes through the batch steps sequentially, so batch step 2 for a given record is executed only after batch step 1 has completed for that record. As the MuleSoft docs say, "Note that a batch job instance does not wait for all its queued records to finish processing in one batch step before pushing any of them to the next batch step".
Alternatively, you can define separate batch jobs with only one batch step each. This ensures that batch step 1 is completed for all records first, and the next batch step is then executed in the next batch job.
I have a requirement in GitHub Actions where I need to feed input after the first job has finished.
That is, the first job (Build) starts automatically after each commit without taking any input, and it runs successfully.
Then, for the second job (Deploy) to start, it should begin executing based on an input (Environment) that I select.
I have figured out manual job execution from https://stackoverflow.com/a/73708545/2728619,
but I need help with taking input after the first job has finished.
For the manual trigger, I tried the solution at https://stackoverflow.com/a/73708545/2728619.
I am expecting to read the input after the first job has run (not at the beginning of the workflow).
We have a requirement to process millions of records using Spring Batch. We plan to do this by reading the database with JdbcPagingItemReaderBuilder, processing in chunks, and writing to a Kafka queue. The active consumers of the queue will process the chunks of data and update the database.
The consumer's task is to iterate over every item in the chunk and invoke external APIs.
In case the external system is down or does not respond with a success response, there should be at least 3 retries. Considering that each task in the chunk has to do this, what would be the ideal approach?
Another use case to consider: what happens if the system goes down while the job is processing, say after the job has already processed 10,000 records with the remaining records yet to be processed? After the restart, how do we make sure the execution does not restart the entire process from the beginning, but resumes from the point of failure?
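For illustration, a fault-tolerant chunk-oriented step along the lines described above could be configured roughly as follows. This is a Spring Batch 4.x-style sketch; the bean names, the CustomerRecord type, the table/columns, and the chunk sizes are assumptions, not something from the post:

```java
import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
import org.springframework.batch.item.kafka.KafkaItemWriter;
import org.springframework.batch.item.kafka.builder.KafkaItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaTemplate;

@Configuration
public class ExportStepConfig {

    // Reads the table page by page; the paging reader keeps its position in the
    // step's ExecutionContext, which is what makes restart-from-failure possible.
    @Bean
    public JdbcPagingItemReader<CustomerRecord> reader(DataSource dataSource) {
        return new JdbcPagingItemReaderBuilder<CustomerRecord>()
                .name("customerReader")
                .dataSource(dataSource)
                .selectClause("select id, payload")
                .fromClause("from customer")
                .sortKeys(java.util.Map.of("id", Order.ASCENDING))
                .pageSize(1000)
                .rowMapper((rs, i) -> new CustomerRecord(rs.getLong("id"), rs.getString("payload")))
                .build();
    }

    // Publishes each chunk to Kafka; the queue consumers then call the external APIs.
    @Bean
    public KafkaItemWriter<Long, CustomerRecord> writer(KafkaTemplate<Long, CustomerRecord> kafkaTemplate) {
        return new KafkaItemWriterBuilder<Long, CustomerRecord>()
                .kafkaTemplate(kafkaTemplate)
                .itemKeyMapper(CustomerRecord::getId)
                .build();
    }

    // Fault-tolerant chunk step: transient failures are retried up to 3 times per item.
    @Bean
    public Step exportStep(StepBuilderFactory steps,
                           JdbcPagingItemReader<CustomerRecord> reader,
                           KafkaItemWriter<Long, CustomerRecord> writer) {
        return steps.get("exportStep")
                .<CustomerRecord, CustomerRecord>chunk(1000)
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .retry(Exception.class)   // narrow this to the transient exception you actually expect
                .retryLimit(3)
                .build();
    }

    // Placeholder DTO for the sketch; not from the original post.
    public static class CustomerRecord {
        private final long id;
        private final String payload;

        public CustomerRecord(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }

        public long getId() { return id; }
        public String getPayload() { return payload; }
    }
}
```

The retry(...).retryLimit(3) part covers transient failures while a chunk is being written; resuming from the point of failure after a crash relies on the Spring Batch metadata tables.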
Spring Batch creates the following tables. You can use them to check the status of your job and customize your scheduler to behave however you see fit:
BATCH_JOB_EXECUTION
BATCH_JOB_EXECUTION_CONTEXT
BATCH_JOB_EXECUTION_PARAMS
BATCH_JOB_INSTANCE
BATCH_STEP_EXECUTION
I'd use the step execution ID in BATCH_STEP_EXECUTION to check the status that's set, and then retry based on that status, or something along those lines.
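For example, instead of querying those tables directly, the same status check and retry can go through the JobExplorer and JobOperator APIs, which read and update the same metadata. A minimal sketch, assuming a restartable job; the FailedJobRestarter class name is made up:

```java
import java.util.Comparator;
import java.util.List;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;

// Checks the most recent execution of a job in the metadata tables and
// restarts it if it failed; restart resumes from the last committed chunk.
public class FailedJobRestarter {

    private final JobExplorer jobExplorer;
    private final JobOperator jobOperator;

    public FailedJobRestarter(JobExplorer jobExplorer, JobOperator jobOperator) {
        this.jobExplorer = jobExplorer;
        this.jobOperator = jobOperator;
    }

    public void restartIfFailed(String jobName) throws Exception {
        List<JobInstance> instances = jobExplorer.getJobInstances(jobName, 0, 1);
        if (instances.isEmpty()) {
            return; // job has never run
        }
        JobExecution last = jobExplorer.getJobExecutions(instances.get(0)).stream()
                .max(Comparator.comparing(JobExecution::getId))
                .orElse(null);
        if (last != null && (last.getStatus() == BatchStatus.FAILED
                || last.getStatus() == BatchStatus.STOPPED)) {
            jobOperator.restart(last.getId());
        }
    }
}
```

Note that a restart only resumes from the last committed chunk if the step's reader keeps its state in the ExecutionContext (see below).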
jobOperator.restart(JobExecutionId)
starts a new job instance and doesn't resume the job from the last chunk, only from the last step. I need to resume the job from the last chunk written.
My reader is a custom RestReader that first counts the total number of items to process and then reads exactly that number from the API. I'm using the @StepScope annotation because I need custom variables in the custom reader.
Spring Batch restart functionality is not working when using @StepScope.
Is it possible to resume the job from the last chunk written, or is the problem my custom reader?
Your RestReader must implement ItemStream. This is the contract that stateful item readers should implement. The ItemStream#update method will be called at chunk boundaries to save any contextual data required to restart from the last checkpoint in case of failure.
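A rough sketch of what that could look like for a reader of this kind (the item type, the context key, and the fetchItemsFromApi method are placeholders, not taken from the actual RestReader):

```java
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.ItemStreamReader;

public class RestReader implements ItemStreamReader<String> {

    private static final String CURRENT_INDEX_KEY = "restReader.currentIndex";

    private List<String> items;   // fetched from the API
    private int currentIndex;

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // On a restart, resume from the index saved at the last committed chunk.
        currentIndex = executionContext.containsKey(CURRENT_INDEX_KEY)
                ? executionContext.getInt(CURRENT_INDEX_KEY)
                : 0;
        this.items = fetchItemsFromApi(); // count the total, then read that many items
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Called at chunk boundaries: persist the position reached so far.
        executionContext.putInt(CURRENT_INDEX_KEY, currentIndex);
    }

    @Override
    public void close() throws ItemStreamException {
        // Release any resources held by the API client, if needed.
    }

    @Override
    public String read() {
        return currentIndex < items.size() ? items.get(currentIndex++) : null;
    }

    private List<String> fetchItemsFromApi() {
        // Placeholder for the actual REST calls described in the question.
        return List.of();
    }
}
```

One common gotcha with @StepScope: if the @Bean method exposes the reader only as ItemReader, the step may not pick up the ItemStream callbacks; declaring the concrete return type or registering the reader with .stream(...) on the step builder avoids that.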
I have 5 batch processes in my flow and they run asynchronously. I need to wait until all the batch processes are over, but once the batch execute component runs, the payload moves on to the next component, which requires the result from the batch, and it fails. How can I make it wait until all the batch processes have executed?
@Satheesh,
Use a sessionVars or a static variable as a flag (or increment it), then use an expression-filter to check whether all the batch processes have already completed.
https://docs.mulesoft.com/mule-user-guide/v/3.6/filters
I am maintaining a legacy application written using Spring Batch and need to tweak it to never lose data.
I have to read from various web services (one for each step) and then write to a remote database. Things go bad when the connection with the DB drops, because all items read from the web service are discarded (I can't read the same item twice), and the data is lost because it cannot be written.
I need to set up Spring Batch to keep the data already read in one step so it can retry the write operation the next time the step runs. The same step cannot read more data until the write operation has concluded successfully.
When it is unable to write, the step should keep the read data and pass execution to the next step; after a while, when it is time for the failed step to run again, it should not read another item but retry the failed write operation instead.
The batch application should run in an infinite loop, and each step should gather data from a different source. Failed write operations should be momentarily skipped (keeping the read data) so as not to delay other steps, but should resume from the write operation the next time they are called.
I have been researching various web sources aside from the official docs, but Spring Batch doesn't have the most intuitive documentation I have come across.
Can this be achieved? If yes, how?
You can write the data you need to persist (in case the job fails) to the batch step's ExecutionContext, and restart the job again with this data:
Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and JobExecution, and transaction related data such as commit and rollback count and start and end times. Additionally, each step execution will contain an ExecutionContext, which contains any data a developer needs persisted across batch runs, such as statistics or state information needed to restart.
More from: http://static.springsource.org/spring-batch/reference/html/domain.html#domainStepExecution
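As a rough illustration of that idea (class, key, and method names here are invented for the example, written against the Spring Batch 4.x ItemWriter signature), a writer can buffer unwritten items and stash them in the step's ExecutionContext so a restarted execution retries them instead of re-reading:

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemWriter;

// Writer that keeps items it could not deliver in the step's ExecutionContext,
// so a restart of the same job execution can retry them instead of re-reading.
public class BufferingDbWriter implements ItemWriter<String>, StepExecutionListener {

    private static final String PENDING_KEY = "pendingItems";

    private final List<String> pending = new ArrayList<>();

    @Override
    @SuppressWarnings("unchecked")
    public void beforeStep(StepExecution stepExecution) {
        // Restore any items left over from a previous failed run.
        ExecutionContext ctx = stepExecution.getExecutionContext();
        if (ctx.containsKey(PENDING_KEY)) {
            pending.addAll((List<String>) ctx.get(PENDING_KEY));
        }
    }

    @Override
    public void write(List<? extends String> items) throws Exception {
        pending.addAll(items);
        writeToRemoteDatabase(pending); // throws if the DB connection drops
        pending.clear();                // only forget items once they are safely written
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Persist whatever is still unwritten so the restarted execution sees it.
        stepExecution.getExecutionContext().put(PENDING_KEY, new ArrayList<>(pending));
        return stepExecution.getExitStatus();
    }

    private void writeToRemoteDatabase(List<String> items) throws Exception {
        // Placeholder for the actual remote DB write from the question.
    }
}
```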
I do not know if this will be ok with you, but here are my thoughts on your configuration.
Since you have two remote sources that are open to failure, let us partition the overall system into two jobs (not two steps).
JOB A
Step 1: Tasklet
Check a shared folder for files. If files exist, do not proceed to the next step. This will make more sense in the description of JOB B; a rough sketch of this check is shown after the sample execution below.
Step 2: Webservice to files
Read from your web service and write the results to flat files in the shared folder. Since you would be using flat files for the output, this solves your "all items read from the web service are discarded and the data is lost because it cannot be written" problem.
Use Quartz or equivalent for the scheduling of this job.
JOB B
Poll the shared folder for generated files and launch the job for each file (with the file's location as a job parameter). The Spring Integration project may help with this polling.
Step 1:
Read from the file, write the items to the remote DB, and move/delete the file if writing to the DB is successful.
No scheduling is needed, since job launching originates from the polled files.
Sample Execution
Time 0: No file in the shared folder
Time 1: Read from web service and write to shared folder
Time 2: JOB B file polling occurs and tries to write to the DB.
If successful, the system continues to execute.
If not, when JOB A tries to execute at its scheduled time, it will skip reading from the web service since files still exist in the shared folder. It will keep skipping until JOB B consumes the files.
I did not want to go into implementation specifics but Spring Batch can handle all of these situations. Hope that this helps.
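For what it's worth, the JOB A / Step 1 check could be a small Tasklet along these lines (the class name and shared-folder handling are placeholders, and ending the run via setTerminateOnly is just one way to "not proceed"):

```java
import java.io.File;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// JOB A, Step 1: if unconsumed files are still in the shared folder,
// end this run so Step 2 does not call the web service again.
public class SharedFolderGuardTasklet implements Tasklet {

    private final File sharedFolder;

    public SharedFolderGuardTasklet(String sharedFolderPath) {
        this.sharedFolder = new File(sharedFolderPath);
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        File[] files = sharedFolder.listFiles();
        if (files != null && files.length > 0) {
            // Files are still waiting for JOB B; stop the job without reading the web service.
            chunkContext.getStepContext().getStepExecution().setTerminateOnly();
        }
        return RepeatStatus.FINISHED;
    }
}
```

An alternative is to set a custom ExitStatus from the tasklet and route on it in the job's flow definition, which avoids marking the run as STOPPED.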