Transaction management in remote chunk processing (at the worker side) in Spring Batch

We are trying to use the remote chunking design for our implementation. Our requirement is to process all items in a chunk in one DB transaction (one DB commit per chunk). I do not see any option to achieve this in remote chunking.
In remote chunk processing on the worker side, ChunkProcessorChunkHandler directly calls the process method of SimpleChunkProcessor, so I do not see any way to use a transaction manager/transaction here.
Can someone help me figure out how to achieve this?

If you want a transaction on the worker side, you can make the ChunkProcessorChunkHandler transactional either declaratively by using Spring AOP via annotations or XML, or programmatically by extending the class and calling handleChunk in a transaction with a TransactionTemplate.
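A minimal sketch of the programmatic approach, assuming a PlatformTransactionManager is available on the worker; the class name and the exception handling are illustrative:

```java
import org.springframework.batch.integration.chunk.ChunkProcessorChunkHandler;
import org.springframework.batch.integration.chunk.ChunkRequest;
import org.springframework.batch.integration.chunk.ChunkResponse;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

public class TransactionalChunkHandler<T> extends ChunkProcessorChunkHandler<T> {

    private final TransactionTemplate transactionTemplate;

    public TransactionalChunkHandler(PlatformTransactionManager transactionManager) {
        this.transactionTemplate = new TransactionTemplate(transactionManager);
    }

    @Override
    public ChunkResponse handleChunk(ChunkRequest<T> chunkRequest) {
        // Run the whole chunk inside a single transaction; any failure marks it rollback-only.
        return transactionTemplate.execute(status -> {
            try {
                return super.handleChunk(chunkRequest);
            } catch (Exception e) {
                status.setRollbackOnly();
                throw new IllegalStateException("Chunk processing failed", e);
            }
        });
    }
}
```

You would then register this handler (with its ChunkProcessor set, as usual) as the worker's service activator instead of the plain ChunkProcessorChunkHandler.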

Related

JdbcPagingItemReader with SELECT FOR UPDATE and SKIP LOCKED

I have a multi-instance application, and each instance is multi-threaded.
To make each thread only process rows not already fetched by another thread, I'm thinking of using pessimistic locks combined with SKIP LOCKED.
My database is PostgreSQL 11 and I use Spring Batch.
For the Spring Batch part I use a classic chunk step (reader, processor, writer). The reader is a JdbcPagingItemReader.
However, I don't see how to use a pessimistic lock (SELECT FOR UPDATE) and SKIP LOCKED with the JdbcPagingItemReader, and I can't find a tutorial on the net explaining simply how this is done.
Any help would be welcome.
Thank you
I have approached a similar problem with a different pattern.
Please refer to
https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html#remoteChunking
Here you need to break the job into two parts:
Master
The master picks the records to be processed from the DB and sends them as chunk messages to a task-queue. It then waits for acknowledgements on a separate ack-queue; once it gets all the acknowledgements, it moves to the next step.
Slave
The slave receives a chunk message, processes it, and
sends an acknowledgement to the ack-queue.
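A minimal sketch of the master (manager) side of that split using Spring Batch Integration; the channel beans, the chunk size, and the item type are illustrative assumptions, and the channels still need to be bridged to your actual broker:

```java
import org.springframework.batch.core.step.tasklet.TaskletStep;
import org.springframework.batch.integration.chunk.RemoteChunkingManagerStepBuilderFactory;
import org.springframework.batch.integration.config.annotation.EnableBatchIntegration;
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.channel.QueueChannel;

@Configuration
@EnableBatchIntegration
public class ManagerStepConfig {

    @Autowired
    private RemoteChunkingManagerStepBuilderFactory managerStepBuilderFactory;

    @Bean
    public TaskletStep managerStep(ItemReader<String> reader,
                                   DirectChannel requests,   // outgoing "task-queue" channel
                                   QueueChannel replies) {   // incoming "ack-queue" channel
        return managerStepBuilderFactory.get("managerStep")
                .chunk(100)                  // one message per chunk of 100 items
                .reader(reader)
                .outputChannel(requests)     // chunks are sent to the workers
                .inputChannel(replies)       // acknowledgements are awaited here
                .build();
    }
}
```

The slave (worker) side mirrors this with RemoteChunkingWorkerBuilder, which consumes chunk messages, runs the processor/writer, and sends the acknowledgements back.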

Spring Boot batch listener mode vs non-batch listener mode

I am just curious: does batch listener mode in Spring Kafka give better performance than non-batch listener mode?
If we are handling exceptions, then we still need to process each record in batch-listener mode. Non-batch mode seems less error-prone, stable, and customizable.
Please share your views on this, as I didn't find any good comparison.
It completely depends on what your listener is doing with the data.
If it processes each record in a loop then there is no benefit; you might as well just let the container iterate over the collection and send the listener one record at a time.
Batch mode will improve performance if you are processing the batch as a whole, e.g. a batch insert using JDBC in a single transaction.
This will often run much faster than storing one record at a time (using a new transaction for each record) because it requires fewer round trips to the DB server.
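A rough sketch of that "batch as a whole" case, assuming a batch-enabled listener container factory; the topic, table, and bean names are illustrative:

```java
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class EventBatchListener {

    private final JdbcTemplate jdbcTemplate;

    public EventBatchListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Assumes "batchFactory" is a ConcurrentKafkaListenerContainerFactory with batch listening enabled.
    @KafkaListener(topics = "events", containerFactory = "batchFactory")
    @Transactional
    public void onBatch(List<String> payloads) {
        // One multi-row insert per poll, in a single transaction, instead of one insert
        // (and one transaction) per record.
        jdbcTemplate.batchUpdate("INSERT INTO events (payload) VALUES (?)",
                payloads,
                payloads.size(),
                (ps, payload) -> ps.setString(1, payload));
    }
}
```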

can Spring Batch be used as a job framework for non-batch jobs (regular jobs)

Is it possible to use spring batch as a regular job framework?
I want to create a device service (microservice) that has the responsibility
to get events and trigger jobs on devices. The devices are remote, so it will take time for a job to complete, but it is not a batch job (not periodically running or partitioning a large data set).
I am wondering whether Spring Batch can still be used as a job framework, or if it is only for batch processing. If the answer is no, what job frameworks (besides writing your own) are well known?
Job Description:
I need to execute, against a specific device, a job that will contain several steps. Each step will communicate with the device and wait for it to confirm that it executed the previous command given to it.
I need retry, recovery, and scheduling features (I thought of combining Spring Batch with Quartz).
Regarding read-process-write: I am basically getting a command request regarding a device, I do a few DB reads, and then I start long waiting periods that all need to pass in order for the job/task to be successful.
Also, I can choose (and justify) a relevant IMDG/DB. Concurrency is out of scope (it will be handled outside the job mechanism). An alternative that came to mind was Akka actors (the job for a device would create child actors as its steps).
As far as I know, running periodically or partitioning a large data set are not primary requirements for using Spring Batch.
Spring Batch is basically a read-process-write framework where reading and processing happen item by item and writing happens in chunks (for chunk-oriented processing).
So you can use Spring Batch if your job logic fits into the read-process-write paradigm; the rest of the concerns seem secondary to me.
Also, with Spring Batch, you should evaluate the job repository part. Spring Batch needs a database (either in memory or on disk) to store job metadata, and it is not optional.
I think you should add more explanation as to why you need a job framework and what kind of logic you are running that makes you call it a job, and I will revise my answer accordingly.
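For illustration, a minimal sketch of how such a device job could be expressed as plain tasklet steps (no large data set involved); the DeviceClient interface and its methods are hypothetical placeholders for the device interaction described in the question:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class DeviceJobConfig {

    // Hypothetical device-facing client; not part of Spring Batch.
    public interface DeviceClient {
        void sendCommand();
        void awaitConfirmation();
    }

    @Bean
    public Job deviceJob(JobBuilderFactory jobs, StepBuilderFactory steps, DeviceClient device) {
        Step sendCommand = steps.get("sendCommand")
                .tasklet((contribution, chunkContext) -> {
                    device.sendCommand();          // hypothetical call to the remote device
                    return RepeatStatus.FINISHED;
                })
                .build();

        Step awaitConfirmation = steps.get("awaitConfirmation")
                .tasklet((contribution, chunkContext) -> {
                    device.awaitConfirmation();    // hypothetical long-running wait
                    return RepeatStatus.FINISHED;
                })
                .build();

        return jobs.get("deviceJob")
                .start(sendCommand)
                .next(awaitConfirmation)
                .build();
    }
}
```

Restart and retry semantics then come from the job repository; the point is only that each step can be an arbitrary tasklet rather than a data-heavy read-process-write step.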

How does Spring Batch transaction management work?

I'm trying to understand how Spring Batch does transaction management. This is not a technical question but more of conceptual one: what approach does Spring Batch use and what are the consequences of that approach?
Let me try to clarify this question a bit. For instance, looking at the TaskletStep, I see that generally a step execution looks something like this:
several JobRepository transactions to prepare the step metadata
a business transaction for every chunk to process
more JobRepository transactions to update the step metadata with the results of chunk processing
This seems to make sense. But what about a failure between 2 and 3? This would mean the business transaction was committed but Spring Batch was unable to record that fact in its internal metadata. So a restart would reprocess the same items again even though they have already been committed. Right?
I'm looking for an explanation of these details and the consequences of the design decisions made in Spring Batch. Is this documented somewhere? The Spring Batch reference guide has very few details on this. It simply explains things from the application developer's point of view.
There are two fundamental types of steps in Spring Batch, a tasklet step and a chunk-based step. Each has its own transaction details. Let's look at each:
Tasklet Based Step
When a developer implements their own tasklet, the transactionality is pretty straightforward. Each call to the Tasklet#execute method is executed within a transaction. You are correct in that there are updates before and after a step's logic is executed. They are not technically wrapped in a transaction, since rollback isn't something we'd want to support for the job repository updates.
Chunk Based Step
When a developer uses a chunk-based step, there is a bit more complexity involved due to the added abilities for skip/retry. However, at a simple level, each chunk is processed in a transaction. You still have the same updates before and after a chunk-based step that are non-transactional, for the same reasons previously mentioned.
The "What if" scenario
In your question, you ask about what would happen if the business logic completed but the updates to the job repository failed for some reason. Would the previously processed items be re-processed on a restart? As with most things, that depends. If you are using stateful readers/writers like the FlatFileItemReader, with each commit of the business transaction, the job repository is updated with the current state of what has been processed (within the same transaction). So in that case, a restart of the job would pick up where it left off (in this case at the end) and process no additional records.
If you are not using stateful readers/writers or have save state turned off, then it is a bit of buyer beware and you may end up with the situation you describe. The default behavior in the framework is to save state so that restartability is preserved.
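For reference, this is where that behaviour is controlled on a stateful reader; a minimal sketch with an illustrative file name:

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class ReaderConfig {

    @Bean
    public FlatFileItemReader<String> reader() {
        return new FlatFileItemReaderBuilder<String>()
                .name("itemReader")
                .resource(new FileSystemResource("input.csv"))
                .lineMapper(new PassThroughLineMapper())
                // saveState defaults to true: the reader's position is written to the step
                // execution context so a restart resumes after the last committed chunk.
                // Setting it to false gives up that restartability.
                .saveState(true)
                .build();
    }
}
```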

How to design Spring Batch to avoid a long queue of requests and restart a failed job

I am writing a project that will be generating reports. It would read all requests from a database by making a REST call; based on the type of request, it will make a REST call to an endpoint; after getting the response, it will save the response in an object and save it back to the database by making a call to an endpoint.
I am using Spring Batch to handle the batch work. So far, what I came up with is a single job (reader, processor, writer) that will do the whole thing. I am not sure if this is the correct design, considering:
I do not want to queue up requests if some request is taking a long time to get a response back. [not sure yet]
I do not want to hold up saving responses until all the responses are received. [using commit-interval will help]
If the job crashes for some reason, how can I restart the job? [maybe using batch-admin will help, but what are my other options]
With chunk-oriented processing, the Reader, Processor, and Writer are executed in order until the Reader has nothing left to return.
If you can read one item at a time, process it, and send it back to the endpoint that handles the persistence, this approach is handy.
If you must read ALL the information at once, the reader will get a big collection with all the items and pass it to the processor. The processor will process all the items and send the result to the writer. You cannot send just a few to the writer, so you would have to do the persistence directly from the processor, and that would be against the design.
So, as I understand this, you have two options:
Design a reader that can read one item at a time. Use the chunk-oriented processing that you already started to read one item, process it, and send it back for persistence. Have a look at how other readers are implemented (like JdbcCursorItemReader); a rough sketch of this option follows below.
Create a tasklet that reads the whole collection of items, processes it, and sends the items back for persistence. You can break this into different tasklets.
commit-interval only controls after how many items the transaction is committed, so it will not help you here, as all the processing and persistence is done by calling REST services.
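A rough sketch of option 1: a reader that hands Spring Batch one request at a time. The RequestClient, the ReportRequest type, and the fetch method are hypothetical placeholders for the REST call described in the question:

```java
import java.util.Iterator;
import org.springframework.batch.item.ItemReader;

public class RequestItemReader implements ItemReader<ReportRequest> {

    private final RequestClient client;       // hypothetical REST client
    private Iterator<ReportRequest> pending;  // requests fetched from the endpoint

    public RequestItemReader(RequestClient client) {
        this.client = client;
    }

    @Override
    public ReportRequest read() {
        if (pending == null) {
            pending = client.fetchPendingRequests().iterator(); // one REST call up front
        }
        // Returning null tells Spring Batch there is nothing left to read and ends the step.
        return pending.hasNext() ? pending.next() : null;
    }
}
```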
I have figured out a design and I think it will work fine.
As for the questions that I asked, following are the answers:
Using asynchronous processors will help avoid queuing up requests (see the sketch below).
http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#asynchronous-processors
Using commit-interval will solve it.
This thread has the answer: Spring batch: Restart a job and then start next job automatically
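A minimal sketch of the asynchronous-processor setup, assuming the processor delegate makes the slow REST call and the writer delegate persists the result; the ReportRequest/ReportResult types and bean names are illustrative:

```java
import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.batch.integration.async.AsyncItemWriter;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class AsyncConfig {

    @Bean
    public AsyncItemProcessor<ReportRequest, ReportResult> asyncProcessor(
            ItemProcessor<ReportRequest, ReportResult> restCallingProcessor) {
        AsyncItemProcessor<ReportRequest, ReportResult> asyncProcessor = new AsyncItemProcessor<>();
        asyncProcessor.setDelegate(restCallingProcessor);              // slow REST call runs here
        asyncProcessor.setTaskExecutor(new SimpleAsyncTaskExecutor()); // items processed on separate threads
        return asyncProcessor;
    }

    @Bean
    public AsyncItemWriter<ReportResult> asyncWriter(ItemWriter<ReportResult> persistingWriter) {
        AsyncItemWriter<ReportResult> asyncWriter = new AsyncItemWriter<>();
        asyncWriter.setDelegate(persistingWriter); // unwraps the Futures and writes the results
        return asyncWriter;
    }
}
```

The step is then declared with Future<ReportResult> as its output type, so a slow response no longer blocks the rest of the chunk.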