Spring batch job execution context and step execution context clarification needed - spring-batch

I am using Spring Batch 3.0.3 and need some clarification about not serializing the job execution context and step execution context, as we have large object sets and we don't want to persist them in the Spring Batch tables. Is there any way we can store just the SHORT_CONTEXT and not the serialized object?

By default, no, because the ExecutionContext provides the data required for restartability. If you must do this (I'd encourage a different design), you'd have to implement your own ExecutionContextDao.
That being said, I'd encourage you not to go this route and to store your large object somewhere else. Even a Spring bean holding a Map that you use as a cache, outside of what the framework maintains, would be a better option IMHO.
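For example, a minimal sketch of such a cache bean (the names here are illustrative, not from any framework API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.stereotype.Component;

// Hypothetical cache bean: it holds the large objects in memory for the
// duration of the job, outside the ExecutionContext, so nothing gets
// serialized into the Spring Batch tables.
@Component
public class LargeObjectCache {

    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    public void put(String key, Object value) {
        cache.put(key, value);
    }

    public Object get(String key) {
        return cache.get(key);
    }

    // Call this once the job is finished, e.g. from a JobExecutionListener's
    // afterJob callback, so the data does not outlive the job.
    public void clear() {
        cache.clear();
    }
}

Note that, unlike the ExecutionContext, this cache is not persisted anywhere, so its contents are lost on a restart.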

Related

How to implement batch insert using spring-data-jdbc

Is it possible to implement batch insert using spring-data-jdbc somehow? Or can I get access to the JdbcTemplate using this Spring Data implementation?
There is currently no support for batch operations.
There are two issues requesting that feature, which one might want to follow if interested: https://jira.spring.io/browse/DATAJDBC-328 and https://jira.spring.io/browse/DATAJDBC-314
If one is working with Spring Data JDBC, there will always be a NamedParameterJdbcTemplate in the application context, so one can have that injected in order to perform batch operations without any additional configuration.
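For example, a minimal sketch of a batch insert done that way (the Customer entity, table and column names are all illustrative):

import java.util.List;

import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;
import org.springframework.jdbc.core.namedparam.SqlParameterSourceUtils;
import org.springframework.stereotype.Repository;

// Hypothetical repository that falls back to the NamedParameterJdbcTemplate
// provided by Spring Data JDBC for the one operation it does not support yet.
@Repository
public class CustomerBatchRepository {

    private final NamedParameterJdbcTemplate jdbcTemplate;

    public CustomerBatchRepository(NamedParameterJdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public int[] insertAll(List<Customer> customers) {
        // One SqlParameterSource per entity; the bean property names must
        // match the named parameters in the SQL statement.
        SqlParameterSource[] batch = SqlParameterSourceUtils.createBatch(customers.toArray());
        return jdbcTemplate.batchUpdate(
                "INSERT INTO customer (first_name, last_name) VALUES (:firstName, :lastName)",
                batch);
    }
}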

Can couchbase be used as the underlying JobRepository for spring-batch?

We have a requirement where we have to read a batch of an entity type from the database, submit info about each entity to a service (which will call back later with some data to update in the calling entity), and save all the entities with the updated data. We thought of using spring-batch; however, we use Couchbase as our database, which is eventually consistent and has no support for transactions.
I was going through the spring-batch documentation and came across the Spring Batch Meta-Data ERD diagram here:
https://docs.spring.io/spring-batch/4.1.x/reference/html/index-single.html#metaDataSchema
With the above information in mind, my question is:
Can Couchbase be used as the underlying job-repository for spring-batch? What are the things I should keep in mind if it's possible to use it? Any links to example implementations would be welcome.
The JobRepository needs to be transactional in order for Spring Batch to work properly. Here is an excerpt from the Transaction Configuration for the JobRepository section of the reference documentation:
The behavior of the framework is not well defined if the repository methods are not transactional.
Since Couchbase has no support for transactions as you mentioned, it is not possible to use it as an underlying datasource for the JobRepository.

Is it OK to globally set the mybatis executor mode to BATCH?

I am currently developing a Spring Boot app, which uses mybatis for its persistence layer. I want to optimize the batch insertion of entities in the following scenario:
// flightSerieMapper and legMapper are used to create a series of flights.
// legMapper needs to use batch insertion.
@Transactional
public FlightSerie add(FlightSerie flightSerie) {
    Integer flightSerieId = flightSeriesSequenceGenerator.getNext();
    flightSerie.setFlightSerieId(flightSerieId);
    flightSerieMapper.create(flightSerie);
    // create legs in batch mode
    for (Leg leg : flightSerie.getFlightLegs()) {
        Integer flightLegId = flightLegsSequenceGenerator.getNext();
        leg.setLegId(flightLegId);
        legMapper.create(leg);
    }
    return flightSerie;
}
mybatis is configured as follows in application.properties:
# this can be externalized if necessary
mybatis.config-location=classpath:mybatis-config.xml
mybatis.executor-type=BATCH
This means that mybatis will execute all statements in batch mode by default, including single insert/update/delete statements. Is this OK? Are there any issues I should be aware of?
Another approach would be to use a dedicated SQLSession specifically for the LegMapper. Which approach is the best (dedicated SQLSession vs global setting in application.properties)?
Note: I have seen other examples where "batch inserts" are created using a <foreach/> loop directly in the mybatis xml mapper file. I don't want to use this approach because it does not actually provide a batch insert.
As @Ian Lim said, make sure you annotate mapper methods for inserts and updates with the @Flush annotation if you globally set the executor type to BATCH.
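For reference, a minimal sketch of such a mapper (the SQL is illustrative, not from the question):

import java.util.List;

import org.apache.ibatis.annotations.Flush;
import org.apache.ibatis.annotations.Insert;
import org.apache.ibatis.executor.BatchResult;

public interface LegMapper {

    @Insert("INSERT INTO leg (leg_id, flight_serie_id) VALUES (#{legId}, #{flightSerieId})")
    void create(Leg leg);

    // With the BATCH executor, pending statements are only sent to the
    // database when the session flushes; calling this method forces that.
    @Flush
    List<BatchResult> flush();
}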
Another approach would be to use a dedicated SQLSession specifically for the LegMapper. Which approach is the best (dedicated SQLSession vs global setting in application.properties)?
Keep in mind that if you are using different SQL sessions for different mappers, there will be a separate transaction for each SQL session. If a service or service method annotated with @Transactional uses several mappers that use different SQL sessions, it will allocate a separate SQL transaction for each of them. So it's impossible to do an atomic data operation that involves mappers with different SQL sessions.
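If you do go the dedicated-session route, here is a sketch of what it could look like (the wiring is illustrative):

import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;

// Hypothetical helper: opens a dedicated BATCH session for LegMapper while
// the rest of the application keeps the default executor type.
public class LegBatchWriter {

    private final SqlSessionFactory sqlSessionFactory;

    public LegBatchWriter(SqlSessionFactory sqlSessionFactory) {
        this.sqlSessionFactory = sqlSessionFactory;
    }

    public void createAll(Iterable<Leg> legs) {
        // This session has its own connection and its own transaction,
        // independent of any surrounding @Transactional method.
        try (SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH)) {
            LegMapper mapper = session.getMapper(LegMapper.class);
            for (Leg leg : legs) {
                mapper.create(leg);
            }
            session.flushStatements(); // send the batched inserts to the database
            session.commit();
        }
    }
}

Note how the batch session commits on its own, which is exactly the transaction caveat described above.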

Spring Batch Execution Status Backed by Database

From the Spring Guides:
For starters, the @EnableBatchProcessing annotation adds many critical beans that support jobs and saves you a lot of leg work. This example uses a memory-based database (provided by @EnableBatchProcessing), meaning that when it’s done, the data is gone.
How can I make the execution state backed by a database (or some other persistent record) so that, in case the application crashes, the job is resumed from the previous state?
My solution, until now, is having my ItemReader be a JdbcCursorItemReader which reads records from a table whose column X is not NULL, and my ItemWriter be a JdbcBatchItemWriter which updates the record with data in column X, making it non-null (so that it won't be picked up on the next execution). However, this seems really hackish and I believe there's a more elegant way. Can anyone please shed some light?
When using the @EnableBatchProcessing annotation, if you provide a DataSource bean definition called dataSource, Spring Batch will use that database for the job repository store instead of the in-memory map. You can read more about this functionality in the documentation here: http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/core/configuration/annotation/EnableBatchProcessing.html
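A minimal sketch of such a configuration (the driver, URL and credentials are illustrative):

import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    // Because a DataSource bean is present, Spring Batch stores its job and
    // step execution metadata in this database instead of an in-memory map,
    // which is what allows an interrupted job to be restarted.
    @Bean
    public DataSource dataSource() {
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.postgresql.Driver");
        dataSource.setUrl("jdbc:postgresql://localhost:5432/batch_metadata");
        dataSource.setUsername("batch");
        dataSource.setPassword("secret");
        return dataSource;
    }
}

The Spring Batch metadata tables must exist in that database; the framework ships schema scripts (e.g. schema-postgresql.sql in spring-batch-core) for creating them.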

Spring batch - need to use an object in itemProcessor/itemWriter but not persist it

I need to access an object in both itemProcessor and itemWriter but I don't want to persist it in the executionContext. I would read this object in a pre-processing step.
What is the best way to do that?
So far what I have is - I put the object in the jobExecutionContext, then I set the scope of my itemProcessor to "step" and bind a property of the itemProcessor to "#{stepExecution.jobExecution.executionContext}". This does give me access to my object. But I am stuck at two issues with this solution:
When do I remove the object from the context so that it doesn't stay persisted? It has to be after all the items are done.
My object could be huge and it seems the column for the context is of size 2500.
Is this a good solution, and if it is, how do I solve the two concerns mentioned above? And if not, is there a good way to do this in Spring Batch, or is caching the best way to go?
Thanks.
The execution contexts (job/step) used by Spring Batch are meant to be persisted in the Spring Batch metadata, to support the restartability feature, to name one!
What I have done previously is create a normal Spring bean holding the object you need, and simply @Autowired it in your processor and writer!
Job done.
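A minimal sketch of that approach (all the names here are illustrative):

import org.springframework.batch.item.ItemProcessor;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// Hypothetical holder bean (in its own file): populated in the
// pre-processing step, read in the processor and writer, and never
// touched by the ExecutionContext, so nothing is serialized.
@Component
public class ReferenceDataHolder {

    private ReferenceData referenceData;

    public ReferenceData getReferenceData() {
        return referenceData;
    }

    public void setReferenceData(ReferenceData referenceData) {
        this.referenceData = referenceData;
    }
}

// Hypothetical processor (in its own file) that uses the shared object
// without going through the execution context.
@Component
public class MyItemProcessor implements ItemProcessor<MyItem, MyItem> {

    @Autowired
    private ReferenceDataHolder referenceDataHolder;

    @Override
    public MyItem process(MyItem item) {
        item.enrichWith(referenceDataHolder.getReferenceData());
        return item;
    }
}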