Dynamically create Job in Spring Batch - spring-batch

Is it possible to create a Spring Batch Job dynamically, i.e. not as a bean?
I have created a lot of readers, writers, processors and other tasklets, and I would like to be able to build a Job at runtime from these parts.
I have some job description files in my own XML-based format, saved in a directory. These job descriptions contain dynamic information about the Job, for example which reader and writer to use for it.
When the program starts, these files are parsed and the corresponding Jobs must be created.
I plan to implement it like this:
@Autowired
private JobBuilderFactory jobBuilderFactory;

@Autowired
private StepBuilderFactory stepBuilderFactory;

@Autowired
private ApplicationContext context;

public Job createJob(MyXmlJobConfig jobConfig) {
    // My predefined steps in the context
    Step initStep = context.getBean("InitStep", Step.class);
    Step step1 = context.getBean("MyFirstStep", Step.class);
    Step step2 = context.getBean("MySecondStep", Step.class);
    //......

    // Mix these steps to build the job
    JobBuilder jobBuilder = jobBuilderFactory.get("myJob");
    SimpleJobBuilder simpleJobBuilder = jobBuilder.start(initStep);

    // Any logic for choosing and mixing steps
    if (jobConfig.somePredicate())
        simpleJobBuilder.next(step1);
    else
        simpleJobBuilder.next(step2);
    //.........
    //.......

    return simpleJobBuilder.build();
}
Usage example:
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
MyXmlJobConfig config = getConfigFromFile(); // Loading config from file
MyCustomJobBuilder myCustomJobBuilder = context.getBean(MyCustomJobBuilder.class);
Job createdJob = myCustomJobBuilder.createJob(config);
jobLauncher.run(createdJob, new JobParameters());
Is this approach to building a job correct? Note that createdJob is not a bean. Won't that break anything in Spring Batch behind the scenes?

Spring Batch uses the Spring DI container and related facilities quite extensively. Proxying beans that are job or step scoped is just one example. The whole parsing of an XML based definition results in BeanDefinitions. Can you build a Spring Batch job without making it a bean? Sure. Would I recommend it? No.
Do keep in mind that there are ways of dynamically creating child ApplicationContext instances that you can have a job in. Spring Batch Admin and Spring XD both took advantage of this feature to dynamically create instances of Spring Batch jobs. I'd recommend this approach over having the job not part of an ApplicationContext in the first place.
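For illustration, a minimal sketch of that idea (this is not the actual Spring Batch Admin/XD code; parentContext is your main context and DynamicJobConfiguration is a hypothetical @Configuration class defining the job): register the job configuration in a child context so the job is still a bean, just created on demand.
// uses org.springframework.context.annotation.AnnotationConfigApplicationContext
AnnotationConfigApplicationContext childContext = new AnnotationConfigApplicationContext();
childContext.setParent(parentContext); // reuse infrastructure beans (DataSource, JobRepository, ...)
childContext.register(DynamicJobConfiguration.class); // hypothetical @Configuration defining the job
childContext.refresh();

Job job = childContext.getBean(Job.class);
jobLauncher.run(job, new JobParameters());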

Related

Nested Query in spring batch processing

I want to create an ETL process using Spring Batch. The steps will read from several DBs and insert into one DB, so basically I'm collecting similar information from different DBs and inserting it into a single DB. I have a large, complex query that I need to run against those DBs, and the result will be inserted into that single DB for later processing. My main concern is how to reference this query in the JpaPagingItemReader, for example: is there a way I can add this query to my project as a .sql file and then reference it in the reader?
Or is there any other solution I can follow?
Thank you
is there a way I can for example add this query in my project as a .sql file and then reference it in the reader? Or any other solution I can follow?
You can put your query in a properties file and inject in your reader, something like:
@Configuration
@EnableBatchProcessing
@PropertySource("classpath:application.properties")
public class MyJob {

    @Bean
    public JpaPagingItemReader itemReader(@Value("${query}") String query) {
        return new JpaPagingItemReaderBuilder<>()
                .queryString(query)
                // set other reader properties
                .build();
    }

    // ...
}
In this example, you should have a property query=your sql query in application.properties. This is just the regular Spring property injection mechanism; nothing Spring Batch specific here.
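If you would rather keep the query in a .sql file on the classpath, a minimal sketch (the file path queries/my-query.sql is just an assumed example) is to read the resource yourself and pass the resulting string to the builder:
// uses org.springframework.core.io.ClassPathResource, org.springframework.util.StreamUtils
// and java.nio.charset.StandardCharsets
@Bean
public JpaPagingItemReader itemReader() throws IOException {
    // read the query text from a classpath resource (hypothetical path) and hand it to the reader
    String query = StreamUtils.copyToString(
            new ClassPathResource("queries/my-query.sql").getInputStream(),
            StandardCharsets.UTF_8);
    return new JpaPagingItemReaderBuilder<>()
            .queryString(query)
            // set other reader properties
            .build();
}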

Bulkindexing JPA Entities modified during Spring transaction to Elasticsearch index

I have a JPA entity class that is also an Elasticsearch document. The environment is a Spring Boot application using Spring Data JPA and Spring Data Elasticsearch.
@Entity
@Document(indexname...etc)
@EntityListeners(MyJpaEntityListener.class)
public class MyEntity {
    // ID, constructor and stuff following here
}
When an instance of this entity gets created, updated or deleted, it gets reindexed to Elasticsearch. This is currently achieved with a JPA EntityListener which reacts to PostPersist, PostUpdate and PostRemove events.
public class MyJpaEntityListener {

    @PostPersist
    @PostUpdate
    public void postPersistOrUpdate(MyEntity entity) {
        // Elasticsearch indexing code goes here
    }

    @PostRemove
    public void postRemove(MyEntity entity) {
        // Elasticsearch indexing code goes here
    }
}
That's all working fine at the moment when a single entity or a few entities get modified during a single transaction. Each modification triggers a separate index operation. But if a lot of entities get modified inside a transaction, it gets slow.
I would like to bulk-index all entities that were modified at the end of (or after committing) a transaction. I took a look at TransactionalEventListeners, AOP and TransactionSynchronizationManager but wasn't able to come up with a good setup so far.
How can I collect all modified entities per transaction in an elegant way, without doing it by hand in every service method myself?
And how can I trigger a bulk index at the end of a transaction with the collected entities of that transaction?
Thanks for your time and help!
A different and, in my opinion, elegant approach, since you don't mix Elasticsearch-related code into your services and entities, is to use Spring aspects with @AfterReturning on the transactional methods of the service layer.
The pointcut expression can be adjusted to catch all the service methods you want.
@Order(1) guarantees that this code will run after the transaction commit.
The code below is just a sample... you have to adapt it to work with your project.
@Aspect
@Component
@Order(1)
public class StoreDataToElasticAspect {

    @Autowired
    private SampleElasticsearhRepository elasticsearchRepository;

    @AfterReturning(pointcut = "execution(* com.example.DatabaseService.bulkInsert(..))")
    public void synonymsInserted(JoinPoint joinPoint) {
        Object[] args = joinPoint.getArgs();
        // create Elasticsearch documents from the method params;
        // database services can also be injected if more information is needed for the documents
        List<String> ids = (List) args[0];
        // create the batch from the ids
        elasticsearchRepository.save(batch);
    }
}

Spring Batch Reader for distributed DB2 database

I am trying to write a job using the Spring Batch framework. The job needs to get data from a clustered DB2 database, run some logic on each fetched record, and then store the transformed data in the same DB (a different table than the one it was read from). I am trying to write step1 as below:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO> chunk(100).reader(reader)
            .processor(processor).writer(writer).build();
}
Currently, I face two challenges because the database is DB2 and is distributed:
1.
The SQL provided for the metadata tables at /org/springframework/batch/core/schema-db2.sql doesn't work for distributed DB2. It fails on the statement constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY).
The queries in this file can be tweaked for distributed DB2, or I can create the tables manually, but I am not sure whether I should create them manually and whether that will cause further complications.
I need all these tables because I want to use Spring Batch for its pause and restart functionality.
2.
We need to fire all SELECT queries on DB2 with READ ONLY WITH UR (see this SO question). If we don't run queries with this clause, the DB can get locked.
The problem with point #2 is that I can't use the built-in reader classes of Spring Batch (JdbcPagingItemReader etc.), as those don't support this DB2-specific clause.
From reading the overly simple examples on the Internet that explain the advantages of this framework, I thought I would be up and running in a very short time, but it looks like I have to write my own query provider classes, research the metadata SQL and what not if the DB happens to be DB2 and distributed.
Has anybody implemented a similar job for a distributed DB2 database and can guide me on the above points?
I guess that to solve point #1 I will create the tables manually, since I have confirmed in another question that the tables will not get dropped automatically, so recreation will not be needed. A one-time manual activity should be enough.
And I will solve point #2 by specifying the isolation level at the transaction level, so WITH UR in the SELECT queries will not be needed:
@Autowired
private DataSource dataSource;

@Bean
public TransactionTemplate transactionTemplateUR(PlatformTransactionManager txnManager) {
    TransactionTemplate txnTemplate = new TransactionTemplate();
    txnTemplate.setIsolationLevelName("ISOLATION_READ_UNCOMMITTED");
    txnTemplate.setTransactionManager(txnManager);
    return txnTemplate;
}

@Bean
public PlatformTransactionManager txnManager(DataSource dataSource) {
    DataSourceTransactionManager txnManager = new DataSourceTransactionManager();
    txnManager.setDataSource(dataSource);
    return txnManager;
}
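A related option (just a sketch, assuming the chunk-oriented step1 from the question) is to pass a TransactionAttribute to the step builder, so the READ_UNCOMMITTED isolation level applies to the chunk transaction the step runs in:
// uses org.springframework.transaction.interceptor.DefaultTransactionAttribute
// and org.springframework.transaction.TransactionDefinition
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    // run the chunk transaction with READ UNCOMMITTED instead of adding WITH UR to each query
    DefaultTransactionAttribute attribute = new DefaultTransactionAttribute();
    attribute.setIsolationLevel(TransactionDefinition.ISOLATION_READ_UNCOMMITTED);
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO> chunk(100)
            .reader(reader).processor(processor).writer(writer)
            .transactionAttribute(attribute)
            .build();
}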

Performance problems using a Spring Batch FieldSetMapper to map into an object that will be written with a JpaItemWriter?

We are writing a set of Spring Batch jobs that read values from text files and use that information to update objects that are read and written from the database using JPA. These jobs are not run in a web container, but on an application server. My problem seems to be with how the EntityManager is configured.
The code reads files from various vendors that update an order's status. The text file specifies the customer by name and the order by date/time. If the customer doesn't exist, the line from the text file is skipped. If the order exists, we update it. If not, then we create it.
We currently use DeltaSpike to get instances of our DAO objects like this:
DependentProvider<CustomerDaoImpl> provider = BeanProvider.getDependent(CustomerDaoImpl.class);
ICustomerDao custDao = provider.get();
I cache the DAO objects in my mapper so I am only getting them once. But every call to BeanProvider.getDependent() creates a new EntityManager through "Spring Batch Magic." The EntityManager is specified thusly:
@Configuration
public class BaseBatchConfiguration {

    @Bean
    @Produces
    public EntityManager entityManager() {
        Map<String, String> properties = new HashMap<String, String>();
        properties.put("hibernate.connection.url", System.getProperty("JDBC_URL"));
        properties.put("hibernate.default_schema", System.getProperty("APP_SCHEMA"));
        properties.put("hibernate.connection.username", System.getProperty("APP_DB_ID"));
        properties.put("hibernate.connection.password", System.getProperty("APP_DB_PWD"));
        EntityManagerFactory emf = Persistence.createEntityManagerFactory(System.getProperty("PU_NAME"), properties);
        return emf.createEntityManager();
    }
}
I tried caching the EntityManager, but a new instance of the BaseBatchConfiguration class is used every time. This means that each DAO gets created with its own EntityManager, so no real object caching is taking place across DAOs (a customer read by CustomerDaoImpl isn't cached and reused when OrderDaoImpl loads an order that references that same customer).
This is causing a lot of unwanted object loading as we process through the text file.
Is there some other way we should be declaring our EntityManager?

Spring Data JPA - Using #Transactional in a CDI environment instead of Spring environment

I realized after writing this question that I could sum it up in a few sentences: how can I manage transactions in Spring Data JPA with CDI the same way you would by using @Transactional in Spring itself?
The first thing I did was set up the Spring Data JPA CDI integration based on the documentation here: http://static.springsource.org/spring-data/data-jpa/docs/current/reference/html/jpa.repositories.html#jpd.misc.cdi-integration
I set this up and it is working fine for read operations but not for write operations.
For example, this example from the docs works fine:
List<Person> people = repository.findAll();
So I have the basic setup complete.
Written by hand, so it may have typos. This is similar to the code I execute:
@Inject
UserRepository userRepository;

User user;

@Transactional
public void signUpUserAction() {
    userRepository.saveAndFlush(user);
}
Then I receive this error
Caused by: javax.persistence.TransactionRequiredException: no transaction is in progress
At first I realized I did not have @Transactional, so I added it, and it still did not work. (I believe in Spring you need to use the AOP XML file to set up @Transactional, so it makes sense that this does not work in EE out of the box; I just do not know how to make it work.)
FYI, annotating with this does not work either:
@TransactionAttribute(TransactionAttributeType.REQUIRED)
Something I tried while I was writing this post got it to work, sort of... but I don't like the code and am still interested in using @Transactional. This code feels dirty; I'm pretty sure @Transactional handles calling other transactional methods in a clean way, while this code would not.
This saves and I verify it's in the database.
@Inject
EntityManager em;

@Inject
UserRepository userRepository;

private User user;

public void signUpUserAction() {
    em.getTransaction().begin();
    userRepository.saveAndFlush(user);
    em.getTransaction().commit();
}
So in short, how can I use #Transactional or something similar to manage my transactions?
Thank you for any help.
If you run Spring Data in a CDI environment, you're not running a Spring container at all. So you'll need to use EJB session beans to work with the repositories, as CDI currently does not have support for transactions out of the box. The CDI extension shipping with Spring Data basically provides an entry point into the Java EE world, and you'll use the standard transaction mechanisms available in that environment.
So you either inject a repository into a @Stateless bean directly, or you inject the CDI bean into one. This will allow you to use EJB transaction annotations on the EJB.
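For example, a minimal sketch of the first option (UserService is a hypothetical bean name; UserRepository and User are taken from the question): a stateless session bean gives you a container-managed transaction around the save.
// uses javax.ejb.Stateless and javax.inject.Inject
@Stateless
public class UserService {

    @Inject
    UserRepository userRepository; // Spring Data repository exposed via the CDI extension

    // EJB business methods are transactional (REQUIRED) by default,
    // so saveAndFlush runs inside a container-managed transaction
    public void signUpUserAction(User user) {
        userRepository.saveAndFlush(user);
    }
}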
For everyone who still has this question:
I have an experimental project that supports @Transactional in a CDI environment.
The project uses custom Narayana code as an interceptor and provides compatibility between it and the Spring Data JPA implementation.
Key points to take into consideration (samples of each of these steps are in the experimental project):
Custom (Spring Data) CDI configuration -> add a custom transactional post processor.
Implement a custom transactional post processor.
Implement a custom transactional interceptor.
Add a CDI producer for your custom transactional interceptor.
Create your custom repository fragments using @Transactional (JTA).
Compose your repository interface by extending the Repository interface and your fragments, using the @NoRepositoryBean annotation.
Regards,