Spring Batch Implementation with Cassandra database - spring-batch

I'm trying to change an existing Spring Batch job (XML config) which reads data from an Oracle database and writes it into TXT and XML files in the required format. Now I want to change the same implementation to read data from a Cassandra database instead of Oracle, but I don't see any ItemReader for Cassandra in Spring Batch similar to JdbcCursorItemReader.
Can someone tell me which ItemReader I should use to read data from Cassandra? Or do I need to create a custom ItemReader to read data from Cassandra?

You can create a CustomItemReader:

public class CustomItemReader implements ItemReader<List<YOUR_DOMAIN_OBJECT>> {

    @PostConstruct
    public void init() throws IOException {
        // establish the Cassandra db connection
    }

    @Override
    public List<YOUR_DOMAIN_OBJECT> read() throws Exception {
        // use the Cassandra connection to read data and build List<YOUR_DOMAIN_OBJECT>
        // return null once all data has been read, so the step knows the input is exhausted
        return data;
    }
}
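If it helps, here is a minimal sketch of such a reader built on the DataStax Java driver (4.x). The CqlSession wiring, the keyspace/table/column names and the MyDomainObject type are assumptions for illustration, not part of the original question. Since the reader keeps state (the row iterator), it should typically be a step-scoped bean.

import java.util.Iterator;

import org.springframework.batch.item.ItemReader;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class CassandraItemReader implements ItemReader<MyDomainObject> {

    private final CqlSession session;
    private Iterator<Row> rows;

    public CassandraItemReader(CqlSession session) {
        this.session = session;
    }

    @Override
    public MyDomainObject read() {
        if (rows == null) {
            // lazily execute the query on the first call to read()
            rows = session.execute("SELECT id, name FROM my_keyspace.my_table").iterator();
        }
        if (!rows.hasNext()) {
            return null; // null signals end of input to the step
        }
        Row row = rows.next();
        return new MyDomainObject(row.getString("id"), row.getString("name"));
    }
}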

Related

Nested Query in spring batch processing

I want to create an ETL process using Spring Batch. The steps will read from one or more DBs and insert into a single DB, so basically I'm collecting similar information from different DBs and inserting it into one DB. I have a large, complex query that I need to run on those DBs, and the result will be inserted into that single DB for later processing. My main concern is that I want to reference this query in the JpaPagingItemReader, for example. Is there a way I can add this query to my project as a .sql file and then reference it in the reader?
Or any other solution I can follow?
Thank you
Is there a way I can add this query to my project as a .sql file and then reference it in the reader? Or any other solution I can follow?
You can put your query in a properties file and inject it into your reader, something like:

@Configuration
@EnableBatchProcessing
@PropertySource("classpath:application.properties")
public class MyJob {

    @Bean
    public JpaPagingItemReader itemReader(@Value("${query}") String query) {
        return new JpaPagingItemReaderBuilder<>()
                .queryString(query)
                // set other reader properties
                .build();
    }

    // ...
}

In this example, you should have a property query=your SQL query in application.properties. This is actually the regular Spring property injection mechanism, nothing Spring Batch specific here.
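For example (the entity name and JPQL statement here are just placeholders), application.properties would then contain something like:

query=SELECT e FROM Employee e WHERE e.active = true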

ItemReader for records returned by CrudRepository

I have a Spring Batch application in which the reader reads from an external DB, the processor transforms each record into the POJO of my destination DB, and the writer writes the transformed POJO to the destination DB.
I am using the following CrudRepository:

public interface MyCrudRepository extends CrudRepository<MyDbEntity, String> {
    List<MyDbEntity> findByPIdBetween(String from, String to);
    List<MyDbEntity> findByPIdGreaterThan(String from);
}

I wanted to know how the ItemReader for the above would look.
Should I call myCrudRepository.findByPIdBetween(String from, String to) in the @PostConstruct of my ItemReader?
Wouldn't that make the ItemReader static? Each job run would have different method parameters for findByPIdBetween.
How should the ItemReader be structured for the above problem?
I wanted to know how the ItemReader for the above would look.
RepositoryItemReader is what you need. You can use it with your repository and specify the method to use to read items. You can find an example here.
each job run would have different method parameters for findByPIdBetween
You can pass those as parameters to your job and use them in your reader.
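As a rough sketch (not from the original answer), a step-scoped reader configured through RepositoryItemReaderBuilder could look like the following. Note that RepositoryItemReader expects the repository method to take a Pageable as its last parameter and return a Page, so the signature of findByPIdBetween would need to be adapted accordingly; the job parameter names from and to are assumptions:

import java.util.Arrays;
import java.util.Collections;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.data.RepositoryItemReader;
import org.springframework.batch.item.data.builder.RepositoryItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.data.domain.Sort;

@Bean
@StepScope
public RepositoryItemReader<MyDbEntity> reader(
        MyCrudRepository repository,
        @Value("#{jobParameters['from']}") String from,
        @Value("#{jobParameters['to']}") String to) {
    return new RepositoryItemReaderBuilder<MyDbEntity>()
            .name("myDbEntityReader")
            .repository(repository)
            // e.g. Page<MyDbEntity> findByPIdBetween(String from, String to, Pageable pageable)
            .methodName("findByPIdBetween")
            .arguments(Arrays.asList(from, to))
            .sorts(Collections.singletonMap("pId", Sort.Direction.ASC))
            .pageSize(100)
            .build();
}

The from and to values are then supplied as job parameters when launching the job, for example via new JobParametersBuilder().addString("from", "...").addString("to", "...").toJobParameters().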

Bulkindexing JPA Entities modified during Spring transaction to Elasticsearch index

I have a JPA entity class that is also an Elasticsearch document. The environment is a Spring Boot application using Spring Data JPA and Spring Data Elasticsearch.

@Entity
@Document(indexname...etc)
@EntityListeners(MyJpaEntityListener.class)
public class MyEntity {
    // ID, constructor and stuff following here
}
When an instance of this entity gets created, updated or deleted, it gets reindexed to Elasticsearch. This is currently achieved with a JPA EntityListener which reacts to PostPersist, PostUpdate and PostRemove events.

public class MyJpaEntityListener {

    @PostPersist
    @PostUpdate
    public void postPersistOrUpdate(MyEntity entity) {
        // Elasticsearch indexing code goes here
    }

    @PostRemove
    public void postRemove(MyEntity entity) {
        // Elasticsearch indexing code goes here
    }
}

That's all working fine at the moment when a single entity or a few entities get modified during a single transaction. Each modification triggers a separate index operation. But if a lot of entities get modified inside a transaction, it gets slow.
I would like to bulk-index all entities that were modified at the end (or after commit) of a transaction. I took a look at TransactionalEventListeners, AOP and TransactionSynchronizationManager but wasn't able to come up with a good setup so far.
How can I collect all modified entities per transaction in an elegant way, without doing it by hand in every service method myself?
And how can I trigger a bulk index at the end of a transaction with the entities collected in that transaction?
Thanks for your time and help!
A different and, in my opinion, elegant approach, which keeps Elasticsearch-related code out of your services and entities, is to use Spring aspects with @AfterReturning on the service layer's transactional methods.
The pointcut expression can be adjusted to catch all the service methods you want.
@Order(1) guarantees that this code will run after the transaction commit.
The code below is just a sample; you have to adapt it to work with your project.

@Aspect
@Component
@Order(1)
public class StoreDataToElasticAspect {

    @Autowired
    private SampleElasticsearhRepository elasticsearchRepository;

    @AfterReturning(pointcut = "execution(* com.example.DatabaseService.bulkInsert(..))")
    public void synonymsInserted(JoinPoint joinPoint) {
        Object[] args = joinPoint.getArgs();
        // create elasticsearch documents from method params.
        // can also inject database services if more information is needed for the documents.
        List<String> ids = (List) args[0];
        // create batch from ids
        elasticsearchRepository.save(batch);
    }
}
And here is an example with a logging aspect.

Spring Batch Reader for distributed DB2 database

I am trying to write a job using the Spring Batch framework. The job needs to get data from a clustered DB2 database, apply some logic to each fetched record and then store the transformed data in the same DB (a different table from the one it was read from). I am trying to write step1 as below:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO> chunk(100)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
Currently, I face two challenges due to the database being DB2 and clustered:
1. The SQL provided for the meta-data tables at /org/springframework/batch/core/schema-db2.sql doesn't work for distributed DB2. It fails on the statement constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY). The queries in this file can be tweaked for distributed DB2, or I can create the tables manually, but I am not sure whether I should create them manually and whether that will cause further complications. I need all these tables because I want to use Spring Batch for its PAUSE and RESTART functionality.
2. We need to run all SELECT queries on DB2 with READ ONLY WITH UR (see this SO question). If we don't run queries with this keyword, the DB can get locked.
The problem with point 2 is that I can't use the built-in reader classes of Spring Batch (JdbcPagingItemReader etc.) as those don't support this DB2-specific keyword.
From reading the simple examples on the Internet that explain the advantages of this framework, I thought I would be up and running in a very short period, but it looks like I have to write my own query provider classes, research the meta-data SQL and what not if the DB happens to be DB2 and distributed.
Has anybody implemented a similar job for a distributed DB2 database who can guide me on the above points?
I guess, to solve point 1, I will create the tables manually, since I have confirmed in another question that the tables will not get dropped automatically, so recreation will not be needed. A one-time manual activity should be enough.
And I will solve point 2 by specifying the isolation level at the transaction level, so WITH UR in the SELECT queries will not be needed:
@Autowired
private DataSource dataSource;

@Bean
public TransactionTemplate transactionTemplateUR(PlatformTransactionManager txnManager) {
    TransactionTemplate txnTemplate = new TransactionTemplate();
    txnTemplate.setIsolationLevelName("ISOLATION_READ_UNCOMMITTED");
    txnTemplate.setTransactionManager(txnManager);
    return txnTemplate;
}

@Bean
public PlatformTransactionManager txnManager(DataSource dataSource) {
    DataSourceTransactionManager txnManager = new DataSourceTransactionManager();
    txnManager.setDataSource(dataSource);
    return txnManager;
}
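As a side note (a sketch under assumptions, not part of the original answer): in a chunk-oriented step the transaction is opened by Spring Batch itself, so the isolation level can also be set directly on the step via a transaction attribute, avoiding a separate TransactionTemplate. READ_UNCOMMITTED is the JDBC-level counterpart of DB2's WITH UR.

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.interceptor.DefaultTransactionAttribute;

@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor,
        ItemWriter<RemittanceClaimVO> writer) {
    // run each chunk's transaction with READ_UNCOMMITTED (uncommitted read)
    DefaultTransactionAttribute attribute = new DefaultTransactionAttribute();
    attribute.setIsolationLevel(TransactionDefinition.ISOLATION_READ_UNCOMMITTED);

    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO> chunk(100)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .transactionAttribute(attribute)
            .build();
}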

spring-data-mongodb: How can I dynamically create a database in Mongo using the spring-data-mongodb library?

I am trying to use the Spring Data MongoDB module for CRUD operations against a Mongo database. Going through examples and articles, my assumption is that the database name should be pre-defined in the Spring context XML when defining the MongoTemplate bean.
In my case I have a multi-tenant application that accepts requests over HTTP. My application should create the Mongo database on the fly, using the name provided in the incoming HTTP request, and then load the data into a collection in the newly created database.
I am trying to figure out if there is a way to dynamically populate the database name in MongoTemplate or MongoRepository without having to provide it in the Spring context XML.
Please help me.
Thanks
-RK
Have you tried the following instead of going through the pre-defined Spring context configuration?

MongoTemplate getMongoTemplate(Mongo mongo, String database) {
    return new MongoTemplate(mongo, database);
}
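With current Spring Data MongoDB versions, where the old Mongo class is deprecated, a comparable per-tenant sketch could look like the one below; the connection string and the way the tenant database name is obtained are assumptions. MongoDB creates a database lazily on the first write, so no explicit "create database" call is needed.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.SimpleMongoClientDatabaseFactory;

public class TenantMongoTemplates {

    private final MongoClient client = MongoClients.create("mongodb://localhost:27017");

    // build a MongoTemplate bound to the database name taken from the incoming request;
    // the database itself is created by MongoDB on the first insert
    public MongoTemplate templateFor(String tenantDatabase) {
        return new MongoTemplate(new SimpleMongoClientDatabaseFactory(client, tenantDatabase));
    }
}

Caching one template per tenant (for example in a ConcurrentHashMap keyed by database name) avoids rebuilding it on every request.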