I want to create an ETL process using Spring Batch. The steps will read from several DBs and insert into one DB; basically I'm collecting similar information from different DBs and consolidating it in a single DB. I have a large, complex query that I need to run against those DBs, and the result will be inserted into that single DB for later processing. My main concern is how to reference this query in the JpaPagingItemReader, for example. Is there a way I can add this query to my project as a .sql file and then reference it in the reader?
Or any other solution I can follow?
Is there a way I can, for example, add this query to my project as a .sql file and then reference it in the reader? Or any other solution I can follow?
You can put your query in a properties file and inject it into your reader, something like:
@Configuration
@EnableBatchProcessing
@PropertySource("classpath:application.properties")
public class MyJob {

    @Bean
    public JpaPagingItemReader itemReader(@Value("${query}") String query) {
        return new JpaPagingItemReaderBuilder<>()
                .queryString(query)
                // set other reader properties
                .build();
    }

    // ...
}
In this example, you should have a property query=your sql query in application.properties. This is actually the regular Spring property injection mechanism; nothing Spring Batch specific here.
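If you prefer to keep the query in a .sql file on the classpath, as asked, you can read it into a String yourself before building the reader. Here is a minimal sketch, assuming a file queries/my-query.sql and an entity class MyEntity (both hypothetical names). Note that JpaPagingItemReader expects a JPQL query string; if your file contains native SQL, consider JdbcPagingItemReader with a row mapper instead.

@Bean
public JpaPagingItemReader<MyEntity> itemReader(
        EntityManagerFactory entityManagerFactory,
        @Value("classpath:queries/my-query.sql") Resource queryResource) throws IOException {
    // Load the query text from the .sql file bundled with the project
    String query = StreamUtils.copyToString(queryResource.getInputStream(), StandardCharsets.UTF_8);
    return new JpaPagingItemReaderBuilder<MyEntity>()
            .name("itemReader")
            .entityManagerFactory(entityManagerFactory)
            .queryString(query)
            .pageSize(100)
            .build();
}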
The Spring Data documentation describes how to create a collection with a given $jsonSchema, and how to perform a validation query.
Is there a way to update the $jsonSchema for an existing collection? Calling MongoTemplate.createCollection() for an existing one results in a MongoCommandException with error code 48 (collection exists), and the schema is not updated.
Ok, looks like there is no ready-to-use method in Spring Data, but it is pretty simple to implement:
<T> void updateSchema(MongoTemplate template, Class<T> entityClazz, MongoJsonSchema schema) {
    // Run the collMod admin command to replace the collection's validator
    template.executeCommand(new Document(Map.of(
            "collMod", template.getCollectionName(entityClazz),
            "validator", schema.toDocument()
    )));
}
Also keep in mind that the default readWrite role is not enough; the user needs the collMod privilege.
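For completeness, a hypothetical call site, assuming an entity class MyEntity and a schema that requires a name field:

MongoJsonSchema schema = MongoJsonSchema.builder()
        .required("name")
        .properties(JsonSchemaProperty.string("name"))
        .build();
updateSchema(mongoTemplate, MyEntity.class, schema);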
I have a Spring Batch application wherein the reader reads from an external DB, the processor transforms it to the POJO of my destination DB, and the writer writes the transformed POJO to the destination DB.
I am using the following CrudRepository:
public interface MyCrudRepository extends CrudRepository<MyDbEntity, String> {
    List<MyDbEntity> findByPIdBetween(String from, String to);
    List<MyDbEntity> findByPIdGreaterThan(String from);
}
I wanted to know how the ItemReader for the above would look.
Should I call myCrudRepository.findByPIdBetween(String from, String to) in the @PostConstruct of my ItemReader?
Wouldn't that make the ItemReader static? Each job run would have different method parameters for findByPIdBetween.
How should the ItemReader be structured for the above problem?
I wanted to know how the ItemReader for the above would look.
RepositoryItemReader is what you need. You can use it with your repository and specify the method to use to read items. You can find an example here.
each job run would have different method parameters for findByPIdBetween
You can pass those as job parameters and use them in your reader, as in the sketch below.
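A minimal sketch of a step-scoped RepositoryItemReader wired to job parameters (the parameter names from and to are assumptions). Note that RepositoryItemReader pages through results, so the repository method it targets needs a trailing Pageable parameter and should return a Page, e.g. Page<MyDbEntity> findByPIdBetween(String from, String to, Pageable pageable):

@Bean
@StepScope
public RepositoryItemReader<MyDbEntity> reader(
        MyCrudRepository repository,
        @Value("#{jobParameters['from']}") String from,
        @Value("#{jobParameters['to']}") String to) {
    // A new reader is created per step execution, bound to this run's parameters
    return new RepositoryItemReaderBuilder<MyDbEntity>()
            .name("myDbEntityReader")
            .repository(repository)
            .methodName("findByPIdBetween")
            .arguments(List.of(from, to))
            .sorts(Map.of("pId", Sort.Direction.ASC))
            .pageSize(100)
            .build();
}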
I intend to implement a batch job to read data from various DB tables to populate the complex domain object below, then perform calculations in the processor and load the data into the DB via the writer.
public class A {
    private String id;
    private String name;
    private ArrayList list1;
    private ArrayList list2;
    // ...
}
Now I am stuck on the design of the reader. The idea is to query a DB table to get a list of IDs, then query the other fields, including list1 and list2, based on each ID. It seems the existing readers cannot fulfill this requirement; do I need to create a custom reader to achieve the goal? I think I would take the chunk approach, but I have no clue how to implement it.
A code example is much appreciated.
You can use the driving query pattern: the reader reads only the IDs, and the processor then queries the details of each object based on its ID.
This is a common pattern and you can find more details about it in the common batch patterns section of the documentation: https://docs.spring.io/spring-batch/4.0.x/reference/html/common-patterns.html#drivingQueryBasedItemReaders
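A rough sketch of that pattern for the class A above; the table and column names (table_a, table_b, table_c, a_id, val) are made up, and setters on A are assumed to exist:

@Bean
public JdbcCursorItemReader<String> idReader(DataSource dataSource) {
    // Driving query: read only the IDs
    return new JdbcCursorItemReaderBuilder<String>()
            .name("idReader")
            .dataSource(dataSource)
            .sql("SELECT id FROM table_a")
            .rowMapper((rs, rowNum) -> rs.getString("id"))
            .build();
}

@Bean
public ItemProcessor<String, A> detailProcessor(JdbcTemplate jdbcTemplate) {
    // Enrich each ID with the remaining fields of A
    return id -> {
        A a = new A();
        a.setId(id);
        a.setName(jdbcTemplate.queryForObject(
                "SELECT name FROM table_a WHERE id = ?", String.class, id));
        a.setList1(new ArrayList<>(jdbcTemplate.queryForList(
                "SELECT val FROM table_b WHERE a_id = ?", String.class, id)));
        a.setList2(new ArrayList<>(jdbcTemplate.queryForList(
                "SELECT val FROM table_c WHERE a_id = ?", String.class, id)));
        return a;
    };
}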
I am trying to write a job using the Spring Batch framework. The job needs to get data from a clustered DB2 database, call some logic on each fetched record, and then store the transformed data in the same DB (a different table than the one it was read from). I am trying to write step1 as below:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> reader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO> chunk(100).reader(reader)
            .processor(processor).writer(writer).build();
}
Currently, I face two challenges due to the database being DB2 and clustered:
1. The SQL provided for the metadata tables at /org/springframework/batch/core/schema-db2.sql doesn't work for distributed DB2. It fails on the constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY). The queries in this file can be tweaked for distributed DB2, or I can create the tables manually, but I am not sure whether I should create them manually and whether that will have further complications. I need all these tables because I want to use Spring Batch for its PAUSE and RESTART functionality.
2. We need to fire all SELECT queries on DB2 with READ ONLY WITH UR (see this SO question). If we don't run queries with this clause, the DB can get locked. The problem here is that I can't use the built-in reader classes of Spring Batch (JdbcPagingItemReader etc.), as those don't support this DB2-specific clause.
By reading the simplistic examples on the Internet that explain the advantages of this framework, I thought I would be up and running in a very short period, but it looks like I have to write my own query provider classes, research the metadata SQL, and more, since the DB happens to be DB2 and distributed.
Has anybody implemented a similar job for a distributed DB2 database who can guide me on the above points?
I guess that to solve point #1 I will create the tables manually, since I have confirmed in another question that the tables will not get dropped automatically, so recreation will not be needed; a one-time manual activity should be enough.
I will solve point #2 by specifying the isolation level at the transaction level, so WITH UR in SELECT queries will not be needed:
@Autowired
private DataSource dataSource;

@Bean
public TransactionTemplate transactionTemplateUR(PlatformTransactionManager txnManager) {
    TransactionTemplate txnTemplate = new TransactionTemplate();
    // READ_UNCOMMITTED is the equivalent of DB2's WITH UR
    txnTemplate.setIsolationLevelName("ISOLATION_READ_UNCOMMITTED");
    txnTemplate.setTransactionManager(txnManager);
    return txnTemplate;
}

@Bean
public PlatformTransactionManager txnManager(DataSource dataSource) {
    DataSourceTransactionManager txnManager = new DataSourceTransactionManager();
    txnManager.setDataSource(dataSource);
    return txnManager;
}
spring-data-mongodb: how can I dynamically create a database in Mongo using the spring-data-mongodb library?
I am trying to use the Spring Data MongoDB module for CRUD operations against a Mongo database, and going through examples and articles my assumption is that the database name should be pre-defined in the Spring context XML when defining the MongoTemplate bean.
In my case I have a multi-tenant application that will accept requests over HTTP, and my application should create the Mongo database on the fly, use the name provided in the incoming HTTP request to create the database, and then load the data into a collection in the newly created database.
I am trying to figure out if there is a way to dynamically populate the database name in MongoTemplate or MongoRepository without having to provide it in the Spring context XML.
Have you tried the following, instead of going through the pre-defined Spring context configuration?
MongoTemplate getMongoTemplate(Mongo mongo, String database) {
    // A MongoTemplate can be built programmatically with any database name
    return new MongoTemplate(mongo, database);
}
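Building on that idea, a minimal multi-tenant sketch that creates and caches one MongoTemplate per database name taken from the request; the newer MongoClient driver API is assumed here, and Mongo creates the database lazily on first write:

private final MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017");
private final Map<String, MongoTemplate> templates = new ConcurrentHashMap<>();

public MongoTemplate templateFor(String databaseName) {
    // Reuse the tenant's template if present, otherwise create one on the fly
    return templates.computeIfAbsent(databaseName,
            db -> new MongoTemplate(mongoClient, db));
}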