I am working on a Spring Batch application. It reads data from a database, processes it, and sends it to Kafka.
I need to read data from two tables that are in a parent-child relationship, like:
Parent :
- Id, Name
Child :
- Id, Name, Parent_Id
I am using a JpaPagingItemReader. I read the Parent table data in the reader and fetch the Child data in the processor.
@Autowired
private JpaTransactionManager transactionManager;
@PersistenceContext
private EntityManager em;
public ItemStreamReader<Parent> reader() {
JpaPagingItemReader<Parent> itemReader = new JpaPagingItemReader<>();
try {
String sqlQuery = "SELECT * FROM PARENT";
JpaNativeQueryProvider<Parent> queryProvider = new JpaNativeQueryProvider<Parent>();
queryProvider.setSqlQuery(sqlQuery);
queryProvider.setEntityClass(Parent.class);
queryProvider.afterPropertiesSet();
itemReader.setEntityManagerFactory(em.getEntityManagerFactory());
itemReader.setPageSize(100);
itemReader.setQueryProvider(queryProvider);
itemReader.afterPropertiesSet();
itemReader.setSaveState(true);
}
catch (Exception e) {
System.out.println("BatchConfiguration.reader() ==> error " + e.getMessage());
}
return itemReader;
}
@Autowired
private ChildRepository childRepository;
@Bean
public ItemProcessor<Parent,ParentVO> opptyProcess() {
return new ItemProcessor<Parent, ParentVO>() {
@Override
public ParentVO process(Parent parent) throws JsonProcessingException {
ParentVO parentVO = new ParentVO();
parentVO.setId(parent.getId());
parentVO.setName(parent.getName());
List<Child> childList = childRepository.findByParentId(parent.getId());
if(childList != null && childList.size() > 0) {
for(Child child :childList) {
ChildVO childVO = new ChildVO();
childVO.setId(child.getId());
childVO.setName(child.getName());
childVO.setParentId(child.getParentId());
parentVO.getChildList().add(childVO);
}
}
return parentVO;
}
};
}
@Bean
public Step step() {
return stepBuilderFactory.get("step1")
.<Parent, ParentVO>chunk(100)
.reader(reader())
.processor(opptyProcess())
.writer(writer)
.taskExecutor(threadPool)
.transactionManager(transactionManager)
.throttleLimit(10)
.build();
}
I am testing this app with 20k records. The performance is very slow: every minute it can read/process/write only 100 records. If I comment out the lines below, it takes 2 minutes to complete the job.
List<Child> childList = childRepository.findByParentId(parent.getId());
if(childList != null && childList.size() > 0) {
for(Child child :childList) {
ChildVO childVO = new ChildVO();
childVO.setId(child.getId());
childVO.setName(child.getName());
childVO.setParentId(child.getParentId());
parentVO.getChildList().add(childVO);
}
}
What other way can I use to get the Child table data and make this job faster?
You are basically doing a join operation on the application side, and that's the cause of your performance issue.
The pattern you are implementing is similar to the driving query pattern, where a processor is used to enrich items returned by the reader. This pattern is known to suffer from the n+1 problem, which performs poorly in some circumstances.
I recommend you do the join on the database side. Relational database systems are well optimized for this kind of operation, and if your application grabs data that is already joined on the DB side, you will notice a big performance boost.
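For example, here is a minimal sketch of a reader that lets the database do the join. It assumes your Parent entity has a @OneToMany List<Child> children mapping (not shown in your post) and that you are on Spring Batch 4.3+ so you can use JpaCursorItemReader; with it, each Parent arrives with its children already loaded and the per-item childRepository.findByParentId call disappears:

@Bean
public JpaCursorItemReader<Parent> joinedReader(EntityManagerFactory entityManagerFactory) {
    return new JpaCursorItemReaderBuilder<Parent>()
            .name("joinedReader")
            .entityManagerFactory(entityManagerFactory)
            // DISTINCT avoids duplicate parents caused by the join
            .queryString("SELECT DISTINCT p FROM Parent p LEFT JOIN FETCH p.children ORDER BY p.id")
            .build();
}

A cursor-based reader is used here because combining a fetch join with a paging reader can make Hibernate apply the pagination in memory. The processor then only maps the Parent and its already-loaded children to ParentVO/ChildVO, without any extra queries.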
This is the Step
@Bean
public Step processEodBatchUpdateActualTableStep() {
log.debug("[processEodBatchJob] Start Update Process for Actual Table");
return stepBuilderFactory.get(JobConfigurationConstants.PROCESS_EOD_FILE_UPDATE_STEP_NAME)
.<ExtensionQRMerchantTrxHistEntity, TransactionHistoryExtEntity>chunk(1000)
.reader(updateItemReader())
.processor(new ExtensionToTrxnHistExtConverter(mapper))
.writer(new UpdateActualTable(dataSource).updateActualTable())
.build();
}
This is the reader
@Bean
public JdbcCursorItemReader<ExtensionQRMerchantTrxHistEntity> updateItemReader(){
log.info("[UPDATE Reader] Read all records from temp table");
JdbcCursorItemReader<ExtensionQRMerchantTrxHistEntity> reader = new JdbcCursorItemReader<>();
reader.setSql("SELECT * FROM ext_qr_merchant_trx_hist eqmth " +
"WHERE EXISTS " +
"(SELECT 1 FROM t_trxn_detail_ext ttde WHERE eqmth.trx_ref_no = ttde.ref_no AND eqmth.trx_amt = ttde.amount " +
"AND eqmth.trx_dt = ttde.trxn_date);");
reader.setDataSource(dataSource);
reader.setFetchSize(10);
reader.setRowMapper(new RowMapper<ExtensionQRMerchantTrxHistEntity>() {
@Override
public ExtensionQRMerchantTrxHistEntity mapRow(@NonNull ResultSet rs, int rowNum) throws SQLException {
ExtensionQRMerchantTrxHistEntity entity = new ExtensionQRMerchantTrxHistEntity();
entity.setTransactionDate(rs.getTimestamp(1));
entity.setTransactionRefNo(rs.getString(2));
entity.setTransactionAmount(rs.getBigDecimal(3));
entity.setQrString(rs.getString(4));
return entity;
}
});
return reader;
}
This is the processor
@Slf4j
@RequiredArgsConstructor
public class ExtensionToTrxnHistExtConverter implements ItemProcessor<ExtensionQRMerchantTrxHistEntity, TransactionHistoryExtEntity> {
private final DuitNowRppDTOMapper mapper;
@Override
public TransactionHistoryExtEntity process(@NonNull ExtensionQRMerchantTrxHistEntity entity) throws Exception {
log.info("[Processor] Setting ExtensionQRMerchantTrxHistEntity to TransactionHistoryExtEntity");
return setTransactionHistory(entity);
}
private TransactionHistoryExtEntity setTransactionHistory(ExtensionQRMerchantTrxHistEntity tempEntity){
//Set output
TransactionHistoryExtEntity outputEntity =new TransactionHistoryExtEntity();
//Parse QR String
DuitNowRppDTO dto = mapper.mapFromQRDestination(tempEntity.getQrString());
//Set current date
Date now = new Date();
//Set field for Insert new record
UUID uuid = UUID.randomUUID();
outputEntity.setId(uuid);
outputEntity.setCreateDate(now);
outputEntity.setCreateBy(Constants.SYSTEM);
//Set field for updating record
outputEntity.setUpdateDate(now);
outputEntity.setUpdateBy(Constants.SYSTEM);
//replace field from temp table
outputEntity.setCurrencyCode(dto.getTransactionCurrencyCode());
outputEntity.setTransactionDate(tempEntity.getTransactionDate());
outputEntity.setReferenceNumber(tempEntity.getTransactionRefNo());
outputEntity.setAmount(tempEntity.getTransactionAmount());
return outputEntity;
}
}
This is the writer
@Slf4j
@RequiredArgsConstructor
public class UpdateActualTable {
private final DataSource dataSource;
public JdbcBatchItemWriter<TransactionHistoryExtEntity> updateActualTable() {
log.info("[Update] Using Batch Item Writer to UPDATE to Actual Table");
JdbcBatchItemWriter<TransactionHistoryExtEntity> itemWriter = new JdbcBatchItemWriter<>();
itemWriter.setDataSource(dataSource);
itemWriter.setSql("UPDATE t_trxn_detail_ext " +
"SET " +
"update_by = ?, update_dt = ? " +
"WHERE ref_no = ? AND amount = ? AND trxn_date = ?");
itemWriter.setItemPreparedStatementSetter((entity, preparedStatement) -> {
// insert
preparedStatement.setString(1, entity.getUpdateBy());
preparedStatement.setString(2, entity.getUpdateDate().toString());
//where
preparedStatement.setString(3, entity.getReferenceNumber());
preparedStatement.setBigDecimal(4, entity.getAmount());
preparedStatement.setString(5, entity.getTransactionDate().toString());
});
return itemWriter;
}
}
The performance of updating 100k records is slow compared to inserting 100k records. I tried changing the update to an insert statement in the writer, and it manages to insert 100k records in 40-45 seconds. The update, however, updates about 1k out of the 100k records every 2 minutes. What is causing this issue?
Does the chunk size (1k in my case) affect the performance? I kept the chunk size constant throughout the testing of inserting and updating, using the same reader and processor.
I would first test the update query outside the Spring Batch job (using a SQL client, for example) to determine whether the performance hit is due to Spring Batch or not.
Typically, updates are slower than inserts (as they require an extra check for the existence of the row to update), but this could also be related to missing indexes on your table.
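For instance, a quick way to isolate the problem is to time a single statement with a plain JdbcTemplate (a rough sketch; the literal values below are placeholders, not real data). If this is already slow, the issue is on the database side, typically a missing index covering ref_no, amount and trxn_date, rather than in Spring Batch:

JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
long start = System.nanoTime();
// same statement as the writer, executed once with placeholder values
int updated = jdbcTemplate.update(
        "UPDATE t_trxn_detail_ext SET update_by = ?, update_dt = ? " +
        "WHERE ref_no = ? AND amount = ? AND trxn_date = ?",
        "SYSTEM", "2021-01-01 00:00:00.0", "SOME-REF-NO", new BigDecimal("10.00"), "2021-01-01 00:00:00.0");
System.out.println(updated + " row(s) updated in " + (System.nanoTime() - start) / 1_000_000 + " ms");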
I have developed a Spring Batch job which reads from a Kafka topic using the KafkaItemReader class. I want to commit the offset only when the messages read in a defined chunk are processed and written successfully to an output .dat file.
@Bean
public Job kafkaEventReformatjob(
@Qualifier("MaintStep") Step MainStep,
@Qualifier("moveFileToFolder") Step moveFileToFolder,
@Qualifier("compressFile") Step compressFile,
JobExecutionListener listener)
{
return jobBuilderFactory.get("kafkaEventReformatJob")
.listener(listener)
.incrementer(new RunIdIncrementer())
.flow(MainStep)
.next(moveFileToFolder)
.next(compressFile)
.end()
.build();
}
@Bean
Step MainStep(
ItemProcessor<IncomingRecord, List<Record>> flatFileItemProcessor,
ItemWriter<List<Record>> flatFileWriter)
{
return stepBuilderFactory.get("mainStep")
.<IncomingRecord, List<Record>> chunk(5000)
.reader(kafkaItemReader())
.processor(flatFileItemProcessor)
.writer(writer())
.listener(basicStepListener)
.build();
}
//Reader reads all the messages from the Kafka topic and returns them as IncomingRecord objects.
@Bean
KafkaItemReader<String, IncomingRecord> kafkaItemReader() {
Properties props = new Properties();
props.putAll(this.properties.buildConsumerProperties());
List<Integer> partitions = new ArrayList<>();
partitions.add(0);
partitions.add(1);
return new KafkaItemReaderBuilder<String, IncomingRecord>()
.partitions(partitions)
.consumerProperties(props)
.name("records")
.saveState(true)
.topic(topic)
.pollTimeout(Duration.ofSeconds(40L))
.build();
}
@Bean
public ItemWriter<List<Record>> writer() {
ListUnpackingItemWriter<Record> listUnpackingItemWriter = new ListUnpackingItemWriter<>();
listUnpackingItemWriter.setDelegate(flatWriter());
return listUnpackingItemWriter;
}
public ItemWriter<Record> flatWriter() {
FlatFileItemWriter<Record> fileWriter = new FlatFileItemWriter<>();
String tempFileName = "abc";
LOGGER.info("Output File name " + tempFileName + " is in working directory ");
String workingDir = service.getWorkingDir().toAbsolutePath().toString();
Path outputFile = Paths.get(workingDir, tempFileName);
fileWriter.setName("fileWriter");
fileWriter.setResource(new FileSystemResource(outputFile.toString()));
fileWriter.setLineAggregator(lineAggregator());
fileWriter.setForceSync(true);
fileWriter.setFooterCallback(customFooterCallback());
fileWriter.close();
LOGGER.info("Successfully created the file writer");
return fileWriter;
}
@StepScope
@Bean
public TransformProcessor processor() {
return new TransformProcessor();
}
==============================================================================
Writer Class
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
this.stepExecution = stepExecution;
}
@AfterStep
public void afterStep(StepExecution stepExecution) {
this.stepExecution.setWriteCount(count);
}
@Override
public void write(final List<? extends List<Record>> lists) throws Exception {
List<Record> consolidatedList = new ArrayList<>();
for (List<Record> list : lists) {
if (list != null && !list.isEmpty())
consolidatedList.addAll(list);
}
delegate.write(consolidatedList);
count += consolidatedList.size(); // to count Trailer record count
}
===============================================================
Item Processor
@Override
public List<Record> process(IncomingRecord record) {
List<Record> recordList = new ArrayList<>();
if (null != record.getEventName() /* and a few other conditions inside this section */) {
// setting values of the Record class by extracting from the IncomingRecord
recordList.add(/* the valid records which match the condition */);
} else {
return null;
}
return recordList;
}
Synchronizing a read operation and a write operation between two transactional resources (a queue and a database for instance)
is possible by using a JTA transaction manager that coordinates both transaction managers (2PC protocol).
However, this approach is not possible if one of the resources is not transactional (like the majority of file systems). So unless you use
a transactional file system and a JTA transaction manager that coordinates a Kafka transaction manager and a file system transaction manager,
you need another approach, like the Compensating Transaction pattern. In your case, the "undo" operation (compensating action) would be rewinding the offset to where it was before the failed chunk.
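To illustrate the compensating action, here is a rough conceptual sketch (the consumer handle and the offset bookkeeping are assumptions made for the example; KafkaItemReader does not expose its internal consumer): a ChunkListener records the offsets before each chunk and seeks back to them when the chunk fails.

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

public class OffsetRewindingChunkListener implements ChunkListener {

    // hypothetical: a consumer you manage yourself, not the one inside KafkaItemReader
    private final KafkaConsumer<String, IncomingRecord> consumer;
    private final Map<TopicPartition, Long> offsetsBeforeChunk = new HashMap<>();

    public OffsetRewindingChunkListener(KafkaConsumer<String, IncomingRecord> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void beforeChunk(ChunkContext context) {
        // remember where each assigned partition was before the chunk started
        for (TopicPartition partition : consumer.assignment()) {
            offsetsBeforeChunk.put(partition, consumer.position(partition));
        }
    }

    @Override
    public void afterChunkError(ChunkContext context) {
        // the "undo" operation: rewind to the offsets recorded before the failed chunk
        offsetsBeforeChunk.forEach(consumer::seek);
    }

    @Override
    public void afterChunk(ChunkContext context) {
        // nothing to compensate when the chunk succeeded
    }
}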
I have implemented a Kafka application using the consumer API, and I have 2 regression tests implemented with the Streams API:
To test the happy path: the test produces data into the input topic that the application is listening to; the application consumes it and produces data into the output topic, which the test then consumes and validates against the expected output data.
To test the error path: the behavior is the same as above, although this time the application produces data into the output topic and the test consumes from the application's error topic and validates it against the expected error output.
My code and the regression tests reside in the same project, under the expected directory structure. Both times (for both tests) the data should have been picked up by the same listener on the application side.
The problem is :
When I execute the tests individually (manually), each test passes. However, if I execute them together but sequentially (for example: gradle clean build), only the first test passes. The 2nd test fails after the test-side consumer polls for data and, after some time, gives up without finding any.
Observation:
From debugging, it looks like everything works perfectly the 1st time (test-side and application-side producers and consumers). However, during the 2nd test it seems that the application-side consumer is not receiving any data (the test-side producer appears to be producing data, but I cannot say that for sure), and hence no data is being produced into the error topic.
What I have tried so far:
After investigating, my understanding is that we are running into race conditions, and to avoid that I found suggestions like:
use @DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
Tear down the broker after each test (please see the ".destroy()" on the brokers)
use different topic names for each test
I applied all of them and still could not recover from my issue.
I am providing the code here for perusal. Any insight is appreciated.
Code for 1st test (Testing error path):
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
@EmbeddedKafka(
partitions = 1,
controlledShutdown = false,
topics = {
AdapterStreamProperties.Constants.INPUT_TOPIC,
AdapterStreamProperties.Constants.ERROR_TOPIC
},
brokerProperties = {
"listeners=PLAINTEXT://localhost:9092",
"port=9092",
"log.dir=/tmp/data/logs",
"auto.create.topics.enable=true",
"delete.topic.enable=true"
}
)
public class AbstractIntegrationFailurePathTest {
private final int retryLimit = 0;
@Autowired
protected EmbeddedKafkaBroker embeddedFailurePathKafkaBroker;
//To produce data
@Autowired
protected KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> inputProducerTemplate;
//To read from output error
@Autowired
protected Consumer<PreferredMediaMsgKey, ErrorCmd> outputErrorConsumer;
//Service to execute notification-preference
@Autowired
protected AdapterStreamProperties projectProerties;
protected void subscribe(Consumer consumer, String topic, int attempt) {
try {
embeddedFailurePathKafkaBroker.consumeFromAnEmbeddedTopic(consumer, topic);
} catch (ComparisonFailure ex) {
if (attempt < retryLimit) {
subscribe(consumer, topic, attempt + 1);
}
}
}
}
.
@TestConfiguration
public class AdapterStreamFailurePathTestConfig {
@Autowired
private EmbeddedKafkaBroker embeddedKafkaBroker;
#Value("${spring.kafka.adapter.application-id}")
private String applicationId;
#Value("${spring.kafka.adapter.group-id}")
private String groupId;
//Producer of records that the program consumes
@Bean
public Map<String, Object> sendEmailCmdProducerConfigs() {
Map<String, Object> results = KafkaTestUtils.producerProps(embeddedKafkaBroker);
results.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.KEY_SERDE.serializer().getClass());
results.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.INPUT_VALUE_SERDE.serializer().getClass());
return results;
}
@Bean
public ProducerFactory<PreferredMediaMsgKey, SendEmailCmd> inputProducerFactory() {
return new DefaultKafkaProducerFactory<>(sendEmailCmdProducerConfigs());
}
@Bean
public KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> inputProducerTemplate() {
return new KafkaTemplate<>(inputProducerFactory());
}
//Consumer of the error output, generated by the program
@Bean
public Map<String, Object> outputErrorConsumerConfig() {
Map<String, Object> props = KafkaTestUtils.consumerProps(
applicationId, Boolean.TRUE.toString(), embeddedKafkaBroker);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.KEY_SERDE.deserializer().getClass()
.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.ERROR_VALUE_SERDE.deserializer().getClass()
.getName());
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
return props;
}
@Bean
public Consumer<PreferredMediaMsgKey, ErrorCmd> outputErrorConsumer() {
DefaultKafkaConsumerFactory<PreferredMediaMsgKey, ErrorCmd> rpf =
new DefaultKafkaConsumerFactory<>(outputErrorConsumerConfig());
return rpf.createConsumer(groupId, "notification-failure");
}
}
.
@RunWith(SpringRunner.class)
@SpringBootTest(classes = AdapterStreamFailurePathTestConfig.class)
@ActiveProfiles(profiles = "errtest")
public class ErrorPath400Test extends AbstractIntegrationFailurePathTest {
@Autowired
private DataGenaratorForErrorPath400Test datagen;
@Mock
private AdapterHttpClient httpClient;
@Autowired
private ErroredEmailCmdDeserializer erroredEmailCmdDeserializer;
@Before
public void setup() throws InterruptedException {
Mockito.when(httpClient.callApi(Mockito.any()))
.thenReturn(
new GenericResponse(
400,
TestConstants.ERROR_MSG_TO_CHK));
Mockito.when(httpClient.createURI(Mockito.any(),Mockito.any(),Mockito.any())).thenCallRealMethod();
inputProducerTemplate.send(
projectProerties.getInputTopic(),
datagen.getKey(),
datagen.getEmailCmdToProduce());
System.out.println("producer: "+ projectProerties.getInputTopic());
subscribe(outputErrorConsumer , projectProerties.getErrorTopic(), 0);
}
@Test
public void testWithError() throws InterruptedException, InvalidProtocolBufferException, TextFormat.ParseException {
ConsumerRecords<PreferredMediaMsgKeyBuf.PreferredMediaMsgKey, ErrorCommandBuf.ErrorCmd> records;
List<ConsumerRecord<PreferredMediaMsgKeyBuf.PreferredMediaMsgKey, ErrorCommandBuf.ErrorCmd>> outputListOfErrors = new ArrayList<>();
int attempt = 0;
int expectedRecords = 1;
do {
records = KafkaTestUtils.getRecords(outputErrorConsumer);
records.forEach(outputListOfErrors::add);
attempt++;
} while (attempt < expectedRecords && outputListOfErrors.size() < expectedRecords);
//Verify the recipient event stream size
Assert.assertEquals(expectedRecords, outputListOfErrors.size());
//Validate output
}
@After
public void tearDown() {
outputErrorConsumer.close();
embeddedFailurePathKafkaBroker.destroy();
}
}
The 2nd test is almost the same in structure, although this time the test-side consumer consumes from the application-side output topic (instead of the error topic), and I named the consumers, broker, producer, and topics differently. Like:
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
@EmbeddedKafka(
partitions = 1,
controlledShutdown = false,
topics = {
AdapterStreamProperties.Constants.INPUT_TOPIC,
AdapterStreamProperties.Constants.OUTPUT_TOPIC
},
brokerProperties = {
"listeners=PLAINTEXT://localhost:9092",
"port=9092",
"log.dir=/tmp/data/logs",
"auto.create.topics.enable=true",
"delete.topic.enable=true"
}
)
public class AbstractIntegrationSuccessPathTest {
private final int retryLimit = 0;
@Autowired
protected EmbeddedKafkaBroker embeddedKafkaBroker;
//To produce data
@Autowired
protected KafkaTemplate<PreferredMediaMsgKey,SendEmailCmd> sendEmailCmdProducerTemplate;
//To read from output regular topic
@Autowired
protected Consumer<PreferredMediaMsgKey, NotifiedEmailCmd> ouputConsumer;
//Service to execute notification-preference
@Autowired
protected AdapterStreamProperties projectProerties;
protected void subscribe(Consumer consumer, String topic, int attempt) {
try {
embeddedKafkaBroker.consumeFromAnEmbeddedTopic(consumer, topic);
} catch (ComparisonFailure ex) {
if (attempt < retryLimit) {
subscribe(consumer, topic, attempt + 1);
}
}
}
}
Please let me know if I should provide any more information.
"port=9092"
Don't use a fixed port; leave that out and the embedded broker will use a random port; the consumer configs are set up in KafkaTestUtils to point to the random port.
You shouldn't need to dirty the context after each test method - use a different group.id for each test and a different topic.
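In other words, each test class would declare something like this (a minimal sketch; the topic names are made up for the example):

@EmbeddedKafka(
        partitions = 1,
        // hypothetical per-test topic names; the second test class would use its own set
        topics = { "input-topic-errtest", "error-topic-errtest" }
        // no "listeners"/"port" broker properties: the embedded broker picks a random port,
        // and KafkaTestUtils.producerProps(...)/consumerProps(...) point to it automatically
)
public class AbstractIntegrationFailurePathTest {
    // producers/consumers wired as before, but each test class uses its own group.id,
    // e.g. rpf.createConsumer("errtest-group", "notification-failure")
}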
In my case the consumer was not closed properly. I had to do :
@After
public void tearDown() {
// shutdown hook to correctly close the streams application
Runtime.getRuntime().addShutdownHook(new Thread(ouputConsumer::close));
}
to resolve.
My question is the reverse of this existing SO question.
The behavior of JdbcPagingItemReader seems to be the reverse of what is described in that question, i.e. the job is marked as FAILED if JdbcPagingItemReader doesn't find any records.
Logs indicate that the job is marked as FAILED because the reader couldn't fetch pages from page 1 onwards, and the SELECT fails due to SQLCODE=-313, i.e.
-313 THE NUMBER OF HOST VARIABLES SPECIFIED IS NOT EQUAL TO THE NUMBER OF PARAMETER MARKERS
So the overall step is marked as failed, leading to the job being failed.
For queries from page 1 onwards, the sort key is included in the SELECT like PAYMENT_ID > ?, and I guess since there are no PAYMENT_IDs, no value is found for the placeholder, hence the error.
How can I ignore this error and mark the job as COMPLETE in this particular scenario?
I tried the solution specified in Trever Shick's answer to the other question, and returning
if(stepExecution.getReadCount() == 0 ){
return ExitStatus.COMPLETED;
}
is not fixing the issue.
Both the chunk size and the reader page size are equal to 10, and THROTTLE_LIMIT=20.
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
ItemReader<RemittanceVO> syncReader, ItemWriter<RemittanceClaimVO> writer,
ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
return stepBuilderFactory.get("step1")
.<RemittanceVO, RemittanceClaimVO> chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
.reader(syncReader)
.listener(afterReadListener)
.processor(processor)
.writer(writer)
.taskExecutor(simpleAsyntaskExecutor)
.throttleLimit(Constants.THROTTLE_LIMIT)
.build();
}
Not the full stack trace, but a line from the logs:
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT PAYMENT_ID,INSURED_LAST_NM,INSURED_FIRST_NM,LAST_NM,FIRST_NM,CONTRACT_NUM,CARRIER_CD,CARRIER_GROUP,CONTRACT_NUM,REL_AR_SEQ_NUM,FROM_DOS_DT,THRU_DOS_DT,BILL_AMT,RX_NUM_BP,CERT_NUM_LEFT_BP,MODIFIER,PROC_CD ,DEPOSIT_ID,PRNT_CONTRACT_NUM,REMIT_TYPE_CD AS RMT_TYPE FROM AR.PAYMENTS WHERE (CONTRACT_NUM IN (SELECT CONTRACT_NUM FROM AR.PAYMENTS WHERE RMTST_RF='M' AND DELETE_IND='N' GROUP BY CONTRACT_NUM ) AND DELETE_IND='N' AND RMTST_RF='M') AND ((PAYMENT_ID > ?)) ORDER BY PAYMENT_ID ASC FETCH FIRST 50 ROWS ONLY]; nested exception is com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-313, SQLSTATE=07004, SQLERRMC=null, DRIVER=4.11.77
Column PAYMENT_ID is my sort key, and the section AND ((PAYMENT_ID > ?)) has been added by Spring Batch.
My reader beans:
@Bean
public ItemReader<RemittanceVO> syncReader() {
SynchronizedItemStreamReader<RemittanceVO> syncReader = new SynchronizedItemStreamReader<RemittanceVO>();
syncReader.setDelegate(reader());
return syncReader;
}
@Bean
public ItemStreamReader<RemittanceVO> reader() {
JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<RemittanceVO>();
reader.setDataSource(dataSource);
reader.setRowMapper(new RemittanceRowMapper());
reader.setQueryProvider(queryProvider);
reader.setPageSize(Constants.SPRING_BATCH_READER_PAGE_SIZE);
return reader;
}
Query provider bean:
@Bean
public PagingQueryProvider queryProvider() throws Exception{
SqlPagingQueryProviderFactoryBean queryProviderBean= new SqlPagingQueryProviderFactoryBean();
queryProviderBean.setDataSource(dataSource);
queryProviderBean.setDatabaseType("DB2");
queryProviderBean.setSelectClause(Constants.REMITTANCES_SELECT_CLAUSE);
queryProviderBean.setFromClause(Constants.REMITTANCES_FROM_CLAUSE);
queryProviderBean.setWhereClause(Constants.REMITTANCES_WHERE_CLAUSE);
queryProviderBean.setSortKey(Constants.REMITTANCES_SORT_KEY);
PagingQueryProvider queryProvider = queryProviderBean.getObject();
return queryProvider;
}
I cannot recall for sure, but I think I did the below in the step and job listeners.
I basically wrote a StepExecutionListener and overrode the afterStep method as below:
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
if (stepExecution.getReadCount() == 0) {
logger.info(
"!!! Step is marked as FAILED because no rows OR no valid rows were read by reader of this step !");
ExitStatus newExitStatus = ExitStatus.COMPLETED;
stepExecution.setExitStatus(newExitStatus);
return newExitStatus;
}
return null;
}
I also wrote a job-level listener extending JobExecutionListenerSupport and put something like this in the overridden afterJob method:
@Override
public void afterJob(JobExecution jobExecution) {
.......
.......
.......
.......
boolean zeroRead = true;
for (StepExecution stepExecution : stepExecutions) {
if (stepExecution.getReadCount() != 0) {
zeroRead = false;
}
}
if (zeroRead) {
logger.info("***** JOB is FAILED because read count is Zero *****");
jobExecution.setExitStatus(ExitStatus.COMPLETED);
return;
}
I am trying to launch a job in Spring Batch 2, and I need to pass some information in the job parameters, but I do not want it to count for the uniqueness of the job instance. For example, I'd want these two sets of parameters to be considered unique:
file=/my/file/path,session=1234
file=/my/file/path,session=5678
The idea is that there will be two different servers trying to start the same job, but with different sessions attached to them. I need that session number in both cases. Any ideas?
Thanks!
So, if 'file' is the only attribute that's supposed to be unique and 'session' is used by downstream code, then your problem matches almost exactly what I had. I had a JMSCorrelationId that I needed to store in the execution context for later use, and I didn't want it to play into the job parameters' uniqueness. Per Dave Syer, this really wasn't possible, so I took the route of creating the job with the parameters (not the 'session' in your case), and then adding the 'session' attribute to the execution context before anything actually runs.
This gave me access to 'session' downstream but it was not in the job parameters so it didn't affect uniqueness.
References
https://jira.springsource.org/browse/BATCH-1412
http://forum.springsource.org/showthread.php?104440-Non-Identity-Job-Parameters&highlight=
You'll see from this forum that there's no good way to do it (per Dave Syer), but I wrote my own launcher based on the SimpleJobLauncher (in fact I delegate to the SimpleJobLauncher if the non-overloaded method is called) that has an overloaded method for starting a job that takes a callback interface, which allows contribution of parameters to the execution context while not being 'true' job parameters. You could do something very similar.
I think the applicable LOC for you is right here:
jobExecution = jobRepository.createJobExecution(job.getName(),
jobParameters);
if (contributor != null) {
if (contributor.contributeTo(jobExecution.getExecutionContext())) {
jobRepository.updateExecutionContext(jobExecution);
}
}
which is where, after the job execution is created, the execution context is contributed to. Hopefully this helps you in your implementation.
public class ControlMJobLauncher implements JobLauncher, InitializingBean {
private JobRepository jobRepository;
private TaskExecutor taskExecutor;
private SimpleJobLauncher simpleLauncher;
private JobFilter jobFilter;
public void setJobRepository(JobRepository jobRepository) {
this.jobRepository = jobRepository;
}
public void setTaskExecutor(TaskExecutor taskExecutor) {
this.taskExecutor = taskExecutor;
}
/**
* Optional filter to prevent job launching based on some specific criteria.
* Jobs that are filtered out will return success to ControlM, but will not run
*/
public void setJobFilter(JobFilter jobFilter) {
this.jobFilter = jobFilter;
}
public JobExecution run(final Job job, final JobParameters jobParameters, ExecutionContextContributor contributor)
throws JobExecutionAlreadyRunningException, JobRestartException,
JobInstanceAlreadyCompleteException, JobParametersInvalidException, JobFilteredException {
Assert.notNull(job, "The Job must not be null.");
Assert.notNull(jobParameters, "The JobParameters must not be null.");
//See if job is filtered
if(this.jobFilter != null && !jobFilter.launchJob(job, jobParameters)) {
throw new JobFilteredException(String.format("Job has been filtered by the filter: %s", jobFilter.getFilterName()));
}
final JobExecution jobExecution;
JobExecution lastExecution = jobRepository.getLastJobExecution(job.getName(), jobParameters);
if (lastExecution != null) {
if (!job.isRestartable()) {
throw new JobRestartException("JobInstance already exists and is not restartable");
}
logger.info(String.format("Restarting job %s instance %d", job.getName(), lastExecution.getId()));
}
// Check the validity of the parameters before doing creating anything
// in the repository...
job.getJobParametersValidator().validate(jobParameters);
/*
* There is a very small probability that a non-restartable job can be
* restarted, but only if another process or thread manages to launch
* <i>and</i> fail a job execution for this instance between the last
* assertion and the next method returning successfully.
*/
jobExecution = jobRepository.createJobExecution(job.getName(),
jobParameters);
if (contributor != null) {
if (contributor.contributeTo(jobExecution.getExecutionContext())) {
jobRepository.updateExecutionContext(jobExecution);
}
}
try {
taskExecutor.execute(new Runnable() {
public void run() {
try {
logger.info("Job: [" + job
+ "] launched with the following parameters: ["
+ jobParameters + "]");
job.execute(jobExecution);
logger.info("Job: ["
+ job
+ "] completed with the following parameters: ["
+ jobParameters
+ "] and the following status: ["
+ jobExecution.getStatus() + "]");
} catch (Throwable t) {
logger.warn(
"Job: ["
+ job
+ "] failed unexpectedly and fatally with the following parameters: ["
+ jobParameters + "]", t);
rethrow(t);
}
}
private void rethrow(Throwable t) {
if (t instanceof RuntimeException) {
throw (RuntimeException) t;
} else if (t instanceof Error) {
throw (Error) t;
}
throw new IllegalStateException(t);
}
});
} catch (TaskRejectedException e) {
jobExecution.upgradeStatus(BatchStatus.FAILED);
if (jobExecution.getExitStatus().equals(ExitStatus.UNKNOWN)) {
jobExecution.setExitStatus(ExitStatus.FAILED
.addExitDescription(e));
}
jobRepository.update(jobExecution);
}
return jobExecution;
}
static interface ExecutionContextContributor {
boolean CONTRIBUTED_SOMETHING = true;
boolean CONTRIBUTED_NOTHING = false;
/**
*
* @param executionContext
* @return true if the execution context was contributed to
*/
public boolean contributeTo(ExecutionContext executionContext);
}
@Override
public void afterPropertiesSet() throws Exception {
Assert.state(jobRepository != null, "A JobRepository has not been set.");
if (taskExecutor == null) {
logger.info("No TaskExecutor has been set, defaulting to synchronous executor.");
taskExecutor = new SyncTaskExecutor();
}
this.simpleLauncher = new SimpleJobLauncher();
this.simpleLauncher.setJobRepository(jobRepository);
this.simpleLauncher.setTaskExecutor(taskExecutor);
this.simpleLauncher.afterPropertiesSet();
}
@Override
public JobExecution run(Job job, JobParameters jobParameters)
throws JobExecutionAlreadyRunningException, JobRestartException,
JobInstanceAlreadyCompleteException, JobParametersInvalidException {
return simpleLauncher.run(job, jobParameters);
}
}
Starting from Spring Batch 2.2.x, there is support for non-identifying parameters. If you are using CommandLineJobRunner, you can specify non-identifying parameters with a '-' prefix.
For example:
java org.springframework.batch.core.launch.support.CommandLineJobRunner file=/my/file/path -session=5678
If you are using an older version of Spring Batch, you need to migrate your database schema. See the 'Migrating to 2.x.x' section at http://docs.spring.io/spring-batch/getting-started.html.
This is the Jira page of the feature: https://jira.springsource.org/browse/BATCH-1412, and here are the changes that implement it: https://fisheye.springsource.org/changelog/spring-batch?cs=557515df45c0f596588418d53c3f2bae3781c1c3
In more recent versions of Spring Batch (I am using spring-batch-core:4.3.3), you can use the JobParametersBuilder to specify whether a parameter is identifying or not. For example:
new JobParametersBuilder()
.addString("identifying-param-name", paramValue1)
.addString("non-identifying-param-name", paramValue2, false)
.toJobParameters();
The 'false' in the third argument makes the parameter non-identifying.