Why is my update so slow with JdbcBatchItemWriter? - postgresql

This is the Step
@Bean
public Step processEodBatchUpdateActualTableStep() {
    log.debug("[processEodBatchJob] Start Update Process for Actual Table");
    return stepBuilderFactory.get(JobConfigurationConstants.PROCESS_EOD_FILE_UPDATE_STEP_NAME)
            .<ExtensionQRMerchantTrxHistEntity, TransactionHistoryExtEntity>chunk(1000)
            .reader(updateItemReader())
            .processor(new ExtensionToTrxnHistExtConverter(mapper))
            .writer(new UpdateActualTable(dataSource).updateActualTable())
            .build();
}
This is the reader
@Bean
public JdbcCursorItemReader<ExtensionQRMerchantTrxHistEntity> updateItemReader() {
    log.info("[UPDATE Reader] Read all records from temp table");
    JdbcCursorItemReader<ExtensionQRMerchantTrxHistEntity> reader = new JdbcCursorItemReader<>();
    reader.setSql("SELECT * FROM ext_qr_merchant_trx_hist eqmth " +
            "WHERE EXISTS " +
            "(SELECT 1 FROM t_trxn_detail_ext ttde WHERE eqmth.trx_ref_no = ttde.ref_no AND eqmth.trx_amt = ttde.amount " +
            "AND eqmth.trx_dt = ttde.trxn_date);");
    reader.setDataSource(dataSource);
    reader.setFetchSize(10);
    reader.setRowMapper(new RowMapper<ExtensionQRMerchantTrxHistEntity>() {
        @Override
        public ExtensionQRMerchantTrxHistEntity mapRow(@NonNull ResultSet rs, int rowNum) throws SQLException {
            ExtensionQRMerchantTrxHistEntity entity = new ExtensionQRMerchantTrxHistEntity();
            entity.setTransactionDate(rs.getTimestamp(1));
            entity.setTransactionRefNo(rs.getString(2));
            entity.setTransactionAmount(rs.getBigDecimal(3));
            entity.setQrString(rs.getString(4));
            return entity;
        }
    });
    return reader;
}
This is the processor
@Slf4j
@RequiredArgsConstructor
public class ExtensionToTrxnHistExtConverter implements ItemProcessor<ExtensionQRMerchantTrxHistEntity, TransactionHistoryExtEntity> {

    private final DuitNowRppDTOMapper mapper;

    @Override
    public TransactionHistoryExtEntity process(@NonNull ExtensionQRMerchantTrxHistEntity entity) throws Exception {
        log.info("[Processor] Setting ExtensionQRMerchantTrxHistEntity to TransactionHistoryExtEntity");
        return setTransactionHistory(entity);
    }

    private TransactionHistoryExtEntity setTransactionHistory(ExtensionQRMerchantTrxHistEntity tempEntity) {
        // Set output
        TransactionHistoryExtEntity outputEntity = new TransactionHistoryExtEntity();
        // Parse QR String
        DuitNowRppDTO dto = mapper.mapFromQRDestination(tempEntity.getQrString());
        // Set current date
        Date now = new Date();
        // Set fields for inserting a new record
        UUID uuid = UUID.randomUUID();
        outputEntity.setId(uuid);
        outputEntity.setCreateDate(now);
        outputEntity.setCreateBy(Constants.SYSTEM);
        // Set fields for updating a record
        outputEntity.setUpdateDate(now);
        outputEntity.setUpdateBy(Constants.SYSTEM);
        // Replace fields from the temp table
        outputEntity.setCurrencyCode(dto.getTransactionCurrencyCode());
        outputEntity.setTransactionDate(tempEntity.getTransactionDate());
        outputEntity.setReferenceNumber(tempEntity.getTransactionRefNo());
        outputEntity.setAmount(tempEntity.getTransactionAmount());
        return outputEntity;
    }
}
This is the writer
@Slf4j
@RequiredArgsConstructor
public class UpdateActualTable {

    private final DataSource dataSource;

    public JdbcBatchItemWriter<TransactionHistoryExtEntity> updateActualTable() {
        log.info("[Update] Using Batch Item Writer to UPDATE to Actual Table");
        JdbcBatchItemWriter<TransactionHistoryExtEntity> itemWriter = new JdbcBatchItemWriter<>();
        itemWriter.setDataSource(dataSource);
        itemWriter.setSql("UPDATE t_trxn_detail_ext " +
                "SET " +
                "update_by = ?, update_dt = ? " +
                "WHERE ref_no = ? AND amount = ? AND trxn_date = ?");
        itemWriter.setItemPreparedStatementSetter((entity, preparedStatement) -> {
            // SET clause
            preparedStatement.setString(1, entity.getUpdateBy());
            preparedStatement.setString(2, entity.getUpdateDate().toString());
            // WHERE clause
            preparedStatement.setString(3, entity.getReferenceNumber());
            preparedStatement.setBigDecimal(4, entity.getAmount());
            preparedStatement.setString(5, entity.getTransactionDate().toString());
        });
        return itemWriter;
    }
}
Updating 100k records is slow compared to inserting 100k records. I tried changing the update to an insert statement in the writer, and it manages to insert 100k records in about 40-45 seconds. The update, however, only processes about 1k of the 100k records every 2 minutes. What is causing this issue?
Does the chunk size (1k in my case) affect the performance? I kept the chunk size constant throughout the testing of inserting and updating, using the same reader and processor.

I would first test the update query outside the Spring Batch job (using a SQL client, for example) to determine whether the performance hit comes from Spring Batch or not.
Typically, updates are slower than inserts (as they require an extra check for the existence of the row to update), but a slowdown of this magnitude is more likely related to missing indexes on your table.
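For example, here is a minimal sketch of how the same UPDATE could be timed outside Spring Batch with a plain JdbcTemplate. The table and column names are taken from the question; the index statement is only an assumption about what might be missing and should be validated against the actual schema and query plan:
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class UpdateTimingCheck {

    private final JdbcTemplate jdbcTemplate;

    public UpdateTimingCheck(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Candidate index covering the WHERE clause of the UPDATE (assumption: no such index exists yet).
    public void createSupportingIndex() {
        jdbcTemplate.execute(
                "CREATE INDEX IF NOT EXISTS idx_trxn_detail_ext_ref_amt_dt " +
                "ON t_trxn_detail_ext (ref_no, amount, trxn_date)");
    }

    // Run the same batched UPDATE directly, bypassing Spring Batch, and measure the elapsed time.
    // Each Object[] holds the five parameters in the same order as the statement's placeholders.
    public long timeBatchUpdate(List<Object[]> parameters) {
        long start = System.currentTimeMillis();
        jdbcTemplate.batchUpdate(
                "UPDATE t_trxn_detail_ext SET update_by = ?, update_dt = ? " +
                "WHERE ref_no = ? AND amount = ? AND trxn_date = ?", parameters);
        return System.currentTimeMillis() - start;
    }
}
If the raw batched UPDATE is just as slow, the bottleneck is on the database side (most likely the WHERE clause scanning the table for every row); if it is fast, the batch job configuration deserves a closer look.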

Related

Spring Batch taking a long time to complete the job

I am working on a Spring Batch application. This application reads data from a DB, processes it, and sends it to Kafka.
I need to read data from two tables in a parent-child relationship.
Like:
Parent :
- Id, Name
Child :
- Id, Name, Parent_Id
I am using JpaPagingItemReader. I am reading the Parent table data in the reader and the Child table data in the processor.
@Autowired
private JpaTransactionManager transactionManager;

@PersistenceContext
private EntityManager em;

public ItemStreamReader<Parent> reader() {
    JpaPagingItemReader<Parent> itemReader = new JpaPagingItemReader<>();
    try {
        String sqlQuery = "SELECT * FROM PARENT";
        JpaNativeQueryProvider<Parent> queryProvider = new JpaNativeQueryProvider<Parent>();
        queryProvider.setSqlQuery(sqlQuery);
        queryProvider.setEntityClass(Parent.class);
        queryProvider.afterPropertiesSet();
        itemReader.setEntityManagerFactory(em.getEntityManagerFactory());
        itemReader.setPageSize(100);
        itemReader.setQueryProvider(queryProvider);
        itemReader.afterPropertiesSet();
        itemReader.setSaveState(true);
    } catch (Exception e) {
        System.out.println("BatchConfiguration.reader() ==> error " + e.getMessage());
    }
    return itemReader;
}
@Autowired
private ChildRepository childRepository;

@Bean
public ItemProcessor<Parent, ParentVO> opptyProcess() {
    return new ItemProcessor<Parent, ParentVO>() {
        @Override
        public ParentVO process(Parent parent) throws JsonProcessingException {
            ParentVO parentVO = new ParentVO();
            parentVO.setId(parent.getId());
            parentVO.setName(parent.getName());
            List<Child> childList = childRepository.findByParentId(parent.getId());
            if (childList != null && !childList.isEmpty()) {
                for (Child child : childList) {
                    ChildVO childVO = new ChildVO();
                    childVO.setId(child.getId());
                    childVO.setName(child.getName());
                    childVO.setParentId(child.getParentId());
                    parentVO.getChildList().add(childVO);
                }
            }
            return parentVO;
        }
    };
}
@Bean
public Step step() {
    return stepBuilderFactory.get("step1")
            .<Parent, ParentVO>chunk(100)
            .reader(reader())
            .processor(process())
            .writer(writer)
            .taskExecutor(threadPool)
            .transactionManager(transactionManager)
            .throttleLimit(10)
            .build();
}
I am testing this app with 20k records. The performance is very slow: it can read/process/write only 100 records per minute. If I comment out the lines below, the job completes in 2 minutes.
List<Child> childList = childRepository.findByParentId(parent.getId());
if (childList != null && !childList.isEmpty()) {
    for (Child child : childList) {
        ChildVO childVO = new ChildVO();
        childVO.setId(child.getId());
        childVO.setName(child.getName());
        childVO.setParentId(child.getParentId());
        parentVO.getChildList().add(childVO);
    }
}
What other way can I get the Child table data and make this job faster?
You are basically doing a join operation on the application side, and that is the cause of your performance issue.
The pattern you are implementing is similar to the driving query pattern, where a processor is used to enrich the items returned by the reader. This pattern is known to suffer from the n+1 problem, which performs poorly in some circumstances.
I recommend you do the join on the database side. Relational database systems are well optimized for this kind of operation, and if your application grabs the data already joined on the DB side, you will notice a big performance boost.
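As an illustration only, here is a sketch of what a join-based reader could look like, assuming the Parent entity maps its children as a childList association and Spring Batch 4.3+ is available (the names are assumptions, not the asker's actual code):
import javax.persistence.EntityManagerFactory;
import org.springframework.batch.item.database.JpaCursorItemReader;
import org.springframework.batch.item.database.builder.JpaCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;

@Bean
public JpaCursorItemReader<Parent> joinedParentReader(EntityManagerFactory entityManagerFactory) {
    // Fetch parents together with their children in a single query on the database side,
    // so the processor no longer has to issue one child query per parent.
    return new JpaCursorItemReaderBuilder<Parent>()
            .name("joinedParentReader")
            .entityManagerFactory(entityManagerFactory)
            .queryString("SELECT DISTINCT p FROM Parent p LEFT JOIN FETCH p.childList")
            .build();
}
The processor can then build the ParentVO from the already-loaded children instead of calling childRepository for each parent.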

How to commit the offsets when using KafkaItemReader in a Spring Batch job, once all the messages are processed and written to the .dat file?

I have developed a Spring Batch job which reads from a Kafka topic using the KafkaItemReader class. I want to commit the offsets only when the messages read in a defined chunk are processed and written successfully to an output .dat file.
@Bean
public Job kafkaEventReformatjob(
        @Qualifier("MaintStep") Step MainStep,
        @Qualifier("moveFileToFolder") Step moveFileToFolder,
        @Qualifier("compressFile") Step compressFile,
        JobExecutionListener listener) {
    return jobBuilderFactory.get("kafkaEventReformatJob")
            .listener(listener)
            .incrementer(new RunIdIncrementer())
            .flow(MainStep)
            .next(moveFileToFolder)
            .next(compressFile)
            .end()
            .build();
}

@Bean
Step MainStep(
        ItemProcessor<IncomingRecord, List<Record>> flatFileItemProcessor,
        ItemWriter<List<Record>> flatFileWriter) {
    return stepBuilderFactory.get("mainStep")
            .<InputRecord, List<Record>>chunk(5000)
            .reader(kafkaItemReader())
            .processor(flatFileItemProcessor)
            .writer(writer())
            .listener(basicStepListener)
            .build();
}
// Reader reads all the messages from the Kafka topic and returns them as IncomingRecord objects.
@Bean
KafkaItemReader<String, IncomingRecord> kafkaItemReader() {
    Properties props = new Properties();
    props.putAll(this.properties.buildConsumerProperties());
    List<Integer> partitions = new ArrayList<>();
    partitions.add(0);
    partitions.add(1);
    return new KafkaItemReaderBuilder<String, IncomingRecord>()
            .partitions(partitions)
            .consumerProperties(props)
            .name("records")
            .saveState(true)
            .topic(topic)
            .pollTimeout(Duration.ofSeconds(40L))
            .build();
}
@Bean
public ItemWriter<List<Record>> writer() {
    ListUnpackingItemWriter<Record> listUnpackingItemWriter = new ListUnpackingItemWriter<>();
    listUnpackingItemWriter.setDelegate(flatWriter());
    return listUnpackingItemWriter;
}

public ItemWriter<Record> flatWriter() {
    FlatFileItemWriter<Record> fileWriter = new FlatFileItemWriter<>();
    String tempFileName = "abc";
    LOGGER.info("Output File name " + tempFileName + " is in working directory ");
    String workingDir = service.getWorkingDir().toAbsolutePath().toString();
    Path outputFile = Paths.get(workingDir, tempFileName);
    fileWriter.setName("fileWriter");
    fileWriter.setResource(new FileSystemResource(outputFile.toString()));
    fileWriter.setLineAggregator(lineAggregator());
    fileWriter.setForceSync(true);
    fileWriter.setFooterCallback(customFooterCallback());
    fileWriter.close();
    LOGGER.info("Successfully created the file writer");
    return fileWriter;
}

@StepScope
@Bean
public TransformProcessor processor() {
    return new TransformProcessor();
}
==============================================================================
Writer Class
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    this.stepExecution = stepExecution;
}

@AfterStep
public void afterStep(StepExecution stepExecution) {
    this.stepExecution.setWriteCount(count);
}

@Override
public void write(final List<? extends List<Record>> lists) throws Exception {
    List<Record> consolidatedList = new ArrayList<>();
    for (List<Record> list : lists) {
        if (list != null && !list.isEmpty()) {
            consolidatedList.addAll(list);
        }
    }
    delegate.write(consolidatedList);
    count += consolidatedList.size(); // to count the trailer record count
}
===============================================================
Item Processor
@Override
public List<Record> process(IncomingRecord record) {
    List<Record> recordList = new ArrayList<>();
    if (null != record.getEventName() /* and a few other conditions inside this section */) {
        // setting values of the Record class by extracting them from the IncomingRecord
        recordList.add(/* the valid records matching the condition */);
        return recordList;
    } else {
        return null;
    }
}
Synchronizing a read operation and a write operation between two transactional resources (a queue and a database, for instance) is possible by using a JTA transaction manager that coordinates both transaction managers (2PC protocol).
However, this approach is not possible if one of the resources is not transactional (like the majority of file systems). So unless you use a transactional file system and a JTA transaction manager that coordinates a Kafka transaction manager and a file system transaction manager, you need another approach, like the Compensating Transaction pattern. In your case, the "undo" operation (compensating action) would be rewinding the offset to where it was before the failed chunk.
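To make the compensating action concrete, here is a minimal, illustrative sketch using a plain KafkaConsumer. It is not tied to KafkaItemReader's internals; the class and the way the consumer is obtained are assumptions made for the sketch:
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetRewindingCompensation {

    private final KafkaConsumer<String, String> consumer;
    private final Map<TopicPartition, Long> positionsBeforeChunk = new HashMap<>();

    public OffsetRewindingCompensation(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    // Call before the chunk is processed: remember where each assigned partition currently is.
    public void rememberPositions() {
        positionsBeforeChunk.clear();
        for (TopicPartition partition : consumer.assignment()) {
            positionsBeforeChunk.put(partition, consumer.position(partition));
        }
    }

    // Compensating action: if writing the chunk to the file fails, seek back so the
    // records are re-read on the next attempt instead of being silently skipped.
    public void rewind() {
        positionsBeforeChunk.forEach(consumer::seek);
    }
}
Wired into something like a ChunkListener, rememberPositions() would run before each chunk and rewind() on a chunk failure, so a failed write to the .dat file does not silently skip that chunk's records.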

Spring Batch - How to track whether an update happened successfully or not

I have 2 ItemWriters, one for DB inserts and one for DB updates.
Using a ClassifierCompositeItemWriter, I call the respective ItemWriter to insert new records and update existing records.
Here is my concern: how do I know whether the update actually happened? For example, if the "Application ID" does not exist in the DB, the ItemWriter will not throw any error, but I want to know that the update did not happen for this record and log it.
How can I track that?
@Bean
public ClassifierCompositeItemWriter<TRSBatchEntryFormRequest> classifierCompositeItemWriter(ItemWriter<TRSBatchEntryFormRequest> databaseTableItemWriter, ItemWriter<TRSBatchEntryFormRequest> databaseTableUpdateItemWriter) {
    ClassifierCompositeItemWriter<TRSBatchEntryFormRequest> classifierCompositeItemWriter = new ClassifierCompositeItemWriter<>();
    classifierCompositeItemWriter.setClassifier((Classifier<TRSBatchEntryFormRequest, ItemWriter<? super TRSBatchEntryFormRequest>>) trsBatchEntryFormRequest -> {
        if (trsBatchEntryFormRequest.getForm_status().equals("New")) {
            return databaseTableItemWriter;
        } else {
            return databaseTableUpdateItemWriter;
        }
    });
    return classifierCompositeItemWriter;
}
// Insert writer for DB
@Bean
public ItemWriter<TRSBatchEntryFormRequest> databaseTableItemWriter(DataSource springBatchDatasource) {
    JdbcBatchItemWriter<TRSBatchEntryFormRequest> databaseItemWriter = new JdbcBatchItemWriter<TRSBatchEntryFormRequest>();
    databaseItemWriter.setDataSource(springBatchDatasource);
    logger.info("INSERT QUERY....: " + QUERY_INSERT_TRSEntryForms);
    databaseItemWriter.setSql(QUERY_INSERT_TRSEntryForms);
    databaseItemWriter.setItemSqlParameterSourceProvider(new TRSDBInputProvider());
    return databaseItemWriter;
}

// Update writer for DB
@Bean
public ItemWriter<TRSBatchEntryFormRequest> databaseTableUpdateItemWriter(DataSource springBatchDatasource) {
    JdbcBatchItemWriter<TRSBatchEntryFormRequest> databaseItemWriter = new JdbcBatchItemWriter<TRSBatchEntryFormRequest>();
    databaseItemWriter.setDataSource(springBatchDatasource);
    logger.info("UPDATE QUERY....: " + QUERY_UPDATE_TRSEntryForms);
    databaseItemWriter.setSql(QUERY_UPDATE_TRSEntryForms);
    databaseItemWriter.setItemSqlParameterSourceProvider(new TRSDBInputProvider());
    return databaseItemWriter;
}
Thanks
You can't track that with a CompositeItemWriter. What you can do is use a custom item writer like the following:
import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.JdbcTemplate;

public class TRSBatchEntryFormRequestItemWriter implements ItemWriter<TRSBatchEntryFormRequest> {

    private static final String INSERT_ITEM = "insert into item...";
    private static final String UPDATE_ITEM = "update item set...";

    private JdbcTemplate jdbcTemplate;

    public TRSBatchEntryFormRequestItemWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void write(List<? extends TRSBatchEntryFormRequest> items) throws Exception {
        for (TRSBatchEntryFormRequest item : items) {
            int updated = jdbcTemplate.update(UPDATE_ITEM);
            if (updated == 0) {
                jdbcTemplate.update(INSERT_ITEM);
            }
        }
    }
}
The idea is to try to issue an update. If the operation returns 0, this means no rows have been updated and the item does not exist in the database, and you can issue an insert in that case.
Hope this helps.
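If it helps, a possible way to wire the custom writer as a bean, assuming the same springBatchDatasource from the question:
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.JdbcTemplate;

@Bean
public TRSBatchEntryFormRequestItemWriter trsBatchEntryFormRequestItemWriter(DataSource springBatchDatasource) {
    // Build the writer on a JdbcTemplate backed by the existing batch DataSource,
    // and reference it from the step instead of the ClassifierCompositeItemWriter.
    return new TRSBatchEntryFormRequestItemWriter(new JdbcTemplate(springBatchDatasource));
}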

How to commit a transaction on multiple SQL queries in rxjava2-jdbc?

I am trying to insert records into multiple tables with rxjava2-jdbc. Please let me know how I can achieve that. I tried the steps below but was unsuccessful.
Case 1)
public class DatabaseRepository {

    private Database db;

    public DatabaseRepository() throws Exception {
        NonBlockingConnectionPool pool =
                Pools.nonBlocking()
                        .maxPoolSize(Runtime.getRuntime().availableProcessors() * 5)
                        .connectionProvider(ConnectionProvider.from("jdbc:oracle:thin:@//rcld19-scan.test.com:1522/TGCD01", "test", "testPassword"))
                        .build();
        this.db = Database.from(pool);
    }

    public Flowable<Integer> insertIntoMultipleTables() {
        Flowable<Integer> insertIntoEmployee = db.update("insert into employee(name, designation) values ('Employee_1', 'Manager')")
                .counts()
                .doOnError(e -> {
                    log.error("Exception while inserting record to employee table: {}", e.getMessage());
                });
        return db.update("insert into department(name, no_of_employees) values ('Management', 1)")
                .dependsOn(insertIntoEmployee)
                .counts()
                .doOnError(e -> {
                    log.error("Exception while inserting record to department table: {}", e.getMessage());
                });
    }
}
I am trying to insert into multiple tables as part of a single transaction. In this case, a failure inserting a record into the department table will not roll back the data from the first table.
Case 2)
public class DatabaseRepository {

    private Database db;

    public DatabaseRepository() throws Exception {
        NonBlockingConnectionPool pool =
                Pools.nonBlocking()
                        .maxPoolSize(Runtime.getRuntime().availableProcessors() * 5)
                        .connectionProvider(ConnectionProvider.from("jdbc:oracle:thin:@//rcld19-scan.test.com:1522/TGCD01", "test", "testPassword"))
                        .build();
        this.db = Database.from(pool);
    }

    public Flowable<Tx<Integer>> insertIntoMultipleTables() {
        Flowable<Tx<Integer>> insertIntoEmployee = db.update("insert into employee(name, designation) values ('Employee_1', 'Manager')")
                .transacted()
                .counts()
                .flatMap(tx -> {
                    return tx.update("insert into department(name, no_of_employees) values ('Management', 1)")
                            .counts()
                            .doOnError(e -> log.error("Exception while inserting record to department table: {}",
                                    e.getMessage()));
                })
                .doOnError(e -> {
                    log.error("Exception while inserting record to employee table: {}", e.getMessage());
                });
        return insertIntoEmployee;
    }
}
This code is not working as a transaction. A SQL error in one of the insertions does not roll back the records inserted into the other table.
My requirement is to insert records into multiple database tables using rxjava2-jdbc. I am not able to find any valid examples in Git. Please let me know if I need to do anything differently.

Spring Batch - how to COMPLETE a job when the JdbcPagingItemReader returns no data

My question is the reverse of this existing SO question.
The behavior of JdbcPagingItemReader seems to be the reverse of what is described in that question, i.e. the job is marked as FAILED if the JdbcPagingItemReader doesn't find any records.
Logs indicate that the job is marked as FAILED because the reader couldn't fetch pages from page 1 onwards and the SELECT fails with SQLCODE=-313, i.e.
-313 THE NUMBER OF HOST VARIABLES SPECIFIED IS NOT EQUAL TO THE NUMBER OF PARAMETER MARKERS
So the overall step is marked as failed, leading to the job being failed.
For queries from page 1 onwards, the sort key is included in the SELECT as PAYMENT_ID > ?, and I guess that since there are no PAYMENT_IDs, no value is found for the placeholder, hence the error.
How can I ignore this error and mark the job as COMPLETED in this particular scenario?
I tried the solution specified in Trever Shick's answer to the other question, and returning
if (stepExecution.getReadCount() == 0) {
    return ExitStatus.COMPLETED;
}
is not fixing the issue.
Both the chunk size and the reader page size are equal to 10, and THROTTLE_LIMIT=20.
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> syncReader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .<RemittanceVO, RemittanceClaimVO>chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
            .reader(syncReader)
            .listener(afterReadListener)
            .processor(processor)
            .writer(writer)
            .taskExecutor(simpleAsyntaskExecutor)
            .throttleLimit(Constants.THROTTLE_LIMIT)
            .build();
}
Not the full stack trace, but a line from the logs:
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT PAYMENT_ID,INSURED_LAST_NM,INSURED_FIRST_NM,LAST_NM,FIRST_NM,CONTRACT_NUM,CARRIER_CD,CARRIER_GROUP,CONTRACT_NUM,REL_AR_SEQ_NUM,FROM_DOS_DT,THRU_DOS_DT,BILL_AMT,RX_NUM_BP,CERT_NUM_LEFT_BP,MODIFIER,PROC_CD ,DEPOSIT_ID,PRNT_CONTRACT_NUM,REMIT_TYPE_CD AS RMT_TYPE FROM AR.PAYMENTS WHERE (CONTRACT_NUM IN (SELECT CONTRACT_NUM FROM AR.PAYMENTS WHERE RMTST_RF='M' AND DELETE_IND='N' GROUP BY CONTRACT_NUM ) AND DELETE_IND='N' AND RMTST_RF='M') AND ((PAYMENT_ID > ?)) ORDER BY PAYMENT_ID ASC FETCH FIRST 50 ROWS ONLY]; nested exception is com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-313, SQLSTATE=07004, SQLERRMC=null, DRIVER=4.11.77
Column PAYMENT_ID is my sort key, and the section AND ((PAYMENT_ID > ?)) has been added by Spring Batch.
My reader beans:
@Bean
public ItemReader<RemittanceVO> syncReader() {
    SynchronizedItemStreamReader<RemittanceVO> syncReader = new SynchronizedItemStreamReader<RemittanceVO>();
    syncReader.setDelegate(reader());
    return syncReader;
}

@Bean
public ItemStreamReader<RemittanceVO> reader() {
    JdbcPagingItemReader<RemittanceVO> reader = new JdbcPagingItemReader<RemittanceVO>();
    reader.setDataSource(dataSource);
    reader.setRowMapper(new RemittanceRowMapper());
    reader.setQueryProvider(queryProvider);
    reader.setPageSize(Constants.SPRING_BATCH_READER_PAGE_SIZE);
    return reader;
}
Query provider bean:
@Bean
public PagingQueryProvider queryProvider() throws Exception {
    SqlPagingQueryProviderFactoryBean queryProviderBean = new SqlPagingQueryProviderFactoryBean();
    queryProviderBean.setDataSource(dataSource);
    queryProviderBean.setDatabaseType("DB2");
    queryProviderBean.setSelectClause(Constants.REMITTANCES_SELECT_CLAUSE);
    queryProviderBean.setFromClause(Constants.REMITTANCES_FROM_CLAUSE);
    queryProviderBean.setWhereClause(Constants.REMITTANCES_WHERE_CLAUSE);
    queryProviderBean.setSortKey(Constants.REMITTANCES_SORT_KEY);
    PagingQueryProvider queryProvider = queryProviderBean.getObject();
    return queryProvider;
}
I am not able to recall for sure, but I think I did the following in the step and job listeners.
I basically wrote a StepExecutionListener and overrode its afterStep method as below:
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
    if (stepExecution.getReadCount() == 0) {
        logger.info(
                "!!! Step is marked as FAILED because no rows OR no valid rows were read by reader of this step !");
        ExitStatus newExitStatus = ExitStatus.COMPLETED;
        stepExecution.setExitStatus(newExitStatus);
        return newExitStatus;
    }
    return null;
}
And I also wrote a job-level listener extending JobExecutionListenerSupport, and put something like this in the overridden afterJob method:
@Override
public void afterJob(JobExecution jobExecution) {
    .......
    .......
    .......
    .......
    boolean zeroRead = true;
    for (StepExecution stepExecution : stepExecutions) {
        if (stepExecution.getReadCount() != 0) {
            zeroRead = false;
        }
    }
    if (zeroRead) {
        logger.info("***** JOB is FAILED because read count is Zero *****");
        jobExecution.setExitStatus(ExitStatus.COMPLETED);
        return;
    }