I have a Spring Batch job described as below. If I change the commit-interval from 1 to 10000, will this change improve the performance?
<batch:job id="weeklyPartnerPointAddJob" restartable="true">
<batch:step id="weeklyPartnerPointAddStep" parent="noTransactionStep">
<batch:tasklet task-executor="asyncTaskExecutor" throttle-limit="1">
<batch:chunk reader="selectPartnerListForPointReader" processor="weeklyPartnerPointAccumulateItemProcessor" commit-interval="1" />
</batch:tasklet>
</batch:step>
</batch:job>
The weeklyPartnerPointAccumulateItemProcessor manipulates the read input and updates the record inside the processor itself, so I didn't create an ItemWriter for the update logic.
The noTransactionStep is defined as follows; it doesn't maintain a transaction.
<bean id="noTransactionStep" class="org.springframework.batch.core.step.factory.SimpleStepFactoryBean" abstract="true">
<property name="transactionManager" ref="resourcelessTransactionManager" />
<property name="jobRepository" ref="jobRepository" />
<property name="startLimit" value="10" />
<property name="commitInterval" value="3" />
</bean>
Yes, the commit-interval can have a significant impact on performance. There is no universal best value; it depends on the context. Determining the "best" value for your specific use case is an empirical process, i.e. you need to try different values and measure which one performs best.
A commit-interval of 1 is too small and will lead to a large number of transactions (equal to the number of input items). This could degrade performance significantly. A value of 10,000 can lead to long-running transactions and possibly high memory usage (depending on the use case), which can also degrade performance. In my experience, a value of 100 or 1000 is a good starting point.
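As a concrete starting point, the chunk from your question could be configured with a larger interval and then benchmarked (the value 1000 below is only an illustrative starting point, not a recommendation):

```xml
<batch:tasklet task-executor="asyncTaskExecutor" throttle-limit="1">
    <batch:chunk reader="selectPartnerListForPointReader"
                 processor="weeklyPartnerPointAccumulateItemProcessor"
                 commit-interval="1000" />
</batch:tasklet>
```

Note also that since your step uses a ResourcelessTransactionManager, the commits themselves are no-ops against the database; the interval then mostly affects how often the step's progress is checkpointed in the job repository.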
In Spring Batch, the JdbcCursorItemReader's ResultSet processing is TYPE_FORWARD_ONLY.
In my case, I need to move the cursor back one row, so I want to set TYPE_SCROLL_SENSITIVE so that I can go back.
Any idea how to do this in Spring Batch, or some workaround?
<bean id="databaseItemReader"
class="org.springframework.batch.item.database.JdbcCursorItemReader">
<property name="dataSource" ref="dataSource" />
<property name="sql"
value="select * from document udd, field uff where uff.docid = udd.docid AND uff.field_name IN ('address','contractNb','city','locale','login','mobile','name','phone') ORDER BY udd.docid, uff.field_name ASC" />
<property name="rowMapper">
<bean class="com.migration.springbatch.UDocumentResultRowMapper" />
</property>
<property name="verifyCursorPosition" value="false"/>
</bean>
The contract of an ItemReader in Spring Batch is forward-only, and the JdbcCursorItemReader is implemented according to this contract (hence the TYPE_FORWARD_ONLY). So it is not possible to rewind the cursor once it is opened.
That said, when the underlying resource is transactional (such as a database or a JMS queue), calling read may return the same logical item on subsequent calls in a rollback scenario.
Please find more details in the documentation here: https://docs.spring.io/spring-batch/4.0.x/reference/html/readersAndWriters.html#itemReader
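If what you actually need is to look at an adjacent item (for example, to compare a row with the next one before deciding how to handle it), a common workaround is to keep the cursor forward-only and wrap your reader in a SingleItemPeekableItemReader, which lets you peek at the next item without consuming it:

```xml
<bean id="peekableItemReader"
      class="org.springframework.batch.item.support.SingleItemPeekableItemReader">
    <property name="delegate" ref="databaseItemReader" />
</bean>
```

This inverts the problem: instead of stepping one row back after reading too far, you peek one row ahead before processing the current item.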
I am new to Spring Batch. I have a requirement to read and process 500,000 lines from text to CSV. My item processor takes five minutes to process 100 lines, which would mean almost 2 days to process and write all 500k lines.
How can I invoke the item reader and processor concurrently?
You can use a SimpleAsyncTaskExecutor for parallel processing by declaring it in your Spring application context as follows:
<bean id="taskExecutor"
class="org.springframework.core.task.SimpleAsyncTaskExecutor">
</bean>
And then you can specify this taskExecutor in some specific tasklet as follows:
<tasklet task-executor="taskExecutor">
<chunk reader="deskReader" processor="deskProcessor"
writer="deskWriter" commit-interval="1" />
</tasklet>
Note that you need to define the ItemReader, ItemWriter and ItemProcessor classes as specified here.
Also, for parallel processing you can specify the throttle-limit, which controls how many threads run in parallel; it defaults to 4 if not specified.
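For instance, the tasklet above could be configured as follows (the throttle-limit of 8 is only an illustrative value to tune for your environment, and a commit-interval of 1 is usually too small for a 500k-line job):

```xml
<tasklet task-executor="taskExecutor" throttle-limit="8">
    <chunk reader="deskReader" processor="deskProcessor"
           writer="deskWriter" commit-interval="100" />
</tasklet>
```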
I have a Spring Batch job with the following definition:
<batch:step id="step1">
<batch:tasklet task-executor="simpleTaskExecutor">
<batch:chunk reader="itemReader" processor="itemProcessor"
writer="itemWriter" >
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="itemReader" class="CustomReader">
</bean>
The custom reader reads a row from the database and passes it to the processor for further processing.
My problem is that I want multiple threads to run this step at the same time (each reading a row and processing it). Based on the documentation I used a taskExecutor, but it didn't work.
Note: my scenario doesn't fit a partitioner.
What do you mean by "it didn't work"?
If you want each thread to read and process one entry, you need a "commit-interval" of exactly one. (http://docs.spring.io/spring-batch/reference/html/configureStep.html)
But note: since several threads will call the reader and writer (they are singleton instances) in parallel, you have to ensure that both are thread-safe. The simplest way to do this is to synchronize the read and write methods of the reader and writer, respectively.
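If your CustomReader implements ItemStreamReader, one way to get a thread-safe reader without hand-written synchronization is to wrap it in the SynchronizedItemStreamReader that ships with recent versions of Spring Batch; it synchronizes the read method for you:

```xml
<bean id="synchronizedItemReader"
      class="org.springframework.batch.item.support.SynchronizedItemStreamReader">
    <property name="delegate" ref="itemReader" />
</bean>
```

Then point the chunk's reader attribute at synchronizedItemReader instead of itemReader.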
I have a requirement where I want to use the Spring batch framework for below scenario.
I have a table which is partitioned on trade date column.
I want to process the records for this table by using reader, processor and writer of Spring batch framework.
What I want to do is create separate threads for reading, processing and writing based on trade date. Suppose there are 4 trade dates; then I want to create 4 separate threads, one per trade date. In each thread the reader will read the records from the table for that trade date, the processor will enrich them, and the writer will publish/write them.
I am new to Spring batch, so I need help in designing the right approach for this by using Spring batch multithreading or partitioning.
Maybe you could use local partitioning like follows:
<batch:job id="MyBatch" xmlns="http://www.springframework.org/schema/batch">
<batch:step id="masterStep">
<batch:partition step="slave" partitioner="splitPartitioner">
<batch:handler grid-size="4" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
</batch:job>
Then create the splitPartitioner by implementing the org.springframework.batch.core.partition.support.Partitioner interface. In the partition method, split the source data as you like; each ExecutionContext you create will be executed by its own thread.
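As a minimal sketch of that splitting logic, assuming one partition per trade date (plain java.util maps stand in for Spring Batch's ExecutionContext here so the snippet runs standalone; in a real Partitioner you would put the same key/value into an ExecutionContext and return a Map&lt;String, ExecutionContext&gt; from partition(int gridSize)):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TradeDatePartitioner {

    // Builds one partition context per trade date. Each context carries the
    // trade date under a well-known key so the slave step's reader can pick
    // it up (e.g. via #{stepExecutionContext['tradeDate']} in a step-scoped bean).
    public static Map<String, Map<String, Object>> partition(List<String> tradeDates) {
        Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
        int i = 0;
        for (String tradeDate : tradeDates) {
            Map<String, Object> context = new HashMap<>();
            context.put("tradeDate", tradeDate);
            partitions.put("partition" + i++, context);
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> partitions =
                partition(Arrays.asList("2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"));
        // With grid-size 4 and 4 trade dates, each partition is handled by its own thread.
        System.out.println(partitions.size());                             // prints 4
        System.out.println(partitions.get("partition0").get("tradeDate")); // prints 2023-01-02
    }
}
```
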
You can use local partitioning (a master-slave approach) to solve your problem. Define your master and slave steps as follows in the Spring configuration.
<batch:job id="tradeProcessor">
<batch:step id="master">
<partition step="slave" partitioner="tradePartitioner">
<handler grid-size="4" task-executor="taskExecutor" />
</partition>
</batch:step>
</batch:job>
<batch:step id="slave">
<batch:tasklet>
<batch:chunk reader="dataReader" writer="dataWriter"
processor="dataProcessor" commit-interval="10">
</batch:chunk>
</batch:tasklet>
</batch:step>
For more details, you can refer to the simple example discussed here.
I'm using a paging JDBC reader in a Spring Batch job. I have 16 rows in a table and am expecting to see all of them, but the configuration below only returns 10.
<bean id="pagingQuery" class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="myDataSource"/>
<property name="selectClause" value="select policy_number,val_date,name,sequence,amount,rate,frequency,start_date,end_date,basis"/>
<property name="fromClause" value="from MyTable"/>
<property name="sortKey" value="policy_number"/>
<property name="whereClause" value="load_id=:jobid"/>
</bean>
<bean id="tableReader"
class="org.springframework.batch.item.database.JdbcPagingItemReader"
scope="step">
<property name="dataSource" ref="myDataSource"/>
<property name="queryProvider" ref="pagingQuery"/>
<property name="parameterValues">
<map>
<entry key="jobid" value="1"/>
</map>
</property>
<property name="pageSize" value="10"/>
<property name="fetchSize" value="10"/>
<property name="rowMapper">
<bean class="org.springframework.jdbc.core.ColumnMapRowMapper"/>
</property>
</bean>
Without just raising the value of the paging/fetch size to above 16 is there an alternative way to configure this reader to return all rows?
I had the same issue: I needed to get all the data from a table (I was using MyBatis paging rather than JDBC, but the problem was the same). The view I was fetching was so complex that I was asked to query the database only once; paging requires one query per page, and the where clause was an overhead. When I increased the page size to accommodate all the data, I ran into memory issues, which is to be expected. Since I'm not using a UI to display the data, paging was not really a requirement anyway.

So I switched to a JdbcCursorItemReader, which queries the table once and keeps the cursor's results available on the database side (I'm not a database expert, but from troubleshooting it appeared the result set was cached in temp table space). The data comes back to you through the result set in chunks governed by the fetch size. The reader maps rows to objects (via the row mapper) until the commit interval is reached, at which point the data is written. It then reads from the result set again (if the result set has no more buffered data, another chunk is fetched from the database according to the fetch size), creates objects until the commit interval is reached again, and calls the item writer. With this approach I did not run out of memory, so you can tune the commit interval to fit your memory limitations.

I would recommend keeping fetchSize and the commit-interval the same for better performance. When I increased the fetch size I got better execution times, though I'm not sure how that impacts things elsewhere. Here is a sample JdbcCursorItemReader:
<bean id="cursorReader" class="org.springframework.batch.item.database.JdbcCursorItemReader">
<property name="dataSource" ref="dataSource" />
<property name="sql" value="SELECT C1,C2,C3,C4,C5 FROM KP_TBL_VW" />
<property name="rowMapper" ref="rowMapperDomain" />
<property name="fetchSize" value="50000"/>
<property name="driverSupportsAbsolute" value="true" />
</bean>
Apologies if my database understanding is wrong.