I have a DAO class to retrieve a set of data from Hibernate.
<batch:step id="firstStep">
<batch:tasklet>
<batch:chunk reader="firstReader" writer="firstWriter"
processor="itemProcessor" commit-interval="2">
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="firstReader" class="com.process.MyReader"
scope="step">
</bean>
Inside my reader, I call the DAO to get the data before reading.
public class MyReader implements ItemReader<JobInstance>, StepExecutionListener {

    private List<JobInstance> jobList;
    private String currentDate;

    @Autowired
    private JobDAO perDAO;

    @BeforeRead
    public void init() {
        //jobList = perDAO.getPersonAJobList(currentDate);
    }

    @Override
    public JobInstance read() throws Exception, UnexpectedInputException,
            ParseException, NonTransientResourceException {
        return !jobList.isEmpty() ? jobList.remove(0) : null;
    }

    @Value("#{jobParameters['currentDate']}")
    public void setCurrentDate(String currentDate) {
        this.currentDate = currentDate;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // TODO Auto-generated method stub
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // TODO Auto-generated method stub
        return null;
    }
}
When I run the batch job, it keeps reading and processing repeatedly.
[org.springframework.batch.repeat.support.RepeatTemplate] [getNextResult] [372] - Repeat operation about to start at count=1
Below is my DAO class
@Autowired
private QueryManager queryManager;

@Autowired
public JobDAOImpl(SessionFactory sessionFactory) {
    super(sessionFactory, JobInstance.class);
}

public List<JobInstance> getPersonAJobList(String currentDate) {
    String sql = queryManager.getNamedQuery("getJobList");
    System.out.println("---------------------- " + sql + " " + currentDate);
    SQLQuery query = this.getCurrentSession().createSQLQuery(sql);
    query.setParameter("current_date", currentDate);
    ....
    return result;
}
If you fill the list within the @BeforeRead annotated method, the list will be renewed before every read;
see http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/annotation/BeforeRead.html
Marks a method to be called before an item is read from an ItemReader
If you need to get the items from a DAO, you need to think about the implementation of either:
- the easy way: keep the current implementation, but add a check in @BeforeRead to initialise the list only once
- a stateful DAO which fills the list once and removes an item for every read call
- a stateless DAO with pagination
A better way is to move the data access (the SQL) into the batch itself; Spring Batch provides out-of-the-box readers for SQL, Hibernate and more (a sketch follows), see http://docs.spring.io/spring-batch/reference/html/listOfReadersAndWriters.html
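For illustration, a hedged sketch of that last option using the out-of-the-box JdbcCursorItemReader bound to the currentDate job parameter; the SQL, the bean wiring and the row mapping are assumptions, not taken from the question:
@Bean
@StepScope
public JdbcCursorItemReader<JobInstance> firstReader(
        DataSource dataSource,
        @Value("#{jobParameters['currentDate']}") String currentDate) {
    JdbcCursorItemReader<JobInstance> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(dataSource);
    // hypothetical query; the real SQL would come from the existing QueryManager
    reader.setSql("select * from job_instance where job_date = ?");
    reader.setPreparedStatementSetter(ps -> ps.setString(1, currentDate));
    reader.setRowMapper(new BeanPropertyRowMapper<>(JobInstance.class));
    return reader;
}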
The init method should be called only once. The correct way to do this is either to implement the InitializingBean interface and its afterPropertiesSet method, or to use the @PostConstruct annotation instead of @BeforeRead.
The use of @BeforeRead is definitely wrong and makes no sense.
As also mentioned in the comments on Michael's answer, you should also consider using one of the standard readers to get data from a DB. If you just get a couple of hundred or thousand entries from getPersonAJobList it won't be a problem, but if you get millions of entries, it would definitely be the wrong approach.
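A minimal sketch of the @PostConstruct variant, reusing the fields of the reader shown above (the list is built exactly once, after all injections have completed):
@PostConstruct
public void init() {
    // runs once per step-scoped instance, after perDAO and currentDate have been injected
    jobList = perDAO.getPersonAJobList(currentDate);
}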
What about adding an 'init' flag to your reader? In MyReader.read():
If the flag is not set, call jobDAO to fill jobList and set the flag.
If the flag is set, consume jobList items.
Be careful using jobList.remove(0), because your reader does not seem to be restartable; you need to maintain the index of the last consumed item in the execution context, so that a restart will continue from the first item of the last uncommitted chunk.
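A rough sketch of that flag-based read(); nextIndex is a hypothetical field that could also be saved to the execution context (e.g. via ItemStream.update()) to make the reader restartable:
@Override
public JobInstance read() throws Exception {
    if (jobList == null) {                      // 'init' flag: load the list only once, lazily
        jobList = perDAO.getPersonAJobList(currentDate);
    }
    if (nextIndex < jobList.size()) {
        return jobList.get(nextIndex++);        // keep an index instead of remove(0)
    }
    return null;                                // null signals end of data
}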
I've stumbled upon a pretty twisted issue with Spring Batch recently.
Requirements are as follows:
I have two main steps:
The first one reads some data from an Oracle database, from one table, to write to another table.
The second one does some other database work, based upon data handled in the first step.
From a design standpoint, the first step looks like this:
@Bean
public Step myFirstStep(JdbcCursorItemReader<Revision> reader) {
    return stepBuilderFactory.get("my-first-step")
            .<Revision, Revision>chunk(1)
            .reader(reader)
            .writer(compositeItemWriter())
            .listener(executionContextPromotionListener())
            .build();
}
Composite item writer:
@Bean
public CompositeItemWriter<Revision> compositeItemWriter() {
    CompositeItemWriter writer = new CompositeItemWriter();
    writer.setDelegates(Arrays.asList(somewriter(), someOtherwriter(), aWriterThatIsSupposedToPassDataToAnotherStep()));
    return writer;
}
While the first two writers are not complex, my interest is focused on the third one:
aWriterThatIsSupposedToPassDataToAnotherStep()
As you might have guessed, this one is used to capture some of the data being processed so it can be promoted to my second step:
@Component
@StepScope
public class AWriterThatIsSupposedToPassDataToAnotherStep implements ItemWriter<SomeEntity> {

    private StepExecution stepExecution;

    @Override
    public void write(List<? extends SomeEntity> items) {
        ExecutionContext stepContext = this.stepExecution.getExecutionContext();
        stepContext.put("revisionNumber", items.stream().findFirst().get().getSomeField());
        System.out.println("writing : " + items.stream().findFirst().get().getSomeField() + " to ExecutionContext");
    }

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }
}
The problem is: as long as this writer is part of the composite writer list (as declared above), the @BeforeStep of my last writer is never executed, which leaves me unable to pass my information to the execution context.
When I replace my CompositeItemWriter with the single "AWriterThatIsSupposedToPassDataToAnotherStep" inside the step definition, it gets executed properly.
Does this have anything to do with some kind of declaration order?
Many thanks for any further help.
Found the solution (with some help from my coworkers), sourced from https://stackoverflow.com/a/39698653/1957764
You'll need to declare the writer both as part of the composite writer AND as a step listener to make it execute the @BeforeStep annotated method.
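In builder-style configuration such as the step above, one hedged way to express that is to register the same bean as a listener on the step (relying on annotation-based listener detection), for example:
@Bean
public Step myFirstStep(JdbcCursorItemReader<Revision> reader,
                        AWriterThatIsSupposedToPassDataToAnotherStep promotingWriter) {
    return stepBuilderFactory.get("my-first-step")
            .<Revision, Revision>chunk(1)
            .reader(reader)
            .writer(compositeItemWriter())          // still contains promotingWriter as a delegate
            .listener(promotingWriter)              // also registered as a listener, so @BeforeStep fires
            .listener(executionContextPromotionListener())
            .build();
}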
I'm trying to execute the following SQL statement every time the database session gets refreshed. I have a Spring Boot 2.0.1.RELEASE application with JPA and a PostgreSQL database.
select set_config('SOME KEY', 'SOME VALUE', false);
As the PostgreSQL documentation states, the is_local parameter indicates whether this configuration value applies only to the current transaction (if true) or is attached to the session, as I require (if false).
The problem is that I don't know when Hibernate/Hikari refresh the DB session, so in practice the application starts failing after it has been running for a couple of minutes, as you can imagine...
My approach, which is not working yet, is to implement an EmptyInterceptor; for that I have added a DatabaseCustomizer class to inject my hibernate.session_factory.interceptor properly, in a way that lets Spring fill in all my @Autowired fields.
DatabaseInterceptor.class
@Component
public class DatabaseInterceptor extends EmptyInterceptor {

    @Autowired
    private ApplicationContext context;

    @Override
    public void afterTransactionBegin(Transaction tx) {
        PersistenceService pc = context.getBean(PersistenceService.class);
        try {
            pc.addPostgresConfig("SOME KEY", "SOME VALUE");
            System.out.println("Config added...");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
DatabaseCustomizer.class
@Component
public class DatabaseCustomizer implements HibernatePropertiesCustomizer {

    @Autowired
    private DatabaseInterceptor databaseInterceptor;

    @Override
    public void customize(Map<String, Object> hibernateProperties) {
        hibernateProperties.put("hibernate.session_factory.interceptor", databaseInterceptor);
    }
}
Obviously, there is a problem with this approach: because my @Override of the afterTransactionBegin method ends up starting another transaction, I get an infinite loop.
I tried to look for something inside that Transaction tx that could tell me the transaction was not generated by my own addPostgresConfig call, but there is not much on it.
Is there something else I could try to achieve this?
Thanks in advance,
I'm constructing a spring-batch job that modifies a given number of records. The list of record IDs is an input parameter of the job. For example, one job might be: modify the records with IDs {1,2,3,4} and set parameters X and Y on related tables.
Since I'm unable to pass a potentially very long input list (typical case: 50K records) to my ItemReader, I only pass a MyJob ID, which the ItemReader then uses to load the target ID list.
The problem is that the resulting code appears "wrong" (although it works) and not in the spirit of Spring Batch. Here's the reader:
@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
@Component
public class MyItemReader implements ItemReader<Integer> {

    @Autowired
    private JobService jobService;

    private List<Integer> itemsList;
    private Long jobId;

    @Autowired
    public MyItemReader(@Value("#{jobParameters['jobId']}") final Long jobId) {
        this.jobId = jobId;
        this.itemsList = null;
    }

    @Override
    public Integer read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        // First pass: load the list.
        if (itemsList == null) {
            itemsList = new ArrayList<Integer>();
            MyJob myJob = (MyJob) jobService.loadById(jobId);
            for (Integer i : myJob.getTargedIdList()) {
                itemsList.add(i);
            }
        }
        // Serve one item at a time:
        if (itemsList.isEmpty()) {
            return null;
        } else {
            return itemsList.remove(0);
        }
    }
}
I tried to move the first part of the read() method to the constructor, but the @Autowired reference is null at that point; afterwards (in the read method) it is initialized.
Is there a better way to write the ItemReader? I would like to move the "load" part out of the read() method. Or is this the best solution for this scenario?
Thank you.
Generally, your approach is not "wrong", but probably not ideal.
Firstly, you could move the initialisation to an init method annotated with @PostConstruct. This method is called after all @Autowired fields have been injected:
@PostConstruct
public void afterPropertiesSet() throws Exception {
    itemsList = new ArrayList<Integer>();
    MyJob myJob = (MyJob) jobService.loadById(jobId);
    for (Integer i : myJob.getTargedIdList()) {
        itemsList.add(i);
    }
}
But there is still the problem that you load all the data at once. If you have a billion records to process, this could blow up the memory.
So what you should do is to load only a chunk of your data into memory, then return the items one by one in your read method. If all entries of a chunk have been returned, load the next chunk and return its items one by one again. If there is no other chunk to be loaded, then return null from the read method.
This ensures that you have a constant memory footprint regardless of how many records you have to process.
(If you have a look at FlatFileItemReader, you see that it uses a BufferedReader to read the data from the disk. While it has nothing to do with Spring Batch, it is the same principle: it reads a chunk of data from the disk, returns it, and if more data is needed, it reads the next chunk.)
The next problem is restartability. What happens if the job crashes after doing 90% of the work? How can the job be restarted so that it only processes the missing 10%?
This is actually a feature that Spring Batch provides; all you have to do is implement the ItemStream interface with its methods open(), update(), and close().
If you address these two points - loading the data in chunks instead of all at once, and implementing the ItemStream interface - you'll end up with a reader that is in the Spring spirit.
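For illustration, a rough sketch that combines the two points; loadTargetIdPage is a made-up paging method on the asker's JobService, not an existing API:
import java.util.Collections;
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamReader;

public class PagedIdItemReader implements ItemStreamReader<Integer> {

    private static final String COUNT_KEY = "pagedIdReader.count";
    private static final int PAGE_SIZE = 1000;

    private final JobService jobService;   // the asker's service
    private final Long jobId;

    private List<Integer> page = Collections.emptyList();
    private int pageOffset = 0;    // position inside the current page
    private int readCount = 0;     // overall position, saved for restart

    public PagedIdItemReader(JobService jobService, Long jobId) {
        this.jobService = jobService;
        this.jobId = jobId;
    }

    @Override
    public void open(ExecutionContext ctx) {
        // on a restart, resume from the last committed position
        readCount = ctx.containsKey(COUNT_KEY) ? ctx.getInt(COUNT_KEY) : 0;
    }

    @Override
    public void update(ExecutionContext ctx) {
        // called before each chunk commit
        ctx.putInt(COUNT_KEY, readCount);
    }

    @Override
    public void close() {
    }

    @Override
    public Integer read() throws Exception {
        if (pageOffset >= page.size()) {
            // hypothetical paging method: returns at most PAGE_SIZE ids starting at readCount
            page = jobService.loadTargetIdPage(jobId, readCount, PAGE_SIZE);
            pageOffset = 0;
            if (page.isEmpty()) {
                return null;   // nothing left: end of step
            }
        }
        readCount++;
        return page.get(pageOffset++);
    }
}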
I have a task to write a header to a file only if some data exist; in other words, if the reader returns nothing, the file created by the writer should be empty.
Unfortunately, the FlatFileItemWriter implementation (in version 3.0.7) has only private fields, methods and a nested class that store all the information about the writing process, so I cannot simply override the write() method; I would need to copy-paste almost the entire content of FlatFileItemWriter to add a small piece of new functionality.
Any idea how to achieve this more elegantly in Spring Batch?
So, I finally found a more or less elegant solution.
The solution is to use LineAggregators; it seems that in the current implementation of FlatFileItemWriter this is the only approach you can use safely when inheriting from this class.
I use a separate line aggregator only for the header, but the solution can be extended to use multiple aggregators.
Also, in my case the header is just a predefined string, so by default I use a PassThroughLineAggregator, which simply returns my string to the FlatFileItemWriter.
public class FlatFileItemWriterWithHeaderOnData extends FlatFileItemWriter {

    private LineAggregator lineAggregator;
    private LineAggregator headerLineAggregator = new PassThroughLineAggregator();
    private boolean applyHeaderAggregator = true;

    @Override
    public void afterPropertiesSet() throws Exception {
        Assert.notNull(headerLineAggregator, "A HeaderLineAggregator must be provided.");
        super.afterPropertiesSet();
    }

    @Override
    public void setLineAggregator(LineAggregator lineAggregator) {
        this.lineAggregator = lineAggregator;
        super.setLineAggregator(lineAggregator);
    }

    public void setHeaderLineAggregator(LineAggregator headerLineAggregator) {
        this.headerLineAggregator = headerLineAggregator;
    }

    @Override
    public void write(List items) throws Exception {
        if (applyHeaderAggregator) {
            LineAggregator initialLineAggregator = lineAggregator;
            super.setLineAggregator(headerLineAggregator);
            super.write(getHeaderItems());
            super.setLineAggregator(initialLineAggregator);
            applyHeaderAggregator = false;
        }
        super.write(items);
    }

    private List<String> getHeaderItems() throws ItemStreamException {
        // your actual implementation goes here
        return Arrays.asList("Id,Name,Details");
    }
}
PS: This solution assumes that if the write() method is called, then some data exist.
Try this in your writer
writer.setShouldDeleteIfEmpty(true);
If you have no data, there is no file.
Otherwise, you write your header and your items.
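For illustration, a small sketch of a writer configured that way (the resource path and header text are placeholders):
@Bean
public FlatFileItemWriter<String> reportWriter() {
    FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("target/outputs/report.csv"));   // placeholder path
    writer.setLineAggregator(new PassThroughLineAggregator<>());
    writer.setHeaderCallback(w -> w.write("Id,Name,Details"));   // header is still configured
    writer.setShouldDeleteIfEmpty(true);                         // but the file is deleted if no items were written
    return writer;
}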
I'm thinking of an approach like the one below.
In beforeStep() (or a Tasklet), if there is no data at all, you set a flag such as "noData" to 'true'; otherwise it will be 'false'.
And you have two writers, one with a header and one without. In this case you can have a base writer that acts as a parent and two writers that inherit from it; the only difference between them is that one has a header callback and the other doesn't.
Based on the flag, you can switch to either the 'writer with header' or the 'writer without header' (see the sketch below).
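A loose sketch of the two-writer idea (bean names are made up, and baseWriter() stands for whatever shared setup the parent writer provides):
@Bean
public FlatFileItemWriter<String> writerWithHeader() {
    FlatFileItemWriter<String> writer = baseWriter();              // hypothetical shared setup
    writer.setHeaderCallback(w -> w.write("Id,Name,Details"));     // only difference: the header callback
    return writer;
}

@Bean
public FlatFileItemWriter<String> writerWithoutHeader() {
    return baseWriter();                                           // same setup, no header callback
}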
Thanks,
Nghia
Scenario
To make it simple, let's suppose I have an ItemReader that returns me 25 rows.
The first 10 rows belong to student A
The next 5 belong to student B
and the 10 remaining belong to student C
I want to aggregate them together logically, say by studentId, and flatten them to end up with one row per student.
Problem
If I understand correctly, setting the commit interval to 5 will do the following:
Send 5 rows to the Processor (which will aggregate them or do any business logic I tell it to).
After processing, it will write those 5 rows.
Then it will do it again for the next 5 rows and so on.
If that is true, then for the next five I will have to check the already-written ones, get them out, aggregate them with the ones I am currently processing, and write them again.
I personally do not like that.
What is the best practice to handle a situation like this in Spring Batch?
Alternative
Sometimes I feel that it would be much easier to write a regular Spring JDBC main program, so that I have full control over what I want to do. However, I want to take advantage of the job repository, state monitoring of the job, the ability to restart and skip, job and step listeners....
My Spring Batch Code
My module-context.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:batch="http://www.springframework.org/schema/batch"
xsi:schemaLocation="http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">
<description>Example job to get you started. It provides a skeleton for a typical batch application.</description>
<batch:job id="job1">
<batch:step id="step1" >
<batch:tasklet transaction-manager="transactionManager" start-limit="100" >
<batch:chunk reader="attendanceItemReader"
processor="attendanceProcessor"
writer="attendanceItemWriter"
commit-interval="10"
/>
</batch:tasklet>
</batch:step>
</batch:job>
<bean id="attendanceItemReader" class="org.springframework.batch.item.database.JdbcCursorItemReader">
<property name="dataSource">
<ref bean="sourceDataSource"/>
</property>
<property name="sql"
value="select s.student_name ,s.student_id ,fas.attendance_days ,fas.attendance_value from K12INTEL_DW.ftbl_attendance_stumonabssum fas inner join k12intel_dw.dtbl_students s on fas.student_key = s.student_key inner join K12INTEL_DW.dtbl_schools ds on fas.school_key = ds.school_key inner join k12intel_dw.dtbl_school_dates dsd on fas.school_dates_key = dsd.school_dates_key where dsd.rolling_local_school_yr_number = 0 and ds.school_code = ? and s.student_activity_indicator = 'Active' and fas.LOCAL_GRADING_PERIOD = 'G1' and s.student_current_grade_level = 'Gr 9' order by s.student_id"/>
<property name="preparedStatementSetter" ref="attendanceStatementSetter"/>
<property name="rowMapper" ref="attendanceRowMapper"/>
</bean>
<bean id="attendanceStatementSetter" class="edu.kdc.visioncards.preparedstatements.AttendanceStatementSetter"/>
<bean id="attendanceRowMapper" class="edu.kdc.visioncards.rowmapper.AttendanceRowMapper"/>
<bean id="attendanceProcessor" class="edu.kdc.visioncards.AttendanceProcessor" />
<bean id="attendanceItemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<property name="resource" value="file:target/outputs/passthrough.txt"/>
<property name="lineAggregator">
<bean class="org.springframework.batch.item.file.transform.PassThroughLineAggregator" />
</property>
</bean>
</beans>
My supporting classes for the Reader.
A PreparedStatementSetter
package edu.kdc.visioncards.preparedstatements;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.springframework.jdbc.core.PreparedStatementSetter;
public class AttendanceStatementSetter implements PreparedStatementSetter {

    public void setValues(PreparedStatement ps) throws SQLException {
        ps.setInt(1, 7);
    }
}
and a RowMapper
package edu.kdc.visioncards.rowmapper;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.springframework.jdbc.core.RowMapper;
import edu.kdc.visioncards.dto.AttendanceDTO;
public class AttendanceRowMapper<T> implements RowMapper<AttendanceDTO> {

    public static final String STUDENT_NAME = "STUDENT_NAME";
    public static final String STUDENT_ID = "STUDENT_ID";
    public static final String ATTENDANCE_DAYS = "ATTENDANCE_DAYS";
    public static final String ATTENDANCE_VALUE = "ATTENDANCE_VALUE";

    public AttendanceDTO mapRow(ResultSet rs, int rowNum) throws SQLException {
        AttendanceDTO dto = new AttendanceDTO();
        dto.setStudentId(rs.getString(STUDENT_ID));
        dto.setStudentName(rs.getString(STUDENT_NAME));
        dto.setAttDays(rs.getInt(ATTENDANCE_DAYS));
        dto.setAttValue(rs.getInt(ATTENDANCE_VALUE));
        return dto;
    }
}
My processor
package edu.kdc.visioncards;
import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.item.ItemProcessor;
import edu.kdc.visioncards.dto.AttendanceDTO;
public class AttendanceProcessor implements ItemProcessor<AttendanceDTO, Map<Integer, AttendanceDTO>> {

    private Map<Integer, AttendanceDTO> map = new HashMap<Integer, AttendanceDTO>();

    public Map<Integer, AttendanceDTO> process(AttendanceDTO dto) throws Exception {
        if (map.containsKey(new Integer(dto.getStudentId()))) {
            AttendanceDTO attDto = (AttendanceDTO) map.get(new Integer(dto.getStudentId()));
            attDto.setAttDays(attDto.getAttDays() + dto.getAttDays());
            attDto.setAttValue(attDto.getAttValue() + dto.getAttValue());
        } else {
            map.put(new Integer(dto.getStudentId()), dto);
        }
        return map;
    }
}
My concerns about the code above
In the processor, I create a HashMap, and as I process the rows I check whether I already have that student in the map; if it's not there, I add it. If it's already there, I grab it, get the values I am interested in, and add them to those of the row I am currently processing.
After that, the Spring Batch framework writes to a file according to my configuration.
My question is as follows:
I do not want it to go to the writer. I want to process all the remaining rows. How do I keep this Map that I have created in memory for the next set of rows that need to go through this same processor? Every time a row is processed through AttendanceProcessor, the Map is initialized. Should I put the Map initialization in a static block?
In my application I created a CollectingJdbcCursorItemReader that extends the standard JdbcCursorItemReader and performs exactly what you need. Internally it uses my CollectingRowMapper: an extension of the standard RowMapper that maps multiple related rows to one object.
Here is the code of the ItemReader; the code of the CollectingRowMapper interface, and an abstract implementation of it, is available in another answer of mine.
import java.sql.ResultSet;
import java.sql.SQLException;
import org.springframework.batch.item.ReaderNotOpenException;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.jdbc.core.RowMapper;

/**
 * A JdbcCursorItemReader that uses a {@link CollectingRowMapper}.
 * Like the superclass, this reader is not thread-safe.
 *
 * @author Pino Navato
 */
public class CollectingJdbcCursorItemReader<T> extends JdbcCursorItemReader<T> {

    private CollectingRowMapper<T> rowMapper;
    private boolean firstRead = true;

    /**
     * Accepts a {@link CollectingRowMapper} only.
     */
    @Override
    public void setRowMapper(RowMapper<T> rowMapper) {
        this.rowMapper = (CollectingRowMapper<T>) rowMapper;
        super.setRowMapper(rowMapper);
    }

    /**
     * Reads the next row and maps it to an item.
     */
    @Override
    protected T doRead() throws Exception {
        if (rs == null) {
            throw new ReaderNotOpenException("Reader must be open before it can be read.");
        }
        try {
            if (firstRead) {
                if (!rs.next()) {   // subsequent calls to next() will be executed by rowMapper
                    return null;
                }
                firstRead = false;
            } else if (!rowMapper.hasNext()) {
                return null;
            }
            T item = readCursor(rs, getCurrentItemCount());
            return item;
        }
        catch (SQLException se) {
            throw getExceptionTranslator().translate("Attempt to process next row failed", getSql(), se);
        }
    }

    @Override
    protected T readCursor(ResultSet rs, int currentRow) throws SQLException {
        T result = super.readCursor(rs, currentRow);
        setCurrentItemCount(rs.getRow());
        return result;
    }
}
You can use it just like the classic JdbcCursorItemReader: the only requirement is that you provide it a CollectingRowMapper instead of the classic RowMapper.
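For example, a hedged wiring sketch in Java config, where AttendanceCollectingRowMapper stands in for whatever CollectingRowMapper implementation you write:
@Bean
public CollectingJdbcCursorItemReader<AttendanceDTO> attendanceItemReader(DataSource sourceDataSource) {
    CollectingJdbcCursorItemReader<AttendanceDTO> reader = new CollectingJdbcCursorItemReader<>();
    reader.setDataSource(sourceDataSource);
    reader.setSql("select ... order by s.student_id");          // the same ordered query as above
    reader.setRowMapper(new AttendanceCollectingRowMapper());   // a CollectingRowMapper, not a plain RowMapper
    return reader;
}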
I always follow this pattern:
- I make my reader step-scoped, and in @PostConstruct I fetch the results and put them in a Map.
- In the processor, I convert the associated collection into a writable list and send that writable list on.
- In the ItemWriter, I persist the writable item(s), depending on the case (see the sketch below).
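A very loose sketch of that pattern; every name below is illustrative rather than taken from the question:
@Component
@StepScope
public class GroupedAttendanceReader implements ItemReader<List<AttendanceDTO>> {

    @Autowired
    private AttendanceDao attendanceDao;   // hypothetical DAO

    private Iterator<List<AttendanceDTO>> groups;

    @PostConstruct
    public void init() {
        // fetch once and group the rows by student id
        Map<String, List<AttendanceDTO>> byStudent = attendanceDao.findCurrentTerm().stream()
                .collect(Collectors.groupingBy(AttendanceDTO::getStudentId));
        groups = byStudent.values().iterator();
    }

    @Override
    public List<AttendanceDTO> read() {
        // each read() hands one student's rows to the processor, which flattens them
        return groups.hasNext() ? groups.next() : null;
    }
}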
Because you changed your question, I am adding a new answer.
If the students are ordered, then there is no need for a list/map; you could use exactly one student object in the processor to keep the "current" one and aggregate on it until a new one appears (read: ID change).
If the students are not ordered, you will never know when a specific student is "finished", and you would have to keep all students in a map, which cannot be written until the end of the complete read sequence.
Beware:
- the processor needs to know when the reader is exhausted
- it is hard to get this working with an arbitrary commit-rate and "id" concept; if you aggregate items that are somehow identical, the processor just cannot know whether the currently processed item is the last one
Basically, the use case is either solved completely at the reader level or at the writer level (see the other answer).
private SimpleItem currentItem;
private StepExecution stepExecution;   // e.g. injected via a @BeforeStep callback

@Override
public SimpleItem process(SimpleItem newItem) throws Exception {
    SimpleItem returnItem = null;

    if (currentItem == null) {
        currentItem = new SimpleItem(newItem.getId(), newItem.getValue());
    } else if (currentItem.getId() == newItem.getId()) {
        // aggregate somehow
        String value = currentItem.getValue() + newItem.getValue();
        currentItem.setValue(value);
    } else {
        // "clone"/copy currentItem
        returnItem = new SimpleItem(currentItem.getId(), currentItem.getValue());
        // replace currentItem
        currentItem = newItem;
    }

    // reader exhausted?
    if (stepExecution.getExecutionContext().containsKey("readerExhausted")
            && (Boolean) stepExecution.getExecutionContext().get("readerExhausted")
            && currentItem.getId() == stepExecution.getExecutionContext().getInt("lastItemId")) {
        returnItem = new SimpleItem(currentItem.getId(), currentItem.getValue());
    }

    return returnItem;
}
Basically you are talking about batch processing with changing IDs (1), where the batch has to keep track of the change.
For Spring/Spring Batch this means:
- an ItemWriter which checks the list of items for an ID change
- before the change, the items are stored in a temporary datastore (2) (List, Map, whatever) and are not written out
- when the ID changes, the aggregating/flattening business code runs on the items in the datastore and one item is written; the datastore can then be used for the next items with the next ID
This concept needs a reader which tells the step "I'm exhausted", in order to properly flush the temporary datastore at the end of the items (file/database).
Here is a rough and simple code example:
@Override
public void write(List<? extends SimpleItem> items) throws Exception {
    // setup with the first sharedId at startup
    if (currentId == null) {
        currentId = items.get(0).getSharedId();
    }

    // check for a change of sharedId in the input:
    // keep items in the temporary dataStore until the id of the input changes,
    // call the delegate if there is an id change or if the reader is exhausted
    for (SimpleItem item : items) {
        // already known sharedId, add to tempData
        if (item.getSharedId() == currentId) {
            tempData.add(item);
        } else {
            // or new sharedId: write tempData, empty it, keep the new id
            // (the delegate does the flattening/aggregating)
            delegate.write(tempData);
            tempData.clear();
            currentId = item.getSharedId();
            tempData.add(item);
        }
    }

    // check if the reader is exhausted, flush tempData
    if ((Boolean) stepExecution.getExecutionContext().get("readerExhausted")
            && tempData.size() > 0) {
        delegate.write(tempData);
        // optional: delegate.clear();
    }
}
(1) assuming the items are ordered by an ID (which can be composite too)
(2) a HashMap Spring bean for thread safety
Use a StepExecutionListener and store the records as a map in the StepExecutionContext; you can then group them in the writer or a writer listener and write them all at once.
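A loose sketch of this idea (the key and class names are invented): a StepExecutionListener exposes a shared map through the step's ExecutionContext, the processor fills it while items flow through, and the writer (or a writer listener) flushes it in one go.
public class GroupingStepListener extends StepExecutionListenerSupport {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // shared map the processor can look up and fill with one aggregated entry per student
        stepExecution.getExecutionContext().put("groupedRecords",
                new HashMap<String, AttendanceDTO>());
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // by now the map holds the grouped records, ready to be written at once
        return stepExecution.getExitStatus();
    }
}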