How can I read using FlatFileReader but write only to ExecutionContext - spring-batch

I want to read a text file to build a map and place it into the ExecutionContext for later reference.
I thought I'd start out with chunk processing to read the file and then process it, but I don't need the FlatFileItemWriter to write to a file. However, bean initialization requires that I set a resource on the writer.
Am I going about this wrong? Is chunk processing the wrong approach? Creating a tasklet may be wiser, but I liked that Spring Batch would read my file for me. With a tasklet, I'd have to write the code to open and process the text file myself. Right?
Advice on how to proceed would be greatly appreciated.

What I wound up doing (I'm new to this) was to create a Tasklet and have it also implement the StepExecutionListener interface. Worked like a charm. It reads a comma-delimited file line by line, plucking out the second column. I created an enum for my ExecutionContext map keys. Basically, it looks like this:
public class ProcessTabcPermitsTasklet implements Tasklet, StepExecutionListener {

    private Resource resource;
    private int linesToSkip;
    private Set<String> permits = new TreeSet<String>();

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // try-with-resources so the reader is closed even if parsing fails
        try (BufferedReader reader = new BufferedReader(new FileReader(resource.getFile()))) {
            String line = null;
            int lines = 0;
            while ((line = reader.readLine()) != null) {
                if (++lines <= linesToSkip)
                    continue;
                String[] s = StringUtils.commaDelimitedListToStringArray(line);
                permits.add(s[TABC_COLUMNS.PERMIT.ordinal()]);
            }
        }
        return RepeatStatus.FINISHED;
    }

    /**
     * @param resource the resource to set
     */
    public void setResource(Resource resource) {
        this.resource = resource;
    }

    /**
     * @param linesToSkip the linesToSkip to set
     */
    public void setLinesToSkip(int linesToSkip) {
        this.linesToSkip = linesToSkip;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to do before the step
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        stepExecution.getExecutionContext().put(EXECUTION_CONTEXT.TABC_PERMITS.toString(), permits);
        return ExitStatus.COMPLETED;
    }
}
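For reference, the two enums used above are plain enums. Hypothetical definitions that match the snippet (only PERMIT's position and the TABC_PERMITS key come from the code; the other constants are made up) could look like this:

// hypothetical column layout: PERMIT is the second column, so PERMIT.ordinal() == 1
public enum TABC_COLUMNS {
    LICENSE_NUMBER, PERMIT, OWNER_NAME
}

// keys used to stash data in the ExecutionContext
public enum EXECUTION_CONTEXT {
    TABC_PERMITS
}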

Related

Accessing the resource read by FlatFileReader in SkipPolicy SpringBatch [duplicate]

I have a Spring Batch job in which I read some files with BeanIO, and I want to handle invalid files, so I created a SkipPolicy class.
public class FileVerificationSkipper implements SkipPolicy {

    private static final FluentLogger LOGGER = LoggerService.init(FileVerificationSkipper.class);

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof FileNotFoundException) {
            return false;
        }
        if (exception instanceof BeanReaderException && skipCount <= 10) {
            LOGGER.all().logKey("Error on read file: ").value(exception).asError();
            return true;
        } else {
            return false;
        }
    }
}
On my reader step I access the file name like this: @Value("#{jobParameters['input.file.name']}") String inputFile
I would like to log the filename in the SkipPolicy as well; how can I do that?
Debugging how Spring Batch injects the parameters, I found the solution.
I just need to add @StepScope to the class and declare the field where I want the parameter injected:
@Component
@StepScope
@RequiredArgsConstructor
public class FileVerificationSkipper implements SkipPolicy {

    @Value("#{jobParameters['input.file.name']}")
    private String inputFile;

    ...
}
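A completed version of that skipper, with the injected file name added to the existing log call, might look like this (a sketch only, reusing the same logger API from the question):

@Component
@StepScope
@RequiredArgsConstructor
public class FileVerificationSkipper implements SkipPolicy {

    private static final FluentLogger LOGGER = LoggerService.init(FileVerificationSkipper.class);

    @Value("#{jobParameters['input.file.name']}")
    private String inputFile;

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof FileNotFoundException) {
            return false;
        }
        if (exception instanceof BeanReaderException && skipCount <= 10) {
            // the injected job parameter is now available for logging
            LOGGER.all().logKey("Error on read file " + inputFile + ": ").value(exception).asError();
            return true;
        }
        return false;
    }
}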

Access to stepExecution inside FlatFileFooterCallback

I am creating a fixed-length file and I have to attach the number of items that are read in to the footer. I need to access the StepExecution to get the write count, so I followed this: FlatFileFooterCallback - how to get access to StepExecution For Count. StepExecution is null??
FlatFileFooterCallback
public class LexisNexisRequestFileFooter implements FlatFileFooterCallback {

    @Value("#{StepExecution}")
    private StepExecution stepExecution;

    int totalItemsWritten = 0;

    @Override
    public void writeFooter(Writer writer) throws IOException {
        System.out.println(stepExecution.getWriteCount());
        String julianDate = createJulianDate();
        String SAT = "##!!SAT#" + julianDate + totalItemsWritten + " \r\n";
        String SIT = "##!!SIT#" + julianDate + totalItemsWritten + " \r\n";
        String footer = SAT + SIT;
        writer.write(footer);
    }
}
Configuration file
@Bean
@StepScope
public FlatFileFooterCallback customFooterCallback() {
    return new LexisNexisRequestFileFooter();
}
Writer file
// Create writer instance
FlatFileItemWriter<LexisNexisRequestRecord> writer = new FlatFileItemWriter<>();
LexisNexisRequestFileFooter lexisNexisRequestFileFooter = new LexisNexisRequestFileFooter();
writer.setFooterCallback(lexisNexisRequestFileFooter);

// Set output file location
writer.setResource(new FileSystemResource("homeData.txt"));

// All job repetitions should append to the same output file
writer.setAppendAllowed(true);
writer.setEncoding("ascii");
In your writer configuration, you are creating the footer callback manually here:
LexisNexisRequestFileFooter lexisNexisRequestFileFooter = new LexisNexisRequestFileFooter();
writer.setFooterCallback(lexisNexisRequestFileFooter);
and not injecting the step-scoped bean. Your item writer bean definition method should look something like this:
@Bean
public FlatFileItemWriter<LexisNexisRequestRecord> writer() {
    // Create writer instance
    FlatFileItemWriter<LexisNexisRequestRecord> writer = new FlatFileItemWriter<>();
    writer.setFooterCallback(customFooterCallback());

    // Set output file location
    writer.setResource(new FileSystemResource("homeData.txt"));

    // All job repetitions should append to the same output file
    writer.setAppendAllowed(true);
    writer.setEncoding("ascii");

    return writer;
}
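With the step-scoped bean injected (note that the late-binding expression is normally written #{stepExecution} with a lowercase s), the callback can take the count from the injected StepExecution instead of the never-updated totalItemsWritten field. A rough sketch based on the class above:

@Override
public void writeFooter(Writer writer) throws IOException {
    // the step-scoped proxy resolves to the current StepExecution at runtime
    long writeCount = stepExecution.getWriteCount();
    String julianDate = createJulianDate();
    String SAT = "##!!SAT#" + julianDate + writeCount + " \r\n";
    String SIT = "##!!SIT#" + julianDate + writeCount + " \r\n";
    writer.write(SAT + SIT);
}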

Multiple files of different data structure formats as input in Spring Batch

Based on my research, I know that Spring Batch provides APIs for handling many different kinds of data file formats.
But I need clarification on how to supply multiple files of different formats to one chunk / Tasklet.
For that, I know that MultiResourceItemReader can process multiple files, but AFAIK all the files have to be of the same format and data structure.
So, the question is: how can we supply multiple files of different data formats as input to a Tasklet?
Asoub is right: there is no out-of-the-box Spring Batch reader that "reads it all!". However, with just a handful of fairly simple and straightforward classes you can make a Java-config Spring Batch application that will go through different files with different file formats.
For one of my applications I had a similar use case, and I wrote a bunch of fairly simple and straightforward implementations and extensions of the Spring Batch framework to create what I call a "generic" reader. So to answer your question: below you will find the code I used to go through different kinds of file formats using Spring Batch. Obviously it is a stripped-down implementation, but it should get you going in the right direction.
One line is represented by a Record:
public class Record {

    private Object[] columns;

    public void setColumnByIndex(Object candidate, int index) {
        columns[index] = candidate;
    }

    public Object getColumnByIndex(int index) {
        return columns[index];
    }

    public void setColumns(Object[] columns) {
        this.columns = columns;
    }
}
Each line contains multiple columns and the columns are separated by a delimiter. It does not matter if file1 contains 10 columns and/or if file2 only contains 3 columns.
The following reader simply maps each line to a record:
@Component
public class GenericReader {

    @Autowired
    private GenericLineMapper genericLineMapper;

    @SuppressWarnings({ "unchecked", "rawtypes" })
    public FlatFileItemReader reader(File file) {
        FlatFileItemReader<Record> reader = new FlatFileItemReader();
        reader.setResource(new FileSystemResource(file));
        reader.setLineMapper((LineMapper) genericLineMapper.defaultLineMapper());
        return reader;
    }
}
The mapper takes a line and converts it to an array of objects:
@Component
public class GenericLineMapper {

    @Autowired
    private ApplicationConfiguration applicationConfiguration;

    @SuppressWarnings({ "unchecked", "rawtypes" })
    public DefaultLineMapper defaultLineMapper() {
        DefaultLineMapper lineMapper = new DefaultLineMapper();
        lineMapper.setLineTokenizer(tokenizer());
        lineMapper.setFieldSetMapper(new CustomFieldSetMapper());
        return lineMapper;
    }

    private DelimitedLineTokenizer tokenizer() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setDelimiter(Character.toString(applicationConfiguration.getDelimiter()));
        tokenizer.setQuoteCharacter(applicationConfiguration.getQuote());
        return tokenizer;
    }
}
The "magic" of converting the columns to the record happens in the FieldSetMapper:
@Component
public class CustomFieldSetMapper implements FieldSetMapper<Record> {

    @Override
    public Record mapFieldSet(FieldSet fieldSet) throws BindException {
        Record record = new Record();
        Object[] row = new Object[fieldSet.getValues().length];
        for (int i = 0; i < fieldSet.getValues().length; i++) {
            row[i] = fieldSet.getValues()[i];
        }
        record.setColumns(row);
        return record;
    }
}
Using YAML configuration, the user provides an input directory, a list of file names and, of course, the appropriate delimiter and the character used to quote a column if the column contains the delimiter. Here are the properties class and an example of such a YAML configuration:
@Component
@ConfigurationProperties
public class ApplicationConfiguration {

    private String inputDir;
    private List<String> fileNames;
    private char delimiter;
    private char quote;

    // getters and setters omitted
}
And then the application.yml:
input-dir: src/main/resources/
file-names: [yourfile1.csv, yourfile2.csv, yourfile3.csv]
delimiter: "|"
quote: "\""
And last but not least, putting it all together:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    private GenericReader genericReader;

    @Autowired
    private NoOpWriter noOpWriter;

    @Autowired
    private ApplicationConfiguration applicationConfiguration;

    @Bean
    public Job yourJobName() {
        List<Step> steps = new ArrayList<>();
        applicationConfiguration.getFileNames().forEach(
                f -> steps.add(loadStep(new File(applicationConfiguration.getInputDir() + f))));

        return jobBuilderFactory.get("yourjobName")
                .start(createParallelFlow(steps))
                .end()
                .build();
    }

    @SuppressWarnings("unchecked")
    public Step loadStep(File file) {
        return stepBuilderFactory.get("step-" + file.getName())
                .<Record, Record> chunk(10)
                .reader(genericReader.reader(file))
                .writer(noOpWriter)
                .build();
    }

    private Flow createParallelFlow(List<Step> steps) {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        // max multithreading = -1, no multithreading = 1, smart size = steps.size()
        taskExecutor.setConcurrencyLimit(1);

        List<Flow> flows = steps.stream()
                .map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
                .collect(Collectors.toList());

        return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
                .split(taskExecutor)
                .add(flows.toArray(new Flow[flows.size()]))
                .build();
    }
}
For demonstration purposes you can just put all the classes in one package. The NoOpWriter simply logs the 2nd column of my test files.
@Component
public class NoOpWriter implements ItemWriter<Record> {

    @Override
    public void write(List<? extends Record> items) throws Exception {
        items.forEach(i -> System.out.println(i.getColumnByIndex(1)));
        // NO - OP
    }
}
Good luck :-)
I don't think there is an out-of-the-box Spring Batch reader for multiple input formats.
You'll have to build your own. Of course, you can reuse already existing FlatFileItemReaders as delegates in your custom file reader and, for each file type/format, use the right one.
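For example, here is a minimal sketch of that delegate idea (the class name and the suffix-based lookup are illustrative, not from any library): a thin reader that picks a pre-configured FlatFileItemReader per file and forwards all calls to it.

import java.util.Map;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.core.io.Resource;

public class MultiFormatFileReader implements ItemStreamReader<Object> {

    // one pre-configured delegate per file format, e.g. ".csv" -> a reader with a CSV line mapper
    private final Map<String, FlatFileItemReader<?>> delegatesBySuffix;
    private FlatFileItemReader<?> delegate;

    public MultiFormatFileReader(Map<String, FlatFileItemReader<?>> delegatesBySuffix) {
        this.delegatesBySuffix = delegatesBySuffix;
    }

    public void setResource(Resource resource) {
        String filename = resource.getFilename();
        for (Map.Entry<String, FlatFileItemReader<?>> entry : delegatesBySuffix.entrySet()) {
            if (filename != null && filename.endsWith(entry.getKey())) {
                delegate = entry.getValue();
                delegate.setResource(resource);
                return;
            }
        }
        throw new IllegalArgumentException("No delegate reader configured for " + filename);
    }

    @Override
    public Object read() throws Exception {
        return delegate.read();
    }

    @Override
    public void open(ExecutionContext executionContext) {
        delegate.open(executionContext);
    }

    @Override
    public void update(ExecutionContext executionContext) {
        delegate.update(executionContext);
    }

    @Override
    public void close() {
        delegate.close();
    }
}

Each delegate carries the line mapper appropriate for its format; the wrapper only decides which delegate handles the current resource.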

Pass current step output to next step and write to flatfile

I need to prepare two sets of data (Lists) and write them into a flat file. The first set is simply retrieved from SQL, with some string formatting done before it is written to the flat file. The second set of data is slightly more complex: first I need to get data from some tables and insert it into a temp table. The data is then grabbed from this temp table, similarly string-formatted, and the temp table status is updated. Finally, both sets of data are written into the flat file.
Coming to Spring Batch, I will have 3 steps.
First Step
- First Reader: read from DB
- First Processor: string formatting
- First Writer: write into file

Second Step
- BeforeRead: retrieve and insert into temp table
- Second Reader: read from temp table
- Second Processor: string formatting and update temp table status
- Second Writer: write into file

Third Step
- MultiResourceItemReader: read the two files
- Write into final file

Tasklet
- Delete both files and purge the temp table.
My question now is: for the first and second steps, if I don't write to a file, is it possible to pass the data to the third step?
Taking into account what Hansjoerg Wingeier said, below are custom implementations of ListItemWriter and ListItemReader that let you define a name property. This property is used as the key to store the list in the JobExecutionContext.
The reader:
public class CustomListItemReader<T> implements ItemReader<T>, StepExecutionListener {

    private String name;
    private List<T> list;

    @Override
    public T read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        if (list != null && !list.isEmpty()) {
            return list.remove(0);
        }
        return null;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        list = (List<T>) stepExecution.getJobExecution().getExecutionContext().get(name);
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return null;
    }

    public void setName(String name) {
        this.name = name;
    }
}
The writer:
public class CustomListItemWriter<T> implements ItemWriter<T>, StepExecutionListener {

    private String name;
    private List<T> list = new ArrayList<T>();

    @Override
    public void write(List<? extends T> items) throws Exception {
        for (T item : items) {
            list.add(item);
        }
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {}

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        stepExecution.getJobExecution().getExecutionContext().put(name, list);
        return null;
    }

    public void setName(String name) {
        this.name = name;
    }
}
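Wiring then comes down to giving the writer of the producing step and the reader of the consuming step the same name. A sketch of the bean definitions (the bean names and the "formattedLines" key are illustrative):

@Bean
public CustomListItemWriter<String> formattedLinesWriter() {
    CustomListItemWriter<String> writer = new CustomListItemWriter<>();
    writer.setName("formattedLines"); // key under which the list is stored in the JobExecutionContext
    return writer;
}

@Bean
public CustomListItemReader<String> formattedLinesReader() {
    CustomListItemReader<String> reader = new CustomListItemReader<>();
    reader.setName("formattedLines"); // must match the writer's key
    return reader;
}

The first (or second) step then uses formattedLinesWriter() as its writer and the third step uses formattedLinesReader() as its reader; if the step builder does not register them as listeners automatically, add them explicitly with .listener(...) on the corresponding steps so beforeStep/afterStep are called.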
Normally, you don't want to do that.
If you just have a couple of hundred entries, it would work. You could, for instance, write a special class that implements both the reader and writer interfaces. When writing, just store the data in a list; when reading, read the entries from that list. Just instantiate it as a bean and use it in both steps (1 and 2) as your writer. By simply making the write method synchronized, it would even work when steps 1 and 2 are executed in parallel.
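A sketch of such a combined class (the class name is invented here) could be as simple as this:

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

public class InMemoryPassThrough<T> implements ItemReader<T>, ItemWriter<T> {

    private final List<T> buffer = new ArrayList<>();
    private int next = 0;

    @Override
    public synchronized void write(List<? extends T> items) {
        // steps 1 and 2 both write into the same in-memory buffer
        buffer.addAll(items);
    }

    @Override
    public T read() {
        // the later step reads the buffered entries back; null ends the step
        return next < buffer.size() ? buffer.get(next++) : null;
    }
}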
But the problem is that this solution doesn't scale with the amount of input data: the more data you read, the more memory you need.
This is one of the key concepts of batch-processing: having a constant memory usage regardless of the amount of data that has to be processed.

Spring Batch process an encoded zipped file

I’m investigating the use of Spring Batch to process records from an encoded zipped file. The records are variable length, with nested variable-length data fields encoded within them.
I’m new to Spring and Spring Batch; this is how I plan to structure the batch configuration.
The ItemReader would need to read a single record from the zipped (*.gz) file input stream into a POJO (byte array); the length of this record is contained in the first two bytes of the stream.
The ItemProcessor will decode the byte array and store info in relevant attributes in the POJO.
The ItemWriter would populate a database.
My initial problem is understanding how to set up the ItemReader. I’ve looked at some of the examples using a FlatFileItemReader, but my difficulty is that it expects a LineMapper, and I don't see how that fits my case (there is no concept of a line in the file).
There are some articles indicating the use of a custom BufferedReaderFactory, but it would be great to see a worked example of this.
Help would be appreciated.
If the gzipped file is a simple text file, you only need a custom BufferedReaderFactory; the line mapper then gets the String of the current line:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

import org.springframework.batch.item.file.BufferedReaderFactory;
import org.springframework.core.io.Resource;

public class GZipBufferedReaderFactory implements BufferedReaderFactory {

    /** Default value for gzip suffixes. */
    private List<String> gzipSuffixes = new ArrayList<String>() {
        {
            add(".gz");
            add(".gzip");
        }
    };

    /**
     * Creates a BufferedReader for a gzip Resource, handles normal resources
     * too.
     *
     * @param resource
     * @param encoding
     * @return
     * @throws UnsupportedEncodingException
     * @throws IOException
     */
    @Override
    public BufferedReader create(Resource resource, String encoding)
            throws UnsupportedEncodingException, IOException {
        for (String suffix : gzipSuffixes) {
            // test for filename and description, description is used when
            // handling itemStreamResources
            if (resource.getFilename().endsWith(suffix)
                    || resource.getDescription().endsWith(suffix)) {
                return new BufferedReader(new InputStreamReader(new GZIPInputStream(resource.getInputStream()), encoding));
            }
        }
        return new BufferedReader(new InputStreamReader(resource.getInputStream(), encoding));
    }

    public List<String> getGzipSuffixes() {
        return gzipSuffixes;
    }

    public void setGzipSuffixes(List<String> gzipSuffixes) {
        this.gzipSuffixes = gzipSuffixes;
    }
}
A simple item reader configuration:
<bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<property name="resource" value="#{jobParameters['input.file']}" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper" />
</property>
<property name="strict" value="true" />
<property name="bufferedReaderFactory">
<bean class="your.custom.GZipBufferedReaderFactory" />
</property>
</bean>
I tested that this simple configuration for reading lines from a zipped & encoded file in S3 works.
Key points:
- Implement a BufferedReaderFactory that uses Apache's GZIPInputStreamFactory, and set that as the bufferedReaderFactory on the FlatFileItemReader.
- Configure a SimpleStorageResourceLoader from Spring Cloud with an AmazonS3Client, and use it to get the zipped flat file in S3. Set that as the resource on the FlatFileItemReader.
Note: reading into a string can be easily replaced by reading into a POJO.
GZIPBufferedReaderFactory.java
Using Apache's GZIPInputStreamFactory
public class GZIPBufferedReaderFactory implements BufferedReaderFactory {

    private final GZIPInputStreamFactory gzipInputStreamFactory;

    public GZIPBufferedReaderFactory(GZIPInputStreamFactory gzipInputStreamFactory) {
        this.gzipInputStreamFactory = gzipInputStreamFactory;
    }

    @Override
    public BufferedReader create(Resource resource, String encoding) throws IOException {
        return new BufferedReader(new InputStreamReader(gzipInputStreamFactory.create(resource.getInputStream()), encoding));
    }
}
AWSConfiguration.java
@Configuration
public class AWSConfiguration {

    @Bean
    public AmazonS3Client s3Client(AWSCredentialsProvider credentials, Region region) {
        ClientConfiguration clientConfig = new ClientConfiguration();
        AmazonS3Client client = new AmazonS3Client(credentials, clientConfig);
        client.setRegion(region);
        return client;
    }
}
How you configure the AWSCredentialsProvider and Region beans can vary and I will not detail that here since there is documentation elsewhere.
BatchConfiguration.java
@Configuration
@EnableBatchProcessing
public class SignalsIndexBatchConfiguration {

    @Autowired
    public AmazonS3Client s3Client;

    @Bean
    public GZIPInputStreamFactory gzipInputStreamFactory() {
        return new GZIPInputStreamFactory();
    }

    @Bean
    public GZIPBufferedReaderFactory gzipBufferedReaderFactory(GZIPInputStreamFactory gzipInputStreamFactory) {
        return new GZIPBufferedReaderFactory(gzipInputStreamFactory);
    }

    @Bean
    public SimpleStorageResourceLoader simpleStorageResourceLoader() {
        return new SimpleStorageResourceLoader(s3Client);
    }

    @Bean
    @StepScope
    protected FlatFileItemReader<String> itemReader(
            SimpleStorageResourceLoader simpleStorageResourceLoader,
            GZIPBufferedReaderFactory gzipBufferedReaderFactory) {
        FlatFileItemReader<String> flatFileItemReader = new FlatFileItemReader<>();
        flatFileItemReader.setBufferedReaderFactory(gzipBufferedReaderFactory);
        flatFileItemReader.setResource(simpleStorageResourceLoader.getResource("s3://YOUR_FLAT_FILE.csv"));
        flatFileItemReader.setLineMapper(new PassThroughLineMapper());
        return flatFileItemReader;
    }

    @Bean
    public Job job(Step step) {
        return jobBuilderFactory.get("job").start(step).build();
    }

    @Bean
    protected Step step(GZIPInputStreamFactory gzipInputStreamFactory) {
        return stepBuilderFactory.get("step")
                .<String, String> chunk(200)
                .reader(itemReader(simpleStorageResourceLoader(), gzipBufferedReaderFactory(gzipInputStreamFactory)))
                .processor(itemProcessor())
                .faultTolerant()
                .build();
    }

    /*
     * These components are some of what we
     * get for free with the @EnableBatchProcessing annotation
     */
    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public JobRepository jobRepository;
    /*
     * END Freebies
     */

    @Bean
    public JobLauncher jobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository);
        jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }
}
From the feature request ticket for Spring Batch (https://jira.spring.io/browse/BATCH-1750):
public class GZIPResource extends InputStreamResource implements Resource {

    public GZIPResource(Resource delegate) throws IOException {
        super(new GZIPInputStream(delegate.getInputStream()));
    }
}
The custom GZipBufferedReaderFactory won't work with anything other than FlatFileItemReader.
Edit: lazy version. This doesn't try to open the file until getInputStream is called, which avoids exceptions when the file doesn't exist yet at program initialization (e.g. when the Resource is created via autowiring).
public class GzipLazyResource extends FileSystemResource implements Resource {

    public GzipLazyResource(File file) {
        super(file);
    }

    public GzipLazyResource(String path) {
        super(path);
    }

    @Override
    public InputStream getInputStream() throws IOException {
        return new GZIPInputStream(super.getInputStream());
    }
}
Edit 2: this only works for input Resources.
Adding a similar getOutputStream method won't work, because Spring uses FileSystemResource.getFile, not FileSystemResource.getOutputStream.
My confusion was based around the file handling in the custom ItemReader: if I were to open and process the file in the read() method, I would have to keep track of where I was in the file, etc. I managed to tackle this by creating a BufferedInputStream (new BufferedInputStream(new GZIPInputStream(new FileInputStream(file)))) in the constructor of the custom ItemReader and then processing that stream in the read() method on each iteration of the step.
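For anyone interested, a stripped-down sketch of that reader (the class and field names are mine, and I'm assuming the two-byte length prefix is an unsigned big-endian short) looks roughly like this:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

import org.springframework.batch.item.ItemReader;

public class GzipRecordItemReader implements ItemReader<byte[]> {

    private final DataInputStream in;

    public GzipRecordItemReader(File file) throws IOException {
        this.in = new DataInputStream(
                new BufferedInputStream(new GZIPInputStream(new FileInputStream(file))));
    }

    @Override
    public byte[] read() throws Exception {
        int length;
        try {
            length = in.readUnsignedShort(); // first two bytes hold the record length
        } catch (EOFException e) {
            in.close();
            return null; // returning null signals the end of the input
        }
        byte[] record = new byte[length];
        in.readFully(record);
        return record;
    }
}

In a real job it would probably be better to implement ItemStreamReader so the stream is opened in open() and closed in close(), but the idea is the same: the ItemProcessor then decodes each byte[] record into the POJO, and the ItemWriter populates the database.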