Spring Batch: File not being read

I am trying to create an application that uses the spring-batch-excel extension to read Excel files uploaded through a web interface by its users, in order to parse the Excel file for addresses.
When the code runs there is no error, but all I get is the following in my log, even though I have log/syso statements throughout my Processor and Writer (they are never called, so all I can imagine is that the file is not being read properly and no data is returned to process/write). And yes, the file has data, several thousand records in fact.
Job: [FlowJob: [name=excelFileJob]] launched with the following parameters: [{file=Book1.xlsx}]
Executing step: [excelFileStep]
Job: [FlowJob: [name=excelFileJob]] completed with the following parameters: [{file=Book1.xlsx}] and the following status: [COMPLETED]
Below is my JobConfig
@Configuration
@EnableBatchProcessing
public class AddressExcelJobConfig {

    @Bean
    public BatchConfigurer configurer(EntityManagerFactory entityManagerFactory) {
        return new CustomBatchConfigurer(entityManagerFactory);
    }

    @Bean
    Step excelFileStep(ItemReader<AddressExcel> excelAddressReader,
                       ItemProcessor<AddressExcel, AddressExcel> excelAddressProcessor,
                       ItemWriter<AddressExcel> excelAddressWriter,
                       StepBuilderFactory stepBuilderFactory) {
        return stepBuilderFactory.get("excelFileStep")
                .<AddressExcel, AddressExcel>chunk(1)
                .reader(excelAddressReader)
                .processor(excelAddressProcessor)
                .writer(excelAddressWriter)
                .build();
    }

    @Bean
    Job excelFileJob(JobBuilderFactory jobBuilderFactory,
                     @Qualifier("excelFileStep") Step excelAddressStep) {
        return jobBuilderFactory.get("excelFileJob")
                .incrementer(new RunIdIncrementer())
                .flow(excelAddressStep)
                .end()
                .build();
    }
}
Below is my AddressExcelReader
The late binding works fine; there is no error. I have tried loading the resource from the file name, in addition to creating a new ClassPathResource and FileSystemResource. All give me the same results.
@Component
@StepScope
public class AddressExcelReader implements ItemReader<AddressExcel> {

    private PoiItemReader<AddressExcel> itemReader = new PoiItemReader<AddressExcel>();

    @Override
    public AddressExcel read()
            throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
        return itemReader.read();
    }

    public AddressExcelReader(@Value("#{jobParameters['file']}") String file, StorageService storageService) {
        //Resource resource = storageService.loadAsResource(file);
        //Resource testResource = new FileSystemResource("upload-dir/Book1.xlsx");
        itemReader.setResource(new ClassPathResource("/upload-dir/Book1.xlsx"));
        itemReader.setLinesToSkip(1);
        itemReader.setStrict(true);
        itemReader.setRowMapper(excelRowMapper());
    }

    public RowMapper<AddressExcel> excelRowMapper() {
        BeanWrapperRowMapper<AddressExcel> rowMapper = new BeanWrapperRowMapper<>();
        rowMapper.setTargetType(AddressExcel.class);
        return rowMapper;
    }
}
Below is my AddressExcelProcessor
@Component
public class AddressExcelProcessor implements ItemProcessor<AddressExcel, AddressExcel> {

    private static final Logger log = LoggerFactory.getLogger(AddressExcelProcessor.class);

    @Override
    public AddressExcel process(AddressExcel item) throws Exception {
        System.out.println("Converting " + item);
        log.info("Convert {}", item);
        return item;
    }
}
Again, this never comes into play (no logs are generated). And if it matters, this is how I'm launching my job from a FileUploadController, in a @PostMapping("/") handler that first stores the uploaded file and then runs the job:
@PostMapping("/")
public String handleFileUpload(@RequestParam("file") MultipartFile file, RedirectAttributes redirectAttributes) {
    storageService.store(file);
    try {
        JobParameters jobParameters = new JobParametersBuilder()
                .addString("file", file.getOriginalFilename().toString()).toJobParameters();
        jobLauncher.run(job, jobParameters);
    } catch (JobExecutionAlreadyRunningException | JobRestartException | JobInstanceAlreadyCompleteException
            | JobParametersInvalidException e) {
        e.printStackTrace();
    }
    redirectAttributes.addFlashAttribute("message",
            "You successfully uploaded " + file.getOriginalFilename() + "!");
    return "redirect:/";
}
And last but not least,
Here is my AddressExcel POJO
import lombok.Data;

@Data
public class AddressExcel {

    private String address1;
    private String address2;
    private String city;
    private String state;
    private String zip;

    public AddressExcel() {}
}
UPDATE (10/13/2016)
From Nghia Do's comments, I also created my own RowMapper instead of using the BeanWrapper to see if that was the issue. Still the same results.
public class AddressExcelRowMapper implements RowMapper<AddressExcel> {

    @Override
    public AddressExcel mapRow(RowSet rs) throws Exception {
        AddressExcel temp = new AddressExcel();
        temp.setAddress1(rs.getColumnValue(0));
        temp.setAddress2(rs.getColumnValue(1));
        temp.setCity(rs.getColumnValue(2));
        temp.setState(rs.getColumnValue(3));
        temp.setZip(rs.getColumnValue(4));
        return temp;
    }
}

All it seems I needed was to add the following to my ItemReader:
itemReader.afterPropertiesSet();
itemReader.open(new ExecutionContext());
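For context, here is a minimal sketch of where those two calls could sit in the wrapping reader above; the constructor placement and the exception handling are assumptions for illustration, not confirmed code:

public AddressExcelReader(@Value("#{jobParameters['file']}") String file, StorageService storageService) {
    itemReader.setResource(new ClassPathResource("/upload-dir/Book1.xlsx"));
    itemReader.setLinesToSkip(1);
    itemReader.setStrict(true);
    itemReader.setRowMapper(excelRowMapper());
    try {
        // Without these two calls the delegate PoiItemReader is never initialized or opened,
        // so read() returns null immediately and the step completes without processing anything.
        itemReader.afterPropertiesSet();
        itemReader.open(new ExecutionContext());
    } catch (Exception e) {
        throw new IllegalStateException("Could not open Excel resource", e);
    }
}

A cleaner long-term option is to have the wrapper implement ItemStreamReader and call open/close on the delegate from its own open/close methods, so the step manages the reader's lifecycle for you.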

Related

Access job filename parameter in SkipPolicy of Spring Batch

I have a Spring Batch job in which I read some files with BeanIO, and I want to handle invalid files, so I created a SkipPolicy class.
public class FileVerificationSkipper implements SkipPolicy {

    private static final FluentLogger LOGGER = LoggerService.init(FileVerificationSkipper.class);

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof FileNotFoundException) {
            return false;
        }
        if (exception instanceof BeanReaderException && skipCount <= 10) {
            LOGGER.all().logKey("Error on read file: ").value(exception).asError();
            return true;
        } else {
            return false;
        }
    }
}
On my reader step I access the file name like this: @Value("#{jobParameters['input.file.name']}") String inputFile
I would like to log the filename; how could I do that?
Debugging how Spring Batch injects the parameters, I found the solution.
I just need to add @StepScope to the class and declare the field into which I want the parameter injected:
@Component
@StepScope
@RequiredArgsConstructor
public class FileVerificationSkipper implements SkipPolicy {

    @Value("#{jobParameters['input.file.name']}")
    private String inputFile;

    ...
}
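With the parameter injected, the filename can then be included in the skip logging. A sketch that reuses the logging call from the question, trimmed to just the skip branch:

@Override
public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
    if (exception instanceof BeanReaderException && skipCount <= 10) {
        // inputFile is populated from the 'input.file.name' job parameter thanks to @StepScope
        LOGGER.all().logKey("Error on read file " + inputFile + ": ").value(exception).asError();
        return true;
    }
    return false;
}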

How to check if a job is still running or is finished regardless of finalization status

Working on a Spring Batch (3) with Spring Boot (1.5) project. I have an end-of-day job "endOfDayJob" that is executed asynchronously through a web controller; in the controller I am returning the job execution id.
Below is the code for the configuration class. The highlight here is that I am implementing the BatchConfigurer interface and creating an async JobLauncher with a SimpleAsyncTaskExecutor.
@Configuration
@EnableBatchProcessing
@EnableAsync
public class BatchConfiguration implements BatchConfigurer {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private DataSource dataSource;

    @Bean
    public Job endOfDayJob() throws Exception {
        SimpleJobBuilder simpleJobBuilder = jobBuilderFactory.get("endOfDayJob")
                .incrementer(new RunIdIncrementer())
                .start(init())
                .next(updateInventory())
                .next(generateSalesReport())
                .next(cleanup())
                .next(sendReport());
        return simpleJobBuilder.build();
    }

    @Bean
    public Step init() {
        return stepBuilderFactory.get("initStep").tasklet(initTasklet()).build();
    }

    @Bean
    public Step updateInventory() {
        return stepBuilderFactory.get("updateInventoryStep").tasklet(updateInventoryTasklet()).build();
    }

    @Bean
    public Step generateSalesReport() {
        return stepBuilderFactory.get("generateSalesReportStep").tasklet(generateSalesReportTasklet()).build();
    }

    @Bean
    public Step cleanup() {
        return stepBuilderFactory.get("cleanupStep").tasklet(cleanupTasklet()).build();
    }

    @Bean
    public Step sendReport() {
        return stepBuilderFactory.get("sendReportStep").tasklet(sendReportTasklet()).build();
    }

    @Override
    public JobRepository getJobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(getTransactionManager());
        factory.setIsolationLevelForCreate("ISOLATION_READ_COMMITTED");
        factory.setTablePrefix("BATCH_");
        return factory.getObject();
    }

    @Override
    public PlatformTransactionManager getTransactionManager() throws Exception {
        return new DataSourceTransactionManager(dataSource);
    }

    @Override
    public JobLauncher getJobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(getJobRepository());
        jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }

    @Override
    public JobExplorer getJobExplorer() throws Exception {
        JobExplorerFactoryBean jobExplorerFactoryBean = new JobExplorerFactoryBean();
        jobExplorerFactoryBean.setDataSource(dataSource);
        jobExplorerFactoryBean.setTablePrefix("BATCH_");
        jobExplorerFactoryBean.afterPropertiesSet();
        return jobExplorerFactoryBean.getObject();
    }
}
Here is the code for the web controller.
@RestController
@RequestMapping("/api/job")
public class WebController {

    private static final Logger logger = LoggerFactory.getLogger(WebController.class);

    @Autowired
    private DataSource dataSource;

    @Autowired
    private BatchConfiguration batchConfiguration;

    @Autowired
    private JobLauncher jobLauncher;

    @GetMapping("/endOfDayJob")
    private Long kycrBatch(@RequestParam(value = "odate", required = true) String odate) {
        logger.info("Executing endOfDayJob with odate = {}", odate);
        if (odate == null || odate.isEmpty() || odate.trim().isEmpty()) {
            return -1L;
        }
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        jobParametersBuilder.addString("odate", odate);
        long jobExecutionId = -1L;
        try {
            Job endOfDayJob = this.batchConfiguration.endOfDayJob();
            jobParametersBuilder.addDate("runtime", new Date());
            jobExecutionId = jobLauncher.run(endOfDayJob, jobParametersBuilder.toJobParameters()).getId();
        } catch (Exception e) {
            logger.error("Error occurred executing endOfDayJob with message: {}", e.getMessage());
            return -1L;
        }
        return jobExecutionId;
    }
}
Then I want to add a new method to the controller to know whether the job has ended or not. What is a possible way to check if a job is still running or is already finished, regardless of its finalization status?
Then I want to add a new method to the controller to know whether the job has ended or not.
You can inject the JobExplorer in your controller and write something like:
public boolean isRunning(long jobExecutionId) {
    JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
    return jobExecution.isRunning();
}
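Wired into the existing controller, that could look roughly like the following; the endpoint path and the null check are assumptions for illustration:

@Autowired
private JobExplorer jobExplorer;

@GetMapping("/endOfDayJob/{executionId}/running")
public boolean isRunning(@PathVariable long executionId) {
    JobExecution jobExecution = jobExplorer.getJobExecution(executionId);
    // getJobExecution returns null when no execution exists for the given id
    return jobExecution != null && jobExecution.isRunning();
}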

Spring Batch: AsyncItemProcessor and AsyncItemWriter

1) I have a large file (> 100k lines) that needs to be processed. I have a lot of business validation and checks against external systems for each line item. The code is being migrated from a legacy app, and I just put this business logic into the AsyncItemProcessor, which also persists the data into the DB. Is it good practice to create/save records in the ItemProcessor (in lieu of the ItemWriter)?
2) Here is the code:
@Configuration
@EnableAutoConfiguration
@ComponentScan(basePackages = "com.liquidation.lpid")
@EntityScan(basePackages = "com.liquidation.lpid.entities")
@EnableTransactionManagement
public class SimpleJobConfiguration {

    @Autowired
    public JobRepository jobRepository;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("myFtpSessionFactory")
    private SessionFactory myFtpSessionFactory;

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Bean
    public ThreadPoolTaskExecutor lpidItemTaskExecutor() {
        ThreadPoolTaskExecutor tExec = new ThreadPoolTaskExecutor();
        tExec.setCorePoolSize(10);
        tExec.setMaxPoolSize(10);
        tExec.setAllowCoreThreadTimeOut(true);
        return tExec;
    }

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        String name = stepExecution.getStepName();
        System.out.println("name: " + name);
    }

    @Bean
    public SomeItemWriterListener someItemWriterListener() {
        return new SomeItemWriterListener();
    }

    @Bean
    @StepScope
    public FlatFileItemReader<FieldSet> lpidItemReader(@Value("#{stepExecutionContext['fileResource']}") String fileResource) {
        System.out.println("itemReader called !!!!!!!!!!! for customer data" + fileResource);
        FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<FieldSet>();
        reader.setResource(new ClassPathResource("/data/stage/" + fileResource));
        reader.setLinesToSkip(1);
        DefaultLineMapper<FieldSet> lineMapper = new DefaultLineMapper<FieldSet>();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        reader.setSkippedLinesCallback(new LineCallbackHandler() {
            public void handleLine(String line) {
                if (line != null) {
                    tokenizer.setNames(line.split(","));
                }
            }
        });
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(new PassThroughFieldSetMapper());
        lineMapper.afterPropertiesSet();
        reader.setLineMapper(lineMapper);
        return reader;
    }

    @Bean
    public ItemWriter<FieldSet> lpidItemWriter() {
        return new LpidItemWriter();
    }

    @Autowired
    private MultiFileResourcePartitioner multiFileResourcePartitioner;

    @Bean
    public Step masterStep() {
        return stepBuilderFactory.get("masterStep")
                .partitioner(slaveStep().getName(), multiFileResourcePartitioner)
                .step(slaveStep())
                .gridSize(4)
                .taskExecutor(lpidItemTaskExecutor())
                .build();
    }

    @Bean
    public ItemProcessListener<FieldSet, String> processListener() {
        return new LpidItemProcessListener();
    }

    @SuppressWarnings("unchecked")
    @Bean
    public Step slaveStep() {
        return stepBuilderFactory.get("slaveStep")
                .<FieldSet, FieldSet>chunk(5)
                .faultTolerant()
                .listener(new ChunkListener())
                .reader(lpidItemReader(null))
                .processor(asyncItemProcessor())
                .writer(asyncItemWriter())
                .listener(someItemWriterListener())
                .build();
    }

    @Bean
    public AsyncItemWriter<FieldSet> asyncItemWriter() {
        AsyncItemWriter<FieldSet> asyncItemWriter = new AsyncItemWriter<>();
        asyncItemWriter.setDelegate(lpidItemWriter());
        try {
            asyncItemWriter.afterPropertiesSet();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asyncItemWriter;
    }

    @Bean
    public ItemProcessor<FieldSet, FieldSet> processor() {
        return new lpidCheckItemProcessor();
    }

    @Bean
    public AsyncItemProcessor<FieldSet, FieldSet> asyncItemProcessor() {
        AsyncItemProcessor<FieldSet, FieldSet> asyncItemProcessor = new AsyncItemProcessor<FieldSet, FieldSet>();
        asyncItemProcessor.setDelegate(processor());
        asyncItemProcessor.setTaskExecutor(lpidItemTaskExecutor());
        try {
            asyncItemProcessor.afterPropertiesSet();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asyncItemProcessor;
    }

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get("job").incrementer(new RunIdIncrementer()).start(masterStep()).build();
    }
}
The ItemWriter runs before the ItemProcessor has completed. My understanding is: for every chunk, the item reader reads the data, the item processor churns through each item, and at the end of the chunk the item writer is called (which in my case does nothing, since the item processor persists the data). But the item writer is called before the item processor has finished, and my job never completes. What am I doing incorrectly here? (I looked at previous issues around this, and the solution was to wrap the writer in an AsyncItemWriter, which I am doing.)
Thanks
Sundar
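For reference, the contract of the two async classes is that AsyncItemProcessor produces Future-wrapped items and AsyncItemWriter waits on each Future before handing the unwrapped items to its delegate, so the chunk is normally declared with Future output types. A sketch of that wiring, offered as an assumption about what may differ from the slaveStep above rather than a confirmed fix:

@Bean
public Step slaveStep() {
    return stepBuilderFactory.get("slaveStep")
            // the async processor emits Future<FieldSet>; the AsyncItemWriter unwraps it before writing
            .<FieldSet, Future<FieldSet>>chunk(5)
            .reader(lpidItemReader(null))
            .processor(asyncItemProcessor())
            .writer(asyncItemWriter())
            .build();
}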

How to get MDC logging working for Spring Batch

In Spring Batch it would be great to keep track of the execution thread through logging. However, MDC does not seem to work.
MDC.put("process", "batchJob");
logger.info("{}; status={}", getJobName(), batchStatus.name());
Anyone got MDC working in Spring Batch?
I solved it by adding a JobExecutionListener like this:
public class Slf4jBatchJobListener implements JobExecutionListener {

    private static final String DEFAULT_MDC_UUID_TOKEN_KEY = "Slf4jMDCFilter.UUID";
    private final Logger logger = LoggerFactory.getLogger(getClass());

    public void beforeJob(JobExecution jobExecution) {
        String token = UUID.randomUUID().toString().toUpperCase();
        MDC.put(DEFAULT_MDC_UUID_TOKEN_KEY, token);
        logger.info("Job {} with id {} starting...", jobExecution.getJobInstance().getJobName(), jobExecution.getId());
    }

    public void afterJob(JobExecution jobExecution) {
        logger.info("Job {} with id {} ended.", jobExecution.getJobInstance().getJobName(), jobExecution.getId());
        MDC.remove(DEFAULT_MDC_UUID_TOKEN_KEY);
    }
}
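The listener then has to be registered on the job definition; a sketch assuming a JobBuilderFactory-style configuration (the job and step names here are placeholders):

@Bean
public Job mdcAwareJob(JobBuilderFactory jobBuilderFactory, Step someStep) {
    return jobBuilderFactory.get("mdcAwareJob")
            .listener(new Slf4jBatchJobListener()) // puts/removes the MDC token around the whole run
            .start(someStep)
            .build();
}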
Because some jobs are multi-threaded, I also had to add a TaskDecorator in order to copy the MDC from the parent thread to the subthread, like this:
public class Slf4JTaskDecorator implements TaskDecorator {

    @Override
    public Runnable decorate(Runnable runnable) {
        Map<String, String> contextMap = MDC.getCopyOfContextMap();
        return () -> {
            try {
                MDC.setContextMap(contextMap);
                runnable.run();
            } finally {
                MDC.clear();
            }
        };
    }
}
Set the TaskDecorator on the TaskExecutor:
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor("spring_batch");
    taskExecutor.setConcurrencyLimit(maxThreads);
    taskExecutor.setTaskDecorator(new Slf4JTaskDecorator());
    return taskExecutor;
}
And lastly, update the logging pattern in properties:
logging:
  pattern:
    level: "%5p %X{Slf4jMDCFilter.UUID}"