Spring Boot: reading a CSV file using FlatFileItemReader in a service method - spring-batch

I am developing an application that reads a CSV file using Spring's FlatFileItemReader. I would like to use a service in which a method is called to execute the reading process, rather than exposing the FlatFileItemReader and related objects as configuration beans. An example of my prototype is below.
public void executeCsvReading() {
    FlatFileItemReader<SomeModel> reader = new FlatFileItemReader<SomeModel>();
    reader.setResource(new ClassPathResource("someFile.csv"));
    reader.setLineMapper(new DefaultLineMapper<SomeModel>() {
        {
            setLineTokenizer(new DelimitedLineTokenizer() {
                {
                    setNames(new String[] { "somefield1", "somefield2" });
                }
            });
            setFieldSetMapper(new BeanWrapperFieldSetMapper<SomeModel>() {
                {
                    setTargetType(SomeModel.class);
                }
            });
        }
    });

    Step step = stepBuilderFactory.get("step1").<SomeModel, SomeModel>chunk(5)
            .reader(reader)
            .writer(new WriteItemsOn()) // Call WriteItemsOn class constructor
            .build();

    Job job = jobBuilderFactory
            .get("readCSVFilesJob")
            .incrementer(new RunIdIncrementer())
            .start(step)
            .build();

    // But how do I start this Job?
}
public class WriteItemsOn implements ItemWriter<SomeModel> {

    @Override
    public void write(List<? extends SomeModel> items) throws Exception {
        for (SomeModel item : items) {
            System.out.println("Item is " + item.someMethod());
        }
    }
}

// But how do I start this Job?
To start the job, you need to use a JobLauncher. For example:
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
//jobLauncher.setJobRepository(yourJobRepository);
//jobLauncher.setTaskExecutor(yourTaskExecutor);
jobLauncher.afterPropertiesSet();
jobLauncher.run(job, new JobParameters());
However, with this approach, you will need to configure the infrastructure beans required by Spring Batch (JobRepository, JobLauncher, etc.) yourself.
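For completeness, here is a minimal sketch of that manual wiring, assuming Spring Batch 4 and its in-memory MapJobRepositoryFactoryBean (removed in Batch 5, which requires a database-backed repository instead):

import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean;

public class ManualInfrastructure {

    public static SimpleJobLauncher createJobLauncher() throws Exception {
        // In-memory job repository (Spring Batch 4 only)
        MapJobRepositoryFactoryBean repositoryFactory = new MapJobRepositoryFactoryBean();
        repositoryFactory.afterPropertiesSet();
        JobRepository jobRepository = repositoryFactory.getObject();

        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository);
        jobLauncher.afterPropertiesSet(); // defaults to a synchronous task executor
        return jobLauncher;
    }
}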
I would recommend using a typical Spring Batch job configuration and running your job from the method executeCsvReading. Here is an example:
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class MyJob {

    private final JobBuilderFactory jobBuilderFactory;
    private final StepBuilderFactory stepBuilderFactory;

    public MyJob(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
    }

    @Bean
    public Step step() {
        return stepBuilderFactory.get("step")
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("hello world");
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    @Bean
    public Job job() {
        return jobBuilderFactory.get("job")
                .start(step())
                .build();
    }
}
With this job configuration in place, you can load the Spring application context and launch your job as follows:
public void executeCsvReading() throws Exception {
    ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
    JobLauncher jobLauncher = context.getBean(JobLauncher.class);
    Job job = context.getBean(Job.class);
    jobLauncher.run(job, new JobParameters());
}
Note that loading the Spring application context can be done outside the method executeCsvReading so that it is not loaded each time this method is called. With this approach, you don't have to configure the infrastructure beans required by Spring Batch yourself; they will be automatically created and added to the application context. It is of course possible to override them if needed.
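For example, here is a minimal sketch (the service class name is hypothetical) that builds the context once and reuses it across calls:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class CsvReadingService {

    // Built once when the service is created, not on every call
    private final ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);

    public void executeCsvReading() throws Exception {
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }
}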
Heads up: If you put the MyJob configuration class in the package of your Spring Boot app, Spring Boot will by default execute the job at startup. You can disable this behaviour by adding spring.batch.job.enabled=false to your application properties.
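For example, in application.properties:

# Do not run jobs at startup
spring.batch.job.enabled=false

# Alternatively, run only specific jobs at startup (Spring Boot 2.x;
# Spring Boot 3.x uses spring.batch.job.name instead)
spring.batch.job.names=readCSVFilesJob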
Hope this helps.

Related

Why does JobBuilder say "could not autowire"?

I am following this Udemy course (Batch Processing with Spring Batch & Spring Boot) for Spring Batch. In the course, JobBuilderFactory (JBF) is deprecated, so I googled what to use instead and found JobBuilder.
Right now jobBuilder and stepBuilder are underlined in red with the message "could not autowire".
package com.example.springbatch;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ComponentScan;

@SpringBootApplication
//1st step
@EnableBatchProcessing
//2nd //use of this?
@ComponentScan("com.example.config") //job and steps will go in this package
public class SpringBatchApplication {
    public static void main(String[] args) {
        SpringApplication.run(SpringBatchApplication.class, args);
    }
}
package com.example.config;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration //add annot
public class SampleJob { //3rd creating first sample job

    //5th Create the job. Spring Batch provides one class - JobBuilder. Create the obj named jobBuilder
    @Autowired
    private JobBuilder jobBuilder;

    @Autowired
    private StepBuilder stepBuilder;

    @Bean //4th define @Bean for the first job
    public Job firstJob() { //Job(interface) imports core.Job
        //6th use
        return jobBuilder.get("First Job") //use get; "First Job" is the 1st job name
                .start(firstStep()) //Inside the start, pass in your step. A job can have a single step or multiple steps
                .build();
    }

    @Bean //7th Adding the Step interface. Autowire it as well.
    Step firstStep() {
        return stepBuilder.get("First Step")
                .tasklet(firstTask()) //Will need to call Tasklet.
                .build(); //call build to create the step.
    }

    //8th
    private Tasklet firstTask() {
        return new Tasklet() {
            @Override
            public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
                System.out.println("This is first tasklet step");
                return RepeatStatus.FINISHED; //need this
            }
        };
    }
}
I searched on Google; this is supposed to print "This is first tasklet step".
The course is probably using Spring Batch 4. In Spring Batch 5, those builder factories were deprecated for removal and are no longer exposed as beans in the application context by the @EnableBatchProcessing annotation. Here is the relevant section in the migration guide about that: JobBuilderFactory and StepBuilderFactory bean exposure/configuration.
The typical migration path from v4 to v5 in that regard is as follows:
// Sample with v4
@Configuration
@EnableBatchProcessing
public class MyJobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Bean
    public Job myJob(Step step) {
        return this.jobBuilderFactory.get("myJob")
                .start(step)
                .build();
    }
}

// Sample with v5
@Configuration
@EnableBatchProcessing
public class MyJobConfig {

    @Bean
    public Job myJob(JobRepository jobRepository, Step step) {
        return new JobBuilder("myJob", jobRepository)
                .start(step)
                .build();
    }
}
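The StepBuilderFactory migration is analogous. Here is a sketch (assuming a tasklet step; in v5 the transaction manager must be passed explicitly to the tasklet method):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

// Sample with v5: StepBuilder replaces StepBuilderFactory
@Configuration
@EnableBatchProcessing
public class MyStepConfig {

    @Bean
    public Step myStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
        return new StepBuilder("myStep", jobRepository)
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("This is first tasklet step");
                    return RepeatStatus.FINISHED;
                }, transactionManager)
                .build();
    }
}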

Should Job/Step/Reader/Writer all be beans?

As far as I can see from the examples in the Spring Batch reference doc, objects like job/step/reader/writer are all marked as @Bean, like the following:
@Bean
public Job footballJob() {
    return this.jobBuilderFactory.get("footballJob")
            .listener(sampleListener())
            ...
            .build();
}

@Bean
public Step sampleStep(PlatformTransactionManager transactionManager) {
    return this.stepBuilderFactory.get("sampleStep")
            .transactionManager(transactionManager)
            .<String, String>chunk(10)
            .reader(itemReader())
            .writer(itemWriter())
            .build();
}
I have a scenario where the server side receives requests and runs jobs concurrently (different job names, or the same job name with different JobParameters). The idea is to create a new job object (including steps/readers/writers) in each concurrent thread, so I probably will not declare the job method as @Bean and will instead instantiate a new job each time.
There is also a difference in how parameters are passed to objects like the reader. When using @Bean, parameters must be put into, e.g., JobParameters and late-bound into the object using @StepScope, like the following example:
@StepScope
@Bean
public FlatFileItemReader flatFileItemReader(@Value("#{jobParameters['input.file.name']}") String name) {
    return new FlatFileItemReaderBuilder<Foo>()
            .name("flatFileItemReader")
            .resource(new FileSystemResource(name))
            .build();
}
If not using @Bean, I can just pass the parameter directly, with no need to put data into JobParameters, like the following:
public FlatFileItemReader flatFileItemReader(String name) {
    return new FlatFileItemReaderBuilder<Foo>()
            .name("flatFileItemReader")
            .resource(new FileSystemResource(name))
            .build();
}
A simple test shows that the version without @Bean works. But I want to confirm formally:
1. Is using @Bean on job/step/reader/writer mandatory or not?
2. If it is not mandatory, when I create an object like a reader manually, do I need to call afterPropertiesSet() manually?
Thanks!
1. Is using @Bean on job/step/reader/writer mandatory or not?
No, it is not mandatory to declare batch artefacts as beans. But you would want to at least declare the Job as a bean to benefit from Spring's dependency injection (like injecting the job repository reference into the job, etc.) and to be able to do something like:
ApplicationContext context = new AnnotationConfigApplicationContext(MyJobConfig.class);
Job job = context.getBean(Job.class);
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
jobLauncher.run(job, new JobParameters());
2. If it is not mandatory, when I create an object like a reader manually, do I need to call afterPropertiesSet() manually?
I guess that by "create an object like a reader" you mean creating a new instance manually. In that case yes: if the object is not managed by Spring, you need to call that method yourself. If the object is declared as a bean, Spring will call the afterPropertiesSet() method automatically. Here is a quick sample:
import org.springframework.beans.factory.InitializingBean;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TestAfterPropertiesSet {

    @Bean
    public MyBean myBean() {
        return new MyBean();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(TestAfterPropertiesSet.class);
        MyBean myBean = context.getBean(MyBean.class);
        myBean.sayHello();
    }

    static class MyBean implements InitializingBean {

        @Override
        public void afterPropertiesSet() throws Exception {
            System.out.println("MyBean.afterPropertiesSet");
        }

        public void sayHello() {
            System.out.println("Hello");
        }
    }
}
This prints:
MyBean.afterPropertiesSet
Hello
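Applied to the reader case, here is a minimal sketch of driving a manually created FlatFileItemReader, where afterPropertiesSet() must be called by hand (the file name and the fields of the Foo model from the question are hypothetical):

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.core.io.FileSystemResource;

public class ManualReaderExample {

    public static void main(String[] args) throws Exception {
        FlatFileItemReader<Foo> reader = new FlatFileItemReaderBuilder<Foo>()
                .name("flatFileItemReader")
                .resource(new FileSystemResource("foos.csv")) // hypothetical input file
                .delimited()
                .names(new String[] { "field1", "field2" }) // hypothetical fields of Foo
                .targetType(Foo.class)
                .build();

        // Not managed by Spring, so validate the configuration manually
        reader.afterPropertiesSet();

        // The reader is an ItemStream: open it before reading, close it after
        reader.open(new ExecutionContext());
        Foo item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
        reader.close();
    }
}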

Can a multi-job Spring Batch app load a minimum set of beans? [duplicate]

I have a Spring Batch project with multiple jobs (job A, job B, job C, ...). When I run job A, the log shows that all the beans of jobs B, C, ... are created too. Is there any way to avoid creating the other jobs' beans when job A is launched?
I have tried the @Lazy annotation, but it does not seem to work.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Qualifier("springDataSource")
    public DataSource springDataSource;

    @Autowired
    @Qualifier("batchJobDataSource")
    public DataSource batchJobDataSource;
}

@Configuration
@PropertySource("classpath:partner.properties")
public class B extends BatchConfiguration {

    @Value("${partnerId}")
    private String partnerId;

    @Lazy
    @Bean
    public Job processB(JobCompletionNotificationListener listener) {
        return jobBuilderFactory
                .get("ProcessB")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(processStepB())
                .build();
    }

    @Lazy
    @Bean
    public Step processStepB() {
        return stepBuilderFactory
                .get("processStepB")
                .<PartnerDTO, PartnerDTO>chunk(1)
                .reader(getPartner())
                .processor(process())
                .writer(saveTransaction())
                .build();
    }

    @Lazy
    @Bean(destroyMethod = "")
    public Reader getPartner() {
        return new Reader(batchJobDataSource, partnerId);
    }

    @Lazy
    @Bean
    public Processor process() {
        return new Processor();
    }

    @Lazy
    @Bean
    HistoryWriter historyWriter() {
        return new HistoryWriter(batchJobDataSource);
    }

    @Lazy
    @Bean
    UpdateWriter updateWriter() {
        return new UpdateWriter(batchJobDataSource);
    }

    @Lazy
    @Bean
    public CompositeItemWriter<PartnerDTO> saveTransaction() {
        List<ItemWriter<? super PartnerDTO>> delegates = new ArrayList<>();
        delegates.add(updateWriter());
        delegates.add(historyWriter());
        CompositeItemWriter<PartnerDTO> itemWriter = new CompositeItemWriter<>();
        itemWriter.setDelegates(delegates);
        return itemWriter;
    }
}
I have also put @Lazy on the @Configuration class, but that does not work either.
That should not be an issue. But here are a few ideas to try:
Use Spring profiles to isolate job beans (a minimal sketch follows this list)
If you use Spring Boot 2.2+, try activating the lazy bean initialization mode
Package each job in its own jar. This is the best option IMO.
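For the profiles idea, here is a sketch (the class, job, and profile names are hypothetical): each job's configuration class is guarded by a profile, so only the active profile's beans are created.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

@Configuration
@Profile("jobA") // beans in this class exist only when the "jobA" profile is active
public class JobAConfiguration {

    @Bean
    public Job jobA(JobBuilderFactory jobs, StepBuilderFactory steps) {
        return jobs.get("jobA")
                .start(steps.get("stepA")
                        .tasklet((contribution, chunkContext) -> {
                            System.out.println("running job A");
                            return RepeatStatus.FINISHED;
                        })
                        .build())
                .build();
    }
}

Launching with --spring.profiles.active=jobA then creates only job A's beans; the other jobs' configuration classes are skipped entirely.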

Using Spring Batch to write to a Cassandra Database

As of now, I'm able to connect to Cassandra via the following code:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public static Session connection() {
    Cluster cluster = Cluster.builder()
            .addContactPoints("IP1", "IP2")
            .withCredentials("user", "password")
            .withSSL()
            .build();
    Session session = null;
    try {
        session = cluster.connect("database_name");
        session.execute("CQL Statement");
    } finally {
        IOUtils.closeQuietly(session);
        IOUtils.closeQuietly(cluster);
    }
    return session;
}
The problem is that I need to write to Cassandra in a Spring Batch project. Most of the starter kits seem to use a JdbcBatchItemWriter to write to a MySQL database from a chunk. Is this possible? It seems that a JdbcBatchItemWriter cannot connect to a Cassandra database.
The current ItemWriter code is below:
@Bean
public JdbcBatchItemWriter<Person> writer() {
    JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<Person>();
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Person>());
    writer.setSql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)");
    writer.setDataSource(dataSource);
    return writer;
}
Spring Data Cassandra provides repository abstractions for Cassandra that you should be able to use in conjunction with the RepositoryItemWriter to write to Cassandra from Spring Batch.
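A minimal sketch of that idea (the Person model and PersonRepository are hypothetical; RepositoryItemWriter and its builder are from spring-batch-infrastructure):

import org.springframework.batch.item.data.RepositoryItemWriter;
import org.springframework.batch.item.data.builder.RepositoryItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.cassandra.repository.CassandraRepository;

// Hypothetical Spring Data Cassandra repository for the Person entity
interface PersonRepository extends CassandraRepository<Person, String> {
}

@Configuration
public class CassandraWriterConfig {

    @Bean
    public RepositoryItemWriter<Person> writer(PersonRepository repository) {
        // Delegates each item of the chunk to PersonRepository.save(..)
        return new RepositoryItemWriterBuilder<Person>()
                .repository(repository)
                .methodName("save")
                .build();
    }
}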
It is possible to extend Spring Batch to support Cassandra by customising ItemReader and ItemWriter.
ItemWriter example:
public class CassandraBatchItemWriter<T> implements ItemWriter<T>, InitializingBean {

    protected static final Log logger = LogFactory.getLog(CassandraBatchItemWriter.class);

    private final Class<T> aClass;

    @Autowired
    private CassandraTemplate cassandraTemplate;

    public CassandraBatchItemWriter(final Class<T> aClass) {
        this.aClass = aClass;
    }

    @Override
    public void afterPropertiesSet() throws Exception { }

    @Override
    public void write(final List<? extends T> items) throws Exception {
        logger.debug("Write operation is performing, the size is " + items.size());
        if (!items.isEmpty()) {
            logger.info("Deleting in a batch performing...");
            cassandraTemplate.deleteAll(aClass);
            logger.info("Inserting in a batch performing...");
            cassandraTemplate.insert(items);
        } else {
            logger.debug("Items is empty...");
        }
    }
}
Then you can declare it as a @Bean in a @Configuration class:
@Bean
public ItemWriter<Company> writer() {
    return new CassandraBatchItemWriter<Company>(Company.class);
}
The full source code can be found in the GitHub repo: Spring-Batch-with-Cassandra

Spring Batch Integration using Java DSL / launching jobs

I have a working Spring Boot/Batch project containing 2 jobs.
I'm now trying to add Spring Integration to poll files from a remote SFTP server using only Java configuration / Java DSL, and then launch a job.
The file polling is working, but I have no idea how to launch a Job in my flow, despite reading these links:
Spring Batch Integration config using Java DSL
and
Spring Batch Integration job-launching-gateway
some code snippets:
@Bean
public SessionFactory sftpSessionFactory() {
    DefaultSftpSessionFactory sftpSessionFactory = new DefaultSftpSessionFactory();
    sftpSessionFactory.setHost("myip");
    sftpSessionFactory.setPort(22);
    sftpSessionFactory.setUser("user");
    sftpSessionFactory.setPrivateKey(new FileSystemResource("path to my key"));
    return sftpSessionFactory;
}

@Bean
public IntegrationFlow ftpInboundFlow() {
    return IntegrationFlows
            .from(Sftp.inboundAdapter(sftpSessionFactory())
                    .deleteRemoteFiles(Boolean.FALSE)
                    .preserveTimestamp(Boolean.TRUE)
                    .autoCreateLocalDirectory(Boolean.TRUE)
                    .remoteDirectory("remote dir")
                    .regexFilter(".*\\.txt$")
                    .localDirectory(new File("C:/sftp/")),
                    e -> e.id("sftpInboundAdapter").poller(Pollers.fixedRate(600000)))
            .handle("FileMessageToJobRequest", "toRequest")
            // what to put next to process the jobRequest ?
For .handle("FileMessageToJobRequest", "toRequest") I use the class described here: http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html
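For reference, the transformer from that documentation looks roughly like this:

import java.io.File;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.integration.launch.JobLaunchRequest;
import org.springframework.integration.annotation.Transformer;
import org.springframework.messaging.Message;

public class FileMessageToJobRequest {

    private Job job;
    private String fileParameterName;

    public void setJob(Job job) {
        this.job = job;
    }

    public void setFileParameterName(String fileParameterName) {
        this.fileParameterName = fileParameterName;
    }

    @Transformer
    public JobLaunchRequest toRequest(Message<File> message) {
        // Turn the polled file into a job parameter and wrap it in a launch request
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        jobParametersBuilder.addString(fileParameterName, message.getPayload().getAbsolutePath());
        return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
    }
}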
I would appreciate any help on that, many thanks.
EDIT after Gary's comment
I've added the following; it doesn't compile (of course) because I don't understand how the request is propagated:
.handle("FileMessageToJobRequest","toRequest")
.handle(jobLaunchingGw())
.get();
}
#Bean
public MessageHandler jobLaunchingGw() {
return new JobLaunchingGateway(jobLauncher());
}
#Autowired
private JobLauncher jobLauncher;
#Bean
public JobExecution jobLauncher(JobLaunchRequest req) throws JobExecutionException {
JobExecution execution = jobLauncher.run(req.getJob(), req.getJobParameters());
return execution;
}
I've found a way to launch a job using a @ServiceActivator and adding this to my flow, but I'm not sure it's good practice:

.handle("lauchBatchService", "launch")

@Component("lauchBatchService")
public class LaunchBatchService {

    private static Logger log = LoggerFactory.getLogger(LaunchBatchService.class);

    @Autowired
    private JobLauncher jobLauncher;

    @ServiceActivator
    public JobExecution launch(JobLaunchRequest req) throws JobExecutionException {
        JobExecution execution = jobLauncher.run(req.getJob(), req.getJobParameters());
        return execution;
    }
}
.handle(jobLaunchingGw())
// handle result
...

@Bean
public MessageHandler jobLaunchingGw() {
    return new JobLaunchingGateway(jobLauncher());
}

where jobLauncher() is the JobLauncher bean.
EDIT
Your service activator is doing about the same as the JLG; it uses this code.
Your jobLauncher @Bean is wrong.
@Bean methods are definitions; they don't do runtime work like this:
@Bean
public JobExecution jobLauncher(JobLaunchRequest req) throws JobExecutionException {
    JobExecution execution = jobLauncher.run(req.getJob(), req.getJobParameters());
    return execution;
}
Since you are already autowiring a JobLauncher, just use that.
@Autowired
private JobLauncher jobLauncher;

@Bean
public MessageHandler jobLaunchingGw() {
    return new JobLaunchingGateway(jobLauncher);
}