How can I use Spring Cloud Task with modular Spring Batch jobs? - spring-batch

I've developed couple of Spring Batch jobs in Spring Boot web application. For easier maintenance, I used #EnableBatchProcessing(modular = true) like this:
#Configuration
#EnableBatchProcessing(modular = true)
public class BatchConfiguration {
#Bean
#ConditionalOnProperty(value="first.batch.enabled", havingValue = "true")
public ApplicationContextFactory firstJobs() {
return new GenericApplicationContextFactory(FirstModule.class);
}
#Bean
#ConditionalOnProperty(value="second.batch.enabled", havingValue = "true")
public ApplicationContextFactory secondJobs() {
return new GenericApplicationContextFactory(SecondModule.class);
}
...
}
and I have #Configuration classes, one for every Job defined respectively in base directories of modules. Everything works fine, but now I want to setup Spring Cloud Dataflow UI to have some basic monitoring of my Jobs.
The problem is, when I try to add #EnableTask annotation to this class BatchConfiguration, Spring is not associating job execution with task execution. It is only working, when I run tests (#SpringBatchTest).
I also tried to add #EnableTask annotation to FirstJobConfiguration class instead, and also add it to both BatchConfiguration and First|JobConfiguration, but with no effect. I also went through official documentation, but found nothing.
Is it possible to use Spring Cloud Task with modular Spring Batch ?
Thanks.

The Spring Batch application can be converted into a Spring Cloud Task application and be managed using Spring Cloud Data Flow.
You should be able to use EnableTask in your SpringBoot application along with EnableBatchProcessing in your configuration.
Please see the SCDF sample which demonstrates this scenario.

Related

Spring Batch fails with Transaction error when triggered by Spring Integration

I have a simple Spring Boot (2.6.4) app that uses Spring Batch and Spring Integration to read data from a file and emit Kafka events. I have configured Spring Integration to poll a configurable directory and trigger the Spring Batch Job as soon as a new file arrives.
As soon as the Job is "kicked" I get this error:
Existing transaction detected in JobRepository. Please fix this and try again (e.g. remove #Transactional annotations from client).
I assume this is triggered by the following Spring Integration configuration:
return IntegrationFlows.from(fileReadingMessageSource,
c -> c.poller(Pollers.fixedDelay(period)
.taskExecutor(taskExecutor)
.maxMessagesPerPoll(maxMessagesPerPoll)
.transactionSynchronizationFactory(transactionSynchronizationFactory())
.transactional(new PseudoTransactionManager())))
The TransactionSynchronizationFactory is configured to move the incoming file into an error or success directory, based on the outcome of the job.
My understanding is that a Spring Batch Job does not like to be executed within the context of a Transaction, since it needs to manage its own transaction to deal with multiple steps failures.
I also understand that I could set the value of setValidateTransactionState of JobRepositoryFactoryBean to false, but I would rather not mess around with the Spring Batch internals.
As suggested elsewhere, I have also tried to use an Async Job Launcher, but no luck.
#Bean
public JobLauncher getJobLauncher() {
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
jobLauncher.setJobRepository(this.jobRepository);
return jobLauncher;
}
The app is using an in-memory H2 data-source, but that doesn't seem to affect the error.
I'm unable to find a solution to this problem, any recommendation?
Edit - full project here: https://github.com/luciano-fiandesio/spring-batch-and-integration-demo

Spring Batch: How to start a job but not execute it, but execute it in an other java instance

I plan to use Spring Batch. We like to initiate new jobs executions in our pods that are answering the frontend requests.
Pseudo Code:
#PostMapping(path = "/request-report/{id}")
public void requestReport(String id){
this.jobOperator.start("reportJob", new Properties("1"));
}
But we don't want the job to be executed in the frontend pod. For that we like to build a separate micro service pod.
I see the following solutions:
do a rest call from the frontend pod to the spring-batch pod and start the job there. i could do this, but if possible, i like to skip that step and integrate it over the spring batch db.
in the frontend pod i create JobLauncher that has SimpleAsyncTaskExecutor with size zero. So it will never execute a job.
https://docs.spring.io/spring-batch/docs/current/reference/html/job.html#configuringJobLauncher
#Bean
public JobLauncher jobLauncher() {
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(jobRepository());
jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
jobLauncher.afterPropertiesSet();
return jobLauncher;
}
in the front end pod, i do not use the BatchAutoConfiguration but leaf out some stuff, but what?
I think i also have to write some software that scans the job table and check if there is a not started job present, and start is again in the spring-batch-pod.
Thanks for you help!
If you configure your job launcher with a SimpleAsyncTaskExecutor, it will start jobs in separate threads in the same JVM (hence in the same container and pod).
So unless you provide a custom TaskExecutor that launches jobs in separate JVM/container/pod, you would need to change your architecture to use a job queue. The section Launching Batch Jobs through Messages from the reference docs shows how to setup such pattern.

How to explicitly configure TaskBatchExecutionListener in my spring batch application

My spring batch application is not inserting relationship between task and job in TASK_TASK_BATCH table.
Spring doc says :
Associating A Job Execution To The Task In Which It Was Executed
Spring Boot provides facilities for the execution of batch jobs easily
within an über-jar. Spring Boot’s support of this functionality allows
for a developer to execute multiple batch jobs within that execution.
Spring Cloud Task provides the ability to associate the execution of a
job (a job execution) with a task’s execution so that one can be
traced back to the other.
This functionality is accomplished by using the TaskBatchExecutionListener. By default, this listener is auto configured in any context that has both a Spring Batch Job configured (via having a bean of type Job defined in the context) and the spring-cloud-task-batch jar is available within the classpath. The listener will be injected into all jobs."
I have all the required jars in my classpath.It's just that I am creating jobs and tasklets dynamically so not using any annotation. As per the doc TaskBatchExecutionListener is responsible for creating mapping in TASK_TASK_BATCH table by calling taskBatchDao's saveRelationship method.
I am just not able to figure out how to configure TaskBatchExecutionListener explicitly in my spring batch application.
If you have the org.springframework.cloud:spring-cloud-task-batch dependency, and the annotation #EnableTask is present, then your application context contains a TaskBatchExecutionListener bean that you can inject into your class that dynamically creates the jobs and tasklets.
That might look similar to this:
#Autowired
JobBuilderFactory jobBuilderFactory;
#Autowired
TaskBatchExecutionListener taskBatchExecutionListener;
Job createJob() throws Exception {
return jobBuilderFactory
.get("myJob")
.start(createStep())
.listener(taskBatchExecutionListener)
.build();
}
I hope that helps. Otherwise please share some minimal code example to demonstrate what you're trying to do.

Launching Spring Batch Task from Spring Cloud Data Flow

I am having a web based spring batch application. My Batch Job will be kick started on an API call. Here is my method exposed as a web service.
#RequestMapping(value = "/v1/initiateEntityCreation", method = RequestMethod.GET)
public String initiateEntityCreation()
throws JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException,
JobParametersInvalidException, NoSuchJobException, JobInstanceAlreadyExistsException {
JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
jobParametersBuilder.addDate("Date", new Date());
Long executionContext = jobOperator.start("InitiateEntityCreation", String.format("Date=%s", new Date()));
return executionContext.toString();
}
My batch job is working fine and i have a Mysql Instance as my Job Repository. I have integrated Spring Cloud data flow to my batch application. I have my #EnableTask annotation and all necessary dependencies. I have connected my Spring Cloud data local server to my spring batch jon Repository instance.
Here is what my command line argument for SCDF.
java -jar spring-cloud-dataflow-server-local-1.2.3.RELEASE.jar --
spring.datasource.url=jdbc:mysql://localhost:3306/springbatchdb--
spring.datasource.username=root --spring.datasource.password=password --
spring.datasource.driver-class-name=org.mariadb.jdbc.Driver
My Local server is running and capturing all the job execution instances. I have registered my spring batch application to SCDF and defined a task for SCDF with the definition.
When am trying to launch the job from SCDF, am getting "Task successfully executed". But my job is not getting executed.
If I check task executions, am seeing like
StartTime N/A, EndTime N/A and if I drill down to task execution there are not batch jobs that have been run. Please let me know how we can launch a web based spring batch job using Spring Cloud data flow.

Autowiring multiple data sources with jpa and non-jpa characteristics using Spring Boot

I have two database configuration javaconfig classes - one for JPA purposes with transactionManager() and entityManagerFactory() #Bean methods and one config class for non-JPA JDBCTemplate based query submission to access data from that database. The overall idea is to read using JDBCTemplate to read data and persist the data, after transformation, into the JPA based datasource. I am using Spring Boot to enable auto configuration. My test fails with:
java.lang.IllegalArgumentException: Not an managed type:
I have both spring-boot-starter-jdbc and spring-boot-starter-data-jpa in my build.gradle. My gut feeling is that the two data sources are in collision course with each other. How do I enforce the use of each of these datasources for the two use-cases that I mentioned earlier - one for JPA and another for JDBCTemplate purposes?
Details (Added after Dave's reply):
My service classes have been annotated with #Service and my repository classes have #Repository. Service uses repository objects using #Autowired though there are some services that are JDBCTemplate-based for data retrieval.
More complex depiction of my environment goes as follows (logically): JDBCTemplate(DataSource(Database(DB2)))--> Spring Batch Item Reader;Processors; Writer --> Service(Repository(JPADataSource(Database(H2)))). Spring batch item processors connect to both databases using services. For spring batch, I am using a H2 Job repo database (remote) to hold job execution details. Does this make sense? For Spring batch, I am using de.codecentric:spring-boot-starter-batch-web:1.0.0.RELEASE. After getting past the entityManagerFactory bean not found errors, I want to have control over the wiring of the above components.
I don't think it's anything to do with the data sources. The log says you have a JPA repository for a type that is not an #Entity. Repositories are automatically scanned from the package you define #EnableAutoConfiguration by default. So one way to control it is to move the class that has that annotation to a different package. In Boot 1.1 you can also set "spring.data.jpa.repositories.enabled=false" to switch off the scan if you don't want it. Or you can use #EnableJpaRepositories as normal.