Spring Batch repeat step infinitely - spring-batch

I want to ask if there is a way to make a Spring Batch job run continuously, executing the same step over and over, with a pause between iterations when the reader finds nothing to do.
For example, my Spring Batch job reads from a database and then performs some updates on the list it retrieved. I want the job to do the same thing again: if it finds new rows in the database it performs the updates, otherwise it should wait a few seconds and then run the read again, indefinitely.
My solution below works, but I don't know whether it is best practice.
I made the step call itself as its own next step, which causes an infinite loop.
Then, in my reader, if data is found in the database it continues processing; otherwise I call Thread.sleep().
<job id="jobUpdate" xmlns="http://www.springframework.org/schema/batch">
<step id="updates" next="updates">
<tasklet>
<chunk reader="reader.." processor="processor..."
writer="writer..." commit-interval="1" />
</tasklet>
</step>
</job>
// My reader's waiting code, used when the list is empty.
if (myList.isEmpty()) {
    try {
        Thread.sleep(constantes.WAIT_TIME_BATCH_RERUN);
        System.out.println("im sleeping");
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}
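For context, here is a minimal sketch of how that check can sit inside a custom ItemReader. The DAO, row type, and wait constant are illustrative assumptions, not the poster's actual code:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import org.springframework.batch.item.ItemReader;

// Hypothetical polling reader: when the database has nothing new, it waits
// and then ends the step execution; the step's next="updates" transition
// starts the step again.
public class PollingUpdateReader implements ItemReader<MyRow> {

    private static final long WAIT_TIME_MS = 5_000; // stands in for constantes.WAIT_TIME_BATCH_RERUN

    private final MyRowDao dao;                     // assumed DAO returning unprocessed rows
    private final Deque<MyRow> buffer = new ArrayDeque<>();

    public PollingUpdateReader(MyRowDao dao) {
        this.dao = dao;
    }

    @Override
    public MyRow read() throws Exception {
        if (buffer.isEmpty()) {
            List<MyRow> myList = dao.findUnprocessed();
            if (myList.isEmpty()) {
                Thread.sleep(WAIT_TIME_MS);
                return null; // ends this execution of the step; the loop re-runs it
            }
            buffer.addAll(myList);
        }
        return buffer.poll();
    }
}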

Related

Spring batch JSR 352: how to prevent a partitioned job from leaving threads alive, which prevents the process from ending

Let me explain how my app is set up. I have a standalone command-line app that runs a main method, which in turn calls start on a JobOperator, passing the appropriate parameters. I understand that start is an asynchronous call, so unless I block somehow in main, the process dies once start returns.
The problem I have run into is that when I run a partitioned job, it appears to leave a few threads alive, which prevents the process from ending. When I run a non-partitioned job, the process ends normally once the job has completed.
Is this normal and/or expected behavior? Is there a way to tell the partitioned threads to die? It seems the partitioned threads are blocked waiting on something after the job has completed, when they should not be.
I know I could monitor the batch status in main and end the process myself, but as I stated in another question, that adds a lot of chatter to the database and is not ideal.
An example of my job spec:
<job id="partitionTest" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
<step id="onlyStep">
<partition>
<plan partitions="2">
<properties partition="0">
<property name="partitionNumber" value="1"></property>
</properties>
<properties partition="1">
<property name="partitionNumber" value="2"></property>
</properties>
</plan>
</partition>
<chunk item-count="2">
<reader id="reader" ref="DelimitedFlatFileReader">
<properties>
<!-- Reads in from file Test.csv -->
<property name="fileNameAndPath" value="#{jobParameters['inputPath']}/CSVInput#{partitionPlan['partitionNumber']}.csv" />
<property name="fieldNames" value="firstName, lastName, city" />
<property name="fullyQualifiedTargetClass" value="com.test.transactionaltest.Member" />
</properties>
</reader>
<processor ref="com.test.partitiontest.Processor" />
<writer ref="FlatFileWriter" >
<properties>
<property name="appendOn" value="true"/>
<property name="fileNameAndPath" value="#{jobParameters['outputPath']}/PartitionOutput.txt" />
<property name="fullyQualifiedTargetClass" value="com.test.transactionaltest.Member" />
</properties>
</writer>
</chunk>
</step>
</job>
Edit:
OK, reading a bit more about this issue and looking into the Spring Batch code, it appears there is a bug (at least in my opinion) in JsrPartitionHandler. Specifically, the handle method creates a ThreadPoolTaskExecutor locally, but that thread pool is never cleaned up properly. A shutdown/destroy should be called before the method returns in order to perform the cleanup; otherwise the threads are left alive while the pool goes out of scope.
Please correct me if I am wrong here, but that definitely seems to be the problem.
I am going to try to make a change for this and see how it plays out. I'll update after I have done some testing.
I have confirmed this issue to be a bug (still in my opinion, at the moment) in the Spring Batch core library.
I have created a ticket over at the Spring Batch JIRA site, with a simple Java project attached that reproduces the issue I am seeing. If anyone else runs into this problem, they should refer to that ticket.
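To illustrate the kind of cleanup being described, here is a generic sketch, not the actual JsrPartitionHandler source; the class and method names are illustrative:

import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// A locally created executor must be shut down before it goes out of
// scope, or its non-daemon worker threads keep the JVM alive.
public class LocalExecutorExample {

    public void runPartitions(Runnable... partitions) {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(partitions.length);
        executor.initialize();
        try {
            for (Runnable partition : partitions) {
                executor.execute(partition);
            }
            // ... wait for the partitions to complete ...
        } finally {
            executor.shutdown(); // releases the pooled threads before returning
        }
    }
}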
I have found a temporary workaround that uses a simple wait/notify scheme; once added, the pooled threads shut down. I'll show each of the classes and try to explain what I did.
In the main thread/class, this code lived in the main method (or a method called from main):
while (!ThreadNotifier.instance(this).getNotify()) {
    try {
        synchronized (this) {
            System.out.println("WAIT THREAD IS =======" + Thread.currentThread().getName());
            wait();
        }
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
This is the ThreadNotifier class
public class ThreadNotifier {

    private static ThreadNotifier tn = null;
    private boolean notification = false;
    private Object o;

    private ThreadNotifier(Object o) {
        this.o = o;
    }

    public static ThreadNotifier instance(Object o) {
        if (tn == null) {
            tn = new ThreadNotifier(o);
        }
        return tn;
    }

    public void setNotify(boolean value) {
        notification = value;
        synchronized (o) {
            System.out.println("NOTIFY THREAD IS =======" + Thread.currentThread().getName());
            o.notify();
        }
    }

    public boolean getNotify() {
        return notification;
    }
}
And lastly, this is a job listener I used to provide the notification back:
public class PartitionWorkAround implements JobListener {

    @Override
    public void beforeJob() throws Exception {
        // no-op
    }

    @Override
    public void afterJob() throws Exception {
        ThreadNotifier.instance(null).setNotify(true);
    }
}
This is the best I could come up with until the issue is fixed. For reference, I used what I learned about guarded blocks to figure out this approach.

Spring Batch Jsr 352, manage processor skip outside/before skip listener

I am trying to find a way to handle a skip scenario in the process listener (it could equally be the read or write listener). What I have found is that the skip listener seems to be executed after the process listener's onProcessError method. This means I might handle the error in some way without knowing it is an exception that will be skipped.
Is there some way to know that a particular exception is being skipped, outside the skip listener? Something that could be pulled into the process listener, or possibly elsewhere?
The best approach I found was to add a property to the step and then wire the step context in where I needed it.
<step id="firstStep">
<properties> <property name="skippableExceptions" value="java.lang.IllegalArgumentException"/> </properties>
</step>
This is not a perfect solution, but the skippable exceptions only seem to be set in StepFactoryBean and the tasklet, and are not directly accessible elsewhere.
The code in my listeners:
@Inject
StepContext stepContext;

// ...

Properties p = stepContext.getProperties();
String exceptions = p.getProperty("skippableExceptions");
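Put together, a process listener can then check whether the current exception matches the configured class. The following is a rough sketch assuming a single configured exception class; the class name and the matching logic are illustrative:

import java.util.Properties;
import javax.batch.api.chunk.listener.ItemProcessListener;
import javax.batch.runtime.context.StepContext;
import javax.inject.Inject;
import javax.inject.Named;

@Named
public class SkipAwareProcessListener implements ItemProcessListener {

    @Inject
    StepContext stepContext;

    @Override
    public void beforeProcess(Object item) throws Exception { }

    @Override
    public void afterProcess(Object item, Object result) throws Exception { }

    @Override
    public void onProcessError(Object item, Exception e) throws Exception {
        Properties p = stepContext.getProperties();
        String exceptions = p.getProperty("skippableExceptions");
        // Illustrative check: treat the error as "will be skipped" when it
        // is an instance of the configured exception class.
        boolean willBeSkipped = exceptions != null
                && Class.forName(exceptions).isInstance(e);
        if (!willBeSkipped) {
            // handle the real failure here
        }
    }
}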

Spring Batch pause and then continue

I'm writing a job that will read x number of rows from an Excel file, and then I'd like it to pause for an hour before it continues with the next x rows.
How do I do this?
I have a job.xml file containing the following. The subscriptionDiscoverer fetches the file and passes it to the processor. The subscriptionWriter should write another file when the processor is done.
<job id="subscriptionJob" xmlns="http://www.springframework.org/schema/batch" incrementer="jobParamsIncrementer">
<validator ref="jobParamsValidator"/>
<step id="readFile">
<tasklet>
<chunk reader="subscriptionDiscoverer" processor="subscriptionProcessor" writer="subscriptionWriter" commit-interval="1" />
</tasklet>
</step>
</job>
Is there some kind of timer I could use, or is it some kind of flow structure? It's a large file of about 160,000 rows that needs to be processed.
I hope someone has a solution they would like to share.
Thank you!
I'm thinking of two possible approaches for you to start with:
Stop the job, and restart it again (after an hour) at the last position. You can start by looking at how to change the BatchStatus to signal your intent to stop the job; see http://docs.spring.io/spring-batch/2.0.x/cases/pause.html, or look at how Spring Batch Admin implements its way of communicating the PAUSE flag (http://docs.spring.io/spring-batch-admin/reference/reference.xhtml). You may need to implement some persistence to store the position (row number) so the job knows where to resume. You can also use a scheduler to restart the job, as sketched below.
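A hedged sketch of this first approach using JobOperator; the class, bean wiring, and how the execution id is obtained are illustrative assumptions:

import org.springframework.batch.core.launch.JobOperator;
import org.springframework.beans.factory.annotation.Autowired;

public class JobPauser {

    @Autowired
    private JobOperator jobOperator;

    // Ask the running execution to stop; Spring Batch finishes the current
    // chunk and marks the execution STOPPED, so it can be restarted later.
    public void pause(long executionId) throws Exception {
        jobOperator.stop(executionId);
    }

    // Called an hour later (e.g. by a scheduler); restarts from the last
    // committed position recorded in the job repository.
    public void resume(long executionId) throws Exception {
        jobOperator.restart(executionId);
    }
}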
-or-
Add a ChunkListener and implement the following in afterChunk(ChunkContext context): check whether x rows have been read so far, and if so, apply your pause mechanism (e.g., a simple Thread.sleep, or look for a more robust way of pausing the step). To check the number of rows read, you can use StepExecution.getReadCount(), obtained via ChunkContext.getStepContext().getStepExecution(). A sketch follows after the note below.
Do note that afterChunk is called outside the transaction as indicated in the javadoc:
Callback after the chunk is executed, outside the transaction.
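A minimal sketch of that listener; the row interval and the one-hour pause are illustrative values:

import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

public class PauseEveryXRowsListener implements ChunkListener {

    private static final int X_ROWS = 10_000;               // assumption
    private static final long ONE_HOUR_MS = 60 * 60 * 1000L;

    private int nextPauseAt = X_ROWS;

    @Override
    public void beforeChunk(ChunkContext context) { }

    @Override
    public void afterChunk(ChunkContext context) {
        int readCount = context.getStepContext().getStepExecution().getReadCount();
        if (readCount >= nextPauseAt) {
            nextPauseAt += X_ROWS;
            try {
                // Crude pause; runs outside the chunk transaction (see note above).
                Thread.sleep(ONE_HOUR_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    @Override
    public void afterChunkError(ChunkContext context) { }
}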

Store huge data at Chunk Level in Spring batch

I am new to Spring and don't have much knowledge of it; please help me solve this. My use case: we are using Spring Batch with chunk-oriented processing to process data.
At the end of each processed chunk (i.e., once the commit interval is met and the values are passed to the writer), the list of values has to be stored, so that once the whole tasklet is completed the stored list can be used to write the values to a CSV file. If any failure happens during chunk processing, the list of values should not be written to the file.
Is there any way to store this large amount of data at chunk level and then process it in a later step/tasklet, or in some other way?
Don't store all the data in memory; that is bad practice for a batch application.
An alternative is to create a standard read/process/write step where you write each processed chunk to your CSV file.
When a job error occurs, stop the job and delete your CSV file (you get the same result as not writing it at all).
I think you can reach your goal this way without memory issues.
I would suggest a different approach, as from my point of view you are trying to use Spring Batch in a way it was not designed to work:
1. Process the data chunk by chunk and write every chunk to CSV using FlatFileItemWriter.
2. Use a file name that marks it as temporary.
3. Wrap your step with a listener and use the OnProcessError hook.
4. When hitting OnProcessError, log the failed item.
5. Add a conditional flow for success and failure (see the job XML below).
6. In case of failure, delete the temp file.
7. In case of success, rename the file.
You may use SystemCommandTasklet, or implement your own tasklet, for steps 6 and 7 (a sketch follows after the job XML).
Your listener will look similar to the one below
@Component
public class PromoteUpdateCountToJobContextListener implements StepListener {

    @OnProcessError
    public ExitStatus processError(Object item, Exception e) {
        String failureMessage = String.format("Failed to process due to item %s",
                item.toString());
        Logger.error(failureMessage);
        return ExitStatus.FAILED;
    }
}
Your job XML will be similar to:
<batch:job>
    <batch:step id="processData">
        <batch:tasklet transaction-manager="transactionManager">
            <batch:chunk reader="someReader"
                         writer="yourFlatFileItemWriter"/>
            <batch:listeners>
                <batch:listener ref="lineCurserListener" />
            </batch:listeners>
        </batch:tasklet>
        <batch:next on="FAILED" to="deleteTempCsv" />
        <batch:next on="*" to="renameTempCsv" />
    </batch:step>
    <batch:step id="deleteTempCsv">
        <batch:tasklet ref="deleteTempCsvTasklet"/>
    </batch:step>
    <batch:step id="renameTempCsv">
        <batch:tasklet ref="renameTempCsvTasklet"/>
    </batch:step>
</batch:job>
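For steps 6 and 7, a minimal sketch of what a custom rename tasklet might look like; the file paths are illustrative assumptions (deleteTempCsvTasklet would be analogous, using Files.deleteIfExists):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class RenameTempCsvTasklet implements Tasklet {

    // Illustrative paths; in practice these would come from configuration
    // or job parameters.
    private final Path tempFile = Paths.get("out/result.csv.tmp");
    private final Path finalFile = Paths.get("out/result.csv");

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Atomically promote the temp file to its final name.
        Files.move(tempFile, finalFile, StandardCopyOption.REPLACE_EXISTING);
        return RepeatStatus.FINISHED;
    }
}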

Schedule a trigger every minute; if the job is still running, stand by and wait for the next trigger

I need to schedule a trigger to fire every minute; if the job is still running at the next minute, the trigger should not fire and should wait another minute to check. Once the job has finished, the trigger should fire again.
Thanks
In Quartz 2, you'll want to use the DisallowConcurrentExecution attribute on your job class. Then make sure you set up a key, using something similar to TriggerBuilder.Create().WithIdentity("SomeTriggerKey"), as DisallowConcurrentExecution uses it to determine whether your job is already running.
[DisallowConcurrentExecution]
public class MyJob : IJob
{
...
}
I didn't find anything about Monitor.Enter or anything like that, but thanks anyway.
The other answer is that the job should implement the StatefulJob interface. As a StatefulJob, another instance will not run as long as one is already running.
thanks again
IStatefulJob is the key here. Creating your own locking mechanisms may cause problems with the scheduler, as you are then taking part in its threading.
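For the Java version of Quartz 2, the equivalent is the @DisallowConcurrentExecution annotation on the job class; a minimal sketch, with the job body as an assumption:

import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

// With this annotation, Quartz will not start a second execution of the
// same JobDetail while one is still running.
@DisallowConcurrentExecution
public class MyJob implements Job {

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // long-running work goes here
    }
}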
If you're using Quartz.NET, you can do something like this in your Execute method:
static readonly object execution_lock = new object();

public void Execute(JobExecutionContext context)
{
    // If the lock is already held, a previous execution is still running.
    if (!Monitor.TryEnter(execution_lock, 1))
    {
        return;
    }
    try
    {
        // do work
    }
    finally
    {
        Monitor.Exit(execution_lock);
    }
}
I pulled this off the top of my head, so maybe some names are wrong, but that's the idea: lock on some object while you're executing, and if the lock is already taken when the job fires, a previous job is still running and you simply return.
EDIT: the Monitor class is in the System.Threading namespace.
If you are using the Spring Quartz integration, you can set the 'concurrent' property to 'false' on MethodInvokingJobDetailFactoryBean:
<bean id="positionFeedFileProcessorJobDetail" class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
<property name="targetObject" ref="xxxx" />
<property name="targetMethod" value="xxxx" />
<property name="concurrent" value="false" /> <!-- This will not run the job if the previous method is not yet finished -->
</bean>