Store huge data at chunk level in Spring Batch

I am new to Spring and do not have much knowledge of it, so please help me solve this. My use case is: we are using Spring Batch with chunk-oriented processing to process data.
At the end of each processed chunk (i.e., once the commit interval is met and the values are passed to the writer), the list of values has to be stored so that once the whole tasklet is completed, the stored list of values can be written to a CSV file. If any failure happens during chunk processing, the list of values should not be written to the file.
Is there any way to store this huge amount of data at the chunk level and then finally process it in a next step/tasklet, or in any other way?

Don't store all the data in memory; that is bad practice for a batch application.
An alternative is to create a standard read/process/write step where you write each processed chunk to your CSV file.
When a job error occurs, stop the job and delete your CSV file (you will get the same result as not writing it at all).
I think you can reach your goal this way without memory issues.
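To make the "delete the csv on failure" part concrete, here is a minimal sketch (my own illustration, with a placeholder file path) of a JobExecutionListener that removes the output file when the job does not complete successfully; registering it as a job-level listener means it runs once, after all chunks.

import java.io.File;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class DeleteCsvOnFailureListener implements JobExecutionListener {

    // placeholder path; point this at the csv written by your FlatFileItemWriter
    private final File csvFile = new File("output/result.csv");

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // nothing to do before the job starts
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // if the job did not complete successfully, remove the partially written csv
        if (jobExecution.getStatus() != BatchStatus.COMPLETED && csvFile.exists()) {
            csvFile.delete();
        }
    }
}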

I would suggest a different approach, since from my point of view you are trying to use Spring Batch in a way it was not designed to work.
1. Process the data chunk by chunk and write every chunk to the CSV using a FlatFileItemWriter.
2. Use a file name that marks it as temporary.
3. Wrap your step with a listener and use the OnProcessError hook.
4. When OnProcessError is hit, log the failed item.
5. Add a conditional flow for success and failure (see the Spring Batch documentation on conditional flows).
6. In case of failure, delete the temp file.
7. In case of success, rename the file.
You may use a SystemCommandTasklet or implement your own tasklet for steps 6 and 7 (a sketch of such a tasklet follows the job XML below).
Your listener will look similar to the one below:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepListener;
import org.springframework.batch.core.annotation.OnProcessError;
import org.springframework.stereotype.Component;

@Component
public class PromoteUpdateCountToJobContextListener implements StepListener {

    private static final Logger logger = LoggerFactory.getLogger(PromoteUpdateCountToJobContextListener.class);

    @OnProcessError
    public ExitStatus processError(Object item, Exception e) {
        String failureMessage = String.format("Failed to process item %s", item);
        logger.error(failureMessage, e);
        return ExitStatus.FAILED;
    }
}
Your job XML will be similar to:
<batch:job id="processDataJob">
    <batch:step id="processData">
        <batch:tasklet transaction-manager="transactionManager">
            <batch:chunk reader="someReader"
                         writer="yourFlatFileItemWriter"/>
        </batch:tasklet>
        <batch:next on="*" to="renameTempCsv" />
        <batch:next on="FAILED" to="deleteTempCsv" />
        <batch:listeners>
            <batch:listener ref="promoteUpdateCountToJobContextListener" />
        </batch:listeners>
    </batch:step>
    <batch:step id="deleteTempCsv">
        <batch:tasklet ref="deleteTempCsvTasklet"/>
    </batch:step>
    <batch:step id="renameTempCsv">
        <batch:tasklet ref="renameTempCsvTasklet"/>
    </batch:step>
</batch:job>
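For steps 6 and 7, a custom tasklet can be as small as the following sketch (the file names are placeholders, not part of the original answer); the delete variant would call Files.deleteIfExists instead of Files.move.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class RenameTempCsvTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        Path temp = Paths.get("out/report.csv.tmp");  // placeholder temp name written by the step
        Path target = Paths.get("out/report.csv");    // placeholder final name
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        return RepeatStatus.FINISHED;
    }
}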

Related

Spring Batch repeat step infinitely

I want to ask if there is a way to make a Spring Batch job run continuously, doing the same step over and over again, but with a delay between iterations if the reader did not find anything to do.
For example, my Spring Batch job reads from a database and then does some updates on the list it got from the database. I want the job to do the same thing again: if it finds new rows in the database it will do the updates, otherwise it should wait some seconds and then do the same read again and again.
My solution is below. It works, but I don't know if it is best practice.
I made my step call itself as the next step, which causes an infinite loop.
Then, in my reader, if data is found in the database it continues processing; otherwise I do a Thread.sleep().
<job id="jobUpdate" xmlns="http://www.springframework.org/schema/batch">
<step id="updates" next="updates">
<tasklet>
<chunk reader="reader.." processor="processor..."
writer="writer..." commit-interval="1" />
</tasklet>
</step>
</job>
// my reader's waiting code, used when the list is empty
if (myList.isEmpty()) {
    try {
        Thread.sleep(constantes.WAIT_TIME_BATCH_RERUN);
        System.out.println("im sleeping");
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}

Veins: Get Tripinfo and emissions in output

I'm using Veins 4.6 with OMNeT++ 5.1.1 and I'm trying to output the tripinfo of vehicles using the following configuration in the .sumocfg file:
<input>
    <net-file value="erlangen.net.xml"/>
    <route-files value="erlangen.rou.xml"/>
    <additional-files value="erlangen.poly.xml"/>
</input>
<time>
    <begin value="0"/>
    <end value="300"/>
    <step-length value="0.1"/>
</time>
<report>
    <no-step-log value="true"/>
</report>
<gui_only>
    <start value="true"/>
</gui_only>
<emissions>
    <device.emissions.probability value="1"/>
</emissions>
<output>
    <tripinfo-output value="erlangen.trip_info.xml"/>
    <fcd-output value="erlangen.fcd.xml"/>
</output>
I have generated 30 random trips for the example network and set the emissionClass="HBEFA3/LDV_G_EU4" attribute of the vType element. When I run the simulation directly in SUMO, on successful completion it generates the required trip info file:
<tripinfo id="0" depart="0.00" departLane="4006674#0_0" departPos="5.10" departSpeed="0.00" departDelay="0.00" arrival="202.40" arrivalLane="-4006726#0_0" arrivalPos="281.67" arrivalSpeed="13.76" duration="202.40" routeLength="2214.00" waitSteps="0" timeLoss="28.90" rerouteNo="0" devices="tripinfo_0 emissions_0" vType="passenger" speedFactor="1.00" vaporized="">
<emissions CO_abs="16453.885943" CO2_abs="591255.824603" HC_abs="76.174970" PMx_abs="24.476562" NOx_abs="123.285735" fuel_abs="254.203634" electricity_abs="0"/>
</tripinfo>
...
<tripinfo id="29" depart="29.00" departLane="29900564#4_0" departPos="5.10" departSpeed="0.00" departDelay="0.00" arrival="226.10" arrivalLane="-31241838#0_0" arrivalPos="18.39" arrivalSpeed="22.13" duration="197.10" routeLength="2353.60" waitSteps="0" timeLoss="23.99" rerouteNo="0" devices="tripinfo_29 emissions_29" vType="passenger" speedFactor="1.00" vaporized="">
<emissions CO_abs="16826.605518" CO2_abs="612826.831847" HC_abs="78.478455" PMx_abs="25.328690" NOx_abs="126.946877" fuel_abs="263.477812" electricity_abs="0"/>
</tripinfo>
But when I debug the same scenario as an OMNeT++ simulation, it finishes with the following notification and the trip info file is not generated.
I set the simulation time to 300 s in both the .sumocfg and omnetpp.ini (sim-time-limit = 300s); the screenshots show that all departed vehicles had arrived by 285.900 s, and at that moment the simulation stopped with the notification. I have observed this issue multiple times, changing the number of random trips and the simulation time again and again, but all in vain.
The documentation clearly states that:
The information is generated for each vehicle as soon as the vehicle arrived at its destination and is removed from the network.
But that is not the case for me. Please advise what I'm doing wrong. Thanks.
You most likely ran SUMO via sumo-launchd.py, which creates a temporary copy of your scenario (in /tmp). After the scenario ran, the copy is deleted. This means, if you are logging to the directory that the SUMO simulation is executing in, your logged data will be cleaned along with the temporary copy.
There are three ways of preventing that:
1. Run sumo-launchd.py with a command line switch that disables deletion of the temporary directory, or
2. Configure SUMO to store the statistics somewhere else (see the example below), or
3. Use a different way of launching SUMO (manually or using the TraCI ScenarioManagerForker).
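For the second option, one way (the paths below are only an example) is to point the output files in the .sumocfg at an absolute directory outside the temporary scenario copy:

<output>
    <tripinfo-output value="/home/user/results/erlangen.trip_info.xml"/>
    <fcd-output value="/home/user/results/erlangen.fcd.xml"/>
</output>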

Spring Batch JSR-352: manage processor skip outside/before the skip listener

I am trying to find a way to handle a skip scenario in the process listener (it could be the read or write listener as well). What I have found is that the skip listener seems to be executed after the process listener's onError method. This means that I might be handling the error in some way without knowing that it is an exception that will be skipped.
Is there some way to know that a particular exception is being skipped outside the skip listener? Something that could be pulled into the process listener, or possibly elsewhere.
The best approach I found was to add a property to the step and then wire the step context in wherever I needed it.
<step id="firstStep">
<properties> <property name="skippableExceptions" value="java.lang.IllegalArgumentException"/> </properties>
</step>
This is not a perfect solution, but the skippable exceptions only seem to be set in StepFactoryBean and Tasklet and are not directly accessible.
The code in my listeners:
@Inject
StepContext stepContext;

// ...

Properties p = stepContext.getProperties();
String exceptions = p.getProperty("skippableExceptions");
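Building on that, here is a hedged sketch of how a process listener could use the property to tell whether the current exception will be skipped (the helper method is my own illustration, not JSR-352 API):

private boolean isSkippable(Exception e) {
    // read the comma-separated class names configured on the step
    String configured = stepContext.getProperties().getProperty("skippableExceptions");
    if (configured == null) {
        return false;
    }
    for (String className : configured.split(",")) {
        try {
            if (Class.forName(className.trim()).isInstance(e)) {
                return true;
            }
        } catch (ClassNotFoundException ignored) {
            // a misconfigured class name; treat it as not skippable
        }
    }
    return false;
}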

Spring Batch pause and then continue

I'm writing a job that will read x number of rows from an Excel file, and then I'd like it to pause for an hour before it continues with the next x number of rows.
How do I do this?
I have a job.xml file which contains the following. The subscriptionDiscoverer fetches the file and passes it to the processor. The subscriptionWriter should write another file when the processor is done.
<job id="subscriptionJob" xmlns="http://www.springframework.org/schema/batch" incrementer="jobParamsIncrementer">
<validator ref="jobParamsValidator"/>
<step id="readFile">
<tasklet>
<chunk reader="subscriptionDiscoverer" processor="subscriptionProcessor" writer="subscriptionWriter" commit-interval="1" />
</tasklet>
</step>
</job>
Is there some kind of timer I could use, or is it some kind of flow structure? It's a large file of about 160,000 rows that need to be processed.
I hope someone has a solution they would like to share.
Thank you!
I'm thinking of two possible approaches for you to start with:
Stop the job, and restart it (after an hour) at the last position. You can start by taking a look at how to change the BatchStatus to signal your intent to stop the job; see http://docs.spring.io/spring-batch/2.0.x/cases/pause.html or look at how Spring Batch Admin implements its way of communicating the PAUSE flag (http://docs.spring.io/spring-batch-admin/reference/reference.xhtml). You may need to implement some persistence to store the position (row number) so the job knows where to start processing again. You can also use a scheduler to restart the job.
-or-
Add a ChunkListener and implement the following in afterChunk(ChunkContext context): check whether x number of rows has been read so far, and if so, apply your pause mechanism (e.g., a simple Thread.sleep, or look for a more robust way of pausing the step); see the sketch below. To check the number of rows read, you can use StepExecution.getReadCount() via ChunkContext.getStepContext().getStepExecution().
Do note that afterChunk is called outside the transaction, as indicated in the javadoc:
Callback after the chunk is executed, outside the transaction.
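A minimal sketch of the second approach, assuming Spring Batch 3.x where the ChunkListener methods take a ChunkContext; the row threshold and the one-hour sleep are placeholders:

import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

public class PauseAfterRowsChunkListener implements ChunkListener {

    private static final long ROWS_BEFORE_PAUSE = 10000L;       // "x number of rows" (placeholder)
    private static final long PAUSE_MILLIS = 60L * 60L * 1000L; // one hour

    private long nextPauseAt = ROWS_BEFORE_PAUSE;

    @Override
    public void beforeChunk(ChunkContext context) {
        // nothing to do before the chunk
    }

    @Override
    public void afterChunk(ChunkContext context) {
        // called outside the transaction, after each committed chunk
        long readCount = context.getStepContext().getStepExecution().getReadCount();
        if (readCount >= nextPauseAt) {
            nextPauseAt += ROWS_BEFORE_PAUSE;
            try {
                Thread.sleep(PAUSE_MILLIS); // crude pause, as discussed above
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }

    @Override
    public void afterChunkError(ChunkContext context) {
        // nothing to do on chunk error
    }
}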

Schedule a trigger every minute; if the job is still running, stand by and wait for the next trigger

I need to schedule a trigger to fire every minute. The next minute, if the job is still running, the trigger should not fire and should wait another minute to check; if the job has finished, the trigger should fire.
Thanks
In Quartz 2, you'll want to use the DisallowConcurrentExecution attribute on your job class. Then make sure that you set up a key using something similar to TriggerBuilder.Create().WithIdentity( "SomeTriggerKey" ) as DisallowConcurrentExecution uses it to determine if your job is already running.
[DisallowConcurrentExecution]
public class MyJob : IJob
{
...
}
I didn't find anything about Monitor.Enter or something like that, thanks anyway.
The other answer is that the job should implement the StatefulJob interface. As a StatefulJob, another instance will not run as long as one is already running.
Thanks again
IStatefulJob is the key here. Creating your own locking mechanisms may cause problems with the scheduler, as you are then taking part in the threading.
If you're using Quartz.NET, you can do something like this in your Execute method:
static readonly object execution_lock = new object();

public void Execute(JobExecutionContext context) {
    // if the lock is already held, a previous run is still executing, so skip this one
    if (!Monitor.TryEnter(execution_lock, 1)) {
        return;
    }
    try {
        // do work
    } finally {
        // always release the lock, even if the work throws
        Monitor.Exit(execution_lock);
    }
}
I pulled this off the top of my head, so maybe some names are wrong, but that's the idea: lock on some object while you're executing, and if the lock is already taken when execution starts, then a previous job is still running and you simply return.
EDIT: the Monitor class is in the System.Threading namespace
If you are using the Spring Quartz integration, you can set the 'concurrent' property to 'false' on MethodInvokingJobDetailFactoryBean:
<bean id="positionFeedFileProcessorJobDetail" class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
<property name="targetObject" ref="xxxx" />
<property name="targetMethod" value="xxxx" />
<property name="concurrent" value="false" /> <!-- This will not run the job if the previous method is not yet finished -->
</bean>