How to commit a file (entire file) in spring batch without using chunks - commit interval? - spring-batch

Commit interval will commit the data at the specified interval. I want to commit the entire file in a single shot, since my requirement is to validate the file (line by line) and, if it fails at any point, roll back with no commit. Is there any way to achieve this in Spring Batch?

You can either set your commit-interval to Integer.MAX_VALUE (2^31 - 1) or create your own CompletionPolicy.
Here's how you configure a step to use a custom CompletionPolicy:
<chunk reader="reader" writer="writer" chunk-completion-policy="completionPolicy"/>
<bean id="completionPolicy" class="xx.xx.xx.CompletionPolicy"/>
Then you have to either choose an out-of-the-box CompletionPolicy provided by Spring Batch (a list of implementations is available at the previous link) or create your own.
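If you create your own, a minimal sketch (class and package names are placeholders) could be a policy that never cuts a chunk short, so the chunk only ends when the reader runs out of lines - which effectively puts the whole file into a single transaction:
import org.springframework.batch.repeat.CompletionPolicy;
import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.batch.repeat.context.RepeatContextSupport;

// Sketch only: never signals completion by itself, so the chunk closes
// only when the reader reports that the input is exhausted.
public class WholeFileCompletionPolicy implements CompletionPolicy {

    @Override
    public boolean isComplete(RepeatContext context, RepeatStatus result) {
        // complete only when reading has finished (reader returned null)
        return RepeatStatus.FINISHED.equals(result);
    }

    @Override
    public boolean isComplete(RepeatContext context) {
        return false; // never complete on our own
    }

    @Override
    public RepeatContext start(RepeatContext parent) {
        return new RepeatContextSupport(parent);
    }

    @Override
    public void update(RepeatContext context) {
        // nothing to track between items
    }
}
Keep in mind that, as the next answer points out, holding the whole file in one chunk keeps every item in memory until the commit.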

What do you mean by "commit"?
You are talking about validating, not about writing the read data to another file or into a database.
As mentioned in the comment by Michael Prarlow, memory problems could arise if the size of the file changes.
To prevent this, I would suggest starting your job with a validation step. Simply read the data chunk-wise, check the data line by line in your processor, and throw a non-skippable exception if a line is not valid. Use a pass-through writer, so nothing is persisted. If there is a problem, the whole job will fail.
If you really have to write the data into a DB or another file, you can do this in a second step. Since you have validated your data, you shouldn't observe any problems.
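A minimal sketch of such a validating processor (assuming the lines arrive as plain Strings; the exception type is made up here and must not be configured as skippable); the pass-through writer it pairs with is shown just below:
import org.springframework.batch.item.ItemProcessor;

public class ValidatingLineProcessor implements ItemProcessor<String, String> {

    // hypothetical unchecked exception; since it is not declared skippable, it fails the step
    public static class LineValidationException extends RuntimeException {
        public LineValidationException(String message) {
            super(message);
        }
    }

    @Override
    public String process(String line) {
        if (line == null || line.trim().isEmpty()) {
            throw new LineValidationException("Invalid line: " + line);
        }
        return line; // pass the valid item through unchanged
    }
}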
Simple PassThroughItemWriter
public class PassThroughItemWriter<T> implements ItemWriter<T> {

    public void write(List<? extends T> items) {
        // do nothing
    }
}
or, if you use the Java API to build your job and steps, you can simply use a lambda:
stepBuilders.get("step")
    .<..., ...>chunk(..)
    .reader(...)
    .processor(...) // your processor with the validation logic
    .writer(items -> {}) // empty lambda expression
    .build();

Related

How to send message by message to Kafka

I'm new to reactive programming and I'm trying to implement a very basic scenario.
I want to send a message to Kafka each time a file is dropped into a specific folder.
I think I don't understand the basics well... so could you please help me?
So I have a few questions:
What is the difference between smallrye-reactive-messaging and smallrye-reactive-streams-operators?
I have this simple code:
#Outgoing( "my-topic" )
public PublisherBuilder<Message<MessageWrapper>> generate() {
if(Objects.isNull(currentMessage)){
//currentMessage is an instance variable which is null when I start the application
return ReactiveStreams.of(new MessageWrapper()).map(Message::of);
}
else {
//currentMessage has been correctly set with the file information
LOGGER.info(currentMessage);
return ReactiveStreams.of(currentMessage).map(Message::of);
}
}
When the code goes into the if branch, everything is OK and I get a JSON serialization of my object with null values. However, I don't understand why, when my code goes into the else branch, nothing reaches the topic. It seems that the .of instruction in the if branch has broken the stream or something like that...
How do I keep a continuous stream that 'reacts' to newly dropped files? (or to other events like an HTTP GET request or something like that) ...
If I don't return an instance of PublisherBuilder but an Integer, for example, then my Kafka topic will be populated by a very huge stream of Integer values. This is why the examples use some interval when sending messages...
Should I use some CompletionStage or CompletableFuture? RxJava2? It's a bit confusing which lib to use (Vert.x, SmallRye, RxJava2, MicroProfile, ...)
What are the differences between:
ReactiveStreams.fromCompletionStage
ReactiveStreams.fromProcessor
ReactiveStreams.fromPublisher
ReactiveStreams.fromSubscriber
Which one should I use in which scenario?
Thank you very much!
Let's start with the difference between smallrye-reactive-messaging & smallrye-reactive-streams-operators: smallrye-reactive-streams-operators is the same as smallrye-reactive-messaging, but in addition it has support for MicroProfile Context Propagation. Since most reactive-messaging providers use Vert.x behind the scenes, your message will be processed in an event-loop style, which means it will run in a separate thread. Sometimes you need to propagate some context from your base thread into that new thread (e.g. populating the CDI and transaction context to execute some JPA entity manager logic). This is where context propagation helps.
For method signatures, you can take a look at the official documentation of SmallRye Reactive Streams, sections 3, 4 & 5. Each one has a different use case; it is up to you which flavor you want to use.
When to use what? If you are not running within a reactive context, you can use the following to send messages:
@Inject
@Channel("my-channel")
Emitter emitter;
For message consumption you can use a method signature like this:
@Incoming("channel-2")
public CompletionStage doSomething(Message anEvent)
Or
@Incoming("channel-2")
public void doSomething(String anEvent)
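As a rough usage sketch of that Emitter from imperative code (note: depending on the SmallRye version, Channel and Emitter may live in io.smallrye.reactive.messaging.annotations rather than the MicroProfile package; the bean and method names here are made up):
import javax.enterprise.context.ApplicationScoped;
import javax.inject.Inject;
import org.eclipse.microprofile.reactive.messaging.Channel;
import org.eclipse.microprofile.reactive.messaging.Emitter;

@ApplicationScoped
public class FileEventSender {

    @Inject
    @Channel("my-channel")
    Emitter<String> emitter;

    // call this from whatever detects the dropped file (file watcher, REST endpoint, ...)
    public void onFileDropped(String fileName) {
        emitter.send(fileName); // the connector forwards the payload to the mapped Kafka topic
    }
}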
Hope that helps.

Spring Batch: How do I use decider based on my Reader result?

I am new to Spring Batch. I found out that by using ExecutionContextPromotionListener, I can set up key-value pairs and read them in future steps.
<step id="step1">....</step>
<decision id="decision1">.... />
When I used a Tasklet instead of a reader, I did the following:
Created a bean of ExecutionContextPromotionListener with the expected keys in my batch config file.
Registered the listener to my step.
Put the key-value pairs in the executionContext retrieved from chunkContext inside my Tasklet.
Now the Decider can read from the step execution context and decide.
But I want to take the decision based on the Reader from the previous step. So, in my decider, how do I get the value from the Reader? Or is my approach wrong? Please suggest.
A simple way is to use the execution context from your step to pass values on to the next step.
So, in your first step, do something like this:
// ...
ExecutionContext stepContext = this.stepExecution.getExecutionContext();
stepContext.put("DATA_KEY", dataToShare);
Then in your next step you can read this using the execution context:
ExecutionContext jobContext = jobExecution.getExecutionContext();
dataToShare = jobContext.get("DATA_KEY");
You just need to manage the keys - the key you use to put the value in the first step is the one you read in the next step.
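For the decision itself, here is a rough sketch of a JobExecutionDecider that reads a value promoted to the job execution context (e.g. via ExecutionContextPromotionListener); the "DATA_KEY" key and the "EMPTY" status are purely illustrative:
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class DataKeyDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        Object data = jobExecution.getExecutionContext().get("DATA_KEY");
        if (data == null) {
            return new FlowExecutionStatus("EMPTY"); // route the flow one way
        }
        return FlowExecutionStatus.COMPLETED; // and another way otherwise
    }
}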

Spring step does not run properly when I "fib" the reader, must I use a tasklet?

I'm aware that all Spring Batch steps need to have a reader, a writer, and optionally a processor. So even though my step only needs a writer, I am also fibbing a reader that does nothing but make Spring happy.
This is based on the solution found here. Is it outdated, or am I missing something?
I have a Spring Batch job that has two chunked steps. My first step, deleteCounts, just deletes all rows from the table so that the second step has a clean slate. This means my first step doesn't need a reader, so I followed the above-linked Stack Overflow solution, created a NoOpItemReader, and added it to my step builder object (code at the bottom).
My writer is mapped to a simple SQL statement that deletes all the rows from the table (code at the bottom).
My table is not being cleared by the deleteCounts step. I am expecting deleteCounts to delete all rows from the table, yet it does not - I suspect it's because of my "fibbed" reader, but I'm not sure what I'm doing wrong.
My delete statement:
<delete id="delete">
DELETE FROM ${schemaname}.DERP
</delete>
My deleteCounts Step:
@Bean
@JobScope
public Step deleteCounts() {
    StepBuilder sb = stepBuilderFactory.get("deleteCounts");
    SimpleStepBuilder<ProcessedCountData, ProcessedCountData> ssb = sb.<ProcessedCountData, ProcessedCountData>chunk(10);
    ssb.reader(noOpItemReader());
    ssb.writer(writerFactory.myBatisBatchWriter(COUNT_DATA_DELETE));
    ssb.startLimit(1);
    ssb.allowStartIfComplete(true);
    return ssb.build();
}
My NoOpItemReader, based on the previously linked Stack Overflow solution:
public NoOpItemReader<? extends ProcessedCountData> noOpItemReader() {
    return new NoOpItemReader<>();
}

// for steps that do not need to read anything
public class NoOpItemReader<T> implements ItemReader<T> {

    @Override
    public T read() throws Exception {
        return null;
    }
}
I left out some MyBatis plumbing, since I know that is working (step 2 is much more involved with the MyBatis stuff, and step 2 is inserting rows just fine; deleting is so simple, it must be something with my step config...).
Your NoOpItemReader returns null. An ItemReader returning null indicates that the input has been exhausted. Since, in your case, that's all it returns, the framework assumes that there was no input in the first place.
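One common workaround (a sketch, not the only possible fix) is a reader that hands back exactly one placeholder item before signalling end of input, so the chunk has something to pass to the writer and the delete statement runs once:
import java.util.concurrent.atomic.AtomicBoolean;
import org.springframework.batch.item.ItemReader;

public class SingleItemReader<T> implements ItemReader<T> {

    private final T item;
    private final AtomicBoolean alreadyRead = new AtomicBoolean(false);

    public SingleItemReader(T item) {
        this.item = item;
    }

    @Override
    public T read() {
        // first call returns the placeholder item, every later call signals "no more input"
        return alreadyRead.compareAndSet(false, true) ? item : null;
    }
}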

Exchanging data between steps in spring batch

I have a job that is built of the following components:
Processing step - activates an external process that logs its result in the DB and returns an internal id, so I can take this id and process it further
Logging step - built of a tasklet that contains a chunk
The chunk is built of an item reader - I plan for it to use #{stepExecutionContext['job.id']} as part of the SQL written in the xml file, so it will get the relevant logging info
I'm trying to work with the solution suggested here - 11.8 Passing Data to Future Steps - but I hit a problem when I try to add a property to the step execution context or the job execution context:
chunkContext.getStepContext().getStepExecutionContext().put("job.id", jobId);
And I get this error:
java.lang.UnsupportedOperationException: null
at java.util.Collections$UnmodifiableMap.put(Collections.java:1342)
at ...
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:395)
Did I miss something?
The StepContext available from the ChunkContext is a read-only object; use a StepExecutionListener instead and save to the step execution context passed as a parameter in StepExecutionListener.afterStep(StepExecution stepExecution).
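A rough sketch of that listener approach (the "job.id" key matches the question; how the listener learns the id value is an assumption here - wire it up however your step produces it):
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class JobIdPromotionListener implements StepExecutionListener {

    private Long jobId; // assumed to be set by the step logic once the external id is known

    public void setJobId(Long jobId) {
        this.jobId = jobId;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to prepare
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // this execution context is writable, unlike the map from StepContext.getStepExecutionContext()
        stepExecution.getExecutionContext().put("job.id", jobId);
        return stepExecution.getExitStatus();
    }
}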
I always follow this path:
chunkContext
.getStepContext()
.getStepExecution()
.getJobExecution()
.getExecutionContext()
.put("", "");
I also had problems doing the put on the StepExecution's context; maybe it loses data when it goes to the next step. However, I'll leave that question to the experts.
chunkContext.getStepContext().getStepExecutionContext() returns a copy; use chunkContext.getStepContext().getStepExecution().getExecutionContext() instead.

spring batch - processor chain

I need to execute seven distinct processes sequentially (one after the other). The data is stored in MySQL. I am thinking of the following options; please correct me if I am wrong, or if there is a better solution.
Requirements:
Read the data from the DB, run the seven processes (data validation, calculation1, calculation2, etc.) and finally write the processed data into the DB.
The data needs to be processed in chunks.
My solution and issues:
Data read:
Read the data using JdbcCursorItemReader, because this is the best-performing DB reader - but the SQL is very complex, so I may have to consider a custom ItemReader using JdbcTemplate, which gives me more flexibility in handling the data.
Process:
Define seven steps and chunks, and share the data between the steps using a data bean. But this won't be a good idea, because the data is processed in chunks, and after each chunk the step1 writer will create a new set of data in the data bean. When this data bean is shared across the other steps, data integrity will be an issue.
Use StepExecutionContext to share the data between steps. But this may affect performance, as it involves the batch job repository.
Define only one step, with one ItemReader, a chain of processors (the seven processes), and one ItemWriter which writes the processed data into the DB. But I won't be able to administrate or monitor each of the processes separately; all will be in one step.
The org.springframework.batch.item.support.CompositeItemProcessor is an out-of-the-box component from the Spring Batch framework that supports your requirement of chaining processors within a single step. It allows you to:
- keep separation in your design/solution for reading from the database (ItemReader)
- keep separation of each individual processor's concerns and configuration
- allow any individual processor to filter an item out of the chunk by returning null, irrespective of the previous processors
The CompositeItemProcessor iterates over a list of delegates, so it's 'similar' to an action pattern. It's quite useful in the scenario you've described and still allows you to leverage the chunk benefits (exception handling, retry, commit policy, etc.).
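As a sketch, wiring the delegates in Java config might look like this (InputRecord, OutputRecord and the two processor parameters are hypothetical stand-ins for your item types and your seven processors):
import java.util.Arrays;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.CompositeItemProcessor;

// chains the processors so each item flows through them in order within one chunk
public CompositeItemProcessor<InputRecord, OutputRecord> compositeProcessor(
        ItemProcessor<InputRecord, InputRecord> validationProcessor,
        ItemProcessor<InputRecord, OutputRecord> calculation1Processor) throws Exception {
    CompositeItemProcessor<InputRecord, OutputRecord> composite = new CompositeItemProcessor<>();
    composite.setDelegates(Arrays.asList(validationProcessor, calculation1Processor));
    composite.afterPropertiesSet(); // fails fast if no delegates were configured
    return composite;
}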
Suggestions:
1) Read the data using JdbcCursorItemReader.
All out-of-the-box components are a good choice because they already implement the ItemStream interface, which makes your steps restartable. But as you mention, sometimes the query is just too complex or, like me, you already have a service or DAO that you can reuse.
I would suggest you use the ItemReaderAdapter. It lets you configure a delegate service to call to get your data.
<bean id="MyReader" class="xxx.adapters.MyItemReaderAdapter">
<property name="targetObject" ref="AnExistingDao" />
<property name="targetMethod" value="next" />
</bean>
Note that the targetMethod must respect the read contract of ItemReaders (return null when there is no more data).
If your job does not need to be restartable, you can simply use the class org.springframework.batch.item.adapter.ItemReaderAdapter.
But if you need your job to be restartable, you can create your own ItemReaderAdapter like this:
public class MyItemReaderAdapter<T> extends AbstractMethodInvokingDelegator<T> implements ItemReader<T>, ItemStream {

    private static final Logger log = LoggerFactory.getLogger(MyItemReaderAdapter.class);

    private long currentCount = 0;
    private final String CONTEXT_COUNT_KEY = "count";

    /**
     * @return return value of the target method.
     */
    public T read() throws Exception {
        super.setArguments(new Long[] { currentCount++ });
        return invokeDelegateMethod();
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        currentCount = executionContext.getLong(CONTEXT_COUNT_KEY, 0);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        executionContext.putLong(CONTEXT_COUNT_KEY, currentCount);
        log.info("Update Stream current count : " + currentCount);
    }

    @Override
    public void close() throws ItemStreamException {
        // nothing to close
    }
}
Because the out-of-the-box ItemReaderAdapter is not restartable, you just create your own that also implements ItemStream.
2) Regarding the 7 steps vs. 1 step:
I would go with 1 step and a CompositeItemProcessor on this one. The 7-step options will only bring problems IMO.
1) 7 steps with a data bean: your writers commit into a data bean until step 7... then the step 7 writer tries to commit to the real database and boom, error!!! All is lost and the batch must restart from step 1!!
2) 7 steps with the context: could be better, since the state will be saved in the Spring Batch metadata... BUT it is not good practice to store big data in the Spring Batch metadata!!
3) is the way to go IMO. ;-)