Best way of persisting the processing status of each individual item - spring-batch

In my project I am reading data from a DB table using a stored-procedure-based reader, calling an API to process each record, and then saving the output using a writer. I need to maintain the processing status as Processed or Error for each record that I read. As of now I am using the writer to update the input table's STATUS column to P (Processed) or E (Error), and to add logs to the LOGS column in case of any error.
Can you please suggest whether this is an efficient way to maintain the processing status of each record? Does Spring Batch provide any default implementation for this?
Thanks

No, Spring Batch does not provide a "default implementation" for such a requirement.
That said, a status flag on each item, as you have done, is in my opinion a reasonable way to address your requirement.
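For illustration, such a flag-updating writer could look like the following minimal sketch, assuming an INPUT_TABLE with ID, STATUS and LOGS columns as described above and a hypothetical ProcessedItem bean exposing getId(), getStatus() and getLogs():

import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.context.annotation.Bean;

// Declared inside a @Configuration class.
@Bean
public JdbcBatchItemWriter<ProcessedItem> statusWriter(DataSource dataSource) {
    // Flips the input row's STATUS to P or E and records any error message;
    // :status, :logs and :id are bound to the ProcessedItem getters.
    return new JdbcBatchItemWriterBuilder<ProcessedItem>()
            .dataSource(dataSource)
            .sql("UPDATE INPUT_TABLE SET STATUS = :status, LOGS = :logs WHERE ID = :id")
            .beanMapped()
            .build();
}

The processor can then set the status to E and fill in the log message in its catch block, so the same writer handles both outcomes.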

Related

Fetch and maintain reference data at Job level in Spring Batch

I am configuring a new job where I need to read data from the database, and in the processor the data will be used to call a REST endpoint with a payload. Along with dynamic data, the payload needs to carry reference data that is constant for every record processed in the job. This reference data is stored in the DB. I am thinking of implementing one of the following approaches:
In a beforeJob listener method, make a DB call, populate the reference data object, and use it for the whole job run.
In the processor, make a DB call to get the reference data and cache the result, so there is no DB call to fetch the same data for each record.
Please suggest whether these approaches are correct or if there is a better way to implement this in Spring Batch.
For performance reasons, I would not recommend doing a DB call in the item processor, unless that is really a requirement.
The first approach seems reasonable to me, since the reference data is constant. You can populate/clear a cache with a JobExecutionListener and use the cache in your chunk-oriented step. Please refer to the following thread for more details and a complete sample: Spring Batch With Annotation and Caching.
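As a rough illustration of that first approach, here is a minimal sketch, assuming a hypothetical ReferenceDataRepository and a plain Map serving as the cache (any cache abstraction would work the same way):

import java.util.Map;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class ReferenceDataCacheListener implements JobExecutionListener {

    private final ReferenceDataRepository repository; // hypothetical Spring Data repository
    private final Map<String, ReferenceData> cache;   // shared with the item processor

    public ReferenceDataCacheListener(ReferenceDataRepository repository,
                                      Map<String, ReferenceData> cache) {
        this.repository = repository;
        this.cache = cache;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // A single DB call for the whole run; the processor only reads the map.
        repository.findAll().forEach(ref -> cache.put(ref.getKey(), ref));
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Release the memory once the job finishes.
        cache.clear();
    }
}

Register the listener on the job and inject the same Map into your processor; the chunk-oriented step then never hits the DB for reference data.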

What if many Kafka streams update a domain model (a.k.a. materialized view)?

I have a materialized view that is updated from many streams. Each one enriches it partially. Order doesn't matter, and updates arrive at unspecified times. Is the following algorithm a good approach?
An update comes in; I check via get() what is stored in the materialized view, see that this is the initial one, so I enrich and save.
A second one comes in; get() shows that a partial update exists, so I add the next piece of information.
... and I continue in the same style.
If there is a query/join, the stored object has a method isValid() that shows whether the update is complete, and that could be used in KafkaStreams#filter().
Could you please share whether this is a good plan? Is there any pattern in the Kafka Streams world that handles this case?
Please advise.
Your plan looks good; you have the general idea, but you'll have to use the lower-level Kafka Streams API: the Processor API.
There is a .transform operator that allows you to access a KeyValueStore. Inside this operation's implementation you are free to decide whether your current aggregated value is valid or not, and therefore to send it downstream, or to return null and wait for more information.
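To make that concrete, here is a minimal sketch of such a transformer, assuming hypothetical Update and View types where View.merge() applies a partial update and isValid() reports completeness, plus a state store registered under the illustrative name "views-store":

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class EnrichTransformer implements Transformer<String, Update, KeyValue<String, View>> {

    private KeyValueStore<String, View> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        store = (KeyValueStore<String, View>) context.getStateStore("views-store");
    }

    @Override
    public KeyValue<String, View> transform(String key, Update update) {
        View view = store.get(key);                 // null on the initial update
        View enriched = (view == null) ? View.from(update) : view.merge(update);
        store.put(key, enriched);                   // persist the partial aggregate
        // Forward only once the aggregate is complete; returning null
        // emits nothing and simply waits for more information.
        return enriched.isValid() ? KeyValue.pair(key, enriched) : null;
    }

    @Override
    public void close() { }
}

The store has to be added to the topology and its name passed to .transform(), along these lines (viewSerde being a Serde for the View type):

builder.addStateStore(Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("views-store"), Serdes.String(), viewSerde));
stream.transform(EnrichTransformer::new, "views-store");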

Is there any way to check for a duplicate message control id (MSH:10) in the MSH segment using Mirth Connect?

MSH|^~&|sss|xxx|INSTANCE2|KKLIU 0063/2021|20190905162034||ADT^A28^ADT_A05|Zx20190905162034|P|2.4|||NE|NE|||||
Whenever a message enters, it needs to be validated whether a duplicate of control id Zx20190905162034 has already been processed.
Mirth will not do this for you, but you can write your own JavaScript transformer to check a database or your own set of previously encountered control ids.
Your JavaScript can make use of any appropriate Java classes.
The database check (you can implement it using a code template) is the easier way out. You might want to designate the column storing MSH:10 values as a primary key or define an index on it, since queries against unique entries are faster. Other alternatives include periodically redeploying the channel and, on deploy, reading all MSH:10 values already in the database into a global map variable; or maintaining them behind an API that you can make a GET request to when processing every message. Which option fits best depends on the number of records we are speaking about.
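As a rough sketch of the database variant, you can lean on the unique constraint and treat a failed insert as "duplicate". The table processed_messages and the JDBC URL are assumptions; in Mirth this logic would live in a JavaScript transformer or code template, which can call Java classes such as java.sql.DriverManager directly.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DuplicateControlIdCheck {

    /** Returns true if the control id was seen before, false if it is new. */
    public static boolean isDuplicate(String jdbcUrl, String controlId) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO processed_messages (control_id) VALUES (?)")) {
            ps.setString(1, controlId);   // e.g. MSH:10 = Zx20190905162034
            ps.executeUpdate();
            return false;                 // insert succeeded, so this is the first occurrence
        } catch (SQLException e) {
            // SQLState class "23" signals an integrity-constraint violation,
            // i.e. the unique control_id column already holds this value.
            if (e.getSQLState() != null && e.getSQLState().startsWith("23")) {
                return true;
            }
            throw e;
        }
    }
}

The insert-and-catch style is deliberately atomic: a SELECT-then-INSERT pair would leave a race window between two messages carrying the same control id.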

How can I force RepositoryItemReader to read only newly inserted or unprocessed records?

I have a batch job which reads records from an Azure SQL database. The use case: records are continuously written to the database, and my Spring Batch job has to run every 5 minutes and read only the records that were newly inserted, and not yet processed, since the last run. I am not sure whether there is a built-in method in RepositoryItemReader for this, or whether I have to implement a workaround.
@Bean
public RepositoryItemReader<Booking> bookingReader() {
    RepositoryItemReader<Booking> bookingReader = new RepositoryItemReader<>();
    bookingReader.setRepository(bookingRepository);
    bookingReader.setMethodName("findAll");
    bookingReader.setSaveState(true);
    bookingReader.setPageSize(2);
    Map<String, Sort.Direction> sort = new HashMap<>();
    bookingReader.setSort(sort);
    return bookingReader;
}
You need to add a column called "STATUS" to your table. When data is inserted into the table, the status should be "NOT PROCESSED". When your ItemReader reads a record, change the status to "IN PROCESS"; when your ItemProcessor and ItemWriter complete their task, change the status to "PROCESSED". This way you can make sure your ItemReader reads only "NOT PROCESSED" data.
Note: if you are running your batch job with multiple threads using a TaskExecutor, use a synchronized method in your reader to read the "NOT PROCESSED" records and change their status to "IN PROCESS", so that multiple threads do not fetch the same data. A sketch of such a reader is shown below.
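A minimal sketch of that reader, assuming the repository gains a Page<Booking> findAllByStatus(String status, Pageable pageable) derived query and the table has the STATUS column described above:

import java.util.List;
import java.util.Map;
import org.springframework.batch.item.data.RepositoryItemReader;
import org.springframework.batch.item.data.builder.RepositoryItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.data.domain.Sort;

// Declared inside a @Configuration class.
@Bean
public RepositoryItemReader<Booking> unprocessedBookingReader(BookingRepository bookingRepository) {
    return new RepositoryItemReaderBuilder<Booking>()
            .name("unprocessedBookingReader")
            .repository(bookingRepository)
            .methodName("findAllByStatus")           // assumed derived query on the STATUS column
            .arguments(List.of("NOT PROCESSED"))     // read only rows nobody has touched yet
            .sorts(Map.of("id", Sort.Direction.ASC)) // a non-empty sort keeps paging deterministic
            .pageSize(100)
            .build();
}

For the multi-threaded case, wrapping this reader in a SynchronizedItemStreamReader is the usual way to get the synchronization the note above asks for.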
If altering the table is not an option, then another approach would be to use the Spring Batch meta-data tables as much as you can.
Before job completion you simply store a timestamp, or some other indicator, in the job execution context that tells you where to begin on the next job iteration.
This can be an "out of the box" solution.
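One way to sketch that with nothing but what the meta-data tables already record (Spring Batch 4 signatures assumed; the job name "bookingJob" is illustrative): look up the previous successful execution through the JobExplorer and use its start time as the "read everything newer than this" cutoff.

import java.util.Date;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;

public class LastSuccessfulRunResolver {

    private final JobExplorer jobExplorer;

    public LastSuccessfulRunResolver(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    /** Start time of the last COMPLETED run, or null on the very first run. */
    public Date lastSuccessfulStartTime(String jobName) {
        // Walk recent instances, newest first, until a completed execution is found.
        for (JobInstance instance : jobExplorer.getJobInstances(jobName, 0, 10)) {
            for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                if (execution.getStatus() == BatchStatus.COMPLETED) {
                    return execution.getStartTime();
                }
            }
        }
        return null;
    }
}

Passing the resolved cutoff to the job as a parameter then lets the reader select only rows created after it, e.g. via a findByCreatedDateAfter-style query.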

spring batch: passing a param from an ItemProcessor to the next ItemReader's SQL

I have the following requirement: I am generating a unique id in an ItemProcessor and writing it to the database using a JdbcItemWriter.
I want to pass this unique id as a query param to the next JdbcItemReader, so that I can select all the records from the database based on this unique id.
Currently I am using max(uniqueid) from the database. I have tried using #{jobParameters['uniqueid']} but it didn't work.
Please let me know how to pass a value from an ItemProcessor to a database ItemReader.
I think using the step execution context might work for you here. There is the option of setting some transient data on the step execution context and having it be available to other components in the same step, and by promoting it to the job execution context you can make it available to a later step's reader as well.
There is a previous answer here that elaborates a bit more on this, and a quick Google search on "spring batch step execution context" also turns up quite a few Q&As on the subject.
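A minimal sketch of that pattern, with illustrative Input, Output and ResultRecord types: the processor drops the generated id into the step execution context, an ExecutionContextPromotionListener registered on that step promotes the key to the job execution context, and the next step's reader picks it up through late binding.

import java.util.UUID;
import javax.sql.DataSource;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.listener.ExecutionContextPromotionListener;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

public class IdGeneratingProcessor implements ItemProcessor<Input, Output> {

    private StepExecution stepExecution;

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public Output process(Input item) {
        String uniqueId = UUID.randomUUID().toString();
        // Stash the id in the step context; the promotion listener copies it
        // to the job context when the step completes.
        stepExecution.getExecutionContext().putString("uniqueid", uniqueId);
        return new Output(uniqueId, item);
    }
}

// In a @Configuration class: promote the key, then late-bind it in the reader.
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] {"uniqueid"});
    return listener;
}

@Bean
@StepScope
public JdbcCursorItemReader<ResultRecord> nextStepReader(
        @Value("#{jobExecutionContext['uniqueid']}") String uniqueId, DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<ResultRecord>()
            .name("nextStepReader")
            .dataSource(dataSource)
            .sql("SELECT * FROM RECORDS WHERE UNIQUE_ID = ?")
            .preparedStatementSetter(ps -> ps.setString(1, uniqueId))
            .rowMapper(new BeanPropertyRowMapper<>(ResultRecord.class))
            .build();
}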