Disable Retry When commitInterval = 1 - spring-batch

The behavior for the batch processing of our business entities we would like is to rollback the failed transaction and not try again. I have read through the forum and it appears that it is not possible. We have set the commitInterval=1 and tried the Never Retry Policy for this special case but to no avail. I have read the rational is that the writer does not know if the list of items received is the initial or subsequent processing in the case of a failure.
Have I summarized this correctly and Spring batch does not currently support the behavior we are looking for?

Sounds like a candidate for Skip Logic
https://docs.spring.io/spring-batch/reference/html/configureStep.html
Check out these two sections in particular:
5.1.5 Configuring Skip Logic
5.1.7 Controlling Rollback

Related

Suppress Firebase Firestore function triggers in batch write

Is it possible to suppers function trigger for a specific entry in the write batch ?
batch.update(docRef1,docUpdate1)
batch.update(docRef2,docUpdate2,{ skipTriggers: true}
batch.set(docRef3,docSet,{ skipTriggers: true}
It would be greater to have the option to skip triggers when updating large number of documents because it takes a lot of time and the current solution that requires to delete the trigger/s makes the production application unusable until the update process is completed
No. There is no feature to suppers function trigger for a specific entry in the write batch.
Have a look at this document for batched writes
If you think it’s a valid feature request you may raise here with a clear description why you are thinking it’s a feature request. Good feature requests will solve common problems or enable new use cases.

How do i Re-run pipeline with only failed activities/Dataset in Azure Data Factory V2?

I am running a pipeline where i am looping through all the tables in INFORMATION.SCHEMA.TABLES and copying it onto Azure Data lake store.My question is how do i run this pipeline for the failed tables only if any of the table fails to copy?
Best approach I’ve found is to code your process to:
0. Yes, root cause the failure and identify if it is something wrong with the pipeline or if it is a “feature” of your dependency you have to code around.
1. Be idempotent. If your process ensures a clean state as the very first step, similar to Command Design pattern’s undo (but more naive), then your process can re-execute.
* with #1, you can safely use “retry” in your pipeline activities, along with sufficient time between retries.
* this is an ADFv1 or v2 compatible approach
2. If ADFv2, then you have more options and can have more complex logic to handle errors:
* for the activity that is failing, wrap this in an until-success loop, and be sure to include a bound on execution.
* you can add more activities in the loop to handle failure and log, notify, or resolve known failure conditions due to externalities out of your control.
3. You can also use asynchronous communication to future process executions that save success to a central store. Then later executions “if” I already was successful then stop processing before the activity.
* this is powerful for more generalized pipelines, since you can choose where to begin
4. Last resort I know (and I would love to learn new ways to handle) is manual re-execution of failed activities.
Hope this helps,
J

Spring Integration JDBC inbound channel adapter - avoiding duplicate reads

I have a Spring Integration jdbc:inbound-channel-adapter which reads from a database. An important requirement is that the same rows are not read twice. One possible approach may be to use the update attribute to set a flag on the rows read using the same where clause as for the query attribute. The concern however would be that if an exception occurs further on in the workflow (transforming the result set using the row mapper, marshalling to XML, and then placing on an outbound queue for an external system), those rows would not be re-read when the application came back up. Is there a better strategy to use in this case with Spring Integration?
Another question would be that, given the above requirement, would Spring Batch offer a more robust solution, and if so, how would this be implemented?
Thanks
Looks like you should use the short TX and channel shift technique:
<int-jdbc:inbound-channel-adapter channel="executorChannel"/>
<int:channel id="executorChannel">
<int:dispatcher task-executor="executor"/>
</int:channel>
Having that your message will be shifted to the different Thread outside of JDBC TX. And the last one will be commited always. So, any downstrem issues won't affect you row in DB - they will be marked as processed and won't be read one more time.

Cannot find a record just created in a different thread with JPA

I am using the Play! framework, and have a difficulty with in the following scenario.
I have a server process which has a 'read-only' transaction. This to prevent any possible database lock due to execution as it is a complicated procedure. There are one or two record to be stored, but I do that as a job, as I found doing them in the main thread could result in a deadlock under higher load.
However, in one occasion I need to create an object and subsequently use it.
However, when I create the object using a Job, wait for the resulting id (with a Promise return) and then search in the database for it, it cannot be found.
Is there an easy way to have the JPA search 'afresh' in the DB at this point? I implemented a 5 sec. pause to test, so I am sue it is not because the procedure hadn't finished yet.
Check if there is a transaction wrapped around your INSERT and if there is one check that the transaction is COMMITed.

Preventing update loops for multiple databases using CDC

We have a number of legacy systems that we're unable to make changes to - however, we want to start taking data changes from these systems and applying them automatically to other systems.
We're thinking of some form of service bus (no specific tech picked yet) sitting in the middle, and a set of bus adapters (one per legacy application) to translate between database specific concepts and general update messages.
One area I've been looking at is using Change Data Capture (CDC) to monitor update activity in the legacy databases, and use that information to construct appropriate messages. However, I have a concern - how best could I, as a consumer of CDC information, distinguish changes applied by the application vs changes applied by the bus adapter on receipt of messages - because otherwise, the first update that gets distributed by the bus will get re-distributed by every receiver when they apply that change to their own system.
If I was implementing "poor mans" CDC - i.e. triggers, then those triggers execute within the context/transaction/connection of the original DML statements - so I could either design them to ignore one particular user (the user applying incoming updates from the bus), or set and detect a session property to similar ignore certain updates.
Any ideas?
If I understand your question correctly, you're trying to define a message routing structure that works with a design you've already selected (using an enterprise service bus) and a message implementation that you can use to flow data off your legacy systems that only forward-ports changes to your newer systems.
The difficulty is you're trying to apply changes in such a way that they don't themselves generate a CDC message from the clients receiving the data bundle from your legacy systems. In fact, all you're concerned about is having your newer systems consume the data and not propagate messages back to your bus, creating unnecessary crosstalk that might exponentiate, overloading your infrastructure.
The secret is how MSSQL's CDC features reconcile changes as they propagate through the network. Specifically, note this caveat:
All the changes are logged in terms of LSN or Log Sequence Number. SQL
distinctly identifies each operation of DML via a Log Sequence Number.
Any committed modifications on any tables are recorded in the
transaction log of the database with a specific LSN provided by SQL
Server. The __$operationcolumn values are: 1 = delete, 2 = insert, 3 =
update (values before update), 4 = update (values after update).
cdc.fn_cdc_get_net_changes_dbo_Employee gives us all the records net
changed falling between the LSN we provide in the function. We have
three records returned by the net_change function; there was a delete,
an insert, and two updates, but on the same record. In case of the
updated record, it simply shows the net changed value after both the
updates are complete.
For getting all the changes, execute
cdc.fn_cdc_get_all_changes_dbo_Employee; there are options either to
pass 'ALL' or 'ALL UPDATE OLD'. The 'ALL' option provides all the
changes, but for updates, it provides the after updated values. Hence
we find two records for updates. We have one record showing the first
update when Jason was updated to Nichole, and one record when Nichole
was updated to EMMA.
While this documentation is somewhat terse and difficult to understand, it appears that changes are logged and reconciled in LSN order. Competing changes should be discarded by this system, allowing your consistency model to work effectively.
Note also:
CDC is by default disabled and must be enabled at the database level
followed by enabling on the table.
Option B then becomes obvious: institute CDC on your legacy systems, then use your service bus to translate these changes into updates that aren't bound to CDC (using, for example, raw transactional update statements). This should allow for the one-way flow of data that you seek from the design of your system.
For additional methods of reconciling changes, consider the concepts raised by this Wikipedia article on "eventual consistency". Best of luck with your internal database messaging system.