Custom counter in itemProcessor - spring-batch

How would one implement a custom counter in an ItemProcessor? A basic counter could work as defined here, but I need the counter not to include retried items or items in rolled-back chunks. Maybe there is an ItemStream-like interface for ItemProcessor that I haven't found yet. Using Spring Batch 2.1.7.
EDIT: The batch configuration can be found here (using a compositeProcessor). I've tried to implement the counter as follows (with no luck):
Setting an ItemProcessListener for all processors and, in afterProcess(I,O), incrementing the counters for each processor in a cache (each processor also holds its own cache). Then using an ItemWriteListener for all processors and, in afterWrite(), flushing the cache to the stepExecution. But this doesn't work, because the ItemProcessListener is not invoked for the compositeProcessor's child processors as I would have expected. Any other ideas?
EDIT: I've removed the compositeProcessor and tried to use only a single processor, and found out that ItemProcessListener.afterProcess is called too many times. I am guessing that this is related to chunk-processing mode vs. single-item-processing mode, so some of the items of a chunk that were not retried get re-processed. I've also tried to use a RetryListener (to disable counter increments while a retry is in progress), but was not able to configure it: open and close were never called on the RetryListener.

I think the StepExecution domain object should fit your request.
Intercept it with a StepExecutionListener and access wanted properties.

I was able to solve this counter issue; however, I had to change the batch a bit. The first thing to remove was the compositeItemProcessor. Then I needed to update the Spring Batch version to 2.2.7 (to get ChunkListener.afterChunkError). Now, when a processor increments a counter, I cache the counter in the processor. I also use ChunkListener.afterChunkError to clear the cache, so when items are reprocessed, the old counter values are discarded. When ItemWriteListener.afterWrite() occurs, I flush the cache to the stepExecutionContext. This way I was able to overcome the problem of retries incrementing the counters.
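The cache, clear, and flush cycle described above can be sketched in plain Java. This is only an illustration under assumptions: the class name ChunkSafeCounter and its method names are hypothetical, a plain map stands in for the step execution context, and the step is assumed to be single-threaded. In a real job, discardPending() would be called from ChunkListener.afterChunkError and flush() from ItemWriteListener.afterWrite.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: counts only become visible once their chunk is written.
 * Assumes a single-threaded step; the maps are not synchronized.
 */
class ChunkSafeCounter {
    // Counts for the chunk currently in flight; not yet visible to the step.
    private final Map<String, Long> pending = new HashMap<>();
    // Counts from written chunks; stands in for the step execution context.
    private final Map<String, Long> committed = new HashMap<>();

    /** Called by the processor for every item it handles. */
    void increment(String name) {
        pending.merge(name, 1L, Long::sum);
    }

    /** Chunk failed: forget its counts so reprocessed items are not counted twice. */
    void discardPending() {
        pending.clear();
    }

    /** Chunk was written: move its counts into the committed totals. */
    void flush() {
        pending.forEach((name, count) -> committed.merge(name, count, Long::sum));
        pending.clear();
    }

    /** Total for a counter across all written chunks. */
    long committed(String name) {
        return committed.getOrDefault(name, 0L);
    }
}
```

Keeping the in-flight counts separate from the committed totals is what makes a rollback cheap: a failed chunk only ever touched the pending map.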

Related

Deleting previous events with esper cep

While sending events to the runtime, if some condition applies I want to ignore all past events and start fresh (with the same runtime, pattern, listeners, etc.). That could happen many times, so I need a relatively fast way of doing it. Is there a function that deletes from the runtime all the events that were already sent (I couldn't find such a function in the documentation)? (I know I could create a new runtime, but that is very time-consuming.)
Yes, use contexts: you define the start condition and the end condition. When the end condition is reached, the runtime throws away all state (doc link).

Disable Spring Batch single-item processing in skip situation

I have a job that processes items in chunks (of 1000). The items are marshalled into a single JSON payload and posted to a remote service as a batch (all 1000 in one HTTP POST). Sometimes the remote service bogs down and the connection times out. I set up skip for this:
return steps.get("sendData")
        .<DataRecord, DataRecord> chunk(1000)
        .reader(reader())
        .processor(processor())
        .writer(writer())
        .faultTolerant()
        .skipLimit(10)
        .skip(IOException.class)
        .build();
If a chunk fails, Spring Batch retries the chunk, but one item at a time (in order to find out which item caused the failure). In my case, though, no single item caused the failure: the entire chunk succeeds or fails as a unit and should be retried as a chunk. In fact, dropping to single-item mode makes the remote service very angry and it refuses to accept the data. (We do not control the remote service.)
What's my best way out of this? I was trying to see if I could disable single-item retry mode, but I don't even fully understand where this happens. Is there a custom SkipPolicy or something that I can implement? (the methods there didn't look that helpful)
Or is there some way to have the item reader read the 1000 records but pass it to the writer as a List (1000 input items => one output item)?
Let me walk through this in two parts. First I'll explain why it works the way it does, then I'll propose an option for addressing your issue.
Why Is Retry Item By Item
In your configuration, you've specified that it be fault tolerant. With that, when an exception is thrown in the ItemWriter, we don't know which item caused it so we don't have a way to skip/retry it. That's why, when we do begin the skip/retry logic, we go item by item.
How To Handle Retry By The Chunk
What this comes down to is you need to get to a chunk size of 1 in order for this to work. What that means is that instead of relying on Spring Batch for iterating over the items within a chunk for the ItemProcessor, you'll have to do it yourself. So your ItemReader would return a List<DataRecord> and your ItemProcessor would loop over that list. Your ItemWriter would take a List<List<DataRecord>>. I'd recommend creating a decorator for an ItemWriter that unwraps the outer list before passing it to the main ItemWriter.
This does remove the ability to do true skipping of a single item within that list but it sounds like that's ok for your use case.
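The unwrapping decorator suggested above might look roughly like the following. It is a sketch under assumptions, not the framework's own class: BatchWriter is a hypothetical stand-in modeled on the list-based ItemWriter.write method so that the example compiles without Spring Batch on the classpath (the real method also declares throws Exception), and ListUnwrappingWriter is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for Spring Batch's ItemWriter so the sketch compiles alone.
 * Simplified: the real write method also declares "throws Exception".
 */
interface BatchWriter<T> {
    void write(List<? extends T> items);
}

/**
 * Decorator for a step whose items are themselves lists: it flattens the
 * chunk and hands all records to the delegate in a single write call.
 */
class ListUnwrappingWriter<T> implements BatchWriter<List<T>> {
    private final BatchWriter<T> delegate;

    ListUnwrappingWriter(BatchWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(List<? extends List<T>> chunk) {
        List<T> flattened = new ArrayList<>();
        for (List<T> batch : chunk) {
            flattened.addAll(batch); // unwrap the outer list
        }
        delegate.write(flattened);   // one call, so one HTTP POST in the question's case
    }
}
```

With a chunk size of 1, the outer list holds a single List<DataRecord>, so the delegate still sees all 1000 records at once and a retry re-sends them together.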

Disable Retry When commitInterval = 1

The behavior we would like for the batch processing of our business entities is to roll back the failed transaction and not try again. I have read through the forum and it appears that this is not possible. We have set commitInterval=1 and tried the Never Retry Policy for this special case, but to no avail. I have read that the rationale is that the writer does not know whether the list of items it receives is the initial processing or a subsequent one after a failure.
Have I summarized this correctly and Spring batch does not currently support the behavior we are looking for?
Sounds like a candidate for Skip Logic
https://docs.spring.io/spring-batch/reference/html/configureStep.html
Check out these two sections in particular:
5.1.5 Configuring Skip Logic
5.1.7 Controlling Rollback

Spring Integration JDBC inbound channel adapter - avoiding duplicate reads

I have a Spring Integration jdbc:inbound-channel-adapter which reads from a database. An important requirement is that the same rows are not read twice. One possible approach may be to use the update attribute to set a flag on the rows read using the same where clause as for the query attribute. The concern however would be that if an exception occurs further on in the workflow (transforming the result set using the row mapper, marshalling to XML, and then placing on an outbound queue for an external system), those rows would not be re-read when the application came back up. Is there a better strategy to use in this case with Spring Integration?
Another question would be that, given the above requirement, would Spring Batch offer a more robust solution, and if so, how would this be implemented?
Thanks
Looks like you should use the short TX and channel shift technique:
<int-jdbc:inbound-channel-adapter channel="executorChannel"/>
<int:channel id="executorChannel">
    <int:dispatcher task-executor="executor"/>
</int:channel>
With that, your message is shifted to a different thread, outside of the JDBC TX, and that TX is always committed. So any downstream issues won't affect your rows in the DB: they will be marked as processed and won't be read again.

Getting past Salesforce trigger governors

I'm trying to write an "after update" trigger that does a batch update on all child records of the record that has just been updated. This needs to be able to handle 15k+ child records at a time. Unfortunately, the limit appears to be 100, which is so far below my needs it's not even close to acceptable. I haven't tried splitting the records into batches of 100 each, since this will still put me at a cap of 10k updates per trigger execution. (Maybe I could just daisy-chain triggers together? ugh.)
Does anyone know what series of hoops I can jump through to overcome this limitation?
Edit: I tried calling the following @future function in my trigger, but it never updates the child records:
global class ParentChildBulkUpdater
{
    @future
    public static void UpdateChildDistributors(String parentId) {
        Account[] children = [SELECT Id FROM Account WHERE ParentId = :parentId];
        for (Account child : children)
            child.Site = 'Bulk Updater Fired';
        update children;
    }
}
The best (and easiest) route to take with this problem is to use Batch Apex: you can create a batch class and fire it from the trigger. Like @future, it runs in a separate thread, but it can process up to 50,000,000 records!
You'll need to pass some information to your batch class before calling Database.executeBatch so that it has the list of parent IDs to work with, or you could just get all of the accounts of course ;)
I've only just noticed how old this question is but hopefully this answer will help others.
It's worse than that: you're not even going to be able to get those 15k records in the first place, because there is a 1,000-row query limit within a trigger. (This scales with the number of rows the trigger is being called for, but that probably doesn't help.)
I guess your only way to do it is with the @future annotation; read up on that in the docs. It gives you much higher limits. However, you can only call so many of those in a day, so you may need to somehow keep track of which parent objects have children waiting to be updated, and then process them offline.
A final option may be to use the API via some external tool. But you'll still have to make sure everything in your code is batched up.
I thought these limits were draconian at first, but actually you can do a hell of a lot within them if you batch things correctly; we regularly update thousands of rows from triggers. And from an architectural point of view, with much more than that you're really talking about batch processing anyway, which isn't normally activated by a trigger. One thing's for sure: they make you jump through hoops to do it.
I think Codek is right, going the API / external tool route is a good way to go. The governor limits still apply, but are much less strict with API calls. Salesforce recently revamped their DataLoader tool, so that might be something to look into.
Another thing you could try is using a Workflow rule with an Outbound Message to call a web service on your end. Just send over the parent object and let a process on your end handle the child record updates via the API. One thing to be aware of with outbound messages, it is best to queue up the process on your end somehow, and immediately respond to Salesforce. Otherwise Salesforce will resend the message.
@future doesn't work (does not update records at all)? Weird. Did you try using your function in an automated test? It should work, and the annotation should be ignored (during the test it will be executed instantly; test methods have higher limits). I suggest you investigate this a bit more, as it seems like the best solution to what you want to accomplish.
Also - maybe try to call it from your class, not the trigger?
Daisy-chaining triggers together will not work, I've tried it in the past.
Your last option might be batch Apex (from Winter'10 release so all organisations should have it by now). It's meant for mass data update/validation jobs, things you typically run overnight in normal databases (it can be scheduled). See http://www.salesforce.com/community/winter10/custom-cloud/program-cloud-logic/batch-code.jsp and release notes PDF.
I believe that in version 18 of the API the 1000 limit has been removed (so the documentation says, but in some cases I still hit a limit).
So you may be able to use Batch Apex with a single Apex update statement.
Something like:
List<childObject__c> children = new List<childObject__c>();
for (childObject__c c : [SELECT ....]) {
    c.foo__c = 'bar';
    children.add(c);
}
update children;
Be sure you bulkify your trigger; also see http://sfdc.arrowpointe.com/2008/09/13/bulkifying-a-trigger-an-example/
Maybe a change to your data model is the better option here. Think of creating a formula on the children object where you access the data from the parent. This would be far more efficient probably.