I am working on a project where I read items from database A with a chunk size of 2000. In the processor I need to look up information in database B. To save time, I would like to make a single call to database B for all 2000 items in the chunk, but I can't do that because the reader and processor work on one item at a time: one item in, one item out.
Is there a way to process a list of items instead of a single item?
The item processor processes items one by one. The first callback that hands you the whole list of items is ItemWriteListener#beforeWrite(List items), in which you can make a single call for all items of the chunk before they are written.
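For illustration, here is a minimal sketch of such a listener using the List-based callback of Spring Batch 4.x. The item type OrderLine, its getId()/enrichWith() methods, and the DatabaseBClient/Enrichment types are hypothetical names, not taken from the question:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.batch.core.ItemWriteListener;

public class ChunkEnrichmentListener implements ItemWriteListener<OrderLine> {

    private final DatabaseBClient databaseBClient; // hypothetical client for database B

    public ChunkEnrichmentListener(DatabaseBClient databaseBClient) {
        this.databaseBClient = databaseBClient;
    }

    @Override
    public void beforeWrite(List<? extends OrderLine> items) {
        // One bulk call for the whole chunk instead of 2000 single lookups.
        List<Long> ids = items.stream().map(OrderLine::getId).collect(Collectors.toList());
        Map<Long, Enrichment> extra = databaseBClient.findByIds(ids);
        items.forEach(item -> item.enrichWith(extra.get(item.getId())));
    }

    @Override
    public void afterWrite(List<? extends OrderLine> items) {
        // nothing to do
    }

    @Override
    public void onWriteError(Exception exception, List<? extends OrderLine> items) {
        // nothing to do
    }
}

The listener would be registered on the step via the step builder's listener(...) method, so it runs once per chunk, right before the write.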
I want to have a periodic Celery task that does the following:
1. Get the list of receipt items that have not yet been scraped from the database.
2. Call a task to scrape a specific website for every item on the previous list. Each task will return a list of products.
3. Save the products to the database and update the receipts so they won't be picked up on the next run of the periodic task.
I think I could accomplish that by using chain, chunks and group, but I don't know how to do it. My idea was to chain the list from item 1 as input to item 2, but I would like every item on the list to be a separate task. It seems that "chunks" is a good candidate, but how can I pipe the list from item 1 to it? And how can I use the length of this list to define the number of tasks? Is it possible?
Then the returned values of each of those tasks would be piped to a "group" composed of the tasks that save the products to the db and update the receipts.
I have a reorderable list where, on tap, I read further information for each row from a JSON file. Each row displays the corresponding JSON file's name. I read these files from a local folder on the user's device. The list shown here lets the user reorder the items. The problem is that I want the reordering to be persistent, by which I mean the app should remember the order the user chose the next time it is launched. I cannot think of any way to go about this. Do I store a local JSON file keeping all the file names and the corresponding row index? What would be the best practice for this? The list is expected to have somewhere between 50 and 200 rows, so I need a scalable solution.
I'm looking at creating a table that could potentially be loaded with 100s of rows. I was hoping to use the growing option that the tables provide. I have a few questions.
If I have an aggregation which is a total of all the rows in a column, will it be the total of all rows or only of those that have been loaded? Or can this be controlled with a setting?
Similar to the above, for the select-all feature that ticks all the rows: will this select every row, even the ones not yet loaded, or will it just select the loaded rows? Again, is this just a setting I can change?
This is my first time really using any of the UI5 table controls, and the SAP documentation says the following, which I didn't really understand:
"Show Aggregations
Show aggregations (such as totals) on the table footer (sap.m.Column, aggregation: footer).
Do not show aggregations in “growing” mode. It is not clear, if an aggregation will only aggregate the items loaded into the front end, or all items."
For growing tables, by default all actions and aggregations are only applied to the data that has already been loaded. The passage you quote from SAP means that it is not clear to the end user whether the aggregated value refers to the loaded data or to all data.
If you want to implement something like "Select all" or "Delete all", it would be better to implement it in the backend. From the guidelines of sap.m.List:
In multiple selection mode, users can (de)select all items using the shortcut CTRL+A. This only affects items that have already been loaded to the front-end server. All other items are not (de)selected before they are loaded (for example, items added via lazy loading with growingScrollToLoad). This conflicts with the guideline that all items the user can reach by scrolling must be (de)selected.
To process all items, listen to the selectionChange event and to its flag selectAll. This indicates whether CTRL+A was triggered. As soon as an action is triggered, process the items accordingly. Depending on the number of items, consider processing them in the back end.
I have a job that processes items in chunks (of 1000). The items are marshalled into a single JSON payload and posted to a remote service as a batch (all 1000 in one HTTP POST). Sometimes the remote service bogs down and the connection times out. I set up skip for this:
return steps.get("sendData")
.<DataRecord, DataRecord> chunk(1000)
.reader(reader())
.processor(processor())
.writer(writer())
.faultTolerant()
.skipLimit(10)
.skip(IOException.class)
.build();
If a chunk fails, Spring Batch retries the chunk, but one item at a time (in order to find out which item caused the failure). In my case, however, no single item caused the failure; the entire chunk succeeds or fails as a unit and should be retried as a chunk. (In fact, dropping to single-item mode makes the remote service very angry and it refuses to accept the data. We do not control the remote service.)
What's my best way out of this? I was trying to see if I could disable single-item retry mode, but I don't even fully understand where this happens. Is there a custom SkipPolicy or something that I can implement? (the methods there didn't look that helpful)
Or is there some way to have the item reader read the 1000 records but pass them to the writer as a single List (1000 input items => one output item)?
Let me walk through this in two parts. First I'll explain why it works the way it does, then I'll propose an option for addressing your issue.
Why the Retry Is Item by Item
In your configuration, you've specified that the step be fault tolerant. With that, when an exception is thrown in the ItemWriter, we don't know which item caused it, so we don't have a way to skip/retry just that item. That's why, when we do begin the skip/retry logic, we go item by item.
How to Handle Retry by the Chunk
What this comes down to is you need to get to a chunk size of 1 in order for this to work. What that means is that instead of relying on Spring Batch for iterating over the items within a chunk for the ItemProcessor, you'll have to do it yourself. So your ItemReader would return a List<DataRecord> and your ItemProcessor would loop over that list. Your ItemWriter would take a List<List<DataRecord>>. I'd recommend creating a decorator for an ItemWriter that unwraps the outer list before passing it to the main ItemWriter.
This does remove the ability to do true skipping of a single item within that list but it sounds like that's ok for your use case.
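As a rough sketch of that shape (Spring Batch 4.x List-based interfaces; DataRecord and the delegate reader/writer are placeholders from the question, everything else is assumed):

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

// Reader decorator: collects up to 1000 records from the delegate and returns them as one item.
public class BatchingItemReader implements ItemReader<List<DataRecord>> {

    private static final int BATCH_SIZE = 1000;
    private final ItemReader<DataRecord> delegate;

    public BatchingItemReader(ItemReader<DataRecord> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<DataRecord> read() throws Exception {
        List<DataRecord> batch = new ArrayList<>();
        DataRecord record;
        while (batch.size() < BATCH_SIZE && (record = delegate.read()) != null) {
            batch.add(record);
        }
        return batch.isEmpty() ? null : batch; // null ends the step as usual
    }
}

// Writer decorator: unwraps the outer list and hands the flat list to the real writer.
public class ListUnpackingItemWriter implements ItemWriter<List<DataRecord>> {

    private final ItemWriter<DataRecord> delegate;

    public ListUnpackingItemWriter(ItemWriter<DataRecord> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(List<? extends List<DataRecord>> items) throws Exception {
        List<DataRecord> flattened = new ArrayList<>();
        items.forEach(flattened::addAll);
        delegate.write(flattened); // one HTTP POST per chunk, skipped/retried as a whole
    }
}

The ItemProcessor would then take a List<DataRecord> and loop over it itself, and the step would be declared with .<List<DataRecord>, List<DataRecord>>chunk(1), so a timeout on the POST skips or retries the whole batch rather than individual records.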
I'm using Spring Batch and I want to write a job where I have a JPA reader that selects paginated sets of products from the database. Then I have a processor that performs some operation on every single product (say on product A), but while performing this operation on product A the item processor also processes some other products (like product B, product C, etc.). Later the reader hands product B to the processor because it is next in line, but it has already been processed, so it's actually a waste of time/resources to process it again. How should one tackle this - is there a modification-aware item reader in Spring Batch? One solution would be to check in the item processor whether the product has already been processed, and only process it if it hasn't been. However, checking whether the product has been processed is itself very resource-consuming.
There are two approaches here that I'd consider:
Adjust what you call an "item" - An item is what is returned from the reader. Depending on the design of things, you may want to build a more complex reader that can include the dependent items and therefore only loop through them once. Obviously this is very dependent upon your specific use case.
Use the Process Indicator pattern - The process indicator pattern is what this is for. As you process items, set a flag in the db indicating that they have been processed. Your reader's query is then configured to only read the items that have not yet been processed (filtering out those that were already updated during the processing phase). A rough sketch of the pattern follows below.
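A minimal sketch of what that could look like, assuming a product table with a processed flag and a Product class with matching properties (all of these names are hypothetical, not from the question):

import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

@Bean
public JdbcCursorItemReader<Product> unprocessedProductReader(DataSource dataSource) {
    // Only rows that have not been flagged yet are handed to the processor.
    return new JdbcCursorItemReaderBuilder<Product>()
            .name("unprocessedProductReader")
            .dataSource(dataSource)
            .sql("SELECT id, name FROM product WHERE processed = false")
            .rowMapper(new BeanPropertyRowMapper<>(Product.class))
            .build();
}

@Bean
public JdbcBatchItemWriter<Product> markProcessedWriter(DataSource dataSource) {
    // Flips the indicator so the same product is not picked up again on the next chunk/run.
    return new JdbcBatchItemWriterBuilder<Product>()
            .dataSource(dataSource)
            .sql("UPDATE product SET processed = true WHERE id = :id")
            .beanMapped()
            .build();
}

Any dependent products (B, C, ...) that get handled as a side effect of product A would have their flag updated as well, so the reader simply never returns them. If the real business writer does something else with the products, the flag update can be combined with it via a CompositeItemWriter.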