I have to create a batch job to do financial reconciliation. Right now I have three steps:
Step 1: read an XML file from the third party, convert it into our domain objects, and write them to the DB (Table 1).
Step 2: read a flat file from our transactions datastore and write it to the DB (Table 2).
Step 3: read both Table 1 and Table 2 into an aggregator object, process both lists to find the differences, and set a status code in Table 2.
My problem is with step 3. I can't find a good solution for having my ItemReader read from two SQL tables.
I started with a custom ItemReader like this:
package batch.concilliation.readers;

import java.util.List;

import org.apache.log4j.Logger;
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component("conciliationReader")
public class TransactionReader implements ItemReader<TransactionsAggregator> {

    private final Logger log = Logger.getLogger(TransactionReader.class);

    @Autowired
    private ConciliationContext context;

    @Autowired
    private ServiceSommaireConciliation serviceTransactionThem;

    @Autowired
    private ServiceTransactionVirement serviceTransactionUs;

    @Override
    public TransactionsAggregator read() throws Exception {
        TransactionsAggregator aggregator = new TransactionsAggregator();

        SommaireConciliationVirementInterac sommaire =
                serviceTransactionThem.findByRunNo(context.getRunNo());
        List<TransactionVirement> journalSic =
                serviceTransactionUs.findByTimestamp(sommaire.getBeginDate(), sommaire.getEndDate());

        // Put both lists into the aggregator object.
        aggregator.setListeTransactionThem(sommaire.getPayments());
        aggregator.setListeTransactionsUs(journalSic);

        return aggregator;
    }
}
This reader uses two already-implemented services (DAOs) that read both tables and return domain objects. I take the two lists of transactions, ours and theirs, and put them in an aggregator object. This object would be passed to the ItemProcessor, where I could do my business logic... but this reader starts an infinite loop, since it will never read null.
I read about ItemReaderAdapter, but I still have the same problem of looping over a collection until I get null.
So in summary, I want to read two different tables and get two lists:
List<TransactionThirdParty>
List<TransactionHome>
Then my ItemProcessor would check whether both lists are equal, whether one has more or fewer transactions than the other, etc.
Can any Spring Batch expert suggest something?
The problem here is that your first two steps are chunk-oriented but the third one is not. While the first two may have the usual read-process-write cycle, the third step, while dependent on the first two, is a one-time operation. It is no different from, say, copying a file in the batch domain.
So you should not use the ItemReader approach here, because you do not have an exit criterion (that is why you never get null from the reader: it cannot know when the source is exhausted, since it does not deal with lines or records).
That is where TaskletStep helps:
The Tasklet is a simple interface that has one method, execute, which will be called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure.
So implement your third step as a Tasklet instead of in a chunk-oriented way.
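For illustration, here is a minimal sketch of what that Tasklet could look like, reusing the service and domain names from the question (the comparison logic itself is elided, since that is your business logic):

import java.util.List;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component("conciliationTasklet")
public class ConciliationTasklet implements Tasklet {

    @Autowired
    private ConciliationContext context;

    @Autowired
    private ServiceSommaireConciliation serviceTransactionThem;

    @Autowired
    private ServiceTransactionVirement serviceTransactionUs;

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Load both sides of the reconciliation in one shot.
        SommaireConciliationVirementInterac sommaire =
                serviceTransactionThem.findByRunNo(context.getRunNo());
        List<TransactionVirement> journalSic =
                serviceTransactionUs.findByTimestamp(sommaire.getBeginDate(), sommaire.getEndDate());

        // Compare the two lists, set the status codes, and write them to Table 2
        // (business logic elided).

        // Returning FINISHED makes this a one-shot operation: no exit
        // criterion and no read-process-write loop are needed.
        return RepeatStatus.FINISHED;
    }
}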
I have a REST API that calculates something upon a request and, if the same request is made again, returns the result from a cache, which consists of documents saved in MongoDB. To know whether two requests are the same, I am hashing some relevant fields of the request. But when the same request is made in quick succession, duplicate documents occur in MongoDB, which later results in an IncorrectResultSizeDataAccessException when I try to read them.
To solve it, I tried to synchronize on the hash value in the following controller method (non-relevant parts cut out):
@PostMapping(
        path = "/{myPath}",
        consumes = {MediaType.APPLICATION_JSON_UTF8_VALUE},
        produces = {MediaType.APPLICATION_JSON_UTF8_VALUE})
@Async("asyncExecutor")
public CompletableFuture<ResponseEntity<?>> retrieveAndCache( ... a, b, c, d various request parameters) {
    // perform some validations on the request...

    // hash the relevant request parameters
    int hash = Objects.hash(a, b, c, d);

    synchronized (Integer.toString(hash).intern()) {
        Optional<Result> resultOpt = cacheService.findByHash(hash);
        if (resultOpt.isPresent()) {
            return CompletableFuture.completedFuture(
                    ResponseEntity.status(HttpStatus.OK).body(resultOpt.get().getResult()));
        } else {
            Result result = ... // perform requests to external services and do some calculations...
            cacheService.save(result);
            return CompletableFuture.completedFuture(
                    ResponseEntity.status(HttpStatus.OK).body(result));
        }
    }
}

// cacheService methods
@Transactional
public Optional<Result> findByHash(int hash) {
    return repository.findByHash(hash); // this is the part that throws the error
}
I am sure that no hash collision is occurring; duplicate records appear only when the same request is performed in quick succession. To my understanding, this shouldn't happen as long as only one instance of my Spring Boot application is running. Do you see any reason for it other than multiple instances running in production?
You should check the settings of your MongoDB client.
If one thread calls the cacheService.save(result) method and releases the lock after that method returns, and another thread then calls cacheService.findByHash(hash), it is still possible that it will not find the record you just saved.
It is possible that, for example, the save method returns as soon as the saved object is in the transaction log but it is not fully processed yet. Or the save is processed on the primary node, while the findByHash is executed on a secondary node, where it has not been replicated yet.
You could use WriteConcern.MAJORITY, but I'm not 100% sure if it covers everything.
Even better is to let MongoDB do the locking by using findAndModify with FindAndModifyOptions.upsert(true), and forget about the lock in your Java code.
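As a hedged sketch of that idea using Spring Data MongoDB's MongoTemplate (assuming the Result document stores the hash in a "hash" field and the payload in a "result" field; those names are illustrative, not from your code):

import org.springframework.data.mongodb.core.FindAndModifyOptions;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;

public class CacheService {

    private final MongoTemplate mongoTemplate;

    public CacheService(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    // Atomically insert-or-update the cached result for this hash.
    // With upsert(true), MongoDB creates the document if it does not exist
    // (the equality criteria from the query are copied into the new document),
    // so two concurrent requests with the same hash cannot produce duplicates.
    public Result saveOrUpdate(int hash, Object payload) {
        Query query = new Query(Criteria.where("hash").is(hash));
        Update update = new Update().set("result", payload);
        return mongoTemplate.findAndModify(
                query,
                update,
                FindAndModifyOptions.options().upsert(true).returnNew(true),
                Result.class);
    }
}

With this in place, the synchronized block and the intern()-based lock become unnecessary, because the atomicity is enforced by MongoDB itself.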
I'm extending an existing Job. What I need to do is update a list of records from the database with data obtained from an external service. I don't know how to do it in a loop, so I thought about creating a list of Steps, each consisting of a reader, a processor, and a writer, and simply adding them via the next() method of the JobBuilder. Looking at the documentation, it's only possible to add one Step at a time, and I have several thousand rows in the database, thus several thousand Steps. How should I do this?
edit:
in short, I need to:
read a list of ids from the DB,
for every id, call an external service to get the information relevant to that id,
process the data from it,
save the updated row to the DB (a chunk-oriented sketch follows below).
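For what it's worth, here is a hedged sketch of a single chunk-oriented step doing all of this; the processor is invoked once per id, so thousands of steps are not needed. Record, ExternalClient, fetchDetails, and the bean names are assumptions for illustration, and the StepBuilderFactory style assumes Spring Batch 4:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class UpdateJobConfig {

    @Bean
    public Step updateRecordsStep(StepBuilderFactory steps,
                                  ItemReader<Long> idReader,         // reads the ids from the DB
                                  ItemWriter<Record> recordWriter,   // saves updated rows to the DB
                                  ExternalClient externalClient) {   // hypothetical external service client
        // Called once per id by the framework within the chunk loop.
        ItemProcessor<Long, Record> enrichProcessor =
                id -> externalClient.fetchDetails(id);

        return steps.get("updateRecordsStep")
                .<Long, Record>chunk(100)       // commit every 100 items
                .reader(idReader)
                .processor(enrichProcessor)
                .writer(recordWriter)
                .build();
    }
}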
I have a batch job that reads records from an Azure SQL database. The use case is that records are continuously written to the database, and my Spring Batch job has to run every 5 minutes and read the records that were newly inserted and not yet processed by the previous run. I am not sure whether there is a built-in method in RepositoryItemReader for this, or whether I have to implement a workaround.
@Bean
public RepositoryItemReader<Booking> bookingReader() {
    RepositoryItemReader<Booking> bookingReader = new RepositoryItemReader<>();
    bookingReader.setRepository(bookingRepository);
    bookingReader.setMethodName("findAll");
    bookingReader.setSaveState(true);
    bookingReader.setPageSize(2);
    Map<String, Sort.Direction> sort = new HashMap<String, Sort.Direction>();
    bookingReader.setSort(sort);
    return bookingReader;
}
You need to add a column to your table called STATUS. When a row is inserted, its status should be "NOT PROCESSED". When your ItemReader reads the row, change the status to "IN PROCESS"; when your ItemProcessor and ItemWriter complete their work, change the status to "PROCESSED". This way you can make sure your ItemReader reads only "NOT PROCESSED" rows.
Note: if you are running your batch job with multiple threads via a TaskExecutor, use a synchronized method in your reader to read the "NOT PROCESSED" records and change their status to "IN PROCESS". This way you can make sure that multiple threads will not fetch the same data.
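A hedged sketch of that reader configuration (it assumes a status column on the Booking entity and a derived finder findByStatus(String status, Pageable page) on the repository; both are assumptions, not part of the original code):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.item.data.RepositoryItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.data.domain.Sort;

@Bean
public RepositoryItemReader<Booking> unprocessedBookingReader(BookingRepository bookingRepository) {
    Map<String, Sort.Direction> sort = new HashMap<>();
    sort.put("id", Sort.Direction.ASC); // a deterministic order is required for stable paging

    RepositoryItemReader<Booking> reader = new RepositoryItemReader<>();
    reader.setRepository(bookingRepository);
    reader.setMethodName("findByStatus");                // hypothetical finder; a Pageable is appended automatically
    reader.setArguments(Collections.singletonList("NOT PROCESSED")); // only unprocessed rows
    reader.setPageSize(100);
    reader.setSort(sort);
    reader.setSaveState(true);
    return reader;
}

The ItemWriter (or a step listener) would then flip the status to "PROCESSED" once the chunk completes.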
If altering the table is not an option, then another approach is to use the Spring Batch meta-data tables as much as you can.
Before the job completes, simply store a timestamp, or some other indicator, in the job execution context that tells you where to begin on the next job iteration.
This can be an "out of the box" solution.
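A hedged sketch of that approach (the key name "lastProcessed" and the combination of a StepExecutionListener with JobExplorer are assumptions; the meta-data tables themselves are standard Spring Batch):

import java.time.LocalDateTime;
import java.util.Objects;
import java.util.Optional;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.explore.JobExplorer;

public class HighWaterMarkListener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to do before the step
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Remember where this run stopped; Spring Batch persists the job
        // ExecutionContext in the BATCH_JOB_EXECUTION_CONTEXT meta-data table.
        stepExecution.getJobExecution().getExecutionContext()
                .putString("lastProcessed", LocalDateTime.now().toString());
        return ExitStatus.COMPLETED;
    }

    // On the next run, fetch the marker from the most recent completed execution.
    public static Optional<String> lastProcessed(JobExplorer jobExplorer, String jobName) {
        return jobExplorer.getJobInstances(jobName, 0, 1).stream()
                .flatMap(instance -> jobExplorer.getJobExecutions(instance).stream())
                .filter(execution -> execution.getStatus() == BatchStatus.COMPLETED)
                .map(execution -> execution.getExecutionContext().getString("lastProcessed", null))
                .filter(Objects::nonNull)
                .findFirst();
    }
}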
To make things short, I have to make a script in Second Life communicating with an AppEngine app updating records in an ndb database. Records extracted from the database are sent as a batch (a page) to the LSL script, which updates customers, then asks the web app to mark these customers as updated in the database.
To create the batch I use a query on an (integer) property update_ver == 0 and use fetch_page() to produce a cursor to the next batch. This cursor is also sent as a urlsafe()-encoded parameter to the LSL script.
To mark a customer as updated, update_ver is set to some other value, like 2, and the entity is updated via put_async(). Then the LSL script fetches the next batch thanks to the cursor sent earlier.
My rather simple question is: in the web app, since the query property update_ver no longer satisfies the filter, is my cursor still valid? Or do I have to use another strategy?
Stripping out irrelevant parts (including authentication), my code currently looks like this (Customer is the entity in my database).
class GetCustomers(webapp2.RequestHandler):  # handler that sends batches to the update script in SL
    def get(self):
        cursor = self.request.get("next", default_value=None)
        query = Customer.query(
            Customer.update_ver == 0,
            ancestor=customerset_key(),
            projection=[Customer.customer_name, Customer.customer_key]
        ).order(Customer._key)
        if cursor:
            results, cursor, more = query.fetch_page(
                batchsize, start_cursor=ndb.Cursor(urlsafe=cursor))
        else:
            results, cursor, more = query.fetch_page(batchsize)
        if more:
            self.response.write("more=1\n")
            self.response.write("next={}\n".format(cursor.urlsafe()))
        else:
            self.response.write("more=0\n")
        self.response.write("n={}\n".format(len(results)))
        for c in results:
            self.response.write("c={},{},{}\n".format(
                c.customer_key, c.customer_name, c.key.urlsafe()))
        self.response.set_status(200)
The handler that updates Customer entities in the database is the following. The c= parameters are urlsafe()-encoded entity keys of the records to update and the nv= parameter is the new version number for their update_ver property.
class UpdateCustomer(webapp2.RequestHandler):
    @ndb.toplevel  # don't exit until all async operations are finished
    def post(self):
        updatever = int(self.request.get("nv"))  # update_ver is an integer property
        customers = self.request.get_all("c")
        if customers:
            for ckey in customers:
                cust = ndb.Key(urlsafe=ckey).get()
                # the filter in the query used to produce the cursor was using this property!
                cust.update_ver = updatever
                cust.update_date = datetime.datetime.utcnow()
                cust.put_async()
        else:
            self.response.set_status(403)
Will this work as expected? Thanks for any help!
Your strategy will work, and that's the whole point of using these cursors: they are efficient, and you can get the next batch as it was intended regardless of what happened with the previous one.
On a side note, you could also optimise your UpdateCustomer handler: instead of retrieving/saving entities one by one, you can do things in batches using, for example, ndb.put_multi_async.
Following is my use case for spring batch.
Read the input from a web service. The web service will return all records.
Process the records.
Write the processed records one by one.
I'm clear about steps 2 and 3, but I can't figure out how to implement a reader that reads all the records in one go. How do I pass the records one by one to the item processor/writer?
Should I be using a tasklet instead of a reader/writer?
What does your web service return? A collection of objects, I guess!
Your ItemReader needs to loop over this collection, handing items out one by one, and then return null when they are all processed.
As @Kik was saying, the rest is handled by Spring Batch based on your commit-interval. If you have a commit-interval of 10, for example, your reader will read 10 items and pass those 10 items to the ItemProcessor, which then passes them on to the writer.
Hope that clarifies things.
EDIT: 1) In Spring Batch you have more than one option to do what you need.
Easy option: create a custom MyWsItemReader that implements the ItemReader interface (see the sketch below).
- Define a method init() in this class that calls your web service and puts the results in a collection attribute of MyWsItemReader.
- Implement the read() method from the interface (read the contract in the docs carefully: you must return null once you have handed out all the elements of the collection).
- Then configure a StepListener around the step and implement the beforeStep() method to call the init() of your MyWsItemReader. You can autowire the reader into the listener to accomplish this.
Alternatively, your MyWsItemReader could also implement InitializingBean; then you would have to implement afterPropertiesSet(), where you could call the web service and store the result in a private attribute of MyWsItemReader.
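A hedged sketch of that reader, where WsClient, fetchAllRecords() and Record are placeholders for your actual web service client and record type:

import java.util.Iterator;

import org.springframework.batch.item.ItemReader;

public class MyWsItemReader implements ItemReader<Record> {

    private final WsClient wsClient;      // hypothetical web service client
    private Iterator<Record> records;

    public MyWsItemReader(WsClient wsClient) {
        this.wsClient = wsClient;
    }

    // Call the web service once and cache the results; invoke this from a
    // StepExecutionListener.beforeStep() (or from afterPropertiesSet() if you
    // go the InitializingBean route).
    public void init() {
        this.records = wsClient.fetchAllRecords().iterator();
    }

    @Override
    public Record read() {
        // Per the ItemReader contract, return null once the data is
        // exhausted; that is what ends the step.
        return (records != null && records.hasNext()) ? records.next() : null;
    }
}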
regards