Spring Batch - how to run steps in a loop with values passed from a previous step/job

I want to achieve a use case where I have:
1. A task that calls an API to get some data and stores it in the DB
2. Fetch all the records stored in the DB by step #1, call a second API with each of the IDs separately, and store the results in the DB
3. Fetch all the records stored in the DB by step #2, call a third API with each of the IDs separately, and store the results in the DB
Something like -
1. GET ALL CONTINENTS AND SAVE in DB -
2. FOR EACH CONTINENTs from DB GET ALL COUNTRIES and store in DB -
3. FOR EACH COUNTRY in DB GET ALL STATES and store in DB
Task A is simple - I can create a Job with a Step along with Reader/Processor/Writer.
Task B - I am confused about how to fetch the DB records and then pass them one by one to Step 2's reader.

That requirement does not call for running a step in a loop, since it is not the same logic being repeated. Those are different steps operating on different data sets and doing different things. I see at least three ways to address this:
1. create a single chunk-oriented step following the "driving query pattern"
In this pattern, you would iterate over continent IDs using the item reader, and use an item processor to enrich items with countries and states, then write fully-loaded items to the database. This requires a single-step job.
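To illustrate the data flow only, here is a framework-agnostic sketch in Python (not actual Spring Batch APIs; the API calls and the in-memory "database" are stand-ins). The reader drives over the top-level items, the processor enriches each one with its children, and the writer persists fully-loaded items:

```python
# Framework-agnostic sketch of the "driving query pattern":
# the reader iterates over top-level items, the processor enriches
# each item with its children, and the writer persists the result.
# The API calls and the in-memory "database" below are stand-ins.

def fetch_continents():               # stand-in for API #1
    return [{"id": 1, "name": "Europe"}, {"id": 2, "name": "Asia"}]

def fetch_countries(continent_id):    # stand-in for API #2
    return {1: ["France", "Spain"], 2: ["India", "Japan"]}[continent_id]

def fetch_states(country):            # stand-in for API #3
    return [f"{country}-state-1", f"{country}-state-2"]

db = []  # stand-in for the real database

def reader():
    # driving query: iterate over the top-level items
    for continent in fetch_continents():
        yield continent

def processor(continent):
    # enrich the driving item with its countries and their states
    continent["countries"] = [
        {"name": c, "states": fetch_states(c)} for c in fetch_countries(continent["id"])
    ]
    return continent

def writer(items):
    db.extend(items)  # write fully-loaded items in one chunk

writer([processor(item) for item in reader()])
```

In real Spring Batch, the reader/processor/writer above would be an ItemReader, ItemProcessor, and ItemWriter wired into one chunk-oriented step.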
2. create a job with a sequence of three chunk-oriented steps where steps share data through the job execution context
There are different ways to share data between steps. The typical way to do that is through the job execution context. Each step can add data required by the next step in the job execution context, and the next step can get it from there. In your example, this would be something like:
step 1: read continents, save them to the database, and store their IDs in the EC
step 2: get continent IDs from the EC, fetch/save countries, and store country IDs in the EC
step 3: get country IDs from the EC and fetch/save states
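A minimal sketch of that hand-off, with a plain dict standing in for Spring Batch's job execution context (the stub APIs and names are illustrative, not real Spring Batch classes):

```python
# Sketch of sharing data between steps through the job execution
# context, modeled here as a plain dict. Each step saves its output
# to the "database" and stores the IDs the next step needs in the context.

execution_context = {}
db = {"continents": [], "countries": [], "states": []}

def step1():
    continents = [{"id": 1}, {"id": 2}]                     # stand-in for API #1
    db["continents"].extend(continents)
    execution_context["continent_ids"] = [c["id"] for c in continents]

def step2():
    for cid in execution_context["continent_ids"]:
        countries = [{"id": cid * 10 + n} for n in (1, 2)]  # stand-in for API #2
        db["countries"].extend(countries)
    execution_context["country_ids"] = [c["id"] for c in db["countries"]]

def step3():
    for cid in execution_context["country_ids"]:
        db["states"].append({"country_id": cid})            # stand-in for API #3

for step in (step1, step2, step3):
    step()
```

Keep the context payload small (IDs only, not the full records), since Spring Batch persists the execution context in its metadata tables.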
3. Join data on the database side
A third way to address this requirement is to write a SQL query that joins and grabs the required data, then save the result to the database. This also results in a single-step job.

Related

Spring Batch: reading from a database and being aware of the previous processed id?

I'm trying to set up Spring Batch to move DB records from Oracle to Cassandra daily.
I know I can manually define JPA repository queries based on an additional entity table (like MyBatchProgress, where I store the previously completed ID + date or something like that), so that the next batch job knows which entity to start with for further operations.
My question is: does Spring Batch provide something like this inbuilt (also by utilising Spring Data JPA)?
Or is this something that I have to write manually in the job reader step where I just pick up the last Id stored in my custom "progress" table?
Thanks in advance!
You can store the last ID in the execution context, which is persisted in the metadata tables. With that in place, the code that launches the job can look up the last job execution, take the ID from its context, and pass it as a job parameter to the next job instance.
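A sketch of that launcher logic, with plain Python objects standing in for Spring Batch's JobExplorer, metadata tables, and job parameters (all names here are illustrative):

```python
# Sketch of the launcher: look up the last job execution, read the
# last processed ID from its (persisted) execution context, and pass
# it as a parameter to the next job instance. The list below stands
# in for Spring Batch's metadata tables / JobExplorer.

past_executions = [
    {"execution_context": {"last_id": 1040}},
    {"execution_context": {"last_id": 2573}},   # most recent execution
]

def find_last_execution():
    return past_executions[-1] if past_executions else None

def launch_job(start_after_id):
    # the reader would use this parameter in its query, e.g.
    # "... WHERE id > :start_after_id ORDER BY id"
    return {"start_after_id": start_after_id}

last = find_last_execution()
params = launch_job(last["execution_context"]["last_id"] if last else 0)
```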

How to add a list of Steps to Job in spring batch

I'm extending an existing Job. What I need to do is update a list of records from the database with data obtained from an external service. I don't know how to do it in a loop, so I thought about creating a list of Steps, each consisting of a reader, processor and writer, and simply chaining them with the next() method on the job builder. Looking at the documentation, it's only possible to add one Step at a time, and I have several thousand rows in the database, thus several thousand Steps. How should I do this?
edit:
in short I need to:
read a list of IDs from the DB,
for every ID, call an external service to get information relevant to that ID,
process the data from it,
save the updated row to the DB.
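The loop described above can be sketched, framework-agnostically, as a single chunked read-process-write pass rather than one Step per row (in Python; the external service and the row store are stand-ins, not Spring Batch APIs):

```python
# Framework-agnostic sketch of the read/process/write loop above,
# done as one chunked pass instead of one Step per row.
# The external service and the row store are stand-ins.

rows = {i: {"id": i, "info": None} for i in range(1, 8)}  # stand-in DB

def call_external_service(row_id):    # stand-in external service
    return f"info-for-{row_id}"

def chunks(seq, size):
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

for chunk in chunks(sorted(rows), size=3):
    # read a chunk of IDs, enrich each via the service, write the chunk back
    for row_id in chunk:
        rows[row_id]["info"] = call_external_service(row_id)
```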

MeteorJS - How to send two separate queries of the same collection from server to client

I am trying to send two separate sets of data from the same collection from server to client. Data is being inserted in the collection on a set interval of 30 seconds. One set of data sent to the client must return all documents over the course of the current day on an hourly basis, while the other set of data simply sends the most recent entry in the collection. I have a graph that needs to display hourly data, as well as fields that need to display the most recent record every 30 seconds, however, I cannot seem to decouple these two data sets. The query for the most recent entry seems to always overwrite the query for the hourly data when attempting to access the data on the client. So my question summed up is: How does one send two separate sets of data of the same collection from server to client, and then access these two separate sets independently on the client?
The answer is simple: you cannot!
The server always answers the client with the result set the client asked for. So, if the client needs two separate (different) result sets, then the client must fire two different queries: one requesting the hourly data and one requesting the last (newest) entry.
Use added, changed, removed to modify the results from the two queries so that they are "transformed" into different fields. https://docs.meteor.com/api/pubsub.html#Subscription-added
However, this is probably not your issue. You are almost certainly using the same string as the name argument of both Meteor.publish calls, or you are accidentally Meteor.subscribe-ing to the same Meteor.publish twice.
Make two separate Meteor.publish names, one for the most recent entry and one for the hourly data, and subscribe to each of them separately.

In CQRS/Event Sourcing, which is the best approach for a parent to modify the state of all its children?

Use case: Suppose I have the following aggregates
Root aggregate - CustomerRootAggregate (manages each CustomerAggregate)
Child aggregate of the Root aggregate - CustomerAggregate (there are 10 customers)
Question: How do I send a DisableCustomer command to all 10 CustomerAggregates to update their state to disabled?
customerState.enabled = false
Solutions: Since CQRS does not allow the write side to query the read side to get a list of CustomerAggregate IDs, I thought of the following:
1. CustomerRootAggregate always stores the IDs of all its CustomerAggregates in the database as JSON. When a DisableAllCustomers command is received by CustomerRootAggregate, it fetches the CustomerIds JSON and sends a DisableCustomer command to all the children, where each child restores its state before applying the DisableCustomer command. But this means I will have to maintain the consistency of the CustomerIds JSON record.
2. The client (browser UI) should always send the list of CustomerIds to apply DisableCustomer to. But this will be problematic for a database with thousands of customers.
3. In the REST API layer, check for the DisableAllCustomers command, fetch all the IDs from the read side, and send DisableAllCustomers(ids) with the IDs populated to the write side.
Which is the recommended approach, or is there a better approach?
Root aggregate - CustomerRootAggregate (manages each CustomerAggregate)
Child aggregate of the Root aggregate - CustomerAggregate (there are 10 customers)
For starters, the language "child aggregate" is a bit confusing. If your model includes a "parent" entity that holds a direct reference to a "child" entity, then both of those entities must be part of the same aggregate.
However, you might have a Customer aggregate for each customer, and a CustomerSet aggregate that manages a collection of IDs.
How do I send a DisableCustomer command to all 10 CustomerAggregates to update their state to disabled?
The usual answer is that you run a query to get the set of Customers to be disabled, and then you dispatch a disableCustomer command to each.
So both 3 and 2 are reasonable answers, with the caveat that you need to consider what your requirements are if some of the DisableCustomer commands fail.
Option 2 in particular is seductive, because it clearly articulates that the client (human operator) is describing a task, which the application then translates into commands to be run by the domain model.
Packing "thousands" of customer IDs into the message may be a concern, but for several use cases you can find a way to shrink that down. For instance, if the task is "disable all", then the client can send the application instructions for how to recreate the "all" collection -- i.e., "run this query against this specific version of the collection" describes the list of customers to be disabled unambiguously.
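A minimal sketch of the query-then-dispatch flow in Python (the read model, command handler, and all names here are invented for illustration): the application layer runs a read-side query, then dispatches one DisableCustomer command per ID to the write side:

```python
# Sketch of option 3: the application layer queries the read side for
# the affected customer IDs, then dispatches one DisableCustomer
# command per ID to the write side. All names are illustrative.

read_side = {"customers": [{"id": "c1"}, {"id": "c2"}, {"id": "c3"}]}
write_side = {"c1": {"enabled": True}, "c2": {"enabled": True}, "c3": {"enabled": True}}

def query_customer_ids():
    return [c["id"] for c in read_side["customers"]]

failed = []  # this is where the partial-failure caveat shows up

def dispatch_disable_customer(customer_id):
    try:
        write_side[customer_id]["enabled"] = False
    except KeyError:
        failed.append(customer_id)  # decide: retry, log, or compensate

def disable_all_customers():
    for customer_id in query_customer_ids():
        dispatch_disable_customer(customer_id)

disable_all_customers()
```

The `failed` list makes the caveat concrete: since each DisableCustomer is a separate command, you must decide up front what happens when some of them fail.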
When a DisableAllCustomers command is received by CustomerRootAggregate, it fetches the CustomerIds JSON and sends a DisableCustomer command to all the children, where each child restores its state before applying the DisableCustomer command. But this means I will have to maintain the consistency of the CustomerIds JSON record.
This is close to a right idea, but not quite there. You dispatch a command to the collection aggregate. If it accepts the command, it produces an event that describes the customer ids to be disabled. This domain event is persisted as part of the event stream of the collection aggregate.
Subscribe to these events with an event handler that is responsible for creating a process manager. This process manager is another event sourced state machine. It looks sort of like an aggregate, but it responds to events. When an event is passed to it, it updates its own state, saves those events off in the current transaction, and then schedules commands to each Customer aggregate.
But it's a bunch of extra work to do it that way. Conventional wisdom suggests that you should usually begin by assuming that the process manager approach isn't necessary, and only introduce it if the business demands it. "Premature automation is the root of all evil" or something like that.
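A stripped-down sketch of that event-driven flow (in Python, with all class and event names invented for illustration): the collection aggregate records a single domain event listing the IDs, and a process manager reacts to that event by scheduling one command per customer:

```python
# Sketch of the process-manager approach: the collection aggregate
# accepts DisableAllCustomers and emits a single event listing the IDs;
# a process manager subscribes to that event and schedules one
# DisableCustomer command per ID. All names are illustrative.

event_stream = []        # stand-in for the collection aggregate's event store
scheduled_commands = []  # commands the process manager schedules

class CustomerSet:
    def __init__(self, customer_ids):
        self.customer_ids = list(customer_ids)

    def disable_all(self):
        # the aggregate validates the command, then records a domain event
        event_stream.append({"type": "AllCustomersDisabled",
                             "customer_ids": self.customer_ids})

class DisableAllProcessManager:
    def on_event(self, event):
        if event["type"] == "AllCustomersDisabled":
            # update own state, then schedule one command per customer
            for cid in event["customer_ids"]:
                scheduled_commands.append(("DisableCustomer", cid))

customer_set = CustomerSet(["c1", "c2", "c3"])
customer_set.disable_all()

pm = DisableAllProcessManager()
for event in event_stream:
    pm.on_event(event)
```

Note that only one event is persisted no matter how many customers are affected; the fan-out into individual commands is the process manager's job.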

Updating entities in ndb while paging with cursors

To make things short, I have to write a script in Second Life communicating with an App Engine app that updates records in an ndb database. Records extracted from the database are sent as a batch (a page) to the LSL script, which updates customers, then asks the web app to mark these customers as updated in the database.
To create the batch I use a query on an (integer) property update_ver==0 and use fetch_page() to produce a cursor to the next batch. This cursor is also sent as a urlsafe()-encoded parameter to the LSL script.
To mark a customer as updated, the update_ver is set to some other value like 2, and the entity is updated via put_async(). Then the LSL script fetches the next batch thanks to the cursor sent earlier.
My rather simple question is: in the web app, since the query property update_ver no longer satisfies the filter, is my cursor still valid? Or do I have to use another strategy?
Stripping out irrelevant parts (including authentication), my code currently looks like this (Customer is the entity in my database).
class GetCustomers(webapp2.RequestHandler):  # handler that sends batches to the update script in SL
    def get(self):
        cursor = self.request.get("next", default_value=None)
        query = Customer.query(Customer.update_ver == 0, ancestor=customerset_key(),
                               projection=[Customer.customer_name, Customer.customer_key]).order(Customer._key)
        if cursor:
            results, cursor, more = query.fetch_page(batchsize, start_cursor=ndb.Cursor(urlsafe=cursor))
        else:
            results, cursor, more = query.fetch_page(batchsize)
        if more:
            self.response.write("more=1\n")
            self.response.write("next={}\n".format(cursor.urlsafe()))
        else:
            self.response.write("more=0\n")
        self.response.write("n={}\n".format(len(results)))
        for c in results:
            self.response.write("c={},{},{}\n".format(c.customer_key, c.customer_name, c.key.urlsafe()))
        self.response.set_status(200)
The handler that updates Customer entities in the database is the following. The c= parameters are urlsafe()-encoded entity keys of the records to update and the nv= parameter is the new version number for their update_ver property.
class UpdateCustomer(webapp2.RequestHandler):
    @ndb.toplevel  # don't exit until all async operations are finished
    def post(self):
        updatever = self.request.get("nv")
        customers = self.request.get_all("c")
        if not customers:
            self.response.set_status(403)
            return
        for ckey in customers:
            cust = ndb.Key(urlsafe=ckey).get()
            cust.update_ver = updatever  # filter in the query used to produce the cursor was using this property!
            cust.update_date = datetime.datetime.utcnow()
            cust.put_async()
Will this work as expected? Thanks for any help!
Your strategy will work, and that's the whole point of using these cursors: they are efficient, and you can get the next batch as it was intended regardless of what happened with the previous one.
On a side note, you could also optimise your UpdateCustomer handler: instead of retrieving and saving entities one by one, do things in batches using, for example, ndb.put_multi_async.