Spring Batch Solution - spring-batch

Our application has around 100,000 customers, and we need to process some data for each of them on a monthly basis. The processing logic for each customer involves around 7 REST calls to different services. We want to do this in Spring Batch for performance.
Steps to process the data:
Read the list of all customers from a web service.
Call 7 different microservices to get the balance, type, fees, date, etc.
Write the result to an S3 bucket.
Please suggest how to design this flow in Spring Batch.

You can create a flow of multiple steps and pass a "serviceName" parameter to each step. Write a custom reader that calls the service identified by serviceName; inside the custom reader you can decide how you want to call the services.
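// Sketch only: serviceNames, Input, Output and the reader/processor/writer beans are placeholders.
List<Step> steps = new ArrayList<>();
for (String serviceName : serviceNames) { // one step per service
    steps.add(createStep(serviceName));
}

private Step createStep(String serviceName) {
    return stepBuilderFactory.get("service call: " + serviceName)
            .<Input, Output>chunk(100) // chunk size is an assumption; tune it
            .reader(yourCustomReader)
            .processor(yourProcessor) // if needed
            .writer(yourCustomCompositeWriter)
            .build();
}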
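A different angle on the same question, offered only as a sketch: since each customer needs all 7 results before anything is written, the calls for one customer could also be issued concurrently inside a single processor and merged into one record. The plain-Java example below has no Spring dependency; `CustomerFanOut`, `fetchAll`, and the service functions are hypothetical names, and the real REST clients would replace the functions passed in.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

// Hypothetical helper: runs every service lookup for one customer concurrently
// and merges the answers into a single record keyed by service name.
class CustomerFanOut {

    // services maps a service name to a function that takes a customer id
    // and returns that service's value (a stand-in for a REST client call).
    static Map<String, String> fetchAll(String customerId,
                                        Map<String, Function<String, String>> services,
                                        ExecutorService pool) {
        // Start all calls first so they run in parallel on the pool.
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, String>> entry : services.entrySet()) {
            Function<String, String> call = entry.getValue();
            pending.put(entry.getKey(),
                    CompletableFuture.supplyAsync(() -> call.apply(customerId), pool));
        }
        // Then wait for each result and collect it under its service name.
        Map<String, String> result = new LinkedHashMap<>();
        pending.forEach((name, future) -> result.put(name, future.join()));
        return result;
    }
}
```

With this shape, the time spent per customer is roughly that of the slowest of the 7 calls rather than their sum; the pool size then bounds how much load the job puts on the downstream services.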

Related

Is this a good approach for api clone (api scraping)?

I am currently working with a Node.js-based pipeline with backpressure that downloads all data from several API endpoints; some of these endpoints return data that is related to data from the others.
The idea is to refactor it into a more maintainable application in order to improve how the data is updated. The current approach is the following.
The first step (map) downloads all data from the endpoints and pushes it into several topics. Some of this data is complex, and data from one endpoint is needed in order to retrieve data from another endpoint.
The second step (reduce) reads the data from all topics and pushes only the data we need into a SQL database.
The questions are:
Is this a good approach to the problem?
Would it be better to use Kafka Streams, with KSQL for the transforms, and a microservice only to publish into the database?
The architecture schema is the following, and real time is not necessary for this data:
Thanks

Google Data Fusion: "Looping" over input data to then execute multiple Restful API calls per input row

I have the following challenge I would like to solve preferably in Google Data Fusion:
I have one web service that returns about 30-50 elements describing an invoice in a JSON payload like this:
{
  "invoice-services": [
    {
      "serviceId": "[some-20-digit-string]",
      // some other stuff omitted
    },
    [...]
  ]
}
For each occurrence of serviceId I then need to call another webservice https://example.com/api/v2/services/{serviceId}/items repeatedly where each serviceId comes from the first call. I am only interested in the data from the second call which is to be persisted into BigQuery. This second service call doesn't support wildcards or any other mechanism to aggregate the items - i.e. if I have 30 serviceId from the first call, I need to call the second webservice 30 times.
I have made the first call work, I have made the second call work with a hard coded serviceId and also the persistence into BigQuery. These calls simply use the Data Fusion HTTP adapter.
However, how can I use the output of the first service in such a way that I issue one webservice call for the second service for each row returned from the first call - effectively looping over all serviceId?
I completely appreciate that this is very easy in Python code, but for maintainability and fit with our environment I would prefer to solve this in Data Fusion or, if need be, in any of the other -as-a-Service offerings from Google.
Any help is really appreciated!
J
PS: This is NOT a big data problem. I am looking at about 50 serviceId and maybe 300 items.

How to add a list of Steps to Job in spring batch

I'm extending an existing Job. What I need to do is update a list of records from the database with data obtained from an external service. I don't know how to do it in a loop, so I thought about creating a list of Steps, each consisting of a reader, processor and writer, and simply adding them with the next() method of the jobBuilder. Looking at the documentation, it's only possible to add one Step at a time, and I have several thousand rows in the database, thus several thousand Steps. How should I do this?
Edit:
In short, I need to:
read a list of ids from db,
for every id I need to call external service to get information relevant to this id,
process data from it
save updated row to db
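The four points above fit a single chunk-oriented step rather than thousands of steps: such a step already iterates over the items from its reader, passes each one through the processor, and hands the results to the writer in groups. The plain-Java sketch below only imitates that loop to show its shape; `ChunkLoop`, `run`, and the functional parameters are hypothetical stand-ins and not Spring Batch API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical imitation of a read-process-write loop with chunked writes.
class ChunkLoop {

    // reader:    yields the ids read from the database
    // processor: calls the external service for one id and builds the updated row
    // writer:    saves one chunk of updated rows
    // Returns the number of chunks handed to the writer.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunks = 0;
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == chunkSize) {
                writer.accept(new ArrayList<>(buffer)); // write a full chunk
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {
            writer.accept(new ArrayList<>(buffer)); // write the final partial chunk
            chunks++;
        }
        return chunks;
    }
}
```

In a real job the iterator would be a database reader over the ids, the function would be the processor that calls the external service, and the consumer would be the writer that updates the rows.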

Inter-microservices Communication using REST & PUB/SUB

This is still a theory in my mind.
I'm rebuilding my backend by splitting things into microservices. The microservices I'm imagining for starting off are:
- Order (stores order details and status of each order)
- Customer (stores customer details, addresses, orders booked)
- Service Provider (stores service provider details, status & location of each service provider, order(s) currently being processed by the service provider, etc.)
- Payment (stores payment info for each order)
- Channel (communicates with customers via email / SMS / mobile push)
I hope to be able to use PUB/SUB to create a message with corresponding data, which can be used by any other microservice subscribing to that message.
First off, I understand the concept that each microservice should have complete code & data isolation (thus, on different instances / VMs); and that all microservices should communicate strictly using HTTP REST API contracts.
My doubts are as follows:
1. To show a list of orders, I'll be using the Order DB to get all orders. In each Order document (I'll be using MongoDB for storage), I'll have a customer_id foreign key. Now comes the issue of resolving customer_name from customer_id.
If I need to show 100 orders on the page and go with the assumption that each order has a unique customer_id associated with it, will I need to do a REST API call 100 times to get the names for all 100 customer_ids?
Or is data replication a good solution for this problem?
2. I am envisioning something like this w.r.t. PUB/SUB: the business center personnel mark an order as assigned and select the service provider to allot to that order. This creates a message on the cross-server PUB/SUB channel.
Then the Channel microservice (which is on a totally different instance / VM) captures this message and sends a push message and SMS to the service provider's device using the data within the message's contents.
Is this possible at all?
UPDATE TO QUESTION 2: I want the Order microservice to be completely independent of any other microservices that will be built upon / side-by-side it. Channel microservice is an example of a microservice that depends upon events taking place within Order microservice.
Also, please guide me as to which technologies / libraries to use.
What I'll be developing on:
Java
MongoDB
Amazon AWS instances for each microservice.
Would appreciate anyone's help on this.
Thanks!
#1
If I need to show 100 orders and each order has a unique customer_id, will I need to do 100 REST API calls?
No, just make 1 request with 100 order_id(s) and return a dictionary of order_id <=> customer_id
#2
It's a single request:
POST /orders/new
{
  "selected_service_provider_id" : "123"
  ...
}
This can return an order_id, which you can print locally for the customer, use to track progress, or what have you.
On the server side, you receive an order and process it. Processing can include sending an SMS at some stage. This functionality can be implemented inside the original service that received the request or as a separate call to another dedicated service.
To your first question, you don't need to do 100 queries, just one with the array of your 100 documents, like the following:
db.collection.find( { _id : { $in : [1,2,3,4] } } );
https://stackoverflow.com/a/7713461/1384539
I know this question is 1 year old, but I would like to add my answer to the first point.
One option would be to use some form of CQRS and store on the OrderDB also some of the user details when creating an order. This way when you have to show the list of orders you already have all the details you need. Also, the order document would represent a photograph of the user state at the moment of the order creation.
Of course, in case you don't have the user details when storing the order, you just need to make a GET call to the User Service, but that would be 1 call, not 100.
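The "one call, not 100" idea from the answers above can be sketched in plain Java: collect the distinct customer ids from the page of orders, resolve them with a single bulk lookup, and join the names back onto the orders. `OrderNames`, `resolve`, and `bulkLookup` are hypothetical names; `bulkLookup` stands in for whatever single request the Customer service would expose, which is an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: resolves customer names for a page of orders
// with exactly one bulk lookup instead of one lookup per order.
class OrderNames {

    // customerIdByOrderId: order id -> customer id, as stored in the Order DB
    // bulkLookup: takes a set of customer ids, returns customer id -> customer name
    static Map<String, String> resolve(Map<String, String> customerIdByOrderId,
                                       Function<Set<String>, Map<String, String>> bulkLookup) {
        // De-duplicate the ids so repeated customers are requested only once.
        Set<String> customerIds = new LinkedHashSet<>(customerIdByOrderId.values());
        Map<String, String> nameByCustomerId = bulkLookup.apply(customerIds);

        // Join the names back onto the orders, preserving order of appearance.
        Map<String, String> nameByOrderId = new LinkedHashMap<>();
        customerIdByOrderId.forEach((orderId, customerId) ->
                nameByOrderId.put(orderId, nameByCustomerId.get(customerId)));
        return nameByOrderId;
    }
}
```

An order whose customer id is missing from the lookup result maps to null in this sketch; a real page would need to decide how to display that case.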

Spring batch Item reader to iterate over a rest api call

I have a Spring Batch job which needs to fetch details from a REST API call and process that data on my side. My REST API call mainly takes the parameters below:
StartingIdNumber (offset)
PageSize (limit)
PS: StartingIdNumber serves the same purpose as a row number or "offset" in this particular API. The API response results are sorted by IdNumber, so when a StartingIdNumber is specified, the API will in turn perform a "where IdNumber >= StartingIdNumber order by IdNumber limit pageSize" in its DB query.
It returns the given number of user details, so I need to iterate through all the ids by changing the StartingIdNumber parameter for each request.
I have seen the current ItemReader implementations in the Spring Batch framework, which read from a database, XML, etc., but I didn't come across any reader that helps in my case. Please suggest a way to iterate through the user details as specified above.
Note: If I write my own custom item reader, I have to take care of preserving state (the last processed StartingIdNumber), which is proving challenging for me.
Does implementing ItemStream serve my purpose? Or is there a better way?
Implementing the ItemStream interface and writing my own custom reader served my purpose. The reader is now stateful, as I required. Thanks.
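The answer does not show its reader, so the following is only a sketch of how such state-keeping might look, in plain Java without Spring on the classpath. `PagedIdReader`, the `nextStartingId` key, and the `fetchPage` function are hypothetical; `open` and `update` merely mirror the roles of the ItemStream callbacks, with a `Map` standing in for the execution context, and the items are simplified to bare id numbers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical paged reader that remembers where to resume.
class PagedIdReader {
    private static final String KEY = "nextStartingId";

    // (startingId, pageSize) -> ids >= startingId in ascending order, at most pageSize of them
    private final BiFunction<Long, Integer, List<Long>> fetchPage;
    private final int pageSize;
    private final Deque<Long> buffer = new ArrayDeque<>();
    private long nextStartingId = 0;
    private boolean exhausted = false;

    PagedIdReader(BiFunction<Long, Integer, List<Long>> fetchPage, int pageSize) {
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    // Restore the position saved by an earlier run, if there is one.
    void open(Map<String, Object> state) {
        Object saved = state.get(KEY);
        if (saved != null) {
            nextStartingId = (Long) saved;
        }
    }

    // Save the position so a restart resumes after the last id handed out.
    void update(Map<String, Object> state) {
        state.put(KEY, nextStartingId);
    }

    // Returns the next id, or null when the API has no more data.
    Long read() {
        if (buffer.isEmpty() && !exhausted) {
            List<Long> page = fetchPage.apply(nextStartingId, pageSize);
            buffer.addAll(page);
            exhausted = page.size() < pageSize; // a short page means the end was reached
        }
        Long id = buffer.poll();
        if (id != null) {
            nextStartingId = id + 1; // the next request starts just past this id
        }
        return id;
    }
}
```

Because the saved value is the id after the last one returned, a restarted reader requests its first page from exactly that point, which matches the ">= StartingIdNumber" behaviour described in the question.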