Inter-microservices Communication using REST & PUB/SUB

This is still a theory in my mind.
I'm rebuilding my backend by splitting things into microservices. The microservices I'm imagining for starting off are:
- Order (stores order details and status of each order)
- Customer (stores customer details, addresses, orders booked)
- Service Provider (stores service provider details, status & location of each service provider, order(s) currently being processed by the service provider, etc.)
- Payment (stores payment info for each order)
- Channel (communicates with customers via email / SMS / mobile push)
I hope to be able to use PUB/SUB to create a message with corresponding data, which can be used by any other microservice subscribing to that message.
First off, I understand the concept that each microservice should have complete code & data isolation (thus, on different instances / VMs); and that all microservices should communicate strictly using HTTP REST API contracts.
My doubts are as follows:
To show a list of orders, I'll query the Order DB for all orders. Each Order document (I'll be using MongoDB for storage) will have a customer_id foreign key. This raises the issue of resolving customer_name from customer_id.
If I need to show 100 orders on the page, and I assume each order has a unique customer_id associated with it, will I need to make 100 REST API calls to get the names for all 100 customer_ids?
Or, is data replication a good solution for this problem?
I am envisioning something like this w.r.t. PUB/SUB: The business center personnel mark an order as assigned & select the service provider to allot to that order. This creates a message on the cross-server PUB/SUB channel.
Then, the Channel microservice (which is on a totally different instance / VM) captures this message & sends a Push message & SMS to the service provider's device using the data within the message's contents.
Is this possible at all?
UPDATE TO QUESTION 2: I want the Order microservice to be completely independent of any other microservices that will be built upon / side-by-side it. Channel microservice is an example of a microservice that depends upon events taking place within Order microservice.
Also, please suggest which technologies / libraries to use.
What I'll be developing on:
Java
MongoDB
Amazon AWS instances for each microservice.
Would appreciate anyone's help on this.
Thanks!

#1
If I need to show 100 orders and each order has a unique customer_id, will I need to do 100 REST API calls?
No, just make one request with the 100 order_id(s) and return a dictionary of order_id <=> customer_id.
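A minimal sketch of that batching idea, applied to the customer-name lookup (Spring Boot is assumed; the /customers/names route and the CustomerNames abstraction are hypothetical):

import java.util.List;
import java.util.Map;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CustomerLookupController {

    /** Hypothetical lookup abstraction over the Customer DB. */
    public interface CustomerNames {
        Map<String, String> namesByIds(List<String> customerIds);
    }

    private final CustomerNames customerNames;

    public CustomerLookupController(CustomerNames customerNames) {
        this.customerNames = customerNames;
    }

    // One POST carrying all the ids, one customer_id -> customer_name map back.
    @PostMapping("/customers/names")
    public Map<String, String> lookup(@RequestBody List<String> customerIds) {
        return customerNames.namesByIds(customerIds);
    }
}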
#2
It's a single request
POST /orders/new
{
    "selected_service_provider_id" : "123",
    ...
}
This can return an order_id, which you can print locally for the customer, use to track progress, or what have you.
On the server side, you receive an order and process it. Processing can include sending an SMS at some stage. This functionality can be implemented inside original service that received this request or as a separate call to another dedicated service.
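If the SMS/push step lives in a separate Channel service, the pub/sub leg could look like this sketch using AWS SNS (just one option on AWS; the topic ARN and payload shape are made up for illustration):

import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.model.PublishRequest;

public class OrderAssignedPublisher {

    // Hypothetical topic; the Channel service would subscribe to it
    // (e.g. via an SQS queue) and send the SMS / push from the payload.
    private static final String TOPIC_ARN =
            "arn:aws:sns:us-east-1:123456789012:order-assigned";

    private final AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();

    // Called by the Order service once the assignment is persisted.
    public void publish(String orderId, String serviceProviderId) {
        String message = String.format(
                "{\"orderId\":\"%s\",\"serviceProviderId\":\"%s\"}",
                orderId, serviceProviderId);
        sns.publish(new PublishRequest(TOPIC_ARN, message));
    }
}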

To your first question: you don't need 100 queries, just one query with the array of your 100 ids, like the following:
db.collection.find( { _id : { $in : [1,2,3,4] } } );
https://stackoverflow.com/a/7713461/1384539
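Since the stack in the question is Java, the equivalent query with the MongoDB Java driver would look roughly like this (connection string, database, and collection names are placeholders):

import static com.mongodb.client.model.Filters.in;

import java.util.Arrays;
import java.util.List;

import org.bson.Document;

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

public class InQueryExample {
    public static void main(String[] args) {
        MongoCollection<Document> orders = MongoClients.create("mongodb://localhost")
                .getDatabase("shop").getCollection("orders");

        // Same $in query as the shell version above.
        List<Integer> ids = Arrays.asList(1, 2, 3, 4);
        for (Document doc : orders.find(in("_id", ids))) {
            System.out.println(doc.toJson());
        }
    }
}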

I know this question is 1 year old, but I would like to add my answer to the first point.
One option would be to use some form of CQRS and also store some of the customer details in the Order DB when creating an order. That way, when you have to show the list of orders, you already have all the details you need. The order document then represents a snapshot of the customer's state at the moment the order was created.
Of course, in case you don't have the user details when storing the order, you just need to make a GET call to the User Service, but that would be 1 call, not 100.
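A sketch of what such a denormalized order document could look like when written with the MongoDB Java driver (all names and values are illustrative):

import org.bson.Document;

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

public class DenormalizedOrderExample {
    public static void main(String[] args) {
        MongoCollection<Document> orders = MongoClients.create("mongodb://localhost")
                .getDatabase("shop").getCollection("orders");

        // Customer fields are copied in when the order is created, so
        // listing orders never needs a call to the Customer service.
        Document order = new Document("_id", "order-1001")
                .append("status", "NEW")
                .append("customer", new Document("id", "cust-42")
                        .append("name", "Jane Doe")); // snapshot at order time
        orders.insertOne(order);
    }
}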

asynchronous bulk data validations service - GET or POST?

Here is a different GET-or-POST dilemma. I am working on a web application built on a Spring Boot microservice architecture, where I need to validate and update some bulk data from an Excel sheet.
There can be 500-1000 records in the Excel sheet, across 6 different columns, for bulk processing. Once the UI submits the Excel sheet to the server, the whole process is asynchronous from then on. There are microservice-to-microservice calls for which I can't decide between GET and POST.
Here is the problem: I have 4 microservices (let's say orchestra-service, A-service, B-service and C-service).
OrchestraService creates a DTO list from the Excel sheet, which is used in the subsequent calls. Orchestra calls 'A'. 'A' validates the data against its DB, marks the success and failure records in the DTO list object, and returns the list back to orchestra. Orchestra then calls 'B', which does a similar job to 'A' and returns the list back.
Now orchestra calls 'C', which updates the success records in the database, updates the file status in the database, and also creates a new resultant Excel sheet with per-row error messages, to be emailed to the user later (a small report of sorts).
In the above microservice-to-microservice calls, only C updates the database and creates a resource on the server. I used the POST method for all the calls because I need the request body to pass my input list to each service.
According to HTTP standards, am I doing this right?
https://www.rfc-editor.org/rfc/rfc7231#section-4.3.3
Per the RFC, "providing a block of data, such as the fields entered into an HTML form, to a data-handling process" should be a POST call.
Please advise me whether:
I should use POST only for 'C' and GET for the others, or
it should be POST for all, since the other services also take part in processing the data.
NOTE: services A, B, and C don't all use every column of the Excel sheet; each uses some combination of them. One column holds values 18 characters long, so I think the URL length limit could be a problem for GET in a bulk operation.
HTTP Protocol
There is no actual violation in passing information via GET, and as long as the request doesn't mutate anything between identical requests, it's fine.
Microservice-wise
Now, for clarification: are Service A and Service B actually needed?
Aren't they part of the same domain as Service C, and couldn't they reside inside it?
It's more than good practice for a microservice to validate its own domain and return a collection of successes and failures with the relevant messages.
I had a similar question a few years back, and here is a possible solution for the first part of your question.
As mentioned by @Oreal Eraki in his answer, I would also question whether you need services A and B. If it's just validation and data transformation, it can be done in the same domain where the data is actually stored.
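For what it's worth, a rough Spring Boot sketch of the POST shape being discussed (the /validations route, DTO fields, and the DB check are placeholders):

import java.util.List;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/validations")
public class ValidationController {

    /** Hypothetical row DTO; the real one would carry the six columns. */
    public static class RecordDto {
        public String value;
        public boolean valid;
    }

    // POST, because the 500-1000 rows travel in the request body.
    @PostMapping
    public List<RecordDto> validate(@RequestBody List<RecordDto> records) {
        for (RecordDto record : records) {
            record.valid = isKnownToDb(record); // mark success / failure
        }
        return records; // the orchestrator gets the annotated list back
    }

    private boolean isKnownToDb(RecordDto record) {
        return record.value != null; // stand-in for the real DB check
    }
}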

How to merge/consolidate responses from multiple RESTful microservices?

Let's say there are two (or more) RESTful microservices serving JSON. Service (A) stores user information (name, login, password, etc) and service (B) stores messages to/from that user (e.g. sender_id, subject, body, rcpt_ids).
Service (A) on /profile/{user_id} may respond with:
{id: 1, name:'Bob'}
{id: 2, name:'Alice'}
{id: 3, name:'Sue'}
and so on
Service (B) responding at /user/{user_id}/messages returns a list of messages destined for that {user_id} like so:
{id: 1, subj:'Hey', body:'Lorem ipsum', sender_id: 2, rcpt_ids: [1,3]},
{id: 2, subj:'Test', body:'blah blah', sender_id: 3, rcpt_ids: [1]}
How does the client application consuming these services handle putting the message listing together such that names are shown instead of sender/rcpt ids?
Method 1: Pull the list of messages, then pull profile info for each id listed in sender_id and rcpt_ids? That may require hundreds of requests and could take a while. Rather naive and inefficient, and it may not scale with complex apps.
Method 2: Pull the list of messages, extract all user ids, and make a bulk request for all the relevant users separately... this assumes such a service endpoint exists. There is still a delay between getting the message listing, extracting the user ids, sending the request for bulk user info, and awaiting the bulk user info response.
Ideally I want to serve out a complete response set in one go (messages and user info). My research brings me to merging responses at the service layer... a.k.a. Method 3: the API Gateway technique.
But how does one even implement this?
I can obtain list of messages, extract user ids, make a call behind the scenes and obtain users data, merge result sets, then serve this final result up... This works ok with 2 services behind the scenes... But what if the message listing depends on more services... What if I needed to query multiple services behind the scenes, further parse responses of these, query more services based on secondary (tertiary?) results, and then finally merge... where does this madness stop? How does this affect response times?
And I've now effectively created another "client" that combines all the microservice responses into one mega-response... which is no different than Method 1 above, except at the server level.
Is that how it's done in the "real world"? Any insights? Are there any open source projects built on such an API Gateway architecture that I could examine?
The solution we used for this problem was denormalization of data, with events for updating.
Basically, a microservice keeps the subset of data it requires from other microservices ahead of time, so that it doesn't have to call them at run time. This data is managed through events: when another microservice is updated, it fires an event with the id as context, which can be consumed by any microservice that has an interest in it. This way the data stays in sync (of course, it requires some form of failure handling for events). It seems like a lot of work, but it helps with any future decisions regarding consolidation of data from different microservices, and our microservice always has all the data available locally to process any request without a synchronous dependency on other services.
In your case, i.e. showing names with a message, you can keep an extra property for names in Service (B). Whenever a name is updated in Service (A), it fires an update event with the id of the updated name; Service (B) consumes the event, fetches the relevant data from Service (A), and updates its own database. This way, even if Service (A) is down, Service (B) keeps functioning, albeit with some stale data, which becomes consistent again once Service (A) comes back up, and you always have some name to show on the UI.
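A minimal sketch of that consumer on Service (B)'s side (the interfaces and event shape are invented, and the broker wiring is omitted):

/** Handles "user name updated" events published by Service (A). */
public class UserNameUpdatedHandler {

    /** Service (B)'s own storage for its denormalized copy of names. */
    public interface LocalNameStore {
        void updateUserName(long userId, String name);
    }

    /** Thin client for Service (A)'s /profile/{user_id} endpoint. */
    public interface UserServiceClient {
        String fetchName(long userId);
    }

    private final LocalNameStore store;
    private final UserServiceClient userService;

    public UserNameUpdatedHandler(LocalNameStore store, UserServiceClient userService) {
        this.store = store;
        this.userService = userService;
    }

    // Invoked by whatever broker Service (B) subscribes with; the event
    // carries only the id, so fetch the fresh name and cache it locally.
    public void onUserNameUpdated(long userId) {
        store.updateUserName(userId, userService.fetchName(userId));
    }
}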
https://enterprisecraftsmanship.com/2017/07/05/how-to-request-information-from-multiple-microservices/
You might want to perform response aggregation on your API gateway. I've written an article on how to do this with ASP.NET Core and Ocelot, but there should be a counterpart for other API gateway technologies:
https://www.pogsdotnet.com/2018/09/api-gateway-response-aggregation-with.html
You need to write another service, called an aggregator, which internally calls both services, gets their responses, merges/filters them, and returns the desired result. This can be achieved in a non-blocking way using Mono/Flux in Spring Reactive.
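A minimal aggregator sketch along those lines, using Project Reactor and WebClient (service URLs and the merged JSON shape are placeholders):

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class ProfileAggregator {

    private final WebClient users = WebClient.create("http://user-service");
    private final WebClient messages = WebClient.create("http://message-service");

    public Mono<String> aggregate(long userId) {
        Mono<String> profile = users.get()
                .uri("/profile/{id}", userId)
                .retrieve().bodyToMono(String.class);
        Mono<String> inbox = messages.get()
                .uri("/user/{id}/messages", userId)
                .retrieve().bodyToMono(String.class);
        // Both calls run concurrently; zip merges them once both complete.
        return Mono.zip(profile, inbox,
                (p, m) -> "{\"profile\":" + p + ",\"messages\":" + m + "}");
    }
}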
An API Gateway often does API composition.
This is a typical engineering problem when you have microservices implementing the database-per-service pattern.
The API Composition and Command Query Responsibility Segregation (CQRS) patterns are useful ways to implement such queries.
Ideally I want to serve out a complete response set in one go (messages and user info).
The problem you've described is one Facebook recognized years ago, and they decided to tackle it by creating an open-source specification called GraphQL.
But how does one even implement this?
It is already implemented in various popular programming languages, so you can give it a try in the language of your choice.

RESTful API with relationships

Let's assume I'm building an API for a restaurant, and I have the following resources:
Donut(has_chocolate,has_sprinkles)
and
Receipt(cost,donut_id)
My Web App will want to display a table of receipts for the manager to view. Unfortunately the receipt object by itself isn't useful enough, the manager needs to see whether the donut has chocolate or not.
How best do I do this? I can think of 3 implementations:
1) Do a JOIN and return the receipt resource with the has_chocolate additional field
2) Do a JOIN and return the receipt resource with a donut object containing ALL relevant donut information
3) Pull in a page of receipt objects, collect and de-duplicate the donut_ids, and use them to pull in the required donut objects - either one at a time /donut/id, or all at once /donuts?ids=id1,id2,id3
A RESTful API should have endpoints that resolve to a resource or a collection of resources. You can then apply an HTTP verb to that endpoint, in this case a GET. Therefore you need to decide: how do you define your resources for the API user?
Are there two resources, donuts and receipts, like in your SQL?
Do you want to define a donut as a resource that has a receipt as one of the fields?
Either is fine; the problem is when you start making routes that are a resource "with something extra on top". That becomes hard for the consumer to understand and isn't RESTful.
If it were me I would choose option 3.
Define two separate endpoints for collections of donuts and receipts: /v0/donuts/ and /v0/receipts/
Allow the consumer to join on the client side by exposing the necessary filters: /v0/donuts/?ids=1,2,3,8
You could also make a /v0/donuts/<id>/ route, but seeing as the use case is always to fetch a batch of donuts, that would result in too many round trips for the consumer.
It also sounds like you want to paginate these collections. In that case you should define a max_page_size and a default_page_size, and return the client a next_page field in the response (null if it's the last page). You would also have to decide what to paginate on, probably the id in this case.
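One hedged sketch of that page envelope in Java (the field names are invented, and it assumes numeric ids paginated in ascending order):

import java.util.List;
import java.util.function.ToLongFunction;

/** Cursor-style page: next_page is null on the last page. */
public record PageResponse<T>(List<T> items, Long nextPage) {

    // A full page means there may be more rows: hand back the last id
    // as the cursor for the next request; otherwise this was the end.
    public static <T> PageResponse<T> of(List<T> items, int pageSize,
                                         ToLongFunction<T> idOf) {
        Long next = items.size() == pageSize
                ? idOf.applyAsLong(items.get(items.size() - 1))
                : null;
        return new PageResponse<>(items, next);
    }
}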

Is it a bad practice to return an object in a POST via Web Api?

I'm using Web Api and have a scenario where clients are sending a heartbeat notification every n seconds. There is a heartbeat object which is sent in a POST rather than a PUT, because as I see it they are creating a new heartbeat rather than updating an existing heartbeat.
Additionally, the clients have a requirement that calls for them to retrieve all of the other currently online clients and the number of unread messages that individual client has. It seems to me that I have two options:
Perform the POST followed by a GET, which to me seems cleaner from a pure REST standpoint. I am doing a creation and a retrieval and I think the SOLID principles would prefer to split them accordingly. However, this approach means two round trips.
Have the POST return an object which contains the same information that the GET would otherwise have done. This consolidates everything into a single request, but I'm concerned that this approach would be considered ill-advised. It's not a pure POST.
Option #2 stubbed out looks like this:
public HeartbeatEcho Post(Heartbeat heartbeat)
{
}
HeartbeatEcho is a class which contains properties for the other online clients and the number of unread messages.
Web Api certainly supports option #2, but just because I can do something doesn't mean I should. Is option #2 an abomination, premature optimization, or pragmatism?
Option 2 is not an abomination at all. A POST request creates a new resource, but it's quite common for the resource itself to be returned to the caller. For example, if your resources are items in a database (e.g., a Person), the POST request would send the members required for the INSERT operation (e.g., name, age, address), and the response would contain a Person object which, in addition to the parameters passed as input, also has an identifier (the DB primary key) that can be used to uniquely identify the object.
Notice that it's also perfectly valid for the POST request to return only the id of the newly created resource; that's a choice you have, depending on the requirements of the client.
public HttpResponseMessage Post(Person p)
{
    // Persist the new person and capture the generated primary key.
    var id = InsertPersonInDBAndReturnId(p);
    p.Id = id;

    // 201 Created, with the created resource echoed back in the body.
    var result = this.Request.CreateResponse(HttpStatusCode.Created, p);
    result.Headers.Location = new Uri("the location for the newly created resource"); // placeholder
    return result;
}
Whichever way solves your business problem will work. You're correct: POST for a new record vs. PUT for an update to an existing record.
SUGGESTION:
One thing you may want to consider is adding Redis to your stack. The apps can POST very fast, and then you could use the Pub/Sub functionality for the echo part, or BLPOP (a blocking list pop that waits until a matching record arrives). It's super fast, may help you scale, and is well suited to what you are trying to do.
See: http://redis.io/topics/pubsub/
See: http://redis.io/commands/blpop
I've used Redis for something similar, but also RabbitMQ, and with RabbitMQ we added a socket.io connection to "stream" the heartbeat in real time without the need for long polling.

How to get a list of aggregates using JOliver's CommonDomain and EventStore?

The repository in the CommonDomain only exposes GetById(). So what do I do if my handler needs, for example, a list of Customers?
On the face of your question: if you needed to perform operations on multiple aggregates, you would just provide the IDs of each aggregate in your command (which the client would obtain from the query side), and then get each aggregate from the repository.
However, looking at one of your comments in response to another answer, I see that what you are actually referring to is set-based validation.
This very question has raised quite a lot of debate about how to do this, and Greg Young has written a blog post on it.
The classic question is: how do I check that a username hasn't already been used when processing my CreateUserCommand? I believe the suggested approach is to assume that the client has already done this check by asking the query side before issuing the command. When the user aggregate is created, the UserCreatedEvent will be raised and handled by the query side. There, the insert query will fail (either because of a check or a unique constraint in the DB), and a compensating command would be issued, which deletes the newly created aggregate and perhaps emails the user to tell them the username is already taken.
The main point is, you assume that the client has done the check. I know this approach is difficult to grasp at first, but it's the nature of eventual consistency.
Also you might want to read this other question which is similar, and contains some wise words from Udi Dahan.
In the classic event sourcing model, queries like get all customers would be carried out by a separate query handler which listens to all events in the domain and builds a query model to satisfy the relevant questions.
If you need to query customers by last name, for instance, you could listen to all customer created and customer name change events and just update one table of last-name to customer-id pairs. You could hold other information relevant to the UI that is showing the data, or you could simply hold IDs and go to the repository for the relevant customers in order to work further with them.
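A bare-bones sketch of such a query-side projection (the event shapes are made up, and a real read model would live in a database and allow several customers per last name):

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class CustomersByLastNameProjection {

    public record CustomerCreated(UUID customerId, String lastName) { }
    public record CustomerNameChanged(UUID customerId, String newLastName) { }

    // The "table of last-name to customer-id pairs" from the answer above.
    private final Map<String, UUID> byLastName = new ConcurrentHashMap<>();

    public void on(CustomerCreated e) {
        byLastName.put(e.lastName(), e.customerId());
    }

    public void on(CustomerNameChanged e) {
        byLastName.values().remove(e.customerId()); // drop the stale entry
        byLastName.put(e.newLastName(), e.customerId());
    }

    public UUID findByLastName(String lastName) {
        return byLastName.get(lastName);
    }
}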
You don't need a list of customers in your handler. Each aggregate MUST be processed in its own transaction. If you want to show this list to the user, just build an appropriate view.
Your command needs to contain the id of the aggregate root it should operate on.
This id will be looked up by the client sending the command, using a view in your read model. That view is populated with data from the events your AR emits.