How to merge/consolidate responses from multiple RESTful microservices?

Let's say there are two (or more) RESTful microservices serving JSON. Service (A) stores user information (name, login, password, etc) and service (B) stores messages to/from that user (e.g. sender_id, subject, body, rcpt_ids).
Service (A) on /profile/{user_id} may respond with:
{id: 1, name:'Bob'}
{id: 2, name:'Alice'}
{id: 3, name:'Sue'}
and so on
Service (B) responding at /user/{user_id}/messages returns a list of messages destined for that {user_id} like so:
{id: 1, subj:'Hey', body:'Lorem ipsum', sender_id: 2, rcpt_ids: [1,3]},
{id: 2, subj:'Test', body:'blah blah', sender_id: 3, rcpt_ids: [1]}
How does the client application consuming these services handle putting the message listing together such that names are shown instead of sender/rcpt ids?
Method 1: Pull the list of messages, then pull profile info for each id listed in sender_id and rcpt_ids? That may require hundreds of requests and could take a while. It seems rather naive and inefficient, and may not scale with complex apps.
Method 2: Pull the list of messages, extract all user ids, and make a bulk request for all relevant users separately... this assumes such a service endpoint exists. There is still a delay between getting the message listing, extracting the user ids, sending the request for bulk user info, and awaiting the bulk user info response.
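A minimal client-side sketch of Method 2 (assuming .NET 6+ with implicit usings, Service (B)'s /user/{id}/messages as above, and a hypothetical bulk endpoint GET /profiles?ids=1,2,3 on Service (A)):

using System.Net.Http.Json;

public record Profile(int Id, string Name);
public record Message(int Id, string Subj, string Body, int SenderId, int[] RcptIds);

public class MessageClient
{
    private readonly HttpClient _a; // Service (A): profiles
    private readonly HttpClient _b; // Service (B): messages

    public MessageClient(HttpClient a, HttpClient b) { _a = a; _b = b; }

    public async Task<List<(Message Msg, string SenderName)>> GetInboxAsync(int userId)
    {
        // 1. One request for the message listing.
        var messages = await _b.GetFromJsonAsync<List<Message>>($"/user/{userId}/messages")
                       ?? new List<Message>();

        // 2. Collect every user id referenced by the listing.
        var ids = messages.SelectMany(m => m.RcptIds.Append(m.SenderId)).Distinct();

        // 3. One bulk request for all referenced profiles (the hypothetical endpoint).
        var profiles = await _a.GetFromJsonAsync<List<Profile>>(
                           $"/profiles?ids={string.Join(',', ids)}")
                       ?? new List<Profile>();
        var names = profiles.ToDictionary(p => p.Id, p => p.Name);

        // 4. Merge: resolve sender ids to display names.
        return messages.Select(m => (m, names.GetValueOrDefault(m.SenderId, "unknown")))
                       .ToList();
    }
}

That is two round trips regardless of how many users appear in the listing, at the cost of requiring the bulk endpoint on Service (A).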
Ideally I want to serve out a complete response set in one go (messages and user info). My research brings me to merging of responses at service layer... a.k.a. Method 3: API Gateway technique.
But how does one even implement this?
I can obtain the list of messages, extract the user ids, make a call behind the scenes to obtain the users' data, merge the result sets, then serve this final result up... This works OK with 2 services behind the scenes... But what if the message listing depends on more services? What if I needed to query multiple services behind the scenes, further parse their responses, query more services based on those secondary (tertiary?) results, and only then merge... where does this madness stop? How does this affect response times?
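To make the composition concrete, here is a minimal gateway sketch (ASP.NET Core minimal APIs; the /user/{id}/inbox route and the downstream base addresses are assumptions, not anything standard). The key point for response times is that independent downstream calls can be fanned out in parallel, so the latency of one composition level is roughly that of the slowest call, not the sum:

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient("A", c => c.BaseAddress = new Uri("http://service-a"));
builder.Services.AddHttpClient("B", c => c.BaseAddress = new Uri("http://service-b"));
var app = builder.Build();

app.MapGet("/user/{id:int}/inbox", async (int id, IHttpClientFactory factory) =>
{
    var a = factory.CreateClient("A");
    var b = factory.CreateClient("B");

    // Fan out: the two downstream calls are independent, so run them in parallel.
    var profileTask = a.GetFromJsonAsync<Profile>($"/profile/{id}");
    var messagesTask = b.GetFromJsonAsync<List<Message>>($"/user/{id}/messages");
    await Task.WhenAll(profileTask, messagesTask);

    // Merge both result sets into a single response for the client.
    return Results.Ok(new { user = profileTask.Result, messages = messagesTask.Result });
});

app.Run();

public record Profile(int Id, string Name);
public record Message(int Id, string Subj, string Body, int SenderId, int[] RcptIds);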
And I've now effectively created another "client" that combines all microservice responses into one mega-response... which is no different than Method 1 above... except at the server level.
Is that how it's done in the "real world"? Any insights? Are there any open source projects that are built on such API Gateway architecture I could examine?

The solution which we used for such problem was denormalization of data and events for updating.
Basically, a microservice keeps a subset of the data it requires from other microservices beforehand, so that it doesn't have to call them at run time. This data is managed through events: when another microservice is updated, it fires an event with the id as context, which can be consumed by any microservice that has an interest in it. This way the data remains in sync (of course it requires some form of failure handling for events). It seems like a lot of work, but it helps with any future decisions regarding consolidation of data from different microservices. Our microservice always has all the data available locally to process any request, without a synchronous dependency on other services.
In your case, i.e. for showing names with a message, you can keep an extra property for the name in Service (B). Whenever a name is updated in Service (A), it fires an update event with the id of the updated user. Service (B) then consumes the event, fetches the relevant data from Service (A) and updates its own database. This way, even if Service (A) is down, Service (B) will keep functioning, albeit with some stale data that becomes consistent once Service (A) comes back up, and you will always have some name to show on the UI.
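A sketch of what that consumer could look like in Service (B). The event shape, the IMessageStore persistence interface and the wiring to an actual bus (RabbitMQ, Kafka, ...) are all hypothetical:

using System.Net.Http.Json;

public record Profile(int Id, string Name);
public record UserNameUpdated(int UserId); // event published by Service (A)

public interface IMessageStore
{
    // Updates Service (B)'s denormalized copy of a user's name.
    Task UpdateUserNameAsync(int userId, string name);
}

public class UserNameProjection
{
    private readonly HttpClient _serviceA;
    private readonly IMessageStore _store;

    public UserNameProjection(HttpClient serviceA, IMessageStore store)
    {
        _serviceA = serviceA;
        _store = store;
    }

    // Called by the event bus whenever Service (A) fires UserNameUpdated.
    public async Task HandleAsync(UserNameUpdated evt)
    {
        var profile = await _serviceA.GetFromJsonAsync<Profile>($"/profile/{evt.UserId}");
        if (profile is null) return; // a real system would retry or dead-letter here

        await _store.UpdateUserNameAsync(evt.UserId, profile.Name);
    }
}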
https://enterprisecraftsmanship.com/2017/07/05/how-to-request-information-from-multiple-microservices/

You might want to perform response aggregation strategies on your API gateway. I've written an article on how to do this with ASP.NET Core and Ocelot, but there should be counterparts for other API gateway technologies:
https://www.pogsdotnet.com/2018/09/api-gateway-response-aggregation-with.html

You need to write another service, called an Aggregator, which internally calls both services, gets the responses, merges/filters them and returns the desired result. This can be achieved in a non-blocking way using Mono/Flux in Spring Reactive.
An API Gateway often does API composition.
But this is a typical engineering problem when you have microservices implementing the database-per-service pattern.
The API Composition and Command Query Responsibility Segregation (CQRS) patterns are useful ways to implement such queries.

Ideally I want to serve out a complete response set in one go (messages and user info).
The problem you've described is one Facebook recognized years ago, and they decided to tackle it by creating an open-source specification called GraphQL.
But how does one even implement this?
It is already implemented in various popular programming languages, so you can give it a try in the language of your choice.
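To give a flavour of it, a single hypothetical GraphQL query (the field names here are assumptions; the actual schema is whatever you define) can resolve the user, their messages and all sender/recipient names in one round trip:

# One request replaces the message listing plus every profile lookup.
query Inbox {
  user(id: 1) {
    name
    messages {
      subj
      body
      sender { name }
      recipients { name }
    }
  }
}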

Related

Designing a REST API for returning counts of multiple objects

I have a question on how I should design my REST API(s), given that I need to return counts of different objects in my application. There are multiple approaches that could be thought of:
Define a single API that accepts the object identifiers in the request body (JSON) and returns the count for each of the object identifiers in the response. The drawback is that the API is too generic and possibly not RESTful, since there is no resource. The URL would look like GET /counts
Define individual APIs for each of the resources for which a count is needed. Assuming I need to return counts for fields, tables, processes, tasks, jobs etc., I would define individual APIs for each of these resources. They would look like GET /field/count or GET /table/count. A side effect of this would be many web APIs, one for each resource we need a count for. Is that bad?
Kindly share your thoughts on the above approaches and any new ways in which I could design such an API that adheres to the REST standards.
Thanks
It totally depends on the client that is consuming the APIs.
Case 1: If it's a web app that needs counts for many tables on a single page, then per-resource count APIs lead to a bad design where you have to make hundreds of calls just for count data. Here you can club the counts into a single API and send them back in one response.
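For illustration, a clubbed response from a hypothetical GET /counts could look like this (resource names and numbers are invented):

{
  "counts": {
    "fields": 120,
    "tables": 45,
    "processes": 7,
    "tasks": 320,
    "jobs": 12
  }
}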
Case 2: If the clients use the counts individually, then I would recommend the 2nd approach, where the resource is clearly defined. With the 1st approach you are making the client intelligent, which is not recommended.
As mentioned in the comments, REST is a totally subjective topic, so there can be multiple viewpoints on every design.

Service which provides an interface to an async service and Idempotency violation

Please keep in mind I have a rudimentary understanding of REST and building services. I am asking this question mostly because I am trying to decouple a service from invoking a CLI (within the same host) by providing a front end to run async jobs in a scalable way.
I want to build a service where you can submit an asynchronous job. The service should be able to tell me status of the job and location of the results.
APIs
1) CreateAsyncJob
Input: JobId, JobFile
Output: 200 OK (if the job was submitted successfully)
2) GetAsyncJobStatus
Input: JobId
Output: Status (InProgress/DoesntExist/Completed/Errored)
3) GetAsyncJobOutput
Input: JobId
Output: OutputFile
Question
The second API, GetAsyncJobStatus, violates the principle of idempotency.
How is idempotency preserved in such APIs, where we need to update the progress of a particular job?
Is idempotency a requirement in such situations?
Based on the link here, idempotency is a behaviour an API demonstrates by producing the same result across repeated invocations.
As per my understanding, idempotency applies at the API method level (we are more concerned about what would happen if a client calls that method repeatedly). Hence the best way to maintain idempotency is to segregate read and write operations into separate APIs. This way we can reason more thoroughly about the idempotent behavior of the individual API methods. And while the term is gaining traction with RESTful services, the principle holds true for other API systems as well.
In the use case you have provided, the response to the API call made by the client would differ depending upon the status of the job. Assuming this API is read-only and performs no write operations on the server, the state on the server remains the same no matter how often it is invoked. For example, if there were 10 jobs in the system in varied states, calling this API 100 times for a job id could return a different status each time (based on the job's progress), yet the number of jobs on the server and their corresponding states would be unaffected by those calls.
However, if this API were implemented in a way that altered the state of the server, then the call would not necessarily be idempotent.
So keep two APIs: getJobStatus(String jobId) and updateJobStatus(String jobId). getJobStatus is idempotent, while updateJobStatus is not.
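A minimal ASP.NET Core sketch of that split (the routes, types and IJobStore abstraction are hypothetical). The GET is a pure read, so repeated calls never change server state:

using Microsoft.AspNetCore.Mvc;

public record JobRecord(string JobId, string Status);

public interface IJobStore
{
    JobRecord? Find(string jobId);
    void SetStatus(string jobId, string status);
}

[ApiController]
[Route("jobs")]
public class JobsController : ControllerBase
{
    private readonly IJobStore _store;

    public JobsController(IJobStore store) => _store = store;

    // Read-only: calling this any number of times leaves the server unchanged.
    [HttpGet("{jobId}/status")]
    public ActionResult<string> GetJobStatus(string jobId)
    {
        var job = _store.Find(jobId);
        if (job is null) return NotFound("DoesntExist");
        return Ok(job.Status);
    }

    // The write path lives in its own method, invoked by the job runner.
    [HttpPut("{jobId}/status")]
    public IActionResult UpdateJobStatus(string jobId, [FromBody] string status)
    {
        _store.SetStatus(jobId, status);
        return NoContent();
    }
}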
Hope this helps

architectural design for REST API with views across resources

Looking for some input on a REST API architectural design. I often find that the desired data is the combination of a view across multiple resources. Would you expect the client to combine them, or provide an API that does the combination for the client?
For example, let's say we are writing a REST API for people to become notified about events. Someone will indicate interest in an event in one of 2 ways:
Join an organization that regularly puts on events that the person has interest in
Search for and then mark a particular event run by an organization I wouldn't normally subscribe to
I can retrieve all of the events for user 100 by doing the following long steps:
GET /user/100/organizations returns 123
GET /organizations/123/events returns [15,16,20]
GET /user/100/savedevents returns [35,36]
GET /events/15,16,20,35,36 returns all of the events
But that seems rather heavy for a client. I almost want a client to be able to say, "give me all of the interesting events for this user":
GET /user/100/events
...and then require the server to understand that it has to go through all of steps 1-4 and return them, or, at the very least, return [15,16,20,35,36] so it becomes 2 steps: get event IDs; get event details.
Does this even make sense, to make a view that cuts across multiple resources that way?
EDIT: To explain further. My hesitation is because I can see how /organizations/123/events is a clean resource; it is identical to saying /events?organizations=123, i.e. "give me resource events where organizations=123". Same for /user/100/organizations.
But /user/100/events is not "give me resource events where organizations=123". It is "give me organization registrations where user=100, retrieve those organization ids, then give me the events where organization=123, then give me savedevents where user=100."
Each of the three steps itself is a clean resource mapping. Putting them together seems messy. But so does asking a client (especially a Web client), to figure out all that logic!
I was a bit confused by your question, so I'll try to be as comprehensive as possible and hopefully I'll have hit on an answer you need =P.
I often find that the desired data is the combination of a view across multiple resources. Would you expect the client to combine them, or provide an API that does the combination for the client?
In a true RESTful environment all cross-sectional views of data would be done by the server, not by the client.
The primary reason for a RESTful design is to allow access to the CRUD model (create, read, update, delete) by way of the standard HTTP verbs (e.g. GET, POST, PUT, DELETE). Storing the returns of these methods in some sort of session, cookie or other external method (e.g. "give me data for bob", "give me data on businesses", "give me data from my first two queries") goes above and beyond the REST methodology.
The way you'll want to leverage RESTful development is to find ways of combining resources meaningfully so as to provide a RESTful environment where the method calls are consistent (GET reads data, POST creates data, PUT updates data, DELETE deletes data).
So if you wanted to do something like Steps 1 through 4 I'd recommend something like:
GET /user/{userID}/organizations --> {return all affiliated organizations}
GET /user/{userID}/events --> {return all events associated with userID}
GET /organizations/{organization}/events --> {returns all eventID's assoc. with organization}
GET /user/{userID}/savedevents --> {return all eventID's userID saved to their profile}
GET /events/?eventID=(15,16,20,35,36) --> {return all of the events details for those eventID's}
GET /events/{eventID} --> {return event details for {eventID}}
Whereas you might also have:
GET /events/ --> {return a complete listing of all event ID's}
GET /events/{userID} --> {return all events userID is associated with}
POST /event/ --> {create a new event - ID is assigned by the server}
POST /user/ --> {create a new user - ID is assigned by the server}
PUT /user/{userID} --> {update/modify user information}
Then if you want cross-sectional slices of information, you would have a named resource for the cross section (or else pass it as arguments). Be explicit with your resources. (Random FYI: name your resources as nouns only, not verbs.)
You also asked:
To explain further. My hesitation is because I can see how /organizations/123/events is a clean resource; it is identical to saying /events?organizations=123, i.e. "give me resource events where organizations=123". Same for /user/100/organizations.
Essentially both the named resource and the resource + argument method can provide the same information. Typically I have seen RESTful API designs call for arguments only when an important delineation is required (range requests, date requests, some REALLY small unit of data, etc.). If you have some higher-order grouping of data that CAN BE parsed/introspected further, then it's a named resource. In your example, I'd have both API calls, as the RESTful spec calls for providing data via multiple paths and by way of the established HTTP methods. However, I'd also expand a bit...
/events?organizations=123 --> {return the eventID's associated with org=123}
/organizations/123/events --> {return event DETAILS for events associated with org=123}
Have a read/go at this, by Apigee
There may be several ways to solve this... however, I think that most of the time (if the service is managed by the same provider) it is better to have the logic on the server side and make REST calls as independent of each other as possible (i.e., the server performs the multiple operations required, normally reading from the DBs that store the data handled by the API resources).
In the example you describe, this would mean your REST API exposes a "user" resource and a sub-resource "events" (which you call "savedevents") the user is interested in. With this in mind you would have something like this:
POST /user/{username}/events stores a new event (or multiple events) the user is interested in
GET /user/{username}/events returns all the events the user is interested in
GET /user/{username}/events/{eventid} returns details of a specific event
To "filter" user events per organization (and other filtering operations) you can use "query parameters":
GET /user/{username}/events?organization=123
So the server (or API call) would perform the operations you describe in steps 1 to 4 inside GET /user/{username}/events, as sketched below. You can still have the other resources ("organizations" and "events") in your API; they would just be used in other contexts (like storing new events or organizations, etc.).
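A sketch of that server-side composition (ASP.NET Core minimal API; the downstream endpoints mirror steps 1-4 from the question, and the host name and types are assumptions):

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHttpClient("backend", c => c.BaseAddress = new Uri("http://backend")); // assumed internal host
var app = builder.Build();

app.MapGet("/user/{userId:int}/events", async (int userId, IHttpClientFactory factory) =>
{
    var http = factory.CreateClient("backend");

    // Steps 1 and 3 are independent of each other, so run them in parallel.
    var orgsTask = http.GetFromJsonAsync<List<int>>($"/user/{userId}/organizations");
    var savedTask = http.GetFromJsonAsync<List<int>>($"/user/{userId}/savedevents");
    await Task.WhenAll(orgsTask, savedTask);

    // Step 2: collect event ids for every organization the user belongs to.
    var eventIds = new HashSet<int>(savedTask.Result ?? new List<int>());
    foreach (var orgId in orgsTask.Result ?? new List<int>())
    {
        var orgEvents = await http.GetFromJsonAsync<List<int>>($"/organizations/{orgId}/events");
        eventIds.UnionWith(orgEvents ?? new List<int>());
    }

    // Step 4: one bulk fetch for the event details.
    var events = await http.GetFromJsonAsync<List<Event>>($"/events/{string.Join(',', eventIds)}");
    return Results.Ok(events);
});

app.Run();

public record Event(int Id, string Name);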
HTH

Is it a bad practice to return an object in a POST via Web Api?

I'm using Web Api and have a scenario where clients are sending a heartbeat notification every n seconds. There is a heartbeat object which is sent in a POST rather than a PUT, because as I see it they are creating a new heartbeat rather than updating an existing heartbeat.
Additionally, the clients have a requirement that calls for them to retrieve all of the other currently online clients and the number of unread messages that individual client has. It seems to me that I have two options:
Perform the POST followed by a GET, which to me seems cleaner from a pure REST standpoint. I am doing a creation and a retrieval and I think the SOLID principles would prefer to split them accordingly. However, this approach means two round trips.
Have the POST return an object which contains the same information that the GET would otherwise have done. This consolidates everything into a single request, but I'm concerned that this approach would be considered ill-advised. It's not a pure POST.
Option #2 stubbed out looks like this:
public HeartbeatEcho Post(Heartbeat heartbeat)
{
}
HeartbeatEcho is a class which contains properties for the other online clients and the number of unread messages.
Web Api certainly supports option #2, but just because I can do something doesn't mean I should. Is option #2 an abomination, premature optimization, or pragmatism?
Option 2 is not an abomination at all. A POST request creates a new resource, but it's quite common for the resource itself to be returned to the caller. For example, if your resources are items in a database (e.g., a Person), the POST request would send the required members for the INSERT operation (e.g., name, age, address), and the response would contain a Person object which, in addition to the parameters passed as input, would also have an identifier (the DB primary key) that can be used to uniquely identify the object.
Notice that it's also perfectly valid for the POST request to return only the id of the newly created resource, but that's a choice you have, depending on the requirements of the client.
public HttpResponseMessage Post(Person p)
{
    var id = InsertPersonInDBAndReturnId(p);
    p.Id = id;
    var result = this.Request.CreateResponse(HttpStatusCode.Created, p);
    result.Headers.Location = new Uri("the location for the newly created resource");
    return result;
}
Whichever way solves your business problem will work. You're correct: POST for a new record vs. PUT for an update to an existing record.
SUGGESTION:
One thing you may want to consider is adding Redis to your stack. The apps can post to it very fast, and you could then use the Pub/Sub functionality for the echo part, or BLPOP (blocking until a record matches the criteria). It's super fast, may help you scale, and is well suited to what you are trying to do.
See: http://redis.io/topics/pubsub/
See: http://redis.io/commands/blpop
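For what it's worth, a minimal sketch of the pub/sub part with the StackExchange.Redis client (the channel name and payload are made up; assumes a Redis instance on localhost):

using StackExchange.Redis;

var mux = await ConnectionMultiplexer.ConnectAsync("localhost");
var sub = mux.GetSubscriber();

// Interested parties subscribe to the heartbeat channel...
await sub.SubscribeAsync(RedisChannel.Literal("heartbeats"), (channel, message) =>
{
    Console.WriteLine($"heartbeat received: {message}");
});

// ...and the API publishes each heartbeat as it arrives.
await sub.PublishAsync(RedisChannel.Literal("heartbeats"), "client-42");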
I've used Redis for similar purposes, but also RabbitMQ; with RabbitMQ we added a socket.io connection to "stream" the heartbeat in real time without the need for long polling.

RESTful way to create multiple items in one request

I am working on a small client-server program to collect orders. I want to do this in a "REST(ful) way".
What I want to do is:
Collect all orderlines (product and quantity) and send the complete order to the server
At the moment I see two options to do this:
Send each orderline to the server: POST qty and product_id
I actually don't want to do this because I want to limit the number of requests to the server, so option 2:
Collect all the orderlines and send them to the server at once.
How should I implement option 2? A couple of ideas I have:
Wrap all orderlines in a JSON object and send this to the server, or use an array to post the orderlines.
Is it a good idea or good practice to implement option 2, and if so, how should I do it?
What is good practice?
I believe that another correct way to approach this would be to create another resource that represents your collection of resources.
For example, imagine that we have an endpoint like /api/sheep/{id} and we can POST to /api/sheep to create a sheep resource.
Now, if we want to support bulk creation, we should consider a new flock resource at /api/flock (or /api/<your-resource>-collection if you lack a better meaningful name). Remember that resources don't need to map to your database or app models. This is a common misconception.
Resources are a higher level representation, unrelated with your data. Operating on a resource can have significant side effects, like firing an alert to a user, updating other related data, initiating a long lived process, etc. For example, we could map a file system or even the unix ps command as a REST API.
I think it is safe to assume that operating a resource may also mean to create several other entities as a side effect.
Although bulk operations (e.g. batch create) are essential in many systems, they are not formally addressed by the RESTful architecture style.
I found that POSTing a collection as you suggested basically works, but problems arise when you need to report failures in response to such a request. Such problems are worse when multiple failures occur for different causes or when the server doesn't support transactions.
My suggestion to you is that if there is no performance problem, for example when the service provider is on the LAN (not WAN) or the data is relatively small, it's worth it to send 100 POST requests to the server. Keep it simple, start with separate requests and if you have a performance problem try to optimize.
Facebook explains how to do this: https://developers.facebook.com/docs/graph-api/making-multiple-requests
Simple batched requests
The batch API takes in an array of logical HTTP requests represented as JSON arrays - each request has a method (corresponding to an HTTP method GET/PUT/POST/DELETE etc.), a relative_url (the portion of the URL after graph.facebook.com), an optional headers array (corresponding to HTTP headers) and an optional body (for POST and PUT requests). The Batch API returns an array of logical HTTP responses represented as JSON arrays - each response has a status code, an optional headers array and an optional body (which is a JSON encoded string).
Your idea seems valid to me. The implementation is a matter of your preference. You can use JSON or just parameters for this ("order_lines[]" array) and do
POST /orders
Since you are going to create more than one resource in a single action (the order and its lines), it's vital to validate each and every one of them and save them only if all of them pass validation, i.e. you should do it in a transaction.
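A sketch of that all-or-nothing flow (ASP.NET Core with EF Core and its Sqlite provider; the DTOs, entities and DbContext are assumptions). A single SaveChangesAsync persists the order and all of its lines inside one database transaction, so either every line is stored or none is:

using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddDbContext<AppDbContext>(o => o.UseSqlite("Data Source=orders.db"));
var app = builder.Build();

using (var scope = app.Services.CreateScope())
    scope.ServiceProvider.GetRequiredService<AppDbContext>().Database.EnsureCreated();

app.MapPost("/orders", async (OrderDto dto, AppDbContext db) =>
{
    // Validate every line up front; reject the whole order on any failure.
    var errors = dto.Lines
        .Select((l, i) => l.Qty <= 0 ? $"line {i}: qty must be positive" : null)
        .Where(e => e != null)
        .ToList();
    if (errors.Count > 0) return Results.BadRequest(errors);

    var order = new Order
    {
        Lines = dto.Lines.Select(l => new OrderLine { ProductId = l.ProductId, Qty = l.Qty }).ToList()
    };
    db.Orders.Add(order);

    // One SaveChanges call = one transaction covering the order and all its lines.
    await db.SaveChangesAsync();
    return Results.Created($"/orders/{order.Id}", order);
});

app.Run();

public record LineDto(int ProductId, int Qty);
public record OrderDto(List<LineDto> Lines);

public class Order
{
    public int Id { get; set; }
    public List<OrderLine> Lines { get; set; } = new();
}

public class OrderLine
{
    public int Id { get; set; }
    public int ProductId { get; set; }
    public int Qty { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
    public DbSet<Order> Orders => Set<Order>();
}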
I've actually been wrestling with this lately, and here's what I'm working towards.
If a POST that adds multiple resources succeeds, return a 200 OK (I was considering a 201, but the user ultimately doesn't land on a resource that was created) along with a page that displays all resources that were added, either in read-only or editable fashion. For instance, a user is able to select and POST multiple images to a gallery using a form comprising only a single file input. If the POST request succeeds in its entirety the user is presented with a set of forms for each image resource representation created that allows them to specify more details about each (name, description, etc).
In the event that one or more resources fails to be created, the POST handler aborts all processing and appends each individual error message to an array. Then a 409 Conflict is returned and the user is routed to an error page that presents the contents of the error array, as well as a way back to the form that was submitted.
I guess it's better to send separate requests within a single connection. Of course, your web server should support that.
You won't want to send the HTTP headers for 100 orderlines, nor generate any more requests than necessary.
Send the whole order in one JSON object to the server, to: server/order or server/order/new.
Return something that points to: server/order/order_id
Also consider using PUT to create instead of POST.