Asynchronous bulk data validation service - GET or POST? (REST)

Here is a different scenario for GET-or-POST confusion. I am working on a web application built with a Spring Boot microservice architecture, where I need to validate and update bulk data from an Excel sheet.
An Excel sheet can contain 500-1000 records across 6 columns for bulk processing. Once the UI submits the Excel sheet to the server, the entire process is asynchronous from then on. There are microservice-to-microservice calls where I am unsure whether to use GET or POST.
Here is the problem: I have 4 microservices (say orchestra-service, A-service, B-service, and C-service).
OrchestraService creates a DTO list from the Excel sheet, which is used in the subsequent calls. Orchestra calls 'A'. 'A' validates the data against the DB, marks success and failure records in the DTO list, and returns the list to orchestra. Orchestra then calls 'B', which does a similar job to 'A' and returns the list as well.
Finally, orchestra calls 'C', which updates the success records in the database, updates the file status in the database, and also creates a new resultant Excel sheet with error messages per row, which will be emailed to the user later (a small report of sorts).
In the above microservice-to-microservice calls, only C updates the database and creates a resource on the server. For all of the calls I used the POST method, because I need the request body to pass my input list to each service.
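For context, each of those calls looks roughly like this from the orchestrator (a sketch using Spring's RestTemplate; the DTO shape, URL, and endpoint are illustrative):

    import java.util.List;
    import org.springframework.core.ParameterizedTypeReference;
    import org.springframework.http.HttpEntity;
    import org.springframework.http.HttpMethod;
    import org.springframework.web.client.RestTemplate;

    public class OrchestraClient {

        // Illustrative DTO shape; the real one has the 6 Excel columns plus a status.
        public record RecordDto(String data, String status) {}

        private final RestTemplate restTemplate = new RestTemplate();

        public List<RecordDto> validateWithA(List<RecordDto> records) {
            // POST: the DTO list travels in the request body.
            return restTemplate.exchange(
                    "http://a-service/validations",
                    HttpMethod.POST,
                    new HttpEntity<>(records),
                    new ParameterizedTypeReference<List<RecordDto>>() {}
            ).getBody();
        }
    }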
According to the HTTP standards, am I doing this right?
https://www.rfc-editor.org/rfc/rfc7231#section-4.3.3
"Providing a block of data, such as the fields entered into an HTML form, to a data-handling process" should be a POST call.
Please advise me whether:
I should use POST only for 'C' and GET for the others, or
I should use POST for all of them, since the other services are also involved in processing/filtering the data.
NOTE: Services A, B, and C do not all use every column of the Excel sheet; each uses some combination of them. One column holds data 18 characters long, so I think the URL length limit could be a problem for GET with a bulk operation.

HTTP protocol
There is no actual violation in passing information via GET, and if the request doesn't mutate state between identical requests, it's fine.
Microservice-wise
Now for clarification: are Service A and Service B actually needed?
Aren't they the same domain as Service C, and could they reside inside it?
It's more than good practice for a microservice to validate its own domain and return a collection of successes and failures with the relevant messages.
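A minimal sketch of what such a validation response could look like (a generic shape; the field names are my own invention):

    import java.util.List;

    // Illustrative result type a validation service could return:
    // the records that passed, plus the ones that failed with a message per record.
    public record ValidationResult<T>(
            List<T> successes,
            List<Failure<T>> failures) {

        public record Failure<T>(T record, String message) {}
    }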

I had a similar question a few years back; here is a possible solution for the first part of your question.
As mentioned by @Oreal Eraki in his answer, I would also question whether you need services A and B. If it's just validation and data transformation, it can be done in the same domain where the data is actually stored.

Related

What REST method should be used to implement a simple inter-process communication?

This is more a theoretical question than a practical one.
We have a backend application that uploads CSV files to a frontend application; then, and only then, the backend sends an empty POST request to tell the frontend to start processing those files to update its database.
For this question it doesn't matter whether this is a good design (I think it isn't), what those files are, or what the database is: I only want to understand the REST "syntax" better.
I'm referring to Wikipedia and restfulapi.net, but I'm not convinced by any alternative, because:
GET: The request sender isn't retrieving any data;
POST (currently used): The request sender doesn't want to insert data via the request body (the data comes from external files, if any exist; also, the operations can be insert/update/delete);
PUT: Sounds good, but again, the data is not in the request body;
PATCH: Sounds best, but the data is not in the body (also, am I wrong, or is it deprecated/unused?);
DELETE: Doesn't always need to delete.
I know it is a common habit to use POST requests to let machines yell "go!" at each other, but I never thought it was right.
What do you think - in theory - would be the proper method?
The actual reference for the semantics of the HTTP methods is RFC 7231, not the pages you referenced in your question.
POST is a catch-all method: it requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.
4.3.3. POST
The POST method requests that the target resource process the
representation enclosed in the request according to the resource's
own specific semantics. For example, POST is used for the following
functions (among others):
o  Providing a block of data, such as the fields entered into an HTML form, to a data-handling process;
o  Posting a message to a bulletin board, newsgroup, mailing list, blog, or similar group of articles;
o  Creating a new resource that has yet to be identified by the origin server; and
o  Appending data to a resource's existing representation(s).
[...]
Responses to POST requests are only cacheable when they include
explicit freshness information. However, POST caching is not widely implemented.
In these scenarios, the receiving application knows where the CSV files will be and monitors that location. When it finds one, it processes it and then deletes or archives it. The application will likely have its own criteria for considering itself ready to process, e.g. time of day, size of file etc.
If the data load on the front end takes a long time you could "partition" the updates based on "importance". How you define importance would be up to your business rules. You could then POST a list of CSV filenames/locations to the front end. The list would be ordered by importance. The front end could then update its database based on that importance. Scheduling less important data for a more appropriate time of day.
If the backend knows the difference between new users and updated users you could use PUT and POST. The front end could assign higher priority to PUT requests as they relate to new users, perhaps assigning lower priority and staggered syncing for CSV filenames in POST requests.
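A sketch of what the receiving endpoint for such a list might look like (Spring is used purely for illustration here; the path and payload shape are made up):

    import java.util.List;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    // Hypothetical front-end endpoint that accepts CSV filenames ordered by importance.
    @RestController
    public class CsvImportController {

        public record CsvBatch(List<String> filenamesByImportance) {}

        @PostMapping("/imports")
        public ResponseEntity<Void> scheduleImport(@RequestBody CsvBatch batch) {
            // Most important files first; less important ones can be deferred
            // to a more appropriate time of day.
            batch.filenamesByImportance().forEach(this::schedule);
            return ResponseEntity.accepted().build(); // 202: processing happens later
        }

        private void schedule(String filename) { /* enqueue for processing */ }
    }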

REST API containing POST and PUT/PATCH calling a compute server generating results files

The server application I'm implementing generates calculation results and stores them in result files in directories on the server, for example customer/project/scenario/resultfiles. I want to design and implement a resilient REST API to retrieve the result files for display in the client browser, to delete result files, customers, etc., and to create result files within a scenario for calculation parameters sent to the server, and possibly to do sensitivity analysis generating result files within a scenario by varying calculation parameters.
I can use GET to retrieve these files using a URL with a query string, appname/?customerId=xxx&projectId=xxx etc., and DELETE on the directory structure and files, also using query strings. What I'm unclear about is the best REST approach for calling functions that implement various calculations on the server.
Perhaps this should be a POST for the initial calculation in a scenario as this is creating the results files? Maybe a PUT or a PATCH for the sensitivity analysis or other partial recalculations as this is modifying results in an existing scenario?
There's a fair bit of online discussion about PUT vs PATCH vs POST used for database related activities. I could work up a REST approach based on what I've read for REST database interactions but if there's already standard practice on how to do calculations through a REST API I'd rather use that.
Perhaps this should be a POST for the initial calculation in a scenario as this is creating the results files? Maybe a PUT or a PATCH for the sensitivity analysis or other partial recalculations as this is modifying results in an existing scenario?
You can always just use POST. If we were using HTML representations of resources to guide the client through the protocol, we'd be doing that by following links and submitting forms. In HTML, submitting forms is limited to GET and POST.
PUT and PATCH have more tightly constrained semantics than POST. Specifically, they request that the server make its representation match the client's representation (for PUT, we send the entire replacement representation; for PATCH, we send just the changes made by the client).
Technically, there's nothing wrong with the server not accepting the offered edits as is:
A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response. However, there is no guarantee that such a state change will be observable, since the target resource might be acted upon by other user agents in parallel, or might be subject to dynamic processing by the origin server, before any subsequent GET is received. A successful response only implies that the user agent's intent was achieved at the time of its processing by the origin server.
So the server could accept the client's edits, and then immediately apply additional edits of its own.
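Coming back to the original question: modeling each calculation run as a new resource created under its scenario could look like this (a sketch; paths, parameter names, and types are assumptions):

    import java.net.URI;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    // Sketch: POST creates a calculation run (and its result files) under a scenario.
    @RestController
    public class CalculationController {

        public record CalculationParams(double param1, double param2) {} // illustrative

        @PostMapping("/scenarios/{scenarioId}/calculations")
        public ResponseEntity<Void> run(@PathVariable String scenarioId,
                                        @RequestBody CalculationParams params) {
            String runId = startCalculation(scenarioId, params); // kick off the compute job
            // 201 Created with a Location the client can GET for the result files.
            return ResponseEntity.created(
                    URI.create("/scenarios/" + scenarioId + "/calculations/" + runId)).build();
        }

        private String startCalculation(String scenarioId, CalculationParams p) {
            return "run-1"; // placeholder
        }
    }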

Avoiding same post request in rest controller in spring boot

Suppose I have filled in the fields of my requestDto and made a POST call which saves the data in the database. If I hit the POST API again with the same requestDto fields, the entries should not be saved.
How can I achieve this functionality in Spring Boot?
Bear in mind that POST requests are not idempotent: with multiple identical requests you may end up with multiple identical resources being created.
So, to prevent resources from being created multiple times, you'll need some sort of validation on your server. You could rely, for example, on a unique constraint in your database and, if the constraint is violated, refuse the request with a 409 Conflict.
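A minimal sketch of that in Spring Boot (the entity, DTO, and repository below are illustrative stand-ins; in a real app the repository would come from Spring Data and the unique constraint would live on the table):

    import org.springframework.dao.DataIntegrityViolationException;
    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class OrderController {

        public record OrderDto(String customerRef) {}                 // illustrative
        public record Order(String customerRef) {}                    // illustrative entity
        public interface OrderRepository { Order save(Order order); } // stand-in

        private final OrderRepository repository;

        public OrderController(OrderRepository repository) {
            this.repository = repository;
        }

        @PostMapping("/orders")
        public ResponseEntity<Order> create(@RequestBody OrderDto dto) {
            try {
                // A unique constraint on the relevant columns makes duplicate
                // inserts fail at the database level.
                Order saved = repository.save(new Order(dto.customerRef()));
                return ResponseEntity.status(HttpStatus.CREATED).body(saved);
            } catch (DataIntegrityViolationException e) {
                // Same payload already stored: refuse with 409 Conflict.
                return ResponseEntity.status(HttpStatus.CONFLICT).build();
            }
        }
    }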

How to merge/consolidate responses from multiple RESTful microservices?

Let's say there are two (or more) RESTful microservices serving JSON. Service (A) stores user information (name, login, password, etc) and service (B) stores messages to/from that user (e.g. sender_id, subject, body, rcpt_ids).
Service (A) on /profile/{user_id} may respond with:
{id: 1, name:'Bob'}
{id: 2, name:'Alice'}
{id: 3, name:'Sue'}
and so on
Service (B) responding at /user/{user_id}/messages returns a list of messages destined for that {user_id} like so:
{id: 1, subj:'Hey', body:'Lorem ipsum', sender_id: 2, rcpt_ids: [1,3]},
{id: 2, subj:'Test', body:'blah blah', sender_id: 3, rcpt_ids: [1]}
How does the client application consuming these services handle putting the message listing together such that names are shown instead of sender/rcpt ids?
Method 1: Pull the list of messages, then start pulling profile info for each id listed in sender_id and rcpt_ids? That may require hundreds of requests and could take a while. Rather naive and inefficient; it may not scale for complex apps.
Method 2: Pull the list of messages, extract all user ids, and make a bulk request for all relevant users separately... this assumes such a service endpoint exists. There is still a delay between getting the message listing, extracting the user ids, sending the request for bulk user info, and awaiting the bulk user info response.
Ideally I want to serve out a complete response set in one go (messages and user info). My research brings me to merging of responses at service layer... a.k.a. Method 3: API Gateway technique.
But how does one even implement this?
I can obtain the list of messages, extract the user ids, make a call behind the scenes to obtain the user data, merge the result sets, and then serve this final result up... This works okay with 2 services behind the scenes... But what if the message listing depends on more services? What if I need to query multiple services behind the scenes, further parse their responses, query more services based on secondary (tertiary?) results, and then finally merge... where does this madness stop? How does this affect response times?
And I've now effectively created another "client" that combines all microservice responses into one mega-response... which is no different than Method 1 above... except at the server level.
Is that how it's done in the "real world"? Any insights? Are there any open source projects that are built on such API Gateway architecture I could examine?
The solution we used for this problem was denormalization of data, with events for keeping it updated.
Basically, a microservice keeps a subset of the data it requires from other microservices ahead of time, so that it doesn't have to call them at run time. This data is managed through events: when other microservices are updated, they fire an event with the id as context, which can be consumed by any microservice that has an interest in it. This way the data remains in sync (of course, it requires some form of failure handling for events). It seems like a lot of work, but it helps with any future decisions regarding consolidation of data from different microservices, and our microservice always has all data available locally to process any request without a synchronous dependency on other services.
In your case, i.e. showing names with a message, you can keep an extra property for names in Service (B). Whenever a name is updated in Service (A), it fires an update event with the id of the updated name. Service (B) then consumes the event, fetches the relevant data from Service (A), and updates its own database. This way, even if Service (A) is down, Service (B) keeps functioning, albeit with some stale data that will eventually become consistent when Service (A) comes back up, and you will always have some name to show in the UI.
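A sketch of the consuming side of that flow (spring-kafka is used for illustration; the topic name, event shape, and Service (A)'s endpoint are assumptions):

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;
    import org.springframework.web.client.RestTemplate;

    @Component
    public class UserNameEventListener {

        public record UserUpdated(long userId) {}       // illustrative event shape
        public record Profile(long id, String name) {}  // illustrative payload from Service (A)
        public interface UserNameRepository {           // stand-in for B's local read model
            void upsert(long userId, String name);
        }

        private final RestTemplate restTemplate = new RestTemplate();
        private final UserNameRepository repository;

        public UserNameEventListener(UserNameRepository repository) {
            this.repository = repository;
        }

        // Service (A) publishes an event with the user id whenever a name changes.
        @KafkaListener(topics = "user-updated")
        public void onUserUpdated(UserUpdated event) {
            // Fetch the fresh data from Service (A) and update the local, denormalized copy.
            Profile profile = restTemplate.getForObject(
                    "http://a-service/profile/{id}", Profile.class, event.userId());
            repository.upsert(event.userId(), profile.name());
        }
    }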
https://enterprisecraftsmanship.com/2017/07/05/how-to-request-information-from-multiple-microservices/
You might want to perform response aggregation strategies on your API gateway. I've written an article on how to do this with ASP.NET Core and Ocelot, but there should be a counterpart for other API gateway technologies:
https://www.pogsdotnet.com/2018/09/api-gateway-response-aggregation-with.html
You could write another service, an Aggregator, which internally calls both services, gets the responses, merges/filters them, and returns the desired result. This can be achieved in a non-blocking way using Mono/Flux in Spring Reactive (WebFlux).
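A sketch of such an aggregator with WebClient and Reactor (the types, URLs, and the bulk profile-lookup endpoint on Service (A) are assumptions):

    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;
    import org.springframework.web.reactive.function.client.WebClient;
    import reactor.core.publisher.Mono;

    public class MessageAggregator {

        public record Message(long id, String subj, List<Long> participantIds) {} // illustrative
        public record Profile(long id, String name) {}                            // illustrative
        public record MessageListing(List<String> rows) {}                        // merged view

        private final WebClient webClient = WebClient.create();

        public Mono<MessageListing> messagesFor(long userId) {
            return webClient.get()
                    .uri("http://b-service/user/{id}/messages", userId)
                    .retrieve()
                    .bodyToFlux(Message.class)
                    .collectList()
                    // Dependent second hop: one bulk request for every referenced profile,
                    // then merge ids into names. No thread blocks while waiting.
                    .flatMap(messages -> fetchProfiles(idsIn(messages))
                            .map(profiles -> merge(messages, profiles)));
        }

        private Mono<List<Profile>> fetchProfiles(Set<Long> ids) {
            return webClient.post()                       // bulk lookup, assumed to exist on A
                    .uri("http://a-service/profiles/search")
                    .bodyValue(ids)
                    .retrieve()
                    .bodyToFlux(Profile.class)
                    .collectList();
        }

        private Set<Long> idsIn(List<Message> messages) {
            return messages.stream()
                    .flatMap(m -> m.participantIds().stream())
                    .collect(Collectors.toSet());
        }

        private MessageListing merge(List<Message> msgs, List<Profile> users) {
            /* replace sender/rcpt ids with names; details omitted */
            return new MessageListing(List.of());
        }
    }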
An API Gateway often does API composition.
This is a typical engineering problem when you have microservices implementing the database-per-service pattern.
The API Composition and Command Query Responsibility Segregation (CQRS) patterns are useful ways to implement such queries.
Ideally I want to serve out a complete response set in one go
(messages and user info).
The problem you've described is what Facebook realized years ago, and they decided to tackle it by creating an open-source specification called GraphQL.
But how does one even implement this?
It is already implemented in various popular programming languages and maybe you can give it a try in the programming language of your choice.

RESTful way to create multiple items in one request

I am working on a small client server program to collect orders. I want to do this in a "REST(ful) way".
What I want to do is:
Collect all orderlines (product and quantity) and send the complete order to the server
At the moment I see two options to do this:
Send each orderline to the server: POST qty and product_id
I actually don't want to do this because I want to limit the number of requests to the server, so option 2:
Collect all the orderlines and send them to the server at once.
How should I implement option 2? A couple of ideas I have:
Wrap all orderlines in a JSON object and send this to the server, or use an array to POST the orderlines.
Is it a good idea or good practice to implement option 2, and if so, how should I do it?
What is good practice?
I believe another correct way to approach this would be to create a separate resource that represents your collection of resources.
For example, imagine we have an endpoint like /api/sheep/{id} and we can POST to /api/sheep to create a sheep resource.
Now, if we want to support bulk creation, we could introduce a new flock resource at /api/flock (or /api/<your-resource>-collection if you lack a more meaningful name). Remember that resources don't need to map one-to-one to your database or app models; that's a common misconception.
Resources are a higher-level representation, not tied directly to your data. Operating on a resource can have significant side effects, like firing an alert to a user, updating other related data, or initiating a long-lived process. For example, we could map a file system or even the Unix ps command as a REST API.
I think it is safe to assume that operating on a resource may also mean creating several other entities as a side effect; see the sketch below.
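A sketch of such a collection resource, reusing the sheep/flock naming from above (Spring is an assumption here):

    import java.util.List;
    import org.springframework.http.HttpStatus;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.ResponseStatus;
    import org.springframework.web.bind.annotation.RestController;

    // /api/flock is its own resource; creating it happens to create many sheep.
    @RestController
    public class FlockController {

        public record Sheep(String name) {}
        public record Flock(List<Sheep> sheep) {}

        @PostMapping("/api/flock")
        @ResponseStatus(HttpStatus.CREATED)
        public Flock create(@RequestBody Flock flock) {
            // Persisting the flock creates each sheep as a side effect.
            return save(flock);
        }

        private Flock save(Flock flock) { /* persistence omitted */ return flock; }
    }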
Although bulk operations (e.g. batch create) are essential in many systems, they are not formally addressed by the RESTful architectural style.
I found that POSTing a collection as you suggested basically works, but problems arise when you need to report failures in response to such a request. Such problems are worse when multiple failures occur for different reasons or when the server doesn't support transactions.
My suggestion is that if there is no performance problem, for example when the service provider is on a LAN (not WAN) or the data is relatively small, it's worth sending 100 POST requests to the server. Keep it simple: start with separate requests, and if you hit a performance problem, then try to optimize.
Facebook explains how to do this: https://developers.facebook.com/docs/graph-api/making-multiple-requests
Simple batched requests
The batch API takes in an array of logical HTTP requests represented
as JSON arrays - each request has a method (corresponding to HTTP
method GET/PUT/POST/DELETE etc.), a relative_url (the portion of the
URL after graph.facebook.com), optional headers array (corresponding
to HTTP headers) and an optional body (for POST and PUT requests). The
Batch API returns an array of logical HTTP responses represented as
JSON arrays - each response has a status code, an optional headers
array and an optional body (which is a JSON encoded string).
Your idea seems valid to me. The implementation is a matter of preference. You can use JSON or just parameters for this (an "order_lines[]" array) and do
POST /orders
Since you are creating multiple resources at once in a single action (an order and its lines), it's vital to validate each and every one of them and save them only if all of them pass validation, i.e. you should do it in a transaction.
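In Spring, that all-or-nothing behavior is what @Transactional gives you (a sketch; the entity and repository are illustrative stand-ins):

    import java.util.List;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    @Service
    public class OrderService {

        public record OrderLine(String productId, int quantity) {}   // illustrative
        public record Order(List<OrderLine> lines) {}                 // illustrative
        public interface OrderRepository { Order save(Order order); } // stand-in

        private final OrderRepository repository;

        public OrderService(OrderRepository repository) {
            this.repository = repository;
        }

        @Transactional
        public Order createOrder(Order order) {
            // Validate every line first; any failure throws and rolls the transaction back.
            order.lines().forEach(this::validate);
            // The order and all its lines are saved in one transaction:
            // either everything is stored, or nothing is.
            return repository.save(order);
        }

        private void validate(OrderLine line) {
            if (line.quantity() <= 0) {
                throw new IllegalArgumentException("Invalid quantity for " + line.productId());
            }
        }
    }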
I've actually been wrestling with this lately, and here's what I'm working towards.
If a POST that adds multiple resources succeeds, return a 200 OK (I considered a 201, but the user ultimately doesn't land on a resource that was created) along with a page that displays all resources that were added, either read-only or editable. For instance, a user can select and POST multiple images to a gallery using a form with only a single file input. If the POST request succeeds in its entirety, the user is presented with a set of forms, one per image resource created, that lets them specify more details about each (name, description, etc.).
In the event that one or more resources fail to be created, the POST handler aborts all processing and appends each individual error message to an array. Then, a 409 Conflict is returned and the user is routed to an error page that presents the contents of the error array, as well as a way back to the form that was submitted.
I guess it's better to send separate requests within a single connection. Of course, your web server should support that (e.g. via HTTP keep-alive).
You won't want to send HTTP headers for 100 orderlines, nor do you want to generate any more requests than necessary.
Send the whole order in one JSON object to the server, to server/order or server/order/new.
Return something that points to server/order/order_id.
Also consider using PUT instead of POST for creation.