Batch processing via REST service

Batch processing via REST service - rest

Are there any best practices for performing BATCH operations via REST for POST, PUT, PATCH verbs?
The current paradigm I am following is that the JSON payload is specified in the body for all 3 operations:
a) POST to return the location of the created resource
b) PUT / PATCH return a 201 if the update is successful
For a batch operation, I intend to accept a collection of JSON objects in the payload body but am trying to figure what to return to the client.
While processing the batch, the operation may succeed for some of the items but might fail for others.
Taking this into account, my take is that the best thing to do is to return a collection of objects indicating the Success/Failure status of each item from the payload.
But this deviates from my paradigm outlined in (a) and (b) above.
Instead, does it make sense to return an identifier representing an ID of the Batch operation itself to the client?
The client would then issue a subsequent GET to get the result of the operation it requested.
Does this approach sound reasonable? If so, does it make sense to block the client on the subsequent GET if the operation hasn't completed OR does it make sense to always return the most current state i.e. a collection of responses for each of the items that the client requested to process.
Ideas/Thoughts/Suggestions?
Since REST is an architectural style with no necessarily clear "guidelines" and no
mandate on how the actions for HTTP verbs should be implements, clearly there is no right or wrong answer here.
I am looking for a solution that is elegant, natural and intuitive.

REST operations are supposed to be atomic as seen from the outside. That is, if one part of the request fails, then the whole state of the server should revert to the pre-request state and a 4xx or 5xx response returned (so, for example, the request could be repeated in whole without ill effects if it failed the first time). This however has nothing to do with batch operations per se—such a request could be any kind of request.
Batch operations violate a different REST constraint, that of the uniform interface (defined by HTTP's methods and their operation upon a resource at the specified URL).
If you want to do batch operations, give up on trying to call your API RESTful, because you have already lost out on the benefits that REST imparts, and are just lying to yourself.
If you want to retain those benefits, give up on batch operations.

REST is an architectural style.
I implement APIs to always return the result of what was updated. So in the case of a POST it will return the created entity, with PATCH and PUT it will return the updated entity.
Depending on the batch size I would either return an array of what was processed, or alternatively an array of identifiers of what was processed.
If the batch operation is long running return an identifier for the batch, but make it painfully clear that the endpoint is different from other endpoints
ex. if you create a user by posting to
http://somesite.com/users
send batch requests to
http://somesite.com/batch/users
On a get return the status of the batch operation while it is still running, on complete return an array of record that were updated.
The most important thing is consistency, whatever you choose, always follow the same approach with batch operation throughout the system.

Related

Which HTTP Verb should I use to claim and lock an item in a job queue?

I plan on using an HTTP REST interface to connect to a Job Control service.
One key operation is to request a computational Job.
The caller does not know the ID of the Job; that is what it will be told.
The job will be marked in the database as locked by the service.
The data needed for processing of the job will be returned to the caller.
Later on, when the caller is done processing the job, it will send the results back via another REST call.
Now it knows the ID of the record to be updated.
The second REST call will update the Job record with the results.
and change the Job's status and release the lock.
Only the Success/Fail status needs to be returned.
I am leaning towards using PUT for each operation because no new record is being created; it is being updated in both cases.
Is this proper? Can the first PUT return a large JSON payload with the Job data or does it just return an HTTP status? Should I use a POST instead, even though I am not creating a record, just updating it?
I would have used a GET for the first operation, but a GET is not supposed to change any objects on the service, and I am locking it, which is a change. Is locking a record acceptable in a GET request?

Which HTTP Verb should I use to claim and lock an item in a job queue?
Key idea: a REST API is a facade - your application/service pretends to be an HTTP compliant document store. All of the interesting things that happen are side effects triggered by modifying documents. See Jim Webber, 2011.
With that in mind...
POST is fine. It's okay to use POST.
PUT/PATCH are a good for remote authoring; the client fetches your representation of a resource, makes edits to his local copy, and sends you a copy of the representation (PUT) or a patch document describing the changes (PATCH). The server can then apply those edits to its copy, or not.
So for your specific example, I would expect the client to GET a representation of your resource, change the information in that representation from unlocked to locked, and then to PUT the changed representation back to your server. You server would be expected to update your copy of the representation to match.
It may remind you of a declarative style - the client tells the server what the representation should look like, and it's up to the server to figure out how to do that.
Included for Completeness, NOT Recommened:
The HTTP method registry also includes a method LOCK, with a corresponding UNLOCK. The semantics for these method tokens are defined by the WebDAV specification. If your meaning of LOCK matches that of WebDAV, then using that might be an answer. Note that the specification includes comments like
Any resource that supports the LOCK method MUST, at minimum, support the XML request and response formats defined herein.
Unless you are already in a space where people are expecting to be able to use general-purpose WebDAV clients to interact with your API, that's probably not a good fit.
The HTTP method registry is extendable. So you could define the semantics of your own method token, then push to have it adopted as a standard.

Idempotentency of GET verb in an RESTful API

As it was mentioned here https://restfulapi.net/http-methods/ (and in other places as well):
GET APIs should be idempotent, which means that making multiple
identical requests must produce same result everytime until another
API (POST or PUT) has changed the state of resource on server.
How to make this true in an API that return time for example? or that return data that is affected by time.
In other words, each time I use GET http://ip:port/get-time-now/, it is going to return a different response. However, I did not send any POST or PUT between two sequenced GET's
Does this make the previous statement wrong? Did I misunderstand something?

Idempotency is a promise to clients/intermediaries that the request can be reissued in case of network failures or the like without any further considerations and not so much that the data will never change.
If you take a POST request for example, in case of a network failure you do not know if the previous request reached the server but the response got lost midway or if the initial request didn't even reach the server at all. If you re-issue the request you might create a further resource actually, hence POST is not idempotent. PUT on the other side has the contract that it replaces the current representation with the one contained in the request. If you send the same request twice the content of the resource should be the same after any of the two PUT requests was processed. Note that the actual result can still differ as the service is free to modify the received entity to a corresponding representation. Also, between sending the data via PUT and retrieving it via GET a further client could have updated the state in between, so there is no guarantee that you will actually receive the exact representation you've sent to the service.
Safetiness is an other promise that only GET, HEAD and OPTIONS supports. It promises the invoker that it wont modify any state at all hence clients/intermediaries are safe on issuing such request without having to fear that it will modify any state. In practice this is an important promise to crawlers which blindly invoke any URLs in order to learn their content. In case of violating such promises, i.e. by deleting data while processing a GET request the only one to blame is the service implementor but not the invoker. If a crawler invokes such URLs and hence removes some data it is not the crawlers fault actually but only the service implementor.
As you have a dynamic value in your response, you might want to prevent caching of responses though as otherwise intermediaries might return an old state for your resource

The main basic concept of idempotent and safe methods of HTTP:-
Idempotent Method:- The method can called multiple times with same input and it produce same result.
Safe Method:- The method can called multiple times with same input and it doesn't modify the resource onto the server side.
Http methods are categorized into following 3 groups-
GET,HEAD,OPTIONS are safe and idempotent
PUT,DELETE are not safe but idempotent
POST,PATCH are neither safe & nor idempotent

How to handle network connectivity loss in the middle of REST POST request?

REST POST is used to create resources.
Let's say we have resource url
"http://example.com/cars"
We want to create a new car.
We POST to "http://example.com/cars" with JSON payload containing car properties (color, weight, model, etc).
Server receives the request, creates a new car, sends a response over the network.
At this point network fails (let's say router stops working properly and ignores every packet).
Client fails with TCP timeout (like 90 seconds).
Client has no idea whether car was created or not.
Also client haven't received car resource id, so it can't GET it to check if it was created.
Now what?
How do you handle this?
You can't simply retry creating, because retrying will just create a duplicate (which is bad).

REST POST is used to create resources.
HTTP POST is used for lots of things. REST doesn't particularly care; it just wants resources that support a uniform interface, and hypermedia.
At this point network fails
Bummer!
Now what? How do you handle this? You can't simply retry creating, because retrying will just create a duplicate (which is bad).
This is a general messaging concern, not directly related to REST. The most common solution is to use the Idempotent Receiver pattern. In short, you
need to define your messages so that the receiver has enough information to recognize the request as something that has already been done.
Ideally, this is being supported at the business level.
Idempotent collections of values are often straight forward; we just need to be thinking sets, rather than lists.
Idempotent collections of entities are trickier; if the request includes an identifier for the new entity, or if we can compute one from the data provided, then we can think of our collection as a hash.
If none of those approaches fits, then there's another possibility. Instead of performing an idempotent mutation of the collection, we make the mutation of the collection itself idempotent. Think "compare and swap" - we encode into the request information that identifies the current state of the collection; is that state is still current when the request arrives, then the mutation is applied. If the condition does not hold, then the request becomes a no-op.
Translating this into HTTP, we make a small modification to the protocol for updating the collection resource. First, we GET the current representation; and in the meta data the server provides validators that can be used in subsquent requests. Having obtained the validator, the client evaluates the current representation of the resource to determine if it needs to be changed. If the client decides to make a change, then submits the change with an If-Match or an If-Unmodified-Since header including the validator. The server, before processing the requests, then considers the validator, immediately abandoning the request with 412 Precondition Failed.
Thus, if a conditional state-changing request is lost, the client can at its own discretion repeat the request without concern that server will misunderstand the client's intent.

Retry it a limited number of times, with increasing delays between the attempts, and make sure the transaction concerned is idempotent.
because retrying will just create a duplicate (which is bad).
It is indeed, and it needs fixing, see above. It should be impossible in your system to create two entries with the same attributes. This is easily accomplished at the database level. You can attain idempotence by having the transaction return the same thing whether the entry already existed or was newly created. Or else just have it return EXISTS if the entry already exists, and adjust your client accordingly.

Which verb to use for a REST request which sends data and gets data back?

Search - request contains query parameters e.g. search term and pagination values. No changes/data is persisted to backend.
I currently use GET with query parameters here.
Data conversion - request contains data in format A and server sends data in format B. No changes/data is persisted to backend.
I currently use POST with request parameters here.

For your Data Conversion use case (which seems to be more of a function that working with a representation of something on the server), the answer is more grounded in higher-level HTTP verb principles than RESTful principles. Both cases are non-idempotent: they make no changes to the server, so GET should be used.
This question has a good discussion of the topic, especially this comment:
REST and function don't go well together. If an URL contains function, method, or command, I smell RPC – user1907906

Search - request contains query parameters e.g. search term and pagination values. No changes/data is persisted to backend.
If the request is supposed to generate no changes on the back end, then you are describing a request which is safe, so you should choose the most suitable safe method - GET if you care about the representation, HEAD if you only care about the meta data.
Data conversion - request contains data in format A and server sends data in format B. No changes/data is persisted to backend.
Unless you can cram the source representation into the URL, POST is your only reasonable choice here. There is no method in HTTP for "this is a safe method with a payload".
In practice, you could perhaps get away with using PUT rather than POST -- it's an abuse of the uniform interface, but one that allows you to communicate at least the fact that the semantics are idempotent. The key loophole is:
there is no guarantee that such a state change will be observable, since the target resource might be acted upon by other user agents in parallel, or might be subject to dynamic processing by the origin server, before any subsequent GET is received. A successful response only implies that the user agent's intent was achieved at the time of its processing by the origin server.

High Scale REST API

One of our REST APIs will cause a long-running process to execute. Rather than have the client wait for a long time, we would prefer to return an immediate response.
So, let's consider this use case: An applicant submits an application, for which there will be an eventual result. Since this is a very high-scale platform, we cannot persist the application to storage, but must place it onto a queue for processing.
In this situation, is it acceptable practice to return the URI where the application will eventually live, such as http://example.com/application/abc123?
Similarly, would it be acceptable practice to return the URI of the result document, which represents the decision regarding the application, as part of the representation of the application resource? The result document will not be created for some minutes, and an HTTP GET to its URI (or the URI of the application for that matter) will result in a 404 until they are persisted.
What is the best practice in this kind of situation? Is it acceptable to hand out "future" URIs for resources?

I don't see anything wrong with such design, but have a closer look at the list of HTTP status codes for better responses. IMHO the first request should return 202 Accepted:
The request has been accepted for processing, but the processing has not been completed.
while requests to the URL where the result will eventually be should in the meantime return 204 No Content (?):
The server successfully processed the request, but is not returning any content
And of course it should eventually return 200 OK when processing finishes.

From "RESTful Web Services Cookbook"
Problem
You want to know how to provide resource abstractions for
tasks such as performing computations or validating data.
Solution
Treat the processing function as a resource, and use HTTP GET to fetch
a representation containing the output of the processing function. Use
query parameters to supply inputs to the processing function.
This entails just GET requests on a URI that represents the processing function. Your example 'http://example.com/application/abc123' URI. When returning a response you would include what information you have by now and use HTTP codes to indicate the status of the processing as already suggested by Tomasz.
However..., you should not use this approach, if the subsequent application processing stores or modifies data in any way.
GET requests should never have side effects. If the submittal of the application leads in anyway (even if only after being processed in from queue) to new information / data being stored, you should use a PUT or a POST request with the application's data in the request's body. See "Why shouldn't data be modified on an HTTP GET request?" form more information.
If they application's submittal stores or modifies data, use the pattern for asynchronous processing: a POST or PUT request with the application's details.
For example
POST http://example.com/applications
which returns "201 Created" with the URI of the new application resource.
or
PUT http://example.com/applications/abc123
which returns "201 Created" and
Both would also return any resource information that is already known at that time.
You can then safely perform GET requests on the URI of the new resource as they now only retrieve data - the results of the application processing so far - and no data is stored or modified as a result of the GET.
To indicate the application's processing progress, the GET request can either return some specific status code in the response (queued, processing, accepted, rejected), and/or use the HTTP response codes. In either case a "200 OK" should only be returned when the application's processing is complete.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse