High Scale REST API - rest

One of our REST APIs will cause a long-running process to execute. Rather than have the client wait for a long time, we would prefer to return an immediate response.
So, let's consider this use case: An applicant submits an application, for which there will be an eventual result. Since this is a very high-scale platform, we cannot persist the application to storage, but must place it onto a queue for processing.
In this situation, is it acceptable practice to return the URI where the application will eventually live, such as http://example.com/application/abc123?
Similarly, would it be acceptable practice to return the URI of the result document, which represents the decision regarding the application, as part of the representation of the application resource? The result document will not be created for some minutes, and an HTTP GET to its URI (or the URI of the application for that matter) will result in a 404 until they are persisted.
What is the best practice in this kind of situation? Is it acceptable to hand out "future" URIs for resources?

I don't see anything wrong with such design, but have a closer look at the list of HTTP status codes for better responses. IMHO the first request should return 202 Accepted:
The request has been accepted for processing, but the processing has not been completed.
while requests to the URL where the result will eventually be should in the meantime return 204 No Content (?):
The server successfully processed the request, but is not returning any content
And of course it should eventually return 200 OK when processing finishes.

From "RESTful Web Services Cookbook"
Problem
You want to know how to provide resource abstractions for
tasks such as performing computations or validating data.
Solution
Treat the processing function as a resource, and use HTTP GET to fetch
a representation containing the output of the processing function. Use
query parameters to supply inputs to the processing function.
This entails just GET requests on a URI that represents the processing function. Your example 'http://example.com/application/abc123' URI. When returning a response you would include what information you have by now and use HTTP codes to indicate the status of the processing as already suggested by Tomasz.
However..., you should not use this approach, if the subsequent application processing stores or modifies data in any way.
GET requests should never have side effects. If the submittal of the application leads in anyway (even if only after being processed in from queue) to new information / data being stored, you should use a PUT or a POST request with the application's data in the request's body. See "Why shouldn't data be modified on an HTTP GET request?" form more information.
If they application's submittal stores or modifies data, use the pattern for asynchronous processing: a POST or PUT request with the application's details.
For example
POST http://example.com/applications
which returns "201 Created" with the URI of the new application resource.
or
PUT http://example.com/applications/abc123
which returns "201 Created" and
Both would also return any resource information that is already known at that time.
You can then safely perform GET requests on the URI of the new resource as they now only retrieve data - the results of the application processing so far - and no data is stored or modified as a result of the GET.
To indicate the application's processing progress, the GET request can either return some specific status code in the response (queued, processing, accepted, rejected), and/or use the HTTP response codes. In either case a "200 OK" should only be returned when the application's processing is complete.

Related

What is the RESTFul way of representing URLs when creating the same object using a single URL structure but can call different server methods?

Say I have an endpoint for creating an object that takes time to complete which has the following URL structure: /api/v1/objects.
Business rules dictate that API consumers can either call this endpoint synchronously or asynchronously in the server. The result of this call in the end is the creation of the same object, but response upon submission of the endpoint is different between the two depending on whether it is called synchronously or asynchronously (i.e. if the call is made asynchronously, a consumer may get an identifier with no guarantee if the object will be created or not, while calling the endpoint synchronously will always create the object and return it in the response.)
Right now I have this structure for distinguishing between the synchronous and asynchronous API calls:
POST /api/v1/objects - for creating the object synchronously.
POST /api/v1/objects?async=true - for creating the object
asynchronously.
Is this approach correct and conforms to RESTFul principles?
If you look at the RFC for HTTP (see https://www.rfc-editor.org/rfc/rfc7231#section-5.1.1 ) you can see the Expect header, which basically defines what behaviour the client is expecting from the server.
Considering you can have different 20* responses (200, 201, 202, 204), you can use this to determine whether it should be async or not.
A quick Google search comes up with this example (which I copy and paste here for future reference):
A client may request the server to modify its asynchronous behavior
with the following “Expect” headers:
“Expect: 200-ok/201-created/204-no-content” disables all asynchronous
functionality. The server may return a “417 Expectation Failed” if it
is not willing to wait for an operation to complete.
“Expect: 202-accepted” explicitly request an asynchronous response. The server
may return a “417 Expectation Failed” if it is not willing to perform
the request asynchronously. If no expectation is provided, client must
be prepared to accept a 202 Accepted status for any request other than
GET.

REST Check if resource exists, how to handle on server side?

how to handle resource checking on server side?
For example, my api looks like:
/books/{id}
After googling i found, that i should use HEAD method to check, if resource exists.
https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
I know, that i can use GET endpoint and use HEAD method to fetch information about resource and server does not return body in this case.
But what should i do on server side?
I have two options.
One endpoint marked as GET. I this endpoint i can use GET method to fetch data and HEAD to check if resource is available.
Two endpoints. One marked as GET, second as HEAD.
Why i'm considering second solution?
Let's assume, that GET request fetch some data from database and process them in some way which takes some time, eg. 10 ms
But what i actually need is only to check if data exists in database. So i can run query like
select count(*) from BOOK where id = :id
and immediately return status 200 if result of query is equal to 1. In this case i don't need to process data so i get a faster response time.
But... resource in REST is a object which is transmitted via HTTP, so maybe i should do processing data but not return them when i use HEAD method?
Thanks in advance for your answer!
You could simply delegate the HEAD handler to the existing GET handler and return the status code and headers only (ignoring the response payload).
That's what some frameworks such as Spring MVC and JAX-RS do.
See the following quote from the Spring MVC documentation:
#GetMapping — and also #RequestMapping(method=HttpMethod.GET), are implicitly mapped to and also support HTTP HEAD. An HTTP HEAD request is processed as if it were HTTP GET except but instead of writing the body, the number of bytes are counted and the "
Content-Length header set.
[...]
#RequestMapping method can be explicitly mapped to HTTP HEAD and HTTP OPTIONS, but that is not necessary in the common case.
And see the following quote from the JAX-RS documentation:
HEAD and OPTIONS requests receive additional automated support. On receipt of a HEAD request an implementation MUST either:
Call a method annotated with a request method designator for HEAD or, if none present,
Call a method annotated with a request method designator for GET and discard any returned entity.
Note that option 2 may result in reduced performance where entity creation is significant.
Note: Don't use the old RFC 2616 as reference anymore. It was obsoleted by a new set of RFCs: 7230-7235. For the semantics of the HTTP protocol, refer to the RFC 7231.
Endpoint should be the same and server side script should make decision what to do based on method. If method is HEAD, then just return suitable HTTP code:
204 if content exists but server don't return it
404 if not exists
4xx or 5xx on other error
If method is GET, then process request and return content with HTTP code:
200 if content exists and server return it
404 if not exists
4xx or 5xx on other error
The important thing is that URL should be the same, just method should be different. If URL will be different then we talking about different resources in REST context.
Your reference for HTTP methods is out of date; you should be referencing RFC 7231, section 4.3.2
The HEAD method is identical to GET except that the server MUST NOT send a message body in the response (i.e., the response terminates at the end of the header section).
This method can be used for obtaining metadata about the selected representation without transferring the representation data and is often used for testing hypertext links for validity, accessibility, and recent modification.
You asked:
resource in REST is a object which is transmitted via HTTP, so maybe i should do processing data but not return them when i use HEAD method?
That's right - the primary difference between GET and HEAD is whether the server returns a message-body as part of the response.
But what i actually need is only to check if data exists in database.
My suggestion would be to use a new resource for that. "Resources" are about making your database look like a web site. It's perfectly normal in REST to have many URI that map to a queries that use the same predicate.
Jim Webber put it this way:
The web is not your domain, it's a document management system. All the HTTP verbs apply to the document management domain. URIs do NOT map onto domain objects - that violates encapsulation. Work (ex: issuing commands to the domain model) is a side effect of managing resources. In other words, the resources are part of the anti-corruption layer. You should expect to have many many more resources in your integration domain than you do business objects in your business domain.

Idempotentency of GET verb in an RESTful API

As it was mentioned here https://restfulapi.net/http-methods/ (and in other places as well):
GET APIs should be idempotent, which means that making multiple
identical requests must produce same result everytime until another
API (POST or PUT) has changed the state of resource on server.
How to make this true in an API that return time for example? or that return data that is affected by time.
In other words, each time I use GET http://ip:port/get-time-now/, it is going to return a different response. However, I did not send any POST or PUT between two sequenced GET's
Does this make the previous statement wrong? Did I misunderstand something?
Idempotency is a promise to clients/intermediaries that the request can be reissued in case of network failures or the like without any further considerations and not so much that the data will never change.
If you take a POST request for example, in case of a network failure you do not know if the previous request reached the server but the response got lost midway or if the initial request didn't even reach the server at all. If you re-issue the request you might create a further resource actually, hence POST is not idempotent. PUT on the other side has the contract that it replaces the current representation with the one contained in the request. If you send the same request twice the content of the resource should be the same after any of the two PUT requests was processed. Note that the actual result can still differ as the service is free to modify the received entity to a corresponding representation. Also, between sending the data via PUT and retrieving it via GET a further client could have updated the state in between, so there is no guarantee that you will actually receive the exact representation you've sent to the service.
Safetiness is an other promise that only GET, HEAD and OPTIONS supports. It promises the invoker that it wont modify any state at all hence clients/intermediaries are safe on issuing such request without having to fear that it will modify any state. In practice this is an important promise to crawlers which blindly invoke any URLs in order to learn their content. In case of violating such promises, i.e. by deleting data while processing a GET request the only one to blame is the service implementor but not the invoker. If a crawler invokes such URLs and hence removes some data it is not the crawlers fault actually but only the service implementor.
As you have a dynamic value in your response, you might want to prevent caching of responses though as otherwise intermediaries might return an old state for your resource
The main basic concept of idempotent and safe methods of HTTP:-
Idempotent Method:- The method can called multiple times with same input and it produce same result.
Safe Method:- The method can called multiple times with same input and it doesn't modify the resource onto the server side.
Http methods are categorized into following 3 groups-
GET,HEAD,OPTIONS are safe and idempotent
PUT,DELETE are not safe but idempotent
POST,PATCH are neither safe & nor idempotent

Which verb to use for a REST request which sends data and gets data back?

Search - request contains query parameters e.g. search term and pagination values. No changes/data is persisted to backend.
I currently use GET with query parameters here.
Data conversion - request contains data in format A and server sends data in format B. No changes/data is persisted to backend.
I currently use POST with request parameters here.
For your Data Conversion use case (which seems to be more of a function that working with a representation of something on the server), the answer is more grounded in higher-level HTTP verb principles than RESTful principles. Both cases are non-idempotent: they make no changes to the server, so GET should be used.
This question has a good discussion of the topic, especially this comment:
REST and function don't go well together. If an URL contains function, method, or command, I smell RPC – user1907906
Search - request contains query parameters e.g. search term and pagination values. No changes/data is persisted to backend.
If the request is supposed to generate no changes on the back end, then you are describing a request which is safe, so you should choose the most suitable safe method - GET if you care about the representation, HEAD if you only care about the meta data.
Data conversion - request contains data in format A and server sends data in format B. No changes/data is persisted to backend.
Unless you can cram the source representation into the URL, POST is your only reasonable choice here. There is no method in HTTP for "this is a safe method with a payload".
In practice, you could perhaps get away with using PUT rather than POST -- it's an abuse of the uniform interface, but one that allows you to communicate at least the fact that the semantics are idempotent. The key loophole is:
there is no guarantee that such a state change will be observable, since the target resource might be acted upon by other user agents in parallel, or might be subject to dynamic processing by the origin server, before any subsequent GET is received. A successful response only implies that the user agent's intent was achieved at the time of its processing by the origin server.

Batch processing via REST service

Are there any best practices for performing BATCH operations via REST for POST, PUT, PATCH verbs?
The current paradigm I am following is that the JSON payload is specified in the body for all 3 operations:
a) POST to return the location of the created resource
b) PUT / PATCH return a 201 if the update is successful
For a batch operation, I intend to accept a collection of JSON objects in the payload body but am trying to figure what to return to the client.
While processing the batch, the operation may succeed for some of the items but might fail for others.
Taking this into account, my take is that the best thing to do is to return a collection of objects indicating the Success/Failure status of each item from the payload.
But this deviates from my paradigm outlined in (a) and (b) above.
Instead, does it make sense to return an identifier representing an ID of the Batch operation itself to the client?
The client would then issue a subsequent GET to get the result of the operation it requested.
Does this approach sound reasonable? If so, does it make sense to block the client on the subsequent GET if the operation hasn't completed OR does it make sense to always return the most current state i.e. a collection of responses for each of the items that the client requested to process.
Ideas/Thoughts/Suggestions?
Since REST is an architectural style with no necessarily clear "guidelines" and no
mandate on how the actions for HTTP verbs should be implements, clearly there is no right or wrong answer here.
I am looking for a solution that is elegant, natural and intuitive.
REST operations are supposed to be atomic as seen from the outside. That is, if one part of the request fails, then the whole state of the server should revert to the pre-request state and a 4xx or 5xx response returned (so, for example, the request could be repeated in whole without ill effects if it failed the first time). This however has nothing to do with batch operations per se—such a request could be any kind of request.
Batch operations violate a different REST constraint, that of the uniform interface (defined by HTTP's methods and their operation upon a resource at the specified URL).
If you want to do batch operations, give up on trying to call your API RESTful, because you have already lost out on the benefits that REST imparts, and are just lying to yourself.
If you want to retain those benefits, give up on batch operations.
REST is an architectural style.
I implement APIs to always return the result of what was updated. So in the case of a POST it will return the created entity, with PATCH and PUT it will return the updated entity.
Depending on the batch size I would either return an array of what was processed, or alternatively an array of identifiers of what was processed.
If the batch operation is long running return an identifier for the batch, but make it painfully clear that the endpoint is different from other endpoints
ex. if you create a user by posting to
http://somesite.com/users
send batch requests to
http://somesite.com/batch/users
On a get return the status of the batch operation while it is still running, on complete return an array of record that were updated.
The most important thing is consistency, whatever you choose, always follow the same approach with batch operation throughout the system.