HTTP Response 412 - can you include content?

I am building a RESTful data store and leveraging conditional GET and PUT. During a conditional PUT, the client can include the ETag from a previous GET on the resource, and if the current representation doesn't match, the server returns HTTP status code 412 (Precondition Failed). Note that this is an Atom-based server/protocol.
My question is: when I return the 412 status, can I also include the new representation of the resource, or must the user issue a new GET? The HTTP spec doesn't seem to say yes or no, and neither does the Atom spec (although its example shows an empty entity body in the response). It seems pretty wasteful not to return the new representation and make the client GET it explicitly. Thoughts?

Although conditional GETs and PUTs are both summarized as 'conditional requests', they are very different conceptually: conditional GETs are a performance optimization, and conditional PUTs are a concurrency control mechanism. It is hard to discuss them together.
Regarding the conditional GET: if you send a GET with an If-None-Match header, the server will send 200 OK if the resource has changed and 304 Not Modified if it has not (that is, if the condition failed). A failed If-None-Match on a GET never yields 412; 412 is what a conditional PUT (with If-Match) gets when its precondition fails.
UPDATE: It seems I misread the question slightly. Regarding refreshing the local copy after a failed conditional PUT: a cache may well already hold the newest version, in which case your refresh GET will be served from that cache. Having the server return the current entity with the 412 might actually give you worse performance.
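A minimal sketch of the distinction drawn in this answer, assuming a single in-memory resource with one strong ETag (real If-None-Match and If-Match headers can also carry lists, weak tags, and `*`, which this ignores). `Resource`, `conditional_get`, and `conditional_put` are hypothetical names: the GET path can only save bandwidth, while the PUT path is where 412 comes from.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Resource:
    body: str
    etag: str  # a single strong ETag, e.g. '"v1"'


def conditional_get(res: Resource, if_none_match: Optional[str]) -> Tuple[int, str]:
    """Conditional GET: a performance optimization."""
    if if_none_match is not None and if_none_match == res.etag:
        return 304, ""  # client's copy is current: no body is sent
    return 200, res.body  # changed (or no condition): send the representation


def conditional_put(
    res: Resource, if_match: Optional[str], new_body: str, new_etag: str
) -> Tuple[int, Resource]:
    """Conditional PUT: a concurrency control mechanism."""
    if if_match is not None and if_match != res.etag:
        return 412, res  # Precondition Failed: stored state is left unchanged
    return 200, Resource(new_body, new_etag)
```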

No, technically you should not. Error codes generally signal that something has gone wrong. Although nothing stops you from returning content (in fact, some errors like a 404 return a friendly page saying that what you were looking for wasn't found), the point of the response is not to return other content but to tell you what went wrong. Technically you also should not return that data because you passed If-None-Match: etag (I'm assuming that's what you passed?).
On another note, do you really need to optimize away one additional HTTP call?
The more I think about this, the more I'm convinced it's a bad idea. Are you going to return the content on any other errors? PUT semantics are to PUT; GET semantics should be used for GET.

If the extra requests caused by re-fetching after update conflicts are numerous enough to raise performance concerns, then I would suggest you might have issues with the granularity of your resources.
Do you really expect multiple users to be editing the same resource simultaneously millions of times a day? Maybe you need to store delta changes to the resource instead of updating the resource directly. If there really is that much contention for these resources, aren't users going to be constantly working on out-of-date data?
If your problem were that your resource contains the last-modified date and last-modified user, so that you had to do a GET after every PUT, then I would be more convinced of the need to twist the rules.
However, I think the performance hit of the extra request is worth it for the clarity to the client developer.
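To see where the extra request comes from, here is a hypothetical client loop for optimistic concurrency: GET, edit, PUT with If-Match, and on 412 start over from a fresh GET. `FakeServer` is an in-memory stand-in invented for illustration, not part of any Atom library, and its 412 carries no body, as in the Atom spec's example.

```python
from typing import Callable, Tuple


class FakeServer:
    """Hypothetical in-memory stand-in for the data store (not a real API)."""

    def __init__(self, body: str) -> None:
        self.body = body
        self.version = 1
        self.requests = 0  # round trips made by clients

    def etag(self) -> str:
        return f'"v{self.version}"'

    def get(self) -> Tuple[str, str]:
        self.requests += 1
        return self.body, self.etag()

    def put(self, body: str, if_match: str) -> int:
        self.requests += 1
        if if_match != self.etag():
            return 412  # someone else changed it first; state is unchanged
        self.body = body
        self.version += 1
        return 200


def update_with_retry(
    server: FakeServer, edit: Callable[[str], str], max_attempts: int = 3
) -> bool:
    """GET, apply the edit, PUT with If-Match; on 412, start over with a fresh GET."""
    for _ in range(max_attempts):
        body, etag = server.get()  # after a conflict, this is the "extra" request
        if server.put(edit(body), if_match=etag) == 200:
            return True
    return False
```

The retry re-applies the user's edit to the fresh representation, which is exactly the point where a client developer decides how to merge; that decision is hard to make if the 412 response silently doubles as a GET.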

Related

What HTTP status code should I use for a REST response that only contains part of the answer?

Say I am building an API that serves the content of old magazines. The client can request the content of the latest edition prior to a given date, something like:
GET /magazines/<some_magazine_id>?date=2015-03-15
The response includes general data about the magazine, such as its name, country of distribution, and date of creation, and, if my database has content for an edition prior to the given date, it returns that as well.
What HTTP status code should I use if the client requests data for a date before anything I have available? I might have data in the future, when I expand the content of my database. So this is sort of a temporary failure, but it's unclear how long it may take before it is fixed.
Based on other questions, I feel like:
404 is not right: in some cases, I have no data at all about a magazine, in which case I'd return a 404. But this is different. I would like the user to get the partial data, but with an indication that it's only partial.
4xx are client-side errors. I feel like the client didn't do anything wrong.
206 Partial Content seems meant for returning a requested range of the content rather than all of it.
30x I thought about using a 302 or similar, and point to the closest edition available, but again, I am not sure that this is right, because I am now pointing to something semantically different from the question asked.
5xx would be errors, and I think should not contain any data.
My best guess would be something like a 2xx No Details Available (Yet) indicating that the request was "somewhat successful", but I can't find anything that seems correct in the list.
I would go with a 200 OK. You did find the magazine and you are returning data about it. While your data is not as complete as it might have been, it is a complete response that can be understood. Presumably you are returning an empty array or a nil reference where the edition(s) would have been?
The problem with many of the more specific responses is that they are really intended for something more low-level. You are not returning partial content; you are returning the full content. It is just that the higher-level application data is not as complete as you might have wished (no old edition found). On the REST/HTTP level, the response is just fine.
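A sketch of the 200 OK approach, assuming a JSON API backed by an in-memory dictionary; `MAGAZINES`, `get_magazine`, and the field names are all hypothetical. An unknown magazine gets 404, while a known magazine with no edition before the date still gets 200, with `edition` set to null.

```python
import json
from datetime import date
from typing import Optional, Tuple

# Hypothetical data; field names are illustrative assumptions.
MAGAZINES = {
    "m1": {
        "name": "Example Monthly",
        "country": "FR",
        "created": "1950-01-01",
        "editions": [
            {"date": "2001-05-01", "content": "May 2001 issue"},
            {"date": "2010-09-01", "content": "September 2010 issue"},
        ],
    },
}


def get_magazine(magazine_id: str, before: date) -> Tuple[int, str]:
    mag = MAGAZINES.get(magazine_id)
    if mag is None:
        return 404, json.dumps({"error": "unknown magazine"})  # nothing known at all
    # Editions strictly before the requested date; ISO date strings sort chronologically.
    eligible = [e for e in mag["editions"] if date.fromisoformat(e["date"]) < before]
    latest: Optional[dict] = max(eligible, key=lambda e: e["date"]) if eligible else None
    body = {key: mag[key] for key in ("name", "country", "created")}
    body["edition"] = latest  # JSON null when no edition is available (yet)
    return 200, json.dumps(body)
```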

REST API: How to deal with processing logic

I read (among others) the following blog about API design: https://www.thoughtworks.com/insights/blog/rest-api-design-resource-modeling. It helped me to better understand a lot of aspects, but I have one question remaining:
How do I deal with functionality that processes some data and gives a response directly? Think of verbs like translate, calculate, or enrich. Which noun should they have, and should they be called by GET, PUT, or POST?
P.S. If it should be GET, how do I deal with the maximum length of a GET request?
This is really a discussion about naming more so than functionality. It's very much possible to have processing logic in your API; you just need to be careful about naming it.
Imaginary API time. It's got this resource: /v1/probe/{ID}, and it responds to GET, POST, and DELETE.
Let's say we want to launch our probes and then have each probe give us back the calculated flux variation of something it's observing (totally made-up thing). While it isn't a real thing, let's say that this has to be calculated on the fly. One of my intrepid teammates decides to plunk the calculation at GET /v1/probe/1324/calculateflux.
If we're following real RESTful practices... Oops. Suddenly we're not dealing with a noun, are we? If we have GET /v1/probe/1324/calculateflux, we've broken RESTful practices because we're now asking for a verb: calculateflux.
So, how do we deal with this?
You'll want to reconsider the name calculateflux. It's no good: it doesn't name a resource on the probe. In this case, /v1/probe/1324/fluxvalue is a better name, and /v1/probe/1324/flux works too.
Why?
RESTful APIs almost exclusively use nouns in their URIs; remember that each URI needs to describe a specific thing you can GET, POST, PUT, DELETE, or whatever. That means that any time there is a processed value, we should name the resource after the processed (or calculated) value. This way we remain RESTful: the resource always reflects current data (we can recalculate the flux value at any time), and we haven't changed the state of the probe (GET didn't save any values).
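A minimal sketch of the noun-based naming in Python. The probe store, the `handle` function, and the "flux variation" formula (max minus min of stored readings) are all invented for illustration; the point is that `GET /v1/probe/{id}/flux` computes the value on each request without writing anything back.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical probe store: raw readings are kept, the flux value never is.
PROBES: Dict[str, List[float]] = {"1324": [10.0, 12.5, 11.0, 14.5]}


def flux_variation(readings: List[float]) -> float:
    """Made-up 'flux variation': spread of the readings (illustrative only)."""
    return max(readings) - min(readings)


def handle(method: str, path: str) -> Tuple[int, str]:
    # Noun-based URI: /v1/probe/{id}/flux names the computed value, not an action.
    match = re.fullmatch(r"/v1/probe/(\w+)/flux", path)
    if match is None:
        return 404, "not found"
    if method != "GET":
        return 405, "method not allowed"
    readings = PROBES.get(match.group(1))
    if readings is None:
        return 404, "no such probe"
    # Computed fresh on every request; the probe's stored state is left untouched.
    return 200, str(flux_variation(readings))
```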
Here is what I know about this:
GET // Returns data, and only returns it
DELETE // Deletes
POST // Sends information to be processed on the server
PUT // Updates information
This scheme is for the Laravel framework. I recommend reading the reference below.
Ref:
https://rafaell-lycan.com/2015/construindo-restful-api-laravel-parte-1/
You should start with the following process:
Identify the resources (nouns) in your system.
They should all respond to GET.
Let's take your translation example. You could decide that every word in the source language is a resource. This would give:
http://example.com/translations/en-fr/hello
Which might return:
Content-Type: text/plain
Content-Language: fr
bonjour
If your processes are long-running, you should create a request queue that clients can POST to, and provide them with another (new) resource that they can query to see if the process has completed.
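One way to sketch that queue pattern, assuming an in-memory job table: POST creates a job and answers 202 Accepted with a Location header, and the client polls the job resource until it reports `done`. The `/translation-jobs/{id}` URI, the `TranslationQueue` class, and the uppercase "translation" are placeholders, not a real service.

```python
import itertools
from typing import Dict, Tuple


class TranslationQueue:
    """In-memory sketch of 'POST to a queue, then poll a job resource'."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.jobs: Dict[str, dict] = {}

    def post(self, text: str) -> Tuple[int, Dict[str, str]]:
        job_id = str(next(self._ids))
        self.jobs[job_id] = {"status": "pending", "input": text, "result": None}
        # 202 Accepted: queued, not done; Location names the new job resource.
        return 202, {"Location": f"/translation-jobs/{job_id}"}

    def get(self, job_id: str) -> Tuple[int, dict]:
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {}
        return 200, {"status": job["status"], "result": job["result"]}

    def run_worker_once(self) -> None:
        """Stand-in for a background worker processing pending jobs."""
        for job in self.jobs.values():
            if job["status"] == "pending":
                job["result"] = job["input"].upper()  # placeholder for real processing
                job["status"] = "done"
```

Since the input travels in the POST body, this pattern also sidesteps the GET length limit raised in the question's P.S.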

What status code should I return when part of a bulk update fails?

Let's say I'm a server responding to a request to do a bulk create of some entity. Let's say that I've also decided that if one instance of the entity can't be created, due to a server error or user error, I will still create the other entity instances. In this scenario, what should I return? A 201 because I created most of the entities in the request? Or a 4xx/5xx since there was an error while creating one of the entities?
If you return a 4xx code, it implies that the entire request has failed and the server state has not changed.
If the intent of the request is to do 'one or more things, some of which may fail', then a partial application is still a success, which puts it in the 2xx range.
206 is not a good idea. It is specifically for requests that use Range, which is not the case here.
207 could be used. You'll probably want to define a custom format instead of the default XML-based one. My vote would probably just go to 200.
Also, consider just doing many requests. Requests are cheap, so why lump them together? Then each request can have its own beautiful, accurate status code.
In that case, you could return a multi-status (207) response in which you report the result for each batch entry. That way the client has complete awareness of the results. However, that type of HTTP status involves more complex processing on the client side.
I think 201 is not a good choice because it means "Created", which implies everything went fine. That is not the case for you. Maybe 206 is a good idea, which means "Partial Content"; according to Wikipedia, "The server is delivering only part of the resource..."
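A sketch of per-entry reporting in the spirit of 207 Multi-Status, using a custom JSON body (as an earlier answer suggests) rather than the default WebDAV XML. `bulk_create`, its validation rules, and the `/entities/{name}` URIs are hypothetical.

```python
import json
from typing import List, Tuple


def bulk_create(items: List[dict], store: dict) -> Tuple[int, str]:
    """Create each item independently and report a status per entry."""
    results = []
    for index, item in enumerate(items):
        name = item.get("name")
        if not name:
            results.append({"index": index, "status": 400, "error": "name is required"})
        elif name in store:
            results.append({"index": index, "status": 409, "error": "already exists"})
        else:
            store[name] = item
            results.append({"index": index, "status": 201, "location": f"/entities/{name}"})
    # Overall status says "look inside"; each entry carries its own outcome.
    return 207, json.dumps({"results": results})
```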

Is it valid to modify a REST API representation based on a If-Modified-Since header?

I want to implement a "get changed values" capability in my API. For example, say I have the following REST API call:
GET /ws/school/7/student
This gets all the students in school #7. Unfortunately, this may be a lot. So, I want to modify the API to return only the student records that have been modified since a certain time. (The use case is that a nightly process runs from another system to pull all the students from my system to theirs.)
I see http://blog.mugunthkumar.com/articles/restful-api-server-doing-it-the-right-way-part-2/ recommends using the If-Modified-Since header and returning a representation as follows:
Search for all students updated since the time given in the If-Modified-Since header
If there are any, return those students with a 200 OK
If there are no students returned from that query, return a 304 Not Modified
I understand what he wants to do, but this seems the wrong way to go about it. The definition of the If-Modified-Since header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.24) says:
The If-Modified-Since request-header field is used with a method to make it conditional: if the requested variant has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 (not modified) response will be returned without any message-body.
This seems wrong to me. We would not be returning the representation or a 304 as indicated by the RFC, but some hybrid. It seems like client side code (or worse, a web cache between server and client) might misinterpret the meaning and replace the local cached value, when it should really just be updating it.
So, two questions:
Is this a correct use of the header?
If not (and I suspect not), what is the best practice? Query string parameter?
This is not the correct use of the header. The If-Modified-Since header is one which an HTTP client (browser or code) may optionally supply to the server when requesting a resource. If supplied, the meaning is "I want resource X, but only if it's changed since time T." Its purpose is to allow client-side caching of resources.
The semantics of your proposed usage are "I want updates for collection X that happened since time T." It's a request for a subset of X. It does not seem like your motivation is to enable caching. Your client-side cached representation seemingly contains all of X, even though the typical request will only return you a small set of changes to X; that is, the response is not what you are directly caching, so the caching needs to happen in custom user logic client-side.
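For contrast, the header's actual semantics are all-or-nothing: either the full representation with 200, or 304 with no body. A sketch using the standard library's `email.utils` date helpers; `get_with_ims` is a hypothetical name, and the comparison is truncated to whole seconds because HTTP dates have one-second resolution.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def get_with_ims(
    body: str, last_modified: datetime, if_modified_since: Optional[str]
) -> Tuple[int, Dict[str, str], str]:
    """Whole representation (200) or nothing (304); never a filtered subset.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since: Optional[datetime] = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an invalid date means the header is ignored
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so compare whole seconds.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, ""  # not modified: no body at all
    return 200, headers, body
```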
A query string parameter is a much more appropriate solution. Below, {seq} would be something like a sequence number or timestamp.
GET /ws/schools/7/students/updates?since={seq}
Server-side, I imagine you have a sequence of updates since the beginning of your system, and a request of the above form would grab the first N updates with a sequence value greater than {seq}. This way, if a client ever fell far behind and needed to catch up, the results would be paged.
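A sketch of that paged `since` query, assuming the update log is an in-memory list sorted by sequence number; `updates_since`, the tuple layout, and the page size of 2 are illustrative assumptions. The response links to a next page only when more updates remain.

```python
from typing import List, Optional, Tuple

# Hypothetical update log: (sequence number, student id, change), ascending by seq.
UPDATE_LOG: List[Tuple[int, str, str]] = [
    (1, "s1", "created"),
    (2, "s2", "created"),
    (3, "s1", "renamed"),
    (4, "s3", "created"),
]


def updates_since(since: int, page_size: int = 2) -> Tuple[List[dict], Optional[str]]:
    """Body for GET /ws/schools/7/students/updates?since={seq}, one page at a time."""
    newer = [u for u in UPDATE_LOG if u[0] > since]
    page = newer[:page_size]
    items = [{"seq": seq, "student": sid, "change": change} for seq, sid, change in page]
    next_link = None
    if len(newer) > page_size:  # more remain: continue from the last seq on this page
        next_link = f"/ws/schools/7/students/updates?since={page[-1][0]}"
    return items, next_link
```

A client that falls behind simply follows `next_link` until it is `None`, then stores the last `seq` it saw for the next nightly run.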

Why do we need anything more than HTTP GET, PUT, POST?

What is the practical benefit of using HTTP GET, PUT, DELETE, POST, HEAD? Why not focus on their behavioral benefits (safety and idempotency), forgetting their names, and use GET, PUT or POST depending on which behavior we want?
Why shouldn't we only use GET, PUT and POST (and drop HEAD, DELETE)?
The REST approach uses POST, GET, PUT and DELETE to implement the CRUD rules for a web resource. It's a simple and tidy way to expose objects to requests on the web. It's web services without the overheads.
Just to clarify the semantic differences. Each operation is rather different. The point is to have nice HTTP methods that have clear, distinct meanings.
POST creates new objects. The URI has no key; it accepts a message body that defines the object. SQL Insert. [Edit: While there's no technical reason for POST to have no key, the REST folks strongly suggest that for POST to have a distinct meaning as CREATE, it should not have a key.]
GET retrieves existing objects. The URI may have a key, depends on whether you are doing singleton GET or list GET. SQL Select
PUT updates an existing object. The URI has a key; It accepts a message body that updates an object. SQL Update.
DELETE deletes an existing object. The URI has a key. SQL Delete.
Can you update a record with POST instead of PUT? Not without introducing some ambiguity. Verbs should have unambiguous effects. Further, POST URIs have no key, whereas PUT must have a key.
When I POST, I expect a 201 CREATED. If I don't get that, something's wrong. Similarly, when I PUT, I expect a 200 OK. If I don't get that, something's wrong.
I suppose you could insist on some ambiguity where POST does either POST or PUT. The URI has to be different; also the associated message could be different. Generally, the REST folks take their cue from SQL where INSERT and UPDATE are different verbs.
You could make the case that UPDATE should insert if the record doesn't exist or update if the record does exist. However, it's simpler if UPDATE means UPDATE and failure to update means something's wrong. A secret fall-back to INSERT makes the operation ambiguous.
If you're not building a RESTful interface, then it's typical to only use GET and POST for retrieve and create/update. It's common to have URI differences or message content differences to distinguish between POST and PUT when a person is clicking submit on a form. It, however, isn't very clean because your code has to determine if you're in the POST=create case or POST=update case.
POST has no guarantees of safety or idempotency. That's one reason for PUT and DELETE—both PUT and DELETE are idempotent (i.e., 1+N identical requests have the same end result as just 1 request).
PUT is used for setting the state of a resource at a given URI. When you send a POST request to a resource at a particular URI, that resource should not be replaced by the content. At most, it should be appended to. This is why POST isn't idempotent: in the case of appending POSTs, every request adds to the resource (e.g., posting a new message to a discussion forum each time).
DELETE is used for making sure that a resource at a given URI is removed from the server. POST shouldn't normally be used for deleting, except for submitting a request to delete. Again, in that case the URI you POST to shouldn't be the URI of the resource you want to delete. Any resource you POST to is a resource that accepts the POSTed data to append to itself, add to a collection, or process in some other way.
HEAD is used if all you care about is the headers of a GET request and you don't want to waste bandwidth on the actual content. This is nice to have.
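A toy illustration of the idempotency argument above, assuming an in-memory store: repeating the same PUT or DELETE leaves the same end state, while every repeated POST appends another message. The `Forum` class and its URIs are hypothetical.

```python
from typing import Dict, List


class Forum:
    """Toy store: PUT replaces the state at a URI; POST appends to a collection."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.messages: List[str] = []

    def put(self, uri: str, body: str) -> None:
        self.docs[uri] = body  # sets state: repeating it changes nothing further

    def post(self, body: str) -> str:
        self.messages.append(body)  # appends: every repeat adds another message
        return f"/forum/messages/{len(self.messages)}"

    def delete(self, uri: str) -> None:
        self.docs.pop(uri, None)  # "make sure it's gone": repeating is harmless
```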
Why do we need more than POST? It allows data to flow both ways, so why would GET be needed? The answer is basically the same as for your question. By standardizing the basic expectations of the various methods other processes can better know what to do.
For example, intervening caching proxies can have a better chance of doing the correct thing.
Think about HEAD, for instance. If the proxy server knows what HEAD means, then it can use the result of a previous GET request to provide the proper answer to a HEAD request. And it can know that POST, PUT and DELETE should not be cached.
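A sketch of how a cache can exploit standard method semantics, assuming a much-simplified cache that ignores Cache-Control, Vary, and expiry: it stores GET responses, answers HEAD from a stored GET by dropping the body, never answers writes itself, and drops its stored copy when a successful PUT, POST, or DELETE targets the same URL. `TinyCache` is hypothetical.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)


class TinyCache:
    """Much-simplified cache: ignores Cache-Control, Vary, and expiry."""

    def __init__(self) -> None:
        self.store: Dict[str, Response] = {}

    def on_response(self, method: str, url: str, response: Response) -> None:
        status = response[0]
        if method == "GET" and status == 200:
            self.store[url] = response  # GET responses are safe to reuse
        elif method in ("PUT", "POST", "DELETE") and 200 <= status < 400:
            self.store.pop(url, None)  # a successful write makes the stored copy stale

    def lookup(self, method: str, url: str) -> Optional[Response]:
        cached = self.store.get(url)
        if cached is None:
            return None
        if method == "GET":
            return cached
        if method == "HEAD":
            status, headers, _ = cached
            return status, headers, b""  # HEAD = GET's headers without the body
        return None  # PUT/POST/DELETE are never answered from the cache
```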
No one posted the kind of answer I was looking for so I will try to summarize the points myself.
"RESTful Web Services" chapter 8 section "Overloading POST" reads: "If you want to do without PUT and DELETE altogether, it’s entirely RESTful to expose safe operations on resources through GET, and all other operations through overloaded POST. Doing this violates my Resource-Oriented Architecture, but it conforms to the less restrictive rules of REST."
In short, replacing PUT/DELETE in favor of POST makes the API harder to read and PUT/DELETE calls are no longer idempotent.
In a word:
idempotency
In a few more words:
GET = safe + idempotent
PUT = idempotent
DELETE = idempotent
POST = neither safe nor idempotent
'Idempotent' just means you can do it over and over again and the end result will always be the same as doing it once.
You can reissue a PUT (update) or DELETE request as many times as you want and it will have the same effect every time; however, that effect modifies a resource, so it is not considered 'safe'.
A POST request should create a new resource with every request, meaning the effect will be different every time. Therefore POST is considered neither safe nor idempotent.
Methods like GET and HEAD are just read operations and are therefore considered 'safe' as well as idempotent.
This is actually a pretty important concept because it provides a standard/consistent way to interpret HTTP transactions; this is particularly useful in a security context.
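The table above can be encoded directly, which is what lets generic client code make decisions without knowing anything about a particular API. Here a hypothetical `may_auto_retry` helper, assuming a client that resends automatically when a response is lost, retries only idempotent methods.

```python
from typing import Dict, NamedTuple


class MethodTraits(NamedTuple):
    safe: bool
    idempotent: bool


# The properties listed above.
METHODS: Dict[str, MethodTraits] = {
    "GET": MethodTraits(safe=True, idempotent=True),
    "HEAD": MethodTraits(safe=True, idempotent=True),
    "PUT": MethodTraits(safe=False, idempotent=True),
    "DELETE": MethodTraits(safe=False, idempotent=True),
    "POST": MethodTraits(safe=False, idempotent=False),
}


def may_auto_retry(method: str) -> bool:
    """Resend after a lost response only if repeating cannot change the end result."""
    traits = METHODS.get(method.upper())
    return traits is not None and traits.idempotent  # unknown methods: don't guess
```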
Not all hosts support PUT and DELETE.
I asked this question; in an ideal world we'd have all the verbs, but:
RESTful web services and HTTP verbs
HEAD is really useful for determining what a given server's clock is set to (accurate to within 1 second or the network round-trip time, whichever is greater). It's also great for getting Futurama quotes from Slashdot:
~$ curl -I slashdot.org
HTTP/1.1 200 OK
Date: Wed, 29 Oct 2008 05:35:13 GMT
Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4
SLASH_LOG_DATA: shtml
X-Powered-By: Slash 2.005001227
X-Fry: That's a chick show. I prefer programs of the genre: World's Blankiest Blank.
Cache-Control: private
Pragma: private
Connection: close
Content-Type: text/html; charset=iso-8859-1
For cURL, -I is the option for performing a HEAD request. To get the current date and time of a given server, just do
curl -I $server | grep ^Date
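The same `Date` lookup in Python, assuming you already have the raw header text (for example, captured `curl -I` output); `server_date` is a hypothetical helper, and the standard library's `email.utils.parsedate_to_datetime` does the date parsing.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def server_date(head_output: str) -> Optional[datetime]:
    """Extract the server's clock from raw HEAD response headers."""
    for line in head_output.splitlines():
        name, sep, value = line.partition(":")  # split at the first colon only
        if sep and name.strip().lower() == "date":  # header names are case-insensitive
            return parsedate_to_datetime(value.strip())
    return None
```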
To limit ambiguity, which allows for better and easier reuse of our simple REST APIs.
You could use only GET and POST but then you are losing out on some of the precision and clarity that PUT and DELETE bring. POST is a wildcard operation that could mean anything.
PUT and DELETE's behaviour is very well defined.
If you think of a resource management API, then GET, PUT and DELETE probably cover 80%-90% of the required functionality. If you limit yourself to GET and POST, then 40%-60% of your API is accessed using the poorly specified POST.
Web applications using GET and POST allow users to create, view, modify and delete their data, but do so at a layer above the HTTP commands originally created for these purposes. One of the ideas behind REST is a return to the original intent of the design of the Web, whereby there are specific HTTP operations for each CRUD verb.
Also, the HEAD command can be used to improve the user experience for (potentially large) file downloads. You call HEAD to find out how large the response is going to be and then call GET to actually retrieve the content.
See the following link for an illustrative example. It also suggests one way to use the OPTIONS HTTP method, which hasn't been discussed here yet.
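A self-contained sketch of the HEAD-before-GET idea using only Python's standard library: a throwaway local server (a hypothetical stand-in for a real download host) answers HEAD with the same headers as GET, and the client reads Content-Length without transferring the payload. `DemoHandler`, `download_size`, `start_demo_server`, and the 1 MB payload are illustrative.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = b"x" * 1_000_000  # hypothetical large file served by the demo server


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()

    def do_HEAD(self) -> None:
        self._send_headers()  # same headers as GET, no body

    def do_GET(self) -> None:
        self._send_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args) -> None:  # keep the demo quiet
        pass


def download_size(host: str, port: int, path: str) -> int:
    """Ask for the size with HEAD before committing to a (potentially large) GET."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # HEAD responses carry no body
        return int(resp.getheader("Content-Length", "-1"))
    finally:
        conn.close()


def start_demo_server() -> ThreadingHTTPServer:
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client could compare the returned size with a limit, or use it to size a progress bar, before issuing the real GET.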
There are HTTP extensions like WebDAV that require additional functionality.
http://en.wikipedia.org/wiki/WebDAV
The web server war from the earlier days probably caused it.
In HTTP/1.0, written in 1996, there were only GET, HEAD, and POST. But as you can see in Appendix D, vendors started to add their own things. So, to keep HTTP compatible, they were forced to make HTTP/1.1 in 1999.
However, HTTP/1.0 does not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, or virtual hosts. In addition, the proliferation of incompletely-implemented applications calling themselves "HTTP/1.0" has necessitated a protocol version change in order for two communicating applications to determine each other's true capabilities.
This specification defines the protocol referred to as "HTTP/1.1". This protocol includes more stringent requirements than HTTP/1.0 in order to ensure reliable implementation of its features.
GET, PUT, DELETE and POST are holdovers from an era when sophomores thought that a web page could be reduced to a few hoity-toity principles.
Nowadays, most web pages are composite entities, which contain some or all of these primitive operations. For instance, a page could have forms for viewing or updating customer information, which perhaps spans a number of tables.
I usually use $_REQUEST[] in PHP, not really caring how the information arrived. I would choose GET or POST based on efficiency, not the underlying (multiple) paradigms.