In my REST API I have a very large collection; it contains millions of items. The path for this collection is /mycollection
Because this collection is so large it is not good practice to GET the whole collection so the API supports paging. Paging will be the primary way of getting the collection
GET /mycollection?page=1&page-size=100 HTTP/1.1
Say the original collection contains 1,000,000 items and I want to update 5,000, delete 3,000 and add 2,000 items. I could write my API to support updating the collection via either the PUT method or the PATCH method. While either method would require very different request bodies I believe both methods would require the exact same response body, i.e. the response body would have to contain the current representation of the entire updated resource, i.e. all 999,000 items in the collection.
As I mentioned earlier GETting the entire collection is just not realistic; it's too big. For the same reason I don't want PUTting or PATCHing to return the entire collection. Adding query parameters to a PUT or PATCH request wouldn't work either because neither PUT nor PATCH are safe methods.
So what would be the proper response in this large collection scenario?
I could respond with
HTTP/1.1 202 Accepted
Location: /mycollection?page=1&page-size=100
The 202 Accepted response code doesn't feel correct because the update would have been done synchronously. The Location header doesn't quite feel right either. I could maybe go with a Links header, but still it doesn't feel right.
Again I ask what would be the proper response in this large collection scenario?
This question is based on a misconception:
While either method would require very different request bodies I believe both methods would require the exact same response body, i.e. the response body would have to contain the current representation of the entire updated resource
Either can just return 204 No Content or 200 OK and no response body. There's no requirement that they include the full representation in the response.
You could optionally support this (perhaps along with the Prefer: return=representation header, or perhaps Content-Location header), but without this header I would say it's not even a convention that the current representation is returned. Generic clients shouldn't assume that the response body is the new representation unless these headers are used.
So, just return a 2xx and you're good to go.
So what would be the proper response in this large collection scenario?
Short version: you should probably treat a successful PUT as though it were a successful POST.
the intended meaning of the payload can be summarized as:
a representation of the status of, or results obtained from, the action
So the response could be as simple as
200 OK
Content-Type: text/plain
It worked!
Longer answer:
While either method would require very different request bodies I believe both methods would require the exact same response body, i.e. the response body would have to contain the current representation of the entire updated resource
This isn't right - If you review RFC 7231, you'll see that the response to PUT has this description
a representation of the status of the action
Returning the new representation of the resource is an edge case, not the default (see the specification of the Content-Location header).
For a state-changing request like PUT (Section 4.3.4) or POST (Section 4.3.3), it implies that the server's response contains the new representation of that resource, thereby distinguishing it from representations that might only report about the action (e.g., "It worked!"). This allows authoring applications to update their local copies without the need for a subsequent GET request.
That said, I'd suggest a review of your choice of method token. Both PUT and PATCH support remote authoring semantics - messages that ask a server to make its copy of a document look like your local copy. That's why, for example, the PUT specification has a bunch of constraints about adding validator header fields to the response. General purpose components are allowed to assume that they know what's going on, because all resources are supposed to understand these methods the same way.
But in your case, you can't really be said to be remote authoring the collection, because the client (and the general purpose components) don't have a representation of the collection, but instead only representations of pages of the collection.
If you were going to be consistent with the uniform interface, then you would either
allow remote authoring of the pages, or
abandon the method tokens that imply remote authoring
It is okay to use POST when the semantics of your request don't quite align with the standardized meanings
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.”
Related
As an example I have an order where the invoicing address can be modified. The change might trigger various additional actions (i.e. create a cancellation invoice and a new invoice with the updated address).
As recommended by various sources (see below) I don't want to have a PATCH on the order resource, because it has many other properties, but want to expose a dedicated endpoint, also called "intent" resource or subresource according to the web links below:
/orders/{orderId}/invoicing-address
Should I use a POST or a PATCH against this subresource?
The invoicing address itself has no ID. In the domain layer it is represented as a value object that is part of the order entity.
What ETag should be used for the subresource?
The address is part of the order and together with the items they form an aggregate in the domain layer. When the aggregate is updated it gets a new version number in the database. That version number is used as an ETag for optimistic locking.
Should a GET on invoicing-address respond with the order aggregate version number or a hash value of the address DTO in the ETag header?
What payload should be returned after updating the address?
Since the resource is the invoicing address it seems natural to return the updated address object (maybe with server side added fields). Should the body also include the ID/URI and the ETag of the order resource?
None of the examples I found with subresources showed any server responses or considered optimistic locking.
https://rclayton.silvrback.com/case-against-generic-use-of-patch-and-put
https://www.thoughtworks.com/insights/blog/rest-api-design-resource-modeling
https://softwareengineering.stackexchange.com/questions/371273/design-update-properties-on-an-entity-in-a-restful-resource-based-api (see provided answer)
https://www.youtube.com/watch?v=aQVSzMV8DWc&t=188s (Jim Webber at about about 31 mins)
As far as REST is concerned, "subresources" aren't a thing. /orders/12345/invoicing-address identifies a resource. The fact that this resource has a relationship with another resource identified by /orders/12345 is irrelevant.
Thus, the invoicing-address resource should understand HTTP methods exactly the same way as every other resource on the web.
Should I use a POST or a PATCH against this subresource?
Use PUT/PATCH if you are proposing a direct change to the representation of the resource. For example, these are the HTTP methods we would use if we were trying to fix a spelling error in an HTML document (PUT if we were sending a complete copy of the HTML document; PATCH if we were sending a diff).
PUT /orders/12345/invoicing-address
Content-Type: text/plain
1060 W Addison St.
Chicago, IL
60613
On the other hand, if you are proposing an indirect change to the representation of the resource (the request shows some information to the server, and the server is expected to compute a new representation itself)... well, we don't have a standardized method that means exactly that; therefore, we use POST
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding, 2009
What ETag should be used for the subresource?
You should first give some thought to whether you want to use a strong-validator or a weak validator
A strong validator is representation metadata that changes value whenever a change occurs to the representation data that would be observable in the content of a 200 (OK) response to GET.
...
In contrast, a weak validator is representation metadata that might
not change for every change to the representation data.
...
a weak entity-tag ought to change whenever the origin server wants caches to invalidate old responses.
I might use a weak validator if the representation included volatile but insignificant information; I don't need clients to refresh their copy of a document because it doesn't have the latest timestamp metadata. But I probably wouldn't use an "aggregate version number" if I expected the aggregate to be changing more frequently than the invoicing-address itself changes.
What payload should be returned after updating the address?
See 200 OK.
In the case of a POST request, sending the current representation of the resource (after changes have been made to it) is nice because the response is cacheable (assuming you include the appropriate metadata signals in the response headers).
Responses to PATCH have similar rules to POST (see RFC 5789).
PUT is the odd man out, here
Responses to the PUT method are not cacheable.
Should the body also include the ID/URI and the ETag of the order resource?
Entirely up to you - HTTP components aren't going to be paying attention to the representation, so you can design that representation as makes sense to you. On the web, it's perfectly normal to return HTML documents with links to other HTML documents.
I had to create the implementation of a PATCH method, because of the need to update only a single value.
I spend too much time wandering is there any way to only pass the single value, instead of a whole request body.
And I've got the feeling that it is bad idea to pass it as a url query param, but why? Can someone explain if and why it is a bad idea, for example, to try and update the age of a customer like this:
PATCH host/api/customers/{id}?newAge=45
And I've got the feeling that it is bad idea to pass it as a url query param
Yes, it is.
but why?
Because the target-uri in the request identifies the document (resource) that we are taking about, not the change that we want to make to that document.
In other words, as far as a general purpose component is concerned, in the same way that
/host/api/customers/{id}
/host/api/customers/{id}/newAge/45
identify different resources, so too do
/host/api/customers/{id}
/host/api/customers/{id}?newAge=45
In other words, both the path and the query part are part of the "identifier", see RFC 7230, RFC 3986.
RFC 7234 describes one setting where your proposal is likely to cause a problem: cache-invalidation. Because the target-uri is the primary cache key, a general purpose cache will not recognize that
GET /host/api/customers/{id}
PATCH /host/api/customers/{id}?newAge=45
are two requests about the same object, and that the copy of the document we cached after the first request is no longer valid after the second succeeds.
If we were modifying this resource via forms on the web, we would use a request like
POST /host/api/customers/{id} HTTP/2.0
Content-Type: application/x-www-form-urlencoded
newValue=45
And because the identifiers match, general purpose caches can do the right thing by just following the standard.
Update 2022-07-11: the current standard for cache invalidation is RFC-9111
Designing API using JSONAPI 1.0 standard there is no PUT method. There is only POST method for create resource and PATCH for partially update. We have use case where user can send request to the server and if resource doesn't exist then must be created otherwise updated. RFC describe such method as a PUT. Next quoting mentioned RFC 5789 standard for PATCH there is information:
"If the Request-URI does not point to an existing resource, the server MAY create a new resource,
depending on the patch document type (whether it can logically modify a null resource) and permissions, etc."
Is it good idea to have PATCH method for update and create resource? Which standard should be used to support both PUT and PATCH methods (maybe OpenApi)?
How to interpret RFC description?
Best Regards
We have use case where user can send request to the server and if resource doesn't exist then must be created otherwise updated.
The right answer, in this case, is almost certainly going to be to POST your request to the collection resource, and let the server figure out the "right" thing to do.
A resource can be created by sending a POST request to a URL that represents a collection of resources.
Using PUT to create a resource assumes that the client can/should guess what the correct identifier for the new resource should be. If we're not giving the client that authority/control, then the request needs to instead use a stable target-uri, and the server computes the side effects on other resources.
In JSON:API, the server gets to control the choice of URI for the new item.
POST /photos HTTP/1.1
Content-Type: application/vnd.api+json
...
HTTP/1.1 201 Created
Location: http://example.com/photos/550e8400-e29b-41d4-a716-446655440000
If the API were supporting PUT semantics, an equivalent change would look
something like
PUT /photos/550e8400-e29b-41d4-a716-446655440000 HTTP/1.1
Content-Type: application/vnd.api+json
HTTP/1.1 201 Created
But JSON:API has decided that PUT isn't interesting yet. Reading between the lines, the authors decided that PUT should be reserved until more implementations demonstrate that they understand the HTTP spec.
So instead you have POST to the collection for the create, and PATCH on the item to for partial or complete replacement.
That in turn implies that if the client doesn't/cannot know that a resource already exists, that it should POST to the collection. The server, in turn, should be aware that the resource may already exist, and do something sensible (replace the representation of the resource, redirect the client to the resource, etc). How the server achieves that would be an implementation detail.
Looking into Internet deal with REST HTTP methods I have never seen that PATCH can be used for resource creation therefore I am surprised that JsonApi forgo PUT method.
PATCH can certainly be used for resource creation -- see RFC 5789
If the Request-URI does not point to an existing resource, the server MAY create a new resource, depending on the patch document type (whether it can logically modify a null resource) and permissions, etc.
It's an uncommon choice, because PUT semantics are a better fit for that use case. Choosing to support PATCH, but not PUT, is weird.
I am surprised that JsonApi forgo PUT method
I am also surprised.
It might be possible to resolve your concerns by registering a new profile, encouraging the community to adopt a common pattern for the semantics that you need.
According to Postel's law one should be conservative in what you do, be liberal in what you accept from others.
Two common media-types used with PATCH are application/json-patch+json (a.k.a. JSON Patch) and application/json-merge-patch+json (a.k.a MergePatch).
MergePatch defines a couple of rules that determine whether a part needs to be added, removed or updated. The spec defines that a request received of that type needs to be processed by calling a function that takes in two arguments, the target resource and the representation received. The target itself might be either a JSON value or undefined. If the resource does not yet exist it is undefined and will lead to all of the values in the received patch document to be added to the yet undefined resource. This is basically your resource creation then.
JSON Patch, in contrast to MergePatch, is specified to only operate on JSON documents. There is no mention how the patch needs to be applied in case no resource was yet available. This makes somehow sense if you look at the operations is offers, such as test, remove, replace or move that only work if there is a counterpart in the original JSON document availalbe. This media-type is quite close to the actual PATCH definition in that a client sends a set of instructions, which were previously calculate by the client, that need to be applied atomically by the server. Either all or none of the changes are applied. Here a client should have already fetched the current state of the target resource beforehand, otherwise it wont be able to calculate the necessary changes to transform the current representation into the desired one. So applying a representation of that media-type only makes sense if there is already a resource available. If the client saw that no target resource is yet available it simply can use POST then to create the resource. If a client though sends a patch document containing only add operations I'd create a JSON representation and add all of the fields accordingly though.
As you see there are two different takes on how PATCHing can be done in HTTP. One being very close to the original idea of how patching is done in software engineering for decades while the other method being a more pragmatic approch to partial updating remote resources. Which one you use or support (in best case both) is your choice.
REST recommends that queries (not resource creation) be done through the GET method. In some cases, the query data is too large or structured in a way that makes it difficult to put in a URL, and to address this, a RESTful API is modified to support queries with bodies.
It seems that the convention for RESTful queries that require bodies is to use POST. Here are a few examples:
Dropbox API
ElasticSearch
O'Reilley's Restful Web Services Cookbook
Queries don't modify the internal state of the system, but POST doesn't support idempotent operations. However, PUT is idempotent. Why don't RESTful APIs use PUT with a body instead of POST for queries that require a body?
NOTE: A popular question asks which (PUT vs POST) is preferred for creating a resource. This question asks why PUT is not used for queries that require bodies.
No. PUT might be idempotent, but it also has a specific meaning. The body of the request in PUT should be used to replace the resource in the URI.
With POST no such assumptions are being made. And note that using a POST request means that the request might not be idempotent, in specific cases it still might be.
However, you could do it with PUT, but it requires you to jump through an extra hoop. Basically, you could create a "query resource" with PUT, and then use GET immediately after to fetch the results of this query resource. Perhaps this was what you were after, but this is the most RESTful, because the resulting query results can still be linked to. (something which is completely missing if you use POST requests).
You should read the standard: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
The definition of POST:
The POST method is used to request that the origin server accept the
entity enclosed in the request as a new subordinate of the resource
identified by the Request-URI in the Request-Line.
The action performed by the POST method might not result in a resource
that can be identified by a URI. In this case, either 200 (OK) or 204
(No Content) is the appropriate response status, depending on whether
or not the response includes an entity that describes the result.
The definition of PUT
The PUT method requests that the enclosed entity be stored under the
supplied Request-URI. If the Request-URI refers to an already existing
resource, the enclosed entity SHOULD be considered as a modified
version of the one residing on the origin server. If the Request-URI
does not point to an existing resource, and that URI is capable of
being defined as a new resource by the requesting user agent, the
origin server can create the resource with that URI.
If the resource could not be created or modified with the Request-URI, an appropriate error response SHOULD be given that reflects the nature of the problem.
Another thing that PUT is not cacheable while POST is.
Responses to this method are not cacheable, unless the response
includes appropriate Cache-Control or Expires header fields.
e.g. http://www.ebaytechblog.com/2012/08/20/caching-http-post-requests-and-responses/
Consider a web API method that has no side effects, but which takes binary data as a parameter. An example would be a method that tells the user whether or not their image is photoshopped, but does not permanently store the image or the result on its servers.
Should such a method be a GET or a POST?
GET doesn't seem to have a recommended way of sending data outside of URL parameters, but the behavior of the method implies a GET, which according to the HTTP spec is for safe, stateless responses. This becomes particularly constraining under the semantics of REST, which imply that POST methods create a new object on the server.
This becomes particularly constraining under the semantics of REST, which imply that POST methods create a new object on the server.
While a POST request means that the entity sent will be treated "as a new subordinate of the resource identified by the Request-URI", there is no requirement that this result in the creation of a new permanent object or that any such new object be identified by a URI (so no new object as far as the client knows). An object can be transient, representing the results of e.g. "Providing a block of data, such as the result of submitting a form, to a data-handling process" and not persisting after the entity representing that object has been sent.
While this means that a POST can create a new resource, and is certainly the best way to do so when it is the server that will give that new resource its URI (with PUT being the more appropriate method when the client dictates the new URI) it is can also be used for cases that delete objects (though again if it's a deletion of a single* resource identifiable by a URI then DELETE is far more appropriate), both create and delete objects, change multiple objects, it can mean that your kitchen light turns on but that the response is the same whether that worked or failed because the communication from the webserver to the kitchen light doesn't allow for feedback about success. It really can do anything at all.
But, your instincts are good in wanting this to be a GET: While the looseness of POST means we can make a case for it for just about every request (as done by approaches that use HTTP for an RPC-like protocol, essentially treating HTTP as if it was a transport protocol), this is inelegant in theory, inefficient in practice and clumsy in definition. You have an idempotent function that depends on solely on what the client is interested in, and that maps most obviously the GET in a few ways.
If we could fit everything in a URI then GET would be easy. E.g we can define a simple integer addition with something like http://example.net/addInts?x=1;y=2 representing the addition of 71 and 2 and hence being a permanent immutable resource representing the number 3 (since the results of GET can vary with changes to a resource over time, but this resource never changes) and then use a mechanism like HTML's <form> or javascript to allow the server to inform the client as to how to construct the URIs for other numbers (to maintain the HATEOS and/or COD constraints). Simples!
Your problem here is that you do not have input that can be represented as concisely as the numbers 1 and 2 can above. In theory you could do something like http://example.net/photoshoppedCheck?image=data:image/png;base64,iVBORw0KGgoAAAANSU… and hence create a URI that represents the resource of the results of the check. This URI will though will have 4 characters for every 3 bytes in the image. While there's no absolute limit on URI length both the theory and the practice allow this to fail (in theory HTTP allows for proxies and servers to set a limit on URI length, and in practice they do).
An argument could be made for using GET and sending a request body the same way as you would with a POST, and some webservers will even allow you to do this. However, GET is defined as returning an entity describing the resource identified in the URI with headers restricting how that entity does that describing: Since the request body is not part of that definition it must be ignored by your code! If you were tempted to bend this rule then you must consider that:
Some webservers will refuse the request or strip the body, so you may not be able to.
If your webserver does allow it, the fact that its not specified means you can't be sure an upgrade won't "fix" this and so break your code.
Some proxies will refuse or strip the request.
Some client libraries will most certainly refuse to allow developers to send a request body along with a GET.
So it's a no-no in both theory and practice.
About the only other approach we could do apart from POST is to have a URI that we consider as representing an image that was not photoshopped. Hence if you GET that you get an entity describing the image (obviously it could be the actual image, though it could also be something else if we stretch the concept of content-negotiation) and then PUT will check the image and if its deemed to not be photoshopped it responds with the same image and a 200 or just a 204 while if it is deemed to be photoshopped it responds with a 400 because we've tried to PUT a photoshopped image as a resource that can only be a non-photoshopped image. Because we respond immediately, there's no race-condition with simultaneous requests.
Frankly, this would be darn-right horrible. While I think I've made a case for it by the letter of the specs, it's just nasty: REST is meant to help us design clear APIs, not obtuse APIs we can offer a too-clever-for-its-own-good justification of.
No, in all the way to go here is to POST the image to a fixed URI which then returns a simple entity describing the analysis.
It's perfectly justifiable as REST (the POST creates a transient object based on that image, and then responds with an entity describing that object, and then that object disappears again). It's straight-forward. It's about as efficient as it could be (we can't do any HTTP caching† but most of the network delay is going to be on the upload rather than the download anyway). It also fits into the general use-case of "process something" that POST was first invented for. (Remember that first there was HTTP, then REST described why it worked so well, and then HTTP was refined to better play to those strengths).
In all, while the classic mistake that moves a web application away from REST is to abuse POST into doing absolutely everything when GET, PUT and DELETE (and perhaps the WebDAV methods) would be superior, don't be afraid to use its power when those don't meet the bill, and don't think that the "new subordinate of the resource" has to mean a full long-lived resource.
*Note that a "single" resource here might be composed of several resources that may have their own URIs, so it can be easy to have a single DELETE that deletes multiple objects, but if deleting X deletes A, B & C then it better be obvious that you can't have A, B or C if you don't have X or your API is not going to be understandable. Generally this comes down to what is being modelled, and how obvious it is that one thing depends on another.
†Strictly speaking we can, as we're allowed to send cache headers indicating that sending an identical entity to the same URI will have the same results, but there's no general-purpose web-software that will do this and your custom client can just "remember" the opinion about a given image itself anyway.
It's a difficult one. Like with many other scenarios there is no absolutely correct way of doing it. You have to try to interpret RESTful principles in terms of the limitations of the semantics of HTTP. (Incidentally, I don't think it's right to think of REST having semantics, REST is an architectural style which is commonly used with HTTP services, but can be used for any type of interface.)
I've faced a similar situation in my current project. We chose to use a POST but with the response code being a 200 (OK) rather than the 201 (Resource Created) usually returned by RESTful Web APIs.