Why use two step approach to deleting multiple items with REST - rest

We all know the 'standard' way of deleting a single item via REST is to send a single DELETE request to a URI example.com/Items/666. Grand, let's move on to deleting many at once. As we do not require atomic deleting (or true transaction, ie all or nothing) we could just tell the client 'tough luck, make many requests' but that's not very nice is it. So we need a way to allow a client to request many 'Items' be deleted at once.
From my understanding, the 'typical' solution to this problem is a 'two step' approach. First the client POSTs a list of item IDs and is returned a URI such as example.com/Items/Collection/1. Once that collection is created, they call DELETE on it.
Now, I see that this works just fine, except to me, it is a bad solution. Firstly, you are forcing the client to make two requests to accommodate the server. Secondly, 'I thought DELETE was supposed to delete an Item?', shouldn't calling DELETE on this URI effectively cancel the transaction (it's not a true transaction though), how would we even cancel it? Really would be better if there was some form 'EXECUTE' action, but I can't rock the boat that much. It also forces the server to have to consider 'the JSON that was POSTed looks more like a request to modify these Items, but the request was DELETE... so I guess I will delete them'. This approach also starts to impose a sort of state on the client/server, not a true state I will admit, but it is sort of.
In my opinion, a better solution would be to simply call DELETE on example.com/Items (or maybe example.com/Items/Collection to imply this is a multiple delete) and pass JSON data containing a list of IDs that you wish to delete. As far as I can see, this basically solves all the problems the first method had. It is easier to use as a client, reduces the work the server has to do, is truly stateless, is more semantic.
I would really appreciate the feed back on this, am I missing something about REST that makes my solution to this problem unrealistic? I would also appreciate links to articles, especially if they compare these two methods; I am aware this is not normally approved of for SO. I need to be able to disprove that only the first method is truly RESTfull, prove that the second approach is a viable solution. Of course, if I am barking up the wrong tree do tell me.

I have spent the last week or so reading a fair bit on REST, and to the best of my understanding, it would be wrong to describe either of these solutions as 'RESTfull', rather you should say that 'neither solution goes against what REST means'.
The short answer is simply that REST, as laid out in Roy Fielding's dissertation (See chapter 5), does not cover the topic of how to go about deleting resources, singular or multiple, in a REST manor. That's right, there is no 'correct RESTful way to delete a resource'... well, not quite.
REST itself does not define how delete a resource, but it does define that what ever protocol you are using (remember that REST is not a protocol) will dictate the how perform these actions. The protocol will usually be HTTP; 'usually' being the key word as Fielding will point out, REST is not synonymous with HTTP.
So we look to HTTP to say which method is 'right'. Sadly, as far as HTTP is concerned, both approaches are viable. Yes 'viable'. HTTP will allow a client to send a POST request with a payload (to create a collection resource), and then call a DELETE method on this new collection to delete the resources; it will also allow you to send the data within the payload of a single DELETE method to delete the list of resources. HTTP is simply the medium by which you send requests to the server, it would be up to the server to respond appropriately. To me, the HTTP protocol seems to be rather open to interpretation in places, but it does seem to lay down fairly clear guide lines for what actions mean, how they should be dealt with and what response should be given; it's just it is a 'you should do this' rather than 'you must do this', but perhaps I am being a little pedantic on the wording.
Some people would argue that the 'two stage' approach cannot possibly be 'REST' as the server has to store a 'state' for the client to perform the second action. This is simply a misunderstanding of some part. It must be understood that neither the client nor the server is storing any 'state' information about the other between the list being POSTed and then subsequently being DELETEd. Yes, the list must have been created before it can deleted, but the server does not remember that it was client alpha that made this list (such an approach would allow the client to simply call 'DELETE' as the next request and the server remembers to use that list, this would not be stateless at all) as such, the client must tell the server to DELETE that specific list, the list it was given a specific URI for. If the client attempted to DELETE a collection list that did not already exist it would simply be told 'the resource can not be found' (the classic 404 error most likely). If you wish to claim that this two step approach does maintain a state, you must also claim that to simply GET an URI requires a state, as the URI must first exist. To claim that there is this 'state' persisting is misunderstanding what 'state' means. And as further 'proof' that such a two stage approach is indeed stateless, you could quite happily have client alpha POST the list and later client beta (without having had any communication with the other client) call DELETE on the list resources.
I think it can stand rather self evident that the second option, of just sending the list in the payload of the DELETE request, is stateless. All the information required to complete the request is stored completely within the one request.
It could be argued though that the DELETE action should only be called on a 'tangible' resource, but in doing so you are blatantly ignoring the REpresentational part of REST; It's in the name! It is the representational aspect that 'permits' URIs such as http://example.com/myService/timeNow, a URI that when 'got' will return, dynamically, the current time, with out having to load some file or read from some database. It is a key concept that the URIs are not mapping directly to some 'tangible' piece of data.
There is however one aspect of that stateless nature that must be questioned. As Fielding describes the 'client-stateless-server' in section 5.1.3, he states:
We next add a constraint to the client-server interaction: communication must
be stateless in nature, as in the client-stateless-server (CSS) style of
Section 3.4.3 (Figure 5-3), such that each request from client to server must
contain all of the information necessary to understand the request, and
cannot take advantage of any stored context on the server. Session state is
therefore kept entirely on the client.
The key part here in my eyes is "cannot take advantage of any stored context on the server". Now I will grant you that 'context' is somewhat open for interpretation. But I find it hard to see how you could consider storing a list (either in memory or on disk) that will be used to give actual useful meaning would not violate this 'rule'. With out this 'list context' the DELETE operation makes no sense. As such, I can only conclude that making use of a two step approach to perform an action such as deleting multiple resources cannot and should not be considered 'RESTfull'.
I also begrudge somewhat the effort that has had to be put into finding arguments either way for this. The Internet at large seems to have become swept up with this idea the the two step approach is the 'RESTfull' way doing such actions, with the reasoning 'it is the RESTfull way to do it'. If you step back for a moment from what everybody else is doing, you will see that either approach requires sending the same list, so it can be ignored from the argument. Both approaches are 'representational' and 'stateless'. The only real difference is that for some reason one approach has decided to require two requests. These two requests then come with follow up questions, such as how 'long do you keep that data for' and 'how does a client tell a server that it no longer wants that this collection, but wishes to keep the actual resources it refers to'.
So I am, to a point, answering my question with the same question, 'Why would you even consider a two step approach?'

IMO:
HTTP DELETE on existing collection to delete all of its member seems fine. Creating the collection just to delete all of the member sounds odd. As you yourself suggest, just pass IDs of the to be deleted items using JSON (or any other payload format). I think that the server should try to make multiple deletes an internal transaction though.

I would argue that HTTP already provides a method of deleting multiple items in the form of persistent connections and pipelining. At the HTTP protocol level it is absolutely fine to request idempotent methods like DELETE in a pipelined way - that is, send all the DELETE requests at once on a single connection and wait for all the responses.
This may be problematic for an AJAX client running in a browser since few browsers have pipelining support enabled by default. This is not the fault of HTTP, though, it is the fault of those specific clients.

Related

REST delete multiple items in the batch

I need to delete multiple items by id in the batch however HTTP DELETE does not support a body payload.
Work around options:
1. #DELETE /path/abc?itemId=1&itemId=2&itemId=3 on the server side it will be parsed as List of ids and DELETE operation will be performed on each item.
2. #POST /path/abc including JSON payload containing all ids. { ids: [1, 2, 3] }
How bad this is and which option is preferable? Any alternatives?
Update: Please note that performance is a key here, it is not an option execute delete operation for each individual id.
Along the years, many people fell in doubt about it, as we can see in the related questions here aside. It seems that the accepted answers ranges from "for sure do it" to "its clearly mistreating the protocol". Since many questions was sent years ago, let's dig into the HTTP 1.1 specification from June 2014 (RFC 7231), for better understanding of what's clearly discouraged or not.
The first proposed workaround:
First, about resources and the URI itself on Section 2:
The target of an HTTP request is called a "resource". HTTP does not limit the nature of a resource; it merely defines an interface that might be used to interact with resources. Each resource is identified by a Uniform Resource Identifier (URI).
Based on it, some may argue that since HTTP does not limite the nature of a resource, a URI containing more than one id would be possible. I personally believe it's a matter of interpretation here.
About your first proposed workaround (DELETE '/path/abc?itemId=1&itemId=2&itemId=3') we can conclude that it's something discouraged if you think about a resource as a single document in your entity collection while being good to go if you think about a resource as the entity collection itself.
The second proposed workaround:
About your second proposed workaround (POST '/path/abc' with body: { ids: [1, 2, 3] }), using POST method for deletion could be misleading. The section Section 4.3.3 says about POST:
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics. For example, POST is used for the following functions (among others): Providing a block of data, such as the fields entered into an HTML form, to a data-handling process; Posting a message to a bulletin board, newsgroup, mailing list, blog, or similar group of articles; Creating a new resource that has yet to be identified by the origin server; and Appending data to a resource's existing representation(s).
While there's some space for interpretation about "among others" functions for POST, it clearly conflicts with the fact that we have the method DELETE for resources removal, as we can see in Section 4.1:
The DELETE method removes all current representations of the target resource.
So I personally strongly discourage the use of POST to delete resources.
An alternative workaround:
Inspired on your second workaround, we'd suggest one more:
DELETE '/path/abc' with body: { ids: [1, 2, 3] }
It's almost the same as proposed in the workaround two but instead using the correct HTTP method for deletion. Here, we arrive to the confusion about using an entity body in a DELETE request. There are many people out there stating that it isn't valid, but let's stick with the Section 4.3.5 of the specification:
A payload within a DELETE request message has no defined semantics; sending a payload body on a DELETE request might cause some existing implementations to reject the request.
So, we can conclude that the specification doesn't prevent DELETE from having a body payload. Unfortunately some existing implementations could reject the request... But how is this affecting us today?
It's hard to be 100% sure, but a modern request made with fetch just doesn't allow body for GET and HEAD. It's what the Fetch Standard states at Section 5.3 on Item 34:
If either body exists and is non-null or inputBody is non-null, and request’s method is GET or HEAD, then throw a TypeError.
And we can confirm it's implemented in the same way for the fetch pollyfill at line 342.
Final thoughts:
Since the alternative workaround with DELETE and a body payload is let viable by the HTTP specification and is supported by all modern browsers with fetch and since IE10 with the polyfill, I recommend this way to do batch deletes in a valid and full working way.
It's important to understand that the HTTP methods operate in the domain of "transferring documents across a network", and not in your own custom domain.
Your resource model is not your domain model is not your data model.
Alternative spelling: the REST API is a facade to make your domain look like a web site.
Behind the facade, the implementation can do what it likes, subject to the consideration that if the implementation does not comply with the semantics described by the messages, then it (and not the client) are responsible for any damages caused by the discrepancy.
DELETE /path/abc?itemId=1&itemId=2&itemId=3
So that HTTP request says specifically "Apply the delete semantics to the document described by /path/abc?itemId=1&itemId=2&itemId=3". The fact that this document is a composite of three different items in your durable store, that each need to be removed independently, is an implementation details. Part of the point of REST is that clients are insulated from precisely this sort of knowledge.
However, and I feel like this is where many people get lost, the metadata returned by the response to that delete request tells the client nothing about resources with different identifiers.
As far as the client is concerned, /path/abc is a distinct identifier from /path/abc?itemId=1&itemId=2&itemId=3. So if the client did a GET of /path/abc, and received a representation that includes itemIds 1, 2, 3; and then submits the delete you describe, it will still have within its own cache the representation that includes /path/abc after the delete succeeds.
This may, or may not, be what you want. If you are doing REST (via HTTP), it's the sort of thing you ought to be thinking about in your design.
POST /path/abc
some-useful-payload
This method tells the client that we are making some (possibly unsafe) change to /path/abc, and if it succeeds then the previous representation needs to be invalidated. The client should repeat its earlier GET /path/abc request to refresh its prior representation rather than using any earlier invalidated copy.
But as before, it doesn't affect the cached copies of other resources
/path/abc/1
/path/abc/2
/path/abc/3
All of these are still going to be sitting there in the cache, even though they have been "deleted".
To be completely fair, a lot of people don't care, because they aren't thinking about clients caching the data they get from the web server. And you can add metadata to the responses sent by the web server to communicate to the client (and intermediate components) that the representations don't support caching, or that the results can be cached but they must be revalidated with each use.
Again: Your resource model is not your domain model is not your data model. A REST API is a different way of thinking about what's going on, and the REST architectural style is tuned to solve a particular problem, and therefore may not be a good fit for the simpler problem you are trying to solve.
That doesn’t mean that I think everyone should design their own systems according to the REST architectural style. REST is intended for long-lived network-based applications that span multiple organizations. If you don’t see a need for the constraints, then don’t use them. That’s fine with me as long as you don’t call the result a REST API. I have no problem with systems that are true to their own architectural style. -- Fielding, 2008

Correct HTTP request method for nullipotent action that takes binary data

Consider a web API method that has no side effects, but which takes binary data as a parameter. An example would be a method that tells the user whether or not their image is photoshopped, but does not permanently store the image or the result on its servers.
Should such a method be a GET or a POST?
GET doesn't seem to have a recommended way of sending data outside of URL parameters, but the behavior of the method implies a GET, which according to the HTTP spec is for safe, stateless responses. This becomes particularly constraining under the semantics of REST, which imply that POST methods create a new object on the server.
This becomes particularly constraining under the semantics of REST, which imply that POST methods create a new object on the server.
While a POST request means that the entity sent will be treated "as a new subordinate of the resource identified by the Request-URI", there is no requirement that this result in the creation of a new permanent object or that any such new object be identified by a URI (so no new object as far as the client knows). An object can be transient, representing the results of e.g. "Providing a block of data, such as the result of submitting a form, to a data-handling process" and not persisting after the entity representing that object has been sent.
While this means that a POST can create a new resource, and is certainly the best way to do so when it is the server that will give that new resource its URI (with PUT being the more appropriate method when the client dictates the new URI) it is can also be used for cases that delete objects (though again if it's a deletion of a single* resource identifiable by a URI then DELETE is far more appropriate), both create and delete objects, change multiple objects, it can mean that your kitchen light turns on but that the response is the same whether that worked or failed because the communication from the webserver to the kitchen light doesn't allow for feedback about success. It really can do anything at all.
But, your instincts are good in wanting this to be a GET: While the looseness of POST means we can make a case for it for just about every request (as done by approaches that use HTTP for an RPC-like protocol, essentially treating HTTP as if it was a transport protocol), this is inelegant in theory, inefficient in practice and clumsy in definition. You have an idempotent function that depends on solely on what the client is interested in, and that maps most obviously the GET in a few ways.
If we could fit everything in a URI then GET would be easy. E.g we can define a simple integer addition with something like http://example.net/addInts?x=1;y=2 representing the addition of 71 and 2 and hence being a permanent immutable resource representing the number 3 (since the results of GET can vary with changes to a resource over time, but this resource never changes) and then use a mechanism like HTML's <form> or javascript to allow the server to inform the client as to how to construct the URIs for other numbers (to maintain the HATEOS and/or COD constraints). Simples!
Your problem here is that you do not have input that can be represented as concisely as the numbers 1 and 2 can above. In theory you could do something like http://example.net/photoshoppedCheck?image=data:image/png;base64,iVBORw0KGgoAAAANSU… and hence create a URI that represents the resource of the results of the check. This URI will though will have 4 characters for every 3 bytes in the image. While there's no absolute limit on URI length both the theory and the practice allow this to fail (in theory HTTP allows for proxies and servers to set a limit on URI length, and in practice they do).
An argument could be made for using GET and sending a request body the same way as you would with a POST, and some webservers will even allow you to do this. However, GET is defined as returning an entity describing the resource identified in the URI with headers restricting how that entity does that describing: Since the request body is not part of that definition it must be ignored by your code! If you were tempted to bend this rule then you must consider that:
Some webservers will refuse the request or strip the body, so you may not be able to.
If your webserver does allow it, the fact that its not specified means you can't be sure an upgrade won't "fix" this and so break your code.
Some proxies will refuse or strip the request.
Some client libraries will most certainly refuse to allow developers to send a request body along with a GET.
So it's a no-no in both theory and practice.
About the only other approach we could do apart from POST is to have a URI that we consider as representing an image that was not photoshopped. Hence if you GET that you get an entity describing the image (obviously it could be the actual image, though it could also be something else if we stretch the concept of content-negotiation) and then PUT will check the image and if its deemed to not be photoshopped it responds with the same image and a 200 or just a 204 while if it is deemed to be photoshopped it responds with a 400 because we've tried to PUT a photoshopped image as a resource that can only be a non-photoshopped image. Because we respond immediately, there's no race-condition with simultaneous requests.
Frankly, this would be darn-right horrible. While I think I've made a case for it by the letter of the specs, it's just nasty: REST is meant to help us design clear APIs, not obtuse APIs we can offer a too-clever-for-its-own-good justification of.
No, in all the way to go here is to POST the image to a fixed URI which then returns a simple entity describing the analysis.
It's perfectly justifiable as REST (the POST creates a transient object based on that image, and then responds with an entity describing that object, and then that object disappears again). It's straight-forward. It's about as efficient as it could be (we can't do any HTTP caching† but most of the network delay is going to be on the upload rather than the download anyway). It also fits into the general use-case of "process something" that POST was first invented for. (Remember that first there was HTTP, then REST described why it worked so well, and then HTTP was refined to better play to those strengths).
In all, while the classic mistake that moves a web application away from REST is to abuse POST into doing absolutely everything when GET, PUT and DELETE (and perhaps the WebDAV methods) would be superior, don't be afraid to use its power when those don't meet the bill, and don't think that the "new subordinate of the resource" has to mean a full long-lived resource.
*Note that a "single" resource here might be composed of several resources that may have their own URIs, so it can be easy to have a single DELETE that deletes multiple objects, but if deleting X deletes A, B & C then it better be obvious that you can't have A, B or C if you don't have X or your API is not going to be understandable. Generally this comes down to what is being modelled, and how obvious it is that one thing depends on another.
†Strictly speaking we can, as we're allowed to send cache headers indicating that sending an identical entity to the same URI will have the same results, but there's no general-purpose web-software that will do this and your custom client can just "remember" the opinion about a given image itself anyway.
It's a difficult one. Like with many other scenarios there is no absolutely correct way of doing it. You have to try to interpret RESTful principles in terms of the limitations of the semantics of HTTP. (Incidentally, I don't think it's right to think of REST having semantics, REST is an architectural style which is commonly used with HTTP services, but can be used for any type of interface.)
I've faced a similar situation in my current project. We chose to use a POST but with the response code being a 200 (OK) rather than the 201 (Resource Created) usually returned by RESTful Web APIs.

What is the proper HTTP method for modifying a subordinate of the named resource?

I am creating a web client which has the purpose of modifying a set of database tables by adding records to them and removing records from them. It must do so atomically, so both deletion and insertion must be done with a single HTTP request. Clearly, this is a write operation of some sort, but I struggle to identify which method is appropriate.
POST seemed right at first, except that RFC 2616 specifies that a POST request must describe "a new subordinate" of the named resource. That isn't quite what I'm doing here.
PUT can be used to make changes to existing things, so that seemed about right, except that RFC 2616 also specifies that "the URI in a PUT request identifies the entity enclosed with the request [...] and the server MUST NOT attempt to apply the request to some other resource," which rules that method out because my URI does not directly specify the database tables.
PATCH seemed closer - now I am not cheating by only partly overwriting a resource - but RFC 5789 makes it clear that this method, like PUT, must actually modify the resource specified by the URI, not some subordinate resource.
So what method should I be using?
Or, more broadly for the benefit of other users:
For a request to X, you use
POST to create a new subordinate of X,
PUT to create a new X,
PATCH to modify X.
But what method should you use if you want to modify a subordinate of X?
To start.. not everything has to be REST. If REST is your hammer, everything may look like a nail.
If you really want to conform to REST ideals, PATCH is kind of out of the question. You're only really supposed to transfer state.
So the common 'solution' to this problem is to work outside the resources that you already have, but invent a new resource that represents the 'transaction' you wish to perform. This transaction can contain information about the operations you're doing in sequence, potentially atomically.
This allows you to PUT (or maybe POST) the transaction, and if needed, also GET the current state of the transaction to find out if it was successful.
In most designs this is not really appropriate though, and you should just fall back on POST and define a simple rpc-style action you perform on the parent.
First, allow me to correct your understanding of these methods.
POST is all about creating a brand new resource. You send some data to the server, and expect a response back saying where this new resource is created. The expectation would be that if you POST to /things/ the new resource will be stored at /things/theNewThing/. With POST you leave it to the server to decide the name of the resource that was created. Sending multiple identical POST requests results in multiple resources, each their own 'thing' with their own URI (unless the server has some additional logic to detect the duplicates).
PUT is mostly about creating a resource. The first major difference between PUT and POST is that PUT leaves the client in control of the URI. Generally, you don't really want this, but that's getting of the point. The other thing that PUT does, is not modify, if you read the specification carefully, it states that you replace what ever resource is at a URI with a brand new version. This has the appearance of making a modification, but is actually just a brand new resource at the same URI.
PATCH is for, as the name suggest, PATCHing a resource. You send a data to the server describing how to modify a particular resource. Consider a huge resource, PATCH allows you to send just the tiny bit of data that you wish to change, whilst PUT would require you send the entire new version.
Next, consider the resources. You have a set of tables each with many rows, that equates to a set of collections with many resources. Now, your problem is that you want to be able to atomically add resources and remove them at the same time. So you can't just POST then DELETE, as that's clearly not atomic. PATCHing the table how ever can be...
{ "add": [
{ /* a resource */ },
{ /* a resource */ } ],
"remove" : [ "id one", "id two" ] }
In that one body, we have sent the data to the server to both create two resources and delete two resources in the server. Now, there is a draw back to this, and that is that it's hard to let clients know what is going on. There's no 'proper' way of the client of the two new resources, 204 created is sort of there, but is meant have a header for the URI of the one new resource... but we added two. Sadly, this a problem you are going to face no matter what, HTTP simple isn't designed to handle multiple resources at once.
Transaction Resources
So this is a common solution people propose, and I think it stinks. The basic idea is that you first POST/PUT a blob of data on the server the encodes the transaction you wish to make. You then use another method to 'activate' this transaction.
Well hang on... that's two requests... it sends the same data that you would via PATCH and then you have fudge HTTP even more in order to somehow 'activate' this transaction. And what's more, we have this 'transaction' resource now floating around! What do we even do with that?
I know this question has been asked already some time ago, but I thought I should provide some commentary to this myself. This is actually not a real "answer" but a response to thecoshman's answer. Unfortunately, I am unable to comment on his answer which would be the right thing to do, but I don't have enough "reputation" which is a strange (and unnecessary) concept, IMHO.
So, now on to my comment for #thecoshman:
You seem to question the concept of "transactional resources" but in your answer it looks to me that you might have misunderstood the concept of them. In your answer, you describe that you first do a POST with the resource and the associated transaction and then POST another resource to "activate" this transaction. But I believe the concept of transactional resources are somehow different.
Let me give you a simple example:
In a system you have a "customer" resource and his address with customer as the primary (or named) resource and the address being the subordinate address. For this example, let us assume we have a customer with a customerId of 1234. The URI to reach this customer would be /api/customer/1234. So, how would you now just update the customer's address without having to update the entire customer resource? You could define a "transaction resource" called "updateCustomerAddress". With that you would then POST the updated customer address data (JSON or even XML) to the following URI: POST /api/customer/1234/updateCustomerAddress. The service would then create this new transactional resource to be applied to the customer with customerId=1234. Once the transaction resource has been created, the call would return with 201, although the actual change may not have been applied to the customer resource. So a subsequent GET /api/customer/1234 may return the old address, or already the new and updated address. This supports well an asynchronous model for updating subordinate resources, or even named resources.
And what would we do with the created transactional resource? It would be completely opaque to the client and discarded as soon as the transaction has been completed. So the call may actually not return a URI of the transactional resource since it may have disappeared already by the time a client would try to access it.
As you can see, transactional resources should not require two HTTP calls to a service and can be done in just one.
RFC 2616 is obsolete. Please read RFC 723* instead, in particular https://datatracker.ietf.org/doc/html/rfc7231#section-4.3.3.

Avoid duplicate POSTs with REST

I have been using POST in a REST API to create objects. Every once in a while, the server will create the object, but the client will be disconnected before it receives the 201 Created response. The client only sees a failed POST request, and tries again later, and the server happily creates a duplicate object...
Others must have had this problem, right? But I google around, and everyone just seems to ignore it.
I have 2 solutions:
A) Use PUT instead, and create the (GU)ID on the client.
B) Add a GUID to all objects created on the client, and have the server enforce their UNIQUE-ness.
A doesn't match existing frameworks very well, and B feels like a hack. How does other people solve this, in the real world?
Edit:
With Backbone.js, you can set a GUID as the id when you create an object on the client. When it is saved, Backbone will do a PUT request. Make your REST backend handle PUT to non-existing id's, and you're set.
Another solution that's been proposed for this is POST Once Exactly (POE), in which the server generates single-use POST URIs that, when used more than once, will cause the server to return a 405 response.
The downsides are that 1) the POE draft was allowed to expire without any further progress on standardization, and thus 2) implementing it requires changes to clients to make use of the new POE headers, and extra work by servers to implement the POE semantics.
By googling you can find a few APIs that are using it though.
Another idea I had for solving this problem is that of a conditional POST, which I described and asked for feedback on here.
There seems to be no consensus on the best way to prevent duplicate resource creation in cases where the unique URI generation is unable to be PUT on the client and hence POST is needed.
I always use B -- detection of dups due to whatever problem belongs on the server side.
Detection of duplicates is a kludge, and can get very complicated. Genuine distinct but similar requests can arrive at the same time, perhaps because a network connection is restored. And repeat requests can arrive hours or days apart if a network connection drops out.
All of the discussion of identifiers in the other anwsers is with the goal of giving an error in response to duplicate requests, but this will normally just incite a client to get or generate a new id and try again.
A simple and robust pattern to solve this problem is as follows: Server applications should store all responses to unsafe requests, then, if they see a duplicate request, they can repeat the previous response and do nothing else. Do this for all unsafe requests and you will solve a bunch of thorny problems. Repeat DELETE requests will get the original confirmation, not a 404 error. Repeat POSTS do not create duplicates. Repeated updates do not overwrite subsequent changes etc. etc.
"Duplicate" is determined by an application-level id (that serves just to identify the action, not the underlying resource). This can be either a client-generated GUID or a server-generated sequence number. In this second case, a request-response should be dedicated just to exchanging the id. I like this solution because the dedicated step makes clients think they're getting something precious that they need to look after. If they can generate their own identifiers, they're more likely to put this line inside the loop and every bloody request will have a new id.
Using this scheme, all POSTs are empty, and POST is used only for retrieving an action identifier. All PUTs and DELETEs are fully idempotent: successive requests get the same (stored and replayed) response and cause nothing further to happen. The nicest thing about this pattern is its Kung-Fu (Panda) quality. It takes a weakness: the propensity for clients to repeat a request any time they get an unexpected response, and turns it into a force :-)
I have a little google doc here if any-one cares.
You could try a two step approach. You request an object to be created, which returns a token. Then in a second request, ask for a status using the token. Until the status is requested using the token, you leave it in a "staged" state.
If the client disconnects after the first request, they won't have the token and the object stays "staged" indefinitely or until you remove it with another process.
If the first request succeeds, you have a valid token and you can grab the created object as many times as you want without it recreating anything.
There's no reason why the token can't be the ID of the object in the data store. You can create the object during the first request. The second request really just updates the "staged" field.
Server-issued Identifiers
If you are dealing with the case where it is the server that issues the identifiers, create the object in a temporary, staged state. (This is an inherently non-idempotent operation, so it should be done with POST.) The client then has to do a further operation on it to transfer it from the staged state into the active/preserved state (which might be a PUT of a property of the resource, or a suitable POST to the resource).
Each client ought to be able to GET a list of their resources in the staged state somehow (maybe mixed with other resources) and ought to be able to DELETE resources they've created if they're still just staged. You can also periodically delete staged resources that have been inactive for some time.
You do not need to reveal one client's staged resources to any other client; they need exist globally only after the confirmatory step.
Client-issued Identifiers
The alternative is for the client to issue the identifiers. This is mainly useful where you are modeling something like a filestore, as the names of files are typically significant to user code. In this case, you can use PUT to do the creation of the resource as you can do it all idempotently.
The down-side of this is that clients are able to create IDs, and so you have no control at all over what IDs they use.
There is another variation of this problem. Having a client generate a unique id indicates that we are asking a customer to solve this problem for us. Consider an environment where we have a publicly exposed APIs and have 100s of clients integrating with these APIs. Practically, we have no control over the client code and the correctness of his implementation of uniqueness. Hence, it would probably be better to have intelligence in understanding if a request is a duplicate. One simple approach here would be to calculate and store check-sum of every request based on attributes from a user input, define some time threshold (x mins) and compare every new request from the same client against the ones received in past x mins. If the checksum matches, it could be a duplicate request and add some challenge mechanism for a client to resolve this.
If a client is making two different requests with same parameters within x mins, it might be worth to ensure that this is intentional even if it's coming with a unique request id.
This approach may not be suitable for every use case, however, I think this will be useful for cases where the business impact of executing the second call is high and can potentially cost a customer. Consider a situation of payment processing engine where an intermediate layer ends up in retrying a failed requests OR a customer double clicked resulting in submitting two requests by client layer.
Design
Automatic (without the need to maintain a manual black list)
Memory optimized
Disk optimized
Algorithm [solution 1]
REST arrives with UUID
Web server checks if UUID is in Memory cache black list table (if yes, answer 409)
Server writes the request to DB (if was not filtered by ETS)
DB checks if the UUID is repeated before writing
If yes, answer 409 for the server, and blacklist to Memory Cache and Disk
If not repeated write to DB and answer 200
Algorithm [solution 2]
REST arrives with UUID
Save the UUID in the Memory Cache table (expire for 30 days)
Web server checks if UUID is in Memory Cache black list table [return HTTP 409]
Server writes the request to DB [return HTTP 200]
In solution 2, the threshold to create the Memory Cache blacklist is created ONLY in memory, so DB will never be checked for duplicates. The definition of 'duplication' is "any request that comes into a period of time". We also replicate the Memory Cache table on the disk, so we fill it before starting up the server.
In solution 1, there will be never a duplicate, because we always check in the disk ONLY once before writing, and if it's duplicated, the next roundtrips will be treated by the Memory Cache. This solution is better for Big Query, because requests there are not imdepotents, but it's also less optmized.
HTTP response code for POST when resource already exists

Move resource in RESTful architecture

I have a RESTful web service which represent processes and activities. Each activity is inside one and only one process.
I would like to represent a "move" operation of activity between the process it is currently in and another process.
I've look at forums and found people suggest to use MOVE operation which is not very standard and other suggest to use PUT but then I'm not sure how to tell the difference between PUT that update and PUT that moves which looks semantically wrong.
Any ideas?
One way might be to represent the move itself as, say, a "transfer" resource (transfer as a noun), and POST a new one:
POST /transfer
With an entity containing:
activity: /activities/4
toProcess: /processes/13
This way, clients are creating new "transfers" which, on the server, handle validating and transferring the activity.
This gives you the ability to add information about the transfer, too. If you wanted to keep a history for auditing, you could add a transferredBy property to the resource, or a transferredOn date.
If using PUTs, you can tell the difference by whether the process of the existing entity matches the new one.
PUT /process1/activity2
process: 2
some_data: and_stuff
To which the logical response (if successful) is
303 See Other
Location: /process2/activity2
Given the available answers I'm not really satisfied with the proposals.
POST is an all purpose method that should be used if none of the other operations fit the bill. The semantics of a payload received are defined by the service/API only and may therefore a solution for one API but not for most ones. It further lacks the property of idempotency which in case of a network issue will leave the client in an uncertainty whether the request received the server and only the response got lost mid way or if the request failed to reach the server at all. A consecutive request might therefore lead to unexpected results or further actions required.
PUT has the semantics of replace the current representation obtainable from the resource (may be empty) with the representation provided in the payload. Servers are free to modify the received representation to a more fitting one or to append or remove further data. PUT may even have side effects on other resources as well, i.e. if a versioning mechanism for a document update is provided. While providing the above-mentioned idempotency property, PUT actually does not fit the semantics of the requested action. This might have serious implications on the interoperability as standard HTTP servers wont be able to server you correctly.
One might use a combination of POST to create the new representation on the new endpoint first and afterwards remove the old one via DELETE. However, this are two separate operations where the first one might fail and if not handled correctly lead to an immediate deletion of the original resource in worst case. There is no real transactional behavior in these set of operations unfortunately.
Instead of using the above mentioned operations I'd suggest to use PATCH. PATCH is a serious of changes calculated by the client necessary to transform a current representation to a desiered one. A server supporting PATCH will have to apply these instructions atomically. Either all of them are applied or none of them at all. PATCH can have side effects and is thus the most suitable fit to perform a move in HTTP currently. To properly use this method, however, a certain media-types should be used. One might orientate on JSON Patch (more reader-friendly) i.e., though this only defines the semantics of operations to modify state of JSON based representations and does not deal with multiple resources AFAIK.