REST: Prevent creation of duplicate resources via conditional POST? - rest

I've been searching for best practices for preventing the accidental creation of duplicate resources when using POST to create a new resource, for the case where the resource is to be named by the server and hence PUT can't be used. The API I'm building will be used by mobile clients, and the situation I'm concerned about is when the client gets disconnected after submitting the POST request but before getting the response. I found this question, but there was no mention of using a conditional POST, hence my question.
Is doing a conditional POST to the parent resource, analogous to using a conditional PUT to modify a resource, a reasonable solution to this problem? If not, why not?
The client/server interaction would be just like with a conditional PUT:
Client GETs the parent resource, including the ETag reflecting its current state (which would include its subordinate resources),
Client does a conditional POST to the parent resource (includes the parent's ETag value in an If-Match header) to create a new resource,
Client gets disconnected before getting the server response, so doesn't know if it succeeded,
Later, when reconnected, the client resubmits the same conditional POST request,
Either the earlier request didn't reach the server, so the server creates the resource and replies with a 201, or the earlier request did reach the server, so the server replies with a 412 and the duplicate resource isn't created.

Your solution is clever, but less than ideal. Your client may never get his 201 confirmation, and will have to interpret the 412 error as success.
REST afficianados often suggest you create the resource with an empty POST, then, once the client has the id of the newly created resource, he can do an "idempotent" update to fill it. This is nice, but you will likely need to make DB columns nullable that wouldn't otherwise be, and your updates are only idempotent if no-one else is trying to update at the same time.
According to ME, HTTP is flaky. Requests timeout, browser windows get closed, connections get reset, trains go into tunnels with mobile users aboard. There's a simple, robust pattern for dealing with this. Unsafe actions should always be uniquely identified, and servers should store, and be able to repeat if necessary, the response to any unsafe request. This is not HTTP caching, where a request may be served from cache but the cache may be flushed for whatever reason. This is a guarantee by the server application that if an "action" request is seen a second time, the stored response will be repeated without anything else happening. If the action identity is to be generated by the server, then a request-response should be dedicated just to sending the id. If you implement this for one unsafe request, you might as well do it for all of them, and in so doing you will escape numerous thorny problems: successive update requests wiping out other users' changes, or hitting incompatible states ("order already submitted"), successive delete requests generating 404 errors.
I have a little google doc exploring the pattern more fully if you're interested.

I think this scheme would work. If you want to ensure POST does not result in duplicates, you need the client to send something unique in the POST. The server can then verify uniqueness.
You might as well have the client generate a GUID for each request, rather than obtaining this from the server via a GET.
Your steps then become:-
Client generates a GUID
Client does a POST to the resource, which includes the GUID
Client gets disconnected and doesn't know if it succeeded
Client connects again and does another POST with the same GUID
Server checks the GUID, and either creates the resource (if it never received the first POST) or indicates that this was a duplicate
It might be more restful to use PUT, and have the client decide the resource name. If you did not like the choosen name, you could indicate that you had created the resource but that it's canonical location was somewhere of the server's choosing.

Why not simply do duplicate detection on the server based on the actual resource, using whatever internal mechanism the server chooses to use.
It's just safer that way.
Then you return the URL to the appropriate resource (whether it was freshly created or not).
If the parents ETag is based on the state of sub resources, then it's not a reliable mechanism to check for "duplicate resources". All you know is that the parent has "changed", somehow, since last time. How do you even know it's because your old POST was processed after disconnect? Could be anything changed that ETag.
This is basically a optimistic locking scenario being played out, and it comes down to another question. If the resource is already created, what then? Is that an error? Or a feature? Do you care? Is it bad to send a creation request that's silently ignored by the server when the resource already exists?
And if it already exists, but is "different" enough (i.e. say the name matches but the address is different), is that a duplicate? is that an update? is that a error for trying to change an existing resource?
Another solution is to make two trips. One to stage the request, another to commit it. You can query the status of the request when you come back if it's interrupted. If the commit didn't got through, you can commit it again. If it did, you're happy and can move on.
Just depends on how unstable your comms are and how important this particular operation is whether you want to jump through the hoops to do it safely.

Related

Are PUT and DELETE HTTP methods indispensable just because of their idempotency property?

I have a REST API and I want to handle all HTTP requests via POST request.
Is there any performance or other kind of issue in using just POST to perform all CRUD operations requested by a user, which sends a JSON containing some data and the operation to be performed?
Technically, the HTML used in the Web only supports GET and POST and this is more or less the reference implementation of a REST architecture.
So, while this is possible I wouldn't advocate for something like that as the idempotency property of PUT and DELETE provide some other benefits in case of network issues where a client can automatically resend the request regardless whether the initial request, whose response might have just got lost mid-way, actually performed its task or not. The result should always be an updated/created resource or a removed URI mapping to the actual resource (or even a removal of the actual resource) as DELETE technically just removes the URI mapping.
In regards to put some operations in the payload, it depends. This actually sounds very RPCy to me, similar to SOAP i.e. If the operation however is defined by a well-defined media-type, like in the JSON Patch case, I guess this is not wrong. Similar to the Web, however, a server should use some resource that is able to teach a client on how to build up a request, like HTML does with forms. This will not only teach the client on what fields the server supports for the target resource but also where to send the request to as well as the media-type and HTTP operation to use, which might be fixed to POST as in the HTML case.

When designing Restful API, should I use DELETE or POST for UNFOLLOW?

I have this question in my mind for a day, I tried to do some reading from RESTful Web Services Cookbook and other stackoverflow posts, but still did not get a convincing answer to this question:
Assuming I have a database table storing relationships between two users, the relation represents that if user A is following user B (e.g. on Instagram/Twitter).
userId|userId
------|------
userA | userB
userA | userC
....
So now if user A would like to unfollow user B, then should this API be DELETE or POST?
In RESTful Web Services Cookbook page 11, it says:
"The DELETE method is idempotent. This implies that the server must return response code 200 (OK) even if the server deleted the resource in a previous request. But in practice, implementing DELETE as an idempotent operation requires the server to keep track of all deleted resources. Otherwise, it can return a 404 (Not Found)."
Does this suggest us to not use DELETE whenever we can avoid?
Thanks for any insight on this question!
DELETE is for deleting specific resources. So if DELETE is appropriate for you depends on if you have a single resource that 'represents' the follow relationship between two users.
For example, if you have a resource as such:
/api/userA/follows/userB
Then it could be said that this resource represents the relationship between the two. It has a unique url, so this url can be deleted at which point I would expect the relationship to be severed.
Building on top of Evert's answer, the DELETE method is suitable for your needs, as long as you have a resource that represents the relationship between two users.
The semantic of the DELETE method is defined in the RFC 7231:
4.3.5. DELETE
The DELETE method requests that the origin server remove the association between the target resource and its current functionality. [...]
The DELETE method is, in fact, meant to be idempotent, but your quote is fundamentally wrong when it relates idempotency with the status code.
As I previously mentioned in this answer, idempotency is not related to the status code itself. Idempotency is about the effect produced on the state of the resource on the server, even if the response for the subsequent requests are different from the first request.
Consider a client performs a DELETE request to delete a resource from the server. The server processes the request, the resource gets deleted and the server returns 204. Then the client repeats the same DELETE request and, as the resource has already been deleted, the server returns 404 and that's totally fine.
Despite the different status code received by the client, the effect produced by a single DELETE request is the same effect of multiple DELETE requests to the same URI.

Differentiating REST status codes

Lately, I have started adding status codes to my responses instead of returning them directly.
Let's assume /person/1 returns a person with id 1 from the DB. If the person does not exist, should I return 404 status? How am I supposed to differentiate if the endpoint does not exist on the server or the resource does not exist?
Now, let's assume I have a POST endpoint for inserting users. What if that endpoint checks if the email is formed correctly and I return 400? How should I know if the request was not formed correctly and did not route to any servlets or if it indeed reached the servlet which decided that email is badly formed?
Is it a good practice to always return a 200 OK response from all of my servlets indicating that the application has done its job regardless of the outcome and write the status in a json field status or is this an overkill and an anti-pattern?
I do not have a lot of experience nor knowledge of HTTP servers so I am not sure I am explaining this (nor using it) right, so I apologize for the broad descriptions.
Let's assume /person/1 returns a person with id 1 from the DB. If the person does not exist, should I return 404 status? How am I supposed to differentiate if the endpoint does not exist on the server or the resource does not exist?
To a client it doesn't matter whether the resource or the endpoint did not exist. All it is told by the server is that for the given URI there is no representation available.
As inf3rno already mentioned a client is usually served all of the URIs a client will need by the server directly in a response. Through bookmarking or including links in some external resource certain links might get invalid over time and as such a 404 Not Found response just informs the client that no representation is available for the given URI.
A client typically is also not interested in the internals of an API but just to send or receive data it can work upon.
A further misconception many users have, unfortunately, is, that they already assume certain resources to return certain types. Such types may lead to failures on the client side if the expected representation format ever changes. In addition to that the URI structure itself, including any path, matrix and query parameters, should not be used to deduce any logical structure of the API, its exposed endpoints or the logical structure of the resources to other resources of that API. A URI as a whole is a pointer to a resource. A resource may have a dozens of links pointing to it. You might think of a URI as cache-key for representations returned that, on consecutive invocations are further served by the cache instead of the actual server. This is actually one of the constraints REST imposes and is widely used on the Web.
Now, let's assume I have a POST endpoint for inserting users. What if that endpoint checks if the email is formed correctly and I return 400? How should I know if the request was not formed correctly and did not route to any servlets or if it indeed reached the servlet which decided that email is badly formed?
RFC 7231 defines POST as an all-purpose tool that should be used if other methods aren't fitting for the task at hand. It explicitely states that the payload provided by that method will be processed according to the resource's own specific semantics. So, if you need to validate an email-address of a user before persisting it or before starting a calculation, background process or whatever, fine, do that :) Even PUT, which is often said to only replace the current representatin with the given one in the request, is not only allowed but also encouraged to perform verifications regarding any constraints the server has for the target resource and therefore it should refuse payloads that do not fit its expectations.
The quintesence here is, that a server should provide a client always with as much information as possible to let a client determine what to do next. Think of a Web based application which you access through your browser. If you receive a 400 Bad Request the browser will usually tell what the server didn't like about your request, i.e. incomplete syntax or missing value of a required field. The same holds true for REST APIs as they are basically just a generalization of the interaction model used on the Web. So the same concepts that apply to the Web also apply to REST :)
By that, each HTTP status code has its own semantics and should help the client to determine what the client should do next. A 400 Bad Request i.e. states that the server either cannot or will not process the request due to something that the server considers to be a client based error and it's up to the client to correct that failure and resend the request.
A 405 Method Not Allowed on the other hand indicates that the client used a HTTP method not supported by the targeted endpoint. An error response not only indicates that to the client but also which methods are allowed on the targeted endpoint within an Alllow response header.
Each of the HTTP status codes specified in RFC 7231 has their own semantics and its probably advisable to at least skim over these. You can also lookup all available status codes at IANA that provides links to the specificaton describing those status codes.
Is it a good practice to always return a 200 OK response from all of my servlets indicating that the application has done its job regardless of the outcome and write the status in a json field status or is this an overkill and an anti-pattern?
As with error codes also the success codes (in the 200 range) have their own semantics. If a new resource is created as outcome of processing a request (via PUT or POST) a client should be notified with a 201 Created status response that furthremore contains a HTTP Location header containing a URI targeting at the newly created resource.
If a server may take some time in order to calculate a response it is probably advisable to return a 202 Accepted response in order to inform a client about the pending request. A client can later on poll for the request either after some threshold period or after getting notified by the server through callback mechanisms such as email-notification or similar stuff. Due to German law restrictions i.e. German companies have to maintain archives of their messages exchanged via EDI. We, as an EDI provider, offer our clients to perform an archive of their exchanged messages via triggering one of our HTTP endpoints. Depending on the number of messages exchange by that company and the time period selected the archive should be generated for, this process may take some time (a couple of hours to be more concrete) and instead of letting the client wait for that period we simply return 202 Accepted and start the archiving process in the back. Depending on the configuration they either poll for the finished archive, get an information about the final result or directly get the archive sent through email if the file isn't to large.
204 No Content is also quite useful if a client performs an update onto a resource. As PUT is generally defined as replace the current representation with the one provided in the payload, upon receiving a 204 No Content response the client knows that the server applied the update and the current representation does look like the requested one by the client. Thus the server does not need to inform the client further how the current representation looks like, as the client already knows how it should look like. However, in case the server had to convert the payload to a different representation that maybe lead to an other outcome, it is probably benefitial to inform a client about the new state of the resource within a 200 OK response including the a representation of the outcome of the update process.
Returning 200 OK for a failure including a JSON payload with fields indicating about the error is for sure a bad way to proceed. Not only does it give clients a wrong hint but the response might also be cached by intermediaries and returned to other clients requesting the same even when the failure might only be of temporary nature (DB crash or the like). In additon to that is such a JSON payload proabably using a non-standardized format and thus requires out-of-band knowledge to actually process the message. While we humans are quite capable of figuring out what's going on, computers aren't yet that smart on their own.
I hope you can see that HTTP offers a lot of semantics on when to use what method or response code. They are there for a reason and therefore also should be used if the circumstances are right.
In GET request, 404 status is just a response code. You have to provide error message in body of the response in case when record is not found for the id provided.
For POST request, you can return 400 error code with specifying in the body which fields are missing/failing validation.
For url not found, User will always get the 404 error code.
For succcessful GET or POST request, you can return the response with 200 status
How am I supposed to differentiate if the endpoint does not exist on
the server or the resource does not exist?
The endpoint is the IRI (URI) of the web resource in this case. If the endpoint does not exist, then there is a good chance that the web resource does not exist either. It is an unlikely scenario, since you got your URIs from the server (HATEOAS), but it can happen if something changes between two requests, e.g. the URI template changes or somebody deletes the resource. In all of these cases the 404 is a fine HTTP status code. You can elaborate in the error message or use an additional error code, but for me it does not make sense, because the URI template change is a rare event. It would make the client more flexible though, since it could clear the cache and retry with a new link.

POST/PUT response REST in a CQRS/ES system

I'm implementing a CQRS/ES based system with a RESTful interface which is used by a webapp.
When performing certain actions e.g. creating a new profile I need to be able to check certain conditions, such as uniqueness of the profile ID, or that the person has the right to create a resource under a group. Which means I have a couple of options:
Context: POST/profiles { "email": "unique#example.com" }
From my REST API return 202 from my service with a location of the new resource where my client can poll for it. In this case, however, how do I handle errors as in effect the view will not exist or ever exist.
Create a saga on the initial request then dispatch the event. Once my service creates the view or finds the error then the result is written to the saga. When the saga has been completed return the result to the user.
From these two options - the second seems more reasonable to me, if not more complex. Is this a viable option for building RESTful request/response models on a CQRS/ES event sourced backend?
Yes, the second solution seems to better fit the business.
From what I understand from your case, from the DDD point of view, the creation of a user profile is a business process, with more than one steps (verifying the uniqueness of the profile, creating the profile and recovering from a duplicate profile situation). This process acts like an entity, it starts, runs and ends with a result (success or error). Being an entity it has an ID and it can be viewed as a REST resource. A Saga will be responsible for executing it.
So, in response to the client's request you send the URI of the process resource where the client can poll for the status. In case of error, it reads the error message. In case of success, it gets the URI of its profile.
The first solution can still be used if the use-case is simpler, if the command can be executed synchronously and the client gets the final result (error or success) as an immediate response.
From my REST API return 202 from my service with a location of the new resource where my client can poll for it. In this case, however, how do I handle errors as in effect the view will not exist or ever exist.
The usual answer here is that, as part of the 202 Accepted response, you include monitoring information
The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
In other words, a link to a resource that will change when the accepted request is finally run.
So in describing the protocol, in addition to the resource that you create, you'll also need to document the representation used when you defer the work for later, and the representation used by the monitor.
When the saga has been completed return the result to the user.
Depending on the work, that may be overkill.
Which is to say, you are raising two different questions here; one of those is whether the request should be handled synchronously (don't respond until the work is done) or asynchronously (return right away, but give the client the means to monitor progress).
The other question is how the work looks from the business layer. If you are going to need multiple transactions to make the change, and if you may need to "revert" previously committed transactions in some variants of the process, then a saga (or a process manager) makes sense.
Set Validation -- the broader term for enforcing an invariant like "uniqueness" -- is awkward. Make sure you study, and ensure that you and the business understand the impact of having a failure.

REST DELETE idempotency

Given a DELETE request that will delete resource at an integer index, does the possibility of recreation of a resource at that index break strict idempotency?
e.g. a DELETE request is made to /api/resource/123 deletes the resource at 123. Then a post request is made which creates a new resource which can be retrieved by a GET request to the same url.
It seems to me that for the original DELETE to be properly idempotent, the API should never create a new, different, resource with a previously used id, but I can't find a clear reference.
Given a DELETE request that will delete resource at an integer index, does the possibility of recreation of a resource at that index break strict idempotency?
No.
RFC 7231
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
Idempotent methods are distinguished because the request can be repeated automatically if a communication failure occurs before the client is able to read the server's response. For example, if a client sends a PUT request and the underlying connection is closed before any response is received, then the client can establish a new connection and retry the idempotent request. It knows that repeating the request will have the same intended effect, even if the original request succeeded, though the response might differ.
Notice: a generic client can retry the request; the client doesn't need to know anything about your particular implementation.
It seems to me that for the original DELETE to be properly idempotent, the API should never create a new, different, resource with a previously used id, but I can't find a clear reference.
That's not at all the case. Think about a static website. Can you, the website owner, delete foobar.html ? Of course you can. Can you recreate it later? Of course. If that's true, then it ought to be true for remoted editing as well.
And if it is true for remote editing of a web site, it should also be true for any other REST API. The point of the uniform interface is that consumers don't need to know if they are talking to a file system, a document store, a database, or some sophisticated service. The job of the API is to act as an integration layer, so that the underlying implementation acts just like the web.
In fact, this has nothing to do with the idempotent behavior of the methods.
This is the problem of naming resources. Because If the resource never existed, the deletion will behave exactly as after the resource has been deleted.
Yes, a second request will delete the new resource with the same name.
But if you are experiencing this problem, simply create a unique resource name.(UUID for example)
You can also try using the database index. Even if the entry with the key "123" is deleted - the database does not create it again.