Best practices for RESTful API for records with version numbers. Do I use PUT? - rest

Need some guidance on best practices for building a RESTful API in node.js
Let's say I have a person record like so:
{
id: 1,
name: 'Jon',
age: 25,
recordVersion: 1
}
If I need to increment the recordVersion every time a value gets changed, would I still use a HTTP PUT to update this record? I've researched on how PUT should be idempotent and should contain the newly-updated representation of the original resource, so I am no sure of what to do.
I could increment the recordVersion property on the first PUT call and send an error on the second PUT call with the same versionNumber of 1 (because it would have incremented to 2 at that point), but does this follow RESTful API standards?

Representation != State
The resources sent over the wire are a representation of the state, not the actual state.
It's perfectly fine to remove the recordVersion and to update it behind the scenes - however if you do that, it would be best to remove it from the representation returned by a GET to that resource as well. To understand why: idempotency is all about what would happen if you applied the operation multiple times in a row (it isn't guaranteed if other operations happen in between...), and about observable side effects.
PUT the data without the version
the data is updated
version code incremented
if you did a GET you would get the data you had PUT (with no version)
PUT the same data again without the version
the data is updated
version code incremented
if you did a GET you would get the same data you had PUT (with no version)
Idempotent, because the resource representation has not changed as a result of calling PUT twice, even though the internal entity state has changed - no observable side effects.
See http://restcookbook.com/HTTP%20Methods/idempotency/ for a bit more detail.
Using version codes to detect conflicts
As you note, you could use inspect the version and throw an error if it has changed - and in fact this is very RESTful, and in my opinion the best way to approach PUT as it helps avoid (often inexplicable) concurrency errors. If you detect this case, it would be appropriate to return a 409 Conflict http status code.
How this would work is:
PUT the data with the version (v1)
the data is updated
version code incremented
if you did a GET you would get the data you had PUT with the new version (v2) (this is a side effect, but it's ok to have a side effect from the first time you do an operation).
PUT the same data again with version (v1)
conflict is detected because v1 != v2
409 Conflict returned
if you did a GET you would get the same as the result of the first operation - the data you originally PUT with the version v2
This is idempotent, because there have been no observable side effects as a result of calling the operation twice.
The client should, in response to a 409, do another GET to get the latest version code, and possibly offer to the user the opportunity to merge their changes with whatever else has changed in the meantime.
Often people confuse idempotency with thinking that the response to the operation must be the same as a result of multiple calls, but that is not the case - it is about there being no observable side effects as a result of multiple sequential calls.

Related

REST: Should PUT endpoint compare the GET response before updating?

I am developing a REST API in Spring boot and i am trying to understand the PUT action. Currently my PUT action takes the object id and checks if the object is present and then replaced the object with the new one. I have the following questions.
Should i compare the old object with the new one before saving the new one to know if the newobject is modified at all and return an approriate http response status if the object is unchanged ?
If i do want to compare, what is the best way to know if the object is really modified or not
Thanks.
No, you don't need to compare generally. PUT means you need to replace all the values with new ones, but if you need to keep the created by or created time properties you should get the record from db and update necessary values form previous record to new one.
If you need to compare two objects you need to override equal and hashCode method. then you can check a.equal(b) to check whether values are changed. Keep in mind if object contains datatime or random generation numbers avoid adding them inside equal method.
PATCH is where you need to get the current record from db and update relevant values.
You can decide between PUT or PATCH, not all protocol support PATCH.
When a client needs to replace an existing Resource entirely, they can
use PUT. When they’re doing a partial update, they can use HTTP PATCH.
if you are using PUT there you need to do all compare things and check for example:
Get Object from DB
check and update whatever is updated business object
then save.
to answer your question, object is modified is not required you to compare value from existing object and new object.
but if client side OK to send only updated properties then PATCH is better to use.
to know more and how you can implement, please blog
HTTP PUT vs HTTP PATCH in a REST API
REST: Should PUT endpoint compare the GET response before updating?
Neither REST nor HTTP put any particular constraints on the implementation - only the semantics are constrained. RFC 7231
HTTP does not define exactly how a PUT method affects the state of an origin server beyond what can be expressed by the intent of the user agent request and the semantics of the origin server response.... Generally speaking, all implementation details behind the resource interface are intentionally hidden by the server.
RFC 7232 might have a clue to what you are looking for
An origin server MUST NOT perform the requested method if a received If-Match condition evaluates to false; instead, the origin server MUST respond with either a) the 412 (Precondition Failed) status code or b) one of the 2xx (Successful) status codes if the origin server has verified that a state change is being requested and the final state is already reflected in the current state of the target resource (i.e., the change requested by the user agent has already succeeded, but the user agent might not be aware of it, perhaps because the prior response was lost or a compatible change was made by some other user agent).
So as far as the spec is concerned, you can pretend that you changed things, even though a no-op was equivalent to what the client asked for.
If i do want to compare, what is the best way to know if the object is really modified or not
Not enough information provided. That's going to depend on things like how the origin server stores resource state. If you are just dealing with raw documents, then comparing two long hash keys could be fine.

Is it valid GET method in REST, that returns some set of data, but after a while, the dataset can be modified?

I was reading about "idempotent methods", but not quite get it.
1.1. So the GET method must be idempotent.
1.2. An idempotent HTTP method is a HTTP method that can be called many times without different outcomes. It would not matter if the method is called only once, or ten times over. The result should be the same. - See more at: http://restcookbook.com/HTTP%20Methods/idempotency/#sthash.hW6zSUi7.dpuf
Okay, that was theory. Now specific case:
2.1. I have exposed a GET method, that return all records in DB.
2.2. Somebody called this method and it returned 1000 results.
2.3. The application is running, so in a few minutes I have 1001 records in the DB.
2.4. Somebody (maybe the same caller) called this method again and now it returned 1001 results.
Is mine GET method is still idempotent or it should be changed to POST?
Yes.
Because the GET is not changing the resource. That's the distinction.
Consider:
GET /currenttime
Perfectly valid request, idempotent, but you'll get a new answer pretty much every time you call it.
An idempotent HTTP method is a HTTP method that can be called many times without different outcomes. It would not matter if the method is called only once, or ten times over. The result should be the same.
The opening sentence is somewhat unfortunate but the rest explains it pretty clearly.
The key point to note here is that the outcome may not be altered by any number of subsequent calls of the same method. The state of the resource, a represantation of which you're GETting is free to be changed by other means though.
In your example it isn't the GET request that's changing the state of the database. It's an external factor.
Is my GET method is still idempotent or it should be changed to POST?
Yes, the way you describe it, it's both idempotent and safe as it does not modify the state of your resources and it will always yield the same result provided that other parties do not alter the resource state between calls. Calling it does not affect the result of calling it.

Is it valid to modify a REST API representation based on a If-Modified-Since header?

I want to implement a "get changed values" capability in my API. For example, say I have the following REST API call:
GET /ws/school/7/student
This gets all the students in school #7. Unfortunately, this may be a lot. So, I want to modify the API to return only the student records that have been modified since a certain time. (The use case is that a nightly process runs from another system to pull all the students from my system to theirs.)
I see http://blog.mugunthkumar.com/articles/restful-api-server-doing-it-the-right-way-part-2/ recommends using the if-modified-since header and returning a representation as follows:
Search all the students updated since the time requested in the if-modified-since header
If there are any, return those students with a 200 OK
If there are no students returned from that query, return a 304 Not Modified
I understand what he wants to do, but this seems the wrong way to go about it. The definition of the If-Modified-Since header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.24) says:
The If-Modified-Since request-header field is used with a method to make it conditional: if the requested variant has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 (not modified) response will be returned without any message-body.
This seems wrong to me. We would not be returning the representation or a 304 as indicated by the RFC, but some hybrid. It seems like client side code (or worse, a web cache between server and client) might misinterpret the meaning and replace the local cached value, when it should really just be updating it.
So, two questions:
Is this a correct use of the header?
If not (and I suspect not), what is the best practice? Query string parameter?
This is not the correct use of the header. The If-Modified-Since header is one which an HTTP client (browser or code) may optionally supply to the server when requesting a resource. If supplied the meaning is "I want resource X, but only if it's changed since time T." Its purpose is to allow client-side caching of resources.
The semantics of your proposed usage are "I want updates for collection X that happened since time T." It's a request for a subset of X. It does not seem like your motivation is to enable caching. Your client-side cached representation seemingly contains all of X, even though the typical request will only return you a small set of changes to X; that is, the response is not what you are directly caching, so the caching needs to happen in custom user logic client-side.
A query string parameter is a much more appropriate solution. Below {seq} would be something like a sequence number or timestamp.
GET /ws/schools/7/students/updates?since={seq}
Server-side I imagine you have a sequence of updates since the beginning of your system and a request of the above form would grab the first N updates that had a sequence value greater than {seq}. In this way, if a client ever got very far behind and needed to catch up, the results would be paged.

Avoid duplicate POSTs with REST

I have been using POST in a REST API to create objects. Every once in a while, the server will create the object, but the client will be disconnected before it receives the 201 Created response. The client only sees a failed POST request, and tries again later, and the server happily creates a duplicate object...
Others must have had this problem, right? But I google around, and everyone just seems to ignore it.
I have 2 solutions:
A) Use PUT instead, and create the (GU)ID on the client.
B) Add a GUID to all objects created on the client, and have the server enforce their UNIQUE-ness.
A doesn't match existing frameworks very well, and B feels like a hack. How does other people solve this, in the real world?
Edit:
With Backbone.js, you can set a GUID as the id when you create an object on the client. When it is saved, Backbone will do a PUT request. Make your REST backend handle PUT to non-existing id's, and you're set.
Another solution that's been proposed for this is POST Once Exactly (POE), in which the server generates single-use POST URIs that, when used more than once, will cause the server to return a 405 response.
The downsides are that 1) the POE draft was allowed to expire without any further progress on standardization, and thus 2) implementing it requires changes to clients to make use of the new POE headers, and extra work by servers to implement the POE semantics.
By googling you can find a few APIs that are using it though.
Another idea I had for solving this problem is that of a conditional POST, which I described and asked for feedback on here.
There seems to be no consensus on the best way to prevent duplicate resource creation in cases where the unique URI generation is unable to be PUT on the client and hence POST is needed.
I always use B -- detection of dups due to whatever problem belongs on the server side.
Detection of duplicates is a kludge, and can get very complicated. Genuine distinct but similar requests can arrive at the same time, perhaps because a network connection is restored. And repeat requests can arrive hours or days apart if a network connection drops out.
All of the discussion of identifiers in the other anwsers is with the goal of giving an error in response to duplicate requests, but this will normally just incite a client to get or generate a new id and try again.
A simple and robust pattern to solve this problem is as follows: Server applications should store all responses to unsafe requests, then, if they see a duplicate request, they can repeat the previous response and do nothing else. Do this for all unsafe requests and you will solve a bunch of thorny problems. Repeat DELETE requests will get the original confirmation, not a 404 error. Repeat POSTS do not create duplicates. Repeated updates do not overwrite subsequent changes etc. etc.
"Duplicate" is determined by an application-level id (that serves just to identify the action, not the underlying resource). This can be either a client-generated GUID or a server-generated sequence number. In this second case, a request-response should be dedicated just to exchanging the id. I like this solution because the dedicated step makes clients think they're getting something precious that they need to look after. If they can generate their own identifiers, they're more likely to put this line inside the loop and every bloody request will have a new id.
Using this scheme, all POSTs are empty, and POST is used only for retrieving an action identifier. All PUTs and DELETEs are fully idempotent: successive requests get the same (stored and replayed) response and cause nothing further to happen. The nicest thing about this pattern is its Kung-Fu (Panda) quality. It takes a weakness: the propensity for clients to repeat a request any time they get an unexpected response, and turns it into a force :-)
I have a little google doc here if any-one cares.
You could try a two step approach. You request an object to be created, which returns a token. Then in a second request, ask for a status using the token. Until the status is requested using the token, you leave it in a "staged" state.
If the client disconnects after the first request, they won't have the token and the object stays "staged" indefinitely or until you remove it with another process.
If the first request succeeds, you have a valid token and you can grab the created object as many times as you want without it recreating anything.
There's no reason why the token can't be the ID of the object in the data store. You can create the object during the first request. The second request really just updates the "staged" field.
Server-issued Identifiers
If you are dealing with the case where it is the server that issues the identifiers, create the object in a temporary, staged state. (This is an inherently non-idempotent operation, so it should be done with POST.) The client then has to do a further operation on it to transfer it from the staged state into the active/preserved state (which might be a PUT of a property of the resource, or a suitable POST to the resource).
Each client ought to be able to GET a list of their resources in the staged state somehow (maybe mixed with other resources) and ought to be able to DELETE resources they've created if they're still just staged. You can also periodically delete staged resources that have been inactive for some time.
You do not need to reveal one client's staged resources to any other client; they need exist globally only after the confirmatory step.
Client-issued Identifiers
The alternative is for the client to issue the identifiers. This is mainly useful where you are modeling something like a filestore, as the names of files are typically significant to user code. In this case, you can use PUT to do the creation of the resource as you can do it all idempotently.
The down-side of this is that clients are able to create IDs, and so you have no control at all over what IDs they use.
There is another variation of this problem. Having a client generate a unique id indicates that we are asking a customer to solve this problem for us. Consider an environment where we have a publicly exposed APIs and have 100s of clients integrating with these APIs. Practically, we have no control over the client code and the correctness of his implementation of uniqueness. Hence, it would probably be better to have intelligence in understanding if a request is a duplicate. One simple approach here would be to calculate and store check-sum of every request based on attributes from a user input, define some time threshold (x mins) and compare every new request from the same client against the ones received in past x mins. If the checksum matches, it could be a duplicate request and add some challenge mechanism for a client to resolve this.
If a client is making two different requests with same parameters within x mins, it might be worth to ensure that this is intentional even if it's coming with a unique request id.
This approach may not be suitable for every use case, however, I think this will be useful for cases where the business impact of executing the second call is high and can potentially cost a customer. Consider a situation of payment processing engine where an intermediate layer ends up in retrying a failed requests OR a customer double clicked resulting in submitting two requests by client layer.
Design
Automatic (without the need to maintain a manual black list)
Memory optimized
Disk optimized
Algorithm [solution 1]
REST arrives with UUID
Web server checks if UUID is in Memory cache black list table (if yes, answer 409)
Server writes the request to DB (if was not filtered by ETS)
DB checks if the UUID is repeated before writing
If yes, answer 409 for the server, and blacklist to Memory Cache and Disk
If not repeated write to DB and answer 200
Algorithm [solution 2]
REST arrives with UUID
Save the UUID in the Memory Cache table (expire for 30 days)
Web server checks if UUID is in Memory Cache black list table [return HTTP 409]
Server writes the request to DB [return HTTP 200]
In solution 2, the threshold to create the Memory Cache blacklist is created ONLY in memory, so DB will never be checked for duplicates. The definition of 'duplication' is "any request that comes into a period of time". We also replicate the Memory Cache table on the disk, so we fill it before starting up the server.
In solution 1, there will be never a duplicate, because we always check in the disk ONLY once before writing, and if it's duplicated, the next roundtrips will be treated by the Memory Cache. This solution is better for Big Query, because requests there are not imdepotents, but it's also less optmized.
HTTP response code for POST when resource already exists

Move resource in RESTful architecture

I have a RESTful web service which represent processes and activities. Each activity is inside one and only one process.
I would like to represent a "move" operation of activity between the process it is currently in and another process.
I've look at forums and found people suggest to use MOVE operation which is not very standard and other suggest to use PUT but then I'm not sure how to tell the difference between PUT that update and PUT that moves which looks semantically wrong.
Any ideas?
One way might be to represent the move itself as, say, a "transfer" resource (transfer as a noun), and POST a new one:
POST /transfer
With an entity containing:
activity: /activities/4
toProcess: /processes/13
This way, clients are creating new "transfers" which, on the server, handle validating and transferring the activity.
This gives you the ability to add information about the transfer, too. If you wanted to keep a history for auditing, you could add a transferredBy property to the resource, or a transferredOn date.
If using PUTs, you can tell the difference by whether the process of the existing entity matches the new one.
PUT /process1/activity2
process: 2
some_data: and_stuff
To which the logical response (if successful) is
303 See Other
Location: /process2/activity2
Given the available answers I'm not really satisfied with the proposals.
POST is an all purpose method that should be used if none of the other operations fit the bill. The semantics of a payload received are defined by the service/API only and may therefore a solution for one API but not for most ones. It further lacks the property of idempotency which in case of a network issue will leave the client in an uncertainty whether the request received the server and only the response got lost mid way or if the request failed to reach the server at all. A consecutive request might therefore lead to unexpected results or further actions required.
PUT has the semantics of replace the current representation obtainable from the resource (may be empty) with the representation provided in the payload. Servers are free to modify the received representation to a more fitting one or to append or remove further data. PUT may even have side effects on other resources as well, i.e. if a versioning mechanism for a document update is provided. While providing the above-mentioned idempotency property, PUT actually does not fit the semantics of the requested action. This might have serious implications on the interoperability as standard HTTP servers wont be able to server you correctly.
One might use a combination of POST to create the new representation on the new endpoint first and afterwards remove the old one via DELETE. However, this are two separate operations where the first one might fail and if not handled correctly lead to an immediate deletion of the original resource in worst case. There is no real transactional behavior in these set of operations unfortunately.
Instead of using the above mentioned operations I'd suggest to use PATCH. PATCH is a serious of changes calculated by the client necessary to transform a current representation to a desiered one. A server supporting PATCH will have to apply these instructions atomically. Either all of them are applied or none of them at all. PATCH can have side effects and is thus the most suitable fit to perform a move in HTTP currently. To properly use this method, however, a certain media-types should be used. One might orientate on JSON Patch (more reader-friendly) i.e., though this only defines the semantics of operations to modify state of JSON based representations and does not deal with multiple resources AFAIK.