RESTful API design: should unchangable data in an update (PUT) be optional? - rest

I'm in the middle of implementing a RESTful API, and I am unsure about the 'community accepted' behavior for the presence of data that can not change. For example, in my API there is a 'file' resource that when created contains a number of fields that can not be modified after creation, such as the file's binary data, and some metadata associated with it. Additionally, the 'file' can have a written description, and tags associated.
My question concerns doing an update to one of these 'file' resources. A GET of a specific 'file' will return all the metadata, description & tags associated with the file, plus the file's binary data. Should a PUT of a specific 'file' resource include the 'read only' fields? I realize that it can be coded either way: a) include the read only fields in the PUT data and then verify they match the original (or issue an error), or b) ignore the presence of the read only fields in the PUT data because they can't change, never issuing an error if they don't match or are missing because the logic ignores them.
Seems like it could go either way and be acceptable. The second method of ignoring the read only fields can be more compact, because the API client can skip sending that read only data if they want; which seems good for people who know what they are doing...

Personally, both ways are acceptable.... however, if I were you, I'll opt for option A (check read-only fields to ensure they are not changed, else throw an error). Depending on the scope of your project, you cannot assume what the consumers know about your Restful WS in depth because most of them don't read documentations or WADL, even if they are the experienced ones. :)
If you don't provide immediate feedback to the consumers that certain fields are read-only, they will have a false assumption that your web service will take care all the changes they have made without double checking, OR once they find out the "inconsistent" updates, they complain to others that your web service is buggy.
You can approach this in two different ways if the read-only field doesn't match the original values...
Don't process the request. Send a 409 Conflict code and specific error message.
Process the request, send a 200 OK and a message stating that changes made the read-only fields are ignored.

Unless the read-only data makes up a significant portion of the data (to the extreme that transmitting the read-only data has a noticeable impact on network traffic and response times), you should write the service to accept the read only fields in the PUT and check them for changes. It's just simpler to have the same data going in and out.
Look at it this way: You could make inclusion of the read only fields optional in the PUT, but you will still have to / should write the code in the service to check that any read only fields that were received contain the expected values. You have to write the read only checking either way.
Prohibiting the read-only fields in the PUT is a bad idea because it will require the clients to strip away fields they received from you in the GET. This requires that the client get more intimately involved with your data and semantics than they really need to be. The clients will consider this a headache, an unnecessary complication, and downright mean of you to add to their burden. Taking data received from your GET, modifying one field of interest, and sending it back to you with a PUT should be a brain-dead simple round-trip for the client. Don't complicate things when you don't have to.

Related

What REST method should be used to implement a simple inter-process communication?

This is more a theorical question than a practical one.
We have a backend application that uploads csv files to a frontend application, then and only then the backend sends an empty POST request to tell the frontend to start to process those files to update its database.
For this question it doesn't matter if this is a good design (I think it isn't), what are those files, and what database is: I am only want to know better about the REST "sintax".
I'm referring to wikipedia and restfulapi.net, but I'm not convinced about any alternative, because:
GET: Request sender doesn't receive data;
POST (the currently used): Request sender doesn't want to insert data that are on the request body (just data from external files, if existent. Also they can be insert/update/delete);
PUT: Sounds good, but again, data are not on the request body;
PATCH: Sounds best, but data are not on the body (Also, I am wrong or is it deprecated/unused?);
DELETE: Doesn't always need to delete.
I know it is habit to use POST requests to let machines yell "go!" to each other, but I never thought it was right.
What do you think - in theory - would be the proper method?
The actual reference for the semantics of the HTTP methods is the RFC 7231 and not the ones you referenced in your question.
POST is a catch all method and requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.
4.3.3. POST
The POST method requests that the target resource process the
representation enclosed in the request according to the resource's
own specific semantics. For example, POST is used for the following
functions (among others):
Providing a block of data, such as the fields entered into an HTML
form, to a data-handling process;
Posting a message to a bulletin board, newsgroup, mailing list,
blog, or similar group of articles;
Creating a new resource that has yet to be identified by the
origin server; and
Appending data to a resource's existing representation(s).
[...]
Responses to POST requests are only cacheable when they include
explicit freshness information. However, POST caching is not widely implemented.
In these scenarios, the receiving application knows where the CSV files will be and monitors that location. When it finds one, it processes it and then deletes or archives it. The application will likely have its own criteria for considering itself ready to process, e.g. time of day, size of file etc.
If the data load on the front end takes a long time you could "partition" the updates based on "importance". How you define importance would be up to your business rules. You could then POST a list of CSV filenames/locations to the front end. The list would be ordered by importance. The front end could then update its database based on that importance. Scheduling less important data for a more appropriate time of day.
If the backend knows the difference between new users and updated users you could use PUT and POST. The front end could assign higher priority to PUT requests as they relate to new users, perhaps assigning lower priority and staggered syncing for CSV filenames in POST requests.

Why use two step approach to deleting multiple items with REST

We all know the 'standard' way of deleting a single item via REST is to send a single DELETE request to a URI example.com/Items/666. Grand, let's move on to deleting many at once. As we do not require atomic deleting (or true transaction, ie all or nothing) we could just tell the client 'tough luck, make many requests' but that's not very nice is it. So we need a way to allow a client to request many 'Items' be deleted at once.
From my understanding, the 'typical' solution to this problem is a 'two step' approach. First the client POSTs a list of item IDs and is returned a URI such as example.com/Items/Collection/1. Once that collection is created, they call DELETE on it.
Now, I see that this works just fine, except to me, it is a bad solution. Firstly, you are forcing the client to make two requests to accommodate the server. Secondly, 'I thought DELETE was supposed to delete an Item?', shouldn't calling DELETE on this URI effectively cancel the transaction (it's not a true transaction though), how would we even cancel it? Really would be better if there was some form 'EXECUTE' action, but I can't rock the boat that much. It also forces the server to have to consider 'the JSON that was POSTed looks more like a request to modify these Items, but the request was DELETE... so I guess I will delete them'. This approach also starts to impose a sort of state on the client/server, not a true state I will admit, but it is sort of.
In my opinion, a better solution would be to simply call DELETE on example.com/Items (or maybe example.com/Items/Collection to imply this is a multiple delete) and pass JSON data containing a list of IDs that you wish to delete. As far as I can see, this basically solves all the problems the first method had. It is easier to use as a client, reduces the work the server has to do, is truly stateless, is more semantic.
I would really appreciate the feed back on this, am I missing something about REST that makes my solution to this problem unrealistic? I would also appreciate links to articles, especially if they compare these two methods; I am aware this is not normally approved of for SO. I need to be able to disprove that only the first method is truly RESTfull, prove that the second approach is a viable solution. Of course, if I am barking up the wrong tree do tell me.
I have spent the last week or so reading a fair bit on REST, and to the best of my understanding, it would be wrong to describe either of these solutions as 'RESTfull', rather you should say that 'neither solution goes against what REST means'.
The short answer is simply that REST, as laid out in Roy Fielding's dissertation (See chapter 5), does not cover the topic of how to go about deleting resources, singular or multiple, in a REST manor. That's right, there is no 'correct RESTful way to delete a resource'... well, not quite.
REST itself does not define how delete a resource, but it does define that what ever protocol you are using (remember that REST is not a protocol) will dictate the how perform these actions. The protocol will usually be HTTP; 'usually' being the key word as Fielding will point out, REST is not synonymous with HTTP.
So we look to HTTP to say which method is 'right'. Sadly, as far as HTTP is concerned, both approaches are viable. Yes 'viable'. HTTP will allow a client to send a POST request with a payload (to create a collection resource), and then call a DELETE method on this new collection to delete the resources; it will also allow you to send the data within the payload of a single DELETE method to delete the list of resources. HTTP is simply the medium by which you send requests to the server, it would be up to the server to respond appropriately. To me, the HTTP protocol seems to be rather open to interpretation in places, but it does seem to lay down fairly clear guide lines for what actions mean, how they should be dealt with and what response should be given; it's just it is a 'you should do this' rather than 'you must do this', but perhaps I am being a little pedantic on the wording.
Some people would argue that the 'two stage' approach cannot possibly be 'REST' as the server has to store a 'state' for the client to perform the second action. This is simply a misunderstanding of some part. It must be understood that neither the client nor the server is storing any 'state' information about the other between the list being POSTed and then subsequently being DELETEd. Yes, the list must have been created before it can deleted, but the server does not remember that it was client alpha that made this list (such an approach would allow the client to simply call 'DELETE' as the next request and the server remembers to use that list, this would not be stateless at all) as such, the client must tell the server to DELETE that specific list, the list it was given a specific URI for. If the client attempted to DELETE a collection list that did not already exist it would simply be told 'the resource can not be found' (the classic 404 error most likely). If you wish to claim that this two step approach does maintain a state, you must also claim that to simply GET an URI requires a state, as the URI must first exist. To claim that there is this 'state' persisting is misunderstanding what 'state' means. And as further 'proof' that such a two stage approach is indeed stateless, you could quite happily have client alpha POST the list and later client beta (without having had any communication with the other client) call DELETE on the list resources.
I think it can stand rather self evident that the second option, of just sending the list in the payload of the DELETE request, is stateless. All the information required to complete the request is stored completely within the one request.
It could be argued though that the DELETE action should only be called on a 'tangible' resource, but in doing so you are blatantly ignoring the REpresentational part of REST; It's in the name! It is the representational aspect that 'permits' URIs such as http://example.com/myService/timeNow, a URI that when 'got' will return, dynamically, the current time, with out having to load some file or read from some database. It is a key concept that the URIs are not mapping directly to some 'tangible' piece of data.
There is however one aspect of that stateless nature that must be questioned. As Fielding describes the 'client-stateless-server' in section 5.1.3, he states:
We next add a constraint to the client-server interaction: communication must
be stateless in nature, as in the client-stateless-server (CSS) style of
Section 3.4.3 (Figure 5-3), such that each request from client to server must
contain all of the information necessary to understand the request, and
cannot take advantage of any stored context on the server. Session state is
therefore kept entirely on the client.
The key part here in my eyes is "cannot take advantage of any stored context on the server". Now I will grant you that 'context' is somewhat open for interpretation. But I find it hard to see how you could consider storing a list (either in memory or on disk) that will be used to give actual useful meaning would not violate this 'rule'. With out this 'list context' the DELETE operation makes no sense. As such, I can only conclude that making use of a two step approach to perform an action such as deleting multiple resources cannot and should not be considered 'RESTfull'.
I also begrudge somewhat the effort that has had to be put into finding arguments either way for this. The Internet at large seems to have become swept up with this idea the the two step approach is the 'RESTfull' way doing such actions, with the reasoning 'it is the RESTfull way to do it'. If you step back for a moment from what everybody else is doing, you will see that either approach requires sending the same list, so it can be ignored from the argument. Both approaches are 'representational' and 'stateless'. The only real difference is that for some reason one approach has decided to require two requests. These two requests then come with follow up questions, such as how 'long do you keep that data for' and 'how does a client tell a server that it no longer wants that this collection, but wishes to keep the actual resources it refers to'.
So I am, to a point, answering my question with the same question, 'Why would you even consider a two step approach?'
IMO:
HTTP DELETE on existing collection to delete all of its member seems fine. Creating the collection just to delete all of the member sounds odd. As you yourself suggest, just pass IDs of the to be deleted items using JSON (or any other payload format). I think that the server should try to make multiple deletes an internal transaction though.
I would argue that HTTP already provides a method of deleting multiple items in the form of persistent connections and pipelining. At the HTTP protocol level it is absolutely fine to request idempotent methods like DELETE in a pipelined way - that is, send all the DELETE requests at once on a single connection and wait for all the responses.
This may be problematic for an AJAX client running in a browser since few browsers have pipelining support enabled by default. This is not the fault of HTTP, though, it is the fault of those specific clients.

Avoid duplicate POSTs with REST

I have been using POST in a REST API to create objects. Every once in a while, the server will create the object, but the client will be disconnected before it receives the 201 Created response. The client only sees a failed POST request, and tries again later, and the server happily creates a duplicate object...
Others must have had this problem, right? But I google around, and everyone just seems to ignore it.
I have 2 solutions:
A) Use PUT instead, and create the (GU)ID on the client.
B) Add a GUID to all objects created on the client, and have the server enforce their UNIQUE-ness.
A doesn't match existing frameworks very well, and B feels like a hack. How does other people solve this, in the real world?
Edit:
With Backbone.js, you can set a GUID as the id when you create an object on the client. When it is saved, Backbone will do a PUT request. Make your REST backend handle PUT to non-existing id's, and you're set.
Another solution that's been proposed for this is POST Once Exactly (POE), in which the server generates single-use POST URIs that, when used more than once, will cause the server to return a 405 response.
The downsides are that 1) the POE draft was allowed to expire without any further progress on standardization, and thus 2) implementing it requires changes to clients to make use of the new POE headers, and extra work by servers to implement the POE semantics.
By googling you can find a few APIs that are using it though.
Another idea I had for solving this problem is that of a conditional POST, which I described and asked for feedback on here.
There seems to be no consensus on the best way to prevent duplicate resource creation in cases where the unique URI generation is unable to be PUT on the client and hence POST is needed.
I always use B -- detection of dups due to whatever problem belongs on the server side.
Detection of duplicates is a kludge, and can get very complicated. Genuine distinct but similar requests can arrive at the same time, perhaps because a network connection is restored. And repeat requests can arrive hours or days apart if a network connection drops out.
All of the discussion of identifiers in the other anwsers is with the goal of giving an error in response to duplicate requests, but this will normally just incite a client to get or generate a new id and try again.
A simple and robust pattern to solve this problem is as follows: Server applications should store all responses to unsafe requests, then, if they see a duplicate request, they can repeat the previous response and do nothing else. Do this for all unsafe requests and you will solve a bunch of thorny problems. Repeat DELETE requests will get the original confirmation, not a 404 error. Repeat POSTS do not create duplicates. Repeated updates do not overwrite subsequent changes etc. etc.
"Duplicate" is determined by an application-level id (that serves just to identify the action, not the underlying resource). This can be either a client-generated GUID or a server-generated sequence number. In this second case, a request-response should be dedicated just to exchanging the id. I like this solution because the dedicated step makes clients think they're getting something precious that they need to look after. If they can generate their own identifiers, they're more likely to put this line inside the loop and every bloody request will have a new id.
Using this scheme, all POSTs are empty, and POST is used only for retrieving an action identifier. All PUTs and DELETEs are fully idempotent: successive requests get the same (stored and replayed) response and cause nothing further to happen. The nicest thing about this pattern is its Kung-Fu (Panda) quality. It takes a weakness: the propensity for clients to repeat a request any time they get an unexpected response, and turns it into a force :-)
I have a little google doc here if any-one cares.
You could try a two step approach. You request an object to be created, which returns a token. Then in a second request, ask for a status using the token. Until the status is requested using the token, you leave it in a "staged" state.
If the client disconnects after the first request, they won't have the token and the object stays "staged" indefinitely or until you remove it with another process.
If the first request succeeds, you have a valid token and you can grab the created object as many times as you want without it recreating anything.
There's no reason why the token can't be the ID of the object in the data store. You can create the object during the first request. The second request really just updates the "staged" field.
Server-issued Identifiers
If you are dealing with the case where it is the server that issues the identifiers, create the object in a temporary, staged state. (This is an inherently non-idempotent operation, so it should be done with POST.) The client then has to do a further operation on it to transfer it from the staged state into the active/preserved state (which might be a PUT of a property of the resource, or a suitable POST to the resource).
Each client ought to be able to GET a list of their resources in the staged state somehow (maybe mixed with other resources) and ought to be able to DELETE resources they've created if they're still just staged. You can also periodically delete staged resources that have been inactive for some time.
You do not need to reveal one client's staged resources to any other client; they need exist globally only after the confirmatory step.
Client-issued Identifiers
The alternative is for the client to issue the identifiers. This is mainly useful where you are modeling something like a filestore, as the names of files are typically significant to user code. In this case, you can use PUT to do the creation of the resource as you can do it all idempotently.
The down-side of this is that clients are able to create IDs, and so you have no control at all over what IDs they use.
There is another variation of this problem. Having a client generate a unique id indicates that we are asking a customer to solve this problem for us. Consider an environment where we have a publicly exposed APIs and have 100s of clients integrating with these APIs. Practically, we have no control over the client code and the correctness of his implementation of uniqueness. Hence, it would probably be better to have intelligence in understanding if a request is a duplicate. One simple approach here would be to calculate and store check-sum of every request based on attributes from a user input, define some time threshold (x mins) and compare every new request from the same client against the ones received in past x mins. If the checksum matches, it could be a duplicate request and add some challenge mechanism for a client to resolve this.
If a client is making two different requests with same parameters within x mins, it might be worth to ensure that this is intentional even if it's coming with a unique request id.
This approach may not be suitable for every use case, however, I think this will be useful for cases where the business impact of executing the second call is high and can potentially cost a customer. Consider a situation of payment processing engine where an intermediate layer ends up in retrying a failed requests OR a customer double clicked resulting in submitting two requests by client layer.
Design
Automatic (without the need to maintain a manual black list)
Memory optimized
Disk optimized
Algorithm [solution 1]
REST arrives with UUID
Web server checks if UUID is in Memory cache black list table (if yes, answer 409)
Server writes the request to DB (if was not filtered by ETS)
DB checks if the UUID is repeated before writing
If yes, answer 409 for the server, and blacklist to Memory Cache and Disk
If not repeated write to DB and answer 200
Algorithm [solution 2]
REST arrives with UUID
Save the UUID in the Memory Cache table (expire for 30 days)
Web server checks if UUID is in Memory Cache black list table [return HTTP 409]
Server writes the request to DB [return HTTP 200]
In solution 2, the threshold to create the Memory Cache blacklist is created ONLY in memory, so DB will never be checked for duplicates. The definition of 'duplication' is "any request that comes into a period of time". We also replicate the Memory Cache table on the disk, so we fill it before starting up the server.
In solution 1, there will be never a duplicate, because we always check in the disk ONLY once before writing, and if it's duplicated, the next roundtrips will be treated by the Memory Cache. This solution is better for Big Query, because requests there are not imdepotents, but it's also less optmized.
HTTP response code for POST when resource already exists

Move resource in RESTful architecture

I have a RESTful web service which represent processes and activities. Each activity is inside one and only one process.
I would like to represent a "move" operation of activity between the process it is currently in and another process.
I've look at forums and found people suggest to use MOVE operation which is not very standard and other suggest to use PUT but then I'm not sure how to tell the difference between PUT that update and PUT that moves which looks semantically wrong.
Any ideas?
One way might be to represent the move itself as, say, a "transfer" resource (transfer as a noun), and POST a new one:
POST /transfer
With an entity containing:
activity: /activities/4
toProcess: /processes/13
This way, clients are creating new "transfers" which, on the server, handle validating and transferring the activity.
This gives you the ability to add information about the transfer, too. If you wanted to keep a history for auditing, you could add a transferredBy property to the resource, or a transferredOn date.
If using PUTs, you can tell the difference by whether the process of the existing entity matches the new one.
PUT /process1/activity2
process: 2
some_data: and_stuff
To which the logical response (if successful) is
303 See Other
Location: /process2/activity2
Given the available answers I'm not really satisfied with the proposals.
POST is an all purpose method that should be used if none of the other operations fit the bill. The semantics of a payload received are defined by the service/API only and may therefore a solution for one API but not for most ones. It further lacks the property of idempotency which in case of a network issue will leave the client in an uncertainty whether the request received the server and only the response got lost mid way or if the request failed to reach the server at all. A consecutive request might therefore lead to unexpected results or further actions required.
PUT has the semantics of replace the current representation obtainable from the resource (may be empty) with the representation provided in the payload. Servers are free to modify the received representation to a more fitting one or to append or remove further data. PUT may even have side effects on other resources as well, i.e. if a versioning mechanism for a document update is provided. While providing the above-mentioned idempotency property, PUT actually does not fit the semantics of the requested action. This might have serious implications on the interoperability as standard HTTP servers wont be able to server you correctly.
One might use a combination of POST to create the new representation on the new endpoint first and afterwards remove the old one via DELETE. However, this are two separate operations where the first one might fail and if not handled correctly lead to an immediate deletion of the original resource in worst case. There is no real transactional behavior in these set of operations unfortunately.
Instead of using the above mentioned operations I'd suggest to use PATCH. PATCH is a serious of changes calculated by the client necessary to transform a current representation to a desiered one. A server supporting PATCH will have to apply these instructions atomically. Either all of them are applied or none of them at all. PATCH can have side effects and is thus the most suitable fit to perform a move in HTTP currently. To properly use this method, however, a certain media-types should be used. One might orientate on JSON Patch (more reader-friendly) i.e., though this only defines the semantics of operations to modify state of JSON based representations and does not deal with multiple resources AFAIK.

RESTful way to create multiple items in one request

I am working on a small client server program to collect orders. I want to do this in a "REST(ful) way".
What I want to do is:
Collect all orderlines (product and quantity) and send the complete order to the server
At the moment I see two options to do this:
Send each orderline to the server: POST qty and product_id
I actually don't want to do this because I want to limit the number of requests to the server so option 2:
Collect all the orderlines and send them to the server at once.
How should I implement option 2? a couple of ideas I have is:
Wrap all orderlines in a JSON object and send this to the server or use an array to post the orderlines.
Is it a good idea or good practice to implement option 2, and if so how should I do it.
What is good practice?
I believe that another correct way to approach this would be to create another resource that represents your collection of resources.
Example, imagine that we have an endpoint like /api/sheep/{id} and we can POST to /api/sheep to create a sheep resource.
Now, if we want to support bulk creation, we should consider a new flock resource at /api/flock (or /api/<your-resource>-collection if you lack a better meaningful name). Remember that resources don't need to map to your database or app models. This is a common misconception.
Resources are a higher level representation, unrelated with your data. Operating on a resource can have significant side effects, like firing an alert to a user, updating other related data, initiating a long lived process, etc. For example, we could map a file system or even the unix ps command as a REST API.
I think it is safe to assume that operating a resource may also mean to create several other entities as a side effect.
Although bulk operations (e.g. batch create) are essential in many systems, they are not formally addressed by the RESTful architecture style.
I found that POSTing a collection as you suggested basically works, but problems arise when you need to report failures in response to such a request. Such problems are worse when multiple failures occur for different causes or when the server doesn't support transactions.
My suggestion to you is that if there is no performance problem, for example when the service provider is on the LAN (not WAN) or the data is relatively small, it's worth it to send 100 POST requests to the server. Keep it simple, start with separate requests and if you have a performance problem try to optimize.
Facebook explains how to do this: https://developers.facebook.com/docs/graph-api/making-multiple-requests
Simple batched requests
The batch API takes in an array of logical HTTP requests represented
as JSON arrays - each request has a method (corresponding to HTTP
method GET/PUT/POST/DELETE etc.), a relative_url (the portion of the
URL after graph.facebook.com), optional headers array (corresponding
to HTTP headers) and an optional body (for POST and PUT requests). The
Batch API returns an array of logical HTTP responses represented as
JSON arrays - each response has a status code, an optional headers
array and an optional body (which is a JSON encoded string).
Your idea seems valid to me. The implementation is a matter of your preference. You can use JSON or just parameters for this ("order_lines[]" array) and do
POST /orders
Since you are going to create more resources at once in a single action (order and its lines) it's vital to validate each and every of them and save them only if all of them pass validation, ie. you should do it in a transaction.
I've actually been wrestling with this lately, and here's what I'm working towards.
If a POST that adds multiple resources succeeds, return a 200 OK (I was considering a 201, but the user ultimately doesn't land on a resource that was created) along with a page that displays all resources that were added, either in read-only or editable fashion. For instance, a user is able to select and POST multiple images to a gallery using a form comprising only a single file input. If the POST request succeeds in its entirety the user is presented with a set of forms for each image resource representation created that allows them to specify more details about each (name, description, etc).
In the event that one or more resources fails to be created, the POST handler aborts all processing and appends each individual error message to an array. Then, a 419 Conflict is returned and the user is routed to a 419 Conflict error page that presents the contents of the error array, as well as a way back to the form that was submitted.
I guess it's better to send separate requests within single connection. Of course, your web-server should support it
You won't want to send the HTTP headers for 100 orderlines. You neither want to generate any more requests than necessary.
Send the whole order in one JSON object to the server, to: server/order or server/order/new.
Return something that points to: server/order/order_id
Also consider using CREATE PUT instead of POST