Proper status code for data that is well formed but invalid because of system state - rest

I have a system where users are represented primarily with an integer ID. I have a resource; let's call it X. X is associated with 2 users: one created X and the other will approve X when the creator is done. X's approver is selected by the creator when it is submitted via POST (or can be edited in later), and the request identifies the approver by user ID. There's one additional restriction: approvers and creators are paired together. Approvers can only approve X if the creator is assigned to them.
So I have a few possible failure cases regarding the approver's user ID in the request:
Malformed ID (non-integer)
Approver ID is an integer, but no user with that ID exists.
Approver ID is an existing user, but that user doesn't have the approver role.
Approver ID is an existing approver, but the approver is not assigned to the creator.
400 is obviously the correct status code for case 1, but it doesn't seem like the right status code for 2-4. 400 designates a malformed request, but 2-4 are problems specifically with existing data, not with parsing the request. I considered 409, but that seems to be a problem with the resource itself. This is a problem with additional resources that are related to resource X. I also thought about 406, but that seems to be geared toward providing content in an unknown format (like XML when only JSON is accepted).
So what status code is appropriate to indicate that the client provided well formed but bad data?

Note that for clarity to clients you will always include an explanation of the error, so a slightly inexact code with an appropriate message will most certainly be helpful.
That being said, I would use 404 (Not Found) for 1 and 2. Integer or not, a resource with that ID does not exist.
Both 3 and 4 seem localized to our application, so I would use 400. 403 could be used for 3, but that might imply authentication problems.

I agree with #Will for 1 & 2 - 404.
For 3 & 4, I would go with 409. Since in the general case (you said that the approver can be changed later) there's no real distinction (that I can see) between:
Approver ID is an existing user, but that user doesn't have the approver role.
and
Approver ID is an existing user, and 5 seconds ago they were the approver, but that is no longer the case
So it feels the same as an edit conflict.

Related

Rest API: path for accessing derived data

It is not clear to me that if I have a micro service that is in place to provide some derived data how the rest api should be designed for this. For instance :-
If I have a customer and I want to access the customer I would define the API as:
/customer/1234
this would return everything we know about the customer
however if I want to provide a microservice that simply tells me if the customer was previously known to the system with another account number what do I do. I want this logic to be in the microservice but how do I define the API
customer/1234/previouslyKnow
customerPreviouslyKnown/1234
Both don't seem correct. In the first case it implies
customer/1234
could be used to get all the customer information but the microservice doesn't offer this.
Confused!
Adding some extra details for clarification.
I suppose my issue is, I don't really want a massive service which handles everything customer related. It would be better if there were lighter weight services that handles customer orders, customer info, customer history, customer status (live, lost, dead....).
It strikes me all of these would start with
/customer/XXXX
so would all the services be expected to provide a customer object back if only customer/XXXX was given with no extra in the path such as /orders
Also some of the data as mentioned isn't actually persisted anywhere it is derived and I want the logic of this hidden in a service and not in the calling code. So how is this requested and returned.
Doing microservices doesn't mean to have a separate artifact for each method. The rules of coupling and cohesion also apply to the microservices world. So if you can query several data all related to a customer, the related resources should probably belong to the same service.
So your resource would be /customers/{id}/previous-customer-numbers whereas /customers (plural!) is the list of customers, /customers/{id} is a single customer and /customers/{id}/previous-customer-numbers the list of customer numbers the customer previously had.
Try to think in resources, not operations. So returning the list of previously used customer numbers is better than returning just a boolean value. /customer/{id}/previous-accounts would be even better, I think...
Back to topic: If the value of previous-accounts is directly derived from the same data, i.e. you don't need to query a second database, etc. I would even recommend just adding the value to the customer representation:
{
"id": "1234",
"firstName": "John",
"lastName": "Doe",
"previouslyKnown": true,
"previousAccounts": [
{
"id": "987",
...
}
]
}
Whether the data is stored or derived shouldn't matter so the service client to it should not be visible on the boundary.
Adding another resource or even another service is unnecessary complexity and complexity kills you in the long run.
You mention other examples:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Orders is clearly different from customer data so it should reside in a separate service. An order typically also has an order id which is globally unique. So there is the resource /orders/{orderId}. Retrieving orders by customer id is also possible:
/orders;customer={customerId}
which reads give me the list of orders for which the customer is identified by the given customer id.
These parameters which filter a list-like rest resource are called matrix parameters. You can also use a query parameter: /orders?customer={customerId} This is also quite common but a matrix parameter has the advantage that it clearly belongs to a specific part of the URL. Consider the following:
/orders;customer=1234/notifications
This would return the list of notifications belonging to the orders of the customer with the id 1234.
With a query parameter it would look like this:
/orders/notifications?customer=1234
It is not clear from the URL that the orders are filtered and not the notifications.
The drawback is that framework support for matrix parameters is varying. Some support them, some don't.
I'd like matrix parameters best here but a query parameter is OK, too.
Going back to your list:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Customer info and customer status most likely belong to the same service (customer core data or the like) or even the same resource. Customer history can also go there. I would place it there as long as there isn't a reason to think of it separately. Maybe customer history is such a complicated domain (and it surely can be) that it's worth a separate service: /customer-history/{id} or maybe just /customer/{id}.
It's no problem that different services use the same paths for providing different information about one customer. They are different services and they have different endpoints so there is no collision whatsoever. Ideally you even have a DNS alias pointing to the corresponding service:
https://customer-core-data.service.lan/customers/1234
https://customer-history.service.lan/customers/1234
I'm not sure if I really understand your question. However, let me show how you can check if a certain resource exist in your server.
Consider the server provides a URL that locates a certain resource (in this situation, the URL locates a customer with the identifier 1): http://example.org/api/customers/1.
When a client perform a GET request to this URL, the client can expect the following results (there may be other situation, like authentication/authorization problems, but let's keep it simple):
If a customer with the identifier 1 exists, the client is supposed to receive a response with the status code 200 and a representation of the resource (for example, a JSON or XML representing the customer) in the response payload.
If the customer with the identifier 1 do not exist, the client is supposed to receive a response with the status code 404.
To check whether a resource exists or not, the client doesn't need the resource representation (the JSON or XML that represents the customer). What's relevant here is the status code: 200 when the resource exists and 404 when the resource do not exist. Besides GET requests, the URL that locates a customer (http://example.org/api/customers/1) could also handle HEAD requests. The HEAD method is identical to the GET method, but the server won't send the resource representation in HEAD requests. Hence, it's useful to check whether a resource exists or not.
See more details regarding the HEAD method:
4.3.2. HEAD
The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section). The server SHOULD send the same
header fields in response to a HEAD request as it would have sent if
the request had been a GET, except that the payload header fields MAY be omitted. This method can be used for obtaining
metadata about the selected representation without transferring the
representation data and is often used for testing hypertext links for
validity, accessibility, and recent modification. [...]
If the difference between resource and resource representation is not clear, please check this answer.
One thing I want to add to the already great answers is: URLS design doesn't really matter that much if you do REST correctly.
One of the important tenets of REST is that urls are discovered. A client that has the customers's information already, and wants to find out what the "previously known" information, should just be able to discover that url on the main customer resource. If it links from there to the "previously known" information, it doesn't matter if the url is on a different domain, path, or even protocol.
So if you application naturally makes more sense if "previouslyKnown" is on a separate base path, then maybe you should just go for that.

What should be the response of GET for multiple requested resources with some invalid ids?

what should be the response to a request to
http://localhost:8080/users/1,2,3 when the system doesn't have a user with id 3?
When all users are present I return a 200 response code with all user objects in the response body. When the user requests a single missing user I return a 404 with an error message in the body.
However, what should be the body and status code for a mix between valid and missing ids?
I assume that you want to follow REST API principles. In order to keep clear api design you should rather use query string for filtering
http://localhost:8080/users?id=1,2,3
Then you won't have such dilemmas - you can return just only users with id contained in provided value list and 200 status code (even if list is empty). This endpoint in general
http://localhost:8080/users/{id}
should be reserved for requesting single resource (user) by providing primary key.
What you are requesting there is a collection. The request essentially reads: "give me all users whose ID is in {1, 2, 3}." A subset of those users (let's say there is no user yet with the ID 3) would still be a successful operation, which is asking for a 200 (OK).
If you are overly concerned by this, there's still the possibility to redirect the client via 303 (See Other) to a resource representation without the offending elements.
If all of the IDs are invalid, things get a bit tricky. One may be tempted to simply return a 404 (Not Found), but strictly speaking that were not correct:
The 404 status code indicates that the origin server did not find a current representation for the target resource
Indeed, there is one: The empty set. From a programmatic standpoint, it may indeed be easier to just return that instead of throwing an error. This relies on clients being able to process empty sets/documents.
The RFC grants you the freedom to go either way:
[…] the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.
So if you wish to hide the existence of an empty set, it's okay. It bears mentioning that a set containing nothing is not nothing itself ;)
I would recommend not to offer the method in the first place, but rather force the user of your APIto make three separate requests and return unambiguous responses (two 200s for users 1 and 2 and 404 for user 3). Additionally, the API could offer a get method that responds with all available user ids or such (depends on your system).
Alternatively, if that's not an option, I guess, you have two options:
Return 404 as soon as one user is not found, which technically is more accurate in my opinion. I mean, the request was for 1, 2 AND 3, which was not found.
Return 200 with users 1 and 2, and null, which probably is the most useful for your scenario.

REST API - filters, child entities or is leaking information really so bad?

The basic problem
I have a rest API. For the sake of this example, let's say I have users, they can be members in any number of groups, and both users and groups can own objects.
Any user can filter objects by various criteria:
/objects?color=green
/objects?created=yesterday
but only members of a given group can filter by group ownership:
/objects?groupId=1
and only the actual user can filter by user ownership:
/objects?userId=55
There are now two basic patterns - one could make the object a child entity of the group, such as:
/groups/4/objects/1
with 4 being the group ID and 1 being the object ID. the other option is having group and objects side-by-side:
/groups/4
and
/objects/1
making the object a child of the group and/or user would eliminate the other filtering options - essentially, i have one object with multiple paths to it.
The actual question
If I want to limit access for a regular user so that he/she can only access objects that are directly owned by him/her or by groups that he/she is a member of, it does work as a filter on the collection - but what about the entity level?
If I try:
/objects/9
But the object is owned by a group I am not a member of, I would expect an authorization error, while if the object doesn't exist at all, i would expect a "not found" - this, however, would leak information about the book's existence, and I also would have to retrieve the object in order to be able to determine whether or not the user has the right to see it.
So I came up with this:
/objects/9?groupId=4
or
/objects/9?userId=55
In this, I can base the initial decision on authorization on the group ID or user ID, and then try to retrieve the object with the additional restriction.
If the user is NOT a member of group 4, I can say not authorized, and if the book doesn't exist, I can say not found, meaning not that the object doesn't exist, but that the object doesn't exist in group 4. This answer is more clear, and also I would not have to retrieve the object first.
The alternative would be to return an authorization error regardless of whether it is due to the fact I am not authorized OR due to the fact that the object doesn't exist. This answer is slightly imprecise, but it would put less of a burden on the caller.
Another possibility would be to map multiple paths:
/objects
/groups/4/objects
/users/9/objects
/colors/green/objects
This seems rather messy and would violate the principle of having a single path for a single concept.
Does anyone have any practical insight on this? Any reasons (apart from the ones mentioned) why one or the other would be preferable?
If I understand you correct, every object is linked to (at least) one group or (at least) one user, so you don't have the problem of having an object without a group or user.
If this is the case I don't see the point in using filters as it would not make sense in a REST way and also not give any benefit to the client site.
So as you suggested you could just use the following:
/groups/$groupID/objects/$objectID
and
/user/$userID/objects/$objectID
Now you server should check if the client is authorized ("the user a member of the group" / "the current user") given the $groupID xor the $userID
if not authorized: not even check if object is there. Just give the not authorized error.
if authorized: give standard response codes
I don't see a benefit for a non authorized client to get information if a resource is available or not as its not important for him, because he cant access it either way. And as you suggested it would result in a information leak which could result in a security problem (but that is completely depending on your API and what its information and usage).
Now lets go through the scenarios for group calls:
User Arnold (member of groups: 1,2,3) wants to access existing object 7 of the member group 3.
GET /groups/3/objects/7
response: #200
User Arnold (member of groups: 1,2,3) wants to access non existing object 55 of his member group 2.
GET /groups/2/objects/55
response: #404
User Arnold (member of groups: 1,2,3) wants to access existing object 11 of a non member group 5.
GET /groups/5/objects/11
response: #401
User Arnold (member of groups: 1,2,3) wants to access non existing object 19 of a non member group 5.
GET /groups/5/objects/19
response: #401
And for user objects:
User Arnold wants to access his non existing object 56.
GET /user/arnold/objects/56
response: #404
User Arnold wants to access his existing object 13.
GET /user/arnold/objects/13
response: #200
User Arnold wants to access Jon's existing object 77.
GET /user/jon/objects/77
response: #401
User Arnold wants to access Jon's non existing object 88.
GET /user/jon/objects/88
response: #401
As you can see the server just responds with #401 if the client is non authorzied. Additional it would be great to give a error message in the body e.g. Sorry, but you are not authorized to see content of user "Jon" or Sorry, but you are not authorized to see content of group "ABYZX", so the client knows what the problem is.
This seems rather messy and would violate the principle of having a single path for a single concept.
I don't see it that way as also different sources (1) (3) and SO answers say its really no problem to have multiple paths or URIs.
Each resource in a service suite will have at least one URI identifying it.
It could help clients understand the authorization process and with that help them to navigate through your API.
Any of the URI choices you describe are fine. I perfer flat URIs, but it doesn't really matter.
The real question is how to handle unauthorized requests. In that case, I suggest responding with 404 in all cases including when the resource exists, but the user doesn't have access. It avoids the information leaking problem and it is completely compatible with the HTTP specification.
This pattern makes sense logically too. From the perspective of the user with insufficient permissions, the resource doesn't exist.
If you have ever tried to view a private Github project without having permissions to view it, you would have seen this pattern in action. Github will respond with 404 even if the project actually exists.

Should the natural or surrogate key be returned in an API?

First time I think about it...
Until now, I always used the natural key in my API. For example, a REST API allowing to deal with entities, the URL would be like /entities/{id} where id is a natural key known to the user (the ID is passed to the POST request that creates the entity). After the entity is created, the user can use multiple commands (GET, DELETE, PUT...) to manipulate the entity. The entity also has a surrogate key generated by the database.
Now, think about the following sequence:
A user creates entity with id 1. (POST /entities with body containing id 1)
Another user deletes the entity (DELETE /entities/1)
The same other user creates the entity again (POST /entities with body containing id 1)
The first user decides to modify the entity (PUT /entities/1 with body)
Before step 4 is executed, there is still an entity with id 1 in the database, but it is not the same entity created during step 1. The problem is that step 4 identifies the entity to modify based on the natural key which is the same for the deleted and new entity (while the surrogate key is different). Therefore, step 4 will succeed and the user will never know it is working on a new entity.
I generally also use optimistic locking in my applications, but I don't think it helps here. After step 1, the entity's version field is 0. After step 3, the new entity's version field is also 0. Therefore, the version check won't help. Is the right case to use timestamp field for optimistic locking?
Is the "good" solution to return surrogate key to the user? This way, the user always provides the surrogate key to the server which can use it to ensure it works on the same entity and not on a new one?
Which approach do you recommend?
It depends on how you want your users to user your api.
REST APIs should try to be discoverable. So if there is benefit in exposing natural keys in your API because it will allow users to modify the URI directly and get to a new state, then do it.
A good example is categories or tags. We could have these following URIs;
GET /some-resource?tag=1 // returns all resources tagged with 'blue'
GET /some-resource?tag=2 // returns all resources tagged with 'red'
or
GET /some-resource?tag=blue // returns all resources tagged with 'blue'
GET /some-resource?tag=red // returns all resources tagged with 'red'
There is clearly more value to a user in the second group, as they can see that the tag is a real word. This then allows them to type ANY word in there to see whats returned, whereas the first group does not allow this: it limits discoverability
A different example would be orders
GET /orders/1 // returns order 1
or
GET /orders/some-verbose-name-that-adds-no-meaning // returns order 1
In this case there is little value in adding some verbose name to the order to allow it to be discoverable. A user is more likely to want to view all orders first (or a subset) and filter by date or price etc, and then choose an order to view
GET /orders?orderBy={date}&order=asc
Additional
After our discussion over chat, your issue seems to be with versioning and how to manage resource locking.
If you allow resources to be modified by multiple users, you need to send a version number with every request and response. The version number is incremented when any changes are made. If a request sends an older version number when trying to modify a resource, throw an error.
In the case where you allow the same URIs to be reused, there is a potential for conflict as the version number always begins from 0. In this case, you will also need to send over a GUID (surrogate key) and a version number. Or don't use natural URIs (see original answer above to decided when to do this or not).
There is another option which is to disallow reuse of URIs. This really depends on the use case and your business requirements. It may be fine to reuse a URI as conceptually it means the same thing. Example would be if you had a folder on your computer. Deleting the folder and recreating it, is the same as emptying the folder. Conceptually the folder is the same 'thing' but with different properties.
User account is probably an area where reusing URIs is not a good idea. If you delete an account /accounts/u1, that URI should be marked as deleted, and no other user should be able to create an account with username u1. Conceptually, a new user using the same URI is not the same as when the previous user was using it.
Its interesting to see people trying to rediscover solutions to known problems. This issue is not specific to a REST API - it applies to any indexed storage. The only solution I have ever seen implemented is don't re-use surrogate keys.
If you are generating your surrogate key at the client, use UUIDs or split sequences, but for preference do it serverside.
Also, you should never use surrogate keys to de-reference data if a simple natural key exists in the data. Indeed, even if the natural key is a compound entity, you should consider very carefully whether to expose a surrogate key in the API.
You mentioned the possibility of using a timestamp as your optimistic locking.
Depending how strictly you're following a RESTful principle, the Entity returned by the POST will contain an "edit self" link; this is the URI to which a DELETE or UPDATE can be performed.
Taking your steps above as an example:
Step 1
User A does a POST of Entity 1. The returned Entity object will contain a "self" link indicating where updates should occur, like:
/entities/1/timestamp/312547124138
Step 2
User B gets the existing Entity 1, with the above "self" link, and performs a DELETE to that timestamp versioned URI.
Step 3
User B does a POST of a new Entity 1, which returns an object with a different "self" link, e.g.:
/entities/1/timestamp/312547999999
Step 4
User A, with the original Entity that they obtained in Step 1, tries doing a PUT to the "self" link on their object, which was:
/entities/1/timestamp/312547124138
...your service will recognise that although Entity 1 does exist; User A is trying a PUT against a version which has since become stale.
The service can then perform the appropriate action. Depending how sophisticated your algorithm is, you could either merge the changes or reject the PUT.
I can't remember the appropriate HTTP status code that you should return, following a PUT to a stale version... It's not something that I've implemented in the Rest framework that I work on, although I have planned to enable it in future. It might be that you return a 410 ("Gone").
Step 5
I know you don't have a step 5, but..! User A, upon finding their PUT has failed, might re-retrieve Entity 1. This could be a GET to their (stale) version, i.e. a GET to:
/entities/1/timestamp/312547124138
...and your service would return a redirect to GET from either a generic URI for that object, e.g.:
/entities/1
...or to the specific latest version, i.e.:
/entities/1/timestamp/312547999999
They can then make the changes intended in Step 4, subject to any application-level merge logic.
Hope that helps.
Your problem can be solved either using ETags for versioning (a record can only modified if the current ETag is supplied) or by soft deletes (so the deleted record still exists but with a trashed bool which is reset by a PUT).
Sounds like you might also benefit from a batch end point and using transactions.

Appending to a resource's attribute RESTfully

This is a follow up to Updating a value RESTfully with Post
How do I simply append to a resource's attribute using REST. Imagine I have customer.balance and balance is an int. Let' say I just want to tell the server to append 5 to whatever the current balance is. Can I do this restfully? If so, how?
Keep in mind that the client doesn't know the customer's existing balance, so it can't just
get customer
customer.balance += 5
post customer
(there would also be concurrency issues with the above.)
Simple, slightly ugly:
This is a simpler variation of my answer to your other question.
I think you're still within the constraints of REST if you do the following. However, I'm curious about what others think about this situation as well, so I hope to hear from others.
Your URI will be:
/customer/21/credits
You POST a credit resource (maybe <credit>5</credit>) to the URI, the server can then take the customer's balance and += it with the provided value. Additionally, you can support negative credits (e.g. <credit>-10</credit>);
Note that /customer/21/credits doesn't have to support all methods. Supporting POST only is perfectly acceptable.
However, this gets a little weird if credits aren't a true resource within your system. The HTTP spec says:
If a resource has been created on the origin server, the response SHOULD be 201 (Created) and contain an entity which describes the status of the request and refers to the new resource, and a Location header.
Technically you're not creating a resource here, you're appending to the customer's balance (which is really an aggregate of all previous credits in the system). Since you're not keeping the credit around (presumably), you wouldn't really be able to return a reference to the newly "created" credit resource. You could probably return the customer's balance, or the <customer> itself, but that's a bit unintuitive to clients. This is why I think treating each credit as a new resource in the system is easier to work with (see below).
My preferred solution:
This is adapted from my answer in your other question. Here I'll try to approach it from the perspective of what the client/server are doing:
Client:
Builds a new credit resource:
<credit>
<amount>5</amount>
</credit>
POSTs resource to /customer/21/credits
POSTing here means, "append this new <credit> I'm providing to the list of <credit>s for this customer.
Server:
Receives POST to /customer/21/credits
Takes the amount from the request and +=s it to the customer's balance
Saves the credit and its information for later retrieval
Sends response to client:
<credit href="/customer/21/credits/credit-id-4321234">
<amount>5</amount>
<date>2009-10-16 12:00:23</date>
<ending-balance>45.03</ending-balance>
</credit>
This gives you the following advantages:
Credits can be accessed at a later date by id (with GET /customer/21/credits/[id])
You have a complete audit trail of credit history
Clients can, if you support it, update or remove credits by id (with PUT or DELETE)
Clients can retrieve an ordered list of credits, if you support it; e.g. GET /customer/21/credits might return:
<credits href="/customer/21/credits">
<credit href="/customer/21/credits/credit-id-7382134">
<amount>13</amount>
...
</credit>
<credit href="/customer/21/credits/credit-id-134u482">
...
</credit>
...
</credits>
Makes sense, since the customer's balance is really the end result of all credits applied to that customer.
To think about this in a REST-ful way, you would need to think about the action itself as a resource. For example, if this was banking, and you wanted to update the balance on an account, you would create a deposit resource, and then add one of those. The consequence of this would be to update the customer's balance
This also helps deal with concurrency issues, because you would be submitting a +5 action rather than requiring prior knowledge of the customer's balance. And, you would also be able to recall that resource (say deposit/51 for deposit with an ID of 51) and see other details about it (ie. Reason for deposit, date of deposit etc.).
EDIT: Realised that using an id of 5 for the deposit actually confuses the issue, so changed it to 51.
Well, there is alternative other than #Rob-Hruska 's solution.
The fundamental idea is the same: to think each credit/debit operation as a standalone transaction. However I once used a backend which supports storing schema-less data in json, so that I end up with defining the API as PUT with dynamic field names. Something like this:
PUT /customer/21
{"transaction_yyyymmddHHMMSS": 5}
I know this is NOT appropriate in the "credit/debit" context because an active account could have growing transaction records. But in my context I am using such tactics to store finite data (actually I was storing different batches of GPS way points during a driving trip).
Cons: This api style has heavy dependence on backend behavior's schema-less feature.
Pros: At least my approach is fully RESTful from the semantic point of view.
By contrast, #Rob-Hruska 's "Simple, slightly ugly" solution 1 does not have a valid Location header to return in the "201 Created" response, which is not a common RESTful behavior. (Or perhaps, we can let #Rob-Hruska's solution 1 to also return a dummy Location header, which points to a "410 Gone" or "404 Not Found" page. Is this more RESTful? Comments are welcome!)