Embedding collections (subdocument arrays) with MongoDB violates REST? - mongodb

Let's say I have a users collection in my Mongo database:
users
_id
emailAddress
firstName
lastName
passwordHash
accessLogs: [ ... ]
createdAt
As you can see, a user document can contain an array of accessLogs. Great.
But let's say I want to update a user record and do a PUT /users/:id request against my RESTful API that uses this database. With a PUT, you're supposed to get back what you put in. So let's say the user has logged in 500 times. In order to avoid violating REST, does this mean my PUT data should include the accessLogs array and all its items?
I suppose the request handler could just update everything except accessLogs.

In the strictest definition PUT should indeed replace the contents of the object. If you want to update an existing object with partial data/instructions instead, you should use the PATCH method. This would allow you to specify that you want to add accessLogs (or otherwise leave them unmodified) and not have to send the entire object -- just instructions for what needs to be updated.

In your case the accessLogs is a generated read-only property and so it does not have to be part of your PUT request. You don't send the _id property as well and if some of the properties have default values, you don't have to send those either.
The PUT method requests that the enclosed entity be stored under the
supplied Request-URI. If the Request-URI refers to an already existing
resource, the enclosed entity SHOULD be considered as a modified
version of the one residing on the origin server.
Hypertext Transfer Protocol - HTTP/1.1
The difference between the PUT and PATCH requests is reflected in the
way the server processes the enclosed entity to modify the resource
identified by the Request-URI. In a PUT request, the enclosed entity
is considered to be a modified version of the resource stored on the
origin server, and the client is requesting that the stored version be
replaced. With PATCH, however, the enclosed entity contains a set of
instructions describing how a resource currently residing on the
origin server should be modified to produce a new version. The PATCH
method affects the resource identified by the Request-URI, and it also
MAY have side effects on other resources; i.e., new resources may be
created, or existing ones modified, by the application of a PATCH.
PATCH Method for HTTP
You can find a similar question here.

Related

REST API Design: Path variable vs request body on UPDATE (Best practices)

When creating an UPDATE endpoint to change a resource, the ID should be set in the path variable and also in the request body.
Before updating a resource, I check if the resource exists, and if not, I would respond with 404 Not Found.
Now I ask myself which of the two information I should use and if I should check if both values are the same.
For example:
PUT /users/42
// request body
{
"id": 42,
"username": "user42"
}
You should PUT only the properties you can change into the request body and omit read-only properties. So you should check the id in the URI, because it is the only one that should exist in the message.
It is convenient to accept the "id" field in the payload. But you have to be sure it is the same as the path parameter. I solve this problem by setting the id field to the value of the path parameter (be sure to explain that in the Swagger of the API). In pseudo-code :
idParam = request.getPathParam("id");
object = request.getPayload();
object.id = idParam;
So all these calls are equivalent :
PUT /users/42 + {"id":"42", ...}
PUT /users/42 + {"id":"41", ...}
PUT /users/42 + {"id":null, ...}
PUT /users/42 + {...}
Why do you need the id both in URL and in the body? because now you have to validate that they are both the same or ignore one in any case. If it is a requirement for some reason, than pick which one is the one that is definitive and ignore the other one. If you don't have to have this strange duplication, than I'd say pass it in the body only
If you take a closer look at how HTTP works you might notice that the URI used to send a request to is also used as key for caching results. Any non-safe operation performed on that URI, such as POST, PUT, PATCH, will lead to (intermediary) caches automatically invalidating any stored responses for that URI. As such, if you use an other URI than the actual resource URI you are actually bypassing that feature and risk getting served outdated state from caches. As caching is one of the few constraints REST has simply skipping all caching via certain directives isn't ideal in first place.
In regards to including the ID of the resource or domain entity in the URI and/or in the payload: A common mistake in designing so-called REST APIs is that the domain object is mapped in a 1:1 manner onto a resource. We had a customer once who went through a merger and in a result they ended up with the same products being addressed by multiple IDs. In order to reduce the data in their DB they at one point tried to consolidate their data and continue. But they had to support still the old URIs they exposed for their products. In the end they realized that exposing the product ID via the URI wasn't ideal in their situation as it lead to plenty of downstream changes that affected their customers. As such, a recommendation here is to use UUIDs that don't give the target resource any semantic meaning and don't ever change. If the product ID in the back changes it doesn't affect the exposed URI at all. Sure, you might need a further table/collection to map from the product to the actual resource URI but you in the end designed your system with the eventuality of change which it now is more likely to coop with.
I've read so many times that the product ID shouldn't be part of the resource as it is already present in the URI. First, the whole URI is a unique identifier of that resource and not only a part of it. Next as mentioned above, IMO the product ID shouldn't be part of the URI in first place but it should be part of the resources' state. After all, the product ID is part of the products properties and therefore should be included there accordingly. As such, the media type exposed should contain all the necessities that a client is able to identify the product ID off the payload. The media type the resource's state is exchange with should also provide means to include the ID if you want to perform an update. I.e. if you take HTML as example, here you get served a HTML form by the server which basically teaches you where to send the request to, which HTTP operation to use, which media-type to marshal the request with and the actual properties of the resource, including the ones you are not meant to change. HTML does this i.e. via hidden input fields. Other form-based media types, such as HAL forms, JsonForms or Ion, might provide other mechanisms though.
So, to sum my post up:
Don't map the product ID onto URIs. Use a mapping from product ID to UUIDs instead
Use form-based media-types that support clients in creating requests. These media types should allow to include unmodifiable properties, such as hidden input fields and the like

PUT on REST collection. Do I provide addresses?

When you do a PUT on a REST collection should you provide the addresses of the members in the collection?
PUT on dogs with address specified
[{"name":"sparky", "id":1}, {"name":"rusty", "id":2}]
or...
PUT on dogs without address specified
[{"name":"sparky"}, {"name":"rusty"}]
and let the server return the locations of the new members in the collection. In my case the row ids in my table.
If you are sending a PUT to replace every resource in the entire collection, you should send the full new / updated entities.
If the id is part of the entity, you should send it within the entity, but for the client that has nothing to do with the URI (a.k.a the "address") of the element, as clients have now knowledge how a URI is build by the server.
If the id is not part of the entity for the client, you should not send it within the entity.
But both cases are possible and your server could accept both, important is just that the the PUT on collection behaves as in the HTTP/1.1 specs described:
The PUT method requests that the enclosed entity be stored under the supplied Request-URI.
So if you PUT on /api/animals/dogs/ the dogs just be found under /dogs/ and not e.g. /cats/
If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server
That means that a PUT on any collection, delectes the whole collection and creates a new one with the entities given in the body of the method call.
If the Request-URI does not point to an existing resource [...] the origin server can create the resource with that URI.
That means if /animals/dogs/ is empty you should create a new collection there.

REST API Design - Resource relationships and Idempotency of a PUT request: What, exactly, is meant by a full resource representation?

I understand that for partial updates, an action must be taken that is not idempotent. To that end, a valid approach is to make a POST request to that resource.
I have a question though about related resources. For example imagine the following resources with their properties:
Accounts
Id
Name
Account #
Users (a collection)
Users
Id
Name
Now imagine I want to make a partial update to an Account - for example, to change the Account's name.
I could make the following request as a valid partial update:
POST /account/id/123
{
"name" : "My New Name"
}
My question is regarding a full PUT request which must be idempotent and must include a full representation of the resource.
Could I do the following as a valid idempotent request?
PUT /account/id/123
{
"name" : "My New Name",
"accountNumber" : "654-345-4323"
}
Is that considered a valid, idempotent action? I've included all the top level "Account" information, but I question it because I didn't post all the USERS that belong to the account as well.
In order to be a valid idempotent request, would I need to include all of it's sub resources as well in the PUT request?
An easier way to understand is to consider that the PUT method ignores the current state of the target resource, so a "full resource representation" means that it must have all data needed to replace the existent resource with a new one.
In your example, that could be a valid full representation for an account with no users.
It's fine for the server to assume default values when something is missing, but that should be documented properly as some users might confuse with a partial update.
If you want to design the PUT request as the full resource replacement, then this means that you need also to assign values to all the assignable (editable) properties of a resource, including the relations (links) of a resource. Otherwise, the properties which are not set are considered as being set to null.
For partial requests, you can use PATCH HTTP method. There is also a convention of PUT if your resource representation is simple enough to allow that, that you can use partial updates.
PATCH vs. PUT
Quoting:
PATCH vs. PUT
The HTTP RFC specifies that PUT must take a full new resource
representation as the request entity. This means that if for example
only certain attributes are provided, those should be remove (i.e. set
to null).
An additional method called PATCH has been proposed recently. The
semantics of this call are like PUT inthat it updates a resource, but
unlike PUT, it applies a delta rather than replacing the entire
resource. At the time of writing, PATCH was still a proposed standard
waiting final approval.
For simple resource representations, the difference is often not
important, and many APIs simply implement PUT as a synonym for PATCH.
This usually doesn’t give any problems because it is not very common
that you need to set an attribute to null, and if you need to, you can
always explicitly include it.
However for more complex representations, especially including lists,
it becomes very important to be able to express accurately the changes
you want to make. Therefore, it is my recommendation now to both
provide PATCH and PUT, and make PATCH do an relative update and have
PUT replace the entire resource.
It is important to realize that the request entity to PATCH is of a
different content-type that the entity that it is modifying. Instead
of being a full resource, it is a resource that describes
modifications to be made to a resource. For a JSON data model, which
is what this essay is advocating, I believe that there are two
sensible ways to define the patch format.
An informal approach where you accept a dict with a partial
representation of the object. Only attributes that are present are
updated. Attributes that are not present are left alone. This approach
is simple, but it has the drawback that if the resource has a complex
internal structure e.g. containing a big list of dicts, then that
entire list of dicts need to be given in the entity. Effectively PATCH
becomes similar to PUT again.
A more formal approach would be to
accept a list of modifications. Each modification can be a dict
specifying the JSON path of the node to modify, the modification
(‘add’, ‘remove’, ‘change’) and the new value.

REST - How would a PUT request handle auto-incremented resource identifiers

According to the HTTP 1.1. spec:
If the Request-URI does not point to an existing resource, and that
URI is capable of being defined as a new resource by the requesting
user agent, the origin server can create the resource with that URI.
So in other words, PUT can be used to create & update. More specifically, if I do a PUT request e.g.
PUT /users/1
and that user does not exist, I would expect the result of this request to create a user with this ID. However, how would this work if your backend is using an auto-increment key? Would it be a case of simply ignoring it if it's not feasible (e.g. auto-increment is at 6 and I request 10) & creating if it is possible (e.g. request 7)?
From the snippet I have extracted above it does appear to give you this flexibility, just looking for some clarification.
I'd suggest that you use POST, not PUT, for an auto-increment key, or do not use the auto-increment key in the resource ID.
If you use POST, then you'd POST to /users rather than to /users/1. The reply might redirect you to /users/1 or whatever the ID is.
If you use PUT, then you might PUT to /users/10292829 where the number is a unique resource key generated on the client. This key can be time-generated, or it can be a hash of time, session ID, and some other factors to guarantee uniqueness of the value across your client audience. The server can then generate its own auto-incremented index, distinct from 10292829 or whatever.
For more on that, see PUT vs POST in REST
Following up. . .
In the case of allowing PUT to /users/XXXXXXX, for all users, you'd end up with two distinct unique keys that refer to the same resource. (10292829 and 1 might refer to the same user). You'd need to decide how to allow the use of each of these different keys in a REST-style URL. Because of the need to reconcile the use of these two distinct ids, I'd prefer to use the first option, POSTing to /users and getting a unique REST url of the created resource in the response.
I just re-read the relevant section of RFC 2616, and saw a return code specifically designed for this in REST applications:
10.2.2 201 Created
The request has been fulfilled and resulted in a new resource being created. The newly created resource can be referenced by the URI(s) returned in the entity of the response, with the most specific URI for the resource given by a Location header field. The response SHOULD include an entity containing a list of resource characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. The origin server MUST create the resource before returning the 201 status code. If the action cannot be carried out immediately, the server SHOULD respond with 202 (Accepted) response instead.
So, the RESTful way to go is to POST to /users and return a 201 Created, with a Location: header specifying /users/1.
You should be using POST to create resources while the PUT should only be used for updating. Actually REST semantics forces you to do so.

RESTful idempotence

I'm designing a RESTful web service utilizing ROA(Resource oriented architecture).
I'm trying to work out an efficient way to guarantee idempotence for PUT requests that create new resources in cases that the server designates the resource key.
From my understanding, the traditional approach is to create a type of transaction resource such as /CREATE_PERSON. The the client-server interaction for creating a new person resource would be in two parts:
Step 1: Get unique transaction id for creating the new PERSON resource:::
**Client request:**
POST /CREATE_PERSON
**Server response:**
200 OK
transaction-id:"as8yfasiob"
Step 2: Create the new person resource in a request guaranteed to be unique by using the transaction id:::
**Client request**
PUT /CREATE_PERSON/{transaction_id}
first_name="Big bubba"
**Server response**
201 Created // (If the request is a duplicate, it would send this
PersonKey="398u4nsdf" // same response without creating a new resource. It
// would perhaps send an error response if the was used
// on a transaction id non-duplicate request, but I have
// control over the client, so I can guarantee that this
// won't happen)
The problem that I see with this approach is that it requires sending two requests to the server in order to do to single operation of creating a new PERSON resource. This creates a performance issues increasing the chance that the user will be waiting around for the client to complete their request.
I've been trying to hash out ideas for eliminating the first step such as pre-sending transaction-id's with each request, but most of my ideas have other issues or involve sacrificing the statelessness of the application.
Is there a way to do this?
Edit::::::
The solution that we ended up going with was for the client to acquire a UUID and send it along with the request. A UUID is a very large number occupying the space of 16 bytes (2^128). Contrary to what someone with a programming mind might intuitively think, it is accepted practice to randomly generate a UUID and assume that it is a unique value. This is because the number of possible values is so large that the odds of generating two of the same number randomly are low enough to be virtually impossible.
One caveat is that we are having our clients request a UUID from the server (GET uuid/). This is because we cannot guarantee the environment that our client is running in. If there was a problem such as with seeding the random number generator on the client, then there very well could be a UUID collision.
You are using the wrong HTTP verb for your create operation. RFC 2616 specifies the semantic of the operations for POST and PUT.
Paragraph 9.5:
POST method is used to request
that the origin server accept the
entity enclosed in the request as
a new subordinate of the resource
identified by the Request-URI in the Request-Line
Paragraph 9.6
PUT method requests that the
enclosed entity be stored under the
supplied Request-URI.
There are subtle details of that behavior, for example PUT can be used to create new resource at the specified URL, if one does not already exist. However, POST should never put the new entity at the request URL and PUT should always put any new entity at the request URL. This relationship to the request URL defines POST as CREATE and PUT as UPDATE.
As per that semantic, if you want to use PUT to create a new person, it should be created in /CREATE_PERSON/{transaction_id}. In other words, the transaction ID returned by your first request should be the person key used to fetch that record later. You shouldn't make PUT request to a URL that is not going to be the final location of that record.
Better yet, though, you can do this as an atomic operation by using a POST to /CREATE_PERSON. This allows you with a single request to create the new person record and in the response to get the new ID (which should also be referred in the HTTP Location header as well).
Meanwhile, the REST guidelines specify that verbs should not be part of the resource URL. Thus, the URL to create new person should be the same as the location to get the list of all persons - /PERSONS (I prefer the plural form :-)).
Thus, your REST API becomes:
to get all persons - GET /PERSONS
to get single person - GET /PERSONS/{id}
to create new person - POST /PERSONS with the body containing the data for the new record
to update existing person or create new person with well-known id - PUT /PERSONS/{id} with the body containing the data for the updated record.
to delete existing person - DELETE /PERSONS/{id}
Note: I personally prefer not using PUT for creating records for two reasons, unless I need to create a sub record that has the same id as an already existing record from a different data set (also known as 'the poor man's foreign key' :-)).
Update: You are right that POST is not idempotent and that is as per HTTP spec. POST will always return a new resource. In your example above that new resource will be the transaction context.
However, my point is that you want the PUT to be used to create a new resource (a person record) and according to the HTTP spec, that new resource itself should be located at the URL. In particular, where your approach breaks is that the URL you use with the PUT is a representation of the transactional context that was created by the POST, not a representation of the new resource itself. In other words, the person record is a side effect of updating the transaction record, not the immediate result of it (the updated transaction record).
Of course, with this approach the PUT request will be idempotent, since once the person record is created and the transaction is 'finalized', subsequent PUT requests will do nothing. But now you have a different problem - to actually update that person record, you will need to make a PUT request to a different URL - one that represents the person record, not the transaction in which it was created. So now you have two separate URLs your API clients have to know and make requests against to manipulate the same resource.
Or you could have a complete representation of the last resource state copied in the transaction record as well and have person record updates go through the transaction URL for updates as well. But at this point, the transaction URL is for intends and purposes the person record, which means it was created by the POST request in first place.
I just came across this post:
Simple proof that GUID is not unique
Although the question is universally ridiculed, some of the answers go into deeper explanation of GUIDs. It seems that a GUID is a number of 2^128 in size and that the odds of randomly generating two of the same numbers of this size so low as to be impossible for all practical purposes.
Perhaps the client could just generate its own transaction id the size of a GUID instead of querying the server for one. If anyone can discredit this, please let me know.
I'm not sure I have a direct answer to your question, but I see a few issues that may lead to answers.
Your first operation is a GET, but it is not a safe operation as it is "creating" a new transaction Id. I would suggest POST is a more appropriate verb to use.
You mention that you are concerned about performance issues that would be perceived by the user caused by two round trips. Is this because your user is going to create 500 objects at once, or because you are on a network with massive latency problems?
If two round trips are not a reasonable expense for creating an object in response to a user request, then I would suggest HTTP is not the right protocol for your scenario. If however, your user needs to create large amounts of objects at once, then we can probably find a better way of exposing resources to enable that.
Why don't you just use a simple POST, also including the payload on your first call. This way you save on extra call and don't have to spawn a transaction:
POST /persons
first_name=foo
response would be:
HTTP 201 CREATED
...
payload_containing_data_and_auto_generated_id
server-internally an id would be generated. for simplicity i would go for an artifial primary key (e.g. auto-increment id from database).