correct REST approach for hierarchy Inserts and Updates - rest

Lets say I have the following classes:
company
user
address
a company object can contain users and addresses,
a user object contains addresses
company can exist on its own, but users and address is always a part of company or user (a composition in UML wording):
company ->
addresses
users ->
addresses
Now I want use the right REST-structure for POST (insert) and PUT (update) requests. What would be the correct approach?
variant 1:
// insert address:
POST /api/company/{companyId}/addresses
POST /api/company/{companyId}/users/{userId}/addresses
// update address:
PUT /api/company/{companyId}/addresses/{addressId}
PUT /api/company/{companyId}/users/{userId}/addresses/{addressId}
variant 2:
// insert address:
POST /api/address // companyId, userId etc. as parameter
// update address:
PUT /api/address/{addressId}
My personal gut feeling would be to use variant 1 for creation (POST) and variant 2 for updates (PUT), because variant 1 seems "cleaner" for creation but for update variant 2 would be better because it does not require a parentId (company or user)

First, REST is an architectur style not a protocol. This means that there is no right or wrong in how you define your URI. The only requirement Roy Fielding put in place was that each resource has to have a unique resource identifier (URI). The operation you may do on those URIs (verbs) are defined by the underlying hypertext transfer protocol (HTTP).
Second, there are certain best practices like using a sub-resource for resources that are embedded in other resources (especially if they can't exist without the parent resource) as in your case with addresses and users/companies
As you have certain issues with updating a sub-resource:
Usually to update a resource HTTP verb PUT is used which has the limitation of replacing the current state (all the available data) with the state you've sent to the server (in case the update goes well). As there is no partial update yet defined in HTTP some use the PUT verb a bit fuzzy and only update what is available within the request, though issuing a PATCH request and only updating these fields is probably more correct in terms of HTTP specification.
Sub-resources are very similar to regular resources. If you update a sub-resource you don't need to update the parent resource. In your particular case, if you want to update a user's address but not the user itself, you issue a PUT /users/{userId}/addresses/{addressId} HTTP/1.1 request containing the new state of the address to the server. The request body may look like this:
{
"street": "Sample Street 1"
"city": "Sampletown",
"zip": "12345",
"country": "Neverland",
"_links": {
"self": { "href": "/users/{userId}/addresses/{addressId} }
"googleMap": { "href": "https://www.google.com/maps/place/Liberty+Island/#40.6883129,-74.042817,16.4z/data=!4m2!3m1!1s0x0000000000000000:0x005f3b2fa5c0821d" }
}
}
If you want to follow the HTTP PUT verb closely and have dynamic fields, which may appear during runtime, you might have to alter the current table definition eventually, delete the old address entry and insert a new entry with the provided information (depending on the DB layer you are using SQL vs NoSQL). Note the HTTP PUT verb semantic!
On utilizing a fuzzy update strategy (=partial update) a simple update statement should be fine. Here you can simply ignore the additional userId contained within the URI, though this is not (yet) fully HTTP compliant - at least in regards to the spec. In that particular case it is your choice which version of URI you choose, though version 2 only works in the latter case while version 1 works in both cases.

Related

REST API Design: Path variable vs request body on UPDATE (Best practices)

When creating an UPDATE endpoint to change a resource, the ID should be set in the path variable and also in the request body.
Before updating a resource, I check if the resource exists, and if not, I would respond with 404 Not Found.
Now I ask myself which of the two information I should use and if I should check if both values are the same.
For example:
PUT /users/42
// request body
{
"id": 42,
"username": "user42"
}
You should PUT only the properties you can change into the request body and omit read-only properties. So you should check the id in the URI, because it is the only one that should exist in the message.
It is convenient to accept the "id" field in the payload. But you have to be sure it is the same as the path parameter. I solve this problem by setting the id field to the value of the path parameter (be sure to explain that in the Swagger of the API). In pseudo-code :
idParam = request.getPathParam("id");
object = request.getPayload();
object.id = idParam;
So all these calls are equivalent :
PUT /users/42 + {"id":"42", ...}
PUT /users/42 + {"id":"41", ...}
PUT /users/42 + {"id":null, ...}
PUT /users/42 + {...}
Why do you need the id both in URL and in the body? because now you have to validate that they are both the same or ignore one in any case. If it is a requirement for some reason, than pick which one is the one that is definitive and ignore the other one. If you don't have to have this strange duplication, than I'd say pass it in the body only
If you take a closer look at how HTTP works you might notice that the URI used to send a request to is also used as key for caching results. Any non-safe operation performed on that URI, such as POST, PUT, PATCH, will lead to (intermediary) caches automatically invalidating any stored responses for that URI. As such, if you use an other URI than the actual resource URI you are actually bypassing that feature and risk getting served outdated state from caches. As caching is one of the few constraints REST has simply skipping all caching via certain directives isn't ideal in first place.
In regards to including the ID of the resource or domain entity in the URI and/or in the payload: A common mistake in designing so-called REST APIs is that the domain object is mapped in a 1:1 manner onto a resource. We had a customer once who went through a merger and in a result they ended up with the same products being addressed by multiple IDs. In order to reduce the data in their DB they at one point tried to consolidate their data and continue. But they had to support still the old URIs they exposed for their products. In the end they realized that exposing the product ID via the URI wasn't ideal in their situation as it lead to plenty of downstream changes that affected their customers. As such, a recommendation here is to use UUIDs that don't give the target resource any semantic meaning and don't ever change. If the product ID in the back changes it doesn't affect the exposed URI at all. Sure, you might need a further table/collection to map from the product to the actual resource URI but you in the end designed your system with the eventuality of change which it now is more likely to coop with.
I've read so many times that the product ID shouldn't be part of the resource as it is already present in the URI. First, the whole URI is a unique identifier of that resource and not only a part of it. Next as mentioned above, IMO the product ID shouldn't be part of the URI in first place but it should be part of the resources' state. After all, the product ID is part of the products properties and therefore should be included there accordingly. As such, the media type exposed should contain all the necessities that a client is able to identify the product ID off the payload. The media type the resource's state is exchange with should also provide means to include the ID if you want to perform an update. I.e. if you take HTML as example, here you get served a HTML form by the server which basically teaches you where to send the request to, which HTTP operation to use, which media-type to marshal the request with and the actual properties of the resource, including the ones you are not meant to change. HTML does this i.e. via hidden input fields. Other form-based media types, such as HAL forms, JsonForms or Ion, might provide other mechanisms though.
So, to sum my post up:
Don't map the product ID onto URIs. Use a mapping from product ID to UUIDs instead
Use form-based media-types that support clients in creating requests. These media types should allow to include unmodifiable properties, such as hidden input fields and the like

REST design for update/add/delete item from a list of subresources

I would like to know which is the best practice when you are having a resource which contains a list of subresources. For example, you have the resource Author which has info like name, id, birthday and a List books. This list of books exists only in relation with the Author. So, you have the following scenario:
You want to add a new book to the book list
You want to update the name of a book from the list
You want to delete a book from the list
SOLUTION 1
I searched which is the correct design and I found multiple approaches. I want to know if there is a standard way of designing this. I think the design by the book says to have the following methods:
To add: POST /authors/{authorId}/book/
To update: PUT /authors/{authorId}/book/{bookId}
To delete: DELETE /authors/{authorId}/book/{bookId}
SOLUTION 2
My solution is to have only one PUT method which does all these 3 things because the list of books exists only inside object author and you are actually updating the author. Something like:
PUT /authors/{authorId}/updateBookList (and send the whole updated book list inside the author object)
I find multiple errors in my scenario. For example, sending more data from the client, having some logic on the client, more validation on the API and also relying that the client has the latest version of Book List.
My question is: is it anti-pattern to do this?
SITUATION 1. In my situation, my API is using another API, not a database. The used API has just one method of "updateBookList", so I am guessing it is easier to duplicate this behavior inside my API too. Is it also correct?
SITUATION 2. But, supposing my API would use a database would it be more suitable to use SOLUTION 1?
Also, if you could provide some articles, books where you can find similar information. I know this kind of design is not written in stone but some guidelines would help. (Example: from Book REST API Design Rulebook - Masse - O'Reilly)
Solution 2 sounds very much like old-style RPC where a method is invoked that performs some processing. This is like a REST antipattern as REST's focus is on resources and not on methods. The operations you can perform on a resource are given by the underlying protocol (HTTP in your case) and thus REST should adhere to the semantics of the underlying protocol (one of its few constraints).
In addition, REST doesn't care how you set up your URIs, hence there are no RESTful URLs actually. For an automated system a URI following a certain structure has just the same semantics as a randomly generated string acting as a URI. It's us humans who put sense into the string though an application should use the rel attribute which gives the URI some kind of logical name the application can use. An application who expects a certain logical composition of an URL is already tightly coupled to the API and hence violates the principles REST tries to solve, namely the decoupling of clients from server APIs.
If you want to update (sub)resources via PUT in a RESTful way, you have to follow the semantics of put which basically state that the received payload replaces the payload accessible at the given URI before the update.
The PUT method requests that the state of the target resource be
created or replaced with the state defined by the representation
enclosed in the request message payload.
...
The target resource in a POST request is intended to handle the
enclosed representation according to the resource's own semantics,
whereas the enclosed representation in a PUT request is defined as
replacing the state of the target resource. Hence, the intent of PUT
is idempotent and visible to intermediaries, even though the exact
effect is only known by the origin server.
In regards to partial updates RFC 7231 states that partial updates are possible by either using PATCH as suggested by #Alexandru or by issuing a PUT request directly at a sub-resource where the payload replaces the content of the sub-resource with the one in the payload. For the resource containing the sub-resouce this has an affect of a partial update.
Partial content updates are possible by
targeting a separately identified resource with state that overlaps a
portion of the larger resource, or by using a different method that
has been specifically defined for partial updates (for example, the
PATCH method defined in [RFC5789]).
In your case you could therefore send the updated book collection directly via a PUT operation to something like an .../author/{authorId}/books resource which replaces the old collection. As this might not scale well for authors that have written many publications PATCH is probably preferable. Note, however, that PATCH requires an atomic and transactional behavior. Either all actions succeed or none. If an error occurs in the middle of the actions you have to role back all already executed steps.
In regards to your request for further literature, SO isn't the right place to ask this as there is an own off-topic close/flag reason exactly for this.
I'd go with the first option and have separate methods instead of cramming all logic inside a generic PUT. Even if you're relying on an API instead of a database, that's just a 3rd party dependency that you should be able to switch at any point, without having to refactor too much of your code.
That being said, if you're going to allow the update of a large number of books at once, then PATCH might be your friend:
Looking at the RFC 6902 (which defines the Patch standard), from the client's perspective the API could be called like
PATCH /authors/{authorId}/book
[
{ "op": "add", "path": "/ids", "value": [ "24", "27", "35" ]},
{ "op": "remove", "path": "/ids", "value": [ "20", "30" ]}
]
Technically, solution 1 hands down.
REST API URLs consist of resources (and identifiers and filter attribute name/values). It should not contain actions (verbs). Using verbs encourages creation of stupid APIs.
E.g. I know a real-life-in-production API that wants you to
do POST on /getrecords to get all records
do POST on /putrecords to add a new record
Reasons to choose solution 2 would not be technical.
For requirement #2 (You want to update the name of a book from the list), it is possible to use JSON PATCH semantics, but use HTTP PATCH (https://tools.ietf.org/html/rfc5789) semantics to design the URL (not JSON PATCH semantics as suggested by Alexandru Marculescu).
I.e.
Do PATCH on /authors/{authorId}/book/{bookId}, where body contains only PK and changed attributes. Instead of:
To update: PUT on /authors/{authorId}/book/{bookId}
JSON PATCH semantics may of course be used to design the body of a PATCH request, but it just complicates things IMO.

How do you model a RESTful API for a single resource?

I am looking to expose some domain RESTful APIs on top of an existing project. One of the entities I need to model has a single document: settings. Settings are created with the application and is a singleton document. I'd like to expose it via a well-designed resource-based RESTful API.
Normally when modeling an API for a resource with many items its something like:
GET /employees/ <-- returns [] of 1-* items
GET /employees/{id}/ <-- returns 1 item
POST /employees/ <-- creates an item
PUT /employees/{id}/ <-- updates all fields on specific item
PATCH /employees/{id}/ <-- updates a subset of fields specified on an item
DELETE /employees/{id}/ <-- deletes a specific item
OPTION 1: If I modeled settings in the same way then the following API is built:
GET /settings/ <-- returns [] of 1-* items
[{ "id": "06e24c15-f7e6-418e-9077-7e86d14981e3", "property": "value" }]
GET /settings/{id}/ <-- returns 1 item
{ "id": "06e24c15-f7e6-418e-9077-7e86d14981e3", "property": "value" }
PUT /settings/{id}/
PATCH /settings/{id}/
This to me has a few nuances:
We return an array when only 1 item CAN and EVER WILL exist. Settings are a singleton that the application creates.
We require knowing the id to make a request only returning 1 item
We require the id of a singleton just to PUT or PATCH it
OPTION 2: My mind then goes in this direction:
GET /settings/ <-- returns 1 item
{ "id": "06e24c15-f7e6-418e-9077-7e86d14981e3", "property": "value" }
PUT /settings/
PATCH /settings/
This design removes the nuances brought up below and doesn't require an id to PUT or PATCH. This feels the most consistent to me as all requests have the same shape.
OPTION 3: Another option is to add the id back to the PUT and the PATCH to require it to make updates, but then an API user must perform a GET just to obtain the id of a singleton:
GET /settings/ <-- returns 1 item
{ "id": "06e24c15-f7e6-418e-9077-7e86d14981e3", "property": "value" }
PUT /settings/{id}/
PATCH /settings/{id}/
This seems inconsistent because the GET 1 doesn't have the same shape as the UPDATE 1 request. It also doesn't require a consumer to perform a GET to find the identifier of the singleton.
Is there a preferred way to model this?
Does anyone have any good reference material on modeling RESTful APIs for singleton resources? I am currently leaning towards OPTION 2 but I'd like to know if there are good resources or standards that I can look into.
Is there a compelling reason to require an API consumer to make a GET for the id of a resource to then use it in an update request, perhaps for security reasons, etc?
The ID of the Resource is the Url itself and not necessarily a Guid or UUID. The Url should uniquely IDentify the Resource, in your case the Settings entity.
But, in order to be RESTfull, you must point to this resource in your index Url (i.e. the / path) with an appropriate rel attribute, so the client will not hardcode the Url, such as this:
GET /
{ ....
"links": [
{ "url" : "/settings", "rel" : "settings" }
], ...
}
There are no specifics to accesing a singleton resource other than the Url will not contain a Guid, Uuid or any other numeric value.
Option 2 is perfectly RESTful, as far as I can tell.
The core idea behind RESTful APIs is that you're manipulating "resources". The word "resource" is intentionally left vague so that it can refer to whatever is important to the specfic application, and so that the API can focus only on how content will be accessed regardless of what content will be accessed.
If your resource is a singleton, it does not make sense to attribute an ID value to it. IDs are very useful and commonly used in RESTful APIs, but they are not a core part of what makes an API RESTful, and, as you have noticed, would actually make accessing singleton resources more cumbersome.
Therefore, you should just do away with IDs and have both
GET /settings/
and
GET /settings/{id}
always return the settings singleton object. (access-by-id is not required, but it's nice to have just in case someone tries it). Also, be sure to document your API endpoint so consumers don't expect an array :)
Re: your questions,
I believe option 2 would be the preferred way of modeling this, and I believe requiring your consumer to make a GET for the id would actually be somewhat of an anti-pattern.
I think the confusion here is because the word settings is plural, but the resource is a singleton.
Why not rename the resource to /configuration and go with option 2?
It would probably be less surprising to consumers of your API.
You're probably overthinking it. There's no concept of singleton in HTTP or REST.
GET /settings/ is perfectly fine.
By the way, we can hardly relate this to DDD - at least not if you don't give more context about what settings means in your domain.
It might also be that you're trying to tack an "Entity with ID" approach on Settings when it's not appropriate. Not all objects in a system are entities.

Rest API: path for accessing derived data

It is not clear to me that if I have a micro service that is in place to provide some derived data how the rest api should be designed for this. For instance :-
If I have a customer and I want to access the customer I would define the API as:
/customer/1234
this would return everything we know about the customer
however if I want to provide a microservice that simply tells me if the customer was previously known to the system with another account number what do I do. I want this logic to be in the microservice but how do I define the API
customer/1234/previouslyKnow
customerPreviouslyKnown/1234
Both don't seem correct. In the first case it implies
customer/1234
could be used to get all the customer information but the microservice doesn't offer this.
Confused!
Adding some extra details for clarification.
I suppose my issue is, I don't really want a massive service which handles everything customer related. It would be better if there were lighter weight services that handles customer orders, customer info, customer history, customer status (live, lost, dead....).
It strikes me all of these would start with
/customer/XXXX
so would all the services be expected to provide a customer object back if only customer/XXXX was given with no extra in the path such as /orders
Also some of the data as mentioned isn't actually persisted anywhere it is derived and I want the logic of this hidden in a service and not in the calling code. So how is this requested and returned.
Doing microservices doesn't mean to have a separate artifact for each method. The rules of coupling and cohesion also apply to the microservices world. So if you can query several data all related to a customer, the related resources should probably belong to the same service.
So your resource would be /customers/{id}/previous-customer-numbers whereas /customers (plural!) is the list of customers, /customers/{id} is a single customer and /customers/{id}/previous-customer-numbers the list of customer numbers the customer previously had.
Try to think in resources, not operations. So returning the list of previously used customer numbers is better than returning just a boolean value. /customer/{id}/previous-accounts would be even better, I think...
Back to topic: If the value of previous-accounts is directly derived from the same data, i.e. you don't need to query a second database, etc. I would even recommend just adding the value to the customer representation:
{
"id": "1234",
"firstName": "John",
"lastName": "Doe",
"previouslyKnown": true,
"previousAccounts": [
{
"id": "987",
...
}
]
}
Whether the data is stored or derived shouldn't matter so the service client to it should not be visible on the boundary.
Adding another resource or even another service is unnecessary complexity and complexity kills you in the long run.
You mention other examples:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Orders is clearly different from customer data so it should reside in a separate service. An order typically also has an order id which is globally unique. So there is the resource /orders/{orderId}. Retrieving orders by customer id is also possible:
/orders;customer={customerId}
which reads give me the list of orders for which the customer is identified by the given customer id.
These parameters which filter a list-like rest resource are called matrix parameters. You can also use a query parameter: /orders?customer={customerId} This is also quite common but a matrix parameter has the advantage that it clearly belongs to a specific part of the URL. Consider the following:
/orders;customer=1234/notifications
This would return the list of notifications belonging to the orders of the customer with the id 1234.
With a query parameter it would look like this:
/orders/notifications?customer=1234
It is not clear from the URL that the orders are filtered and not the notifications.
The drawback is that framework support for matrix parameters is varying. Some support them, some don't.
I'd like matrix parameters best here but a query parameter is OK, too.
Going back to your list:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Customer info and customer status most likely belong to the same service (customer core data or the like) or even the same resource. Customer history can also go there. I would place it there as long as there isn't a reason to think of it separately. Maybe customer history is such a complicated domain (and it surely can be) that it's worth a separate service: /customer-history/{id} or maybe just /customer/{id}.
It's no problem that different services use the same paths for providing different information about one customer. They are different services and they have different endpoints so there is no collision whatsoever. Ideally you even have a DNS alias pointing to the corresponding service:
https://customer-core-data.service.lan/customers/1234
https://customer-history.service.lan/customers/1234
I'm not sure if I really understand your question. However, let me show how you can check if a certain resource exist in your server.
Consider the server provides a URL that locates a certain resource (in this situation, the URL locates a customer with the identifier 1): http://example.org/api/customers/1.
When a client perform a GET request to this URL, the client can expect the following results (there may be other situation, like authentication/authorization problems, but let's keep it simple):
If a customer with the identifier 1 exists, the client is supposed to receive a response with the status code 200 and a representation of the resource (for example, a JSON or XML representing the customer) in the response payload.
If the customer with the identifier 1 do not exist, the client is supposed to receive a response with the status code 404.
To check whether a resource exists or not, the client doesn't need the resource representation (the JSON or XML that represents the customer). What's relevant here is the status code: 200 when the resource exists and 404 when the resource do not exist. Besides GET requests, the URL that locates a customer (http://example.org/api/customers/1) could also handle HEAD requests. The HEAD method is identical to the GET method, but the server won't send the resource representation in HEAD requests. Hence, it's useful to check whether a resource exists or not.
See more details regarding the HEAD method:
4.3.2. HEAD
The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section). The server SHOULD send the same
header fields in response to a HEAD request as it would have sent if
the request had been a GET, except that the payload header fields MAY be omitted. This method can be used for obtaining
metadata about the selected representation without transferring the
representation data and is often used for testing hypertext links for
validity, accessibility, and recent modification. [...]
If the difference between resource and resource representation is not clear, please check this answer.
One thing I want to add to the already great answers is: URLS design doesn't really matter that much if you do REST correctly.
One of the important tenets of REST is that urls are discovered. A client that has the customers's information already, and wants to find out what the "previously known" information, should just be able to discover that url on the main customer resource. If it links from there to the "previously known" information, it doesn't matter if the url is on a different domain, path, or even protocol.
So if you application naturally makes more sense if "previouslyKnown" is on a separate base path, then maybe you should just go for that.

REST Partial update with several fields

Lets say I have a rest service that allows applications to create User object
URI: /users
HTTP Method:POST
{
"firstName":"Edward",
"lastName": "Nygma",
"dob": "01011981",
"email": "en#gc.com",
"phone": "0123456789"
}
On the first POST the User object is created and and the User ID is returned
Lets say there is a second service that allows the user to update the lastName and email fields.
URI: /user/1/last-email
HTTP Method: POST
{
"lastName": "scissorhands",
"email": "ec#bc.com"
}
Let's say for the sake of bandwidth sending the full User object is not an option for this update call.
Is this the correct way to do a partial update that involves several fields? Also using PATCH is out of the question.
Edit:
I know that the correct way to do this restfully would be to post an update to each field as a sub resource but lets say for the sake of bandwidth/business requirements that this update has to be done in a single call. Is this the correct way to do so?
Edit 2:
Our implementation does not support the HTTP PATCH method, hence why I indicated in the initial question that using patch is out of the question. That being said, maybe I should rephrase the question.
Since system/business requirements prevent us from implementing this correctly in a RESTful manor. What is the best way to handle this scenario.
Putting the verb "update-" suddenly makes it feel like it's an RPC call. What do you do when you want to delete this email? Doing a DELETE action on this URI looks a bit silly.
URI: /user/1/email and
URI: user/1/lastname would make more sense as EMAIL would just be a subresource and you can use all the verbs on those resources.
Yes, this would take 2 calls in case you want to update 2 resources instead of 1.
For partial updates of a resource use "PATCH" verb on the resource. This way you would not need a new URI at all. (Best practice for partial updates in a RESTful service)
Reference: http://restcookbook.com/HTTP%20Methods/patch/
Quote:
When should we use the PATCH HTTP method?
The HTTP methods PATCH can be used to update partial resources. For instance, when you only need to update one field of the resource, PUTting a complete resource representation might be cumbersome and utilizes more bandwidth
See more at: http://restcookbook.com/HTTP%20Methods/patch/#sthash.jEWLklrg.dpuf