What is the correct REST endpoint for adding an item to an array field? - mongodb

Say I'm trying to model the action of adding a Student to a Group in a RESTful API written in Go with MongoDB.
A Group is modeled like this:
type Group struct {
Section mgo.DBRef
Instructor mgo.DBRef
Students []mgo.DBRef
}
An additional constraint is that the API is implementing HAL+JSON protocol, where resources are represented as links.
I've seen a couple of options (below):
POST /groups/{groupID}/students/{studentID} will add student with studentID to the group. The problem with this approach is that since I'm implementing the HAL+JSON protocol, I don't want the client to have manually pull out the ID and generate this link. All resources will be represented, i.e. /person/123 could be a Student.
PUT /groups/{groupID} while sending the complete array of Students that should belong to the group. This seems like it will introduce a lot of complicated parsing logic.
If there are other options I'd be open to it too.
EDIT: The approach that I'm going with is the following:
* POST /groupmembership/ by sending a JSON with the ID of the student and the ID of the group to add the student to. However, on the backend, I'm not generating a new model, but instead taking the object and programmatically adding the specified student to the specified group.
The question then is how would I remove the Student from the Group? Can I similar send a DELETE request to /groupmembership with
{
"student": 123,
"group": 456
}
to remove student 123 from group 456?

where resources are represented as links
This is not true. Links are possibly operations calls, so they are representing possible resource state transitions.
To add something to a collection, you need a collection resource and you have to decide what you want to store in that collection. In your case this can be 2 things: group-student memberships or students. If this is an 1:n relation, then you can store students and remove students. If this is an n:m relation then you have to store memberships and remove memberships, since you don't want to remove the students from your storage, just the memberships.
You can identify the memberships 2 ways:
you can use the ids of the participants: /groups/1/memberships/student:1 or /students/1/memberships/group:1
you can add a unique id to each membership: /memberships/1234
notes:
The URI structure matters only from a human perspective. The REST client will check the link relations and not the URI structure.
The resources are different from the entities in your database. Only by simple CRUD application represent them the same thing. So REST has nothing to do with your database structure.

First of all, there's no correct REST endpoint. URL semantics are irrelevant to REST. All that matters is that URLs are obtained from hypertext and not from out-of-band information, and seems like you got that part right, since you're using HAL. So, the correct REST endpoint is whatever link your server gives to the clients in order to add the item.
As long as an option isn't incorrect from an HTTP standpoint, I'd say to stick with whatever is more consistent with the REST of your API.
The option to POST /groups/{groupID}/students/{studentID} in order to create a new student in that location is incorrect, since a POST is submitting the payload to be processed by the targeted resource, and in this case it doesn't exist yet. A common pattern is to use POST /groups/{groupID}/students, where the collection acts as a facory for new elements, with the creation parameters in the payload, and returning the created student URL in the Location header, with 201 HTTP status code.

Related

Restful URI design

Let's say that the domain structure of anapplication is as follows:
There is domain object called Department.
There is a domain object called Student.
There is a domain object called Paper.
The relationship between Student and Department is many-to-many.
A student can publish (create) a Paper for himself or for a
particular Department.
A student can view all the papers published by him for
himself and for departments to which he belongs (the latter includes
papers published by other students belonging to the same department
as the given student)
Here is what I think the restful uri designs should be like
Student creates (POST) a white paper for himself :
/students/{studentid}/papers
Student creates (POST) a white
paper for a particular department
/students/{studentid}/departments/{departmentid}/papers
Get all student papers published by him for himself
/students/{studentid}/papers/self
Get all student papers published by him for himself including the papers
of the departments to which he belongs
/students/{studentid}/papers
Similar get requests for point number 1 and 2.
The other way to arrive at the above end points would be something like (considering only points 1 and 2) :
/students/{studentid}/papers
and then pass departmentid in the request body. The application would the check for the presence of departmentId in the request. If it's not null then it will assume that this paper is being published for the given departmentid, otherwise for the student himself.
Which one of the above would be a better approach?
This link could help you to design your RESTful service: https://templth.wordpress.com/2014/12/15/designing-a-web-api/.
In addition, here are my comments regarding your URLs:
Everything that identifies your resource should be within the resource path (for example departmentid)
Regarding relations, we need to identify which URLs will handle references. For example, /students/{studentid}/departments/{departmentid}/papers will allow to attach an existing paper to a department or create a new one and in addition attach it to the department
I don't understand this url: /students/{studentid}/papers/self especially the token self. Does self refer to the current authenticated user? If so, I think that should use a query parameter since it doesn't really correspond to a resource... In fact, you rather use query parameters for list filtering
Hope it helps you,
Thierry
Since departmentid is part of how a resources is identified, it must be part of the URL. Putting it into the request body is a violation of REST principles.

Dynamic representation of a REST resource

Lets assume I have an object that I expose as a REST resource in my application. This object has many fields and contains many other objects including associated collections. Something like this, but think MUCH bigger:
Customer
List<Order> orders
List<Address> shippingAddresses;
// other fields for name, etc.
Order
List<Product> products
// fields for total, tax, shipping, etc.
Product
// fields for name, UPC, description, etc.
I expose the customer in my api as /customer/{id}
Some of my clients will want all of the details for every product in each order. If I follow HATEOAS I could supply a link to get the product details. That would lead to n+1 calls to the service to populate the products within the orders for the customer. On the other hand, if I always populate it then many clients receive a bunch of information they don't need and I do a ton of database lookups that aren't needful.
How do I allow for a customer representation of my resource based on the needs of the client?
I see a few options.
Use Jackson's JsonView annotation to specify in advance what is used. The caller asks for a view appropriate to them. i.e. /customer/{id}?view=withProducts. This would require me to specify all available views at compile time and would not be all that flexible.
Allow the caller to ask for certain fields to be populated in the request, i.e. /customer/{id}?fields=orders,firstName,lastName. This would require me to have some handler that could parse the fields parameter and probably use reflection to populate stuff. Sounds super messy to me. The what do you do about sub-resources. Could I do fields=orders.products.upc and join into the collection that way? Sounds like I'm trying to write hibernate on top of REST or something.
Follow HATEOAS and require the client to make a million HTTP calls in order to populate what they need. This would work great for those that don't want to populate the item most of the time, but gets expensive for someone that is attempting to show a summary of order details or something like that.
Have separate resources for each view...
Other?
I would do something like this:
/customers/{id}/orders/?include=entities
Which is a kind of a more specific variation of your option 1.
You would also have the following options:
Specific order from a specific customer without list of products:
/customers/{id}/orders/{id}
Just the orders of a customer without products:
/customers/{id}/orders/
I tend to avoid singular resources, because most of the time or eventually someone always wants a list of things.
Option 2 (client specifies fields) is a filtering approach, and acts more like a query interface than a GETable resource. Your filter could be more expressive if you accept a partial template in a POST request that your service will populate. But that's complicated.
I'm willing to bet all you need is 2 simple representations of any complex entity. That should handle 99.9% of the cases in your domain. Given that, make a few more URIs, one for each "view" of things.
To handle the 0.1% case (for example, when you need the Products collection fully populated), provide query interfaces for the nested entities that allow you to filter. You can even provide hypermedia links to retrieve these collections as part of the simplified representations above.

REST API Design: Nested Collection vs. New Root

This question is about optimal REST API design and a problem I'm facing to choose between nested resources and root level collections.
To demonstrate the concept, suppose I have collections City, Business, and Employees. A typical API may be constructed as follows. Imagine that ABC, X7N and WWW are keys, e.g. guids:
GET Api/City/ABC/Businesses (returns all Businesses in City ABC)
GET Api/City/ABC/Businesses/X7N (returns business X7N)
GET Api/City/ABC/Businesses/X7N/Employees (returns all employees at business X7N)
PUT Api/City/ABC/Businesses/X7N/Employees/WWW (updates employee WWW)
This appears clean because it follows the original domain structure - business are in a city, and employees are at a business. Individual items are accessible via key under the collection (e.g. ../Businesses returns all businesses, while ../Businesses/X7N returns the individual business).
Here is what the API consumer needs to be able to do:
Get businesses in a city (GET Api/City/ABC/Businesses)
Get all employees at a business (GET Api/City/ABC/Businesses/X7N/Employees)
Update individual employee information (PUT Api/City/ABC/Businesses/X7N/Employees/WWW)
That second and third call, while appearing to be in the right place, use a lot of parameters that are actually unnecessary.
To get employees at a business, the only parameter needed is the key of the business (X7N).
To update an individual employee, the only parameter needed it the key of the employee (WWW)
Nothing in the backend code requires non-key information to look up the business or update the employee. So, instead, the following endpoints appear better:
GET Api/City/ABC/Businesses (returns all Businesses in City ABC)
GET Api/Businesses/X7N (returns business X7N)
GET Api/Businesses/X7N/Employees (returns all employees at business X7N)
PUT Api/Employees/WWW (updates employee WWW)
As you can see, I've created a new root for businesses and employees, even though from a domain perspective they are a sub/sub-sub-collection.
Neither solution appears very clean to me.
The first example asks for unnecessary information, but is structured in a way that appears "natural" to the consumer (individual items from a collection are retrieved via lower leafs)
The second example only asks for necessary information, but isn't structured in a "natural" way - subcollections are accessible via roots
The individual employee root would not work when adding a new employee, as we need to know which business to add the employee to, which means that call would at least have to reside under the Business root, such as POST Api/Businesses/X7N7/Employees, which makes everything even more confusing.
Is there a cleaner, third way that I'm not thinking of?
I don't see how REST adds a constraint that two resources could not have the same value. The resourceType/ID is just an example of the easiest use case rather than the best way to go from a RESTful point of view.
If you read paragraph 5.2.1.1 of Roy Fielding's dissertation carefully, you will notice that Fielding makes the disctinction between a value and a resource. Now a resource should have a unique URI, that's true. But nothing prevents two resources from having the same value:
For example, the "authors' preferred version" of an academic paper is a mapping whose value changes over time, whereas a mapping to "the paper published in the proceedings of conference X" is static. These are two distinct resources, even if they both map to the same value at some point in time. The distinction is necessary so that both resources can be identified and referenced independently. A similar example from software engineering is the separate identification of a version-controlled source code file when referring to the "latest revision", "revision number 1.2.7", or "revision included with the Orange release."
So nothing prevents you from, as you say, changing the root. In your example, a Business is a value not a resource. It is perfectly RESTful to create a resource which is a list of "every business located in a city" (just like Roy's example, "revisions included with the Orange release"), while having a "business which ID is x" resource as well (like "revision number x").
For Employees, I would keep API/Businesses/X7N/Employees as the relation between a business and its employees is a composition relationship, and thus as you say, Employees can and should only be accessed through the Businesses class root. But this is not a REST requirement, and the other alternative is perfectly RESTful as well.
Note that this goes in pair with the application of the HATEAOS principle. In your API, the list of Businesses located in a city could (and perhaps should from a theoretical point of view) be just a list of links to the API/Businesses. But this would mean that the clients would have to do one round-trip to the server for each of the items in the list. This is not efficient and, to stay pragmatic, what I do is embed the representation of the business in the list along with the self link to the URI that would be in this example API/Businesses.
You should not confuse REST with the application of a specific URI naming convention.
HOW the resources are named is entirely secondary. You are trying to use HTTP resource naming conventions - this has nothing to do with REST. Roy Fielding himself states so repeatedly in the documents quoted above by others. REST is not a protocol, it is an architectural style.
In fact, Roy Fielding states in his 2008 blog comment (http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven 6/20/2012):
"A REST API must not define fixed resource names or hierarchies (an obvious coupling of
client and server). Servers must have the freedom to control their own namespace. Instead,
allow servers to instruct clients on how to construct appropriate URIs, such as is done in
HTML forms and URI templates, by defining those instructions within media types and link relations."
So in essence:
The problem you describe is not actually a problem of REST - conceptually, it is a problem of HIERARCHY STRUCTURES versus RELATIONAL STRUCTURES.
While a business is "in" a city and so can be considered to be part of the city "hierarchy" - what about international companies which have offices in 75 cities. Then the city suddenly becomes the junior element in a hierarchy with the business name at the senior level of the structure.
The point is, you can view data from various angles, and depending on the viewpoint you take, it may be simplest to see it as a hierarchy. But the same data can be seen as a hierarchy with different levels. When you are using HTTP type resource names, then you have entered a hierarchy structure defined by HTTP. This is a constraint, yes, but it's not a REST constraint, it's a HTTP constraint.
From that angle, you can chose the solution which fits better to your scenario. If your customer cannot supply the city name when he supplies the company name (he may not know), then it would be better to have the key with only city name. As I said, it's up to you, and REST won't stand in your way ...
More to the point:
The only real REST constraints you have, if you have already decided to use HTTP with GET
PUT and so on, are:
Thou shalt not presumeth any prior ("out of band") knowledge between client and servers. *
Look at your proposal #1 above in that light. You assume that customers know the keys for the cities which are contained in your system? Wrong - that's not restful. So the server has to give the list of cities as a list of choices in some way. So are you going to list every city in the world here?
I guess not, but then you'll have to do some work on how you are planning to do this, which brings us to:
A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources and driving application state ...
I think, reading the mentioned Roy Fielding blog will help you out considerably.
In a RESTful-API URL design should be quite unimportant - or at least a side issue since the discoverability is encoded in the hypertext and not in the URL path. Have a look at the resources linked in the REST tag wiki here on StackOverflow.
But if you want to design human readable URLs for your UC, I would suggest the following:
Use the resource type you are creating/updating/querying as the first part of the URL (after your API prefix). So when somebody sees the URL he immediately knows to which resources this URL points. GET /Api/Employees... is the only only way to receive Employee resources from the API.
Use Unique IDs for each resource independent of the relations they are in. So GET /Api/<CollectionType>/UniqueKey should return a valid resource representation. Nobody should have to worry where the Employee is located. (But the returned Employee should have the links to the Business (and for convenience sake City) he belongs to.) GET /Api/Employees/Z6W returns the Employee with this ID no matter where is is located.
If you want to get a specific resource: Put your query parameter at the end (instead in the hierarchical order described in the question). You can use the URL query string (GET /Api/Employees?City=X7N) or a matrix parameter expression (GET /Api/Employees;City=X7N;Business=A4X,A5Y). This will allow you to easily express a collection of all Employees in a specific City - independent of the Business they are in.
Side node:
In my experience an initial hierarchical domain data model seldom survives additional requirements that come up during a project. In your case: Consider a business located in two Cities. You could create a workaround by modelling it as two separate businesses but what about the employee who works half his time in one place and the other half at the other location? Or even worse: It's only clear for which business he works but it's undefined, in which city?
The third way that I see is to make Businesses and Employees root resources and use query parameters to filter collections:
GET Api/Businesses?city=ABC (returns all Businesses in City ABC)
GET Api/Businesses/X7N (returns business X7N)
GET Api/Employees?businesses=X7N (returns all employees at business X7N)
PUT Api/Employees/WWW (updates employee WWW)
Your both solutions use concept of REST sub-resources which requires that subresource is included in parent resource so:
GET Api/City/ABC/Businesses
in response should also return data provided by:
GET Api/City/ABC/Businesses/X7N
GET Api/City/ABC/Businesses/X7N/Employees
similar for:
GET Api/Businesses/X7N
which should return data provided by:
GET Api/Businesses/X7N/Employees
It will make size of the response huge and time required to generate will increase.
To make REST API clean each resource should have only one bounded URI which fallow below patterns:
GET /resources
GET /resources/{id}
POST /resources
PUT /resources/{id}
If you need to make links between resources use HATEOAS
Go with example 1. I wouldn't worry about unnecessary information from the point of view of the server. A URL should clearly identify a resource in a unique fashion from the point of view of the client. If the client would not know what /Employee/12 means without first knowing that it is actually /Businesses/X7N/Employees/12 then the first URL seems redundant.
The client should be dealing with URLs rather than the individual parameters that make up the URLs, so there is nothing wrong with long URLs. To the client they are just strings. The server should be telling the client the URL to do what it needs to do, not the individual parameters that then require the client to construct the URL.

Informative vs unique generated ID in REST API

Designing a RESTful API. I have two ways of identifying resources (person data). Either by the unique ID generated by the database, or by a social security number (SSN), entered for each person. The SSN is supposedly unique, though can be changed.
Using the ID would be most convenient for me, since it is guaranteed to be unique, and does not change. Hence the URL for the resource, also always stays the same:
GET /persons/12
{
"name": Morgan
"ssn": "840212-3312"
}
The argument for using SSN, is that it is more informative and understandable by API clients. SSN is also used more in surrounding systems:
GET /persons/840212-3321
{
"name": Morgan
"id": "12"
}
So the question is: Should I go with the first approach, and avoid some implementation headaches where the SSN may change. And maybe provide some helper method that converts from SSN to ID?
Or go with the second approach. Providing a more informative API. Though having to deal with some not so RESTful strangeness where URL:s might change due to SSN changes?
URL design is a personal choice. But to give you some more examples which differ from those Ray has already provided, I will give you some of my own.
I have a user account resource and allow access via both URIs:
/users/12
and
/users/morgan
where the numerical value is an auto_incremented ID, and the alphabetic value is a unique username on the system specified by the user. these resources are uncachable so I do not bother about canonicalisation, however the /users page links to the alphabetic forms.
No other resources on my system have two unique fields, so are referred to by IDs, /jobs/123, /quotations/456.
As you can see, I prefer plural URI segments ;-)
I think of "job 123" as being from the "jobs" collection, so it seems logical to have a "jobs" resource, with subresources for each job.
You do not need to have a separate /search/ area for performing searches, I think it would be cleaner to apply your search criteria to the collection resource directly:
/people?ssn=123456-7890 (people with SSN matching/containing "123456-7890")
/people?name=morgan (people who's name is/contains "Morgan")
I have something similar, but use only the first letter as a filter:
/sites?alpha=f
Lists all sites beginning with F. You can think of it as a filter, or as a search criteria, those terms are just different sides of the same coin.
Good to see someone taking time to think about their Resource urls!
I would make a Url with the unique id to provide resource to a single user. Like:
http://api.mysite.com/person/12/
Where 12 is your unique ID. Note that I also prefer the singular 'person'....
Regardless, the url should return:
{
"ssn": "840212-3312"
"name": "Morgan"
"id": "12"
}
However, I would also create a general search URL that returns a list of users that match the parameters (either a json array or whatever format you need). You can specify search parameters as get params like this:
http://api.mysite.com/person/search/?ssn=840212-3312
Or
http://api.mysite.com/person/search/?name=Morgan
These would return something like this for a single search hit--note it's an array, not a single item like the unique id url that points directly to a single user.
[{
"ssn": "840212-3312"
"name": "Morgan"
"id": "12"
}]
This search could then be later augmented for other search criteria. You might only return the unique id's via the search Url--you could always make a request to the unique id url once you've got it from the search...
I would suggest that you use neither. Generate resource IDs that are unique both to a single user of your API and across all other resources (including other users' resources).
Using the unique database ID is not ideal for a couple of reasons. First, API resources and database records won't necessarily always be 1-to-1 even if you have designed it that way today. Second, you might change to a different data store that would generate different format unique ids.
Also, it is good practice to separate out the ID from other resource properties, such as SSN (as an aside I hope you are storing SSN in a very secure manner, but that's another topic). If for whatever reason an SSN changed, more than one API resource was associated with the same SSN, or you decide that piece of data is not needed someday, you don't want to have to change the ID.
One pattern is to prepend the unique ID with a few characters that indicate the resource type. For example if User is a resource type in you API, a generated unique ID would be something like USR56382.
RESTful API is an architectural style which emphasizes on resource centric design approach.
In my opinion, I would keep the resources as plural and noun format.
Every resource, for example, customers has following uniform interfaces
POST /customers - for creating a resource instance
PUT /customers/{customerId} - for updating a particular instance
GET /customers - is for search customers. So #Ray, search is not required to be part of URI itself. Any filter or query parameters that need to be supported should be there itself.
GET /customers/{customerId} - to retrieve a particular instance of customer
DELETE /customers/{customerId} to delete a particular instance
The reason why plural, it is because it behaves as a factory. For example, when u r trying to create a new instance of a resource, the instance does not exist and therefore, it cannot be on the self instance. Hence, singularity is not used.
It also goes hand-in-hand for search/inquiry, where you do not know or hold the actual instance of resource. Hence, the plural form is much recommended.
Now, the question is what to use for a resource id - a database primary key, a generated identifier, or an encrypted token.
In my opinion, database primary keys should not be exposed. Resource identifier should not be designed 1-1 with DB primary key. But, it happens a lot. A generated UUID based key is much more recommended to avoid any sequential follow-through attack but world is not ideal always.
Coming back to token or an encrypted token, is a recommended approach for sensitive APIs, and where data exchange is performed between two separate applications. If we are using it, the encryption/decryption should be solely at the API end. That means, the encrypted keys for sub-resources should be returned as part of parent API response, otherwise it defeats the purpose.

Is there a better restful interface for this?

GET https://api.website.com/v1/project/employee;company-id={company-id},
title={title-id}?non-smoker={true|false}&<name1>=<value1>&<name2>=<value2>&<name3>=<value3>
where:
company-id is mandatory,
title is optional
name/value can be any filter criteria.
Is there a better way to define the interface?
This API is not supposed to create an employee object. It is for getting an array of employee objects that belongs to a particular company and has a particular title and the other filter criteria.
I don't know if there is a better way, because it depends often on the technology you use and its idioms.
However, here is two different URI designs that I like (and why)
#1 GET https://api.website.com/v1/project/employee/{company-id}?title={title-id}&non-smoker={true|false}&<name1>=<value1>&<name2>=<value2>&<name3>=<value3>
#2 GET https://api.website.com/v1/project/company/{company-id}/employee?title={title-id}&non-smoker={true|false}&<name1>=<value1>&<name2>=<value2>&<name3>=<value3>
As you can see in both example I extracted company-id from the query string. I prefer to add mandatory parameters in the path info to distinguish them. Then, in the second URI, the employee ressource is nested in the company. That way you can easily guess that you can retrieve all employee from a specific company, which is not obvious in the first example.
This api is supposed to GET employee objects that satisfy the given criteria of belonging to a particular company, having particular job title and some other filter criteria.
Personally I would just design your URI as http://acme.com/employee/?company=X&title=Y&non-smoker=Z&T=U. I wouldn't write "in stone" that the company is mandatory: your API will be easier to change.
However, you should consider that few "big" requests are far faster than plenty of small ones. Moreover, URI representations can be effectively cached. Therefore it is often better to have URIs based on IDs (since there are more chances that they will be asked again).
So you could get the complete employee list of a company (plus other data about the company itself) with http://acme.com/company/X and then filter it client-side.
Are you creating a new employee object? If so then a POST (create) is more appropriate. A good clue is all the data you're pushing in the URL. All that should be in the body of the POST object.