Rest API design: Managing access to sub-entities - rest

Note: I realize that this is close to being off-topic for being opinion-based, but I am hoping that there is some accepted best practice to handle this that I just don't know about.
My problem is the following: I need to design a Rest API for a program where users can create their own projects, and each project contains files that can only be seen by users that have access. I am stuck with how to design the "List all files of a project" query.
Standard Rest API practice would suggest two endpoints, like:
`GET /projects` # List all projects
`POST /projects` # Create new project
`GET /projects/id` # Get specific project
etc.
and the same for the files of a project.
However, there should never be a reason to list all files - only the files of a single project. To make it more complicated, access management needs to be a thing, users should never see files that are in projects they don't have access to.
I can see multiple ways to handle that:
So the obvious way is to implement the GET function, optionally with a filter. However, this isn't optimal, since if the user doesn't set a filter, it would have to crawl through all projects, check for each project whether the user has access, and then list all files the user has access to:
GET /files?project=test1
I could also make the files command a subcommand of the projects command - e.g.
GET /projects/#id/files
However, I have the feeling this isn't too restful, since it doesn't expose entities directly?
Is there any concencus on which should usually be implemented? Is it okay to "force" users to set a parameter in the first one? Or is there a third alternative that solves what I am looking for? Happy about any literature recommendations on how to design this as well.

Standard Rest API practice would suggest two endpoints
No, it wouldn't. REST practice would suggest figuring out the resources in your resource model.
Think "documents": I should be able to retrieve (GET) a document that describes all of the files in the project. Great! This document should only be accessible when the request authorization matches some access control list. Also good.
Maybe there should also be a document for each user, so they can see a list of all of the projects they have access to, where that document includes links to the "all of the files in the project" documents. And of course that document should also be subject to access control.
Note that "documents" here might be text, or media files, or scripts, or CSS, or pretty much any kind of information that you can transmit over the network. We can gloss the details, because "uniform interface" means that we manage them all the same way.
In other words, we're just designing a "web site" filled with interlinked documents, with access control.
Each document is going to need a unique identifier. That identifier can be anything we want: /5393d5b0-0517-4c13-a821-c6578cb97668 is fine. Because it can be anything we want, we have extra degrees of freedom.
For example, we might design our identifiers such that the document whose identifiers begin with /users/12345 are only accessible by requests with authorization headers that match user 12345, and that all documents whose identifiers begin with /projects/12345 are only accessible by requests with authorization headers that match any of the users that have access to that specific project, and so on.
In other words, it is completely acceptable to choose resource identifier spellings that make your implementation easier.
(Note: in an ideal world, you would have "cool" identifiers that are implementation agnostic, so that they still work even if you change the underlying implementation details of your server.)
I have the feeling this isn't too restful, since it doesn't expose entities directly?
It's fine. Resource models and entity models are different things; we shouldn't expect them to always match one to one.

After looking further, I came across this document from Microsoft. Some quotes:
Also consider the relationships between different types of resources and how you might expose these associations. For example, the /customers/5/orders might represent all of the orders for customer 5. You could also go in the other direction, and represent the association from an order back to a customer with a URI such as /orders/99/customer. However, extending this model too far can become cumbersome to implement. A better solution is to provide navigable links to associated resources in the body of the HTTP response message. This mechanism is described in more detail in the section Use HATEOAS to enable navigation to related resources.
In more complex systems, it can be tempting to provide URIs that enable a client to navigate through several levels of relationships, such as /customers/1/orders/99/products. However, this level of complexity can be difficult to maintain and is inflexible if the relationships between resources change in the future. Instead, try to keep URIs relatively simple. Once an application has a reference to a resource, it should be possible to use this reference to find items related to that resource. The preceding query can be replaced with the URI /customers/1/orders to find all the orders for customer 1, and then /orders/99/products to find the products in this order.
This makes me think that using solution 2 is probably the best case for me, since each file will be associated with only a single project, and should be deleted when a project is deleted. Files cannot exist on their own, outside of projects.

Related

How to name REST API endpoints for very especific features?

So, I'm building a RESTful API, that's like a Parking system, it have ParkingLots and Cars that enter or leave ParkingLots, at this moment, my endpoints looks like this.
POST '/parking-lots' // To create a ParkingLot
POST '/cars' // To create a Car
But, how to name an endpoint that has EnterParkingLot or LeaveParkingLot feature following REST best pratices? I didn't found an article or blog post that answer this question so far.
But, how to name an endpoint that has EnterParkingLot or LeaveParkingLot feature following REST best pratices? I didn't found an article or blog post that answer this question so far.
The key abstraction of information in REST is a resource. Resources are generalizations of documents, not endpoints.
Useful work is a side effect of editing documents (Webber, 2011).
If you are having trouble figuring out the URI, that probably means that you haven't been thinking enough about the documents (aka you don't yet have a clear understanding of your "resource model").
One idea that can sometimes help is to think about a client with cached documents. When they send you one of these messages, which document in the cache is the most important one to invalidate?
In REST, it is normal (not necessarily common) to have a single resource handle many different kinds of edits. For instance, we might submit an EnterParkingLot form, that would update a parking lot resource, and then later submit a LeaveParkingLot form that updates the same parking lot resource (with the code on the server parsing the requests to distinguish the different kinds of edits).
But it would also be fine to add a "parking ticket" resource to your resource model, with a different ticket tracking the arrival and departure of each car.
Domain experts are typically good at telling you what the documents are, and what names make sense.
I would do it like:
POST '/parking-lot/$n/enter'
POST '/parking-lot/$n/leave'

REST: Get query only changeable objects

I'm having a bunch of apis which return several types of data.
All users can query all data by using a GET rest api.
A few users can also change data. What is a common approach when designing REST API, to query only the data that can be changed by the current user but still allow the api to return all data (for display mode).
To explain it further:
The software manages projects. All projects are accessible for all users (also anonymous) via an api (let's call it GET api/projects).
A user has the ability to see a list of all projects he is involved in and which he can edit.
The api should return exactly the same data but limited to the projects he is involed in.
Should I create an additonal parameter, or maybe pass an http header, or what else?
There's no one-size-fits-all answer to this, so I will give you one recommendation that works for some people.
I don't really like creating resources that have 'complex access control'. Instead, I would prefer to create distinct resources for different access levels.
If you want to return limit results for people that have partial access, it might be better to create new resources that reflect this.
I think this might also help a bit thinking about the abstract role of a person who is not allowed to do everything. The abstraction probably doesn't exist per-property, but it exists somewhere as a business rule.

Hypermedia API: How to document properly?

I am in the middle of developing my first Hypermedia API. I thought I had a good grasp of things but when It comes to documenting the API, I start to question my understanding of the whole concept.
The core of the question comes down to documention, but It might be that I did not understand one or more aspects correctly. If so, please tell me :-)
Documenting Link Relations
Let's say I have a more or less generic link relation (https://example.com/rels/accounts) in my API to link related accounts. The exact meaning can change on context, right?
On my billboard (or index), I might have a link with that relation to browse all accounts.
On another resource, account group for example, it might just link to a specific subset of accounts belonging to that group.
How should I document that relation? Just saying that this is a link to a collection of accounts seems not enough. However, that is exactly what RFC5988 does, just describing what the link itself means. So that might be the way to go.
The problem gets worse with actual state transitions. Let's use https://example.com/rels/create-account as our example here. If the documentation just says "This is where you create a new account", It makes no statements on what I have to do with that link to create a resource. Where would the documentation be that states something along the lines:
You either POST multipart/form-data or application/json to this endpoint that contains at least the following fields [...]
The relation itself does not seem to be the right place. Especially when you consider that the payload to that URL might also change on context. In our example, having that relation on an account group would make it mandatory to omit the accountGroup field because the value is provided by the context.
Documenting Profiles
As far as I understood it, I can (and probably should) have a profile for each resource in my API, documenting the resource itself. This would include its properties and links that can occur. Would that be the place where I specify what the link means exactly?
Sticking to my earlier example, would I document for profile https://example.com/profiles/account-group that the link with the relation https://example.com/rels/accounts links to a collection of accounts that are part of this group?
That makes sense to me, but when we go into actual state transitions things seem to get messy.
State transitions
Let's say the client navigated to an account collection from an account group. The resource itself would not really differ from the resource that contains all global accounts. It would have pagination links and the links to the account resources themselves.
If that account-collection resource has a link with the relation type https://example.com/rel/create-account I would be in deep trouble, right? Because the information that this is an account-collection containing just accounts of a certain group is not encoded in the profile
https://example.com/profiles/account-collection and can therefore not contain the information that clients have to omit the accountGroup property when posting to that resource.
Concrete Questions
Am I right that relations definitions should be weak and not contain any information about how I can interact with the resource it links to?
If so, can I expect from client to follow a link and then discover what they can do, based on the profile of that resource. This seems wrong, especially for state transitions.
If profiles should document what a client might do with the linked resources, I cannot transport context across multiple "jumps" within the API, correct?
Should I use even more profiles and relations? https://example.com/profiles/global-accounts and https://example.com/profiles/account-group-accounts come to mind.
The more I think about it, I must either miss a critical piece or it's something that can be solved in multiple ways. Because of that, I am aware that there might be no 100%-correct answer to this question. But maybe someone can enlighten me so that I can make my own trade-offs? :)
Let's take your points one at a time:
1. Link relation meaning based on context
To reformulate your question into code context: How do I document the "getAccounts()" method when it means something different in the class Billboard and AccountGroup?
The obvious answer is you don't document the method in general, but you document the classes and within them the methods. The RFC you're referring to tries to define relations that are in some sense generic, or should mean the same thing every time. You might re-use some of them, but you still have to document the method and what they mean in your class.
Class equals to your Media Type. So I propose you document your Media Type.
2. Links and Forms, how to define what to POST
If you document your Media-Types, you can define whatever you like in there, including how to use its links. However, I would recommend not defining the Media Type of the POST, but letting it up to Content-Negotiation.
The client knows it has to POST an Account, it will use some Media-Type it thinks is proper for this task. The server will then tell the client whether that format is acceptable or not, it can also give a list of acceptable Media Types in return.
3. Documenting Profiles
If by Profiles you mean Media-Types, then yes, you should document them. You should actually only document Media-Types.
4. State transitions, modifications to representations based on state
Since the account group is a resource anyway, the client should not actually supply it, it should be part of the "state" which group the new account belongs to.
In other words, the client gets a link, that already has the context of the current account group. The client has to post a generic account, but the server knows it should belong to the group in the current state (it is part of the URI for example)
So no, the client shouldn't know it has to omit some parameter.
5. Question
Yes, relations should not define how to interact with the resource. Media Types could actually do that (like Forms defining it must be a POST, etc.), but more often than not it's not necessary.
Yes, clients discover not only what transitions are available (links), but what methods are available. The methods (GET, POST, PUT) always mean the same thing, and they are not described in Media-Types, since Media-Types only describe representations, not resources. The server normally submits all the supported methods in a response, or explicitly in response to OPTIONS.
I still don't really know what you mean by "Profile". If you mean relation profile, as in some data in the link definition, then no. The context/state travels in the URI. You can use the URI to "save" that the client is moving inside one account group for example.
No, you shouldn't add semantics to link relations. The Media-Type adds the semantics to links.
HTH
I will try to answer more succinctly than Robert's answer, which I think can get confusing.
Let's say I have a more or less generic link relation (https://example.com/rels/accounts) in my API to link related accounts. The exact meaning can change on context, right?
No. The meaning of a link relation is static. Your "list of accounts" relation on your index (list of lists) page is specific to your application. In your other example, account group, you are already in a "collection" resource, and members of that collection have link relation "item" (see the list of IANA link relations).
The problem gets worse with actual state transitions. Let's use https://example.com/rels/create-account as our example here. If the documentation just says "This is where you create a new account"
I would instead add a "create-form" link (another standard IANA link relation) to the "list of accounts" page. Your clients would then go: start -> list-of-Xes -> create-form -> submit. There would be no "create-an-X" or "create-a-Y" link relations. You probably would not allow clients to create new collection types either. This makes navigating the API lengthier for clients but reduces the amount of stuff they need to know (API breadth).
Would [a profile] be the place where I specify what the link means exactly?
If you make your link relations generic except for one to describe each model class, you will not have to document these in profiles on a per-model-class basis.
If that account-collection resource has a link with the relation type https://example.com/rel/create-account I would be in deep trouble, right?
Yes, so don't do that! What you are describing there is linking an existing resource to a collection (by IANA's definition). What I would advise is that you support clients being able to do this:
LINK http://site/account-collections/some-collection HTTP/1.1
Link: <http://site/accounts/some-account-id>; rel=item
Authorization: Token abc123
This simple (complete!) HTTP request has the semantics of adding an "item" link containing the existing account's URI to the collection you wish to add it to. It does not create a new account. The LINK method is still in draft but has existed in some form since HTTP 1.1 (1997).
Robert wrote:
You should actually only document Media-Types.
I disagree. You should document link relations too, but try not to create them if you can use an generic one.

Include / embed vs. link in RESTful APIs

So the general pattern for a RESTful API is to return a single object with embedded links you can use to retrieve related objects. But sometimes for convenience you want to pull back a whole chunk of the object graph at once.
For instance, let's say you have a store application with customers, orders, and returns. You want to display the personal information, all orders, and all returns, together, for customer ID 12345. (Presumably there's good reasons for not always returning orders and returns with customer personal information.)
The purely RESTful way to do this is something like:
GET /
returns a list of link templates, including one to query for customers
GET /customers/12345 (based on link template from /)
returns customer personal information
returns links to get this customer's orders and returns
GET /orders?customerId=12345 (from /customers/12345 response)
gets the orders for customer 12345
GET /returns?customerId=12345 (from /customers/12345 response)
gets the returns for customer 12345
But it'd be nice, once you have the customers URI, to be able to pull this all back in one query. Is there a best practice for this sort of convenience query, where you want to transclude some or all of the links instead of making multiple requests? I'm thinking something like:
GET /customers/12345?include=orders,returns
but if there's a way people are doing this out there I'd rather not just make something up.
(FWIW, I'm not building a store, so let's not quibble about whether these are the right objects for the model, or how you're going to drill down to the actual products, or whatever.)
Updated to add: It looks like in HAL speak these are called 'embedded resources', but in the examples shown, there doesn't seem to be any way to choose which resources to embed. I found one blog post suggesting something like what I described above, using embed as the query parameter:
GET /ticket/12?embed=customer.name,assigned_user
Is this a standard or semi-standard practice, or just something one blogger made up?
Being that the semantics of these types of parameters would have to be documented for each link relation that supported them and that this is more-or-less something you'd have to code to, I don't know that there's anything to gain by having a a standard way of expressing this. The URL structure is more likely to be driven by what's easiest or most prudent for the server to return rather than any particular standard or best practice.
That said, if you're looking for inspiration, you could check out what OData is doing with the $expand parameter and model your link relation from that. Keep in mind that you should still clearly define the contract of your relation, otherwise client programmers may see an OData-like convention and assume (wrongly) that your app is fully OData compliant and will behave like one.

Comparing data with RESTful API

For a website I am working on defining a RESTful API. I believe I got it (mostly) correct using proper resource URIs and proper use of GET/POST/UPDATE/DELETE.
However there is one point where I can't quite figure out what the proper way to do it "in" REST would be - comparison of lists.
Let's say I have a bookstore and a customer can have a wishlist. The wishlist consists of books (their full Book record, i.e. name, synopsis, etc) and a full copy of the list exists on the client. What would be a good way to design the RESTful API to allow a client to query the correctness of its local wishlist (i.e. get to know what books have been added/removed on the wishlist on the server side)?
One option would be to just download the full wishlist from the server and compare it locally. However this is quite a large amount of data (due to the embedded content) and this is a mobile client with a low-bandwidth connection, so this would cause a lot of problems.
Another option would be to download not the whole wishlist (i.e. not including book infos) but only a list of the books' identifiers. This would be not much data (compared to the previous option) and the client could compare the lists locally. However to get the full book record for newly added books, a REST call would have to be made for every single new book. Again, as this is a mobile client with bad network connectivity, this could be problematic.
A third option and my favorite, would be that the client sends its list of identifiers to the server and the server compares it to the wishlist and returns what books were removed and the data for books that were added. This would mean a single roundtrip and only the necessary amount of data. As the wishlist size is estimated to be less than 100 entries, sending just the IDs would be a minimal amount of data (~0.5kb). However I don't know what kind of call would be appropriate - it can't be GET as we are sending data (and putting it all in the URL does not feel right), it can't be POST/UPDATE as we do not change anything on the server. Obviously it's not DELETE either.
How would you implement this third option?
Side-question: how would you solve this problem (i.e. why is option 3 stupid or what better, simple solutions may there be)?
Thank you.
P.S.: A fourth option would be to implement a more sophisticated protocol where the server keeps track of changes to the list (additions/deletes) and the client can e.g. query for changes based on a version identifier or simply a timestamp. However I like the third option better as implementation-wise it is much more simpler and less error-prone on both client and server.
There is nothing in HTTP that says that POST must update the server. People seem to forget the following line in RFC2616 regarding one use of POST:
Providing a block of data, such as the result of submitting a
form, to a data-handling process;
There is nothing wrong with taking your client side wishlist and POSTing to a resource whose sole purpose is to return a set of differences.
POST /Bookstore/WishlistComparisonEngine
The whole concept behind REST is that you leverage the power of the underlying HTTP protocol.
In this case there are two HTTP headers that can help you find out if the list on your mobile device is stale. An added benefit is that the client on your mobile device probably supports these headers natively, which means you won't have to add any client side code to implement them!
If-Modified-Since: check to see if the server's copy has been updated since your client first retrieved it
Etag: check to see if a unique identifier for your client's local copy matches that which is on the server. An easy way to generate the unique string required for ETags on your server is to just hash the service's text output using MD5.
You might try reading Mark Nottingham's excellent HTTP caching tutorial for information on how these headers work.
If you are using Rails 2.2 or greater, there is built in support for these headers.
Django 1.1 supports conditional view processing.
And this MIX video shows how to implement with ASP.Net MVC.
I think the key problems here are the definitions of Book and Wishlist, and where the authoritative copies of Wishlists are kept.
I'd attack the problem this way. First you have Books, which are keyed by ISBN number and have all the metadata describing the book (title, authors, description, publication date, pages, etc.) Then you have Wishlists, which are merely lists of ISBN numbers. You'll also have Customer and other resources.
You could name Book resources something like:
/book/{isbn}
and Wishlist resources:
/customer/{customer}/wishlist
assuming you have one wishlist per customer.
The authoritative Wishlists are on the server, and the client has a local cached copy. Likewise the authoritative Books are on the server, and the client has cached copies.
The Book representation could be, say, an XML document with the metadata. The Wishlist representation would be a list of Book resource names (and perhaps snippets of metadata). The Atom and RSS formats seem good fits for Wishlist representations.
So your client-server synchronization would go like this:
GET /customer/{customer}/wishlist
for ( each Book resource name /book/{isbn} in the wishlist )
GET /book/{isbn}
This is fully RESTful, and lets the client later on do PUT (to update a Wishlist) and DELETE (to delete it).
This synchronization would be pretty efficient on a wired connection, but since you're on a mobile you need to be more careful. As #marshally points out, HTTP 1.1 has a lot of optimization features. Do read that HTTP caching tutorial, and be sure to have your web server properly set Expires headers, ETags, etc. Then make sure the client has an HTTP cache. If your app were browser-based, you could leverage the browser cache. If you're rolling your own app, and can't find a caching library to use, you can write a really basic HTTP 1.1 cache that stores the returned representations in a database or in the file system. The cache entries would be indexed by resource names, and hold the expiration dates, entity tag numbers, etc. This cache might take a couple days or a week or two to write, but it is a general solution to your synchronization problems.
You can also consider using GZIP compression on the responses, as this cuts down the sizes by maybe 60%. All major browsers and servers support it, and there are client libraries you can use if your programming language doesn't already (Java has GzipInputStream, for instance).
If I strip out the domain-specific details from your question, here's what I get:
In your RESTful client-server application, the client stores a local copy of a large resource. Periodically, the client needs to check with the server to determine whether its copy of the resource is up-to-date.
marshally's suggestion is to use HTTP caching, which IMO is a good approach provided it can be done within your app's constraints (e.g., authentication system).
The downside is that if the resource is stale in any way, you'll be downloading the entire list, which sounds like it's not feasible in your situation.
Instead, how about re-evaluating the need to keep a local copy of the Wishlist in the first place:
How is your client currently using the local Wishlist?
If you had to, how would you replace the local copy with data fetched from the server?
What have you done to minimize your client's data requirements when building its Wishlist view(s) and executing business logic?
Your third alternative sounds nice, but I agree that it doesn't feel to RESTfull ...
Here's another suggestion that may or may not work: If you keep a version history of of your list, you could ask for updates since a specific version. This feels more like something that can be a GET operation. The version identifiers could either be simple version numbers (like in e.g. svn), or if you want to support branching or other non-linear history they could be some kind of checksums (like in e.g. monotone).
Disclaimer: I'm not an expert on REST philosophy or implementation by any means.
Edit: Did you ad that PS after I loaded the question? Or did I simply not read your question all the way through before writing an answer? Sorry. I still think the versioning might be a good idea, though.