HATEOAS - Discovery and URI Templating - rest

I'm designing a HATEOAS API for internal data at my company, but have been having trouble with the discovery of links. Consider the following set of steps for someone to retrieve information about a specific employee in this system:
User sends GET to http://coredata/ to get all available resources, returns a number of links including one tagged as rel = "http://coredata/rels/employees"
User follows HREF on the rel from the first request, performing a GET at (for example) http://coredata/employees
The data returned from this last call is my conundrum and a situation where I've heard mixed suggestions. Here are some of them:
That GET will return all employees (with perhaps truncated data), and the client would be responsible for picking the one it wants from that list.
That GET would return a number of URI templated links describing how to query / get one employee / get all employees. Something like:
"_links": {
"http://coredata/rels/employees#RetrieveOne": {
"href": "http://coredata/employees/{id}"
},
"http://coredata/rels/employees#Query": {
"href": "http://coredata/employees{?login,firstName,lastName}"
},
"http://coredata/rels/employees#All": {
"href": "http://coredata/employees/all"
}
}
I'm a little stuck here with what remains closest to HATEOAS. For option 1, I really do not want to make my clients retrieve all employees every time for the sake of navigation, but I can see how using URI templating in example two introduces some out-of-band knowledge.
My other thought was to use the RetrieveOne, Query, and All operations as my cool URLs, but that seems to violate the concept that you should be able to navigate to the resources you want from one base URI.
Has anyone else managed to come up with a good way to handle this? Navigation is dead simple once you've retrieved one resource or a set of resources, but it seems very difficult to use for discovery.

Option 2 is not too bad as you're using RFC 6570 to characterize the URI patterns; while HATEOAS is usually stated in terms of not having clients synthesize URIs, if a server is prepared to make guarantees on the URI template and to tell it to clients explicitly in a standard format, it's acceptable. (I would be tempted to have the “list all employees” URL be without the all suffix, so as to distinguish it from the employee with that ID; the client should not — in principle — know what an employee ID looks like.)
In fact, the main problem is actually that clients have to understand what those tag URIs mean; there's just no real way to guess that “http://coredata/rels/employees#All” means “list all employees”. That's where you get into embedding knowledge in clients, semantic labeling, etc. and HATEOAS doesn't really address those things.
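To make option 2 concrete, here is a minimal sketch of how a client might expand the templated links from the question. It assumes only a small subset of RFC 6570 ({var} and {?a,b,c} with scalar values); the `expand` helper is hypothetical, and a real client would more likely rely on a complete RFC 6570 implementation.

```python
import re
from urllib.parse import quote

def expand(template: str, params: dict) -> str:
    """Expand a small subset of RFC 6570: {var} and {?a,b,c} with scalar values."""
    def substitute(match: re.Match) -> str:
        expression = match.group(1)
        if expression.startswith("?"):
            # Form-style query expansion: keep only the variables that are defined.
            names = expression[1:].split(",")
            pairs = [f"{name}={quote(str(params[name]), safe='')}"
                     for name in names if name in params]
            return "?" + "&".join(pairs) if pairs else ""
        # Simple string expansion; an undefined variable expands to nothing.
        return quote(str(params[expression]), safe="") if expression in params else ""
    return re.sub(r"\{([^}]+)\}", substitute, template)

# Links copied from the question's "_links" object.
links = {
    "http://coredata/rels/employees#RetrieveOne": {
        "href": "http://coredata/employees/{id}"
    },
    "http://coredata/rels/employees#Query": {
        "href": "http://coredata/employees{?login,firstName,lastName}"
    },
}

one = expand(links["http://coredata/rels/employees#RetrieveOne"]["href"], {"id": 42})
query = expand(links["http://coredata/rels/employees#Query"]["href"], {"lastName": "Smith"})
```

The client still has to know what each rel means and which variable names to supply, which is exactly the out-of-band knowledge discussed above; only the URI structure itself stays under the server's control.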

TL;DR: Use OPTIONS method to return programmatically consumable documentation and always implement pagination.
We create a number of internal REST services at my work. We have standardized on the use of the OPTIONS method to return the metadata of a resource. The metadata we return acts as parsable documentation of that resource. It indicates URL templates, various options such as PAGE and PAGESIZE, and the different methods that the resource supports. We also return rel links so top-level resource discovery can occur through OPTIONS without pulling any actual data.
We also implement pagination specifically to prevent issues around returning large amounts of data unnecessarily.
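As a rough illustration of the idea, an OPTIONS handler could build a response along these lines. This is only a sketch: the /employees resource, the field names ("methods", "templates", "parameters") and the default values are all invented for the example, not the format we actually use.

```python
import json

def describe_employees() -> tuple[int, dict, str]:
    """Build a hypothetical OPTIONS response for an /employees collection."""
    metadata = {
        # Methods the resource supports.
        "methods": ["GET", "POST", "OPTIONS"],
        # URL templates a client may expand instead of guessing URLs.
        "templates": {"page": "/employees{?page,pageSize}"},
        # Paging options and their (assumed) defaults and limits.
        "parameters": {
            "page": {"type": "integer", "default": 1},
            "pageSize": {"type": "integer", "default": 25, "max": 100},
        },
        # rel links so discovery works without fetching any data.
        "_links": {"self": {"href": "/employees"}},
    }
    headers = {
        "Allow": ", ".join(metadata["methods"]),
        "Content-Type": "application/json",
    }
    return 200, headers, json.dumps(metadata)

status, headers, body = describe_employees()
```

The Allow header is the standard HTTP way to list supported methods; everything in the body is a convention the clients and services have to agree on.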

My HATEOAS API returns HTML as well as HAL+JSON, as you are using, and they both use the same URIs, so my JSON responses simply return what a human web user would see (minus all the pretty colours). e.g.
GET /
{"_links": {
"http://coredata/companies": { "href": "/companies?page=1" }
...
}}
GET /companies?page=1
{"_links": {
"next": { "href": "?page=2" }
...
}}
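A client for responses like these could be sketched as follows. The canned RESPONSES dictionary stands in for real HTTP calls and the empty last page is an assumption; the point is that the client only follows rels and resolves each href against the URL it was found at.

```python
from urllib.parse import urljoin

# Canned responses standing in for real HTTP calls (hypothetical data).
RESPONSES = {
    "http://coredata/": {
        "_links": {"http://coredata/companies": {"href": "/companies?page=1"}}
    },
    "http://coredata/companies?page=1": {"_links": {"next": {"href": "?page=2"}}},
    "http://coredata/companies?page=2": {"_links": {}},
}

def fetch(url: str) -> dict:
    """Pretend to GET a URL and parse the JSON body."""
    return RESPONSES[url]

def walk_pages(entry_point: str, rel: str) -> list[str]:
    """Follow `rel` from the entry point, then keep following "next" links."""
    root = fetch(entry_point)
    url = urljoin(entry_point, root["_links"][rel]["href"])
    visited = []
    while url:
        visited.append(url)
        next_link = fetch(url)["_links"].get("next")
        # Relative hrefs such as "?page=2" are resolved against the current URL.
        url = urljoin(url, next_link["href"]) if next_link else None
    return visited

pages = walk_pages("http://coredata/", "http://coredata/companies")
```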

Related

RESTful API design - using a resource URI vs an ID

this is my first post, so please bear with me.
I am designing a new RESTful API and I have two design choices in how my clients interact with resources that they create.
As an example, I have a resource: "book", which is a simple, singleton resource.
Creating a new book is very simple:
POST https://api.mydomain.com/book
I know I can also use PUT if I want the operation to be idempotent.
This question is solely about the 200 OK response options, returning either:
an anonymous resource identifier (UUID) of the created "book":
{
book_id = 12345-67890
title = "a fantastic story"
}
a full FQDN URI to the created "book":
{
book_uri = "https://mylibrary.mydomain.com/upstairs/book/12345-67890"
title = "a fantastic story"
}
This of course significantly affects the subsequent manipulation of the "book" by the client.
To get the title of the above book, the client API calls would be either:
GET https://api.mydomain.com/book/{book-id}
Example: GET https://api.mydomain.com/book/12345-67890
Notes: The client will always use the same endpoint as the POST call, with the book-id simply appended.
GET {book-uri}
Example: GET https://mylibrary.mydomain.com/upstairs/book/12345-67890
Notes: The client will use the {book-uri} object variable directly from the POST response. Importantly, the returned {book-uri} may be a completely different URI to that of the POST used to create the "book".
So my questions (please) are:
Q1) which is the better model for the client to use and why?
Q2) can you see any issues with using Option 2 in a high volume, commercial system?
Thanks for any help and answers in advance.
can you see any issues with using Option 2 in a high volume, commercial system?
So, Option 2, where the HTTP response includes a URI for the newly created resource, is how the web itself works, and the web seems to be doing pretty well as a high volume commercial system.
Note also that option #2 allows the server to control its URIs. For instance, if you later decide that you want to revise the resource model, and use different spellings for the resource identifiers, then you can do that without needing to make any changes to the client.
You can also introduce, for example, a URI shortening component, because again you've got an identifier with standardized rules for how it works.
You don't necessarily need to use a full URI - we've also got standardized rules (RFC 3986) for how a relative reference is resolved against a base URI in a given context, so you'll likely have options like
{
book_uri = "/upstairs/book/12345-67890",
title = "a fantastic story"
}
... depending on whether or not the book resource is staged on the same host as the resource that handles the POST request.
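On the client side, handling both the relative and the absolute form is a one-liner in most languages. A small sketch in Python, assuming the base URI is the URL the POST was sent to (the `resolve_book_uri` helper is hypothetical):

```python
from urllib.parse import urljoin

POST_URL = "https://api.mydomain.com/book"

def resolve_book_uri(book_uri: str, base: str = POST_URL) -> str:
    """Resolve whatever the server returned against the request URL (RFC 3986)."""
    return urljoin(base, book_uri)

# A relative reference stays on the host that handled the POST ...
same_host = resolve_book_uri("/upstairs/book/12345-67890")
# ... while an absolute URI is used exactly as given.
other_host = resolve_book_uri("https://mylibrary.mydomain.com/upstairs/book/12345-67890")
```

Either way the client never assembles the URI from an ID, so the server remains free to move the book later.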
Is this better? That's going to depend on what tradeoffs you need to make, and how much you value each of the benefits versus the costs.
The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction. -- Fielding, 2000

bulk GET using HATEOAS

I've seen many examples of HATEOAS where every resource has links to related resources. An API that returns N items of a certain resource per page, the client would probably need N calls to fetch any nested resource by consuming HATEOAS. For example:
GET city/documents:
[{
id: 1,
city: {
self: 'http://service.com/cities?filter=id==1'
},
document: { ... }
...
}, {
id: 2,
city: {
self: 'http://service.com/cities?filter=id==2'
},
document: { ... }
...
}]
FYI, the query parameter uses the FIQL syntax to define the filters.
Now, if the client was to fetch the city details for each document (to show on UI), it will probably need N additional calls. However in my case, the /cities API can additionally take multiple city ids like this: /cities?filter=id=in=(1,2) that can reduce N calls to one. Is there a way to articulate something like this using HATEOAS? I've read about the templates but not sure how should the template look like and how would client consume it?
I've seen many examples of HATEOAS where every resource has links to related resources. An API that returns N items of a certain resource per page, the client would probably need N calls to fetch any nested resource by consuming HATEOAS.
Yes. Less true in a world with Server-Push, where the server can proactively provide multiple resources in response to a query. If you imagine asking for a web page, and getting the HTML, and then also the images and the JavaScript resources too, then you've got the right sort of idea.
API can additionally take multiple city ids like this: /cities?filter=id=in=(1,2) that can reduce N calls to one. Is there a way to articulate something like this using HATEOAS?
Yes.
Let's walk through it carefully. What you've done here is introduced a new resource, with identifier /cities?filter=id=in=(1,2). You might have another resource /cities?filter=id=in=(1,20) and another resource /cities?filter=id=in=(1,2000). In your implementation, these might be a "single endpoint" that extracts parameters from the identifier and uses them to generate the correct representation.
So what you get is something like a data transfer object - a large grained resource fetched in a single go.
I've read about the templates but not sure how should the template look like and how would client consume it?
The simplest example, which you have likely seen already, is a web form. You allow the client to provide the start and end elements, and the form processing takes that information and creates the specified URI from it.
/filtered-cities?start=1&end=2000
So the client needs to understand what the form is for, and how to identify the semantics of the different elements in the form. The agent needs to understand the processing rules that transfer the form data into the URI.
URI Templates are the same basic idea; they give you a domain agnostic language with which to describe where the parameters go in a resource identifier. The basic pattern is the same - there needs to be agreement about the semantics of the parameters, the server provides a URI, the client provides a parameter map, and the generic code can take care of the merge
uri = template.apply(parameterMap)
URI Templates aren't quite as powerful as forms; with a form, you can introduce a default value for a parameter, but there is no analogous capability in URI templates.
HAL-Forms may give you a better sense of how a form based approach might work in JSON.
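To sketch what `template.apply(parameterMap)` could look like for the bulk city lookup: the server would advertise a template, and the client would supply the list of ids. Everything below is an assumption for illustration: the template string, the `city_id` field on each document, and the `expand_simple` helper, which handles only plain {name} placeholders with comma-joined lists.

```python
from urllib.parse import quote

def expand_simple(template: str, params: dict) -> str:
    """Expand {name} placeholders only; list values are joined with commas."""
    result = template
    for name, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        encoded = ",".join(quote(str(v), safe="") for v in values)
        result = result.replace("{" + name + "}", encoded)
    return result

# Hypothetical page of documents, each carrying the id of its city.
documents = [
    {"id": 1, "city_id": 1},
    {"id": 2, "city_id": 2},
    {"id": 3, "city_id": 1},
]

# Hypothetical template the server might advertise next to the document list.
template = "http://service.com/cities?filter=id=in=({ids})"

# Deduplicate the ids so each city is requested once.
city_ids = sorted({doc["city_id"] for doc in documents})
bulk_url = expand_simple(template, {"ids": city_ids})
```

The client needs to agree with the server on what `ids` means, but not on where it goes in the URI.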

Is it really practical to use URLs instead of ids in a REST API

The proper design of REST APIs seems to be a controversial topic. As far as I understand it, the purist approach with regard to ids would be that the URL is the only identifier of a resource for the outside world, so neither does the client have to interpret the URL in any way (e.g. knowing that the last segment is the id) nor does the id have to be included explicitly in the representation returned for a simple GET request.
At first sight this seems to be a good rule because the client does not have to care about generating URLs based on ids, it's just the same thing. The id tells you how to retrieve the resource. However, I doubt that this is really applicable in practice. Some concerns that come to my mind:
What if the URL changes because of a new API version (given that it is part of the URL)
or the protocol changes from http to https.
or the application even moves to another domain for whatever reason
Short Ids are handy for referencing resources in parameters. This would not be possible: /books?author=short.author.id
It just puts too much information into an id that does not really belong there because the id should not be interpreted by any consumer in such a way.
Is this really done in practice? Are there examples of popular public APIs applying this pattern? Or maybe I don't understand it correctly and this is not what REST purists advocate?
Have a look at Hypermedia Driven RESTful APIs. In HATEOAS, URIs are discoverable (and not documented) so that they can be changed. That is, unless they are the very entry points into your system (Cool URIs, the only ones that can be hard-coded by clients) - and you shouldn't have too many of those if you want the ability to evolve the rest of your system's URI structure in the future. This is in fact one of the most useful features of REST.
For the remaining non-Cool URIs, they can be changed over time, and your API documentation should spell out the fact that they should be discovered at runtime through hypermedia traversal.
Looking at the Richardson Maturity Model (level 3), this would be where links come into play. For example, from the top level, say /api/version(/1), you would discover there's a link to the groups. Here's how this could look in a tool like HAL Browser:
Root:
{
"_links": {
"self": {
"href": "/api/root"
},
"api:group-add": {
"href": "http://apiname:port/api/group"
},
"api:group-search": {
"href": "http://apiname:port/api/group?pageNumber={pageNumber}&pageSize={pageSize}&sort={sort}"
},
"api:group-by-id": {
"href": "http://apiname:port/api/group/{id}" (OR "href": "http://apiname:port/api/group?id={id}")
}
}
}
The advantage here would be that the client would only need to know the relationship (link) name (well obviously besides the resource structure/properties), while the server would be mostly free to alter the relationship (and resource) url.

REST - What exactly is meant by Uniform Interface?

Wikipedia has:
Uniform interface
The uniform interface constraint is fundamental to the design of any REST service.[14] The uniform interface simplifies and decouples the architecture, which enables each part to evolve independently. The four guiding principles of this interface are:
Identification of resources
Individual resources are identified in requests, for example using URIs in web-based REST systems. The resources themselves are conceptually separate from the representations that are returned to the client. For example, the server may send data from its database as HTML, XML or JSON, none of which are the server's internal representation, and it is the same one resource regardless.
Manipulation of resources through these representations
When a client holds a representation of a resource, including any metadata attached, it has enough information to modify or delete the resource.
Self-descriptive messages
Each message includes enough information to describe how to process the message. For example, which parser to invoke may be specified by an Internet media type (previously known as a MIME type). Responses also explicitly indicate their cacheability.
Hypermedia as the engine of application state (A.K.A. HATEOAS)
Clients make state transitions only through actions that are dynamically identified within hypermedia by the server (e.g., by hyperlinks within hypertext). Except for simple fixed entry points to the application, a client does not assume that any particular action is available for any particular resources beyond those described in representations previously received from the server.
I'm listening to a lecture on the subject and the lecturer has said:
"When someone comes up to our API, if you are able to get a customer object and you know there are order objects, you should be able to get the order objects with the same pattern that you got the customer objects from. These URI's are going to look like each other."
This strikes me as wrong. It's not so much about what the URIs look like or whether they are consistent as it is about the way in which the URIs are used (identify resources, manipulate the resources through representations, self-descriptive messages, and HATEOAS).
I don't think that's what Uniform Interface means at all. What exactly does it mean?
Using interfaces to decouple classes from the implementation of their dependencies is a pretty old concept. In REST you use the same concept to decouple the client from the implementation of the REST service. In order to define such an interface (a contract between the client and the service), you have to use standards. This is because if you want an internet size network of REST services, you have to enforce global concepts, like standards to make them understand each other.
Identification of resources - You use the URI (IRI) standard to identify a resource. In this case, a resource is a web document.
Manipulation of resources through these representations - You use the HTTP standard to describe communication. So for example GET means that you want to retrieve data about the URI-identified resource. You can describe an operation with an HTTP method and a URI.
Self-descriptive messages - You use standard MIME types and (standard) RDF vocabs to make messages self-descriptive. So the client can find the data by checking the semantics, and it doesn't have to know the application-specific data structure the service uses.
Hypermedia as the engine of application state (a.k.a. HATEOAS) - You use hyperlinks and possibly URI templates to decouple the client from the application-specific URI structure. You can annotate these hyperlinks with semantics e.g. IANA link relations, so the client will understand what they mean.
The Uniform Interface constraint, that any ReSTful architecture should comply with, actually means that, along with the data, server responses should also announce available actions and resources.
In chapter 5 ("Representational State Transfer") of his dissertation, Roy Fielding states that the aim of using uniform interfaces is to:
ease and improve global architecture and the visibility of interactions
In other words, querying resources should allow the client to request other actions and resources without knowing them in advance.
The JSON-API specs (jsonapi.org) offer a good example in the form of a JSON response to a (hypothetical) GET HTTP request on http://example.com/articles:
{
"links": {
"self": "http://example.com/articles",
"next": "http://example.com/articles?page[offset]=2",
"last": "http://example.com/articles?page[offset]=10"
},
"data": [{
"type": "articles",
"id": "1",
"attributes": {
"title": "JSON API paints my bikeshed!"
},
"relationships": {
"author": {
"links": {
"self": "http://example.com/articles/1/relationships/author",
"related": "http://example.com/articles/1/author"
}
},
"comments": {
"links": {
"self": "http://example.com/articles/1/relationships/comments",
"related": "http://example.com/articles/1/comments"
}
}
},
"links": {
"self": "http://example.com/articles/1"
}
}]
}
Just by analysing this single response, a client knows:
What entities were queried ("articles" in this example);
How these entities are structured (articles have fields: id, title, author, comments);
How to retrieve related entities (i.e. the author and the comments);
That there are more entities of type "articles" (10, based on current response length and pagination links).
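As a sketch of how a client could act on such a response without any prior knowledge of the URL layout, the snippet below collects the "related" links the server advertised. The `response` dictionary is a trimmed copy of the example above, and `related_links` is a hypothetical helper, not part of any JSON-API library.

```python
# Trimmed copy of the example response (only the parts the sketch reads).
response = {
    "links": {
        "self": "http://example.com/articles",
        "next": "http://example.com/articles?page[offset]=2",
    },
    "data": [{
        "type": "articles",
        "id": "1",
        "relationships": {
            "author": {"links": {"related": "http://example.com/articles/1/author"}},
            "comments": {"links": {"related": "http://example.com/articles/1/comments"}},
        },
    }],
}

def related_links(document: dict) -> dict:
    """Map (type, id, relationship name) to the "related" URL the server advertised."""
    found = {}
    for resource in document.get("data", []):
        for name, relationship in resource.get("relationships", {}).items():
            url = relationship.get("links", {}).get("related")
            if url:
                found[(resource["type"], resource["id"], name)] = url
    return found

links = related_links(response)
next_page = response["links"].get("next")
```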
For those passionate about the topic, I strongly recommend reading Roy Thomas Fielding's dissertation!
Your question is somewhat broad; you seem to be asking for a restatement of the definitions you have. Are you looking for examples, or is there something specifically stated that you do not understand?
I agree that the line:
These URI's are going to look like each other
is fundamentally wrong. URIs needn't look anything like each other for the Uniform interface constraint to be met. What needs to be present is a uniform way to discover the URIs that identify the resources. This uniform way is unique to each message type, and there must be some agreed upon format. For example in HTML one document resource links to another via a simple tag:
<a href="URI">fallback relationship</a>
HTTP servers return HTML as a text/html resource type, which browsers have an agreed-upon way of parsing. The anchor tag is the hypermedia control (HATEOAS) that has the unique identifier for the related resource.
The only point that wasn't covered was manipulation. HTML has another awesome example of this, the form tag:
<form action="URI" method="verb">
<input name=""></input>
</form>
Again, browsers know how to interpret this meta information to define a representation of the resource acted upon at the URI. Unfortunately HTML only lets you GET and POST for verbs...
More commonly in a JSON based service, when you retrieve a Person resource, it's easy to manipulate that representation and then PUT or PATCH it right back to its canonical URL. No pre-existing knowledge of the resource is needed to modify it. Now when we write client code we get all wrapped up with the idea that we do in fact need to know the shape before we consume it... but that really is just to make our parsers efficient and easy. We could make parsers that analyze the semantic meaning of each part of a resource and modify it by interpreting the intent of the modification. For example, a command of "make the person 10 years older" would parse the resource looking for the age, identify the age, add 10 years to that value, and then send that resource back to the server. Is it easier to have code that expects the age to be at a JSON path of $.age? Absolutely... but it's not strictly necessary.
Ok I think I understand what it means.
From Fielding's dissertation:
The central feature that distinguishes the REST architectural style from other network-based styles is its emphasis on a uniform interface between components (Figure 5-6). By applying the software engineering principle of generality to the component interface, the overall system architecture is simplified and the visibility of interactions is improved.
He's saying that the interface between components must be the same, i.e. between client and server and any intermediaries, all of which are components.

REST API Design links between collections of resources

I have been pondering the correct method of defining resource collections which have interdependence.
For instance, let's consider "documents" and "comments" which are independently accessible via the URIs:
/documents/{doc-uri}
/comments/{comment-id}
However, we typically want the collection of comments related to a specific document, which creates a design question around how this should be architected.
I can see a few main options:
1.) Supply a collection URI after the document URI for comments
GET /documents/{doc-uri}/comments/
2.) Provide a parameter to the comments collection to select by document
GET /comments/{comment-id}?related-doc={doc-uri}
3.) Use content negotiation to request the related comments be returned via the Accept header.
// Get all the comments for a document
GET /documents/{doc-uri} Accept: application/vnd.comments+xml
// Create a new comment
POST /documents/{doc-uri} Content-Type: application/vnd.comment+xml <comment>...</comment>
Method 1 has the advantage of automatically putting the comments in the context of the document, which is also nice when creating, updating and deleting comments using POST/PUT. However, it does not provide global access to comments outside the context of a document. So if we wanted to do a search over all comments in the system we would need method #2.
Method 2 offers many of the same benefits as #1; however, creating a comment without the context of a document makes no sense, since comments must explicitly be related to a document.
Method 3 is interesting from a GET and POST/create perspective, but gets kinda hairy with update and delete.
I can see pros and cons to all these methods, so I am looking for some more guidance from someone who may have approached and solved this issue before.
I am considering doing both methods 1 & 2, thus I can provide all the functionality needed, but I am concerned I may be over-complicating/duplicating functionality.
A REST API must be hypermedia-driven. See the Hypermedia as the Engine of Application State (HATEOAS) constraint. So, don't waste your time on URL patterns, because they are not RESTful. A URL pattern implies tight coupling between a client and a server: the client must know what URLs look like and be able to construct them.
Consider this REST design for your use-case:
The representation of a document contains a link to which the client can POST a new comment or from which it can GET all comments on the document, e.g.
{
...
"comments" : {
"href": ".. url ..",
"rel": ["create-new-comment", "list-comments"]
}
}
A client just takes this URL and performs a GET or POST on it, without any knowledge of what the URL looks like.
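A client for that representation might look roughly like this. It is a sketch under assumptions: the href value is a placeholder (the representation above leaves the URL unspecified), and the mapping from rel names to HTTP methods is the shared vocabulary the client must be given in documentation.

```python
# Client-side knowledge of what each link relation means (the agreed vocabulary).
REL_TO_METHOD = {"create-new-comment": "POST", "list-comments": "GET"}

# Hypothetical document representation; the href is a placeholder value.
document = {
    "title": "Quarterly report",
    "comments": {
        "href": "/documents/42/comments",
        "rel": ["create-new-comment", "list-comments"],
    },
}

def request_for(representation: dict, rel: str) -> tuple[str, str]:
    """Find a link advertising `rel` and return the (method, url) to use."""
    for value in representation.values():
        if isinstance(value, dict) and rel in value.get("rel", []):
            # The URL is taken verbatim from the server; the client never builds it.
            return REL_TO_METHOD[rel], value["href"]
    raise LookupError(f"server did not advertise {rel!r}")

list_request = request_for(document, "list-comments")
create_request = request_for(document, "create-new-comment")
```

If the server stops advertising a rel, the lookup fails and the client knows the action is not available right now, rather than guessing a URL.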
See also this post:
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
The combination of methods 1 and 2 looks good. As you say about method 2, it doesn't make much sense to create comments without a document context, since a parent-child relationship exists between the two; if you delete a document, it is acceptable to delete all of its comments as well. You can make your /comments/ URI a read-only resource in order to prevent comments being created without a document.
As filip26 says, REST APIs should be hypermedia-driven, but that does not mean that URL patterns aren't important. A resource can have one URI or many. If your resources have multiple URIs, it is easier for clients to find them; the downside is that it can be confusing, because some clients use one URI and others use another. So you can designate a canonical URI for a resource: when a client accesses the resource through the canonical URI you send back a 200 OK, and when a client requests one of the other URIs you send back a 303 "See Other" along with the canonical URI.
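The canonical-URI idea can be sketched as a tiny routing function. The paths, the comment data and the `respond` helper are all made up for the example; a real service would do this inside its web framework.

```python
from typing import Optional

# Hypothetical data: one comment under its canonical URI, plus one alias for it.
CANONICAL = {"/comments/7": {"id": 7, "text": "Nice document"}}
ALIASES = {"/documents/42/comments/7": "/comments/7"}

def respond(path: str) -> tuple[int, dict, Optional[dict]]:
    """Return (status, headers, body) for a GET on `path`."""
    if path in CANONICAL:
        return 200, {}, CANONICAL[path]
    if path in ALIASES:
        # 303 See Other: point the client at the canonical URI.
        return 303, {"Location": ALIASES[path]}, None
    return 404, {}, None

canonical_response = respond("/comments/7")
alias_response = respond("/documents/42/comments/7")
```

Clients that follow the redirect end up caching and bookmarking the canonical URI, which keeps the confusion from multiple URIs to a minimum.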