Connectedness & HATEOAS - rest

It is said that in a well-defined RESTful system, clients only need to know the root URI or a few well-known URIs, and they should discover all other links through these initial URIs. I understand the benefits of this approach (decoupled clients), but the downside for me is that the client needs to discover the links each time it tries to access something. For example, given the following hierarchy of resources:
/collection1
|- sub1
|  |- sub1sub1
|  |  |- sub1sub1sub1
|  |     |- sub1sub1sub1sub1
|  |- sub1sub2
|- sub2
|  |- sub2sub1
|  |- sub2sub2
|- sub3
   |- sub3sub1
   |- sub3sub2
If we follow the "client only needs to know the root URI" approach, then a client should only be aware of the root URI, i.e. /collection1 above, and the rest of the URIs should be discovered through hypermedia links. I find this cumbersome: each time a client needs to do a GET, say on sub1sub1sub1sub1, should it first do a GET on /collection1, follow the link in the returned representation, and then do several more GETs on sub-resources to reach the desired resource? Or is my understanding of connectedness completely wrong?
Best regards,
Suresh

You will run into this mismatch when you try to build a REST API that does not match the flow of the user agent that is consuming the API.
Consider that when you run a client application, the user is always presented with some initial screen. If you match the content and options on this screen to the root representation, then the available links and desired transitions will match up nicely. As the user selects options on the screen, you can transition to other representations, and the client UI should be updated to reflect the new representation.
If you try to model your REST API as some kind of linked data repository and your client UI as an independent set of transitions, then you will find HATEOAS quite painful.

Yes, it's right that the client application should traverse the links, but once it has discovered a resource, there's nothing wrong with keeping a reference to that resource and using it for longer than one request. If your client has the ability to remember things permanently, it can do so.
Consider how a web browser keeps its bookmarks. You probably have ten or a hundred bookmarks in your browser, and you probably found some of them deep in a hierarchy of pages, but the browser dutifully remembers them without having to remember the path it took to find them.
A richer client application could remember the URI of sub1sub1sub1sub1 and reuse it as long as it still works. It's likely that it still represents the same thing (it ought to). If it no longer exists, or the request fails for any other client-side reason (4xx), you could retrace your steps to see if you can find a suitable replacement.
And of course what Darrel Miller said :-)
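To make the bookmark idea concrete, here is a minimal Python sketch using the requests library. The entry-point URI, the link-relation names, and the {"links": [...]} representation shape are all assumptions for illustration, not part of the original question:

    import requests

    ROOT = "https://api.example.com/collection1"  # hypothetical entry point

    def discover(rels):
        """Walk from the root, following one link relation per step."""
        url = ROOT
        for rel in rels:
            doc = requests.get(url).json()  # assumed shape: {"links": [{"rel": ..., "href": ...}]}
            url = next(l["href"] for l in doc["links"] if l["rel"] == rel)
        return url

    # Found once by traversal, then remembered like a browser bookmark.
    PATH = ["sub1", "sub1sub1", "sub1sub1sub1", "sub1sub1sub1sub1"]
    bookmark = discover(PATH)

    def get_leaf():
        global bookmark
        resp = requests.get(bookmark)
        if 400 <= resp.status_code < 500:
            # The remembered URI stopped working; retrace our steps from the root.
            bookmark = discover(PATH)
            resp = requests.get(bookmark)
        return resp

The traversal cost is paid once per bookmark, not once per request, and the fallback path only runs when the bookmark breaks.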

I don't think that's a strict requirement. As I understand it, it is legal for a client to access resources directly and start from there. The important thing is that you do not do this for state transitions, i.e. do not automatically proceed with /foo2 after operating on /foo1, and so forth. Retrieving /products/1234 directly in order to edit it seems perfectly fine. The server could always return, say, a redirect to /shop/products/1234 to remain backwards compatible (which is desirable for search engines, bookmarks, and external links as well).


REST on non-CRUD operations

I have a resource called “subscriptions”.
I need to update a subscription’s send date. When a request is sent to my endpoint, my server will call a third-party system to update the given subscription.
“Subscriptions” have other types of updates. For instance, you can change a subscription’s frequency. This operation also involves calling a third-party system from my server.
To be truly “RESTful,” must I force these different types of updates to share an endpoint?
PATCH subscriptions/:id
I can hypothetically use my controller behind the endpoint to fire different functions depending on the query string... But what if I need to add a third or fourth “update” type action? Should they ALL run through this single PATCH route?
To be truly “RESTful,” must I force these different types of updates to share an endpoint?
No - but you will often want to.
Consider how you would support this on the web: you might have a number of different HTML forms, each accepting a slightly different set of inputs from the user. When the form is submitted, the browser will use the input controls and form metadata to construct an HTTP (POST) request. The target URI of the request is copied from the form action.
So your question is analogous to: should we use the same action for all of our different forms?
And the answer is yes, if you want the general purpose HTTP application to understand which resource is expected to change in response to the message. One reason that you might want that is cache invalidation; using the right target URI allows all of the caches to understand which previously cached responses should not be reused.
Is that choice free? No: it adds some ambiguity to your access logs, and routing the request to the appropriate handler in your code takes a bit more work.
Trying to use PATCH with a different target URI is a little bit weird, and suggests that maybe you are trying to stretch PATCH beyond the standard constraints.
PATCH (and PUT) have remote authoring semantics; what they mean is "make your copy of the target resource look like my copy". These are methods we would use if we were trying to fix a spelling error on a web page.
Trying to change the representation of one resource by sending a remote-authoring request to a different resource makes it harder for general-purpose HTTP components to add value. You are coloring outside the lines, and that means accepting the liability if anything goes wrong, because you are using standardized messages in a non-standard way.
That said, it is reasonable to have many different resources that present representations of the same domain entity. Instead of putting everything you know about a user into one web page, you can spread it out among several that are linked together.
You might have, for example, a web page for an invoice, and then another web page for shipping information, and another web page for billing information. You now have a resource model with clearer separation of concerns, and can combine the standardized meanings of PUT/PATCH with this resource model to further your business goals.
"We can create as many resources as we need (at the web level; at the REST level) to get a job done." -- Webber, 2011
So, in your example, would I do one endpoint like user/:id/invoice/:id and then another like user/:id/billing/:id?
Resources, not endpoints.
GET /invoice/12345
GET /invoice/12345/shipping-address
GET /invoice/12345/billing-address
Or
GET /invoice/12345
GET /shipping-address/12345
GET /billing-address/12345
The spelling conventions that you use for resource identifiers don't actually matter very much.
So if it makes life easier for you to stick all of these into a hierarchy that includes both users and invoices, that's also fine.
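As an illustration of the remote-authoring semantics discussed above, a PATCH against one of those separated resources might look like the following sketch (Python with requests, sending a JSON Merge Patch body per RFC 7396; the URI and field names are hypothetical):

    import requests

    # PATCH exactly the resource being edited, so general-purpose components
    # (caches, access logs) can see which resource is expected to change.
    resp = requests.patch(
        "https://api.example.com/invoice/12345/shipping-address",  # hypothetical
        json={"city": "Springfield", "postal_code": "12345"},
        headers={"Content-Type": "application/merge-patch+json"},
    )
    resp.raise_for_status()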

How to manage HATEOAS links when the server is the client?

I'm learning about HATEOAS. The backend server I'm working on will use a third-party REST API that uses HATEOAS. That API has an endpoint that returns the URL for each resource, and it also returns the related resource links with regular requests.
But I'm wondering what's a good way to manage these links on the server to avoid hardcoding them. For example if the third party changes the url of the resource, how will the server detect that change? Are there any standard practices for managing HATEOAS resource links?
Possible ways I can think of:
1. When the server starts, get all the resource URLs and cache them. Whenever the third-party API needs to be called, reuse these cached URLs. Whenever there is a 404 or related error, update the resource URL, or update the URLs periodically at intervals.
2. Get the resource URL each time before calling the endpoint. This is the simplest, but it essentially doubles the number of requests.
But neither sounds like a robust approach.
While discovery is generally a good thing and should allow a HATEOAS system to introduce changes in ways that hardcoded URLs don't, if URLs start breaking arbitrarily I would still consider that a major issue.
You should be able to store URLs/links on your side and have some expectation that they keep working.
There are some mechanisms that deal with changes though:
The server should return a 301/308 redirect if a resource has moved. If this happens, you should also update your references.
The server can emit Sunset or Deprecation headers. See RFC 8594: https://www.rfc-editor.org/rfc/rfc8594
Those are the more general answers, but ultimately the existence of best practices does not mean that vendors will abide by them. With that in mind, I think your best bet is to find out what your vendor's deprecation policy is and see what they recommend.
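Both mechanisms can be handled mechanically. A sketch in Python using requests (the stored URL and the helper name are illustrative, not part of the question):

    import requests
    from urllib.parse import urljoin

    def fetch(stored_url):
        """GET a stored link; update the stored reference if the resource moved."""
        resp = requests.get(stored_url, allow_redirects=False)
        if resp.status_code in (301, 308):
            # Resource moved permanently: persist the new reference.
            stored_url = urljoin(stored_url, resp.headers["Location"])
            resp = requests.get(stored_url)
        if "Sunset" in resp.headers:
            # RFC 8594: the server announces when this resource will go away.
            print("Link sunsets at:", resp.headers["Sunset"])
        return resp, stored_url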
Use a cached resource if it is valid, request a refresh when you don't have a local valid copy.
RFC 7234 defines the caching semantics of HTTP.
Ideally, you don't implement the caching rules yourself, but instead you use a general purpose cache.
In its ideal form, your bespoke implementation is talking to a headless browser, and the headless browser worries about the caching rules for you.
In theory, you need the initial URL to start the process, and everything else comes from that.
Each resource you get from the server should include links to the other edges on the graph of the service for that resource.
So, once you get the initial resource, all of the rest come automatically.
That said, it's not untoward to have "well known" entry points that are, ideally, unchanging URLs. But in the end, those are just "bookmarks", not guaranteed endpoints.
Consider a shopping site such as Amazon. Outside of amazon.com, you don't know any of their URLs. They're all provided on the various forms and pages, and the human simply navigates the site. Those URLs can be changing all the time, and no one would know. With HATEOAS, it's up to the machine to follow the links, rather than a human. But the process of navigation is the same.
As others have mentioned, the idea of caching the root resource has merit. Then you rely on the caching headers to tell you how often you have to refresh the links.
But that said, operationally there's no difference between following a normal link and following a cached link. Underneath, the cached resource loads faster, but you still need to "follow the link", because that's where the caching behavior kicks in. This is different from assuming the link is good, i.e. assuming you already know the result of a resource lookup. Your application follows the link. Always. The underlying infrastructure is responsible for making it efficient.
So your code should not, say, load up a root resource, stuff a map full of links, and then assume they're good. Rather, the code should request the root resource, perhaps as a map of links (datatypes for the win), and let the next layer handle the details, because it all depends on the type of caching involved. Some responses have declared durations during which no follow-up is necessary. For others, you make the request anyway, and the server responds "nothing changed", so you can use your local copy; but you are still required to ask in the first place.
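As an illustration of "always follow the link and let the infrastructure make it cheap", here is a sketch of a conditional GET in Python with requests (the caching variables and the representation shape are assumptions). The client asks every time; a 304 response means its local copy is still good:

    import requests

    cached_links, cached_etag = None, None

    def follow(url):
        """Always ask the server; let validation make repeat requests cheap."""
        global cached_links, cached_etag
        headers = {"If-None-Match": cached_etag} if cached_etag else {}
        resp = requests.get(url, headers=headers)
        if resp.status_code == 304:      # "nothing changed": reuse local copy
            return cached_links
        cached_links = resp.json()
        cached_etag = resp.headers.get("ETag")
        return cached_links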
Those are implementation details that the SERVER mandates (not the client). It's a server contract. If they want you pinging them each and every time, so be it. That's the contract they're presenting to you, and if you want to be a Good Citizen, you should honor that contract.
Ideally, the server makes good decisions on these kinds of issues for the sake of efficiency, but in the end it's really up to them.
The client has to go along. The client in a HATEOAS system cedes a lot of control to the server; these are simply not decisions for the client to make.

HTTP header field for URI deprecation/expiration

I'm building a REST service where I want to implement a way to deprecate certain URIs when they shouldn't be supported anymore for one reason or another. As functions are deprecated, they will be replaced by new ones that work in similar (but not identical) ways. This means that at some point, I will have to start responding with 410 Gone.
The idea is that all client software should be updated, and after, say, six months, all users should have had the chance to upgrade. At that point, the deprecated URIs will start informing the client that it's out of date, so that the client can display a message to the user. This time is not known in advance, though, and can't be stated explicitly in the documentation.
The problem I want to solve is:
Is there an HTTP header field I should use to indicate that a certain URI will cease to work at a certain time and, if so, which?
This can't be the first time someone has wanted to solve this problem. Is there an unofficial header field already in use, or should I design my own? Note that I don't want to add this information to the content itself, as that would imply that every resource had changed and needed to be refreshed by the client, which is of course not what happened.
Strictly speaking, no. The resources should be driving your application's state, so if there is a change, the URI linking would provide the necessary changes to your application.
As for an HTTP header, you are free to add custom headers, which have traditionally started with X-. But it's important to note that changes to URIs are only interesting to developers, not users.
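As an aside, the Sunset header from RFC 8594 (mentioned in an earlier answer above) addresses exactly this "URI will cease to work at a certain time" case. A minimal client-side sketch in Python, with a hypothetical URI:

    import requests

    resp = requests.get("https://api.example.com/old-endpoint")  # hypothetical
    sunset = resp.headers.get("Sunset")
    if sunset:
        # RFC 8594: the server announces the date after which this URI is
        # expected to stop working; surface that to the user or the logs.
        print("This URI will cease to work at:", sunset)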

Writing a client for a RESTful (hypermedia) API

I've been reading up on 'real' RESTful APIs for a few days now, and I think I'm close to grokking what it's about.
But one of the things that I stumble on is that I can't even begin to imagine how one would write a client for a 'real' hypermedia API:
Most of the examples I've read talk about browsers and spiders, but that's not especially helpful: one is human-directed and 'intelligent', the other is dumb and 'random'. As it stands, I kind of get the impression that you'd need to learn AI to get a client working.
One thing that isn't clear to me is how the client knows which verb to use on any given link. Is that implicit in the 'rel' type of the URI? The alternative (reading here) seems to be using XHTML and having a client that can parse and post forms.
How likely is it that a link will change, but not the route to the link?
In most examples you see around, the route and the link are the same:
eg. if I want to set up a client which will bring me back the list of cakes from Toni's Cake Shop:
http://tonis.com
{ "link": { "type": "cakes", "uri": "http://tonis.com/cakes" } }
What happens when Toni's becomes Toni's Food Shop, and the link becomes http://tonis.com/desserts/cakes?
Do we keep the initial cakes link at the root for backward compatibility? And if not, how do we do a 'redirect' for the poor little agent who has been told "go to root, look for cakes"?
What am I missing?
OK, I'm not a REST expert either; I've been reading a lot of related material lately, so what I'm going to write is not my experience or opinion but rather a summary of what I've read, especially the REST in Practice book.
First of all, you can't escape having some initial agreement between client and server. The goal of REST is to make them agree on the very minimum of things relevant to both of them, and to let each party take care of its own stuff itself. E.g., the client should not care about link layout or how the data is stored on the server, and the server should not care about the client's state. What they agree on in advance (i.e. before the interaction begins) is what the aforementioned book's authors call the "Domain Application Protocol" (DAP).
The important thing about the DAP is that it's stateful, even though HTTP itself is not (since any client-service interaction has state, at the very least a beginning and an end). This state can be described in terms of what a client can/may/is expected to do next: "I've started using the service, what now? OK, I can search items. Search for this item; what's next? OK, I can order this and that..." and so on.
What defines a hypermedia content type is its ability to handle both data exchange and interaction state. As I already mentioned, state is described in terms of possible actions, and, as follows from the "Resource" in REST, all actions are described in terms of accessible resources. I guess you have seen the acronym HATEOAS (Hypermedia As The Engine Of Application State); that's what it apparently means.
So, to interact with the service, a client uses a hypermedia format they both understand, which can be standard, homegrown, or a mixture of the two (e.g. XML/XHTML-based). In addition, they must share the protocol, which is most likely HTTP; but since some details are omitted from the standard, there must be some idioms for its usage, like "use POST to create a resource and PUT to update one". Such a protocol would also include the entry points of the service (again, in terms of accessible resources).
Those three aspects fully define the domain protocol. In particular, a client is not supposed to know any internal links before it starts using the service, nor to remember them after the interaction completes. As a result, any change to the internal navigation, like renaming /cakes to /f5d96b5c, will not affect the client as long as it adheres to the initial agreement and enters the shop through the front door.
@Benjol,
You must avoid programming clients against particular URIs. When you describe a link, what's important is its meaning, not the URI itself. You should be able to change a URI at any time without breaking your client.
I'd change your example this way:
{"link": {
"rel": "collection http://relations.your-service.com/cakes",
"href": "http://tonis.com/cakes",
"title": "List of cakes",
"type": "application/vnd.yourformat+json"
}}
If there is a client which consumes your service, it needs to understand:
- the link structure itself;
- the link relationships (in this case "collection", which is a registered relation, and "http://relations.your-service.com/cakes", which is your domain-specific link relation).
In this case the client can just dereference the address given by the "href" attribute and display the list of cakes. Later, if you change the cake-list URI, the client will continue to work; this assumes the client still understands the semantics of your media type.
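A minimal sketch of such a client in Python with requests (the entry-point URI and the representation shape come from the example above; everything else is an assumption), resolving the link by its relation rather than by a hardcoded URI:

    import requests

    CAKES_REL = "http://relations.your-service.com/cakes"  # agreed-on relation

    entry = requests.get("http://tonis.com").json()  # only the entry point is known
    link = entry["link"]
    if CAKES_REL in link["rel"].split():             # "rel" can hold several values
        cakes = requests.get(link["href"],
                             headers={"Accept": link["type"]})
        print(link["title"], cakes.json())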
P.S.
See registered link relation attributes:
http://www.iana.org/assignments/link-relations/link-relations.xml
Web Linking RFC: https://www.rfc-editor.org/rfc/rfc5988

RESTful Web Services: method names, input parameters, and return values?

I'm trying to develop a simple REST API. I'm still trying to understand the basic architectural paradigms for it. I need some help with the following:
1. "Resources" should be nouns, right? So I should have "user", not "getUser", right?
2. I've seen this approach in some APIs: www.domain.com/users/ (returns a list), www.domain.com/users/user (does something specific to a user). Is this approach good?
3. In most examples I've seen, the input and output values are usually just name/value pairs (e.g. color='red'). What if I wanted to send or return something more complex than that? Am I forced to deal with XML only?
4. Assume a PUT to /user/ to add a new user to the system. What would be a good format for the input parameters (assume the only fields needed are 'username' and 'password')? What would be a good response if the creation succeeds? What if it fails and I want to return a descriptive error message?
5. What is a good and simple approach to authentication and authorization? I'd like to restrict most of the methods to users who have "logged in" successfully. Is passing username/password on each call OK? Is passing a token considered more secure (if so, how should this be implemented in terms of expiration, etc.)?
For point 1, yes. Nouns are expected.
For point 2, I'd expect /users to give me a list of users. I'd expect /users/123 to give me a particular user.
For point 3, you can return anything. Your client can specify what it wants, e.g. text/xml, application/json, etc., by using the HTTP Accept request header, and you should comply with that request as far as you can (although you may only handle, say, text/xml; that would be reasonable in a lot of situations).
For point 4, I'd expect POST to create a new user; PUT would update an existing one. For reporting success or errors, you should use the existing HTTP status codes, e.g. 200 OK. See this SO answer for more info.
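To make point 4 concrete, a sketch in Python with requests (the URI and field names are hypothetical) of creating a user with POST and distinguishing the outcomes by status code:

    import requests

    resp = requests.post(
        "https://api.example.com/users",                   # hypothetical URI
        json={"username": "alice", "password": "s3cret"},
    )
    if resp.status_code == 201:                            # Created
        print("New user lives at:", resp.headers["Location"])
    else:
        # e.g. 400 Bad Request or 409 Conflict, with a descriptive body
        print("Failed:", resp.status_code, resp.text)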
The most important constraint of REST is the hypermedia constraint ("hypertext as the engine of application state"). Think of your Web application as a state machine where each state can be requested by the client (e.g. GET /user/1). Once the client has one such state (think: a user looking at a Web page), it sees a bunch of links it can follow to go to the next state in the application. For example, there might be a link from the 'user' state that the client can follow to go to the 'details' state.
This way, the server presents the application's state machine to the client one state at a time, at runtime. The clever thing: since the state machine is discovered at runtime, one state at a time, the server can dynamically change it at runtime.
Having said that...
On 1: the resources essentially represent the application states you want to present to the client. They will often closely match domain objects (e.g. user), but make sure you understand that the representations you provide for them are not simply serialized domain objects but states of your Web application.
Thinking in terms of GET /users/123 is fine. Do NOT place any action inside a URI. Although not harmful (it is just an opaque string) it is confusing to say the least.
On 2: as Brian said. You might want to take a look at the Atom Publishing Protocol (RFC 5023) because it explains create/read/update cycles pretty well.
On 3: focus on document-oriented messages. Media types are an essential part of REST because they provide the application semantics (completely). Do not use generic types such as application/xml or application/json, as you'll couple your clients and servers around an often implicit schema. If nothing fits your needs, just make up your own type.
Maybe you are interested in an example I am hacking together using UBL: http://www.nordsc.com/blog/?cat=13
On 4: normally, use POST /users/ for creation. Have a look at RFC 5023; this will clarify it. It is an easy spec to understand.
On 5: since you cannot use sessions (a stateful server) and be RESTful, you have to send credentials with every request. The various HTTP auth schemes handle that already. This is also important with regard to caching, because the HTTP Authorization header has specific, specified semantics for caches (no public caching). If you stuff your credentials into a cookie, you lose that important piece.
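For example, with HTTP Basic authentication every request carries the credentials in the standard Authorization header. A sketch in Python with requests (the URIs and credentials are hypothetical):

    import requests
    from requests.auth import HTTPBasicAuth

    auth = HTTPBasicAuth("alice", "s3cret")

    # No server-side session: each request is independently authenticated.
    users = requests.get("https://api.example.com/users", auth=auth)
    user = requests.get("https://api.example.com/users/123", auth=auth)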
All HTTP status codes have specific application semantics. Use them; do not tunnel your own error semantics through HTTP.
You can come visit #rest IRC or join rest-discuss on Yahoo for detailed discussions.
Jan