Apiary multiple response - apiary.io

I'm documenting my API with Apiary and I'm using the old blueprint syntax.
I read somewhere that different responses can be separated with "+++++", but I do not know how to tell Apiary when to use my first or second response.

This is currently not possible; the first response is always used.
Support for this will be landing soon, based on a match of status code, content type and Prefer header. See http://support.apiary.io/forums/120125-general/suggestions/3264498-match-on-request-headers for more details.

Related

What should a RESTful API for tasks look like if I need two GETs?

I want to design a RESTful API for a website scraping service. A user delegates a task to the service. Each task is a website that has to be scraped. The user can check the status of each task. When a task is done, the user can fetch its result.
The status can be "Waiting", "In progress" or "Done"; once it is "Done", the user can get the data.
What I have now is:
POST /tasks - post a URL to scrape
GET /tasks - returns a list of tasks
I need two more endpoints: one to get the status of a task and one to get the data scraped from the website. What should the GET look like?
GET /tasks/{id} - return a status? Or return the data?
Or maybe
GET /tasks/{id}/status
GET /tasks/{id}/data
But what would /tasks/{id}/ return then?
And what if I would also like to present the scraped data as HTML?
Should I use
GET /tasks/{id}/data or GET /tasks/{id}/result
POST /tasks - post a URL to scrape
GET /tasks - returns a list of tasks
That's good. Notice that when you POST successfully, cache invalidation kicks in. Generic clients will know that the previously returned representation(s) of the list of tasks is no longer valid.
GET /tasks/{id} - return a status? Or return the data?
Why not both? /tasks/{id} identifies a resource; you can use any sort of representation you like for it. There's no reason that the representation shouldn't include optional elements.
(Heuristic: what would the web page look like? Do you really feel like there need to be two different web pages for this one concept? If not, then it can probably be a single resource in your API.)
what if I would also like to present scraped data as html?
Same identifier is fine for multiple representations; the client can use the Accept header to describe its preferences to the server.
You may want to give some thought to the problem of how the client knows what representations are possible. On the web, the specification for HTML describes a number of different kinds of links -- browsers can state different preferences when they encounter a script tag or an image tag, for example. You'll want something similar in your own media types.
There is nothing wrong with deciding that these should all be different resources, too. Either approach can be implemented in a way that is consistent with the REST architectural style.
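To make the "one resource, several representations" idea concrete, here is a minimal sketch in Python. The task record, its field names and the render_task helper are assumptions made up for illustration, and the Accept handling is deliberately simplified (it only looks for "text/html" and ignores q-values and wildcards).

```python
import json
from html import escape

# Hypothetical task record; the field names are assumptions for illustration.
TASK = {"id": 7, "status": "Done", "data": "<h1>scraped</h1>"}


def render_task(task, accept):
    """Pick a representation of one task from a simplified Accept value."""
    # The data element is optional: it is only included once the task is done.
    body = {"id": task["id"], "status": task["status"]}
    if task["status"] == "Done":
        body["data"] = task["data"]

    if "text/html" in accept:
        rows = "".join(
            f"<dt>{escape(str(key))}</dt><dd>{escape(str(value))}</dd>"
            for key, value in body.items()
        )
        return "text/html", f"<dl>{rows}</dl>"
    return "application/json", json.dumps(body)
```

A call such as render_task(TASK, "text/html") and a call with "application/json" both describe the same /tasks/{id} resource; only the representation differs.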
I don't really know the constraints but GET /tasks/{id} could return both status and data if available.
If you prefer not to (for example, if getting data too often would be a problem from a performance perspective), it seems sensible to have:
GET /tasks/{id} #returns status and other plain task fields
and then:
GET /tasks/{id}/scrappeddata #returns data
Why? Because, that way is probably the most consistent with your model (and/or the mental model in the mind of your API users).
The general rules on resource naming given in Rest API tutorial are helpful: https://www.restapitutorial.com/lessons/restfulresourcenaming.html
There are no hard rules when it comes to naming routes for a RESTful API.
You can adhere to a convention, know best practices, advice from SO, but at the end of the day, you're the one designing your API, so you know better than anyone else what would fit your particular use case.
Search for "rest api naming best practices" or "how to structure rest api routes" and you'll get plenty of ideas.
The two suggestions that @jonrsharpe and I made are both valid; it's up to you to decide what makes sense for your project.

Is there a standard way to define CURIEs in HTTP header fields?

Recently, I've been designing a RESTful API, and I want to use the Link header field to implement HATEOAS.
This is all fine and works without any real problem, but I want to make it easier for the clients of the API.
The Link header for example might look like this:
Link: <https://api.domain.com/orders/{id}>; rel="https://docs.domain.com/order"
In this case the client would have to find the link by searching for the https://docs.domain.com/order value in the rel attribute.
This works, but since the URI for the docs could change it's fragile, and because of the length it makes it a bit impractical as a key to search for.
So I want to try and use a CURIE to make it something like this instead:
Link: <https://api.domain.com/orders/{id}>; rel="[rc:order]"
That way the problem of a URI changing is mitigated for the most part, and it's much more compact allowing for easier search by the client.
My problem is that, since I'm using a Link header field to implement HATEOAS, I feel it would be most consistent to also define the CURIE in an HTTP header field, rather than introducing metadata in the response body.
Since the entire API uses standard HTTP header fields and status codes for all of its metadata (pagination, versioning etc.), I would rather not introduce metadata into the response body just to define a CURIE.
But if I use HTTP header fields, which field should I use for a CURIE?
Is there even a standard way to do this with HTTP header fields?
If not, should I just use a custom header field, like X-Curie: <https://docs.domain.com>; name="rc", and just document it somewhere for clients?
I've looked around and most resources are either in reference to XML or the HAL standard, so any information on this in relation to HTTP header fields would be appreciated.
No, you can't do that. The link relation is a string - nothing more.
The question that you should ask yourself is why you are using an unstable link relation name in the first place...
Even if you don't use the Link header, CURIEs would not solve the problem you present, because the CURIE standard states that a shortened URI must be "unwrapped" before any comparison is performed. This would also apply to comparison against the link relation in question.
A more pragmatic approach would be to coin your own URI like foo:order. Then you can use some custom URL shortening method to resolve the documentation URL for the relation in question. This method is used by hypermedia formats like HAL+JSON (the HAL format's use of CURIEs is actually misleading; it should only be used as a method for resolving URLs to documentation).
CURIEs in HTTP Link header's rel properties would not get expanded, because all rel properties are equated with simple string matches, none are treated as URIs.
My main concern would be the phrase "since the URI for the docs could change it's fragile" — then choose a URI which won't change. Even if you use a URL that redirects to some location deep in the docs, the URI you choose for the link relation needs to be permanent if you want client devs to be able to resolve it.
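As a rough illustration of why the rel value behaves like an opaque key, here is a small Python sketch of a client looking up a link in a Link header. The find_link helper and its regular expressions are assumptions written for this example; they handle only simple headers, not the full grammar. Relation types are compared as plain strings (case-insensitively here), so a CURIE-looking value only matches if the server sent exactly that text.

```python
import re

# Matches one link-value of the form <target>; param="value"; ...
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def find_link(header_value, rel):
    """Return the target URI whose rel list contains `rel`, or None."""
    wanted = rel.lower()
    for target, params in _LINK_RE.findall(header_value):
        for name, quoted, bare in _PARAM_RE.findall(params):
            value = quoted or bare
            # rel may hold several space-separated relation types; each one is
            # compared as a string, with no CURIE expansion of any kind.
            if name.lower() == "rel" and wanted in value.lower().split():
                return target
    return None
```

With a header carrying rel="https://docs.domain.com/order", looking up "[rc:order]" returns None, which is the point made in the answers above.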

Hypermedia Restful API using Link Header and Range

I'm trying to develop a RESTful API using the hypertext as the engine of application state principle.
I've settled on using the Link header (RFC5988) for all 'state transitions'; it seems natural to place links there, and it doesn't tie my responses to a specific format (e.g. XML, JSON, etc. all just work).
What I'm struggling with at the moment is the pagination of a collection of resources. In the past I've used the Range header to control this, so the client can send "Range: MyObjects=0-20" and get the first 20 back. It seems natural to want to include a "next" relation to indicate the next 20 items (but maybe it isn't), but I'm unsure how to do it.
Many examples include the range information as part of the URI. eg it would be
Link: <http://test.com/myitems?start=20&end=40>;rel="next"
Using the range header would I do something like the following?
Link: <http://test.com/myitems>;rel="next";range="myitems=20-40"
The concern here is that the link feels non-standard. A client would have to be checking the range header to get the next items.
The other thing is, would I just leave this all as somewhat out-of-band information? Never showing next/previous ranges (as that sort of depends on what the client is doing), and expecting the client to just pull down what it needs when it needs it? I can use the "accepts-ranges" link hints in the initial link to the collection to let the client know it's 'pageable'.
eg something like
OPTIONS http://test.com
-> Link: <http://test.com/myitems>;rel="http://test.com/rels/businessconcept";accepts-ranges="myitem"
Then the client would go, oh it accepts ranges, I will only pull down the first X, and then go for the next range as necessary (this sort of feels like implicit states though).
I can't seem to figure out what is really in the best spirit of HATEOAS here.
The Link header spec doesn't do exactly what you want, AFAIK. It has no concept of identifying client-supplied parameters. The docs for your link's rel can and should specify how to apply your range implementation.
The link-template header spec https://datatracker.ietf.org/doc/html/draft-nottingham-link-template-00#page-3 does a lot of what you want, but that draft expired. It specified how to use https://www.rfc-editor.org/rfc/rfc6570 style templates in headers, and if you want to use headers, that's the direction I'd suggest.
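To give a feel for what a client would do with such a template, here is a minimal Python sketch that expands only the form-style query operator of RFC 6570 (the "{?start,end}" shape). The expand_query_template name, the template string and the start/end variable names are assumptions for this example; a real client would more likely use a complete URI template library.

```python
import re
from urllib.parse import quote


def expand_query_template(template, variables):
    """Expand only "{?a,b}" expressions; a small subset of RFC 6570."""
    def replace(match):
        names = match.group(1).split(",")
        # Undefined variables are skipped, as the RFC requires.
        pairs = [
            f"{name}={quote(str(variables[name]), safe='')}"
            for name in names
            if name in variables
        ]
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([^}]+)\}", replace, template)


# The client fills in the range it wants instead of the server guessing it.
print(expand_query_template("http://test.com/myitems{?start,end}",
                            {"start": 20, "end": 40}))
```

The server only has to publish the template once; which range to ask for stays a client decision.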
Personally, although I agree that having the links in the header abstracts you from the body's content type, I find the Link header difficult to develop against, as you can't just paste a URL into a browser and see the links immediately.

REST API - how does the client know what a valid payload is to POST to the resource?

One of the goals of the REST API architecture is decoupling of the client and the server.
One of the questions I have run across in planning a REST API is: "how does the client know what is a valid payload for POST methods?"
Somehow the API needs to communicate to the UI what a valid payload is for a given resource's POST method. Otherwise we are back to depending on out-of-band knowledge to work with the API, and we are tightly coupled again.
So I’ve had this idea that the API response for a GET on a resource would provide a specification for constructing a valid payload for the POST method on that resource. This would include field names, data type, max length, etc.
This guy has a similar idea.
What's the correct way to handle this? Are most people just relying on out-of-band information? What are people doing in the real world with this problem?
EDIT
Something I have come up with to solve this problem is illustrated in the following sequence diagram:
The client and the api service are separate. The client knows:
Entry point
How to navigate the API via the hypermedia.
Here's what happens:
Someone (user) requests the registration page from the client
The client requests the entry point from the API and receives all hypermedia links with appropriate meta data on how to traverse them legally.
Client constructs the registration form based on the meta data associated with the registration hypermedia POST method.
User fills in the form and submits.
Client POSTs to the API with the correct data and all is well.
No magic /meta resources, no need to use a method for the metadata. Everything is provided by the API.
Thoughts?
Most people are relying on out-of-band information. This is usually ok, though, because most clients aren't being built dynamically, but statically. They rely on known parts of the API rather than being HATEOAS-driven.
If you are developing or want to support a metadata-driven client, then yes, you're going to need to come up with a schema for providing that information. The implementation you linked to seems reasonable after a quick skim. Note that you've only moved the problem, though. Clients still need to know how to interpret the information in the metadata responses.
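As a sketch of what "interpreting the metadata" could mean on the client side, here is a small Python example. The FORM_SPEC structure and its keys ("name", "type", "max_length", "required") are invented for illustration and are not part of any standard; the client still has to know this vocabulary in advance, which is exactly the point about the problem having moved.

```python
# Hypothetical form description a GET might return; the keys used here are
# assumptions for illustration, not part of any standard.
FORM_SPEC = [
    {"name": "email", "type": "string", "max_length": 254, "required": True},
    {"name": "age", "type": "integer", "required": False},
]

_TYPES = {"string": str, "integer": int}


def validate(payload, spec):
    """Return a list of problems; an empty list means the payload fits."""
    errors = []
    for field in spec:
        name = field["name"]
        if name not in payload:
            if field.get("required"):
                errors.append(f"{name}: missing")
            continue
        value = payload[name]
        max_len = field.get("max_length")
        if not isinstance(value, _TYPES[field["type"]]):
            errors.append(f"{name}: expected {field['type']}")
        elif max_len is not None and isinstance(value, str) and len(value) > max_len:
            errors.append(f"{name}: longer than {max_len}")
    return errors
```

A client could run validate(form_input, spec) before POSTing, and build its form fields from the same list.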
You are right, the client should understand the semantics of the links in the response, and choose the right one from them to achieve its goal. The client is coupled to the semantics the API provides about this and not to the API itself. So, for example, a client should not retrieve information from the URI structure, since that is tightly coupled to the actual API.
I know of two current types of solution for this:
with HAL+JSON you use IANA link relations to describe what the link does, and vendor-specific MIME types to describe the schema of the fields
with JSON-LD (or any other RDF format) and the Hydra vocab you send back RDF metadata according to the operation the link calls. This metadata can contain the validation details of the fields (xsd vocab) and the semantics of the fields (microdata, microformats, etc.). This information is completely decoupled from the API implementation, so it might be a better option than using vendor-specific MIME types, but Hydra is still under development and HAL is much simpler.
Your solution is valid as well, but I think you should check both of these, since they are already standard solutions, and the uniform interface / self-descriptive message constraint of REST encourages the use of existing standards instead of custom solutions. But it is up to you if you want to create your own standard.
I think you are asking about REST API metadata handling. Unlike SOAP, REST APIs don't normally use metadata, but it can be pretty useful once your API gets bigger.
I think you should look into Swagger. It is the most elegant option I have found for REST APIs. I have been using it for some time, and with the annotation support it has been rather easy to work with. There are also many examples on GitHub. Another advantage is that it comes with a nice configurable UI.
Apart from that, there are other ways of doing it, like WADL and WSDL 2.0. Even though I haven't used them, you can read more about them here.
With RFC 6861, you can link to your form with create-form and edit-form Link Relations, instead of the client constructing the form by itself. The corresponding form should have the necessary schema to construct the POST request.

In REST, how should a GET request to a findAll operation be handled when the Resources are paged?

In a RESTful Service, Resources that cannot all be retrieved at once are paginated. For example:
GET /foo?page=1
The question is, how should I handle a getAll request such as:
GET /foo
Taking discoverability/HATEOAS into consideration, I see a few options:
return a 405 Method Not Allowed and include a Link header to the first page:
Link: <http://localhost:8080/rest/foo?page=0>; rel="first"
return a 400 Bad Request and include the Link header (same as above)
return a 303 See Other to the first paginated page
return a 200 OK but actually return only the first page (and include the URI of the next page into the Link):
Link: <http://localhost:8080/rest/foo?page=1>; rel="next"
note: I would rather not do this, having learned not to manage anything for the client by default, if they haven't explicitly asked for it.
These are of course only a few options. I'm leaning towards the first, but I'm not sure if there is a best practice on this that I am not aware of.
Any feedback is appreciated.
Thanks.
Let's start with the fact that REST is not a set-in-stone protocol like SOAP; it's simply a means of structuring a service, similar to how languages are described as being object-oriented.
That all being said, I'd recommend handling this as follows.
Treat a RESTful call like a function declaration.
GET /foo
foo()
Some functions require parameters.
GET /foo?start=??&count=??
foo(start, count)
Some languages support default parameters, others don't; you get to decide for yourself how you want to handle parameters.
With default parameters, you could assume that the function was defined as
foo(start = 0, count = 10)
so that a call to GET /foo would actually be equivalent to GET /foo?start=0&count=10, whereas a call to GET /foo?start=100 would be equivalent to GET /foo?start=100&count=10.
If you don't want default parameters, you could force the user of the API to explicitly set start and count:
foo(start, count)
so that a call to GET /foo would return a 400 Bad Request status code, but a call to GET /foo?start=0&count=10 would return a 200 OK status code along with the content contained by the specified range.
In either case you'll have to decide how you'll handle errors, such as
GET /foo?start=-10&count=99999999
If parameters have maximums and minimums, you'll need to decide whether to normalize the parameters, or simply return errors. The previous example might return a 400 Bad Request status code, but it could also be constrained to turn into:
GET /foo?start=0&count=1000
In the end it's up to you to decide what makes the most sense in the context of your application.
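The choices above (defaults or not, normalize or reject) can be sketched as one small Python function. The paging_params name, the use_defaults and clamp flags, and the constant names are assumptions for illustration; the default of start = 0, count = 10 and the maximum of 1000 are taken from the examples in this answer.

```python
from urllib.parse import parse_qs

DEFAULT_START, DEFAULT_COUNT, MAX_COUNT = 0, 10, 1000


def paging_params(query, use_defaults=True, clamp=True):
    """Return (status_code, start, count) for a raw query string."""
    params = parse_qs(query)
    try:
        start = int(params["start"][0]) if "start" in params else None
        count = int(params["count"][0]) if "count" in params else None
    except ValueError:
        return 400, None, None

    if start is None or count is None:
        if not use_defaults:
            # foo(start, count): both parameters are mandatory.
            return 400, None, None
        # foo(start = 0, count = 10): fill in whatever is missing.
        start = DEFAULT_START if start is None else start
        count = DEFAULT_COUNT if count is None else count

    if clamp:
        # Normalize out-of-range values instead of rejecting them.
        start = max(start, 0)
        count = min(max(count, 1), MAX_COUNT)
    elif start < 0 or not 1 <= count <= MAX_COUNT:
        return 400, None, None
    return 200, start, count
```

For example, paging_params("start=100") fills in the default count, while paging_params("", use_defaults=False) yields a 400.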
From a RESTful point of view, I think it is perfectly all right to handle both representations the same. Consider software with several versions you want to download, the latest one being 3.8. So if you want to get the latest version, you could address it with both GET /software/version/latest.zip and GET /software/version/3.8.zip until a newer version comes out. So two different links point to the same resource.
I like to imagine pagination pretty much the same. On the first page there are always the latest articles. So if no page-parameter is provided, you could simply imply it's 1.
The approach with the rel attribute goes in a slightly different direction. It's a creation of Google to better handle the problem with duplicate content and is primarily considered to be used in order to distinguish between a "main" page and pagination-pages. Here's how to use it:
//first page:
<link rel="next" href="http://www.foo.com/foo?page=2" />
//second page:
<link rel="prev" href="http://www.foo.com/foo?page=1" />
<link rel="next" href="http://www.foo.com/foo?page=3" />
//third and last page:
<link rel="prev" href="http://www.foo.com/foo?page=2" />
So from a SEO point of view it's a good idea (and recommended by Google) to use those elements. They also go perfectly with the resource-orientated idea of REST and the hypermedia representation of the resources.
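If you generate these elements on the server, the logic is short. Here is a minimal Python sketch; the pagination_links name and its parameters are assumptions for illustration, and pages are assumed to be numbered from 1 to last_page as in the example above.

```python
from html import escape


def pagination_links(base_url, page, last_page):
    """Build <link> elements for one page of a series numbered 1..last_page."""
    neighbours = []
    if page > 1:
        neighbours.append(("prev", page - 1))
    if page < last_page:
        neighbours.append(("next", page + 1))

    elements = []
    for rel, number in neighbours:
        # Escape the URL so it is safe inside the href attribute.
        href = escape(f"{base_url}?page={number}")
        elements.append(f'<link rel="{rel}" href="{href}" />')
    return elements
```

The first page gets only a "next" link, the last page only a "prev" link, and every page in between gets both.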
Choosing one of your suggestions, I think the 303 See Other is the right way to go. It was intended for this kind of purpose and is a good way to canonicalize your resources. You can make them available through many URIs, but have one "real" URI for a representation (like the software with different versions).
According to the specification, the response should look something like this:
303 See Other
Location: http://www.foo.com/foo?page=1
http://www.foo.com/foo?page=1
So you provide a Location-header with the "real" representation, and the body should contain a hypertext document linking to the new URI. Note that according to the specification the client is expected to send a GET request to the value of Location, but it doesn't have to.
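A framework-independent sketch of such a response in Python could look like this. The redirect_to_first_page name and the (status, headers, body) return shape are assumptions for illustration, not a particular framework's API.

```python
from html import escape


def redirect_to_first_page(base_url):
    """Return (status_line, headers, body) for a bare collection request."""
    location = f"{base_url}?page=1"
    # Short hypertext body for clients that do not follow the redirect.
    body = f'<a href="{escape(location)}">{escape(location)}</a>'
    headers = {"Location": location, "Content-Type": "text/html"}
    return "303 See Other", headers, body
```

A handler for GET /foo without a page parameter would return this triple, and a client following the Location value ends up on /foo?page=1.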
//EDIT as answer to your comment (yep, it's really bad practice to claim something without proving it :-) - my bad!):
Google presented the rel="next" and rel="prev" attributes in September 2011 on the Official Webmaster Central Blog. They can be used additionally to (or in some cases instead of) the rel="canonical" tag.
Under those links you can find the differences between them explained:
rel="next" and rel="prev" link elements are "to indicate the relationship between component URLs in a paginated series"
the rel="canonical" "allows you to publicly specify your preferred version of a URL"
So there is a slight difference between them. You can break your problem down to a canonical issue: there are several URLs pointing to the same resource (/foo and /foo?page=1), but you have a preferred version of the URL (/foo?page=1). So now there are a few options for a RESTful approach:
If there is no page-parameter given in the query, use a default value (e.g. 1) when processing it. I think in this specific case it is OK to use a default value even though you point it out as bad practice.
Respond with 303 See Other providing the preferred URL in the Location-header (as described above). I think a 3xx-response is the best (and most likely RESTfully intended) way to deal with duplicate/canonical content.
Respond with 400 Bad Request in case you want to force the client to provide a page-parameter (as explained by zzzzBov in his answer). Note that this response does not have something like a Location header (as assumed in your question), so the explanation why the request failed and/or the correct URL (if given) must go to the entity-body of the response. Also, note that according to the specification this response is commonly used when the client submits a bad/malformed representation (! not URL !) along with a PUT or POST request. So keep in mind that this also might be a little ambiguous for the client.
Personally, I don't think your suggestion to respond with 405 Method Not Allowed is a good idea. According to the specification, you must provide an Allow-header listing the allowed methods. But what methods could be allowed on this resource? I can only think of POST. But if you do not want the client to POST to it either, you could also respond with 403 Forbidden with an explanation why it is forbidden, or 404 Not Found if you do not want to tell why it is forbidden. So it might be a little ambiguous, too (in my opinion).
Using link-elements with the mentioned rel-attributes as you propose in your question is not essentially 'RESTful' because it's only hypermedia which is settled in the representation of the resource. But your problem (as far as I understand it) is that you want to decide how to respond to a specific request and which representation to serve. But still it's not absolutely pointless:
You can consider the whole SEO issue as a side effect of using rel="next/prev/canonical", but keep in mind that they also create connectedness (as the quality of having links) which is one of the characteristics of REST (see Roy Fielding's dissertation).
If you want to dive into RESTful Web Services (which is totally worth it) I recommend reading the book RESTful Web Services by Leonard Richardson and Sam Ruby.
In some cases, not implicitly managing anything for the client can lead to an overly complex interface; examples would be where the consumer isn't technical or doesn't intend to build on top of the interface, for example in a web page. In such cases even a 200 may be appropriate.
In other cases I would agree that implicit management is a bad idea, for example where a consumer wants to be able to predict the response correctly and where a simple specification may be required. In such cases 405, 400 or 303 may be appropriate.
It's a matter of context.