Testing HATEOAS URLs - rest

I'm developing a service that has a RESTful API. The API is JSON-based and uses HAL for HATEOAS links between resources.
The implementation shouldn't matter to the question, but I'm using Java and Spring MVC.
Some example requests:
GET /api/projects
{
"_links" : {
"self" : {
"href" : "example.org/api/projects"
},
"projects" : [ {
"href" : "example.org/api/projects/1234",
"title" : "The Project Name"
}, {
"href" : "example.org/api/projects/1235",
"title" : "The Second Project"
} ]
},
"totalProjects" : 2
}
GET /api/projects/1234
{
"_links" : {
"self" : {
"href" : "example.org/api/projects/1234"
},
"tasks" : [ {
"href" : "example.org/api/projects/1234/tasks/543",
"title" : "First Task"
}, {
"href" : "example.org/api/projects/1234/tasks/544",
"title" : "Second Task"
} ]
},
"id" : 1234,
"name" : "The Project Name",
"progress" : 60,
"status" : "ontime",
"targetDate" : "2014-06-01"
}
Now, how should I test GET requests to a single project? I have two options and I'm not sure which one is better:
Testing for /api/projects/{projectId} in the tests, replacing {projectId} with the id of the project the mock service layer expects/returns.
Requesting /api/projects/ first then testing the links returned in the response. So the test will not have /api/projects/{projectId} hardcoded.
The first option makes the tests much simpler, but it basically hardcodes the URLs, which is the thing HATEOAS was designed to avoid in the first place. The tests will also need to change if I ever change the URL structure for one reason or another.
The second option is more "correct" in the HATEOAS sense, but the tests will be much more convoluted; I need to traverse all parent resources to test a child resource. For example, to test GET requests to a task, I need to request /api/projects/, get the link to /api/projects/1234, request that and get the link to /api/projects/1234/tasks/543, and finally test that! I'll also need to mock a lot more in each test if I test this way.
The advantage of the second option is that I can freely change the URLs without changing the tests.
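A minimal sketch of the second option, assuming an in-memory stand-in for the API. FAKE_API, get and follow are hypothetical names for illustration, not part of Spring or of any HAL library. The test starts at the entry point and follows links by relation name and title, so only the entry URL is hardcoded.

```python
# Hypothetical in-memory stand-in for the API: maps a URL to its HAL document.
FAKE_API = {
    "example.org/api/projects": {
        "_links": {
            "self": {"href": "example.org/api/projects"},
            "projects": [
                {"href": "example.org/api/projects/1234", "title": "The Project Name"},
                {"href": "example.org/api/projects/1235", "title": "The Second Project"},
            ],
        },
        "totalProjects": 2,
    },
    "example.org/api/projects/1234": {
        "_links": {
            "self": {"href": "example.org/api/projects/1234"},
            "tasks": [
                {"href": "example.org/api/projects/1234/tasks/543", "title": "First Task"},
            ],
        },
        "id": 1234,
        "name": "The Project Name",
    },
    "example.org/api/projects/1234/tasks/543": {
        "_links": {"self": {"href": "example.org/api/projects/1234/tasks/543"}},
        "title": "First Task",
    },
}


def get(url):
    """Stand-in for an HTTP GET; a real test would call the API here."""
    return FAKE_API[url]


def follow(entry_url, steps):
    """Follow (rel, title) steps from the entry point and return the final document."""
    doc = get(entry_url)
    for rel, title in steps:
        links = doc["_links"][rel]
        if isinstance(links, dict):  # a relation may hold one link object or a list
            links = [links]
        href = next(link["href"] for link in links if link.get("title") == title)
        doc = get(href)
    return doc


task = follow(
    "example.org/api/projects",
    [("projects", "The Project Name"), ("tasks", "First Task")],
)
```

With a helper like this, the traversal cost is paid once in the helper rather than in every test, although each test still needs the parent resources to be mocked.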

If your goal is testing a Hypermedia API, then your testing tools need to understand how to process and act on the hypermedia contained in a resource.
And yes, the challenge is how deep you decide to traverse the link hierarchy. Also, you need to account for non-GET methods.
If these are automated tests a strategy would be to organize the tests in resource units. Only test the links returned in the resource under test: a module for projects, and others for project, tasks, task, and so on. This does require some hard-coding of well-known URLs for each module, but allows you to manage the tests more easily around your resource model.
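One way to picture the "resource unit" idea is a check that looks only at the links of the resource under test and does not traverse any further. This is a sketch under assumptions: iter_links, unresolved_links and the can_resolve callback are hypothetical, and in a real suite can_resolve would issue a request and check the status code.

```python
def iter_links(doc):
    """Yield (rel, href) for every link in a HAL document's _links section."""
    for rel, value in doc.get("_links", {}).items():
        links = value if isinstance(value, list) else [value]
        for link in links:
            yield rel, link["href"]


def unresolved_links(doc, can_resolve):
    """Return the links of this one resource that do not resolve."""
    return [(rel, href) for rel, href in iter_links(doc) if not can_resolve(href)]


# The resource under test, fetched from a well-known URL for this test module.
PROJECT = {
    "_links": {
        "self": {"href": "example.org/api/projects/1234"},
        "tasks": [
            {"href": "example.org/api/projects/1234/tasks/543", "title": "First Task"},
            {"href": "example.org/api/projects/1234/tasks/544", "title": "Second Task"},
        ],
    },
    "id": 1234,
}

# Stand-in for "this URL answers with a 2xx status".
KNOWN_URLS = {
    "example.org/api/projects/1234",
    "example.org/api/projects/1234/tasks/543",
    "example.org/api/projects/1234/tasks/544",
}

problems = unresolved_links(PROJECT, KNOWN_URLS.__contains__)
```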

I'm not familiar with HATEOAS, but here is a suggestion.
You could try swat, a Perl- and curl-based DSL for automating tests of web and REST services. Swat was designed to simplify the URL "juggling" you are probably talking about here. A quick reference for how this could be done with swat (a straightforward way, though there are more elegant solutions):
$ mkdir -p api/project/project_id
$ echo '200 OK' > api/project/project_id/get.txt
$ nano api/project/project_id/hook.pm
modify_resource(sub{
my $r = shift; # this is the original route, api/project/project_id/
my $pid = $ENV{project_id};
$r =~ s{/project_id}{/$pid}; # dynamically set the route to api/project/{project_id}
return $r;
});
$ project_id=12345 swat http://your-rest-api # run swat test suite!
More complicated examples can be found in the documentation.
(*) Disclosure - I am the tool author.

If you use Spring HATEOAS you can use ControllerLinkBuilder (http://docs.spring.io/autorepo/docs/spring-hateoas/0.19.0.RELEASE/api/org/springframework/hateoas/mvc/ControllerLinkBuilder.html) for link creation in your tests as described in http://docs.spring.io/spring-hateoas/docs/0.19.0.RELEASE/reference/html/#fundamentals.obtaining-links. With ControllerLinkBuilder, there are no hard-coded URLs in the tests.
ControllerLinkBuilderUnitTest.java (https://github.com/spring-projects/spring-hateoas/blob/4e1e5ed934953aabcf5490d96d7ac43c88bc1d60/src/test/java/org/springframework/hateoas/mvc/ControllerLinkBuilderUnitTest.java) shows how to use ControllerLinkBuilder in tests.

Related

When creating a REST resource, must the representation of that resource be used?

As an example, imagine a dynamic pricing system that you can ask for offers for moving boxes from one place to another. Since the price does not exist before you ask for it, it's not as simple as retrieving data from a database; an actual search process needs to run.
What I often see in such scenarios is a request/response-based endpoint:
POST /api/offers
{
"customerId" : "123",
"origin" : {"city" : "Amsterdam"},
"destination" : {"city" : "New York"},
"boxes": [{"weight" : 100},{"weight": "200"}]
}
201:
{
"id" : "offerId_123",
"product" : {
"id" : "product_abc",
"name": "box-moving"
},
"totalPrice" : 123.43
}
The request has nothing to do with the response except that one is required to find all information for the other.
The way I interpret "manipulation of resources through representations", this also applies to creation. Following that, I would say that one should create the search process instead:
POST /api/offer-searches
{
"request" : {
"customerId" : "123",
"origin" : {"city" : "Amsterdam"},
"destination" : {"city" : "New York"},
"boxes": [{"weight" : 100},{"weight": "200"}]
}
}
201:
{
"id" : "offerSearch_123",
"request" : {
"customerId" : "123",
"origin" : {"city" : "Amsterdam"},
"destination" : {"city" : "New York"},
"boxes": [{"weight" : 100},{"weight": "200"}]
},
"offers": [ {
"id" : "offerId_123",
"product" : {
"id" : "product_abc",
"name": "box-moving"
},
"totalPrice" : 123.43
} ]
}
Here the request and the response are the same object. During the process it is enhanced with results, but both are still representations of the same thing: the search process.
This has the advantage of making the process "trackable": because it is identified, it can be read again later. You could still have /api/offers/offerId_123 return the created offer, so that clients don't have to go through the clutter of the search resource. But it also has quite a trade-off: complexity.
Now my question is, is this first, more RPC like approach something we can even call REST? Or to comply to REST constraints should the 2nd approach be used?
Now my question is, is this first, more RPC like approach something we can even call REST? Or to comply to REST constraints should the 2nd approach be used?
How does the approach compare to how we do things on the web?
For the most part, sending information to a server is realized using HTML forms. So we are dealing with a lot of requests that look something like
POST /efc913bf-ac21-4bf4-8080-467ca8e3e656
Content-Type: application/x-www-form-urlencoded
a=b&c=d
and the responses then look like
201 Created
Location: /a2596478-624f-4775-a490-09edb551a929
Content-Location: /a2596478-624f-4775-a490-09edb551a929
Content-Type: text/html
<html>....</html>
In other words, it's perfectly normal that the representations of the resource are (a) not the information that was sent to the server, but instead something the server computed from the information it was sent, and (b) not necessarily of the same schema as the payload of the request... not necessarily even the same media type.
In an anemic document store, you are more likely to be using PUT or PATCH. PATCH requests normally have a patch document in the request-body, so you would expect the representations to be different (think application/json-patch+json). But even in the case of PUT, the server is permitted to make changes to the representation when creating its resource (to make it consistent with its own constraints).
And of course, when you are dealing with responses that contain a representation of "the action", or representations of errors, then once again the response may be quite dissimilar from the request.
TL;DR REST doesn't care if the representation of a bid "matches" the representation of the RFP.
You might decide it's a good idea anyway - but it isn't necessary to satisfy REST's constraints, or the semantics of HTTP.

API call in Response and another response types

How can I call an API URL (a Fulfillment or Webhook, as API.AI names it) from a Watson Conversation API response?
I don't need to enter the full list of expected responses from the Response section.
I need to call an API with the understood JSON object, handle the response from the backend (for example, a DB fetch), and return the expected JSON to the user (requester).
Any advice?
I'm not sure I understand your question, but you cannot program the Conversation service tooling to call out to other services as part of a response or message.
These are the types of actions that the middleware or service layer of your application should handle. It is not recommended, but you could also program the client part of your application to make the additional API calls.
You will find, both on this forum and in the IBM docs, examples of what are called "action" JSON elements, which can be added to your conversation response payload. Along with the response output text (or in place of it), you add an "action" JSON element next to the output and context JSON objects; it carries instructions for the middleware or client part of your application, e.g.
"output" : { "Text" : "Hi there" },
"action" : { "api_url" : "http://bluemixservice.ibm.com", "Task" : "insert", "data" : "User asked for an alarm at 5pm" },
"context" : { "conversation_id" : "asdada" }
Hope this helps.
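As a rough sketch of the middleware side, assuming the payload shape from the snippet above: the field names "Text", "Task" and "data" come from that example and are not an official schema, and handle_insert, HANDLERS and process_response are hypothetical names. The middleware reads the "action" element and dispatches on the task name before returning text to the user.

```python
def handle_insert(data):
    """Hypothetical backend call; a real handler would write to a database."""
    return {"stored": data}


# Maps a task name from the "action" element to the function that performs it.
HANDLERS = {"insert": handle_insert}


def process_response(payload, handlers=HANDLERS):
    """Return the text for the user plus the result of any requested action."""
    text = payload.get("output", {}).get("Text", "")
    action = payload.get("action")
    result = None
    if action is not None:
        task = action.get("Task")
        handler = handlers.get(task)
        if handler is None:
            raise ValueError(f"no handler registered for task {task!r}")
        result = handler(action.get("data"))
    return text, result


payload = {
    "output": {"Text": "Hi there"},
    "action": {
        "api_url": "http://bluemixservice.ibm.com",
        "Task": "insert",
        "data": "User asked for an alarm at 5pm",
    },
    "context": {"conversation_id": "asdada"},
}
text, result = process_response(payload)
```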

RESTful API and real life example

We have a web application (AngularJS and Web API) which has quite a simple functionality - displays a list of jobs and allows users to select and cancel selected jobs.
We are trying to follow RESTful approach with our API, but that's where it gets confusing.
Getting jobs is easy - simple GET: /jobs
How shall we cancel the selected jobs? Bearing in mind that this is the only operation on jobs we need to implement. The easiest and most logical approach (to me) is to send the list of selected jobs IDs to the API (server) and do necessary procedures. But that's not RESTful way.
If we are to do it following the RESTful approach, it seems that we need to send a PATCH request to jobs, with JSON similar to this:
PATCH: /jobs
[
{
"op": "replace",
"path": "/jobs/123",
"status": "cancelled"
},
{
"op": "replace",
"path": "/jobs/321",
"status": "cancelled"
}
]
That will require generating this JSON on the client, then mapping it to some model on the server, parsing the "path" property to get the job ID, and then doing the actual cancellation. This seems very convoluted and artificial to me.
What is the general advice on this kind of operation? I'm curious what people do in real life when a lot of operations can't be simply mapped to RESTful resource paradigm.
Thanks!
If by cancelling a job you mean deleting it then you could use the DELETE verb:
DELETE /jobs?ids=123,321,...
If by cancelling a job you mean setting some status field to cancelled then you could use the PATCH verb:
PATCH /jobs
Content-Type: application/json
[ { "id": 123, "status": "cancelled" }, { "id": 321, "status": "cancelled" } ]
POST for Business Process
POST is often an overlooked solution in this situation. Treating resources as nouns is a useful and common practice in REST, and as such, POST is often mapped to the "CREATE" operation from CRUD semantics - however the HTTP Spec for POST mandates no such thing:
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics. For example, POST is used for the following functions (among others):
Providing a block of data, such as the fields entered into an HTML form, to a data-handling process;
Posting a message to a bulletin board, newsgroup, mailing list, blog, or similar group of articles;
Creating a new resource that has yet to be identified by the origin server; and
Appending data to a resource's existing representation(s).
In your case, you could use:
POST /jobs/123/cancel
and consider it an example of the first option, providing a block of data to a data-handling process; it is analogous to HTML forms using POST to submit the form.
With this technique, you could return the job representation in the body and/or return a 303 See Other status code with the Location set to /jobs/123
Some people complain that this looks 'too RPC' - but there is nothing that is not RESTful about it if you read the spec - and personally I find it much clearer than trying to find an arbitrary mapping from CRUD operations to real business processes.
Ideally, if you are concerned with following the REST spec, the URI for the cancel operation should be provided to the client via a hypermedia link in your job representation. e.g. if you were using HAL, you'd have:
GET /jobs/123
{
"id": 123,
"name": "some job name",
"_links" : {
"cancel" : {
"href" : "/jobs/123/cancel"
},
"self" : {
"href" : "/jobs/123"
}
}
}
The client could then obtain the href of the "cancel" rel link, and POST to it to effect the cancellation.
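From the client's point of view this could look like the sketch below. It assumes the HAL document shown above; link_href is a hypothetical helper and no real HTTP call is made. The client looks up the "cancel" relation and builds the request from the href instead of constructing /jobs/123/cancel itself.

```python
import json

JOB_JSON = """
{
  "id": 123,
  "name": "some job name",
  "_links": {
    "cancel": {"href": "/jobs/123/cancel"},
    "self": {"href": "/jobs/123"}
  }
}
"""


def link_href(resource, rel):
    """Return the href for a link relation, or None if the server did not offer it."""
    link = resource.get("_links", {}).get(rel)
    return None if link is None else link["href"]


job = json.loads(JOB_JSON)
cancel_url = link_href(job, "cancel")

# The client only issues the POST when the link is present.
cancel_request = ("POST", cancel_url) if cancel_url is not None else None
```

One possible design choice, not stated in the answer, is for the server to omit the "cancel" link when a job can no longer be cancelled, so the client needs no separate rule for that.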
Treat Processes as Resources
Another option is, depending on if it makes sense in your domain, to make a 'cancellation' a noun and associate data with it, such as who cancelled it, when it was cancelled etc. - this is especially useful if a job may be cancelled, reopened and cancelled again, as the history of changes could be useful business data, or if the act of cancelling is an asynchronous process that requires tracking the state of the cancellation request over time. With this approach, you could use:
POST /jobs/123/cancellations
which would "create" a job cancellation - you could then have operations like:
GET /jobs/123/cancellations/1
to return the data associated with the cancellation, e.g.
{
"cancelledBy": "Joe Smith",
"requestedAt": "2016-09-01T12:43:22Z",
"status": "in process",
"completedAt": null
}
and:
GET /jobs/123/cancellations
to return a collection of cancellations that have been applied to the job and their current status.
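A small in-memory sketch of the three operations above, assuming the field names from the example representation. CancellationStore and its method names are hypothetical, and a real service would persist the records and process the cancellation asynchronously.

```python
from datetime import datetime, timezone
from itertools import count


class CancellationStore:
    """Keeps cancellation records per job, mirroring /jobs/{id}/cancellations."""

    def __init__(self):
        self._by_job = {}
        self._ids = count(1)

    def create(self, job_id, cancelled_by, requested_at=None):
        """Model POST /jobs/{job_id}/cancellations; returns (location, record)."""
        cancellation_id = next(self._ids)
        when = requested_at or datetime.now(timezone.utc)
        record = {
            "cancelledBy": cancelled_by,
            "requestedAt": when.isoformat(),
            "status": "in process",
            "completedAt": None,
        }
        self._by_job.setdefault(job_id, {})[cancellation_id] = record
        return f"/jobs/{job_id}/cancellations/{cancellation_id}", record

    def get(self, job_id, cancellation_id):
        """Model GET /jobs/{job_id}/cancellations/{cancellation_id}."""
        return self._by_job[job_id][cancellation_id]

    def list_for_job(self, job_id):
        """Model GET /jobs/{job_id}/cancellations."""
        return list(self._by_job.get(job_id, {}).values())


store = CancellationStore()
location, record = store.create(123, "Joe Smith")
```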
Example 1: Let's compare it with a real-world example. You go to a restaurant, sit at your table, and decide that you want ABC. Your waiter comes over and takes a note of what you want, and you tell him that you want ABC. So you are requesting ABC; the waiter takes the request to the kitchen and serves you the food as the response. In this case, the waiter is the interface between you and the kitchen. It's his responsibility to carry your request to the kitchen, make sure it gets done, and bring the result back to you once it is ready.
Example 2: Another example is travel booking systems. Take Kayak, a large online site for booking tickets. You enter your destination, select dates, and click search; what you get back are results from different airlines. How is Kayak communicating with all these airlines? The airlines must be exposing some level of information to Kayak. All of that communication happens through APIs.
Example 3: Now open Uber. Once the site has loaded, it lets you log in or continue with Facebook or Google. In this case, Google and Facebook are also exposing some level of user information. An agreement between Uber and Google/Facebook is already in place; that's why the site lets you sign up with Google or Facebook.
PUT /jobs{/ids}/status "cancelled"
so for example
PUT /jobs/123,321/status "cancelled"
if you want to cancel multiple jobs. Be aware that the job ID must not contain the comma character.
https://www.rfc-editor.org/rfc/rfc6570#page-25
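To illustrate how the {/ids} part expands, here is a minimal sketch that handles only this one case (a non-exploded list with the path-segment operator). It is not a general RFC 6570 implementation, and expand_path_list is a hypothetical helper.

```python
from urllib.parse import quote


def expand_path_list(prefix, values, suffix=""):
    """Expand prefix{/var}suffix for a list value: '/' plus comma-joined encoded items."""
    if not values:  # an empty list contributes nothing to the URI
        return f"{prefix}{suffix}"
    # safe="" percent-encodes everything except unreserved characters.
    encoded = ",".join(quote(str(value), safe="") for value in values)
    return f"{prefix}/{encoded}{suffix}"


url = expand_path_list("/jobs", [123, 321], "/status")
```

A comma inside an ID would be percent-encoded by this expansion (as %2C), so it is simpler to keep IDs comma-free, as noted above.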

Is HTTP 303 considered harmful for asynchronous operations?

While researching RESTful APIs for asynchronous operations I ran across the following design pattern:
POST uri:longOperation returns:
HTTP 202
Location: uri:pendingOperation
GET uri:pendingOperation returns:
If operation is running
Return a progress report.
If operation is complete
HTTP 303
Location: uri:operationResponse
GET uri:operationResponse
The response of the asynchronous operation
I find the last step questionable. Consider what happens if the asynchronous operation completes with an error code that doesn't make sense for HTTP GET, such as HTTP 409 ("Conflict").
Isn't HTTP 303 required to point to the response associated with uri:pendingOperation as opposed to uri:operationResponse?
Is using HTTP 303 in this way considered harmful? If not, why?
Is this the best we can do, or is there a better way?
Isn't HTTP 303 required to point to the response associated with uri:pendingOperation as opposed to uri:operationResponse?
The spec doesn't explicitly say it is required, but I tend to agree with you.
Is using HTTP 303 in this way considered harmful? If not, why?
I think you lose capabilities by doing a 303. While it is "nice" to auto-redirect when done, you lose the opportunity to provide metadata about the results, which can be leveraged for reporting etc. Also, many clients don't automatically follow a 303, so the client may need to do work to follow the 303 Location header anyway.
Is this the best we can do, or is there a better way?
I tend to recommend having GET uri:pendingOperation always return 200 with a status resource, which includes a reference to the output once the operation is complete. Something like:
When Incomplete
{
"status" : "PENDING"
}
When Error
{
"status" : "COMPLETE",
"errors" : [
{
"typeId" : "OPERATION_TIMEOUT",
"description" : "The request was unable to complete because the systems are unresponsive."
}
]
}
When Successful
{
"status" : "COMPLETE",
"links" : {
"result" : {
"href" : "http://api.example.com/finished-resource/1234"
}
}
}
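A client for this status resource could poll along the following lines. This is a sketch assuming the three representations shown above; wait_for_result and the fetch_status callback are hypothetical, and fetch_status stands in for GET uri:pendingOperation.

```python
import time


def wait_for_result(fetch_status, interval=0.0, max_attempts=10, sleep=time.sleep):
    """Poll the status resource until COMPLETE; return the result href or raise."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status["status"] == "COMPLETE":
            if status.get("errors"):
                # Completed with errors: report the first error's type.
                raise RuntimeError(status["errors"][0]["typeId"])
            return status["links"]["result"]["href"]
        sleep(interval)
    raise TimeoutError("operation still pending after polling")


# Canned responses standing in for successive GETs of the status resource.
responses = iter([
    {"status": "PENDING"},
    {"status": "PENDING"},
    {
        "status": "COMPLETE",
        "links": {"result": {"href": "http://api.example.com/finished-resource/1234"}},
    },
])
result_href = wait_for_result(lambda: next(responses))
```

Because every poll returns 200, the error from the operation travels in the body and never has to be mapped onto an HTTP status for the GET.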

REST design for API accessing multiple resources

Imagine an API that returns JSON data for a TV listings app like zap2it TV listings.
It's basically a list of TV channels and for each channel the shows that are on currently and beyond. Currently, I have an API that returns all the channels GET /channels. However, there is a need to add the show currently on for each channel in that data. I am thinking of adding a new API, GET /channels/on_now, to differentiate it from the current API. I want to be clear about this for the new API, I don't want to make individual call for each channel, the show-on-now data needs to be returned for all channels. Is this a good REST API design?
Current GET /channels JSON data
[
{
"channel": {
"channelName": "KRON4"
}
},
{
"channel": {
"channelName": "KTOV5"
}
},
...
]
Expected JSON data for new API GET /channels/on_now below
[
{
"channel": {
"channelName": "KRON4"
},
"on_now": {
"startTime": "2012-06-04T11:30:00",
"endTime": "2012-06-04T12:00:00",
"shortDescription": "Latest local, statewide & national news events, along with sports & weather.",
"shortTitle": "4:30am Newscast"
}
},
{
"channel": {
"channelName": "KTOV5"
},
"on_now": {
"startTime": "2012-06-04T11:30:00",
"endTime": "2012-06-04T12:30:00",
"shortDescription": "Local morning news and weather report",
"shortTitle": "Morning Newscast"
}
},
...next channel...
]
I would advise concentrating on content, not on URLs.
Example: you have an entry point, '/'. This is the only URL published in the API. A GET on it returns something like
{
"channels" : {
"href" : "path.to/channels"
},
"programs" : {
"href" : "path.to/programs"
}
}
To retrieve the list of channels, you GET the corresponding URL - which you therefore don't need to know in advance - and obtain, for example:
[
{
"name" : "BBC",
"id" : 452,
"href" : "path.to/channels/452"
},
{
"name" : "FOO",
"id" : 112,
"href" : "path.to/channels/112"
}
]
For detailed information about BBC, you GET the provided URL:
{
"name" : "BBC",
"id" : 452,
"self" : "path.to/channels/452",
"live_url" : "link.to.bbc.cast",
"whatever" : "bar",
"current" : "path.to/channels/452/current",
"program" : "path.to/channels/452/program"
}
And so on. URLs are discovered on the fly; you are free to modify them anytime. What makes your API is the content: you have to agree with clients about what is returned (fields, types, ...).
You finally call the "current" URL above to obtain information about current program.
Read here for more: http://kellabyte.com/2011/09/04/clarifying-rest/
Edit after OP-comment:
You could introduce an 'embed' parameter to limit the number of requests:
GET path.to/channels/452?embed=current
would return:
{
"name" : "BBC",
"id" : 452,
"self" : "path.to/channels/452",
"live_url" : "link.to.bbc.cast",
"whatever" : "bar",
"current" : {
"self" : "path.to/channels/452/current",
"name" : "Morning Show",
"start_time" : "(datetime here)",
"end_time" : "(datetime here)",
"next" : "whatever.comes.ne/xt"
},
"program" : "path.to/channels/452/program"
}
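On the server side, the 'embed' idea could be sketched as follows, assuming resources are stored by URL. RESOURCES, parse_embed and render are hypothetical names and the data mirrors the example above. Each field named in embed is replaced by the document it links to; everything else stays a plain link.

```python
# Hypothetical stored resources, keyed by URL.
RESOURCES = {
    "path.to/channels/452": {
        "name": "BBC",
        "id": 452,
        "self": "path.to/channels/452",
        "current": "path.to/channels/452/current",
        "program": "path.to/channels/452/program",
    },
    "path.to/channels/452/current": {
        "self": "path.to/channels/452/current",
        "name": "Morning Show",
    },
}


def parse_embed(query_value):
    """Split an embed query value such as "current,program" into field names."""
    return [name for name in query_value.split(",") if name]


def render(url, embed=()):
    """Return the resource at url, replacing each embedded link field with its document."""
    resource = dict(RESOURCES[url])  # copy so the stored resource is not mutated
    for field in embed:
        target = resource.get(field)
        if isinstance(target, str) and target in RESOURCES:
            resource[field] = RESOURCES[target]
    return resource


channel = render("path.to/channels/452", parse_embed("current"))
```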
You asked:
Is this a good REST API design?
YES, it is.
Contrary to what the other answers say, you are free to define any resource you want, as long as it represents a noun. That includes time-dependent services such as "what's on TV now" or the perennial example, "current weather in <city>". These service resources are just as valid as more static ones representing a show or channel.
I would however change the URI. /channels looks like a collection resource URI. I would expect its children to be channels, such as /channels/kron4 (you can use any unique string, not just the ID, to identify instance resources).
As such, /channels/on_now looks odd. It looks like a channel called "on_now". Although there's nothing preventing you from using that, it may later conflict with a channel that is called "On Now"! I would simply use /on_now as your URI. /channels/kron4/on_now would obviously be good for a single channel's response too.
/Channels -----------------------> Get All Channels
/Channels/bbc ------------------> Get BBC Channel
/Channels/bbc/Shows -------------> Get All shows in BBC
/Channels/bbc/Shows/Baseball ----> Get the show called "Baseball", in bbc channel
/Channels/bbc/Shows/current -----> Get the Current show running, in bbc channel
Assuming you do not (and will not) have a show called Current on any of your channels! :)
Just appending to the above answer:
/Channels/bbc/Shows/time/now -----> Get all the shows playing on BBC now
/Channels/bbc/Shows/time/2011-03-27T03:00:00.000+02:00 -----> Get all the shows playing on BBC at 2011-03-27T03:00:00.000+02:00
This is more extensible, and you won't have to worry about a show named "current".
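Behind such a time-based URI, the lookup is an interval check. A minimal sketch, assuming each show carries timezone-aware start and end datetimes: SHOWS, shows_at and resolve_time are hypothetical names, and the schedule is made up.

```python
from datetime import datetime

# Hypothetical schedule for one channel; titles and times are made up.
SHOWS = [
    {
        "title": "Night News",
        "start": datetime.fromisoformat("2011-03-27T02:00:00+02:00"),
        "end": datetime.fromisoformat("2011-03-27T03:00:00+02:00"),
    },
    {
        "title": "Baseball",
        "start": datetime.fromisoformat("2011-03-27T03:00:00+02:00"),
        "end": datetime.fromisoformat("2011-03-27T05:00:00+02:00"),
    },
]


def shows_at(shows, moment):
    """Return the shows whose half-open interval [start, end) contains moment."""
    return [show for show in shows if show["start"] <= moment < show["end"]]


def resolve_time(segment, now):
    """Map the URL segment to a datetime; "now" means the caller-supplied current time."""
    return now if segment == "now" else datetime.fromisoformat(segment)


moment = resolve_time("2011-03-27T03:00:00+02:00", now=None)
playing = [show["title"] for show in shows_at(SHOWS, moment)]
```

Using a half-open interval means a show that ends exactly at the requested time is not reported together with the one that starts then.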
EDIT:
You can get a good head start on this kind of thing if you can get API doc access here: https://developer.sdp.nds.com/page/about
In my opinion, more data would be needed, and the API would be something like:
//epg?time=&start=0&limit=1&duration=
This would define a generic API for getting location-based TV listing information by time and duration. The result would be paginated, with all the shows in the channel listing that occur in the given time span.
I'm no API expert, but I think you should be thinking about what you are returning rather than where it 'looks like it makes sense' to place the resource.
One solution will be to treat on_now as a resource.
so your api will be:
/channels (all channels)
/channels/{channel-id} (the {channel-id} channel - could be bbc and can have a collection of shows)
/channels/{channel-id}/shows (shows of channel-id)
/channels/{channel-id}/shows?filter=on_now (you are filtering a result, so i guess it's better to use query string, as if you were doing a query)
Then you want to return what's on now. That's not a property of the channel but a resource in itself. So how do you implement that?
/on_now/ (return a collection of on_now objects, which may be anything, channels, shows, whatever)
/on_now/?channel={channel-id} (this is a filter of the list by channel-id, you are just narrowing the list)
so isn't /channels/{channel-id}/shows?filter=on_now
the same as /on_now/?channel={channel-id} ?
actually, NO.
In the first uri you are getting shows filtered by a on_now.
In the second you are getting on_nows (which can be any representation, not exclusively a show) filtered by channel.
Why do I think on_now should be treated as a resource, and why is it important?
Once you make this resource separate, you can have different representations of your resources. You also have greater flexibility and no collisions. Let's say tomorrow you also want on_now to include a 'show' that isn't on any channel: this can easily be done, whereas with the other approaches it has to be on a channel.
You can also later filter the on_now by different criteria, because they are independent objects.
You can also do:
/on_now/{on_now_id}
that will give details of the current show, like when it started and when it will end, and also a link to /shows/{show-id} so you can reach it later when it's no longer on now.
Still, I think the best solution would be to have shows as a resource that is not nested under channels.
But the most important thing is that you should ask yourself whether you want shows to be nested under channels at all...
And what hints at that is the
I don't want to make individual call for each channel, the
show-on-now data needs to be returned for all channels
part.
That leads me to think that shows should NOT be inside the /channels/ path.
That's because another approach would be to have /shows/?filter=on_now if you are only returning shows.
You can have:
/shows/?filter=on_now&channel=bbc
I like to think of resources as the 'thing' I'm returning, rather than thinking only in terms of relations. Nesting in the graph is great for properties, but I'm not so sure about collections of 'other things'.
Following the same example, I would rather have /channels/{channel-id}/program instead of /channels/{channel-id}/shows