design pattern for dependent resource in REST

I am developing a spec document for resource URIs. Almost everything is well discussed around the net, and all of it is very helpful. However, I am a bit stuck on the pattern for a dependent resource. A dependent resource is something that exists at the pleasure of its parent resource: if the parent ceases to exist, the dependent goes away too. So, if I have books, a dependent resource would be the count of books. For any given query, if there are no books then there will be no count. That is different from, say, an author: you could have no books but still have authors. OK. So I have something like this URI and the returned data
http://example.com/books.json?author=Homer
{"books": [
{"id": 33, "title": "Iliad", "author": "Homer", "pubyear": "800 BC"},
{"id": 34, "title": "Odyssey", "author": "Homer", "pubyear": "750 BC"}
]}
The URI ends in the plural version of the common noun, and the QUERY_STRING is used to filter the return set. The root node in the returned "hash" is the common noun that was queried, and its value is an array, each element of which is a hash of key/value pairs.
For the count, my instinct is to do the following
http://example.com/books/count.json?author=Homer
{"books": [
{"count": 2}
]}
or even
http://example.com/books/stats.json?author=Homer
{"books": [
{"stats": {
"count": 2,
"units": 10,
"sold": 3
}}
]}
But, it seems the correct way really should be
http://example.com/books.json/count?author=Homer or
http://example.com/books.json?aggregate=count&author=Homer
Any suggestions or thoughts?

The reason both seem to feel weird is that you are mixing the content type and the content identifier by putting ".json" on it. The content type should be in the request's "Accept" header. If you eliminate the ".json", the two possibilities you are considering reduce to the same thing.
That's a purist answer. If for some reason you must use the extension (framework or client limitations), then putting the extension on the last path element is more standard.
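To illustrate choosing the format from the "Accept" header rather than from a ".json" suffix, here is a minimal Python sketch. The function name pick_media_type and the SUPPORTED list are assumptions made up for this example, not part of any framework, and the parsing is deliberately simplified (q-values and wildcards only).

```python
def pick_media_type(accept_header, supported):
    """Return the supported media type the client prefers, or None.

    Simplified: honours q-values plus the "*/*" and "type/*" wildcards,
    and ignores every other media-type parameter.
    """
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        candidates.append((-quality, position, media_range))

    # Types listed with q=0 are explicitly unacceptable, even via a wildcard.
    rejected = {rng for neg_q, _, rng in candidates if neg_q == 0}
    # Highest q first; ties keep the order in which the client listed them.
    for neg_q, _, media_range in sorted(candidates):
        if neg_q == 0:
            continue
        for offered in supported:
            if offered in rejected:
                continue
            if media_range in ("*/*", offered):
                return offered
            if media_range.endswith("/*") and offered.startswith(media_range[:-1]):
                return offered
    return None


SUPPORTED = ["application/json", "text/html"]  # hypothetical server capabilities
print(pick_media_type("text/html;q=0.5, application/json", SUPPORTED))
# prints: application/json
```

With this in place, /books/count and /books?aggregate=count stay format-neutral, and the same URI can serve JSON or HTML depending on what the client asks for.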

Related

REST: HTTP PUT - Parent with Child Entities

Is there a standard/convention in REST that dictates the expected behavior with respect to Child entities when I use an HTTP PUT on Parent record?
For example, the initial state of my Parent object is:
{
"id": 1,
"children": [
{"id": 1, ...},
{"id": 2, ...},
{"id": 3, ...}
],
...
}
And then I perform an HTTP PUT on /parents:
{
"id": 1,
"children": [
{"id": 2, ...}, // I changed a property in here
],
...
}
I would be inclined to update the Parent, and the Child with id 2, but are the Children with ids 1 and 3 supposed to be deleted or not?

Is there a standard/convention in REST that dictates the expected behavior with respect to Child entities when I use an HTTP PUT on Parent record?
No
REST doesn't have "entities" or "records". It has "resources".
REST doesn't have "children". Common identifier spellings do not imply a relationship between two resources.
PUT /parents HTTP/1.1
Content-Type: application/json
{
"id": 1,
"children": [
{"id": 2, ...}, // I changed a property in here
],
...
}
What this message means is "make the representation of the resource /parents match the body of this message". In other words, save my copy of this document on top of your document.
In this case, it says that there should be exactly one entry in the children array, with id: 2.
How the server does that is an implementation detail hidden behind the REST facade. The message only describes what the client wants, not what the client gets. The server owns its own resources, and has a lot of freedom to choose how to modify them. That could include deleting the underlying entities, or marking them as end of life, or removing them from the list without changing them, or even none of those things.
The server does need to be a little careful with its response, to be sure not to imply that the new representation matches the body of the request unless that's actually what it has done.
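The replacement semantics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory store, the put function and the "name" property are all hypothetical, and what happens to the underlying entities is deliberately left out because that is the server's private decision.

```python
import copy

# Hypothetical in-memory store: URI -> current representation.
store = {
    "/parents": {"id": 1, "children": [{"id": 1}, {"id": 2}, {"id": 3}]},
}


def put(uri, body):
    """Make the representation stored at `uri` match `body` (full replacement)."""
    created = uri not in store
    store[uri] = copy.deepcopy(body)
    # Whether the entities behind children 1 and 3 are deleted, archived or
    # kept is an implementation detail that this sketch does not model.
    return 201 if created else 200


status = put("/parents", {"id": 1, "children": [{"id": 2, "name": "changed"}]})
print(status, [child["id"] for child in store["/parents"]["children"]])
# prints: 200 [2]
```

After the PUT, a GET on /parents would show exactly one entry in children, which is all the request asked for.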

HTTP and REST don't have a concept of 'children'. If you do a GET request on a resource and there's something called "children" there, then those children are basically just part of that resource.
A PUT request should replace the state of the resource. If you are replacing the list of children with a new list of children, then yes I would expect those changes to stick.

How to model a progress "resource" in a REST api?

I have the following data structure that contains an array of sectionIds. They are stored in the order in which they were completed:
applicationProgress: ["sectionG", "sectionZ", "sectionA"]
I’d like to be able to do something like:
GET /application-progress - expected: sectionG, sectionZ, sectionA
GET /application-progress?filter[first]=true - expected: sectionG
GET /application-progress?filter[current]=true - expected: sectionA
GET /application-progress?filter[previous]=sectionZ - expected: sectionG
I appreciate that the above URLs are incorrect, but I'm not sure how to name/structure them to get the expected data, e.g. are the resources here "sectionids"?
I'd like to adhere to the JSON:API specification.
UPDATE
I'm looking to adhere to JSON:API v1.0
In terms of resources I believe I have "Section" and "ProgressEntry". Each ProgressEntry will have a one-to-one relationship with a Section.
I'd like to be able to query within the collection e.g.
Get the first item in the collection:
GET /progress-entries?filter[first]
Returns:
{
"data": {
"type": "progress-entries",
"id": "progressL",
"attributes": {
"sectionId": "sectionG"
},
"relationships": {
"section": {
"links": {
"related": "http://example.com/sections/sectionG"
}
}
}
},
"included": [
{
"links": {
"self": "http://example.com/sections/sectionG"
},
"type": "sections",
"id": "sectionG",
"attributes": {
"id": "sectionG",
"title": "Some title"
}
}
]
}
Get the previous ProgressEntry given a relative ProgressEntry. So in the following example find a ProgressEntry whose sectionId attribute equals "sectionZ" and then get the previous entry (sectionG). I wasn't clear before that the filtering of this is based on the ProgressEntry's attributes:
GET /progress-entries?filter[attributes][sectionId]=sectionZ&filterAction=getPreviousEntry
Returns:
{
"data": {
"type": "progress-entries",
"id": "progressL",
"attributes": {
"sectionId": "sectionG"
},
"relationships": {
"section": {
"links": {
"related": "http://example.com/sections/sectionG"
}
}
}
},
"included": [
{
"links": {
"self": "http://example.com/sections/sectionG"
},
"type": "sections",
"id": "sectionG",
"attributes": {
"id": "sectionG",
"title": "Some title"
}
}
]
}

I started to comment on jelhan's reply, but my answer was just too long for a reasonable comment on his objection, hence I include it here, as it more or less provides a good introduction to the answer anyway.
A resource is identified by a unique identifier (URI). A URI is in general independent of any representation format; otherwise content-type negotiation would be useless. json-api is a media-type that defines the structure and semantics of representations exchanged for a specific resource. A media-type SHOULD NOT force any constraints on the URI structure of a resource, as the URI is independent of it. One can't deduce the media-type to use from a given URI, even if the URI contains something like vnd.api+json, as this might just be a Web page talking about json:api. A client may as well request application/hal+json instead of application/vnd.api+json on the same URI and receive the same state information, just packaged in a different representation syntax, if the server supports both representation formats.
Profiles, as mentioned by jelhan, are just extension mechanisms to the actual media-type that allow a general media-type to specialize by adding further constraints, conventions or extensions. Such profiles use URIs similar to XML namespaces, and those URIs NEED NOT but SHOULD be dereferenceable to allow access to further documentation. Nothing is said about the URI of the actual resource, other than the note in Web Linking that URIs may hint to a client which media-type to use, which I would not recommend, as it requires the client to have certain knowledge about that hint.
As mentioned in my initial comments, URIs shouldn't convey semantics; that is what link relations are for!
Link-relations
By that, your outlined resource seems to be a collection of some further resources, sections by your domain language. While pagination as defined in json:api does not directly map here perfectly, unless you have so many sections that you want to split these into multiple pages, the same concept can be used using standardized link relations defined by IANA.
Here, at one point a server may provide you a link to the collection resource which may look like this:
{
"links": {
"self": "https://api.acme.org/section-queue",
"collection": "https://api.acme.org/app-progression",
...
},
...
}
Due to the collection link relation standardized by IANA you know that this resource may hold a collection of entries which upon invoking may return a json:api representation such as:
{
"links": {
"self": "https://api.acme.org/app-progression",
"first": "https://api.acme.org/app-progression/sectionG",
"last": "https://api.acme.org/app-progression/sectionA",
"current": "https://api.acme.org/app-progression",
"up": "https://api.acme.org/section-queue",
"https://api.acme.org/rel/section": "https://api.acme.org/app-progression/sectionG",
"https://api.acme.org/rel/section": "https://api.acme.org/app-progression/sectionZ",
"https://api.acme.org/rel/section": "https://api.acme.org/app-progression/sectionA",
...
},
...
}
where you have further links to go up or down the hierarchy or to select the first or last section that finished. Note the last three sample links, which leverage the extension relation type mechanism defined by RFC 5988 (Web Linking).
On drilling down the hierarchy further you might find links such as
{
"links": {
"self": "https://api.acme.org/app-progression/sectionZ",
"first": "https://api.acme.org/app-progression/sectionG",
"prev": "https://api.acme.org/app-progression/sectionG",
"next": "https://api.acme.org/app-progression/sectionA",
"last": "https://api.acme.org/app-progression/sectionA",
"current": "https://api.acme.org/app-progression/sectionA",
"up": "https://api.acme.org/app-progression",
...
},
...
}
This example should just showcase how a server provides all the options a client may need to progress through its task. The client simply follows the links it is interested in. Based on the link relation names provided, a client can make an informed choice about whether a link is of interest or not. If, for example, it knows that a resource is a collection, it might traverse all the elements and process them one by one (or concurrently in multiple threads).
This approach is quite common on the Internet and allows the server to easily change its URI scheme over time as clients will only act upon the link relation name and only invoke the URI without attempting to deduce any logic from it. This technique is also easily usable for other media-types such as application/hal+json or the like and allows each of the respective resources to be cached and reused by default, which might take away load from your server, depending on the amount of queries it has to deal with.
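As a rough illustration of a client that acts only on link relation names, here is a small Python sketch. The canned RESPONSES table stands in for real HTTP calls and simply mirrors the link examples above; fetch, follow and walk_from_first are hypothetical helper names.

```python
# Hypothetical canned responses, keyed by URI, mirroring the link examples above.
BASE = "https://api.acme.org/app-progression"
RESPONSES = {
    BASE: {"links": {"self": BASE, "first": BASE + "/sectionG",
                     "last": BASE + "/sectionA"}},
    BASE + "/sectionG": {"links": {"self": BASE + "/sectionG",
                                   "next": BASE + "/sectionZ", "up": BASE}},
    BASE + "/sectionZ": {"links": {"self": BASE + "/sectionZ",
                                   "prev": BASE + "/sectionG",
                                   "next": BASE + "/sectionA", "up": BASE}},
    BASE + "/sectionA": {"links": {"self": BASE + "/sectionA",
                                   "prev": BASE + "/sectionZ", "up": BASE}},
}


def fetch(uri):
    """Stand-in for an HTTP GET; a real client would issue a request here."""
    return RESPONSES[uri]


def follow(document, rel):
    """Follow the link named `rel`, or return None if the server did not offer it."""
    target = document["links"].get(rel)
    return fetch(target) if target is not None else None


def walk_from_first(collection_uri):
    """Visit every entry by following "first" and then "next" until it runs out."""
    visited = []
    entry = follow(fetch(collection_uri), "first")
    while entry is not None:
        visited.append(entry["links"]["self"])
        entry = follow(entry, "next")
    return visited


print(walk_from_first(BASE))
```

The client code never builds or inspects a URI; if the server renamed every path, only the RESPONSES table (that is, the server side) would change.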
Note that nothing has been said yet about the actual content of a section. It might be a complex summary of things typical of sections or it might just be a word. We could classify it and give it a name; as such, even a simple word is a valid target for a resource. Further, as Jim Webber mentioned, the resources you expose via HTTP (REST) and your domain model are not identical and usually do not map one-to-one.
Filtering
json:api allows parameters to be grouped semantically by defining a customized x-www-form-urlencoded parsing. If content-type negotiation is used to agree on json:api as the representation format, the likelihood of interoperability issues is rather low, though if such a representation is sent unsolicited, parsing of such query parameters might fail.
It is important to mention that in a REST architecture clients should only use links provided by the server and not generate one on their own. A client usually is not interested in the URI but in the content of the URI, hence the server needs to know how to act upon the URI.
The outlined suggestions can be used but also URIs of the form
.../application-progress?filter=first
.../application-progress?filter=current
.../application-progress?filter=previous&on=sectionZ
can be used instead. This approach should also work on almost all clients without the need to change their url-encoded parsing mechanism. In addition, the management overhead of returning URIs for other generated media-types may be minimized as well. Note that each of the URIs in the example above represents its own resource, and a cache will store responses to such resources based on the URI used to retrieve them. Queries like .../application-progress?filter=next&on=sectionG and .../application-progress?filter=previous&on=sectionA, which retrieve basically the same representation, are two distinct resources that will be processed twice by your API, because the response to the first query can't be reused: the cache key (URI) is different. According to Fielding, caching is one of the few constraints REST has, and it has to be respected; otherwise you are violating REST.
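The point about the URI being the cache key can be shown with a toy cache in Python. Everything here is a simplified assumption: origin is a made-up stand-in for the API that only understands the two query shapes used below, and the cache is a plain dictionary with no expiry.

```python
PROGRESS = ["sectionG", "sectionZ", "sectionA"]  # data from the question
cache = {}
origin_hits = 0


def origin(uri):
    """Hypothetical origin server; handles only filter=next|previous with on=..."""
    global origin_hits
    origin_hits += 1
    query = dict(pair.split("=") for pair in uri.split("?", 1)[1].split("&"))
    index = PROGRESS.index(query["on"])
    offset = 1 if query["filter"] == "next" else -1
    return PROGRESS[index + offset]


def cached_get(uri):
    """Return the cached response for `uri`, asking the origin only on a miss."""
    if uri not in cache:
        cache[uri] = origin(uri)
    return cache[uri]


a = cached_get("/application-progress?filter=next&on=sectionG")
b = cached_get("/application-progress?filter=previous&on=sectionA")
c = cached_get("/application-progress?filter=next&on=sectionG")
print(a, b, c, origin_hits)
# prints: sectionZ sectionZ sectionZ 2
```

The first two requests return the same section but hit the origin twice because their URIs differ; only the third, which repeats an earlier URI, is served from the cache.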
How you design such URIs is completely up to you here. The important thing is, how you teach a client when to invoke such URIs and when it should not. Here, again, link-relations can and should be used.
Summary
In summary, which approach you prefer is up to you, as is the URI style you choose. Clients, especially in a REST environment, do not care about the structure of the URI. They operate on link-relations and use the URI just to invoke it and progress with their task. As such, a server API should help a client by teaching it what it needs to know, like in a text-based computer game from the 1970s and '80s, as mentioned by Jim Webber. It is helpful to think of the interaction model to design in terms of affordances and a state machine, as explained by Asbjørn Ulsberg.
While you could apply filtering on grouped parameters as provided by json:api, such links may only be usable within the json:api representation. If you copy & paste such a link into a browser or some other channel, it might not be processable by that client. Therefore this would not be my first choice, TBH. Whether you design sections to be their own resource or just properties you want to retrieve is your choice as well. We don't really know what sections are in your domain model; IMO it sounds like a valid resource, though, which may or may not have further properties.

API Design: Querying a sub-resource

We have a resource which could be modeled as a nested object
GET /A/
[
{
"name": "my_a",
"B": [
{"name": "my_b", "address": "0xbeef"}
]
}
]
or a sub resource, like
GET /A/my_a/B
[
{
"name":"my_b", "address": "0xbeef"
}
]
Our customers want a way to query for objects of type A based on properties of type B, e.g. "get me all the A objects who have B objects with name 'my_b'".
It seems preferable to write the API in the "B as a sub-resource" style because it lends itself to pagination if there are many objects of type B. Additionally, retrieving B objects can be expensive, so if only some clients are interested in B, it makes sense to require separate calls to retrieve sub-resource B. However, it also seems strange to allow users to query on a sub-resource if the sub-resource is not returned in the results.
For example, a query feels quite natural when in the form:
GET /A?query=B.address[equals]0xbeef
[
{
"name": "my_a",
"B": [
{"name": "my_b", "address": "0xbeef"}
]
}
]
but less so when the query looks like
GET /A?query=B.address[equals]0xbeef
[
{
"name": "my_a"
}
]
A compromise I'm considering is using the nested approach but not include the B objects by default. A query parameter can expose B. So,
GET /A?query=B.address[equals]0xbeef&include_b=true
[
{
"name": "my_a",
"B": [
{"name": "my_b", "address": "0xbeef"}
]
}
]
I researched "REST, nested objects, querying" and found examples. Most of these examples included the subresource as a nested object, the include_b parameter seems unique to my design.
So, SO, I'm looking for general feedback on this approach, and to see if this is a common problem with a known solution. Curious to hear what comes back.
edit 1:
Updated the example to show that querying can be on arbitrary properties.
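For illustration, the compromise described in the question (filter A on a property of B, and embed B only on request) might look like this on the server side. This is a sketch in Python under stated assumptions: DATA is an invented data set ("other_a" and its B entry are not from the question), and query_a is a hypothetical helper that supports only an equality test.

```python
# Hypothetical data set; "other_a" and its B entry are invented for the example.
DATA = [
    {"name": "my_a", "B": [{"name": "my_b", "address": "0xbeef"}]},
    {"name": "other_a", "B": [{"name": "other_b", "address": "0xcafe"}]},
]


def query_a(field, value, include_b=False):
    """Return the A objects that have at least one B whose `field` equals `value`."""
    results = []
    for a in DATA:
        if any(b.get(field) == value for b in a["B"]):
            item = {"name": a["name"]}
            if include_b:
                # Copy the B objects so callers cannot mutate the stored data.
                item["B"] = [dict(b) for b in a["B"]]
            results.append(item)
    return results


print(query_a("address", "0xbeef"))
print(query_a("address", "0xbeef", include_b=True))
```

The first call corresponds to GET /A?query=B.address[equals]0xbeef and the second to the same request with include_b=true; the filter is identical in both cases, only the shape of the result differs.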

As @RomanVottner pointed out, I'm actually not designing a RESTful API. Instead, the API is closer to an RPC translated to use HTTP/JSON. In fact, my team follows the Google API Design Guide, which itself dictates how to write gRPC APIs that are then (I presume) automatically translated into web endpoints.
So, at the end of the day, I have not had my style question answered, other than to learn that my question wasn't accurate. I will most likely use the solution I proposed in the question.

Should I use an array or an object for REST API collections?

My API contains a Book entity which represents a collection of several possible BookContent depending on the language. A BookContent is another entity with the attributes Title and Content, which both depend on the Language.
Should a Book look like:
1)
[
{
"language": "English",
"title": "First title",
"content": "First content"
},
{
"language": "French",
"title": "Premier titre",
"content": "Premier contenu"
}
]
or like:
2)
{
"English": {
"title": "First title",
"content": "First content"
},
"French": {
"title": "Premier titre",
"content": "Premier contenu"
}
}
The option 1):
produces "self-contained" elements, i.e. each object contains all the information.
The language attribute can contain any Unicode character (whereas as a key it cannot).
is the only available option if the content depends on multiple criteria, i.e. language and year.
creates a clearer separation between the two entities. For example, it makes it easier to replace each element by its ID, if we were to decide not to embed the BookContent entity anymore but only return the BookContent ID.
is probably more familiar to developers using my API, since I believe it is more common to find this kind of structure in other REST APIs.
The option 2):
produces smaller elements.
makes it faster to look for the elements according to the language without traversing through the whole collection.
This is a general question to know which one looks more like a "best practice", with no assumption on how the clients are querying/using the REST API, nor how much performance matters as opposed to flexibility, etc.
Which option is generally more often a "best practice"?

You state in your question that:
This is a general question to know which one looks more like a "best practice", with no assumption on how the clients are querying/using the REST API.
You are writing a REST service that is designed to help clients get information out - if that wasn't your aim, there'd be no point writing it. Because of this, the most important question you need to answer is "what information do clients want?". The answer to that will dictate the structure of your data, not "which one looks better?" - we're not your service's end users.
Personally, I'd opt for the first, simply because it would seem wrong to have an array called books.English in my object, it also allows for languages with characters outside of A-Z, and caters for books where the language is not known (or mixed). If simplicity of the individual book is key (and the list of languages is well-defined and finite), then consider:
[
{
"language": "English",
"books": [{
"title": "First title",
"content": "First content"
}]
},
{
"language": "French",
"books": [{
"title": "Premier titre",
"content": "Premier contenu"
}]
}
]
In essence, however, there's no single best practise for the data structures you're building other than "make them useful".
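One practical note on the trade-off: the fast lookup listed as an advantage of option 2 can be rebuilt on the client from option 1 in a single pass. A minimal Python sketch, assuming each language appears at most once in the array:

```python
# Option 1 from the question: an array of self-contained elements.
book = [
    {"language": "English", "title": "First title", "content": "First content"},
    {"language": "French", "title": "Premier titre", "content": "Premier contenu"},
]

# Build the option 2 shape: language -> remaining fields.
by_language = {
    entry["language"]: {key: value for key, value in entry.items() if key != "language"}
    for entry in book
}

print(by_language["French"]["title"])
# prints: Premier titre
```

Going the other way (object to array) is just as easy, so the choice mostly comes down to which shape the clients consume more often.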

Additional fields (author, isbn) for /{user}/books.reads

Introduction
/me/books.reads returns books[1].
It includes an array of books and the following fields for each book:
title
type
id
url
Problem
I'd like to get the author name(s) at least. I know that written_by is an existing field for books.
I'd like to get ISBN, if possible.
Current situation
I tried this:
/me/books.reads?fields=data.fields(author)
or
/me/books.reads?fields=data.fields(book.fields(author))
But the error response is:
"Subfields are not supported by data"
The books.reads response looks like this (just one book included):
{
"data": [
{
"id": "00000",
"from": {
"name": "User name",
"id": "11111"
},
"start_time": "2013-07-18T23:50:37+0000",
"publish_time": "2013-07-18T23:50:37+0000",
"application": {
"name": "Books",
"id": "174275722710475"
},
"data": {
"book": {
"id": "192511337557794",
"url": "https://www.facebook.com/pages/A-Semantic-Web-Primer/192511337557794",
"type": "books.book",
"title": "A Semantic Web Primer"
}
},
"type": "books.reads",
"no_feed_story": false,
"likes": {
"count": 0,
"can_like": true,
"user_likes": false
},
"comments": {
"count": 0,
"can_comment": true,
"comment_order": "chronological"
}
}
]
}
If I take the id of a book, I can get its metadata from the open graph, for example http://graph.facebook.com/192511337557794 returns something like this:
{
"category": "Book",
"description": "\u003CP>The development of the Semantic Web...",
"genre": "Computers",
"is_community_page": true,
"is_published": true,
"talking_about_count": 0,
"were_here_count": 0,
"written_by": "Grigoris Antoniou, Paul Groth, Frank Van Harmelen",
"id": "192511337557794",
"name": "A Semantic Web Primer",
"link": "http://www.facebook.com/pages/A-Semantic-Web-Primer/192511337557794",
"likes": 1
}
The response includes ~10 fields, including written_by which has the authors of the book.
Curiously, the link field seems to map to the url of the books.reads response. However, the field names are different, so I'm starting to lose hope that I will be able to ask for written_by in a books.reads request.
The only reference that I've found about /me/books is https://developers.facebook.com/docs/reference/opengraph/object-type/books.book/
This is essentially about user sharing that he/she has read a book, not the details of the book itself.
The data structure is focused on the occasion of reading a book: when reading was started, when this story was published, etc.
[1] I know this thanks to How to get "read books"

FQL does not look very promising: although you can request books from the user table, it seems to deliver just a string value with only the book titles, comma-separated.
You can search page table by name – but I doubt it will work with name in (subquery) when what that subquery delivers is just one string of the format 'title 1,title 2,…'.
Can’t really test this right now, because I have read only one book so far (ahm, one that I have set as “books I read” on FB, not in general …) – but using that to search the page table by name already delivers a multitude of pages, and even if I narrow that selection down by AND is_community_page=1, I still get several, so no real way of telling which would be the right one, I guess.
So, using the Graph API and a batch request seems to be more promising.
Similar to an FQL multi-query, batch requests also allow you to refer to data from a previous “operation” in the batch, by giving operations a “name” and then referring to data from the named operation using the JSONPath expression format (see Specifying dependencies between operations in the request for details).
So a batch query for this could look like this,
[
{"method":"GET","name":"get-books","relative_url":"me\/books?fields=id"},
{"method":"GET","relative_url":"?ids={result=get-books:$.data.*.id}
&fields=description,name,written_by"}
]
Here all in one line, for easier copy&paste, so that line breaks don’t cause syntax errors:
[{"method":"GET","name":"get-books","relative_url":"me\/books?fields=id"},{"method":"GET","relative_url":"?ids={result=get-books:$.data.*.id}&fields=description,name,written_by"}]
So, to test this:
Go to Graph API Explorer.
Change method to POST via the dropdown, and clear whatever is in the field right next to it.
Click “Add a field”, and input name batch, and as value insert the line copy&pasted from above.
Since that will also get you a lot of “headers” you might not be interested in, you can add one more field, name include_headers and value false to get rid of those.
In the result, you will get a field named body, that contains the JSON-encoded data for the second query. If you want more fields, add them to the fields parameter of the second query, or leave that parameter out completely if you want all of them.
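If you would rather build the batch value in code than paste it by hand, the following Python sketch serializes the same two operations onto one line and form-encodes them together with include_headers. It only prepares the POST body; sending it, and the access token a real call would presumably need, are left out, and the variable names are arbitrary.

```python
import json
from urllib.parse import urlencode

# The two operations from the batch above; the second refers to the first by name.
batch = [
    {"method": "GET", "name": "get-books", "relative_url": "me/books?fields=id"},
    {
        "method": "GET",
        "relative_url": "?ids={result=get-books:$.data.*.id}"
                        "&fields=description,name,written_by",
    },
]

# Compact separators keep the value on a single line with no stray whitespace.
form_fields = {
    "batch": json.dumps(batch, separators=(",", ":")),
    "include_headers": "false",
}
body = urlencode(form_fields)  # URL-encoded form body for the POST request

print(form_fields["batch"])
```

Generating the JSON this way avoids the line-break problem mentioned above, since the serializer never splits a string across lines.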
OK, after some trial-and-error I managed to create a direct link to Graph API Explorer to test this – the right amount of URL-encoding to use is a little fiddly to figure out :-)
(I left out the fields parameter for the second operation here, so this will give you all the info for the book that there is.)
As I said, I only got one book on FB, but this should work for a user with multiple books the same way (since the second operation just takes however many IDs it is given from the first one).
But I can’t tell you off the top of my head how this will work for a lot of books – how slow the second operation might get with that, when you set a high limit for the first one. And I also don’t know how this will behave in regard to pagination, which you might run into when me/books delivers a lot of books for a user.
But I think this should be a good enough starting point for you to figure the rest out by trying it on users with more data. HTH.
Edit: ISBN does not seem to be part of the info for a book’s community page, at least not for the ones I checked. And also written_by is optional – my book doesn’t have it. So you’ll only get that info if it is actually provided.
