Elasticsearch truly RESTful? - rest

I am designing an API that will need to accept a large amount of data in order to return resources. I thought about using a POST request instead of a GET so I can pass a body with the request. That has been largely frowned upon in the REST community:
Switching to a POST request simply because there's too much data to fit in a GET request makes little sense
https://stackoverflow.com/a/812935/7489702
Another:
Switching to POST discards a number of very useful features though. POST is defined as a non-safe, non-idempotent method. This means that if a POST request fails, an intermediate (such as a proxy) cannot just assume they can make the same request again. https://evertpot.com/dropbox-post-api/
Another: HTTP GET with request body
But contrary to this, Elasticsearch uses POST methods to get around the issue of queries being too long to put in a url.
Both HTTP GET and HTTP POST can be used to execute search with body. Since not all clients support GET with body, POST is allowed as well.https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html
So, is Elasticsearch not truly restful? Or, does the difference between POST and GET not matter as much in modern browsers?

Elasticsearch's intent is not to be RESTful but to provide a (pragmatic) Web API that lets clients index documents and use services such as full-text search or aggregations.
Not everything that is exposed via HTTP is automatically RESTful. I claim that most of the so-called RESTful services aren't as RESTful as they think they are. In order to be RESTful a service has to adhere to a couple of constraints, which Fielding, the inventor of REST, specified further in a blog post.
Basically, RESTful services should adhere to and not violate the underlying protocol, and put a strong focus on resources and their presentation via media types. Although REST is used via HTTP most of the time, it is not restricted to this protocol.
Clients, on the other hand, should not have initial knowledge of or assumptions about the available resources or their returned state ("typed" resources) in an API, but learn them on the fly via issued requests and analyzed responses. This gives the server the opportunity to move around or rename resources easily without breaking a client implementation.
HATEOAS (Hypermedia as the Engine of Application State) enriches a resource state with links a client can use to trigger further requests in order to update its knowledge base or perform some state changes. Here a client should determine the semantics of a URI by the given relation name rather than by parsing the URI, as the relation name should not change if the server moves a resource around for whatever reason.
The client furthermore should use the relation name to determine what content type a resource may have. A relation name like news could lead the client to request the resource as an application/atom+xml representation, while a contact relation might lead to a representation request of media type text/vcard, application/vcard+json or application/vcard+xml.
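To make the last two paragraphs concrete, here is a minimal sketch of a client that picks its Accept header from the relation name and treats the URI as opaque. The mapping table, the function names and the example URIs are assumptions made for illustration, not part of any particular API.

```python
# Hypothetical table: which media types this client prefers per link relation.
PREFERRED_MEDIA_TYPES = {
    "news": ["application/atom+xml"],
    "contact": ["text/vcard", "application/vcard+json", "application/vcard+xml"],
}


def accept_header_for(relation: str) -> str:
    """Build an Accept header value from the relation name, never from the URI."""
    media_types = PREFERRED_MEDIA_TYPES.get(relation, ["*/*"])
    return ", ".join(media_types)


def follow(links: dict, relation: str) -> tuple:
    """Return (uri, headers) for the request a client would issue for a relation.

    `links` maps relation names to URIs as received from the server.
    """
    uri = links[relation]  # opaque to the client; only the relation carries meaning
    return uri, {"Accept": accept_header_for(relation)}
```

If the server later moves the news feed to a different URI, the client keeps working as long as the relation is still called news.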
If you look at an ElasticSearch sample I took from dzone you will see that ES does not support HATEOAS at all:
Request
GET /bookdb_index/book/_search?q=guide
Response:
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.28168046,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.24144039,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
}
]
The problem here is that the response contains Elasticsearch-specific fields that are evidently arbitrary metadata for the returned results. While this could be handled via special media types that teach a client what each field's semantics are, the actual payload kept in the _source element is still generic. Here you'd need further custom media-type extensions for each possible type.
If ES changes the response format in the future, clients which assume that _type determines the type of a resource and _source defines the current state of some object of that type may break. Instead, a client should ask the server to return a resource in a format it understands. If the server does not support any of the requested representation formats, it will notify the client accordingly. If it supports at least one, it will transform the state of the requested resource to a representation the client understands.
Long story short, Elasticsearch is by no means RESTful and it also does not try to be. Instead, your "RESTful" service should use it and use the results to generate a response in accordance with the representation requested by the client.
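As a rough sketch of that last point, a service sitting in front of ES could translate each hit into its own representation and attach links, so clients never depend on _type or _source. The /books base URL, the HAL-like _links layout and both function names are assumptions for illustration only.

```python
def to_book_representation(hit: dict, base_url: str = "/books") -> dict:
    """Turn one ES hit (shaped like the sample above) into the service's own format."""
    source = hit["_source"]
    return {
        "title": source["title"],
        "authors": source["authors"],
        # The client follows this link instead of building a URI from _id itself.
        "_links": {"self": {"href": f"{base_url}/{hit['_id']}"}},
    }


def to_search_response(hits: list) -> dict:
    """Wrap the translated hits; ES metadata such as _score is dropped."""
    return {"books": [to_book_representation(hit) for hit in hits]}
```

If ES renames or restructures its metadata fields, only this translation layer has to change.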

So, is Elasticsearch not truly restful? Or, does the difference between POST and GET not matter as much in modern browsers?
I think ES is not truly RESTful, because its queries are more complex than those of a normal web application.
REST proponents tend to favor URLs, such as
http://myserver.com/catalog/item/1729
but the REST architecture does not require these “pretty URLs”. A GET request with a parameter
http://myserver.com/catalog?item=1729 (Elasticsearch does this)
is just as RESTful.
The difference between POST and GET does still matter to developers today.
GET requests should be idempotent. That is, issuing a request twice should be no different from issuing it once. That’s what makes the requests cacheable. An “add to cart” request is not idempotent—issuing it twice adds two copies of the item to the cart. A POST request is clearly appropriate in this context. Thus, even a RESTful web application needs its share of POST requests.
reference What exactly is RESTful programming?

Related

What HTTP method should i use when i need to pass a lot of data to a single endpoint?

I'm wondering whether the HTTP method I'm currently using on my REST API is the right one for the occasion. My endpoint needs a lot of data to "match" a query, so I've made a POST endpoint where the user can send a JSON document with the parameters, e.g.:
{
  "product_id": 1,
  "partner": 99,
  "category": [{
    "id": 8,
    "subcategories": [
      {"id": "x"}, {"id": "y"}
    ]
  }],
  "brands": ["apple", "samsung"]
}
Note that brands and category are lists.
Reading the Mozilla page about HTTP methods, I found:
The POST method is used to submit an entity to the specified resource, often causing a change in state or side effects on the server.
My POST endpoint does not have any effect on my server/database, so in theory I'm using it wrong(?). But if I use a GET request, how can I make it more "readable", and how can I handle lists with this method?
What HTTP method should i use when i need to pass a lot of data to a single endpoint?
From RFC 7230
Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets.
That effectively limits the amount of information you can encode into the target-uri. Your limit in practice may be lower still if you have to support clients or intermediaries that don't follow this recommendation.
If the server needs the information, and you cannot encode it into the URI, then you are basically stuck with encoding it into the message-body; which in turn means that GET -- however otherwise suitable the semantics might be for your setting -- is out of the picture:
A payload within a GET request message has no defined semantics
So that's that - you are stuck with POST, and you lose safe semantics, idempotent semantics, and caching.
A possible alternative to consider is creating a resource which the client can later GET to retrieve the current representation of the matches. That doesn't make things any better for the first adhoc query, but it does give you nicer semantics for repeat queries.
You might, for example, copy the message-body into a document store, and encode the key to the document store (for example, a hash of the document) into the URI to be used by the client in subsequent GETs.
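A minimal sketch of that idea, assuming an in-memory dictionary as the document store and a made-up /searches/ path; the class and method names are hypothetical as well. The query body is serialized canonically so that equivalent documents hash to the same key.

```python
import hashlib
import json


class QueryStore:
    """Keeps POSTed query documents so later GETs can refer to them by hash."""

    def __init__(self):
        self._documents = {}

    def save(self, query: dict) -> str:
        """Store the query and return the URI a client can GET afterwards."""
        # Sorted keys and fixed separators make the hash independent of key order.
        canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        self._documents[key] = query
        return f"/searches/{key}"

    def load(self, uri: str) -> dict:
        """Look the query up again from the last path segment of the URI."""
        key = uri.rsplit("/", 1)[-1]
        return self._documents[key]
```

The POST handler would call save() and hand the returned URI back to the client, for example in a Location header; repeat queries then go through GET and regain safe, idempotent, cacheable semantics.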
For cases where the boilerplate of the JSON document is large, and the encoding of the variation small, you might consider a family of resources that encode the variations into the URI. In effect, the variation is extracted from the URI and applied to the server's copy of the template, and then the fully reconstituted JSON document is used to achieve... whatever.
You should be using a POST anyway. With GET you can only "upload" data via URL parameters or HTTP headers. Both are unsuitable for structured data like yours. Do use POST even though no "change" happens on the server.

What is the most RESTful way to update a resource in a non-idempotent way?

What is the most RESTful way to update part of a resource, where the generation of that resource is done server side, not on the client side. This is not an idempotent action, as the supporting data on the server may change between requests.
I'm creating a REST API, and I've come to a design choice where I'm not quite sure of the way to move forward.
I have a resource that I want to refresh, which involves creating a large json blob based on support data, then saving that json blob to a database before serving it back to the user.
My question is, what is the most RESTful way to perform this action? As the client doesn't perform the calculations, and it also isn't idempotent as the data set may change between each call, I feel it is unnatural to use a PUT.
I settled on a POST, but that doesn't sit right either.
A third option would be to have a sub-resource that describes the action of refreshing - this doesn't feel correct either.
For example, I have a document:
GET /document/<documentId>
which would return something like:
"body": {
"createdAt": "2019-01-01 12:00:00",
"updatedAt": "2019-01-01 13:00:00",
"name": "example",
"location": "example",
"city": "example"
}
These fields are generated by the server when the document is created, the client doesn't update them.
To allow the client to signal that they would like the server to regenerate the document, I have settled on:
POST /document/<documentId>
"body": {
"param1": "updatedparam1",
"param2": "updatedparam2"
}
An alternative approach would be to do something like:
POST /document/<documentId>/refresh
"body": {...}
but that feels more like an RPC call rather than REST.
Does this make sense logically? I haven't seen many suggestions that POST can be to a single resource as opposed to a collection.
Please do let me know if I can expand on anything, I've been banging my head against this for a little while and have probably missed something.
I settled on a POST, but that doesn't sit right either.
POST is fine.
HTTP semantics include rules about invalidating cached representations of resources. Presumably, when you tell the server to regenerate the document, you don't want to keep using the old copy yourself. So the target uri of the request should be the same as that which you use to GET the resource.
So:
POST /document/<documentId>
Is a good start.
An alternative, assuming the semantics match, would be to use PATCH -- that's an appropriate choice to make if what you are doing is proposing replacement values for the representation. In that case, the body of the request should be a "patch document". You can, of course, define your own type of patch document; generic clients may already understand one or more of the standards RFC-6902:JSON Patch or RFC-7386:JSON Merge Patch so you can potentially save some work by supporting one or more of the standard formats.
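For reference, the merge algorithm described in RFC 7386 (JSON Merge Patch) is small enough to sketch in a few lines. This is an illustrative implementation over plain Python dicts, where None stands in for JSON null; the function name is my own.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch document to a target, returning a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)  # shallow copy so the caller's document is not mutated
    for key, value in patch.items():
        if value is None:
            # null in the patch means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result
```

With the document from the question, merge_patch(doc, {"city": "Berlin", "location": None}) changes city, removes location and leaves every other field alone.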
I haven't seen many suggestions that POST can be to a single resource as opposed to a collection.
Part of the point of REST is that resources support a uniform interface - "single resources" and "collection resources" look the same. Historically, we got a bit unlucky with the early specifications for POST, which were easily misinterpreted as CREATE.
But generic clients don't know, or care, whether or not the resource you specify in a web form is a "collection resource"; it just packs up the data and sends the request, confident that the server will know what to do.

Is it really practical to use URLs instead of ids in a REST API

The proper design of REST APIs seems to be a controversial topic. As far as I understand it, the purist approach with regard to ids would be that the URL is the only identifier of a resource for the outside world, so neither does the client have to interpret the URL in any way (e.g. knowing that the last segment is the id) nor does the id have to be included explicitly in the representation returned for a simple GET request.
At first sight this seems to be a good rule because the client does not have to care about generating URLs based on ids, it's just the same thing. The id tells you how to retrieve the resource. However, I doubt that this is really applicable in practice. Some concerns that come to my mind:
What if the URL changes because of a new API version (given that it is part of the URL)
or the protocol changes from http to https.
or the application even moves to another domain for whatever reason
Short Ids are handy for referencing resources in parameters. This would not be possible: /books?author=short.author.id
It just puts too much information into an id that does not really belong there, because the id should not be interpreted by any consumer in such a way.
Is this really done in practice? Are there examples of popular public APIs applying this pattern? Or maybe I don't understand it correctly and this is not what REST purists advocate?
Have a look at Hypermedia Driven RESTFul APIs. In HATEOAS, URIs are discoverable (and not documented) so that they can be changed. That is, unless they are the very entry points into your system (Cool URIs, the only ones that can be hard-coded by clients) - and you shouldn't have too many of those if you want the ability to evolve the rest of your system's URI structure in the future. This is in fact one of the most useful features of REST.
For the remaining non-Cool URIs, they can be changed over time, and your API documentation should spell out the fact that they should be discovered at runtime through hypermedia traversal.
Looking at Richardson's Maturity Model (level 3), this would be where links come into play. For example, from the top level, say /api/version(/1), you would discover there's a link to the groups. Here's how this could look in a tool like HAL Browser:
Root:
{
"_links": {
"self": {
"href": "/api/root"
},
"api:group-add": {
"href": "http://apiname:port/api/group"
},
"api:group-search": {
"href": "http://apiname:port/api/group?pageNumber={pageNumber}&pageSize={pageSize}&sort={sort}"
},
"api:group-by-id": {
"href": "http://apiname:port/api/group/{id}" (OR "href": "http://apiname:port/api/group?id={id}")
}
}
}
The advantage here would be that the client would only need to know the relationship (link) name (well obviously besides the resource structure/properties), while the server would be mostly free to alter the relationship (and resource) url.
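A small sketch of what that looks like on the client side: the client looks up the href by relation name and fills in the template variables, without caring whether the id ends up in the path or the query string. Only simple {name} placeholders are handled here, and both helper names are made up for the example.

```python
import re
from urllib.parse import quote


def expand(template: str, **variables) -> str:
    """Fill simple {name} placeholders, percent-encoding each value."""
    def substitute(match):
        name = match.group(1)
        return quote(str(variables[name]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


def resolve(root: dict, relation: str, **variables) -> str:
    """Find a link by its relation name in the root document and expand it."""
    href = root["_links"][relation]["href"]
    return expand(href, **variables)
```

The call resolve(root, "api:group-by-id", id=7) stays the same whether the server publishes .../group/{id} or .../group?id={id}.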

Saving two new related objects in one request

I have two models in Django:
class Thread(models.Model):
entity = models.ForeignKey(Entity, null=True, blank=True)
...
class ThreadMessage(models.Model):
thread = models.ForeignKey(Thread, related_name='messages')
author = models.ForeignKey(User)
...
Now a client wants to create a new thread with first message in it. It has first to do a POST /threads to create a new thread and find out its id and then do POST /messages passing the found id in thread field.
I am thinking if it's reasonable and possible to do all of this in one request from Ember like:
POST /messages
{"message": {"text": "text", ...},
"thread": {"entity": 1}}
And the response would be:
{"message": {"text": "text", "id": 5678, "thread": 1234, ...},
"thread": {"entity": 1, "id": 1234, ...}}
Yes it is perfectly reasonable.
People seem to interpret REST in a very strange and largely ignorant way. A cursory read of the HTTP RFC 7231 for POST and PUT will confirm you are on solid ground.
A resource representation can represent ANYTHING. The key thing is to preserve the semantics of the REST operations. So PUT can be used for both CREATE and REPLACE like operations (I tend to think of PUT being REPLACE rather than UPDATE as REPLACE is closer to an idempotent semantic than UPDATE in my mind)
A PUT to an endpoint where supported, should accept whatever representation a GET returns. A POST can do literally anything you want as it doesn't need to support idempotent semantics.
HTTP and REST are designed and intended to support resource representations that may overlap other resources, and the RFC is explicit about this. You do this all the time when doing a GET on a collection endpoint.
You are NOT breaking REST by having a thread containing a child message in a single request, and IMO that is a very valid use case for sane referential integrity on the server. Any time a transactional semantic is required, a POST or PUT is perfectly valid to create a graph of objects on the server in a single request. It is really simple: if you can GET it in a single request, you should be able to PUT it in a single request, so think carefully about your URLs and parameters.
For example, you may have a thread endpoint that returns all messages, and that endpoint may support a parameter to just return some subset of the information: /api/threads?include=hasRead, which returns just id and hasRead for each message in the thread, or perhaps just some range of 'pages'. You can then PUT using that same endpoint and parameters and just update the hasRead property in bulk.
Anyone who gets hung up on this has probably never considered access controls either. Access control necessitates a different view of a resource from one user to another based on what they are allowed to access. This different view of a resource is conveyed in HTTP auth headers and/or in the request URL; again REST is not being broken by sub-setting or overlapping resources.
So go ahead and create the minimal graph of objects you need and either PUT or POST them. I use V4 UUID's so clients can assign ID's (and thus resource endpoints) themselves and this allows me to use PUT for both create and replace like actions and wire up complex object graphs without client<->server id mapping issues.
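Here is a rough sketch of the client-assigned-id approach, using an in-memory store in place of a real server. The class, the helper and the 201/200 return values are illustrative assumptions; the point is that repeating the same PUT leaves the server in the same state.

```python
import uuid


class ThreadStore:
    """Stand-in for the server side of PUT /threads/<id>."""

    def __init__(self):
        self._threads = {}

    def put(self, thread_id: str, representation: dict) -> int:
        """Create or replace the thread; returns an HTTP-like status code."""
        created = thread_id not in self._threads
        self._threads[thread_id] = representation
        return 201 if created else 200

    def get(self, thread_id: str) -> dict:
        return self._threads[thread_id]


def new_thread_with_message(entity: int, text: str) -> tuple:
    """Build a thread plus its first message, with ids chosen by the client."""
    thread_id = str(uuid.uuid4())
    message_id = str(uuid.uuid4())
    thread = {"entity": entity, "messages": [{"id": message_id, "text": text}]}
    return thread_id, thread
```

Because the client already knows both ids before sending anything, a retried request after a timeout replaces the same resource instead of creating a duplicate thread.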
What you are trying to do will break the concept of REST and of EmberJS itself.
If you have two separate APIs you should make two REST calls.
First save the parent thread model; after a successful return, save the child message. Then use addObject to reflect the changes in views.
This is the best way. Don't try to optimize by reducing API calls here and breaking REST along the way.

REST - What exactly is meant by Uniform Interface?

Wikipedia has:
Uniform interface
The uniform interface constraint is fundamental to the design of any REST service.[14] The uniform interface simplifies and decouples the architecture, which enables each part to evolve independently. The four guiding principles of this interface are:
Identification of resources
Individual resources are identified in requests, for example using URIs in web-based REST systems. The resources themselves are conceptually separate from the representations that are returned to the client. For example, the server may send data from its database as HTML, XML or JSON, none of which are the server's internal representation, and it is the same one resource regardless.
Manipulation of resources through these representations
When a client holds a representation of a resource, including any metadata attached, it has enough information to modify or delete the resource.
Self-descriptive messages
Each message includes enough information to describe how to process the message. For example, which parser to invoke may be specified by an Internet media type (previously known as a MIME type). Responses also explicitly indicate their cacheability.
Hypermedia as the engine of application state (A.K.A. HATEOAS)
Clients make state transitions only through actions that are dynamically identified within hypermedia by the server (e.g., by hyperlinks within hypertext). Except for simple fixed entry points to the application, a client does not assume that any particular action is available for any particular resources beyond those described in representations previously received from the server.
I'm listening to a lecture on the subject and the lecturer has said:
"When someone comes up to our API, if you are able to get a customer object and you know there are order objects, you should be able to get the order objects with the same pattern that you got the customer objects from. These URI's are going to look like each other."
This strikes me as wrong. It's not so much about what the URI's look like or that there is consistency as it is the way in which the URI's are used (identify resources, manipulate the resources through representations, self-descriptive messages, and hateoas).
I don't think that's what Uniform Interface means at all. What exactly does it mean?
Using interfaces to decouple classes from the implementation of their dependencies is a pretty old concept. In REST you use the same concept to decouple the client from the implementation of the REST service. In order to define such an interface (a contract between the client and the service), you have to use standards. This is because if you want an internet size network of REST services, you have to enforce global concepts, like standards to make them understand each other.
Identification of resources - You use the URI (IRI) standard to identify a resource. In this case, a resource is a web document.
Manipulation of resources through these representations - You use the HTTP standard to describe communication. So for example GET means that you want to retrieve data about the URI-identified resource. You can describe an operation with an HTTP method and a URI.
Self-descriptive messages - You use standard MIME types and (standard) RDF vocabs to make messages self-descriptive. So the client can find the data by checking the semantics, and it doesn't have to know the application-specific data structure the service uses.
Hypermedia as the engine of application state (a.k.a. HATEOAS) - You use hyperlinks and possibly URI templates to decouple the client from the application-specific URI structure. You can annotate these hyperlinks with semantics e.g. IANA link relations, so the client will understand what they mean.
The Uniform Interface constraint, that any ReSTful architecture should comply with, actually means that, along with the data, server responses should also announce available actions and resources.
In chapter 5 ("Representational State Transfer") of his dissertation, Roy Fielding states that the aim of using uniform interfaces is to:
ease and improve global architecture and the visibility of interactions
In other words, querying resources should allow the client to request other actions and resources without knowing them in advance.
The JSON-API specs (jsonapi.org) offer a good example in the form of a JSON response to a (hypothetical) GET HTTP request on http://example.com/articles :
{
"links": {
"self": "http://example.com/articles",
"next": "http://example.com/articles?page[offset]=2",
"last": "http://example.com/articles?page[offset]=10"
},
"data": [{
"type": "articles",
"id": "1",
"attributes": {
"title": "JSON API paints my bikeshed!"
},
"relationships": {
"author": {
"links": {
"self": "http://example.com/articles/1/relationships/author",
"related": "http://example.com/articles/1/author"
}
},
"comments": {
"links": {
"self": "http://example.com/articles/1/relationships/comments",
"related": "http://example.com/articles/1/comments"
}
}
},
"links": {
"self": "http://example.com/articles/1"
}
}]
}
Just by analysing this single response, a client knows:
What entities were queried ("articles" in this example);
How these entities are structured (articles have fields: id, title, author, comments);
How to retrieve related entities (i.e. the author and the comments);
That there are more entities of type "articles" (10, based on current response length and pagination links).
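As a small illustration of the list above, a client could pull the follow-up URLs straight out of a response shaped like this one, without any hard-coded paths. Both helper names are hypothetical, and the sketch only covers the links shown in the example.

```python
def related_links(resource: dict) -> dict:
    """Map each relationship name to its "related" URL, if the server supplied one."""
    found = {}
    for name, relationship in resource.get("relationships", {}).items():
        related = relationship.get("links", {}).get("related")
        if related is not None:
            found[name] = related
    return found


def next_page(document: dict):
    """Return the URL of the next page, or None when the server offers none."""
    return document.get("links", {}).get("next")
```

Applied to the first article above, related_links() yields the author and comments URLs, and next_page() on the whole document yields the page[offset]=2 link.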
For those passionate about the topic, I strongly recommend reading Roy Thomas Fielding's dissertation!
Your question is somewhat broad; you seem to be asking for a restatement of the definitions you have. Are you looking for examples, or is there something specific that you do not understand?
I agree that the line:
These URI's are going to look like each other
is fundamentally wrong. URIs needn't look anything like each other for the Uniform interface constraint to be met. What needs to be present is a uniform way to discover the URIs that identify the resources. This uniform way is unique to each message type, and there must be some agreed upon format. For example in HTML one document resource links to another via a simple tag:
<a href="URI">fallback relationship</a>
HTTP servers return html as a text/html resource type which browsers have an agreed upon way of parsing. The anchor tag is the hypermedia control (HATEOAS) that has the unique identifier for the related resource.
The only point that wasn't covered was manipulation. HTML has another awesome example of this, the form tag:
<form action="URI" method="verb">
<input name=""></input>
</form>
Again, browsers know how to interpret this meta information to define a representation of the resource acted upon at the URI. Unfortunately, HTML only lets you use GET and POST as verbs...
More commonly, in a JSON-based service, when you retrieve a Person resource, it's easy to manipulate that representation and then PUT or PATCH it right back to its canonical URL. No pre-existing knowledge of the resource is needed to modify it. Now, when we write client code we get all wrapped up with the idea that we do in fact need to know the shape before we consume it... but that really is just to make our parsers efficient and easy. We could make parsers that analyze the semantic meaning of each part of a resource and modify it by interpreting the intent of the modification. For example, a command to make the person 10 years older would parse the resource looking for the age, identify the age, add 10 years to that value, and then send that resource back to the server. Is it easier to have code that expects the age to be at a JSON path of $.age? Absolutely... but it's not strictly necessary.
OK, I think I understand what it means.
From Fielding's dissertation:
The central feature that distinguishes the REST architectural style from other network-based styles is its emphasis on a uniform interface between components (Figure 5-6). By applying the software engineering principle of generality to the component interface, the overall system architecture is simplified and the visibility of interactions is improved.
He's saying that the interface between components must be the same, i.e. between client and server and any intermediaries, all of which are components.