Advanced Queries in REST

I'm trying to create a more advanced query mechanism for REST. Assume I have the following:
GET /data/users
and it returns a list of users. Then, to filter the returned users, I'd say, for example:
GET /data/users?age=30
to get a list of 30-year-old users. Now let's say I want users aged 30 to 40. I'd like to have essentially a set of reusable operators, such as:
GET /data/users?greaterThan(age)=30&lessThan(age)=40
The greaterThan and lessThan operators would be reusable on other numeric, date, etc. fields. This would also allow me to add other operators (contains, starts with, ends with, etc.). I'm a REST noob, so I'm not sure whether this violates any of the core principles REST follows. Any thoughts?

Alternatively, you might simply be better off with optional "minAge" and "maxAge" parameters.
Alternative 2: encode the parameter values to indicate the test to be performed: inequalities, pattern matching, etc.
This gets messy no matter what you do once you need complex Boolean expressions. At some point you almost want a document format for the query description itself, but then it's hard to think of it as a "GET" anymore.

I would look into setting the value of the query parameter to include syntax for operators and the like. Something like this for a range of values:
/data/users?age=[30,40]
or
/data/users?age=>30&age=<40
would make it a little easier to read; just make sure to URL-encode the values if you are using any reserved characters.
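
For what it's worth, here is a minimal sketch (Python; the operator prefixes and the field names are just an assumption, not an established convention) of how values such as age=>30&age=<40 could be parsed into (field, operator, value) filters on the server:

from urllib.parse import parse_qsl

# Hypothetical operator prefixes embedded in the query-string value,
# e.g. /data/users?age=>30&age=<40 sent as age=%3E30&age=%3C40.
OPERATORS = {">=": "gte", "<=": "lte", ">": "gt", "<": "lt"}

def parse_filters(query_string):
    """Turn 'age=%3E30&age=%3C40' into [('age', 'gt', '30'), ('age', 'lt', '40')]."""
    filters = []
    for field, raw in parse_qsl(query_string):
        op, value = "eq", raw
        for prefix in sorted(OPERATORS, key=len, reverse=True):  # try '>=' before '>'
            if raw.startswith(prefix):
                op, value = OPERATORS[prefix], raw[len(prefix):]
                break
        filters.append((field, op, value))
    return filters

print(parse_filters("age=%3E30&age=%3C40"))
# [('age', 'gt', '30'), ('age', 'lt', '40')]

The same parser extends naturally to other prefixes (for contains, starts-with, and so on), as long as every prefix is URL-encoded by the client.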

REST API - string or numerical identifier in URL

We're developing a REST API for our platform. Let's say we have organisations and projects, and projects belong to organisations.
After reading this answer, I would be inclined to use numerical IDs in the URL, so that some of the URLs would become (say, with a prefix of /api/v1):
/organisations/1234
/organisations/1234/projects/5678
However, we want to use the same URL structure for our front-end UI, so that if you type these URLs in the browser, you will get the relevant web page in the response instead of a JSON file, much in the same way you see the names of people and organisations in the URLs of sites like Facebook or GitHub.
Using this, we could get something like:
/organisations/dutchpainters
/organisations/dutchpainters/projects/nightwatch
It looks like GitHub actually exposes their API in the same way.
The advantages and disadvantages I can come up with for using names instead of IDs in URLs are the following:
Advantages:
More intuitive URLs for end users
1 to 1 mapping of front end UI and JSON API
Disadvantages:
Have to use unique names
Have to take care of conflicts with reserved names, such as count, so that later on you can still develop an API endpoint like /organisations/count and actually get the number of organisations instead of the organisation called count.
Especially the latter seems like it could become a potential pain in the rear. Still, after reading this answer, I'm almost convinced to use the string identifier, since it doesn't seem to make a difference from a convention point of view.
My questions are:
Did I miss important advantages / disadvantages of using strings instead of numerical IDs?
Did GitHub develop their string-based approach after their platform matured, or did they know from the start that it would imply certain limitations (like the one I mentioned earlier; it seems they did not implement such functionality)?
It's common to use a combination of both:
/organisations/1234/projects/5678/nightwatch
where the last part is simply ignored but used to make the URL more readable.
In your case, with multiple levels of collections you could experiment with this format:
/organisations/1234/dutchpainters/projects/5678/nightwatch
If somebody writes
/organisations/1234/germanpainters/projects/5678/wanderer
it would still map to the Rembrandt, but that should be OK. This leaves room for editing the names without breaking URLs that are already out there. Also, names don't have to be unique if you don't really need them to be.
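
A minimal sketch (Python; the route pattern and lookup are hypothetical) of resolving such a URL by its numeric IDs only, treating the trailing names as readable decoration:

import re

# Hypothetical pattern for /organisations/{id}/{slug}/projects/{id}/{slug};
# only the numeric IDs drive the lookup, the slugs are ignored.
PATTERN = re.compile(
    r"^/organisations/(?P<org_id>\d+)/(?P<org_slug>[^/]+)"
    r"/projects/(?P<project_id>\d+)/(?P<project_slug>[^/]+)$"
)

def resolve(path):
    match = PATTERN.match(path)
    if not match:
        return None
    # Lookup by ID only, so an outdated or wrong slug still resolves.
    return int(match.group("org_id")), int(match.group("project_id"))

print(resolve("/organisations/1234/dutchpainters/projects/5678/nightwatch"))  # (1234, 5678)
print(resolve("/organisations/1234/germanpainters/projects/5678/wanderer"))   # (1234, 5678)

A real implementation would typically also issue a 301 redirect to the URL with the current slug whenever the supplied one is outdated.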
Reserved characters, such as ":", "/", "?", "#", "[", "]" and "@" – these characters and others are "reserved" in the URL syntax to carry special meaning, so that they are distinguishable from other data in the URL. If a variable value within the path contains one or more of these reserved characters, it will break the path and generate a malformed request. You can work around reserved characters in query string parameters by URL-encoding them, or sometimes by double escaping them, but you cannot do so in path parameters.
https://www.serviceobjects.com/blog/path-and-query-string-parameter-calls-to-a-restful-web-service
Consecutive numerical IDs are not recommended anymore, because they make it very easy to guess records in your database, and someone might use that to obtain information they should not have access to.
Numerical IDs are used because they have a fixed-length representation in the database, which makes indexing easy. For example, an INT is 4 bytes in MySQL and a BIGINT is 8 bytes, so every value has the same length in memory (100 stored as an INT takes the same space as 200), which makes it very easy to index and search for records.
If you have a lot of entries in the database, then indexing on a VARCHAR field is a bad idea. You should use a fixed-width field like CHAR(32) and pad the difference with spaces, but then you have to add logic in your program to handle that padding when searching the database.
Another idea would be to use slugs, but here you should take into consideration that some records might end up with the same slug, depending on what you use to form it. https://en.wikipedia.org/wiki/Semantic_URL#Slug
I would recommend using UUIDs, since they all have the same length and avoid these issues.
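
As a small illustration of the fixed-length point (plain Python, no database involved), a UUID always renders at the same width; its hex form is 32 characters, which maps naturally onto a CHAR(32) column:

import uuid

# uuid4() produces a random, hard-to-guess identifier.
record_id = uuid.uuid4()

print(str(record_id))      # canonical form, always 36 characters
print(record_id.hex)       # hex form, always 32 characters, fits CHAR(32)
print(len(record_id.hex))  # 32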

How do I create a validator for a single collection?

I need to build a custom id validator that will apply to a single collection, whose id will always be pre-defined (won't need a generator).
In the docs about id generators, it's written:
Currently the configuration of the custom generator applies to every resources (buckets, groups, collections, records). This tiny limitation can easily be fixed, don’t hesitate to get in touch with us!
But there is nothing documented about id validation.
So, how do I:
Implement an id validator, that
Will apply to one collection only?
By default, cliquet uses a generator whose IDs must match the regular expression r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$' (all letters and digits, plus underscore and "-").
Before you choose a different ID validation mechanism, make sure you really need one.
Now, if that's not enough, you would need to select the proper validator depending on some configuration or on already existing values, but this is not implemented in cliquet / kinto.
https://github.com/mozilla-services/cliquet/blob/master/cliquet/resource/__init__.py#L147 is probably a good place to look at / start from.
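
This is not the actual cliquet/kinto extension point, just a standalone sketch (the collection name and the stricter rule are hypothetical) of the kind of per-collection check described above: fall back to the default pattern unless a stricter one is registered for that collection.

import re

# Default cliquet/kinto ID pattern quoted above.
DEFAULT_ID = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9_-]*$')

# Hypothetical stricter rule for a single collection
# (here: 8 lowercase hex characters, purely as an illustration).
PER_COLLECTION_ID = {
    "my-collection": re.compile(r'^[0-9a-f]{8}$'),
}

def is_valid_id(collection, record_id):
    """Use the collection-specific pattern when one exists, else the default."""
    pattern = PER_COLLECTION_ID.get(collection, DEFAULT_ID)
    return bool(pattern.match(record_id))

print(is_valid_id("my-collection", "deadbeef"))  # True
print(is_valid_id("other", "some_record-1"))     # True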

Complex URL handling conception

I'm currently struggling with a complex URL-handling design question. The application has a product property database table/collection with all the different product properties (i.e. categories, colors, manufacturers, materials, etc.):
{_id:1,alias:"mercedes-benz",type:"brand"},
{_id:2,alias:"suv-cars",type:"category"},
{_id:3,alias:"cars",type:"category"},
{_id:4,alias:"toyota",type:"manufacturer"},
{_id:5,alias:"red",type:"color"},
{_id:6,alias:"yellow",type:"color"},
{_id:7,alias:"bmw",type:"manufacturer"},
{_id:8,alias:"leather",type:"material"}
...
Now the mission is to handle URL requests in the style below, in every(!) possible order, to retrieve the product properties they contain. The only allowed separator is the dash (a settled SEO requirement; some property values also contain dashes themselves, which I think is an important point, e.g. the category "suv-cars" or the manufacturer "mercedes-benz"):
http://www.example.com/{category}-{color}-{manufacturer}-{material}
http://www.example.com/{color}-{manufacturer}
http://www.example.com/{color}-{category}-{material}-{manufacturer}
http://www.example.com/{category}-{color}-nonexistingproperty-{manufacturer}
http://www.example.com/{color}-{category}-{manufacturer}
http://www.example.com/{manufacturer}
http://www.example.com/{manufacturer}-{category}-{color}-{material}
http://www.example.com/{category}
http://www.example.com/{manufacturer}-nonexistingproperty-{category}-{color}-{material}
http://www.example.com/{color}-crap-{manufacturer}
...
...so: every order of the properties should be allowed! The result has to be the information about the properties used in each URL request (and yes, the duplicate content will be handled by redirects and a predefined canonical schema). The "nonexistingproperty"/"crap" parts are possible and should simply be ignored.
UPDATE:
Idea 1: One approach I'm considering is to split the path by dashes and analyse the parts value by value. The problem: some properties consist of two, three, or more words, so there are far too many combinations and variations, which means a lot of queries; I think that kills this idea.
Idea 2: The other way is to build an (in my opinion far too large) alias/URL table with all of the different combinations, but I think that's just an ugly workaround. There are about 15,000 different properties, so the number of aliases across all the different orderings kills this idea too.
Idea 3: It's your turn! Thanks for your thoughts and your time.
While your question is a bit broad, below are some ideas. There isn't a single awesome answer unless you find a free or commercial engine for this that works exactly the way you want.
The way I thought about your problem was to consider the URL as a list of keywords.
use Lucene as a keyword/tag system. It's good at the types of searches you suggest you want, including phrases, stems, etc.
store and index the data in the DB of your choice, but pull the keywords into memory and build a bit index of all keywords vs. items (see the sketch after this list). Iterate through the keyword table producing weighted results. If the order of keywords matters, you'll also need to make a pass through the result set to weight based on word order. These types of searches always need to cap their result set quickly in order to return results quickly.
cache the results like crazy from working matches, and give precedence to results that users seem to click on the most for a given URL.
attack the database by using tag indexes in MongoDB. You'd still need to merge and weight results. Very intensive and not likely a good use of DB resources.
read some of the academic papers on keyword searches. It's a popular topic.
build a table of words that have dashes in them, and normalize/convert those before running your queries
always check for full exact matches first
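
Purely as an illustration of that in-memory keyword bit index (toy data, no weighting or word-order pass):

# One integer per keyword, where bit i is set if item i contains that keyword.
ITEMS = ["red suv cars", "yellow bmw cars", "red leather boots"]

index = {}
for i, item in enumerate(ITEMS):
    for word in item.split():
        index[word] = index.get(word, 0) | (1 << i)

def search(words):
    """AND the keyword bitmasks together and return matching item indices."""
    mask = ~0
    for word in words:
        mask &= index.get(word, 0)
    return [i for i in range(len(ITEMS)) if mask & (1 << i)]

print(search(["red", "cars"]))  # [0]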
The only way this may work is if you restrict all property values to be unique. So you make a single set of categories + colors + manufacturers, etc., in which all values have to be unique. This allows you to find which property a given value belongs to.
The data structure for this should be fairly simple:
{_id:ValueOfTheProperty, Property:TypeOfProperty}
Here are some possible samples:
{ _id: Red, Property: Color }
{ _id: Green, Property: Color }
{ _id: Boots, Property: Category }
{ _id: Shoes, Property: Category }
...
This way, the order does not matter, and you are able to convert them in a single pass to a map:
{ Color: Red, Category: Boots }
Though I predict some problems with ambiguous names here.
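
Below is a rough sketch of that single pass (Python, using the sample aliases from the question): split the path on dashes, then greedily re-join tokens from left to right, preferring the longest alias present in the lookup table and skipping unknown tokens such as "crap". Ambiguous or overlapping values would still need extra handling.

# Alias -> property type, built from the properties collection
# (values are assumed to be globally unique, as suggested above).
ALIASES = {
    "mercedes-benz": "brand", "suv-cars": "category", "cars": "category",
    "toyota": "manufacturer", "red": "color", "yellow": "color",
    "bmw": "manufacturer", "leather": "material",
}
MAX_TOKENS = max(alias.count("-") + 1 for alias in ALIASES)

def parse_path(path):
    """'red-suv-cars-crap-bmw' -> {'color': 'red', 'category': 'suv-cars', 'manufacturer': 'bmw'}"""
    tokens = path.strip("/").split("-")
    result, i = {}, 0
    while i < len(tokens):
        for size in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):  # longest match first
            candidate = "-".join(tokens[i:i + size])
            if candidate in ALIASES:
                result[ALIASES[candidate]] = candidate
                i += size
                break
        else:
            i += 1  # unknown token such as 'crap': ignore it
    return result

print(parse_path("red-suv-cars-crap-bmw"))
# {'color': 'red', 'category': 'suv-cars', 'manufacturer': 'bmw'}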

REST search query: handling special characters and design standards

I am designing a rest api where users can pass in queries using a search query language I will define.
The language will allow a number of operators: eq, ne, gt, lt (equals, not equals, greater than, less than), and so on.
The language will allow grouping and logical operators AND and OR.
So, for example, a query about companies may look like the following:
/api/companies?q=(CompanyName eq Microsoft Or CompanyName eq Apple) And State eq California
So this should give me all companies where company name equals 'Microsoft' or 'Apple' and the state is California.
This all works fine, except that the system I am writing the API against is extremely flexible and allows almost any character to be inserted into field values. Additionally, I must also support custom fields, and those can have special characters in the field name.
Initially my main concern was field values that contain parentheses. I will be converting this query into a SQL Server query, and I need a way to ensure that I do not confuse a parenthesis in a field value with one intended for grouping. My second thought was to force field values to be quoted, but I think this will cause similar problems.
I was also considering that there might be a simple approach involving HTML encoding, but I am unable to see exactly how that would work.
What I am looking for is any advice or examples of reasonable approaches to handle a rest search query with such flexible data.
You should use percent-encoding to escape characters in your query string; see RFC 3986. This previous Stack Overflow post contains some useful background information about URI encoding.
Initially my main concern was field values that contain parentheses. I will be converting this query into a SQL Server query, and I need a way to ensure that I do not confuse a parenthesis in a field value with one intended for grouping.
If this might be a problem then it sounds like your application will be susceptible to SQL injection. You should be escaping any external data before constructing an SQL query.
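
One common way to avoid that (shown here as a minimal sketch, with sqlite3 standing in for a SQL Server driver and hypothetical table and column names) is to whitelist the columns and operators and pass the values as bound parameters instead of splicing them into the SQL text:

import sqlite3  # stand-in for any DB-API driver, e.g. pyodbc for SQL Server

# Whitelisted columns and operators; anything else is rejected outright.
COLUMNS = {"CompanyName", "State"}
OPERATORS = {"eq": "=", "ne": "<>", "gt": ">", "lt": "<"}

def build_where(filters):
    """filters: (column, op, value) tuples already parsed from the query language."""
    clauses, params = [], []
    for column, op, value in filters:
        if column not in COLUMNS or op not in OPERATORS:
            raise ValueError(f"unsupported filter: {column} {op}")
        clauses.append(f"{column} {OPERATORS[op]} ?")  # value becomes a bound parameter
        params.append(value)
    return " AND ".join(clauses), params

where, params = build_where([("CompanyName", "eq", "Apple (California)"), ("State", "eq", "California")])
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (CompanyName TEXT, State TEXT)")
print(conn.execute(f"SELECT * FROM companies WHERE {where}", params).fetchall())  # []

Parentheses inside a value can no longer be mistaken for grouping, because the value never appears in the SQL text at all.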
/api/companies?q=(CompanyName eq Microsoft Or CompanyName eq Apple) And State eq California
Based on this example you could take advantage of the URI query string to better represent your query:
/api/companies?CompanyName=Microsoft%20OR%20Apple&State=California
Here is an example.
http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/

How to structure a RESTful URI with multiple inter-related parameters

I'm building a RESTful API in which the user can issue a query about a given object, with a weight attached to that object. E.g.:
http://host.domain.com/cars?id=100&weight=50
(This is a contrived, simplified example, so apologies if this doesn't make much semantic sense!)
The complication is that the user might need to combine multiple objects in a single query. What I'm wondering is whether there is a standard RESTful way to do this. For example, options that occur to me include:
http://host.domain.com/cars?id1=100&weight1=50&id2=200&weight2=90
http://host.domain.com/cars?ids=100,200&weights=50,90
I don't like the second one, because, for example, weights are optional, so you'd need to allow something like this:
http://host.domain.com/cars?ids=100,200&weights=,90
The first one seems preferable to me, but it seems like it could become complicated, particularly as I already have indexed arguments (e.g. x1, x2), meaning I'd need two levels of indexes (x1_1, x1_2, ...).
Anyone know of a standard approach to this kind of thing? Or can anyone think of a pragmatic, sensible solution?
I am not sure your question is covered by Cool URIs - http://www.w3.org/TR/cooluris/
My personal choice, with no citations to support it, would be first to get rid of the query string using the server configuration (redirects or aliases), so that the base resource would appear as:
http://host.domain.com/cars
The list of IDs and weights could then be appended (in the URI's 'path info'), delimited as you see fit -- semi-colons, or slashes. My choice would be the latter, simply as it makes the URI cleaner to read and easier to type. The only time that becomes a problem is if weights are sometimes omitted, though that could be overcome if the IDs were alphanumeric (perhaps hashes), and the weights always numeric.
I still don't know if this is right or not, and LeeGee's suggestion seems reasonable, but I've ended up going with something like this:
http://host.domain.com/cars?id_1=100&weight_1=50&id_2=200&weight_2=90
It ends up creating ugly-looking URIs, but it seems to me that they're consistent and unambiguous, particularly when optional arguments are omitted.
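
For what it's worth, a small sketch (Python; the parameter names follow the URI above) of turning those indexed parameters back into a list of objects, tolerating omitted weights:

from urllib.parse import parse_qs

def parse_indexed(query_string):
    """'id_1=100&weight_1=50&id_2=200' -> [{'id': '100', 'weight': '50'}, {'id': '200', 'weight': None}]"""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    cars, index = [], 1
    while f"id_{index}" in params:
        cars.append({
            "id": params[f"id_{index}"],
            "weight": params.get(f"weight_{index}"),  # weight is optional
        })
        index += 1
    return cars

print(parse_indexed("id_1=100&weight_1=50&id_2=200&weight_2=90"))
# [{'id': '100', 'weight': '50'}, {'id': '200', 'weight': '90'}]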