REST API - string or numerical identifier in URL - rest

We're developing a REST API for our platform. Let's say we have organisations and projects, and projects belong to organisations.
After reading this answer, I would be inclined to use numerical ID's in the URL, so that some of the URLs would become (say with a prefix of /api/v1):
/organisations/1234
/organisations/1234/projects/5678
However, we want to use the same URL structure for our front end UI, so that if you type these URLs in the browser, you will get the relevant webpage in the response instead of a JSON file. Much in the same way you see relevant names of persons and organisations in sites like Facebook or Github.
Using this, we could get something like:
/organisations/dutchpainters
/organisations/dutchpainters/projects/nightwatch
It looks like Github actually exposes their API in the same way.
The advantages and disadvantages I can come up with for using names instead of IDs for URL definitions, are the following:
Advantages:
More intuitive URLs for end users
1 to 1 mapping of front end UI and JSON API
Disadvantages:
Have to use unique names
Have to take care of conflict with reserved names, such as count, so later on, you can still develop an API endpoint like /organisations/count and actually get the number of organisations instead of the organisation called count.
Especially the latter one seems to become a potential pain in the rear. Still, after reading this answer, I'm almost convinced to use the string identifier, since it doesn't seem to make a difference from a convention point of view.
My questions are:
Did I miss important advantages / disadvantages of using strings instead of numerical IDs?
Did Github develop their string-based approach after their platform matured, or did they know from the start that it would imply some limitations (like the one I mentioned earlier, it seems that they did not implement such functionality)?

It's common to use a combination of both:
/organisations/1234/projects/5678/nightwatch
where the last part is simply ignored but used to make the url more readable.
In your case, with multiple levels of collections you could experiment with this format:
/organisations/1234/dutchpainters/projects/5678/nightwatch
If somebody writes
/organisations/1234/germanpainters/projects/5678/wanderer
it would still map to the rembrandt, but that should be ok. That will leave room for editing the names without messing up url:s allready out there. Also, names doesn't have to be unique if you don't really need that.

Reserved HTTP characters: such as “:”, “/”, “?”, “#”, “[“, “]” and “#” – These characters and others are “reserved” in the HTTP protocol to have “special” meaning in the implementation syntax so that they are distinguishable to other data in the URL. If a variable value within the path contains one or more of these reserved characters then it will break the path and generate a malformed request. You can workaround reserved characters in query string parameters by URL encoding them or sometimes by double escaping them, but you cannot in path parameters.
https://www.serviceobjects.com/blog/path-and-query-string-parameter-calls-to-a-restful-web-service

Numerical consecutive IDs are not recommended anymore because it is very easy to guess records in your database and some might use that to obtain info they do not have access to.
Numerical IDs are used because the in the database it is a fixed length storage which makes indexing easy for the database. For example INT has 4 bytes in MySQL and BIGINT is 8 bytes so the number have the same length in memory (100 in INT has the same length as 200) so it is very easy to index and search for records.
If you have a lot of entries in the database then using a VARCHAR field to index is a bad idea. You should use a fixed width field like CHAR(32) and fill the difference with spaces but you have to add logic in your program to treat the differences when searching the database.
Another idea would be to use slugs but here you should take into consideration the fact that some records might have the same slug, depends on what are you using to form that slug. https://en.wikipedia.org/wiki/Semantic_URL#Slug
I would recommend using UUIDs since they have the same length and resolve this issue easily.

Related

Datatype for a URL in PostgreSQL

I need to store a URL in a PostgreSQL table. What is the best datatype for a field that will hold a URL with an undetermined length?
Thanks in advance.
The answer depends on what you intend to do with the data.
If you just need to store some uris in order to print them when requested, the text datatype seems indicated. There seems to be no standard about the maximum length of an url (note that browsers have their own limits, for example at least some years ago IE was limited to 2083 characters, but this is unrelated to our problem).
If you need some advanced operations on uris (for example, computing the base uri or extract some other parts), then you may want to use some libraries designed for this purpose. One example of such library (actually I know of no alternative) is pguri.

How to structure a RESTful URI with mulitple inter-related parameters

I'm building a RESTful API in which the user can issue a query about a given object, with a weight attached to that object. E.g.:
http://host.domain.com/cars?id=100&weight=50
(This is a contrived, simplified example, so apologies if this doesn't make much semantic sense!)
The complication is that the user might need to combine multiple objects in a single query. What I'm wondering is if there is a standard RESTful way to do this? For example, options that occur to me include:
http://host.domain.com/cars?id1=100&weight1=50&id2=200&weight2=90
http://host.domain.com/cars?ids=100,200&weights=50,90
I don't like the second one, because, for example, weights are optional, so you'd need to allow something like this:
http://host.domain.com/cars?ids=100,200&weights=,90
The first one seems preferable to me, but it seems like it could become complicated, particularly as I already have indexed arguments (e.g. x1, x2) meaning I'll need to have two levels of indexes (x1_1, x1_2, ...)
Anyone know of a standard approach to this kind of thing? Or can anyone think of a pragmatic, sensible solution?
I am not sure your question is covered by Cool URIs - http://www.w3.org/TR/cooluris/
My personal choice, with no citations to support it, would be to firstly get rid of the query string using the server configuration (redirects or aliases), so that the base resource would appear as:
http://host.domain.com/cars
The list of IDs and weights could then be appended (in the URI's 'path info'), delimited as you see fit -- semi-colons, or slashes. My choice would be the latter, simply as it makes the URI cleaner to read and easier to type. The only time that becomes a problem is if weights are sometimes omitted, though that could be overcome if the IDs were alphanumeric (perhaps hashes), and the weights always numeric.
I still don't know if this is right or not, and LeeGee's suggestion seems reasonable, but I've ended up going with something like this:
http://host.domain.com/cars?id_1=100&weight_1=50&id_2=200&weight_2=90
It ends up creating ugly looking URIs, but it seems to me that they're consistent, and unambiguous, particularly when optional arguments are omitted.

Benefits of RESTful URL

What are the benefits of
http://www.example.com/app/servlet/cat1/cat2/item
URL
over
http://www.example.com/app/servlet?catid=12345
URL
Could there be any problems if we use first URL because initially we were using the first URL and change to second URL. This is in context of large constantly changing content on website. Here categories can be infinite in number.
In relation to a RESTful application, you should not care about the URL template. The "better" one is the one that is easier for the application to generate.
In relation to indexing and SEO, sorry, but it is unlikely that the search engines are going to understand your hypermedia API to be able to index it.
To get a better understanding in regards to the URLs, have a look at:
Is That REST API Really RPC? Roy Fielding Seems to Think So
Richardson Maturity Model
One difference is that the second URL doesn't name the categories, so the client code and indeed human users need to look up some category name to number mapping page first, store those mappings, use them all the time, and refresh the list when previously unknown categories are encountered etc.. Given the first URL you necessarily know the categories even if the item page doesn't mention them (but the site may still need a list of categories somewhere anyway).
Another difference is that the first format encodes two levels of categorisation, whereas the second hides the number of levels. That might make things easier or harder depending on how variable you want the depth to be (now or later) and whether someone inappropriately couples code to 2-level depth (for example, by parsing the URLs with a regexp capturing the categories using two subgroups). Of course, the same problem could exist if they couple themselves to the current depth of categories listed in a id->category-path mapping page anyway....
In terms of SEO, if this is something you want indexed by search engines the first is better assuming the category names are descriptive of the content under them. Most engines favor URLs that match the search query. However, if category names can change you likely need to maintain 301 redirects when they do.
The first form will be better indexed by search engines, and is more cache friendly. The latter is both an advantage (you can decrease the load on your server) and a disadvantage (you aren't necessarily aware of people re-visiting your page, and page changes may not propagate immediately to the users: a little care must be taken to achieve this).
The first form also requires (somewhat) heavier processing to get the desired item from the URL.
If you can control the URL syntax, I'd suggest something like:
http://www.example.com/app/servlet/cat1/cat2/item/12345
or better yet, through URL rewrite,
http://www.example.com/cat1/cat2/item/12345
where 12345 is the resource ID. Then when you access the data (which you would have done anyway), are able to do so quickly; and you just verify that the record does match cat1, cat2 and item. Experiment with page cache settings and be sure to send out ETag (maybe based on ID?) and Last-Modified headers, as well as checking If-Modified-Since and If-None-Match header requests.
What we have here is not a matter of "better" indexing but of relevancy.
And so, 1st URL will mark your page as a more relevant to the subject (assuming correlation between page/cat name and subject matter).
For example: Let`s say we both want to rank for "Red Nike shoes", say (for a simplicity sake) that we both got the same "score" on all SEO factors except for URL.
In 1st case the URL can be http://www.example.com/app/servlet/shoes/nike/red-nice
and in the second http://www.example.com/app/servlet?itemid=12345.
Just by looking on both string you can intuitively sense which one is more relevant...
The 1st one tells you up-front "Heck yes, I`m all about Red Nike Shoes" while the 2nd one kinda mumbles "Red Nike Shoes? Did you meant item code 12345?"
Also, Having part of the KW in the URL will help you get more relevancy and also it can help you win "long-tail" goals without much work. (just having KW in URL can sometimes be enough)
But the issue goes even deeper.
The second type of URL includes parameters and those can (an 99.9% will) lead to duplicated content issue. When using parameters you`ll have to deal with questions like:
What happens for non-existent catid?
Is there a parameter verification? (and how full proof is it?)
and etc.
So why choose the second version? Because sometime you just don`t have a choice... :)

REST best practice for getting a subset list

I read the article at REST - complex applications and it answers some of my questions, but not all.
I am designing my first REST application and need to return "subset" lists to GET requests. Which of the following is more "RESTful"?
/patients;listType=appointments;date=2010-02-22;user_id=1234
or
/patients/appointments-list;date=2010-02-22;user_id=1234
or even
/appointments/2010-02-22/patients;user_id=1234
There will be about a dozen different lists that I need to return. In some of these, there will be several filtering parameters and I don't want to have big 'if' statements in my server code to select the subsets based on which parameters are present. For example, I might need all patients for a specific doctor where the covering doctor is another and the primary doctor is yet another. I could select with
/patients;rounds=true;specific_id=xxxx;covering_id=yyyy;primary_id=zzzz
but that would require complicated branching logic to get the right list, where asking for a specific subset (rounds-list) will achieve that same thing.
Note that I need to use matrix parameters instead of query parameters because I need to do filtering at several levels of the URL. The framework I am using (RestEasy), fully supports matrix parameters.
Ralph,
the particular URI patterns are orthogonal to the question how RESTful your application will be.
What matters with regard to RESTfulness is that the client discovers how to construct the URIs at runtime. This can be achieved either with forms or URI templates. Both hypermedia controls tell the client what parameters can be used and where to put them in the URI.
For this to work RESTfully, client and server must know the possible parameters at design time. This is usually achieved by making them part of the specification of the link relationship.
You might for example define a 'my-subset' link relation to have the meaning of linking to subsets of collections and with it you would define the following parameters:
listType, date, userID.
In a link template that spec could be used as
<link rel="my-subset' template="/{listType}/{date}/patients;user_id={userID}"/>
Note how the actual parameter name in the URI is decoupled from the specified parameter name. The value for userID is late-bound to the URI parameter user_id.
This makes it possible for the URI parameter name to change without affecting the client.
You can look at OpenSearch description documents (http://www.opensearch.org) to see how this is done in practice.
Actually, you should be able to leverage OpenSearch quite a bit for your use case. Especially the ability to predefine queries would allow you to describe particular subsets in your 'forms'.
But see for yourself and then ask back again :-)
Jan
I would recommend that you use this URL structure:
/appointments;user_id=1234;date=2010-02-22
Why? I chose /appointments because it is simple and clear. (If you have more than one kind of appointment, let me know in the comments and I can adjust my answer.) I chose the semicolons because they don't imply hierarchy between user_id and date.
One more thing, there is no reason why you should limit yourself to just one URL. It is just fine to have multiple URL structures that refer to the same resource. So you might also use:
/users/1234/appointments;date=2010-02-22
To return a similar result.
That said, I would not recommend using /dates/2010-02-22/appointments;user_id=1234. Why? I don't think, in practice, that /dates refers to a resource. Date is an attribute of an appointment but is not a noun on its own (i.e. it is not a first-class kind of thing).
I can relate to what David James answered.
The format of your URIs can be like he suggested:
/appointments;user_id=1234;date=2010-02-22
and / or
/users/1234/appointments;date=2010-02-22
while still maintaining the discoverability (at runtime) of your resource's URIs (like Jan Algermissen suggested).

How can I generate a unique, small, random, and user-friendly key?

A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. Guid's were out of the question, they were simply too big and difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to less than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32bit id from the database. We then inserted it into the center bits of a 64bit RANDOM integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9 skipping easily confused characters such as L,l,1,O,0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
The final result was a code that was much smaller than a guid and looked random, even though it absolutely wasn't.
I was never satisfied with this particular implementation. What would you guys have done?
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so obscure words are removed and that words which are too long are excluded.
Then my generation algorithm would pick 2 words from the list and concatenate them and add a random 3 digits number.
I can also randomize word selection pattern between verb/nouns like
eatCake778
pickBasket524
rideFlyer113
etc..
the case needn't be camel casing, you can randomize that as well. You can also randomize the placement of the number and the verb/noun.
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to make sure that my algorithms should never collide. If the collision rate was high, then I'd play with the parameters (amount of nouns used, amount of verbs used, length of random number, total number of words, different kinds of casings etc.)
In .NET you can use the RNGCryptoServiceProvider method GetBytes() which will "fill an array of bytes with a cryptographically strong sequence of random values" (from ms documentation).
byte[] randomBytes = new byte[4];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
You can increase the lengh of the byte array and pluck out the character values you want to allow.
In C#, I have used the 'System.IO.Path.GetRandomFileName() : String' method... but I was generating salt for debug file names. This method returns stuff that looks like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
If by user friendly, you mean that a user could type the answer in then I think you would want to look in a different direction. I've seen and done implementations for initial random passwords that pick random words and numbers as an easier and less error prone string.
If though you're looking for a way to encode a random code in the URL string which is an issue I've dealt with for awhile then I what I have done is use 64-bit encoded GUIDs.
You could load your list of words as chakrit suggested into a data table or xml file with a unique sequential key. When getting your random word, use a random number generator to determine what words to fetch by their key. If you concatenate 2 of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.