What are the benefits of
http://www.example.com/app/servlet/cat1/cat2/item
URL
over
http://www.example.com/app/servlet?catid=12345
URL
Could there be any problems if we use first URL because initially we were using the first URL and change to second URL. This is in context of large constantly changing content on website. Here categories can be infinite in number.
In relation to a RESTful application, you should not care about the URL template. The "better" one is the one that is easier for the application to generate.
In relation to indexing and SEO, sorry, but it is unlikely that the search engines are going to understand your hypermedia API to be able to index it.
To get a better understanding in regards to the URLs, have a look at:
Is That REST API Really RPC? Roy Fielding Seems to Think So
Richardson Maturity Model
One difference is that the second URL doesn't name the categories, so the client code and indeed human users need to look up some category name to number mapping page first, store those mappings, use them all the time, and refresh the list when previously unknown categories are encountered etc.. Given the first URL you necessarily know the categories even if the item page doesn't mention them (but the site may still need a list of categories somewhere anyway).
Another difference is that the first format encodes two levels of categorisation, whereas the second hides the number of levels. That might make things easier or harder depending on how variable you want the depth to be (now or later) and whether someone inappropriately couples code to 2-level depth (for example, by parsing the URLs with a regexp capturing the categories using two subgroups). Of course, the same problem could exist if they couple themselves to the current depth of categories listed in a id->category-path mapping page anyway....
In terms of SEO, if this is something you want indexed by search engines the first is better assuming the category names are descriptive of the content under them. Most engines favor URLs that match the search query. However, if category names can change you likely need to maintain 301 redirects when they do.
The first form will be better indexed by search engines, and is more cache friendly. The latter is both an advantage (you can decrease the load on your server) and a disadvantage (you aren't necessarily aware of people re-visiting your page, and page changes may not propagate immediately to the users: a little care must be taken to achieve this).
The first form also requires (somewhat) heavier processing to get the desired item from the URL.
If you can control the URL syntax, I'd suggest something like:
http://www.example.com/app/servlet/cat1/cat2/item/12345
or better yet, through URL rewrite,
http://www.example.com/cat1/cat2/item/12345
where 12345 is the resource ID. Then when you access the data (which you would have done anyway), are able to do so quickly; and you just verify that the record does match cat1, cat2 and item. Experiment with page cache settings and be sure to send out ETag (maybe based on ID?) and Last-Modified headers, as well as checking If-Modified-Since and If-None-Match header requests.
What we have here is not a matter of "better" indexing but of relevancy.
And so, 1st URL will mark your page as a more relevant to the subject (assuming correlation between page/cat name and subject matter).
For example: Let`s say we both want to rank for "Red Nike shoes", say (for a simplicity sake) that we both got the same "score" on all SEO factors except for URL.
In 1st case the URL can be http://www.example.com/app/servlet/shoes/nike/red-nice
and in the second http://www.example.com/app/servlet?itemid=12345.
Just by looking on both string you can intuitively sense which one is more relevant...
The 1st one tells you up-front "Heck yes, I`m all about Red Nike Shoes" while the 2nd one kinda mumbles "Red Nike Shoes? Did you meant item code 12345?"
Also, Having part of the KW in the URL will help you get more relevancy and also it can help you win "long-tail" goals without much work. (just having KW in URL can sometimes be enough)
But the issue goes even deeper.
The second type of URL includes parameters and those can (an 99.9% will) lead to duplicated content issue. When using parameters you`ll have to deal with questions like:
What happens for non-existent catid?
Is there a parameter verification? (and how full proof is it?)
and etc.
So why choose the second version? Because sometime you just don`t have a choice... :)
Related
I have been struggling to find information on how a resource that contains generated values is modified. Below is a real world example:
Let's say we have 2 endpoints:
/categories and /products.
A category is used to contain various parameters that define any product belonging to it. For example, based on a category a product expiration date might be calculated, or some other properties might or might not be attached to a product.
Let's say we create a new product by sending a POST request to /products and among other fields we include the category ID property. Based on the category set a server creates and stores a new product along with various properties generated (expiration date, delivery policies) etc.
Now the problem arises when needing to modify (PATCH/ PUT) the mentioned product.
How are generated values edited? We can for example change a delivery policy, but then the product will contain a field that doesn't match what its attached category describes. Likewise, it might be very handy to modify its generated expiration date, however yet again that can create confusion about why a category says it should expire in 3 days but the product is set to expire in 20 days.
Another solution would be to make all these properties read-only and only allow regenerating them by changing the category, just like at creation.
However that poses 2 problems:
The biggest one being that a different category might not contain the same policy layout. For example, one category might enable generating GPS coordinates to ease the delivery, the other category does not. If we change the category, what do we do with these valuable properties already present? Do we drop them for the sake of clarity?
Another issue is limited flexibility. There might be cases when a property needs to be changed but the category needs to remain the same.
I think these questions are met and answered in probably every single REST API development and probably I am just missing something very simple and obvious. Could you help me understand the right way of going about this?
Thank you very much.
I think these questions are met and answered in probably every single REST API development and probably I am just missing something very simple and obvious. Could you help me understand the right way of going about this?
You write code to ensure that all of the invariants hold for the server's copy of the resource.
That can mean either (a) inspecting the body of the request, and returning a client error if the body doesn't satisfy the constraints you need to maintain, or (b) changing your resource in a way that doesn't exactly match the request you've received.
In the second case, you need to have a little bit of care with the response metadata, so that you don't imply that the representation of the request has been adopted "as is".
The code you are writing here is part of the origin server's implementation, deliberately hidden by the HTTP facade you present. The general purpose components in the middle don't care about those details; they just want you to use messaging semantics consistent with the HTTP (and related) specifications.
I'm creating a restful web service. As JPA I use Hibernate.
I have such entities like Country, City, Store, Sale and others.
Is it a good idea to have URIs like this in terms of length and nesting: http://example.com/countries/{countryId}/cities/{cityId}/sales/{saleId}/article/{articleId} ?
Is there any rule of number of nestings? I mean the number of pairs "entity/{entityId}" ?
URIs are opaque. As far as HTTP and the RESTful principles behind it go, there's no difference between http://example.com/countries/{countryId}/cities/{cityId}/sales/{saleId}/article/{articleId}, http://example.net/sfdaikwjepfiaosnd and http://example.org/ followed by the URI-encoded contents of Finnegans Wake. Indeed it's perfectly possible for all three of those to be URIs for the same resource, perhaps with one permanently redirecting to the other.
So it's non-REST concerns that are most at work here.
One is that if you are at risk of going beyond practical size limits you will have problems.
Another is that entities containing lots of URIs will obviously be shorter if those URIs are small. It's not generally a big concern, but it does have a bit of an effect on network use if URIs are truly massive and every entity contains kilobyte upon kilobyte of such URIs.
Another is just how useful the modelling is: If calling code will never care about the country when it wants an article then your modelling isn't helping that calling code.
Related to that is the practicality of relative links in helping the HATEOS aspect of REST: If you are likely to often be able to just have article/2 as a relative link that is useful in the entity describing a sale, or a (potentially hard-coded) ../../ to get to the entity describing a sale from one describing an article then this is convenient. The question is are you making the most useful links the most convenient. For example, if it is far more common to go from country to city than country to anything else, then why have the /cities/ part to the path, and not just {countryid}/{cityid}?
This relative link matter can counter the question of large URIs causing large entities: If the majority of URIs in an entity are "close" to the resource the entity describes then the majority of URIs can be represented by very small relative URIs.
Another aspect is whether those IDs are human-reader-friendly. Is the ID for New York something like 193 or something like NewYork or New%20York or perhaps Nueva%20York? To a REST perspective those are all of equal value, but 193 gives shorter URIs with advantages noted above while the others are handier to deal with as a consuming developer or when debugging.
Nesting also affects this human-readable aspect. On the one hand lots of nesting can make each element within that simple to identify but on the other it can be confusing in itself to have too many sections in the path. For the most part though if the split is understandable then a human reader can filter out much of the URI and focus on the part they care about.
In all the only hard limit is that of the practical URI size limits, and beyond that there aren't so much strict rules as pros and cons to consider the trade-offs of in your design.
We're developing a REST API for our platform. Let's say we have organisations and projects, and projects belong to organisations.
After reading this answer, I would be inclined to use numerical ID's in the URL, so that some of the URLs would become (say with a prefix of /api/v1):
/organisations/1234
/organisations/1234/projects/5678
However, we want to use the same URL structure for our front end UI, so that if you type these URLs in the browser, you will get the relevant webpage in the response instead of a JSON file. Much in the same way you see relevant names of persons and organisations in sites like Facebook or Github.
Using this, we could get something like:
/organisations/dutchpainters
/organisations/dutchpainters/projects/nightwatch
It looks like Github actually exposes their API in the same way.
The advantages and disadvantages I can come up with for using names instead of IDs for URL definitions, are the following:
Advantages:
More intuitive URLs for end users
1 to 1 mapping of front end UI and JSON API
Disadvantages:
Have to use unique names
Have to take care of conflict with reserved names, such as count, so later on, you can still develop an API endpoint like /organisations/count and actually get the number of organisations instead of the organisation called count.
Especially the latter one seems to become a potential pain in the rear. Still, after reading this answer, I'm almost convinced to use the string identifier, since it doesn't seem to make a difference from a convention point of view.
My questions are:
Did I miss important advantages / disadvantages of using strings instead of numerical IDs?
Did Github develop their string-based approach after their platform matured, or did they know from the start that it would imply some limitations (like the one I mentioned earlier, it seems that they did not implement such functionality)?
It's common to use a combination of both:
/organisations/1234/projects/5678/nightwatch
where the last part is simply ignored but used to make the url more readable.
In your case, with multiple levels of collections you could experiment with this format:
/organisations/1234/dutchpainters/projects/5678/nightwatch
If somebody writes
/organisations/1234/germanpainters/projects/5678/wanderer
it would still map to the rembrandt, but that should be ok. That will leave room for editing the names without messing up url:s allready out there. Also, names doesn't have to be unique if you don't really need that.
Reserved HTTP characters: such as “:”, “/”, “?”, “#”, “[“, “]” and “#” – These characters and others are “reserved” in the HTTP protocol to have “special” meaning in the implementation syntax so that they are distinguishable to other data in the URL. If a variable value within the path contains one or more of these reserved characters then it will break the path and generate a malformed request. You can workaround reserved characters in query string parameters by URL encoding them or sometimes by double escaping them, but you cannot in path parameters.
https://www.serviceobjects.com/blog/path-and-query-string-parameter-calls-to-a-restful-web-service
Numerical consecutive IDs are not recommended anymore because it is very easy to guess records in your database and some might use that to obtain info they do not have access to.
Numerical IDs are used because the in the database it is a fixed length storage which makes indexing easy for the database. For example INT has 4 bytes in MySQL and BIGINT is 8 bytes so the number have the same length in memory (100 in INT has the same length as 200) so it is very easy to index and search for records.
If you have a lot of entries in the database then using a VARCHAR field to index is a bad idea. You should use a fixed width field like CHAR(32) and fill the difference with spaces but you have to add logic in your program to treat the differences when searching the database.
Another idea would be to use slugs but here you should take into consideration the fact that some records might have the same slug, depends on what are you using to form that slug. https://en.wikipedia.org/wiki/Semantic_URL#Slug
I would recommend using UUIDs since they have the same length and resolve this issue easily.
I'm creating v2 of an existing RESTful web api.
The responses are JSON lists of objects, roughly in the form:
[
{
name1=value1,
name2=value2,
},
{
name1=value3,
name2=value4,
}
]
One problem we've observed with v1 is that some clients will access fields by integer position, instead of by name. This means that if we decide to add fields to the response (which we had originally considered a compatibility-preserving change), then some of our client's code breaks, unless we add the fields at the end. Even then, other clients code breaks anyway, because they will fail in some way when they encounter an unexpected attribute name.
To counter this in v2, we are considering randomly reordering the fields in every response. This will force clients to index fields by name instead of by position.
Additionally, we are considering adding a randomly-named field to every response. This will force clients to ignore fields they don't recognize.
While this sounds somewhat barmy, it does have the advantage that we will be able to add new fields, safe in the knowledge that this isn't breaking any clients. This means we can issue compatible updates to v2.1, v2.3, etc at the same URL, and that means we will only have to maintain & support a smaller number of API versions.
The alternative is to issue compatibility-breaking v3, v4, at new URLs, which means that we will have to maintain & support many incompatible API versions, which will stretch us that little bit thinner.
Is this a bad idea, and if so, why? Are there any other similar ideas I should think about?
Update: The first couple of responses are pointing out that if I document the issue (i.e. indicate in the docs that fields may be added or reordered) then I am no longer to blame if client code breaks when I subsequently add or reorder fields. Sadly I don't think this is an appropriate option for us: Many dozens of organisations rely on the functionality of our APIs for real-world transactions with substantial financial impact. These organisations are not technically oriented - and the resulting implementations at the client end cover the whole spectrum of technical proficiency. We already did document that fields may get added or reordered in the docs for v1, and that clearly didn't work, because now we're having to issue v2 because many clients, due to lack of time or experience or ability, still wrote code that breaks when we add new fields. If I were now to add fields to the interface, it breaks a dozen different company's interfaces to us, which means they (and us) are bleeding money every minute. If I were to refuse to revert the change or fix it, saying "They should have read the docs!", then I will soon be out of the job, and rightly so. We may attempt to educate the 'failing' partners, but this is doomed to fail as the problem gets larger every month as we continue to grow. My question is, can I systemically head the whole issue off at the pass, preventing this situation from ever arising, no matter what clients try to do? If the techniques I suggest would work, why shouldn't I use them? Why isn't everyone else using them?
If you want your media types to be "evolvable", make that point very clear in the documentation. Similarly, if the placement order of fields is not guaranteed, make that explicitly clear too. If you supply sample code for your API, make sure it does not rely on field ordering.
However, even assuming that you have to maintain different versions of your media types, you don't have to version the URI. REST gives you the ability to maintain the same version-agnostic URI but use HTTP content negotiation (via the Accept and Content-Type headers) to offer different payloads at the same URI.
Therefore any client that doesn't explicitly wish to accept your new v2/v3/etc encoding won't get it. By default, you can return the old v1 encoding with the original field ordering and all of those brittle client apps will work fine. However, new client developers will know (thanks to your documentation) to indicate via Accept that they are willing and able to see the new fields and they don't care about their order. Best of all, you can (and should) use the same URI throughout. Remember - different payloads like this are just different representations of the same underlying resource, so the URI should be the same.
I've decided to run with the described techniques, to the max. I haven't heard any objections to them that hold any water for me. Brian's answer, about re-using the same URI for different API versions, is a solid and much-appreciated complementary idea (with upvote), but I can't award it 'best answer' because it doesn't get to the core of my original question.
I read the article at REST - complex applications and it answers some of my questions, but not all.
I am designing my first REST application and need to return "subset" lists to GET requests. Which of the following is more "RESTful"?
/patients;listType=appointments;date=2010-02-22;user_id=1234
or
/patients/appointments-list;date=2010-02-22;user_id=1234
or even
/appointments/2010-02-22/patients;user_id=1234
There will be about a dozen different lists that I need to return. In some of these, there will be several filtering parameters and I don't want to have big 'if' statements in my server code to select the subsets based on which parameters are present. For example, I might need all patients for a specific doctor where the covering doctor is another and the primary doctor is yet another. I could select with
/patients;rounds=true;specific_id=xxxx;covering_id=yyyy;primary_id=zzzz
but that would require complicated branching logic to get the right list, where asking for a specific subset (rounds-list) will achieve that same thing.
Note that I need to use matrix parameters instead of query parameters because I need to do filtering at several levels of the URL. The framework I am using (RestEasy), fully supports matrix parameters.
Ralph,
the particular URI patterns are orthogonal to the question how RESTful your application will be.
What matters with regard to RESTfulness is that the client discovers how to construct the URIs at runtime. This can be achieved either with forms or URI templates. Both hypermedia controls tell the client what parameters can be used and where to put them in the URI.
For this to work RESTfully, client and server must know the possible parameters at design time. This is usually achieved by making them part of the specification of the link relationship.
You might for example define a 'my-subset' link relation to have the meaning of linking to subsets of collections and with it you would define the following parameters:
listType, date, userID.
In a link template that spec could be used as
<link rel="my-subset' template="/{listType}/{date}/patients;user_id={userID}"/>
Note how the actual parameter name in the URI is decoupled from the specified parameter name. The value for userID is late-bound to the URI parameter user_id.
This makes it possible for the URI parameter name to change without affecting the client.
You can look at OpenSearch description documents (http://www.opensearch.org) to see how this is done in practice.
Actually, you should be able to leverage OpenSearch quite a bit for your use case. Especially the ability to predefine queries would allow you to describe particular subsets in your 'forms'.
But see for yourself and then ask back again :-)
Jan
I would recommend that you use this URL structure:
/appointments;user_id=1234;date=2010-02-22
Why? I chose /appointments because it is simple and clear. (If you have more than one kind of appointment, let me know in the comments and I can adjust my answer.) I chose the semicolons because they don't imply hierarchy between user_id and date.
One more thing, there is no reason why you should limit yourself to just one URL. It is just fine to have multiple URL structures that refer to the same resource. So you might also use:
/users/1234/appointments;date=2010-02-22
To return a similar result.
That said, I would not recommend using /dates/2010-02-22/appointments;user_id=1234. Why? I don't think, in practice, that /dates refers to a resource. Date is an attribute of an appointment but is not a noun on its own (i.e. it is not a first-class kind of thing).
I can relate to what David James answered.
The format of your URIs can be like he suggested:
/appointments;user_id=1234;date=2010-02-22
and / or
/users/1234/appointments;date=2010-02-22
while still maintaining the discoverability (at runtime) of your resource's URIs (like Jan Algermissen suggested).