Sling provides functionality to ease resource resolution. Its ability to resolve the exact resource representation we need is very useful in content-based applications.
However, one thing I am not able to understand is the use of the suffix.
Example:
http://localhost:4502/content/app/mycomponent.large.html/something.html
Here, "something.html" is the suffix. I want to know under what circumstances would I go for a suffix ? What advantages do we get when compared to passing the information as a selector ?
Pretty hard question, but I will try to clear it up a bit.
According to best practices, selectors should not be treated as input parameters to functions. This means you should use selectors only for registering servlets (or in JSP file names), and a selector should tell Sling which operation you want to perform on the given resource or how it should be displayed.
For example, imagine you have a page /page/a.html and a special representation of it for mobile devices. Accessing /page/a.mobile.html will then open the page in a mobile-friendly way.
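As a minimal sketch of what such a registration looks like (the resource type my/app/page and the class name are made up for illustration), a Sling servlet bound to the mobile selector could be declared like this:

import java.io.IOException;
import javax.servlet.Servlet;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;

// Sling dispatches GET requests for /page/a.mobile.html here because of
// the "mobile" selector; /page/a.html keeps its default rendering.
@Component(service = Servlet.class,
        property = {
                "sling.servlet.resourceTypes=my/app/page", // assumed resource type
                "sling.servlet.selectors=mobile",
                "sling.servlet.extensions=html",
                "sling.servlet.methods=GET"
        })
public class MobilePageServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request,
                         SlingHttpServletResponse response) throws IOException {
        // Render the mobile-friendly representation of the same resource.
        response.setContentType("text/html");
        response.getWriter().write("<html><body>Mobile view of "
                + request.getResource().getPath() + "</body></html>");
    }
}

The selector thus names a way of rendering the resource, not a parameter passed to it.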
On the other hand, the suffix is usually used to provide additional information to your servlet/JSP. Just check the editor interface in Touch UI: the URL looks like
localhost:4502/editor.html/content/pageYouEdit.html
So you always stay on the same page, /editor.html, but the suffix tells the edit interface which page to edit.
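To make that concrete, here is a minimal sketch of an editor-style servlet (hypothetical class, registration omitted); in Sling both the suffix and the selectors are available from the request's RequestPathInfo:

import java.io.IOException;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.request.RequestPathInfo;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;

// The request always resolves to the same resource (e.g. /editor.html);
// the suffix names the page to be edited.
public class EditorServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request,
                         SlingHttpServletResponse response) throws IOException {
        RequestPathInfo pathInfo = request.getRequestPathInfo();

        // For /editor.html/content/pageYouEdit.html the suffix is
        // "/content/pageYouEdit.html" -- itself a resolvable resource path.
        String suffix = pathInfo.getSuffix();         // null if no suffix given
        String[] selectors = pathInfo.getSelectors(); // e.g. ["mobile"], for comparison

        if (suffix == null) {
            response.sendError(400, "No page given to edit");
            return;
        }
        response.setContentType("text/plain");
        response.getWriter().write("Editing page: " + suffix);
    }
}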
Another example:
there is a nice library for including content dynamically: https://github.com/Cognifide/Sling-Dynamic-Include.
When it's configured for some component, then after the page is loaded your component will be included with an AJAX call, like this:
publish/pathToThePage/_jcr_content/pathToTheComponentNode.nocache.html//apps/pathToTheRenderer
In this example you can see that both a selector and a suffix are used. The selector tells what is special about the representation of this component that we need, and the suffix tells which component should render the requested data.
It's used to provide different versions of a resource that are cacheable. This plays nicely with the Apache HTTP module known as the "Dispatcher", which Adobe architects will recommend in any AEM implementation.
http://me.com/page.html/todays_promotion <-- cacheable
http://me.com/page.html?todays_promotion <-- not cacheable
The second example there, with a request parameter, should be treated as a variable resource that could produce different results upon each request.
I was wondering if it is technically RESTful to use an identifier as a grouping rather than as a particular resource (with no corresponding id).
For example:
get /location/address
get /location/coverage
get /location/region
These things are all aspects of a location, hence placing them behind the location identifier. Is this correct?
Or is it better to rethink the structure of these endpoints, or break them up into just /address, /coverage and /region?
First and foremost, a URI in a REST architecture is just a URI, a reference to another resource. The actual spelling used inside the URI is not relevant, as the URI as a whole is a pointer that shouldn't be segmented or analyzed. While a set of URIs may span a whole tree of available paths, there is no requirement that it do so.
A URI itself also does not necessarily say anything about the content of the resource it points to. Fielding even claimed that:
A REST API should never have “typed” resources that are significant to the client. Specification authors may use resource types for describing server implementation behind the interface, but those types must be irrelevant and invisible to the client. The only types that are significant to a client are the current representation’s media type and standardized relation names. (Source)
This blog post further describes what Fielding meant with this statement, which basically says that proper content-type negotiation should be used instead of assuming that a resource has a certain type. On the Web you might retrieve a page about your preferred car or sports team, though you will most likely receive it in HTML, a format generic enough, and therefore widespread enough, that plenty of clients (browsers) support it. Through the affordances of each of the defined elements, your client (browser) knows how to present them to you for convenient access to the content.
I'm not sure about you, but if I have the choice between reading a line summarizing the content of a link and having to decipher the URI itself, I definitely prefer the former. Such metadata is usually attached to a link in some way. In HTML the anchor (<a>) tag not only ships with metadata such as href or target but also rel, which basically allows annotating a URI with a set of keywords that tell a client about the purpose of the link. Such keywords may be something like first, last, prev or next for paging through a subset of a collection, or preload for telling a client that it can load the content of the referenced URI early to speed up load times, as the client will most likely be interested in that content next. Such keywords should be standardized to gain wider acceptance, or at least build on the extension mechanism defined in Web Linking, as Dublin Core does, for example.
Looking at your respective URIs, they already seem to express what the links are there for. As such they could be good candidates for link relations, such as:
http://www.acme.com/rel/address
http://www.acme.com/rel/coverage
http://www.acme.com/rel/region
A link can have multiple relation names assigned simultaneously, depending on the media type the payload is exchanged in. A client that does not know what a certain link relation means will simply ignore it; clients that do have the knowledge will act accordingly upon finding URIs with such annotations. I admit that an arbitrary client will not be able to make use of all link relations out of the box, especially the extension ones, but such link-relation names might be enforced by media types as well, or support for them may be added through updates or plug-ins later on.
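To illustrate how a client uses such relations rather than URI structure, here is a sketch (the endpoint URI and the Link headers it assumes are hypothetical) of resolving a target by its relation name from HTTP Link headers:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the client knows only the relation name; the server owns the URI.
// Assumes the server sends headers like:
//   Link: </locations/42/address>; rel="http://www.acme.com/rel/address"
public class LinkRelationClient {

    private static final Pattern LINK =
            Pattern.compile("<([^>]*)>\\s*;\\s*rel=\"([^\"]*)\"");

    // Parse all Link headers into a relation-name -> target-URI map.
    static Map<String, String> linksByRelation(HttpResponse<?> response) {
        Map<String, String> links = new LinkedHashMap<>();
        for (String header : response.headers().allValues("Link")) {
            Matcher m = LINK.matcher(header);
            while (m.find()) {
                links.put(m.group(2), m.group(1)); // rel -> URI
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(URI.create("http://www.acme.com/location/42")).build(),
                HttpResponse.BodyHandlers.ofString());

        String addressUri =
                linksByRelation(response).get("http://www.acme.com/rel/address");
        if (addressUri != null) {
            System.out.println("Follow for the address: " + addressUri);
        }
    }
}

If the server later moves the address resource, such a client keeps working, because it never inspected or constructed the URI itself.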
Media types, after all, are just a human-readable description of what a payload received in such a form will look like and how it has to be processed by some automaton. Hence, a generic application/json is usually a bad media type in a REST API, as it only defines that the content is embedded between curly braces and primarily represents a key-value structure, with the addition of objects and arrays; it gives a client no hint about what a link is or about the semantics of certain elements found within the payload. application/hal+json is an extension of the basic JSON notation that adds processing rules and semantic definitions for certain elements, such as _links or _embedded, which a client can use. Here, CURIEs could serve as link relations as well, I guess, as in the end they also expand to unique URIs as required by the Web Linking extension definition.
Certain media types also allow passing along further processing hints in the media type itself through the use of profiles. A server could thus hint to a client, for example, that the collection expressed by the requested resource follows a certain logic, e.g. that it contains a set of orders on which a processing entity can apply additional checks, such as requiring certain fields to be specified or constraining certain inputs to lie between two values.
As writing a whole new media type is quite some effort, investigating already defined ones is certainly a good idea. If you really think there is no media type available yet that is applicable to your domain, you should start to write one, probably in a community effort. Keep in mind, though, that this media type should be as generic as possible to allow adoption by third parties; otherwise hardly any client will support it, which limits the number of potential peers your application can interact with. It is also a good idea to look at how the HTML spec defines its elements and how it maintains backwards compatibility, as you don't want to register a new media type on each change.
If you just want to implement a filter mechanism that only shows locations representing addresses rather than regions, you may look at how it is done on the Web. Usually a server provides a form to select among a given set of choices, and upon clicking the submit button (or pressing Enter) a request is issued to the server, which returns the subset of entries matching the query. Instead of a form, a server may already provide the client with links annotated, as mentioned above, with hints the client knows refer to addresses, regions, or whatever options are available, so that it can choose, based on the annotated link relations, the URI that interests it. Again, the actual form of the URI is not relevant here either, as a client should just use the URIs provided by the server.
You might ask yourself why all of that is needed. The basic answer is simply to avoid breaking things when stuff changes over time. Think of a case where a client wants to retrieve details of a previously sent order. If it had the URI hardcoded and the server changed its URI style, the client might not be able to retrieve the data. If, on the other hand, it knew to look for any URI annotated with http://www.acme/rel/order and just use that URI, it couldn't care less whether the URI changed, as it simply uses the given information and sends its request to the mentioned endpoint. By relying on well-defined media types and the semantics of their defined elements, any peer supporting the media type will be able to process the payload. Almost every client can handle HTML in some way, though hardly any generic HTTP client can act upon a custom JSON payload format, e.g. present you a clickable link or render a form through which you can update the data of an existing resource. Custom payloads also make it difficult to reuse the same client for different endpoints; you probably won't be able to use the same client to shop at two different web shops that expose Web-RPC-based HTTP endpoints.
So, to sum up my post: as mentioned, the form of the URI is not really relevant in a REST architecture, as a server should always provide it to the client anyway, and a client shouldn't deduce meaning from it. Instead, link relations, media-type support and content-type negotiation should be used consistently.
REST doesn't care what spellings you use for your URIs, beyond the fact that they should be consistent with the production rules defined in RFC 3986.
GET /location/region
GET /region
Those are both fine.
Recently I've been designing a RESTful API, and I want to use the Link header field to implement HATEOAS.
This is all fine and works without any real problem, but I want to make it easier for the clients of the API.
The Link header for example might look like this:
Link: <https://api.domain.com/orders/{id}>; rel="https://docs.domain.com/order"
In this case the client would have to find the link by searching for the https://docs.domain.com/order value in the rel attribute.
This works, but it's fragile, since the URI for the docs could change, and its length makes it a bit impractical as a key to search for.
So I want to try and use a CURIE to make it something like this instead:
Link: <https://api.domain.com/orders/{id}>; rel="[rc:order]"
That way the problem of a URI changing is mitigated for the most part, and it's much more compact allowing for easier search by the client.
My problem is that, since I'm using a Link header field to implement HATEOAS, I feel it would be most consistent to also deliver the CURIE as an HTTP header field rather than introducing metadata into the response body.
Since the entire API uses standard HTTP header fields and status codes for all of its metadata (pagination, versioning, etc.), I would rather not introduce metadata into the response body just to define a CURIE.
But if I use HTTP header fields, which field should I use for a CURIE?
Is there even a standard way to do this with HTTP header fields?
If not, should I just use a custom header field, like X-Curie: <https://docs.domain.com>; name="rc", and just document it somewhere for clients?
I've looked around and most resources are either in reference to XML or the HAL standard, so any information on this in relation to HTTP header fields would be appreciated.
No, you can't do that. The link relation is a string - nothing more.
The question that you should ask yourself is why you are using an unstable link relation name in the first place...
Even if you don't use the Link header, CURIEs would not solve the problem you present, because the CURIE standard states that a shortened URI must be "unwrapped" before any comparison is performed. This would also apply to comparisons against the link relation in question.
A more pragmatic approach would be to coin your own URI, like foo:order. Then you can use some custom URL-shortening method for resolving the documentation URL of the relation in question. This method is used by hypermedia formats like HAL+JSON (the HAL format's use of CURIEs is actually misleading; they should only be used as a method for resolving URLs to documentation).
CURIEs in an HTTP Link header's rel properties would not get expanded, because all rel properties are compared as simple strings; none are treated as URIs.
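A tiny sketch of what that means in practice (the rel and target values are taken from the question; the lookup table is hypothetical): a generic client matches rel names by exact string comparison, so a CURIE never matches its expanded form:

import java.util.Map;

public class RelMatching {
    public static void main(String[] args) {
        // rel -> target, as a client might have parsed it from a Link header
        Map<String, String> links = Map.of(
                "rc:order", "https://api.domain.com/orders/{id}");

        // Looking up the expanded documentation URI finds nothing...
        System.out.println(links.get("https://docs.domain.com/order")); // null
        // ...only the literal, unexpanded token matches.
        System.out.println(links.get("rc:order")); // https://api.domain.com/orders/{id}
    }
}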
My main concern would be the phrase "it's fragile, since the URI for the docs could change". In that case, choose a URI which won't change. Even if you use a URL that redirects to some location deep in the docs, the URI you choose for the link relation needs to be permanent if you want client devs to be able to resolve it.
I need to programmatically interact with a WebObjects website and extract data from the responses. The particular WebObjects site I am scraping uses component actions and stores sessions in cookies (not URLs). This means that all URLs look something like this:
http://example.com/WOApp/WebObjects/WOApp.woa/wo/7.0.0.0.29.1.1.1
My first questions are:
Don't URLs like this completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs happens in the WebObjects server itself.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly. Furthermore, I think WebObjects also invalidates URLs that are too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Regarding the scraping, I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML, extract the POST URLs, and then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this (see https://developer.apple.com/legacy/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.5/PDF/WebObjectsDevGuide.pdf, p. 53 "Limitations on Direct requests"). I don't understand exactly how to do this though or if it's even possible.
If it's not possible, what would be a good approach? The only options I can think of are:
Using a full-blown browser client to interact with the website (e.g. Watir or Selenium) and extracting & processing the HTML from its responses
Manually extracting the dynamic endpoints by first requesting the page they appear on and then finding the place in the HTML where they're located, then using them afterwards as if they were "static".
I am interested in opinions on how to approach this scenario since I don't believe any of the solutions above are particularly good.
You've asked a number of questions, and I'll see if I can cover each in turn.
Don't URLs like this completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs happens in the WebObjects server itself.
There is, indeed, a page cache within the WebObjects application server, and you're right to observe that these component action URLs probably thwart any other kind of caching. Additionally, even though the session ID is not present in the URL, you'd need the session ID in the cookie to re-create the same page, so having just that URL would get you a session restoration error from the application server.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly.
Well, yes, on the face of it this is true. You've given a component action URL as an example, and they're tied to the session.
Furthermore, I think WebObjects also invalidates URLs that are too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Again, all true. Component action URLs generate sessions, and sessions time out.
At this point, let me take a quick diversion. I'm assuming you're not the owner of the WebObjects application—you're talking about having to scrape a WebObjects app, and you've identified some ways in which this particular app doesn't conform to REST principles. You're completely right—a fully component-action-based WebObjects application won't be RESTful. WebObjects pre-dates REST by a few years. Having said that, there are ways in which a WebObjects application can be completely RESTful:
Using session-less direct actions gives a degree of REST-like behaviour, and would certainly solve the problems you identify with caching, addressability and expiry.
Using the ERRest framework to create a 100% RESTful application.
Of course, none of this will help you if you're just trying to scrape a legacy application.
Regarding the scraping, I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML, extract the POST URLs, and then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
Again, if it's a fully component action-based application, you're right—all those URLs will be dynamically generated and useless to you.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this…
That's talking about getting a component to render directly from its template with some restrictions:
As you note, the application can easily prevent it from happening at all.
As mentioned on p.53, the user input and action-invocation phases of rendering the component are skipped, which probably means this approach would be limited to rendering a component that didn't have any dynamic content anyway. This might be of some very limited use to you, though you'd need to know the component names you were interested in, and they wouldn't normally be exposed anywhere.
I'm not sure you're going to find anything better than the types of high-level functional approaches you've already suggested above, such as automating at the browser level with Selenium. If what you need is REST-style direct addressability of resources within the application, you're not going to get that unless you can re-write the application to use direct actions or ERRest where you need them.
A little late, but could help.
I use Apache's mod_ext_filter (slightly modified) to pre- and post-filter the requests/responses of our WebObjects application. The filter calls PHP scripts that can read the dynamic hrefs and other things from the HTML pages. The scripts can also modify the HTTP requests, so we can programmatically add/remove parameters from a request to implement new workflows in front of the legacy app and clean up requests before they reach WebObjects. It is also possible to maintain an additional database within the scripts and store some state across multiple requests.
So you can get the dynamically created links (maybe a button's name or HTML form destination) and can recognize these names within the request.
It is also possible to "remote control" such applications with little scripts, like "click on the third button on the page". The only thing you need is a DOM parser to get the structure of the HTML pages and then rebuild the actions the browser would perform (i.e. create the HTTP request manually and send it as a POST to the extracted form destination href). The only problem is the JavaScript code, which we analyze and reprogram within PHP (e.g. enabling/disabling input elements so they will not be transmitted within the requests).
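The answer above uses PHP; as a language-neutral illustration of the same "extract, then replay" idea, here is a sketch in Java using jsoup as the DOM parser (the URL and form field name are placeholders, and the extracted action URL is only valid within the session whose cookies produced it):

import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class WebObjectsScraper {

    public static void main(String[] args) throws IOException {
        // A session keeps the cookies WebObjects sets across requests.
        Connection session = Jsoup.newSession();

        // 1. Load the page that contains the dynamically generated form.
        Document page = session.newRequest()
                .url("http://example.com/WOApp/WebObjects/WOApp.woa")
                .get();

        // 2. Extract the component action URL from the form on the page.
        Element form = page.selectFirst("form");
        if (form == null) {
            throw new IllegalStateException("No form found on page");
        }
        String action = form.absUrl("action"); // e.g. .../wo/7.0.0.0.29.1.1.1

        // 3. Rebuild the request the browser would send, in the same session.
        Document result = session.newRequest()
                .url(action)
                .data("searchField", "some query") // placeholder field name
                .post();

        System.out.println(result.title());
    }
}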
There were some problems with the WebObjects adapter module for Apache. It still sets Content-Length in the HTTP header, which you cannot change in mod_ext_filter; if you change the HTML or the parameters within the request, the length of the content will no longer match. But it is possible to change that.
Theoretically it could also be possible to control such a closed-source legacy application from a new UI on a tablet or smartphone, which delegates the user interaction to the backend WebObjects app.
The scripts depend on the page structure, so if your WebObjects app changes, you have to correct some things in the scripts (e.g. the third button could now be the fourth button).
It should also be possible to put a RESTful interface in front of the application and query the data from the legacy app via the filter scripts.
If I have a URL that represents a collection, is there a good way to describe filters?
e.g. http://example.com/comic_books?after=2001-01-01&before=2002-03-09
If I make these filters part of the service contract, aren't I violating the idea of hypermedia as the engine of application state?
Do I need to have another resource that links to my collection and describes the filters e.g. via an HTML form?
You can consider an HTML form as a URL template representing your resource and the filter; there's nothing wrong with that, we all do it every day (google.com?s=query). Some argue that you don't need a form to represent a URL template and that documentation alone is adequate, so for many the form itself is optional.
The hypermedia aspect is mostly related to the presence of the links themselves; documenting the service is not an "out of band" consideration. What you want, though, is to present the link options as part of the hypermedia that the client can follow. The form can be nice, but it's not required. You can use (even require) the same link and query parameters for the filter without explicitly listing them as part of the payload.
Consider HAL as an alternative to HTML: it has no concept of forms, yet it is considered by many to be a perfectly fine hypermedia-compatible media type.
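As a sketch of the template idea (the template string would come from the server's hypermedia or its documentation; the expansion helper below is a naive, illustrative stand-in for a real RFC 6570 library):

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class FilterTemplate {

    // Naive expansion of a "{?name1,name2}" query template -- enough for
    // this illustration, not a full RFC 6570 implementation.
    static String expand(String template, Map<String, String> params) {
        int brace = template.indexOf("{?");
        if (brace < 0) return template; // no query template present
        String base = template.substring(0, brace);
        String[] names = template.substring(brace + 2, template.length() - 1).split(",");
        StringBuilder query = new StringBuilder();
        for (String name : names) {
            String value = params.get(name);
            if (value == null) continue; // unset parameters are simply omitted
            query.append(query.length() == 0 ? '?' : '&')
                 .append(name).append('=')
                 .append(URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
        return base + query;
    }

    public static void main(String[] args) {
        // Template as advertised by the server, not constructed by the client.
        String template = "http://example.com/comic_books{?after,before}";
        System.out.println(expand(template,
                Map.of("after", "2001-01-01", "before", "2002-03-09")));
        // -> http://example.com/comic_books?after=2001-01-01&before=2002-03-09
    }
}

The client fills in only the documented parameter names; the path itself stays opaque.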
This seems to be the REST principle that I've had the hardest time wrapping my head around. I understand that when designing a REST API most of the effort should go into designing/describing the hypertext for the application. Any pointers to real-world applications of this principle? How does the Atom protocol apply this principle?
Can someone explain in simple terms how one would apply it to a hypothetical shopping-cart REST API?
When attempting to explain hypermedia, I like to use the example of navigating in a car via signposts versus a map. I realize it doesn't directly answer your question, but it may help.
When driving a car and reaching a particular intersection, you are provided signposts to indicate where you can go from that point. Similarly, hypermedia provides you with a set of options based on your current state.
A traditional RPC-based API is more like a map. With a map you tend to plan your route out based on a static set of road data. One problem with maps is that they can become out of date, and they provide no information about traffic or other dynamic factors.
The advantage of signposts is that they can be changed on the fly to detour traffic due to construction or to control traffic flow.
I'm not suggesting that signposts are always a better option than a map. Obviously there are pros and cons but it is valuable to be aware of both options. It is the same with hypermedia. It is a valuable alternative to the traditional RPC interface.
Consider yourself navigating a regular website. When you visit, you read the contents of the pages, and based on what you've read and what you want to do, you follow various links on the page. That's really the core of what "hypertext as the engine of application state" boils down to. In this example, application state is the state in your head and the page you're on. Based on that, you traverse further links, which alters the application state in your head.
There's one other element to it, mind: you shouldn't need to guess those URIs. There should be enough context in the page to infer them (such as the knowledge the application has of the content type, and things like URI templates), or the URIs to follow should be present outright. Beyond that, a RESTful HTTP application shouldn't care about the structure of the URIs.
UPDATE: To expand on things, HTML forms demonstrate HATEOAS too. Forms that use GET are analogous to the use of URI templates. And HATEOAS isn't limited to just traversing links using HTTP GET: forms using POST (or some other method, if the browser happens to support it) can be thought of as describing a representation to send to the server.
Another way to look at this concept is that the state is represented by the current page and the links embedded in it. Traversing a link changes the state of the application which is represented by the next page. It is a little hard to explain... the links that are available at any point in time define what actions are available based on the actions that have already occurred. This is one definition of "the current state".
The trick is to represent the available actions as URIs which "act" on a resource. Retrieving the representation associated with a URI implicitly performs the action and retrieves the representation that results. URIs are embedded in the representation, and the user understands the action associated with a specific URI. The various HTTP methods help define the "actions" that occur and specify when no action is allowed. This is usually what people are getting at when describing the whole RESTful paradigm.
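For the hypothetical shopping cart from the question, here is a sketch of what "the links encode the current state" means in practice (all URIs and relation names below are invented for illustration):

import java.util.Map;

public class CartClient {

    public static void main(String[] args) {
        // Links as parsed from a GET on the cart, after one item was added.
        // Every representation carries the URIs for the actions that are
        // valid right now, keyed by relation name.
        Map<String, String> cartLinks = Map.of(
                "self",     "http://shop.example.com/cart/77",
                "add-item", "http://shop.example.com/cart/77/items",
                "checkout", "http://shop.example.com/cart/77/checkout");

        // State transition: the client acts by relation name, posting to
        // whatever URI the server supplied for that action.
        String checkoutUri = cartLinks.get("checkout");
        System.out.println("POST " + checkoutUri);

        // The next representation would no longer offer "add-item" or
        // "checkout" -- perhaps only "payment" -- so the set of links
        // itself encodes the current application state.
    }
}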