I can't find any trace of ETag support in Wicket. Is there a way to use ETags with it?
No, there's no ETag support. Wicket is not made to serve static content. If you have to serve many static resources, you can use another framework like Spring MVC in addition to Wicket.
Wicket supports caching via a strong caching mechanism. Resources are mounted at URLs which contain the timestamp of the file (in development mode) or an MD5 hash (in deployment mode). This makes resource URLs unique, so they can be cached forever: if the content changes, the URL changes too. The Expires header is automatically set by Wicket to one year from the current date. This makes a weak validation mechanism like ETag unnecessary for such resources.
You can change this behavior by setting an IResourceCachingStrategy in IResourceSettings.
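For example, here is a minimal sketch of overriding the strategy in your Application subclass (Wicket 6+ API; verify the class names against your version). NoOpResourceCachingStrategy disables the timestamp/hash decoration of resource URLs entirely:

import org.apache.wicket.Page;
import org.apache.wicket.protocol.http.WebApplication;
import org.apache.wicket.request.resource.caching.NoOpResourceCachingStrategy;

public class MyApplication extends WebApplication {

    @Override
    public Class<? extends Page> getHomePage() {
        return HomePage.class; // hypothetical home page
    }

    @Override
    protected void init() {
        super.init();
        // Stop Wicket from embedding a timestamp/MD5 hash in resource URLs
        getResourceSettings().setCachingStrategy(NoOpResourceCachingStrategy.INSTANCE);
    }
}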
I want to add a custom key to the manifest.json file for a progressive web app.
The MDN page (Web App Manifest | MDN) doesn't mention custom keys. The spec (Web App Manifest) includes this text in section "3.1 Media type registration", under the sub-heading "Security and privacy considerations":
As the manifest format is JSON and will commonly be encoded using [UNICODE], the security considerations described in [ECMA-404] and [UNICODE-SECURITY] apply. In addition, because there is no way to prevent developers from including custom/unrestrained data in a manifest, implementors need to impose their own implementation-specific limits on the values of otherwise unconstrained member types, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
Are there known limitations or restrictions on the use of custom keys in manifest.json files?
According to the standard (see SO-59512547):
Browsers shall ignore any values starting with X-, which is a common abbreviation for custom headers in HTTP and e-mail, as they are to be used exclusively by developers.
My use case is sending boot-up data early with HTTP/2: things that would normally live in headers or environment variables but that I would nonetheless want to keep dynamic, such as socket endpoints, custom loading UI, and console logging level. Loading manifest.json is extremely common, so it can serve as a standard boot config file better than a custom-named JSON file that we would have to tell the server to prefetch alongside any request for index.html.
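For illustration, such a manifest might look like the sketch below; the x- keys are hypothetical names of my own, not part of the spec, and standards-following browsers would simply ignore them:

{
  "name": "My App",
  "start_url": "/",
  "display": "standalone",
  "x-socket-endpoint": "wss://example.com/ws",
  "x-log-level": "warn",
  "x-loading-ui": "spinner"
}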
Australians, Mooners, Martians will thank me.
I need to programmatically interact with a WebObjects website and extract data from the responses. The particular WebObjects site I am scraping uses component actions and stores sessions in cookies (not URLs). This means that all URLs look something like this:
http://example.com/WOApp/WebObjects/WOApp.woa/wo/7.0.0.0.29.1.1.1
My first questions are:
Do URLs like this not completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs is in the WebObjects server itself.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly. Furthermore, I think WebObjects also invalidates URLs that are too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Regarding the scraping, I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML and extract the POST URLs, then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
In this case I obviously cannot use any URLs extracted from the HTML, since they are dynamically generated on each request. But I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this (see https://developer.apple.com/legacy/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.5/PDF/WebObjectsDevGuide.pdf, p. 53, "Limitations on Direct requests"). I don't understand exactly how to do this, though, or whether it's even possible.
If it's not possible, what would be a good approach? The only options I can think of are:
Using a full-blown browser client to interact with the website (e.g. Watir or Selenium) and extracting and processing the HTML from its responses
Manually extracting the dynamic endpoints by first requesting the page they are on and finding where they're located in the HTML, then using them afterwards as if they were "static".
I am interested in opinions on how to approach this scenario since I don't believe any of the solutions above are particularly good.
You've asked a number of questions, and I'll see if I can cover each in turn.
Do URLs like this not completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs is in the WebObjects server itself.
There is, indeed, a page cache within the WebObjects application server, and you're right to observe that these component action URLs probably thwart any other kind of caching. Additionally, even though the session ID is not present in the URL, you'd need the session ID in the cookie to re-create the same page, so having just that URL would get you a session restoration error from the application server.
Isn't addressability broken as well? Each resource does have a unique
endpoint, but it changes constantly.
Well, yes, on the face of it this is true. You've given a component action URL as an example, and they're tied to the session.
Furthermore, I think WebObjects also invalidates URLs that are too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Again, all true. Component action URLs generate sessions, and sessions time out.
At this point, let me take a quick diversion. I'm assuming you're not the owner of the WebObjects application—you're talking about having to scrape a WebObjects app, and you've identified some ways in which this particular app doesn't conform to REST principles. You're completely right—a fully component-action-based WebObjects application won't be RESTful. WebObjects pre-dates REST by a few years. Having said that, there are ways in which a WebObjects application can be completely RESTful:
Using session-less direct actions gives a degree of REST-like behaviour, and would certainly solve the problems you identify with caching, addressability and expiry (see the sketch below).
Using the ERRest framework to create a 100% RESTful application.
Of course, none of this will help you if you're just trying to scrape a legacy application.
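For completeness, here is a minimal sketch of what a session-less direct action looks like (classic WODirectAction conventions; the action name is illustrative):

import com.webobjects.appserver.WOActionResults;
import com.webobjects.appserver.WODirectAction;
import com.webobjects.appserver.WORequest;
import com.webobjects.appserver.WOResponse;

public class DirectAction extends WODirectAction {

    public DirectAction(WORequest request) {
        super(request);
    }

    // Reachable at a stable URL like /WebObjects/WOApp.woa/wa/status, with no
    // session involved, so it stays cacheable and addressable.
    public WOActionResults statusAction() {
        WOResponse response = new WOResponse();
        response.appendContentString("OK");
        return response;
    }
}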
Regarding the scraping I am not sure whether it's possible to extract
any meaningful endpoints from the website. For example, with a normal
website I would look through the HTML and extract the POST urls, then
use them in my scraper by posting directly to them instead of going
through the normal request-response cycle.
Again, if it's a fully component action-based application, you're right—all those URLs will be dynamically generated and useless to you.
In this case I obviously cannot use any URLs extracted from the HTML
since they are dynamically generated on each request, but I read
something about being able to access WebObjects components directly if
the security settings have not been set to disallow this…
That's talking about getting a component to render directly from its template with some restrictions:
As you note, the application can easily prevent it from happening at all.
As mentioned on p.53, the user input and action-invocation phases of rendering the component are skipped, which probably means this approach would be limited to rendering a component that didn't have any dynamic content anyway. This might be of some very limited use to you, though you'd need to know the component names you were interested in, and they wouldn't normally be exposed anywhere.
I'm not sure you're going to find anything better than the types of high-level functional approaches you've already suggested above, such as automating at the browser level with Selenium. If what you need is REST-style direct addressability of resources within the application, you're not going to get that unless you can re-write the application to use direct actions or ERRest where you need them.
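As a rough illustration of the browser-level approach, a Selenium sketch might look like this (the entry URL and link text are placeholders; the point is to re-discover component-action links on every page load rather than storing them):

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class WOScraper {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            // Start from the application's entry point; the browser keeps the
            // session cookie for us.
            driver.get("http://example.com/WOApp/WebObjects/WOApp.woa");

            // Never reuse a component-action href: re-find the element after
            // each request, because the URL changes every time.
            WebElement next = driver.findElement(By.linkText("Next"));
            next.click();

            System.out.println(driver.getPageSource());
        } finally {
            driver.quit();
        }
    }
}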
A little late, but could help.
I use Apache's mod_ext_filter (slightly modified) to pre/post-filter the requests/responses of our WebObjects application. The filter calls PHP scripts that can read the dynamic hrefs and other things from the HTML pages. The scripts can also modify the HTTP requests, so we can programmatically add/remove parameters from a request to implement new workflows in front of the legacy app and clean up requests before they reach WebObjects. It is also possible to maintain an additional database within the scripts and persist some state across multiple requests.
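The Apache side of this looks roughly like the following (standard mod_ext_filter directives; the filter name and script path are made up for the example):

# Pipe outgoing HTML through a PHP script that rewrites the dynamic links
ExtFilterDefine woRewrite mode=output intype=text/html \
    cmd="/usr/bin/php /opt/filters/rewrite-links.php"

<Location /WOApp>
    SetOutputFilter woRewrite
</Location>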
So you can pick up the dynamically created links (e.g. a button's name or an HTML form's destination) and recognize those names within the subsequent request.
It is also possible to "remote control" such applications with small scripts like "click the third button on the page". The only thing you need is a DOM parser to get the structure of the HTML pages; you can then rebuild the actions the browser would perform (i.e. create the HTTP request manually and send it as a POST to the extracted form destination href). The only problem is JavaScript code, which we analyze and re-implement within PHP (e.g. enabling/disabling input elements so they are not transmitted within the requests).
There were some problems with the WebObjects Adapter Module for Apache. It still uses Content-Length within the HTTP header, which you cannot change in mod_ext_filter; if you change the HTML or the parameters within the request, the length of the content will no longer match. But it is possible to change that.
Theoretically, it should also be possible to control such a closed-source legacy application from a new UI on a tablet or smartphone that delegates the user interaction to the backend WebObjects app.
The scripts depend on the page structure, so if your WebObjects app changes, you have to correct some things in the scripts (e.g. the third button might now be the fourth button).
It should also be possible to put a RESTful interface in front of the application and have the filter scripts query the data from the legacy app.
I am designing a REST API, and lately I have put some thought into how to make the most of caching for dynamic content (after the response I got on this topic), while respecting the principles of HTTP (and thus REST).
Obviously the canonical solution (at least in my understanding) is to use ETags, but this will not decrease the number of requests in any way, just their size.
I was thinking of embedding a version in the URL (it will be server-produced, based on the actual content, be it a serial number or some hash). I will explain the scheme and the usage scenario, how I think it will help, and then ask my questions.
Setup
GET /entity/{id}/
returns a temporary redirect to /entity/{id}/{current_version}, with no-cache headers.
GET /entity/{id}/{latest_version}
returns an OK response, cacheable forever.
GET /entity/{id}/{old_version}
returns 410 Gone (I don't want to actually keep old versions).
GET /entity/?[query]
is some search that returns a list of links to the current versions of the result entities. No cache.
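A sketch of this scheme in JAX-RS (resource names and helper methods are hypothetical; the cache headers carry the actual semantics):

import java.net.URI;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.CacheControl;
import javax.ws.rs.core.Response;

@Path("/entity")
public class EntityResource {

    @GET
    @Path("/{id}")
    public Response redirectToCurrent(@PathParam("id") String id) {
        CacheControl noCache = new CacheControl();
        noCache.setNoCache(true);
        String current = lookupCurrentVersion(id); // hypothetical lookup
        return Response.temporaryRedirect(URI.create("/entity/" + id + "/" + current))
                       .cacheControl(noCache)
                       .build();
    }

    @GET
    @Path("/{id}/{version}")
    public Response getVersion(@PathParam("id") String id,
                               @PathParam("version") String version) {
        if (!version.equals(lookupCurrentVersion(id))) {
            // Old versions are not kept around
            return Response.status(Response.Status.GONE).build();
        }
        CacheControl forever = new CacheControl();
        forever.setMaxAge(31536000); // one year, effectively "cache forever"
        return Response.ok(loadEntity(id, version)).cacheControl(forever).build();
    }

    private String lookupCurrentVersion(String id) { return "1"; } // stub
    private Object loadEntity(String id, String version) { return id; } // stub
}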
Usage scenario and how I think it would help
A user application (AJAX) will always start with some kind of query, then it has to pull the descriptions of the entities. Since changes to a single client's result set are not expected to be very dynamic, it seems a good idea to use the above scheme: the client pulls fresh results from the query every time, but if most of the entities have not changed since the last visit, they will already be cached in the browser. If this hypothesis is true, it will lead to a significant decrease in the number of requests, as well as in total size.
Using ETags would result in a much simpler URI scheme, but probably a more complicated and heavier server-side implementation.
Notes and questions
1
I know somebody will propose that /entity/{id}/ should be a collection that returns a list of versions, but versions are not actually stored, useful or desired; it is more a synonym for the latest one. My question here is whether somebody sees any problem with that, besides general principles. This is a protected API; I do not care about SEO in this case, and it is transparent to the client. Actually, as the API will be more or less hyperlinked, it is not expected that /entity/{id}/ will normally be called directly; clients will use whatever the results return. It can be used, for example, for context-free links.
2
I have some doubts about 410 Gone for old versions. On the one hand, the version is not available anymore and clients should not be accessing it anyway. On the other hand, if a client asks for it after all (for whatever reason), it may make sense to return a permanent redirect to /entity/{id}/ (probably better than a temporary redirect to the current version).
3
Speaking of redirects: 301 is cemented as the permanent redirect, but is 302 the best choice for a temporary one? Most important is browser support (it will be AJAX).
4
Of course, the main issue is the usage of URLs instead of ETags for caching (relying on browser caches). If somebody has real experience under high load (relative to server capabilities, cough), I would appreciate them sharing it.
Additional notes
After some more research, there is an issue with versioned resources: the propagation of updates for linked resources. There are two options:
Link a specific version of the resource. This means that the server-side logic will be heavy and cumbersome, as updates have to be propagated for linked resources through reverse links;
Link the /latest/ version. This means that even if both the resource's and the linked resource's concrete versions are cached locally, clients (browsers) will have to make a request to /latest/ in order to check the latest version of a linked resource. Of course it is a small request (only a redirect), and if the resource didn't change, the location is already cached. One problem may be that resources are often pulled through such links (as opposed to query results pointing to a particular version). Another (much worse) problem is that an old version of one resource may link to the newest version of another, which can produce data inconsistency (e.g. somebody edited a document and also changed a linked attachment: the client would get the old version of the document and the new version of the attachment).
Both options are unsatisfactory. In this light, caching of dynamic data is possible only for 'leaf'-level resources: ones that do not link to any others, but just have direct attribute values.
Final notes
After research and discussions, versioned resources are not the brightest idea as a general architecture. After measurement, and given the opportunity, something can be retrofitted onto a canonical API for 'plain' resources. I would accept Roysvork's comment ('It is my opinion that the reason this is difficult is that it is not really a very good idea.') as the solution, if it were a separate answer :)
After having read a lot of material on REST versioning, I am thinking of versioning the calls instead of the API. For example:
http://api.mydomain.com/callfoo/v2.0/param1/param2/param3
http://api.mydomain.com/verifyfoo/v1.0/param1/param2
instead of first having
http://api.mydomain.com/v1.0/callfoo/param1/param2
http://api.mydomain.com/v1.0/verifyfoo/param1/param2
then going to
http://api.mydomain.com/v2.0/callfoo/param1/param2/param3
http://api.mydomain.com/v2.0/verifyfoo/param1/param2
The advantages I see are:
When the calls change, I do not have to rewrite my entire client - only the parts that are affected by the changed calls.
Those parts of the client that work can continue as is (we have a lot of testing hours invested to ensure both the client and the server sides are stable.)
I can use permanent or non-permanent redirects for calls that have changed.
Backward compatibility would be a breeze as I can leave older call versions as is.
Am I missing something? Please advise.
Require an HTTP header.
Version: 1
The Version header is provisionally registered in RFC 4229, and there are some legitimate reasons to avoid using an X- prefix or a usage-specific URI. A more typical header was proposed by yfeldblum at https://stackoverflow.com/a/2028664:
X-API-Version: 1
In either case, if the header is missing or doesn't match what the server can deliver, send a 412 Precondition Failed response code along with the reason for the failure. This requires clients to specify the version they support every single time but enforces consistent responses between client and server. (Optionally supporting a ?version= query parameter would give clients an extra bit of flexibility.)
This approach is simple, easy to implement and standards-compliant.
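A rough JAX-RS sketch of the server-side check (the header name and supported version are whatever you choose; this is one way to produce the 412, not a canonical implementation):

import java.io.IOException;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.core.Response;
import javax.ws.rs.ext.Provider;

@Provider
public class VersionCheckFilter implements ContainerRequestFilter {

    private static final String SUPPORTED_VERSION = "1"; // assumption: server speaks v1

    @Override
    public void filter(ContainerRequestContext ctx) throws IOException {
        String version = ctx.getHeaderString("Version");
        if (version == null || !SUPPORTED_VERSION.equals(version)) {
            // Missing or unsupported version: fail fast with the reason
            ctx.abortWith(Response.status(Response.Status.PRECONDITION_FAILED)
                                  .entity("Unsupported API version: " + version)
                                  .build());
        }
    }
}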
Alternatives
I'm aware that some very smart, well-intentioned people have suggested URL versioning and content negotiation. Both have significant problems in certain cases and in the form that they're usually proposed.
URL Versioning
Endpoint/service URL versioning works if you control all servers and clients. Otherwise, you'll need to handle newer clients falling back to older servers, which you'll end up doing with custom HTTP headers because system administrators of server software deployed on heterogeneous servers outside of your control can do all sorts of things to screw up the URLs you think will be easy to parse if you use something like 302 Moved Temporarily.
Content Negotiation
Content negotiation via the Accept header works if you are deeply concerned about following the HTTP standard but also want to ignore what the HTTP/1.1 standard documents actually say. The proposed MIME Type you tend to see is something of the form application/vnd.example.v1+json. There are a few problems:
There are cases where the vendor extensions are actually appropriate, of course, but slightly different communication behaviors between client and server doesn't really fit the definition of a new 'media type'. Also, RFC 2616 (HTTP/1.1) reads, "Media-type values are registered with the Internet Assigned Number Authority. The media type registration process is outlined in RFC 1590. Use of non-registered media types is discouraged." I don't want to see a separate media type for every version of every software product that has a REST API.
Any subtype ranges (e.g., application/*) don't make sense. For REST APIs that return structured data to clients for processing and formatting, what good is accepting */* ?
The Accept header takes some effort to parse correctly. There's both an implied and explicit precedence that should be followed to minimize the back-and-forth required to actually do content negotiation correctly. If you're concerned about implementing this standard correctly, this is important to get right.
RFC 2616 (HTTP/1.1) describes the behavior for any client that does not include an Accept header: "If no Accept header field is present, then it is assumed that the client accepts all media types." So, for clients you don't write yourself (where you have the least control), the most correct thing to do would be to respond to requests using the newest, most prone-to-breaking-old-versions version that the server knows about. In other words, you could have not implemented versioning at all and those clients would still be breaking in exactly the same way.
Edited, 2014:
I've read a lot of the other answers and everyone's thoughtful comments; I hope I can improve on this with the benefit of a couple of years of feedback:
Don't use an 'X-' prefix. I think Accept-Version is probably more meaningful in 2014, and there are some valid concerns about the semantics of re-using Version raised in the comments. There's overlap with defined headers like Content-Version and the relative opaqueness of the URI for sure, and I try to be careful about confusing the two with variations on content negotiation, which the Version header effectively is. The third 'version' of the URL https://example.com/api/212315c2-668d-11e4-80c7-20c9d048772b is wholly different than the 'second', regardless of whether it contains data or a document.
Regarding what I said above about URL versioning (endpoints like https://example.com/v1/users, for instance) the converse probably holds more truth: if you control all servers and clients, URL/URI versioning is probably what you want. For a large-scale service that could publish a single service URL, I would go with a different endpoint for every version, like most do. My particular take is heavily influenced by the fact that the implementation as described above is most commonly deployed on lots of different servers by lots of different organizations, and, perhaps most importantly, on servers I don't control. I always want a canonical service URL, and if a site is still running the v3 version of the API, I definitely don't want a request to https://example.com/v4/ to come back with their web server's 404 Not Found page (or even worse, 200 OK that returns their homepage as 500k of HTML over cellular data back to an iPhone app.)
If you want very simple /client/ implementations (and wider adoption), it's very hard to argue that requiring a custom header in the HTTP request is as simple for client authors as GET-ting a vanilla URL. (Although authentication often requires your token or credentials to be passed in the headers, anyway. Using Version or Accept-Version as a secret handshake along with an actual secret handshake fits pretty well.)
Content negotiation using the Accept header is good for getting different MIME types for the same content (e.g., XML vs. JSON vs. Adobe PDF), but not defined for versions of those things (Dublin Core 1.1 vs. JSONP vs. PDF/A). If you want to support the Accept header because it's important to respect industry standards, then you won't want a made-up MIME Type interfering with the media type negotiation you might need to use in your requests. A bespoke API version header is guaranteed not to interfere with the heavily-used, oft-cited Accept, whereas conflating them into the same usage will just be confusing for both server and client. That said, namespacing what you expect into a named profile per 2013's RFC6906 is preferable to a separate header for lots of reasons. This is pretty clever, and I think people should seriously consider this approach.
Adding a header for every request is one particular downside to working within a stateless protocol.
Malicious proxy servers can do almost anything to destroy HTTP requests and responses. They shouldn't, and while I don't talk about the Cache-Control or Vary headers in this context, all service creators should carefully consider how their content is consumed in lots of different environments.
This is a matter of opinion; here's mine, along with the motivation behind the opinion.
Include the version in the URL
For those who say it belongs in the HTTP header, I say: maybe. But putting it in the URL is the accepted way to do it according to the early leaders in the field (Google, Yahoo, Twitter, and more). This is what developers expect, and doing what developers expect, in other words acting in accordance with the principle of least astonishment, is probably a good idea. It absolutely does not make it "harder for clients to upgrade". If the change in URL somehow represents an obstacle to the developer of a consuming application, as suggested in a different answer here, that developer needs to be fired.
Skip the minor version
There are plenty of integers. You're not gonna run out. You don't need the decimal in there. Any change from 1.0 to 1.1 of your API shouldn't break existing clients anyway. So just use the natural numbers. If you like to use separation to imply larger changes, you can start at v100 and do v200 and so on, but even there I think YAGNI and it's overkill.
Put the version leftmost in the URI
Presumably there are going to be multiple resources in your model. They all need to be versioned in synchrony. You can't have people using v1 of resource X, and v2 of resource Y. It's going to break something. If you try to support that it will create a maintenance nightmare as you add versions, and there's no value add for the developer anyway. So, http://api.mydomain.com/v1/Resource/12345, where Resource is the type of resource, and 12345 gets replaced by the resource id.
You didn't ask, but...
Omit verbs from your URL path
REST is resource oriented. You have things like "CallFoo" in your URL path, which looks suspiciously like a verb, and unlike a noun. This is wrong. Use the Force, Luke. Use the verbs that are part of REST: GET PUT POST DELETE and so on. If you want to get the verification on a resource, then do GET http://domain/v1/Foo/12345/verification. If you want to update it, do POST /v1/Foo/12345.
Put optional params as a query param or payload
The optional params should not be in the URL path (before the first question mark) unless you are suggesting that those optional params constitute a self-standing resource. So, POST /v1/Foo/12345?action=partialUpdate&param1=123&param2=abc.
Don't do either of those things, because they push the version into the URI structure, and that's going to have downsides for your client applications. It will make it harder for them to upgrade to take advantage of new features in your application.
Instead, you should version your media types, not your URIs. This will give you maximum flexibility and evolutionary ability. For more information, see this answer I gave to another question.
I like using the profile media type parameter:
application/json; profile="http://www.myapp.com/schema/entity/v1"
More Info:
https://www.rfc-editor.org/rfc/rfc6906
http://buzzword.org.uk/2009/draft-inkster-profile-parameter-00.html
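A quick sketch of emitting this from JAX-RS, setting the Content-Type explicitly (the schema URL is the example one above):

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

@Path("/entity")
public class ProfiledEntityResource {

    @GET
    public Response get() {
        // The profile parameter identifies the schema version without
        // inventing a new media type or touching the URI.
        return Response.ok("{\"name\": \"example\"}")
                       .type("application/json; profile=\"http://www.myapp.com/schema/entity/v1\"")
                       .build();
    }
}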
It depends on what you call versions in your API. If you apply versions to different representations (XML, JSON, etc.) of the entities, then you should use the Accept header or a custom header; that is the way HTTP is designed to work with representations. It is RESTful because if I call the same resource at the same time but request different representations, the returned entities will have exactly the same information and property structure, just in a different format. This kind of versioning is cosmetic.
On the other hand, if you understand 'versions' as changes in entity structure, for example adding a field 'age' to the 'user' entity, then you should approach this from a resource perspective, which is in my opinion the RESTful approach. As described by Roy Fielding in his dissertation, a REST resource is a mapping from an identifier to a set of entities. Therefore it makes sense that when changing the structure of an entity, you need a proper resource that points to that version. This kind of versioning is structural.
I made a similar comment in: http://codebetter.com/howarddierking/2012/11/09/versioning-restful-services/
When working with URL versioning, the version should come later, not earlier, in the URL:
GET/DELETE/PUT onlinemall.com/grocery-store/customer/v1/{id}
POST onlinemall.com/grocery-store/customer/v1
Another, cleaner way of doing it, though it could be problematic to implement:
GET/DELETE/PUT onlinemall.com/grocery-store/customer.v1/{id}
POST onlinemall.com/grocery-store/customer.v1
Doing it this way allows clients to request specifically the resource they want, which maps to the entity they need, without having to mess with headers and custom media types, which is really problematic when implementing in a production environment.
Also, having the version late in the URL allows clients more granularity when choosing specifically the resources they want, even at the method level.
But the most important thing from a developer's perspective: you don't need to maintain the whole mappings (paths) for every version across all the resources and methods, which is very valuable when you have a lot of sub-resources (embedded resources).
From an implementation perspective, having it at the resource level is really easy to implement, for example with Jersey/JAX-RS:
import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.MediaType;

@Path("/customer")
public class CustomerResource {
    ...

    @GET
    @Path("/v{version}/{id}")
    public IDto getCustomer(@PathParam("version") String version, @PathParam("id") String id) {
        return locateVersion(version, customerService.findCustomer(id));
    }

    ...

    @POST
    @Path("/v1")
    @Consumes(MediaType.APPLICATION_JSON)
    public IDto insertCustomerV1(CustomerV1Dto customer) {
        return customerService.createCustomer(customer);
    }

    @POST
    @Path("/v2")
    @Consumes(MediaType.APPLICATION_JSON)
    public IDto insertCustomerV2(CustomerV2Dto customer) {
        return customerService.createCustomer(customer);
    }

    ...
}
IDto is just an interface for returning a polymorphic object; CustomerV1Dto and CustomerV2Dto implement that interface.
Facebook does versioning in the URL. I feel URL versioning is cleaner and easier to maintain in the real world.
.NET makes it super easy to do versioning this way:
[HttpPost]
[Route("{version}/someCall/{id}")]
public HttpResponseMessage someCall(string version, int id)
{
    // ...
}
While developing GWT apps we ran into lots of problems with project configuration. Let me explain... As usual in development, we have a few environments for our application: local, demo, preview and live. Of course they run on different machines, and some use SSL while others don't. But most importantly, all of them have different URLs.
Now, in a few places in our application we need to specify some URLs. Usually we would use *.properties files stored on the server, and tools like the Spring taglibs with their <spring:message /> tag. But since GWT has no such tools, we ended up leaving hard-coded URLs and performing code replacement on different SVN branches. As you can imagine, this is the worst possible scenario, causing us many problems.
So, my question is:
How could one build a proper, flexible mechanism for storing config properties shared by both the client and server sides of a GWT application? These properties have to be available to server-side handlers, the client app (compiled JavaScript), UiBinder, and other code running on the server (workers, Spring, etc.).
The preferred way would be to avoid a gwtc build when we change the value of some property, but I guess that will be hard to achieve. So I will accept any reasonable alternative.
How about using relative URI references (e.g. absolute paths, without scheme or authority; i.e. /path/to/foo instead of http://example.com/path/to/foo)?
And in the few places where you absolutely need a URI (with scheme and authority), use another property to store the "prefix" (e.g. http://example.com), and concatenate it with the above path.
Those places where you need a full URI should all be on the server, which means you don't have to recompile your GWT project when you change the "prefix", so everything is only runtime configuration and you can deploy the same artifacts in all environments.
That being said, if you ever need something configurable at runtime in GWT, then use a dynamic host page and JSNI (or a com.google.gwt.i18n.client.Dictionary); see http://code.google.com/webtoolkit/articles/dynamic_host_page.html
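A minimal sketch of the Dictionary approach (the appConfig object name and apiPrefix key are choices of our own; each environment's host page defines the JS object before the GWT script loads):

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.i18n.client.Dictionary;

public class MyApp implements EntryPoint {
    @Override
    public void onModuleLoad() {
        // The host page defines, per environment:
        //   <script>var appConfig = { "apiPrefix": "https://example.com" };</script>
        Dictionary config = Dictionary.getDictionary("appConfig");
        String apiPrefix = config.get("apiPrefix");

        // Concatenate with relative paths wherever a full URI is unavoidable;
        // changing the prefix never requires a gwtc recompile.
        String fullUri = apiPrefix + "/path/to/foo";
    }
}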