SOP issue behind reverse proxy - GWT

I've spent the last 5 months developing a GWT app, and it's now time for third-party people to start using it. In preparation for this, one of them has set up my app behind a reverse proxy, which immediately resulted in problems with the browser's same-origin policy. I suspect there's a problem in the response headers, but I can't seem to rewrite them in any way that makes the problem go away. I've tried this
response.setHeader("Server", request.getRemoteAddress());
in some sort of naive attempt to mimic the behaviour I want. Didn't work (to the surprise of no-one).
Anyone who knows anything about this will most likely snicker and shake their heads when reading this, and I don't blame them. I would snicker too, if it were me... I know nothing at all about this, and that naturally makes the problem awfully hard to solve. Any help at all will be greatly appreciated.
How can I get the header rewrite to work and get away from the SOP issues I'm dealing with?
Edit: The exact problem I'm getting is a pop-up saying:
"SmartClient can't directly contact
URL
'https://localhost/app/resource?action='doStuffs'"
due to browser same-origin policy.
Remove the host and port number (even
if localhost) to avoid this problem,
or use XJSONDataSource protocol (which
allows cross-site calls), or use the
server-side HttpProxy included with
SmartClient Server."
But I shouldn't need the SmartClient HttpProxy, since I have a proxy in front of the server, should I? I've had no indication that this is a serialisation problem, but maybe this message is hiding the real issue...
Solution
chris_l and saret both helped find the solution, but since I can only mark one, I marked the answer from chris_l. Readers are encouraged to vote them both up; they really came through for me here. The solution was quite simple: just remove any absolute paths to your server and use only relative ones. That did the trick for me. Thanks guys!

The SOP (for AJAX requests) applies when the URL of the HTML page and the URL of the AJAX request differ in their "origin". The origin includes host, port and protocol.
So if the page is http://www.example.com/index.html, your AJAX request must also point to something under http://www.example.com. For the SOP, it doesn't matter if there is a reverse proxy - just make sure that the URL, as it appears to the browser (including port and protocol), isn't different. The URL you use internally is irrelevant - but don't use that internal URL in your GWT app!
Note: The solution in the special case of SmartClient turned out to be using relative URLs (instead of absolute URLs to the same origin). Since relative URLs aren't an SOP requirement in browsers, I'd say that's a bug in SmartClient.
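To make the fix concrete, here is a minimal sketch of the difference as it would look in plain GWT client code (the RequestBuilder calls and the resource path are my own illustration echoing the error message above, not code from the question):

import com.google.gwt.http.client.RequestBuilder;

public class ResourceUrls {

    // Problematic: an absolute URL naming the internal host. The origin the
    // browser derives from it won't match the page served through the proxy.
    static RequestBuilder absolute() {
        return new RequestBuilder(RequestBuilder.POST,
                "https://localhost/app/resource?action=doStuffs");
    }

    // Better: a relative URL. The browser resolves it against the page it
    // loaded, so it automatically shares that page's host, port and protocol,
    // whatever the reverse proxy exposes externally.
    static RequestBuilder relative() {
        return new RequestBuilder(RequestBuilder.POST,
                "app/resource?action=doStuffs");
    }
}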

What issue are you having exactly?
Having previously had to write a reverse proxy for a GWT app, I can't remember hitting any SOP issues. One thing you do need to do, though, is make sure response headers and URIs are rewritten to the reverse proxy's URL - this includes AJAX callback URLs.
One issue I did hit when running behind a reverse proxy (and which you might also be experiencing) was with GWT's server-side serialization policy.
Fixing this required writing a custom implementation of RemoteServiceServlet. While this was in early/mid 2009, it seems the issue still exists.
It seems others have hit this as well - see this for further details (the answer by Michele Renda in particular)
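For reference, the usual shape of that workaround is a servlet that overrides doGetSerializationPolicy and maps the externally visible module base URL back to the internal one. The sketch below is only illustrative - the proxy and internal host names are placeholders, not taken from the question:

import javax.servlet.http.HttpServletRequest;
import com.google.gwt.user.server.rpc.RemoteServiceServlet;
import com.google.gwt.user.server.rpc.SerializationPolicy;

public class ProxyAwareRemoteServiceServlet extends RemoteServiceServlet {

    @Override
    protected SerializationPolicy doGetSerializationPolicy(
            HttpServletRequest request, String moduleBaseURL, String strongName) {
        // The client reports the module base URL as seen through the proxy;
        // translate it to the internal URL so the .gwt.rpc policy file is found.
        String internalBaseURL = moduleBaseURL;
        if (moduleBaseURL != null && moduleBaseURL.startsWith("https://proxy.example.com/")) {
            internalBaseURL = moduleBaseURL.replace("https://proxy.example.com/",
                                                    "http://internal-host:8080/");
        }
        return super.doGetSerializationPolicy(request, internalBaseURL, strongName);
    }
}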

Related

Why are websites requiring referer headers (and failing silently)?

I've been noticing a very quirky trend lately and I'm baffled by it. In the past month or two, I've begun to notice sites breaking without a referer header.
As background: you'll of course remember the archaic days when referer headers were misused to do a whole bunch of things, from feature detection to some misguided appearance of security. There are still some legacy sites that depend on them, but for the most part referer headers have been relegated to shitty device detection.
Imagine my surprise when not one, but three modern websites are suddenly breaking without a referer.
Codepen: pen previews and full page views just break (i.imgur.com/3abXqsC.png). But editor view works perfectly.
Twitter: basically every interactive function breaks. If you try to tweet, retweet, favourite, etc. you get a generic, non-descriptive error (i.imgur.com/E6tIKFo.png). If you try to update a setting, it just flat out refuses (403) (i.imgur.com/51e2d0M.png).
Imgur: It just can't upload anything (i.imgur.com/xCWpkGX.png) and eventually gives up (i.imgur.com/iO2UlR6.png).
All three are modern websites. Codepen has been broken since I started using it, so I'm not sure if it was always like that, but Twitter and Imgur used to work perfectly fine with no referer. In fact, I had only just noticed Imgur breaking.
Furthermore, all of them generate only non-descriptive error messages, if any at all, which do nothing to identify the problem. It took a lot of trial and error for me to figure it out the first two times; now trying a referer header is one of the first things I do. But wait, there's more! All it takes to un-bork them is to send a generic referer that's just the root of the host (i.e. twitter.com, codepen.io, imgur.com). You don't even need to use actual URLs with directory paths!
One website I could chalk up to shitty code. But three major, modern websites - especially ones that used to work - is a huge head-scratcher.
Has anybody else noticed this trend or know wtf is going on?
While Referer headers don't "add security", they can be used to filter out requests from browsers (the ones that play by referer rules) made on a user's behalf. It doesn't make the site "secure" against arbitrary HTTP clients, but it is a fair filter for browsers acting as proxies for possibly unsuspecting users.
Here are some possibilities:
Might prevent hijacked (or phished) users, and/or other injection attacks on form POSTs (non-idempotent requests), which are not constrained by the Same-Origin Policy.
Some requests can leak a little bit of information, even with the Same-Origin Policy.
Limit 3rd-party use of embedded content such as iframes, videos/images, and other hotlinking.
That is, while it definitely should not be considered a last line of defence (e.g. it should not replace proper authentication and CSRF tokens), it does help reduce exposure to undesired access from browsers.
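For illustration, here is a minimal sketch of the kind of check being described - the filter, the host constant and the bare-403 behaviour are my own guess at what these sites do, not anything confirmed:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RefererCheckFilter implements Filter {

    private static final String EXPECTED_HOST = "example.com"; // placeholder host

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        String method = request.getMethod();
        boolean stateChanging = !"GET".equals(method) && !"HEAD".equals(method);
        String referer = request.getHeader("Referer");

        if (stateChanging && (referer == null || !referer.contains(EXPECTED_HOST))) {
            // Matches the observed behaviour: a bare, non-descriptive 403.
            response.sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        chain.doFilter(req, res);
    }

    @Override public void init(FilterConfig filterConfig) {}
    @Override public void destroy() {}
}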

Shiro web filter chain order policy

I'm probably misunderstanding something, but here is the thing:
I want my <domain>/index.html publicly available and everything else should be protected.
I'm using shiro web + guice:
...
bindConstant().annotatedWith(Names.named("shiro.loginUrl")).to("/index.html");
addFilterChain("/index.html", ANON);
addFilterChain("/**", AUTHC);
...
This configuration leads me to a "too many redirects" loop. The Shiro documentation says here that it uses a FIRST MATCH WINS policy, but I think I haven't understood it properly.
Any thoughts?
On the surface, your filter chain looks like it should work. I probably can't diagnose your too-many-redirects issue without a bit more information - the content of index.html, what HTTP response is actually returned from the server when you hit index.html, etc.
However, I CAN tell you that you shouldn't need to do this. The AUTHC filter has a special case for the "loginUrl" page - it will let it through. So try removing the ANON filter, and see how things go.
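If it helps, the simplified configuration being suggested would look roughly like this (same Guice/Shiro constants as in the question; just a sketch):

bindConstant().annotatedWith(Names.named("shiro.loginUrl")).to("/index.html");
// no anon chain for /index.html - the authc filter recognises the configured
// loginUrl and lets requests for the login page through on its own
addFilterChain("/**", AUTHC);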

HTTP header field for URI deprecation/expiration

I'm building a REST service where I want to implement a way to deprecate certain URIs when they shouldn't be supported anymore for one reason or another. As functions are deprecated, they will be replaced by new ones that work in similar (but not identical) ways. This means that at some point, I will have to start responding with 410 Gone.
The idea is that all client software should be updated, and after, say, six months all users should have had the chance to upgrade. At that point, the deprecated URIs will start informing the client that it's out of date, so that the client can display a message to the user. This time isn't known in advance, though, and can't be written explicitly in the documentation.
The problem I want to solve is:
Is there an HTTP header field I should use to indicate that a certain URI will cease to work at a certain time and, if so, which?
This can't be the first time someone wants to solve this problem. Is there an unofficial header field already in use, or should I design my own? Note that I don't want to add this information to the content itself, as that would mean that every resource was changed and needs to be refreshed by the client, which is of course not what happened.
Strictly speaking, no. The resources should be driving your application's state, so if there is a change, the URI linking would provide the necessary changes to your application.
As for an HTTP header, you are free to add custom headers, conventionally starting with X-, but it's important to realise that changes to URIs are only interesting to developers, not users.
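For illustration only - the header names below are made up, not any standard - a response from a deprecated URI could carry something like:

// sketch: inside whatever servlet/handler serves the deprecated URI,
// where "response" is the javax.servlet.http.HttpServletResponse
response.setHeader("X-Deprecated", "true");                                     // hypothetical header name
response.setHeader("X-Deprecated-After", "Sat, 01 Jan 2011 00:00:00 GMT");      // placeholder removal date
response.setHeader("X-Deprecation-Info", "https://api.example.com/docs/new-endpoint"); // where to look next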

Connectedness & HATEOAS

It is said that in a well-defined RESTful system, clients only need to know the root URI, or a few well-known URIs, and should discover all other links through these initial URIs. I do understand the benefit of this approach (decoupled clients), but the downside for me is that the client needs to discover the links every time it tries to access something, i.e. given the following hierarchy of resources:
/collection1
 |- sub1
 |   |- sub1sub1
 |   |   |- sub1sub1sub1
 |   |       |- sub1sub1sub1sub1
 |   |- sub1sub2
 |- sub2
 |   |- sub2sub1
 |   |- sub2sub2
 |- sub3
 |   |- sub3sub1
 |   |- sub3sub2
If we follow the "Client only need to know the root URI" approach, then a client shall only be aware of the root URI i.e. /collection1 above and the rest of URIs should be discovered by the clients through hypermedia links. I find this cumbersome because each time a client needs to do a GET, say on sub1sub1sub1sub1, should the client first do a GET on /collection1 and the follow link defined in the returned representation and then do several more GETs on sub resources to reach the desired resource? or is my understanding about connectedness completely wrong?
Best regards,
Suresh
You will run into this mismatch when you try to build a REST API that does not match the flow of the user agent consuming it.
Consider when you run a client application, the user is always presented with some initial screen. If you match the content and options on this screen with the root representation then the available links and desired transitions will match nicely. As the user selects options on the screen, you can transition to other representations and the client UI should be updated to reflect the new representation.
If you try and model your REST API as some kind of linked data repository and your client UI as an independent set of transitions then you will find HATEOAS quite painful.
Yes, it's right that the client application should traverse the links, but once it's discovered a resource, there's nothing wrong with keeping a reference to that resource and using it for a longer time than one request. If your client has the possibility of remembering things permanently, it can do so.
Consider how a web browser keeps its bookmarks. You probably have maybe ten or a hundred bookmarks in the browser, and you probably found some of them deep in a hierarchy of pages, but the browser dutifully remembers them without having to remember the path it took to find them.
A richer client application could remember the URI of sub1sub1sub1sub1 and reuse it if it still works. It's likely that it still represents the same thing (it ought to). If it no longer exists, or fails for any other client-side reason (4xx), you can retrace your steps to see if you can find a suitable replacement.
And of course what Darrel Miller said :-)
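A rough sketch of that "remember it, retrace only when it breaks" idea - entirely hypothetical, the class and the traversal helper are illustrative rather than any particular framework:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class LinkCachingClient {

    private final String rootUri;
    private final Map<String, String> knownUris = new HashMap<String, String>();

    public LinkCachingClient(String rootUri) {
        this.rootUri = rootUri;
    }

    // Return a usable URI for a named resource, preferring what we remembered.
    public String resolve(String resourceName) throws IOException {
        String uri = knownUris.get(resourceName);
        if (uri == null || !stillExists(uri)) {
            uri = discoverByTraversal(resourceName); // retrace links from the root
            knownUris.put(resourceName, uri);
        }
        return uri;
    }

    private boolean stillExists(String uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(uri).openConnection();
        conn.setRequestMethod("HEAD");
        return conn.getResponseCode() < 400; // a 4xx means the remembered link went stale
    }

    private String discoverByTraversal(String resourceName) {
        // Placeholder: follow the hypermedia links from rootUri until a
        // representation advertising resourceName is found.
        return rootUri;
    }
}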
I don't think that's a strict requirement. As I understand it, it is legal for a client to access resources directly and start from there. The important thing is that you don't do this for state transitions, i.e. don't automatically proceed with /foo2 after operating on /foo1 and so forth. Retrieving /products/1234 directly in order to edit it seems perfectly fine. The server could always return, say, a redirect to /shop/products/1234 to remain backwards compatible (which is desirable for search engines, bookmarks and external links as well).

making LWP Useragent faster

I need to perform a large number of HTTP POST requests and ignore the responses. I am currently doing this using LWP::UserAgent. It seems to run somewhat slowly, though I'm not sure whether it's waiting for a response or what. Is there any way to speed it up, and possibly just ignore the responses?
bigian's answer is probably the best for this, but another way to speed things up is to use LWP::ConnCache to allow LWP to re-use existing connections rather than build a new connection for every request.
Enabling it is this simple if you're pounding on just one site --
use LWP::ConnCache;

my $conn_cache = LWP::ConnCache->new;
$conn_cache->total_capacity(1);   # keep one connection alive for the single target site
$ua->conn_cache($conn_cache);
I've found this to double the speed of some operations on HTTP sites, and more than double it for HTTPS sites.
LWP::Parallel
http://metacpan.org/pod/LWP::Parallel
"Introduction
ParallelUserAgent is an extension to the existing libwww module. It allows you to take a list of URLs (it currently supports HTTP, FTP, and FILE URLs. HTTPS might work, too) and connect to all of them in parallel, then wait for the results to come in."
It's great, it's worked wonders for me...