I just figured out that HAProxy returns 400 Bad Request when the URI contains unencoded Unicode characters, like this:
http://www.something.com/адрес
I am intentionally not encoding the path part of the URI: it's a Cyrillic HTML5 website, and if I have to escape everything, things get really messy.
So far I haven't had any issues with ordinary web servers and browsers.
But today, when testing with an SEO web tool that apparently uses HEAD requests to check link validity, it reported that most links were bad/broken.
Then I tested a HEAD request myself, and it really does fail, every time.
The only way to 'fix' this is to use the 'accept-invalid-http-request' option, but that is not something to use in production...
Any other suggestions?
Is it a MUST to encode all Unicode characters in the URI (keeping in mind that I don't care about compatibility with old OSes like XP)?
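In case it really is mandatory: percent-encoding just the path seems less messy than I feared. A minimal Java sketch of what that would look like, using the URL from my example above:

import java.net.URI;
import java.net.URISyntaxException;

public class EncodePath {
    public static void main(String[] args) throws URISyntaxException {
        // The multi-argument constructor quotes illegal characters;
        // toASCIIString() then percent-encodes the non-ASCII ones as UTF-8.
        URI uri = new URI("http", "www.something.com", "/адрес", null);

        // Prints the fully percent-encoded form that a strict parser
        // like HAProxy will accept:
        // http://www.something.com/%D0%B0%D0%B4%D1%80%D0%B5%D1%81
        System.out.println(uri.toASCIIString());
    }
}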
What's the difference between all of these? And what are their meanings?
/;stream.mp3 [What exactly does the ; semicolon signify after the / slash?]
Also, what's the difference if I take off the stream.mp3 and just leave the semicolon after the slash (/;), versus leaving stream.mp3 attached?
/stream [How come this one has only stream, and that's it? There's no ; semicolon after the / slash and there's no stream.mp3.]
Why would one stream be able to work without a semicolon, and why would one stream need to have one?
http://91.223.18.205:8000/c11_4? [icecast] [Why does this one have a ? question mark at the end, and what does that signify?]
SHOUTcast serves its administrative interface from the exact same port and path as the stream. For example, suppose I have a SHOUTcast server running on port 8000 on 198.51.100.100. If I go to the following in my browser...
http://198.51.100.100:8000/
... I will see the SHOUTcast admin page, where I can login and drop connections and what not. However if I go to that same URL with a media player (such as VLC or Winamp), I will hear a stream.
SHOUTcast knows which to give me based on the User-Agent request header. This header indicates what client is trying to connect to the server. When I connect with my browser, it might look something like this:
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
If I connect with VLC, the User-Agent request header might look like this:
NSPlayer/7.10.0.3059
SHOUTcast doesn't have a list of all browsers. Instead, it looks for only one keyword... Mozilla. This is found in most browsers' user agent strings, for historical reasons. If Mozilla is in the User-Agent request header, then SHOUTcast sends the admin page. For all others, it sends a stream.
This creates lots of problems. Most notably, it means you cannot listen to the stream in a browser. If you load that stream on a web page, the User-Agent string will contain Mozilla, and the SHOUTcast server will send the admin page, causing an error with the player.
There is a way around this. If in the request path you add a semicolon ;, SHOUTcast ignores the actual User-Agent and replaces it with MPEG OVERRIDE. (You can see this in your SHOUTcast server logs.) This causes the server to send the actual radio stream.
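In other words, the decision boils down to something like this (a Java sketch of the observable behavior, with my own names - not SHOUTcast's actual code):

public class ShoutcastDispatch {
    // Sketch of the dispatch rule described above.
    static boolean wantsAdminPage(String path, String userAgent) {
        if (path.contains(";")) {
            // Semicolon hack: the server behaves as if the User-Agent
            // were "MPEG OVERRIDE" and serves the stream regardless.
            return false;
        }
        return userAgent != null && userAgent.contains("Mozilla");
    }

    public static void main(String[] args) {
        System.out.println(wantsAdminPage("/", "Mozilla/5.0 (Windows NT 6.1) ..."));  // true: admin page
        System.out.println(wantsAdminPage("/;stream.mp3", "Mozilla/5.0 ..."));        // false: stream
        System.out.println(wantsAdminPage("/", "NSPlayer/7.10.0.3059"));              // false: stream
    }
}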
Because of this, it's common to see a semicolon ; in the path for SHOUTcast streams. But, what about ;stream.mp3? Someone did it one day and everyone else copied and pasted it. Simple as that. The SHOUTcast server ignores everything after that semicolon, so you can put whatever you want there.
Occasionally, there may be a reason for the .mp3 though. When loading over HTTP, you're supposed to be able to determine what type something is by the Content-Type response header. The "file name" is completely meaningless. You could configure a web server to name something whatever you want with any file name extension, and as long as you sent the correct Content-Type response header, all is well. One time, in the last ~15 years, I came across software that assumed file name extensions were valid and required. This was a very buggy way to do things. Fortunately, they fixed it, and all was well. This is a really rare problem, and not one you should worry about.
Now that SHOUTcast hacks are explained... onto your other questions.
/stream [How come this one has only stream, and that's it? There's no ; semicolon after the / slash and there's no stream.mp3.]
Those running the servers can do whatever they want. It's just normal HTTP. Paths can be anything. In this case, someone decided to call whatever is running there /stream. They're probably also not using SHOUTcast. (Again, SHOUTcast is non-standard and not normal.)
Why would one stream be able to work without a semicolon, and why would one stream need to have one?
Only SHOUTcast requires the semicolon ; to work as expected. Other servers don't require this hack.
http://91.223.18.205:8000/c11_4? [icecast] [Why does this one have a ? question mark at the end, and what does that signify?]
Question marks ? in URLs separate the path from the query string. A query string can be used to provide a list of parameters, usually for a script at the path. In this case, the question mark doesn't matter because there are no parameters after it.
Old IE (4, I think) used to overly cache things, but often wouldn't if there were a query string involved. Sometimes people would add query strings with a random number to ensure they received a fresh copy from the server. This is a hack that hasn't been needed in a very long time. IE 4 came out nearly 20 years ago. These days, we use the proper cache control headers. SHOUTcast, Icecast, and others, all do this correctly.
I've been noticing a very quirky trend lately and I'm baffled by it. In the past month or two, I've begun to notice sites breaking without a referer header.
As background: you'll of course remember the archaic days when referer headers were misused to do a whole bunch of things, from feature detection to some misguided appearance of security. There are still some legacy sites that depend on it, but for the most part referer headers have been relegated to shitty device detection.
Imagine my surprise when not one, but three modern websites are suddenly breaking without a referer.
Codepen: pen previews and full page views just break (i.imgur.com/3abXqsC.png). But editor view works perfectly.
Twitter: basically every interactive function breaks. If you try to tweet, retweet, favourite, etc. you get a generic, non-descriptive error (i.imgur.com/E6tIKFo.png). If you try to update a setting, it just flat out refuses (403) (i.imgur.com/51e2d0M.png).
Imgur: It just can't upload anything (i.imgur.com/xCWpkGX.png) and eventually gives up (i.imgur.com/iO2UlR6.png).
All three are modern websites. Codepen was already broken when I started using it, so I'm not sure if it was always like that, but Twitter and Imgur used to work perfectly fine with no referer. In fact, I had only just noticed Imgur breaking.
Furthermore, all of them generate only non-descriptive error messages, if any at all, which don't identify the problem. It took a lot of trial and error for me to figure it out the first two times; now trying a referer header is one of the first things I do. But wait! There's more! All it takes to un-bork them is to send a generic referer that's the root of the host (i.e. twitter.com, codepen.io, imgur.com). You don't even need to use actual URLs with directory paths!
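To reproduce it, a minimal probe is enough. Here's a Java (11+) sketch; the URL is illustrative, and the bare origin in the Referer is the whole trick:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RefererProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // The bare origin is enough; no path or query needed.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://twitter.com/settings"))  // illustrative URL
                .header("Referer", "https://twitter.com")
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Without the Referer header, the same request gets a 403 for me.
        System.out.println(response.statusCode());
    }
}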
One website, I can chalk it up to shitty code. But three, major, modern websites - especially when they used to work - is a huge head scratcher.
Has anybody else noticed this trend or know wtf is going on?
While Referer headers don't "add security", they can be used to filter out requests from browsers that play by the referer rules. It doesn't make the site "secure" against arbitrary HTTP clients, but it is a fair filter for browsers (running on behalf of possibly unsuspecting users) acting as proxies.
Here are some possibilities:
It might prevent hijacked (or phished) users, and/or other injection attacks on form POSTs (non-idempotent requests), which are not constrained by the Same-Origin Policy.
Some requests can leak a little bit of information, even with the Same-Origin Policy.
It can limit third-party use of embedded content such as iframes, videos/images, and other hotlinking.
That is, while it definitely should not be considered a last line of defence (e.g. it should not replace proper authentication and CSRF tokens), it does help reduce exposure to undesired access from browsers.
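For illustration, a server-side check of this kind could look like the following servlet filter - a minimal sketch, assuming Servlet 4.0+ and a made-up allowed origin. It must not replace real CSRF tokens; the header is trivial to forge outside a browser:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RefererFilter implements Filter {
    // Assumed origin for this sketch; use your site's real origin.
    private static final String ALLOWED_ORIGIN = "https://example.com";

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Only police state-changing (non-idempotent) requests.
        if ("POST".equals(request.getMethod())) {
            String referer = request.getHeader("Referer");
            if (referer == null || !referer.startsWith(ALLOWED_ORIGIN)) {
                response.sendError(HttpServletResponse.SC_FORBIDDEN);
                return;
            }
        }
        chain.doFilter(req, res);
    }
}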
Safari (8.0.7 in my case) is failing to follow a redirect. This is working in Chrome and only fails in a very specific scenario.
As best I can tell, the redirect will only fail when moving between two https connections on different domains/subdomains when hash params are involved. It will work with query params or if one of the domains is localhost.
According to https://bugs.webkit.org/show_bug.cgi?id=24175, it seems that Safari would not honor hash params in redirects at one point in the past, but I cannot confirm if this is still the case.
It's looking to be a security/sandbox issue, but I'd be interested if anyone can put an exact finger on this issue.
In the end, this did prove to be a bit of a Safari issue, but also a perfect storm of things which resulted in my problem.
tl;dr:
Safari doesn't apply hash params on redirects unless the paths match exactly. Fixed in Safari 9.x
Specifics:
The server infrastructure was redirecting /foo to /foo/. Because of this, the original hash params were not re-applied.
http://localhost:<port>/foo#/one
results in
http://localhost:<port>/foo
Forcing the trailing slash fixed the issue
http://localhost:<port>/foo/#/one
results in
http://localhost:<port>/foo/#/one
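If you generate these links in code, normalizing the path before appending the hash avoids the redirect entirely. A minimal Java sketch (the names and port number are mine):

public class HashUrls {
    // Ensure the path ends in "/" before appending the fragment, so the
    // server never needs to redirect /foo to /foo/ and the affected
    // Safari versions never get the chance to drop the hash.
    static String hashUrl(String origin, String path, String fragment) {
        if (!path.endsWith("/")) {
            path += "/";
        }
        return origin + path + "#" + fragment;
    }

    public static void main(String[] args) {
        // Prints http://localhost:8080/foo/#/one
        System.out.println(hashUrl("http://localhost:8080", "/foo", "/one"));
    }
}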
I'm writing a client that grabs a page from a web server. On one particular server, it would work fine from my web browser, but my code was consistently getting the response:
HTTP/1.1 503 Service Unavailable
Content-Length:62
Connection: close
Cache-Control: no-cache,no-store
Pragma: no-cache
<html><body><b>Http/1.1 Service Unavailable</b></body> </html>
I eventually narrowed this down to the User-Agent header I was sending: if it contains Mozilla, everything is fine (I tried many variations of this). If not, I get 503. As soon as I realized it was User-Agent, I remembered having this same issue in the past (different project, different servers), but I've never figured out why.
In this particular case, the web server I'm connecting to is running IIS 7.5, but I'm not sure if there are any proxies/firewalls/etc in front of it (I suspect there probably is something because of this behaviour).
There's an interesting history to User-Agents which you can read about on this question: Why do all browsers' user agents start with "Mozilla/"?
It's clearly no issue for me to have Mozilla in my User-Agent, but my question is simply: what is the configuration or server that causes this to happen, and why would anyone want this behaviour?
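For reference, the workaround on my side is trivial - I just make sure the User-Agent contains Mozilla (URL illustrative):

import java.net.HttpURLConnection;
import java.net.URL;

public class FetchPage {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com/page");  // illustrative URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // Any value containing "Mozilla" gets past the filter; naming the
        // real client after it keeps the string honest.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; MyClient/1.0)");

        System.out.println(conn.getResponseCode());  // 200 instead of 503
    }
}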
Here is an interesting history of this phenomenon: User Agent String History
The main reason this exists is that the internet, the web, and browsers were not designed but evolved, with high amounts of backwards compatibility and then a lot of vendor-exclusive extensions. In particular, frames (which are widely considered a bad idea these days) were not well supported by Mosaic, but were by Netscape (which had Mozilla as its user agent).
Server administrators then had a choice: did they use the new hip cool frames and only support Netscape, or did they use old boring pages that everyone could use? Their choice was a hack: if someone says they are Mozilla, send them frames; if not, send them no frames.
This ruined everything. IE had to call itself Mozilla compatible, everyone impersonated everyone else, it's all well detailed in the link at the top. But this problem more or less went away in the modern era, as everyone impersonated everyone, and everyone supported more and more of a common subset of features.
And then mobile browsers and smartphone browsers became widespread. Suddenly, there weren't just a few main browsers with basically the same features and a few outlying browsers you could easily ignore. Now there were dozens of small browsers, with less power, less ability, and a disjoint, odd set of capabilities! And so, many servers took the easy road and simply did not send the proper data, or any data at all, to any browser they did not recognize.
Now rather than a poorly rendered or inoperable website, you had...no website on certain platforms, and a perfect one on others. This worked, but wasn't tolerable for many businesses; they wanted to work right on ALL platforms, because that's how the web was supposed to work.
Mobile versions, mobile first, responsive design, media queries, all these were designed to fill in those gaps. But for the most part, a lot of websites still just ignore less than modern browsers. And media queries were quickly subverted: no one wants to declare their browser is handheld, oh no. We're a real display browser, even if our screen is only 3 inches, yes sir!
In summary, some servers are configured to drop any browser which is not Mozilla compatible because they think it's better to serve no page than a poorly rendered one.
I've also seen some arguments that this improves security because then the server doesn't have to deal with rogue programs that aren't browsers (much like your own) connecting to them. As the user agent is easy to change, this holds no water for me; it's simply security through obscurity.
Many firewalls are configured to drop all requests which do not have a "proper" user agent, as many DDoS attacks do not bother to send one - it is an easy, reliable filter.
I've spent the last 5 months developing a GWT app, and it's now time for third-party people to start using it. In preparation, one of them has set up my app behind a reverse proxy, and this immediately resulted in problems with the browser's same-origin policy. I guess there's a problem in the response headers, but I can't seem to rewrite them in any way that makes the problem go away. I've tried this
response.setHeader("Server", request.getRemoteAddress());
in some sort of naive attempt to mimic the behaviour I want. Didn't work (to the surprise of no-one).
Anyone knowing anything about this will most likely snicker and shake their heads when reading this, and I do not blame them. I would snicker too, if it was me... I know nothing at all about this, and that naturally makes this problem awfully hard to solve. Any help at all will be greatly appreciated.
How can I get the header rewrite to work and get away from the SOP issues I'm dealing with?
Edit: The exact problem I'm getting is a pop-up saying:
"SmartClient can't directly contact
URL
'https://localhost/app/resource?action='doStuffs'"
due to browser same-origin policy.
Remove the host and port number (even
if localhost) to avoid this problem,
or use XJSONDataSource protocol (which
allows cross-site calls), or use the
server-side HttpProxy included with
SmartClient Server."
But I shouldn't need the smartclient HttpProxy, since I have a proxy on top of the server, should I? I've gotten no indications that this could be a serialisation problem, but maybe this message is hiding the real issue...
Solution
chris_l and saret both helped to find the solution, but since I can only mark one I marked the answer from chris_l. Readers are encouraged to bump them both up, they really came through for me here. The solution was quite simple, just remove any absolute paths to your server and use only relative ones, that did the trick for me. Thanks guys!
The SOP (for AJAX requests) applies, when the URL of the HTML page, and the URL of the AJAX requests differ in their "origin". The origin includes host, port and protocol.
So if the page is http://www.example.com/index.html, your AJAX request must also point to something under http://www.example.com. For the SOP, it doesn't matter if there is a reverse proxy - just make sure that the URL, as it appears to the browser (including port and protocol), isn't different. The URL you use internally is irrelevant - but don't use that internal URL in your GWT app!
Note: The solution in the special case of SmartClient turned out to be using relative URLs (instead of absolute URLs to the same origin). Since relative URLs aren't an SOP requirement in browsers, I'd say that's a bug in SmartClient.
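Concretely, that means building requests in the GWT client like this (a sketch; the resource path is taken from the error message above):

import com.google.gwt.http.client.RequestBuilder;

public class ResourceClient {
    void fetch() {
        // Bad: an absolute URL pins scheme, host, and port, and trips
        // SmartClient's check as soon as a proxy changes what the
        // browser sees:
        //   new RequestBuilder(RequestBuilder.GET,
        //           "https://localhost/app/resource?action=doStuffs");

        // Good: a relative URL is resolved against whatever origin the
        // page was actually served from, proxy or not.
        RequestBuilder builder = new RequestBuilder(
                RequestBuilder.GET, "app/resource?action=doStuffs");
        // ... then builder.sendRequest(null, callback) as usual ...
    }
}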
What issue are you having exactly?
Having previously had to write a reverse proxy for a GWT app, I can't remember hitting any SOP issues. One thing you need to do, though, is make sure response headers and URIs are rewritten to the reverse proxy's URL - this includes AJAX callback URLs.
One issue I hit (which you might also experience) when running behind a reverse proxy was with the serialization policy of the GWT server.
Fixing this required writing an implementation of RemoteServiceServlet. While this was in early/mid 2009, it seems the issue still exists.
Seems like others have hit this as well - see this for further details (the answer by Michele Renda in particular).