I have an MVC route that is giving me hell on a staging server running IIS. I am running Visual Studio 2010's development server locally.
Here is a sample URL that actually works on my dev box:
Root/CPUBoards/Full+Size
Result on the staging server:
Server Error
404 - File or directory not found.
The resource you are looking for might have been removed, had its name changed, or is temporarily unavailable.
Here is the complete behaviour I am seeing.
Localhost:
Root/CPUBoards/Full Size - Resolves
Root/CPUBoards/Full%20Size - Resolves
Root/CPUBoards/Full+Size - Resolves
Staging Server with IIS 7.0:
Root/CPUBoards/Full Size - Resolves
Root/CPUBoards/Full%20Size - Resolves
Root/CPUBoards/Full+Size - 404 Not Found Error.
Any ideas? I need to work with the encoded version for several reasons, but I won't waste your time with them.
HttpUtility.UrlEncode("Full Size") returns the version with the plus sign: Full+Size. This works on my dev box, but not on the staging server. I would prefer to just get it working on the server, since I already have everything else tested and working locally, but I have no idea where to start looking in the server configuration to make it behave the same way.
Thanks!
This is an IIS security setting. The request-filtering module's double-escaping check, which is on by default, rejects URLs whose path contains a + (plus) character.
You can disable it for your web application by adding this to your web.config:
<configuration>
  ...
  <system.webServer>
    ...
    <security>
      <requestFiltering allowDoubleEscaping="true" />
    </security>
  </system.webServer>
  ...
</configuration>
A + only has the special meaning of a space in application/x-www-form-urlencoded data, such as the query-string part of a URL.
In other parts of the URL, such as path components, + literally means a plus sign. So resolving Full+Size to the unencoded name Full Size should not work anywhere.
The only correct form of a space in a path component is %20. (It still works when you type an actual space because the browser spots the error and corrects it for you.) %20 also works in form-URL-encoded data, so it's generally safest to always use it.
Sadly, HttpUtility.UrlEncode is misleadingly named. It produces + in its output instead of %20, so it's really a form-URL-encoder and not a standard URL-encoder. Unfortunately I don't know of an ASP.NET function that “really URL-encodes” strings for use in a path, so all I can recommend is doing a string replace of + with %20 after encoding.
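To see the difference outside ASP.NET, here is a small JavaScript sketch. It uses URLSearchParams (which serializes with the application/x-www-form-urlencoded rules) as a stand-in for HttpUtility.UrlEncode, and encodeURIComponent as a path-safe encoder. The .NET and JavaScript encoders do not escape exactly the same punctuation, so treat this as an illustration of the space handling only; the helper names are made up for the example.

```javascript
// Form-style encoding (application/x-www-form-urlencoded): spaces become "+".
// URLSearchParams stands in here for HttpUtility.UrlEncode.
function formEncode(value) {
  return new URLSearchParams({ v: value }).toString().slice("v=".length);
}

// Path-safe encoding: spaces become "%20".
function pathEncode(value) {
  return encodeURIComponent(value);
}

// The suggested workaround: form-encode, then replace every "+" with "%20".
// This is safe because a literal "+" in the input was already escaped as "%2B",
// so any "+" left in the output must stand for a space.
function formEncodeThenFix(value) {
  return formEncode(value).replace(/\+/g, "%20");
}

console.log(formEncode("Full Size"));        // Full+Size
console.log(pathEncode("Full Size"));        // Full%20Size
console.log(formEncodeThenFix("Full Size")); // Full%20Size
console.log(formEncodeThenFix("C++ Size"));  // C%2B%2B%20Size
```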
Alternatively, avoid using spaces in path parts, e.g. by replacing them with -. It's common to ‘slug’ titles being inserted into URLs, reducing them to simple alphanumerics and ‘safe’ punctuation, to avoid filling the URL with ugly %nn sequences.
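One way to build such a slug, sketched in JavaScript with a hypothetical slugify helper: it strips accents, lower-cases the title, and collapses everything that is not an ASCII letter or digit into a single -. Lower-casing and the exact set of kept characters are conventions assumed here, and the route still has to map the slug back to the real record.

```javascript
// Hypothetical slug helper: "CPU Boards: Full Size!" -> "cpu-boards-full-size".
function slugify(title) {
  return title
    .normalize("NFKD")               // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // any run of other characters becomes one "-"
    .replace(/^-+|-+$/g, "");        // no leading or trailing "-"
}

console.log(slugify("Full Size")); // full-size
```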
System.Web.HttpUtility.UrlPathEncode(string str) encodes a space as %20 instead of +.
Totally agree with @bobince: the problem is encoding the space as + instead of %20.
The important part of his answer is doing a string replace of the + sign with %20 after encoding.
I'm calling a web service in this format:
http://some.server/rest/resource;a=b
It works, but is this valid? I've seen ; used as a replacement for & but have never seen a URL like this. I've been looking for an answer but did not find a clear one. If it is valid, what does this kind of URL mean?
The ;a=b part is a path parameter, not a query parameter. You can find detailed information on how URLs are built at http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/
Edit: I was actually looking for this link earlier, which explains it even better and shows some weird but valid cases: https://www.talisman.org/~erlkonig/misc/lunatech%5Ewhat-every-webdev-must-know-about-url-encoding/ (originally at the now-dead URL http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding)
But anyway, this is valid: http://www.blah.com/some/crazy/path.html;param1=foo;param2=bar
RFC 2396, where path parameters were specified, is obsolete. Its successor, RFC 3986, does not formally specify path parameters before the query string, but they still appear in the examples in section 5.4.1.
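A small JavaScript sketch showing that the ;param parts stay in the path rather than the query, and one way an application might split them out. RFC 3986 only reserves ; as a sub-delimiter, so treating it as a key=value separator inside a segment is an application convention assumed here; the parsePathParams helper is hypothetical.

```javascript
// Split each path segment into its name and its ";key=value" parameters.
function parsePathParams(pathname) {
  return pathname
    .split("/")
    .filter(segment => segment !== "")
    .map(segment => {
      const [name, ...rawParams] = segment.split(";");
      const params = {};
      for (const raw of rawParams) {
        const eq = raw.indexOf("=");
        const key = eq === -1 ? raw : raw.slice(0, eq);
        const value = eq === -1 ? "" : raw.slice(eq + 1);
        params[decodeURIComponent(key)] = decodeURIComponent(value);
      }
      return { name: decodeURIComponent(name), params };
    });
}

const exampleUrl = new URL("http://www.blah.com/some/crazy/path.html;param1=foo;param2=bar");
console.log(exampleUrl.pathname); // /some/crazy/path.html;param1=foo;param2=bar
console.log(exampleUrl.search);   // empty: nothing here is part of the query
console.log(parsePathParams(exampleUrl.pathname));
```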
This might answer your question: Semicolon as URL query separator
We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.
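On the server side, supporting that recommendation can be as simple as treating both characters as pair separators. A sketch with a hypothetical parseQueryLenient helper: URLSearchParams on its own only splits on &, so literal ; characters are normalized first, while an escaped %3B is left alone and decodes to a literal ;. If a key repeats, the last value wins in this sketch.

```javascript
// Parse a query string that may use either "&" or ";" between pairs.
function parseQueryLenient(query) {
  const normalized = query.replace(/^\?/, "").replace(/;/g, "&");
  return Object.fromEntries(new URLSearchParams(normalized));
}

console.log(parseQueryLenient("?a=1;b=2&c=3")); // { a: '1', b: '2', c: '3' }
console.log(parseQueryLenient("x=a%3Bb;y=2"));  // { x: 'a;b', y: '2' }
```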
We are using a simple UrlRewriteFilter rule to permanently (301) redirect HTTP requests without trailing slash to the same URL with trailing slash.
In some cases our presentation layer needs URLs with encoded special characters in them (e.g. %C3%B6 for ö), which works fine as long as the UrlRewriteFilter is not involved. But when the rule kicks in, I can see the encoded character getting malformed during the redirect, e.g.
www.mydomain.com/asdf%C3%B6asdf/ --> 301 --> www.mydomain.com/asdf%F6asdf/
%F6 is not a valid UTF-8 sequence (it ends up as a question mark in a black diamond when URL-decoded).
We use UTF-8 throughout our application; it's set in the response headers as well as in the HTML's <head> section. The malformed encoding occurs on both Windows and Linux machines. The rewrite rule looks as follows:
<rule enabled="true" match-type="regex">
    <name>Force trailing slash</name>
    <note>...</note>
    <condition type="request-uri" operator="notequal">...</condition> <!-- some URLs shall not be redirected -->
    <from>(^[^\?]*)(\?.*)?$</from>
    <to type="permanent-redirect" last="true">$1/$2</to> <!-- adding trailing slash and query string, if present -->
</rule>
I'd be grateful for any ideas on how this could be solved. I've played with the decode-using and encode attributes, but they did not help.
I had a similar problem. What I did was set decode-using to null:
<urlrewrite decode-using="null">
The issue I described above seems to be related to this bug report, which was filed in 2010 and has not been touched since. I'll probably have to work around it by handling the request "manually" in Java. Other ideas are still welcome, though.
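The two escapes in the redirect are consistent with ö being written back out as a single ISO-8859-1 (Latin-1) byte instead of its two UTF-8 bytes. That is an inference from the bytes, not something confirmed about UrlRewriteFilter's internals, but this JavaScript sketch shows the relationship and why the result decodes to the replacement character. The percent helper is just for display.

```javascript
// "ö" is U+00F6: two bytes in UTF-8, a single byte (0xF6) in ISO-8859-1.
const utf8Bytes = new TextEncoder().encode("\u00f6"); // Uint8Array [0xC3, 0xB6]
const percent = bytes =>
  Array.from(bytes, b => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");

console.log(percent(utf8Bytes)); // %C3%B6 (what the application sends)
console.log(percent([0xf6]));    // %F6    (the Latin-1 byte seen after the redirect)

// Decoding the lone 0xF6 byte as UTF-8 yields U+FFFD, the "?" in a black diamond.
console.log(new TextDecoder("utf-8").decode(new Uint8Array([0xf6])));

// decodeURIComponent is strict and rejects the malformed sequence outright.
try {
  decodeURIComponent("/asdf%F6asdf/");
} catch (e) {
  console.log(e.name); // URIError
}
```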
Is there any difference in behaviour between the URLs below?
I don't know why the &amp; is inserted; does it make any difference?
www.testurl.com/test?param1=test&current=true
versus
www.testurl.com/test?param1=test&amp;current=true
& is HTML for "Start of a character reference".
&amp; is the character reference for "An ampersand".
&current; is not a standard character reference and so is an error (browsers may try to perform error recovery, but you should not depend on this).
If you used a character reference for a real character (e.g. &trade;), then it (™) would appear in the URL instead of the string you wanted.
(Note that depending on the version of HTML you use, the terminating ; of a character reference may be optional, which is why &trade= may be treated as ™=. HTML 4 allows it to be omitted if the next character is a non-word character (such as =), but some browsers (hello, Internet Explorer) have issues with this.)
HTML doesn't reliably recognize a bare & as an ampersand, but it will recognize &amp; because it is equal to & in HTML.
I looked over this post someone had made: http://www.webmasterworld.com/forum21/8851.htm
My Source: http://htmlhelp.com/tools/validator/problems.html#amp
Another common error occurs when including a URL which contains an ampersand ("&"):
This is invalid:
<a href="foo.cgi?chapter=1&section=2&copy=3&lang=en">
Explanation:
This example generates an error for "unknown entity section" because the "&" is assumed to begin an entity reference. Browsers often recover safely from this kind of error, but real problems do occur in some cases. In this example, many browsers correctly convert &copy=3 to ©=3, which may cause the link to fail. Since &lang; is the HTML entity for the left-pointing angle bracket, some browsers also convert &lang=en to 〈=en. And one old browser even finds the entity &sect;, converting &section=2 to §ion=2.
So the goal here is to avoid problems when validating your website: you should replace your ampersands with &amp; when writing a URL in your markup.
Note that replacing & with &amp; is only done when writing the URL in HTML, where "&" is a special character (along with "<" and ">"). When writing the same URL in a plain text email message or in the location bar of your browser, you would use "&" and not "&amp;". With HTML, the browser translates "&amp;" to "&", so the Web server would only see "&" and not "&amp;" in the query string of the request.
Hope this helps : )
That's a great example. When &current is parsed into a text node, it is converted to ¤t. When parsed into an attribute value, it is left as &current.
If you want &current in a text node, you should write &amp;current in your markup.
The gory details are in the HTML5 parsing spec - Named Character Reference State
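When markup is generated from code, the escaping described above can be done by a small helper. This is a minimal, hypothetical sketch rather than a complete HTML escaper (it covers the characters that matter inside a double-quoted attribute); & has to be replaced first so the entities added by the later replacements are not escaped again.

```javascript
// Escape a string for use inside a double-quoted HTML attribute.
function escapeHtmlAttribute(value) {
  return value
    .replace(/&/g, "&amp;") // must come first
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const cgiUrl = "foo.cgi?chapter=1&section=2&copy=3&lang=en";
const markup = `<a href="${escapeHtmlAttribute(cgiUrl)}">Chapter 1</a>`;
console.log(markup);
// <a href="foo.cgi?chapter=1&amp;section=2&amp;copy=3&amp;lang=en">Chapter 1</a>
```

The browser turns each &amp; back into &, so the server still receives the plain query string.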
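If you're building the URL as a string in JavaScript and assigning it to the href property, use a plain & (no HTML parsing is involved, so &amp; is not needed):
// locations and aNav are defined elsewhere (assumed: locations[0][1] and locations[0][2] hold latitude and longitude; aNav is an <a> element)
let linkGoogle = 'https://www.google.com/maps/dir/?api=1';
let origin = '&origin=' + locations[0][1] + ',' + locations[0][2];
aNav.href = linkGoogle + origin; // a plain "&" is correct when setting href from script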
I have been using ISAPI_Rewrite v2 for URL rewriting for quite a while. The site is in Hebrew, and so are the page URLs.
ISAPI_Rewrite v2 doesn't support Hebrew characters, but I overcame this problem by using the UTF-8 (hex) codes for the Hebrew characters.
Here is an example:
RewriteRule ^/\%D7\%A6\%D7\%95\%D7\%A8_\%D7\%A7\%D7\%A9\%D7\%A8/$ /Contact.aspx [L,I]
RewriteRule ^/\%D7\%A6\%D7\%95\%D7\%A8_\%D7\%A7\%D7\%A9\%D7\%A8$ /Contact.aspx [L,I]
The problem:
While checking my popular pages in StatCounter, I came across this URL:
http://mysite.com/%u05F6%u05E5%u05F8_%u05F7%u05F9%u05F8
This is the same URL as in my example rule, but in Unicode! Apparently ISAPI_Rewrite v2 doesn't handle these URLs, and the user gets "The page cannot be found".
There are also more complex pages, for example ones that send part of the URL as a query parameter, which is also in Unicode.
I could only think of one solution: write the same rules again, this time in Unicode, and deal with the Unicode in the code-behind. But there are two problems with this solution:
The URL is shown to the user in Unicode escapes rather than in Hebrew.
It means more code in the code-behind, which in my opinion shouldn't be needed. This scenario can (and should) be handled before the request reaches the code.
Any thoughts?
Thanks.
EDIT:
Maybe this redirection can be accomplished by IIS 6 somehow? That is, whenever IIS identifies a Unicode URL, it would convert it to UTF-8 and redirect the page.
ISAPI_Rewrite v2 doesn't support Hebrew characters, but I overcame this problem by using UTF-8
IIS in general requires you to use UTF-8 in URLs. There is a fallback to using the default locale-specific (‘ANSI’) encoding when the URL isn't a valid UTF-8 sequence, but that's (a) no use if your server's locale isn't Hebrew (code page 1255), and (b) still not wholly reliable as some cp1255 strings can also be valid UTF-8 sequences. So, yes, for reliability always use the UTF-8 form.
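To produce that UTF-8 form for a rule like the ones in the question, percent-encode the page name as UTF-8 and backslash-escape each %, as the question's rules do. A sketch with a hypothetical isapiPattern helper; the Hebrew page name is written with \u escapes so the source is unambiguous.

```javascript
// Build the escaped UTF-8 pattern used in the question's ISAPI_Rewrite rules.
function isapiPattern(pageName) {
  // encodeURIComponent emits each UTF-8 byte as %XX with upper-case hex;
  // each "%" is then escaped with a backslash.
  return encodeURIComponent(pageName).replace(/%/g, "\\%");
}

// "צור_קשר", the page name encoded in the question's rules
const pageName = "\u05E6\u05D5\u05E8_\u05E7\u05E9\u05E8";
const rule = `RewriteRule ^/${isapiPattern(pageName)}$ /Contact.aspx [L,I]`;
console.log(rule);
// RewriteRule ^/\%D7\%A6\%D7\%95\%D7\%A8_\%D7\%A7\%D7\%A9\%D7\%A8$ /Contact.aspx [L,I]
```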
http://mysite.com/%u05F6%u05E5%u05F8_%u05F7%u05F9%u05F8
Which is the same URL rule as in my example but in Unicode!
Not really. The %uxxxx syntax comes from the JavaScript escape() function and is specific to that function's custom form of encoding. It has no relation to standard URL-encoding. The above is not even a valid URL and won't be accepted by some browsers.
You need to find where that link is coming from and fix it to use proper UTF-8-%xx-encoding instead.
In the meantime you might be able to do something with a 404 handler that redirects to the canonical form instead.
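A 404 handler along those lines needs to turn the escape()-style %uXXXX sequences into standard UTF-8 percent-encoding before redirecting. A JavaScript sketch of that conversion, with a hypothetical canonicalizePercentU helper: it decodes each run of %uXXXX sequences as UTF-16 so surrogate pairs stay together, and it leaves escape()'s single-byte %XX forms (used for Latin-1 characters) untouched. A lone surrogate makes encodeURIComponent throw a URIError, which a real handler would need to catch.

```javascript
// Rewrite escape()-style "%uXXXX" runs into UTF-8 "%XX" sequences.
function canonicalizePercentU(path) {
  return path.replace(/(?:%u[0-9A-Fa-f]{4})+/g, run => {
    // Collect every UTF-16 code unit in the run, then re-encode the text as UTF-8.
    const units = run.match(/[0-9A-Fa-f]{4}/g).map(hex => parseInt(hex, 16));
    return encodeURIComponent(String.fromCharCode(...units));
  });
}

console.log(canonicalizePercentU("/%u05E6%u05D5%u05E8_%u05E7%u05E9%u05E8"));
// /%D7%A6%D7%95%D7%A8_%D7%A7%D7%A9%D7%A8
```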
If you use the FastCGI extension behind IIS, you can configure FastCGI to use UTF-8 encoding for a particular set of server variables: add the REG_MULTI_SZ registry value FastCGIUtf8ServerVariables and set it to a list of server variable names.
reg add HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\w3svc\Parameters /v FastCGIUtf8ServerVariables /t REG_MULTI_SZ /d REQUEST_URI\0PATH_INFO
https://www.iis.net/learn/application-frameworks/install-and-configure-php-on-iis/configuring-the-fastcgi-extension-for-iis-60#utf8servervars
How does Wikipedia (or MediaWiki in general) encode page titles in URIs? It's not normal URI encoding, since spaces are replaced with underscores, double quotes are not encoded, and so on.
http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29 describes the restrictions their engine enforces on article names.
They should have something like this in their LocalSettings.php:
$wgArticlePath = '/wiki/$1';
and a matching URL-rewrite configuration on the server. They seem to be using Apache (judging by the HTTP headers), so it's probably mod_rewrite. See http://www.mediawiki.org/wiki/Manual:Short_URL
You can also reach an article through index.php, like this: http://en.wikipedia.org/w/index.php?title=Foo%20bar and get redirected by the engine to http://en.wikipedia.org/wiki/Foo_bar. Behind the scenes, mod_rewrite translates /wiki/Foo_bar into /index.php?title=Foo_bar. For the MediaWiki engine this is the same as visiting http://en.wikipedia.org/w/index.php?title=Foo_bar, which does not redirect you.
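The two directions of that short-URL setup can be sketched as follows, assuming $wgArticlePath = '/wiki/$1' and a rewrite of /wiki/<title> to /index.php?title=<title> as described above. Both helpers are hypothetical, and MediaWiki's own URL encoder leaves a different set of characters unescaped than encodeURIComponent does, so the output for punctuation will differ from Wikipedia's.

```javascript
const ARTICLE_PATH = "/wiki/$1"; // mirrors $wgArticlePath

// Title -> pretty URL: spaces become underscores, then the title is percent-encoded.
// ":" and "/" are put back because they are legal in a path (which characters to
// keep readable is an assumption of this sketch).
function articleUrl(title) {
  const key = title.trim().replace(/ +/g, "_");
  const encoded = encodeURIComponent(key).replace(/%3A/g, ":").replace(/%2F/g, "/");
  return ARTICLE_PATH.replace("$1", () => encoded);
}

// Pretty URL -> what the rewrite hands to index.php.
function rewriteToIndex(pathname) {
  const match = /^\/wiki\/(.+)$/.exec(pathname);
  return match ? "/index.php?title=" + match[1] : null;
}

console.log(articleUrl("Foo bar"));           // /wiki/Foo_bar
console.log(rewriteToIndex("/wiki/Foo_bar")); // /index.php?title=Foo_bar
```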
The process is quite complex and isn't exactly pretty. You need to look at the Title class found in includes/Title.php. You should start with the newFromText method, but the bulk of the logic is in the secureAndSplit method.
Note that (as ever with MediaWiki) the code is not decoupled in the slightest. If you want to replicate it, you'll need to extract the logic rather than simply re-using the class.
The logic looks something like this:
Decode character references (e.g. &eacute;)
Convert spaces to underscores
Check whether the title is a reference to a namespace or interwiki
Remove hash fragments (e.g. Apple#Name)
Remove forbidden characters
Forbid subdirectory links (e.g. ../directory/page)
Forbid triple tilde sequences (~~~) (for some reason)
Limit the size to 255 bytes
Capitalise the first letter
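The steps above can be sketched in JavaScript as follows. This is a heavily simplified, hypothetical normalizeTitle function, not a port of Title's secureAndSplit method: namespace and interwiki handling are left out, only numeric character references are decoded, the forbidden-character set is an assumption, and titles containing forbidden characters are rejected rather than cleaned.

```javascript
// Assumed forbidden set for this sketch; MediaWiki's real rule is configurable.
const FORBIDDEN = /[<>\[\]|{}]/;

function normalizeTitle(text) {
  // Decode numeric character references such as &#233; or &#xE9;
  // (named references like &eacute; are not handled here).
  let title = text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, n) => {
    const cp = /^x/i.test(n) ? parseInt(n.slice(1), 16) : parseInt(n, 10);
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
  // Spaces become underscores; runs are collapsed and the ends trimmed.
  title = title.replace(/[ _]+/g, "_").replace(/^_|_$/g, "");
  // Drop a "#fragment".
  const hash = title.indexOf("#");
  if (hash !== -1) title = title.slice(0, hash).replace(/_$/, "");
  // Reject empty titles, forbidden characters, relative paths and "~~~".
  if (title === "" || FORBIDDEN.test(title)) return null;
  if (/^\.\.?(\/|$)|\/\.\.?(\/|$)/.test(title)) return null;
  if (title.includes("~~~")) return null;
  // Limit to 255 bytes of UTF-8.
  if (new TextEncoder().encode(title).length > 255) return null;
  // Capitalise the first letter (first code point).
  const first = String.fromCodePoint(title.codePointAt(0));
  return first.toUpperCase() + title.slice(first.length);
}

console.log(normalizeTitle("foo bar"));     // Foo_bar
console.log(normalizeTitle("Apple #Name")); // Apple
```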
Furthermore, I believe I'm right in saying that quotation marks don't need to be encoded by the original user; browsers handle them transparently.
I hope that helps!