URL: semicolon instead of question mark

URL: semicolon instead of question mark - rest

I'm calling a web service in this format:
http://some.server/rest/resource;a=b
It works but is this valid? I've seen the ; used as a replacement for the & but never seen such an url. I've been looking for an answer but did not find a valid one. If valid what is the meaning of this kind of url?

This is a part of the path parameters and not part of the query parameters. You can find detailed information on how URLs can be built at http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/
Edit: I was actually looking for this link earlier which explains it even better and shows you some weird but valid cases: https://www.talisman.org/~erlkonig/misc/lunatech%5Ewhat-every-webdev-must-know-about-url-encoding/ (originally at the now dead url
http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding)
But anyway, this is valid: http://www.blah.com/some/crazy/path.html;param1=foo;param2=bar

RFC2396 where path parameters were specified is obsolete. newer version is RFC 3986 -- this one does not have path parameters before the query string formally specified, however still has it in section 5.4.1 in examples.

This might answer your question: Semicolon as URL query separator
We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.

Related

What are allowed characters for a submodule name in registerModule?

registerModule() expects a submodule key as a third parameter.
I think it should probably not contain a space and only alphabetic characters (or alphanumeric?) and underscore ('_'), but I'm not really sure.
I could not find specific information for this.

The function makes use of \TYPO3\CMS\Core\Utility\GeneralUtility::underscoredToUpperCamelCase to generate the full module name combined of main module and sub module connected with an _
So you already guessed the right answer.

It's a bit complicated strange to answer!
Official API document does not provide exact information. I have worked around some extension which has multiple sub-module. I'm quite sure this not allow special character as you sub-module key.
eg. web_TestTestbe123 (mainModulename_subModuleKey)
I have noticed bellow characteristic for the key:
Key must be lowercase
No space allowed
Numerica value would be fine
Does this make sense?

I found this in the documentation just now:
Backend modules
1. The modkey is made up of alphanumeric characters only. It does not contain underscores and starts with a letter.
https://docs.typo3.org/m/typo3/reference-coreapi/master/en-us/ExtensionArchitecture/NamingConventions/Index.html

lex default token definition syntax

I guess this is a simple question, but I have found no reference. I have a small lex file defining some tokens from a string and altering them (actually converting them to uppercase).
Basically it is a list of commands like this:
word {setToUppercase(yytext);}
Where setToUppercase is a procedure to change case and store it.
I need to have the complete entry string with the altered words. Is there a way to define a default token / rest of tokens so I can asociate them with an unaltered storage in an output string?

You can do that in one shot with:
.|\n {save_str(yytext);}

I said it was an easy one.
. {save_str(yytext);}
\n {save_str(yytext);}
This way all characters and newline are treated.

Url with Unicode - ISAPI_Rewrite doesnt recognize it

I use ISAPI_Rewrite v2 for url rewriting quite a while. The site is in the Hebrew language and so the pages urls.
ISAPI_Rewrite v2 doesnt support Hebrew characters, but I overcome this problem by using UTF-8(Hex) code for the hebrew characters.
Here is an example:
RewriteRule ^/\%D7\%A6\%D7\%95\%D7\%A8_\%D7\%A7\%D7\%A9\%D7\%A8/$ /Contact.aspx [L,I]
RewriteRule ^/\%D7\%A6\%D7\%95\%D7\%A8_\%D7\%A7\%D7\%A9\%D7\%A8$ /Contact.aspx [L,I]
The problem:
While checking my popular pages in statcounter I came across this url:
http://mysite.com/%u05F6%u05E5%u05F8_%u05F7%u05F9%u05F8
Which is the same URL rule as in my example but in Unicode! And apparently ISAPI_Rewrite v2 doesnt handle this URLs, And I the user get "The page cannot be found".
There is also pages that are more complex, for example send part of the URL as a query parameter.. Which also in Unicode.
I though only on one solution - make the same rules, this time in Unicode and deal with the Unicode in the code behind. But there's 2 problems with the solution:
The URL shows for the user in Unicode and not in the Hebrew language.
More code in the code behind which, for my opinion, doesnt need to be. What I mean is that this scenario can/need to be handle before it reach the code..
Any thoughts?
Thanks.
EDIT:
Maybe this redirection can be accomplish by IIS6 somehow? When ever the IIS identify Unicode URL, it convert it to UTF-8 and redirect the page.

ISAPI_Rewrite v2 doesnt support Hebrew characters, but I overcome this problem by using UTF-8
IIS in general requires you to use UTF-8 in URLs. There is a fallback to using the default locale-specific (‘ANSI’) encoding when the URL isn't a valid UTF-8 sequence, but that's (a) no use if your server's locale isn't Hebrew (code page 1255), and (b) still not wholly reliable as some cp1255 strings can also be valid UTF-8 sequences. So, yes, for reliability always use the UTF-8 form.
http://mysite.com/%u05F6%u05E5%u05F8_%u05F7%u05F9%u05F8
Which is the same URL rule as in my example but in Unicode!
Not really. The %uxxxx syntax comes from the JavaScript escape() function and is specific to that's function's custom form of encoding. It has no relation to standard URL-encoding. The above is not even a valid URL and won't be accepted by some browsers.
You need to find where that link is coming from and fix it to use proper UTF-8-%xx-encoding instead.
In the meantime you might be able to do something with a 404 handler that redirects to the canonical form instead.

If you use some FastCGI extension behind IIS you can try configure to configure FastCGI to use UTF-8 encoding for a particular set of server variables, use the REG_MULTI_SZ registry key FastCGIUtf8ServerVariables and set its value to a list of server variable names.
reg add HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\w3svc\Parameters /v FastCGIUtf8ServerVariables /t REG_MULTI_SZ /d REQUEST_URI\0PATH_INFO
https://www.iis.net/learn/application-frameworks/install-and-configure-php-on-iis/configuring-the-fastcgi-extension-for-iis-60#utf8servervars

Wikipedia (MediaWiki) URI encoding scheme

How do Wikipedia (or MediaWiki in general) encode page titles in URIs? It's not normal URI encoding, since spaces are replaced with underscores and double quotes are not encoded and things like that.

http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29 - here you've got some kind of description of what their engine enforces on article names.
They should have something like this in their LocalSettings.php:
$wgArticlePath = '/wiki/$1';
and proper server URI rewrites configuration - they seem to be using Apache (HTTP header), so it's probably mod_rewrite. http://www.mediawiki.org/wiki/Manual:Short_URL
You can also refer to the index.php file for an article on Wikipedia like this: http://en.wikipedia.org/w/index.php?title=Foo%20bar and get redirected by the engine to http://en.wikipedia.org/wiki/Foo_bar. Behind the scenes mod_rewrite translates it into /index.php?title=Foo_bar. For the MediaWiki engine it's the same as if you visited http://en.wikipedia.org/w/index.php?title=Foo_bar - this page doesn't redirect you.

The process is quite complex and isn't exactly pretty. You need to look at the Title class found in includes/Title.php. You should start with the newFromText method, but the bulk of the logic is in the secureAndSplit method.
Note that (as ever with MediaWiki) the code is not decoupled in the slightest. If you want to replicate it, you'll need to extract the logic rather than simply re-using the class.
The logic looks something like this:
Decode character references (e.g. é)
Convert spaces to underscores
Check whether the title is a reference to a namespace or interwiki
Remove hash fragments (e.g. Apple#Name
Remove forbidden characters
Forbid subdirectory links (e.g. ../directory/page)
Forbid triple tilde sequences (~~~) (for some reason)
Limit the size to 255 bytes
Capitalise the first letter
Furthermore, I believe I'm right in saying that quotation marks don't need to be encoded by the original user -- browsers can handle them transparently.
I hope that helps!

URL Shortening: What's the best encoding to use?

I'm adding a feature to my project where we are generating links to internal stuff of our website, and we want these links to be as short as possible, so we'll be making our own "URL Shortener".
I'm wondering what's the best encoding / alphabet to use for the generated short URLs.
This is largely a subjective question, I'd like to know what your opinions are regarding the best approach / trade-off.
Several options I've thought of:
- Digits, uppercase + lowercase (base 62)
- Digits, only lowercase (base 36)
- Base 32 (http://www.crockford.com/wrmg/base32.html)
- linkpot.net (using common short english words)
Of course, the second two are better for uses other than clicking, and the first two are better for Twitter.
Also, if I'm going with "clickable-only" URLs, I'd like to make the alphabet as large as possible, adding other symbols.
What symbols can I use in URLs that won't get URL encoded?
What symbols should I use? Could some of these prove problematic? I'm thinking slash and dot, for example.
What do you think?
NOTE: The main target for these URLs is Twitter. Keeping this in mind, we should probably have the largest alphabet possible, since most people will be clicking. However, I'm interested in your experience with people using short URLs in other ways (over the phone, in printed paper, etc). How likely is it this could happen?
NOTE 2: I'm not making "yet another URL shortener", please don't condemn me with downvotes. We are generating short URLs for internal stuff in our site, not allowing anyone to shorten any URL. Imagine Google Maps giving you short URLs when you generate a link to a specific coordinate.

I would go with Base-62, it's the shortest. Shortened URL is not meant for someone to manually enter anyway so don't worry about case-sensitivity.

If these are "clickable only URLS" I'd probably go with a base-64 encoding. MIME's base-64 uses a couple of characters you shouldn't use, but there are enough unreserved safe characters in URLs that you can just swap them out. (Also, you don't need the padding that MIME's base-64 uses, since you know when your URL ends.)
Here's a page that discusses one way to do this.
You can look at RFC2396 to figure out exactly what characters are safe in URIs if you want to double check.

I'd be curious to know a little more about the implementation. How will these URLs be "unshortened", or will the internal pages being accessed be saved as shortened URLs? In either case, even if you went with the encoding set of [A-Z] you'd be able to reference 26 * 26 * 26 = 17,576 pages with only 3 characters; how many internal web pages are you talking about?
In general I would lean on what your use case requirements are for picking the right encoding set. Are you planning on having these links available for "uses other than clicking"? What would those uses be, and how do you suspect they'll alter the encoding? (For example, using parts of the URL as a file name on a case-insensitive file system reduces the available character set.)
Here's an informative page on the character set you have available to you when writing a URL.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse