Google Custom Search encodes spaces with %2520 - encoding

I am using the free version of Google Custom Search (the two-page version).
They gave me one JavaScript snippet for the search box and one for the search results page.
It seems fine, except that spaces in the request get converted to %2520 instead of %20, which leads to 0 results.
If I write my OWN simple HTML form that points to the result page, it works fine (it uses '+' for spaces).

I had a similar issue.
It turned out that the URL was being rewritten to another one, which encoded the query term a second time.
In my case, www.sub.domain.com/?q=some text was redirected (301) to sub.domain.com/?q=some%2520text
This was a usage error: the link should not have contained 'www' for a subdomain.
Avoiding the redirect fixed the issue. Check whether your query link is being rewritten or redirected.
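To see why a second round of encoding produces %2520, here is a minimal Python sketch. It is only an illustration of the mechanism, not what Google's script or the redirecting server actually runs; the helper name is_double_encoded is made up for this example and is a heuristic only.

```python
from urllib.parse import quote, unquote

# Encoding once turns the space into %20.
encoded_once = quote("some text")

# Encoding the already-encoded string escapes the "%" itself as %25,
# so %20 becomes %2520.
encoded_twice = quote(encoded_once)


def is_double_encoded(value: str) -> bool:
    """Heuristic (hypothetical helper): True if percent-escapes remain
    after decoding the value once."""
    once = unquote(value)
    return once != value and unquote(once) != once


print(encoded_once)
print(encoded_twice)
print(is_double_encoded("some%2520text"))
```

If the URL your search box submits looks fine but the one in the address bar after loading contains %25, a redirect or rewrite rule in between is the likely culprit.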

Link validator: external URLs containing ampersands not working

When inserting external links via CKEditor, ampersands (&) are converted to &amp; in the source code. That's fine for the frontend, but the link validator seems to have problems with the ampersand, as it tries to verify the link containing &amp; (which doesn't work).
Is there something wrong with my CKEditor configuration, or can I configure link validator so it replaces &amp; with &?
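As converting & to &amp; for links in HTML is absolutely correct, linkvalidator should decode the entity before accessing the URL (with something like htmlspecialchars_decode(); urldecode() only handles percent-encoding).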
I suggest to open a bug report at https://forge.typo3.org
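As a rough illustration of the decoding step a validator would need, here is a Python sketch (linkvalidator itself is PHP, so this is not its code). The function name href_to_url and the example.com URL are assumptions for the example.

```python
from html import unescape


def href_to_url(href_source: str) -> str:
    """Turn an href value as written in HTML source into the real URL
    by decoding HTML entities such as &amp; exactly once."""
    return unescape(href_source)


# Hypothetical link as it appears in the stored HTML source.
stored = "https://example.com/page?a=1&amp;b=2"
print(href_to_url(stored))
```

The point is that the entity is an HTML-level escape, so it has to be undone with an HTML entity decoder before the URL is requested.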

TinyMCE converting ampersand in querystring to HTML entity

Edit: My assumptions about encoding were incorrect. I'm leaving the question as originally asked in case others come here with the same misunderstanding.
When I include a link in some text in the editor that includes a querystring, then view the source code, I can see that it has converted any & characters in the href to &amp;, which breaks the links.
A link
becomes
A link
and if I change it back to just & in the source, click Ok on the view source dialog, then immediately view source again, it's already worked its charms and encoded the & once again.
Is there a way to cue the editor to go ahead and convert those outside tag attributes, but not mess with those in attributes?
Using an older version (4.0.12), but I see the behavior on the current live sample right on tinymce.com, so if it's a bug it looks like it hasn't been fixed. But I am wondering if it's just a setting I'm missing.
Relevant questions:
Do I encode ampersands in <a href...>?
Do ampersands still need to be encoded in URLs in HTML5?
The HTML spec actually states that ampersands in HTML attributes have to be encoded, so TinyMCE is working exactly as it should. If your server-side code is not handling that correctly, that is an issue with the server-side code.
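A quick way to convince yourself that the encoded form is harmless: any HTML parser hands back the attribute value with the entity already decoded. A small sketch using Python's standard html.parser follows; the class and function names and the /search URL are made up for the example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    # The parser has already decoded entities here.
                    self.hrefs.append(value)


def extract_hrefs(markup: str) -> list:
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


print(extract_hrefs('<a href="/search?a=1&amp;b=2">A link</a>'))
```

So the link only breaks if something reads the raw source text as a URL without going through an HTML parser or entity decoder first.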

Github's jekyll sitemap generator giving wrong urls for spaces

One of the urls for my page is:
http://blog.theofekfoundation.org/general%20computer%20programming/2015/12/30/2d-array-copy-speeds.html
(note the %20s)
While the jekyll sitemap entry is:
<loc>
http://blog.theofekfoundation.org/general%2520computer%2520programming/2015/12/30/2d-array-copy-speeds.html
</loc> # Note the %2520s
I added the sitemap using github's sitemap gem:
gems:
- jekyll-sitemap
in my _config.yml.
Any idea what's going wrong or how to fix it?
At the moment, jekyll-sitemap always encodes the URLs and is not smart enough to detect that a URL already contains encoded text, so it encodes the % character again (hence the %25).
You can open an issue on the jekyll-sitemap repository, and see if there are any plans to improve this story.
However, if it is an option, I would recommend not using spaces and using a dash (-) instead, which is more user-friendly and easier to read, with the added benefit that it doesn't break the sitemap.
Also, get rid of the .html at the end.
e.g.
http://blog.theofekfoundation.org/general-computer-programming/2015/12/30/2d-array-copy-speeds/
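For reference, the usual way to avoid encoding an already-encoded path twice is to decode first and then encode. Here is a Python sketch of that idea; it is not how jekyll-sitemap (a Ruby gem) is implemented, and encode_path_once is a hypothetical name. It assumes a literal "%" followed by two hex digits in a path always means an escape.

```python
from urllib.parse import quote, unquote


def encode_path_once(path: str) -> str:
    """Percent-encode a URL path, leaving existing escapes intact.

    Decoding first makes the operation idempotent: raw and
    already-encoded input produce the same output.
    """
    return quote(unquote(path), safe="/")


raw = "/general computer programming/2015/12/30/2d-array-copy-speeds.html"
already = "/general%20computer%20programming/2015/12/30/2d-array-copy-speeds.html"

print(encode_path_once(raw))
print(encode_path_once(already))
```

Both calls print the same path with %20, and running the function on its own output does not change it.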

UTF8 encoding problem, same results work fine in wordpress

I have a WordPress installation that clients can edit, and all characters display OK. On the main homepage I query the same database for the same title and post content, but it doesn't display correctly: I just get a question mark.
I have tried sending the UTF-8 headers manually, through .htaccess and through meta tags. I have used SET NAMES utf8 (which turns the characters into the diamond symbol with a question mark inside).
I genuinely can't figure out what it could be now, and I really need these characters to display correctly.
Here's the homepage. You can see in the Sounddhism 6 preview that there are lots of question marks; if you click on it you will see what they are meant to look like.
http://nottingham.subverb.net
I have passed it through the validator and it gives me this error:
Sorry, I am unable to validate this document because on line 373 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xA0" does not map to Unicode
Which, I appreciate, is supposed to help me, but I don't know what to do about it, especially since on that line the character generating the error is supposed to be a space and comes AFTER the offending question marks.
Can anyone help?
Compare the encoding of both the back-end scripts in Wordpress and also your homepage script. If you're using IE, right-click the page and check the encoding. Sometimes it's set to "Auto-detect" and IE will often detect a different encoding for different pages, causing strange issues like this.
If you're not using IE, try using a tool like Fiddler to see exactly what encoding (and what bytes) are being sent back and forth, both in the back-end and in your homepage script.
If forcing UTF-8 on your homepage script doesn't work, I would guess that the back-end is not using UTF-8.
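On the validator error quoted in the question: a lone 0xA0 byte is a non-breaking space in Latin-1/Windows-1252, and it is not valid UTF-8 on its own, which suggests at least some of the data is reaching the page in a single-byte encoding. A small Python sketch of that check follows; decode_with_fallback is a hypothetical helper and the sample bytes are made up.

```python
def decode_with_fallback(data: bytes):
    """Return (text, encoding_used): try UTF-8 first, then Latin-1.

    Latin-1 maps every byte to a character, so the fallback never fails;
    it is a guess at the real encoding, not a detection.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"


# A lone 0xA0 byte is rejected by a UTF-8 decoder.
text, used = decode_with_fallback(b"a\xa0b")
print(repr(text), used)

# The same non-breaking space encoded as UTF-8 is the two bytes C2 A0.
print("\u00a0".encode("utf-8"))
```

If the bytes coming out of the database only decode via the fallback, the connection or the stored data is not UTF-8, whatever the page headers say.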

Why are accented characters rendering inconsistently when accessing the same code on the same server at a different URL?

There is a page on our server that's reachable via two different URLs.
http://www.spotlight.com/6213-5613-0721
http://www.spotlight.com/interactive/cv/1/M103546.html
There's classic ASP behind the scenes, and both of those URLs actually do a Server.Transfer to the same underlying ASP page.
The accents in the name at the top of the page are rendering correctly on one URL and incorrectly on the other - but as far as I can tell, the two requests are returning identical responses (same markup, same headers, same everything) - and I have absolutely no idea why one URL should be rendering correctly whilst the other is corrupting the accented characters.
Is there anything else (content encoding?) that I should be examining - and if so, how can I tell what's being returned beyond the information displayed in Firebug?
I have run into this problem before, and the cause was that some file (maybe the ASP file that does the transfer, or some include) was not saved as ANSI.
Check that all files involved in the request have the same encoding on the server (try File -> Save As With Encoding).
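If there are many includes, checking them by hand gets tedious. Here is a rough Python sketch for classifying files by their bytes. It is a heuristic only: describe_encoding and describe_file are hypothetical helpers, and it cannot tell one single-byte code page from another.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes by BOM and UTF-8 validity (heuristic)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8 without BOM"
    except UnicodeDecodeError:
        return "legacy single-byte (e.g. ANSI/Windows-1252)"


def describe_file(path: str) -> str:
    """Read a file in binary mode and classify its encoding."""
    with open(path, "rb") as handle:
        return describe_encoding(handle.read())


print(describe_encoding("é".encode("utf-8")))
print(describe_encoding("é".encode("cp1252")))
```

Running describe_file over every file involved in the request should show quickly whether one of them is saved differently from the rest.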
I have checked the character encoding in your headers and meta tags and they are consistent across both pages. I also agree that the output of the pages is largely similar - except for the special characters, which are "messed up" in the source file.
I don't think this issue is in the browser; there must be something behind the scenes that causes it. How does the name containing these characters get from the data store to the page?