Double-byte Unicode characters shown as 2 characters by classic ASP

Problem with Unicode in classic ASP
These two addresses are created by the same function from a single DB entry. The only difference I can see is that the top one is written to the page directly, while the bottom one is written via JS in an iframe to the parent page.
This is what the classic ASP outputs, and it is wrong:
Fxxxx Ã…sbrink
RSG connexion AB
Baggängsvägen 18
1245 Karlskoga
Karlskoga
123345
Sweden
+1233514543
This is what the JS code outputs to the page, showing how it should look:
Fxxxx Åsbrink
RSG connexion AB
Baggängsvägen 18
1224 Karlskoga
Karlskoga
12345
Sweden
+1233514543
I have set the page to UTF-8 and set the codepage to 65001, but I am still getting what looks like 2 characters where I should be getting 1.
Any idea how to fix this?

This is a classic one. You're most likely not setting Response.Charset = "UTF-8" together with Response.CodePage = 65001.
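As a minimal sketch (the values shown are the usual ones for UTF-8, not taken from your code), both settings normally go at the very top of the .asp page, before any output is written:

```vbscript
<%@ Language="VBScript" CodePage=65001 %>
<%
Response.CodePage = 65001    ' emit the response bytes as UTF-8
Response.Charset = "UTF-8"   ' advertise UTF-8 in the Content-Type header
%>
```

The @CodePage directive controls how the .asp source file itself is interpreted, while the two Response properties control the output, so all three usually need to agree.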

HTML-encode the characters as they should be.

JavaScript should handle UTF-8 just fine, so can you show us a snippet of the JS code, to make sure there isn't a Unicode error there?
http://www.joelonsoftware.com/articles/Unicode.html

I tracked it down to Response.CodePage = 65001 being set in an include of an include; the page worked just fine once I removed it.
So if you have a problem like this, remove all the extra code and then add it back bit by bit.
Paul

Related

Zend Framework form not rendering special characters (ä, ö, ü, etc.) - makes the form element value empty

I am trying to get Zend Form working for me. I am using the same form for inserting and editing a particular database object. The object has a name, and I can easily create a new object with the name "Ülo". It saves correctly in the database, and when I fetch it to display in a report it shows "Ülo" correctly. The problem is with forms. When I open the edit form, the name element is empty. All other elements show correctly, but if I change them to contain "ü" they are displayed empty too. The same thing happens with form element labels: when I set a label to contain "ü", it does not show the label any more.
For example, if I have $name->setLabel('Nameü: '); then it does not show the label, but when I change it back to $name->setLabel('Name: '); it shows correctly.
The same happens when I have $bcrForm->name->setValue('Ülo'); it does not show the value, but when I change it to $bcrForm->name->setValue('Alo'); it shows correctly.
How can I fix this so it displays correctly? It seems to be some kind of form rendering issue.
This is what solved it for me:
Make sure this setting is in /etc/php5/apache2/php.ini and /etc/php5/cli/php.ini:
default_charset = utf-8
Did you check the encoding? Try adding this to the head:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Make sure that both your script and view files are encoded in UTF-8, and that your connection to your DB is set to UTF-8 too.
If you are using MySQL, you can force it to return UTF-8 data by opening a DB connection and running: SET NAMES 'utf8'
or, using mysqli: mysqli_set_charset($link, 'utf8');
I would check:
view charset
database charset (backend)
Zend_Db_Adapter charset
file charset
The view's escape method is set to expect UTF-8 characters and may strip anything else (i.e. odd single-byte characters) :)
It should be as simple as setting the escape flag to false on the element's Label decorator.
$name->addDecorator('Label', array('escape' => false));
Or see setEscape(). http://framework.zend.com/manual/1.12/en/zend.form.standardDecorators.html

Encoding problems in ASP when using English and Chinese characters

I am having problems with encoding Chinese in an ASP site. The file formats are:
translations.txt - UTF-8 (to store my translations)
test.asp - UTF-8 (to render the page)
test.asp is reading translations.txt that contains the following data:
Help|ZH|帮助
Home|ZH|首页
test.asp splits on the pipe delimiter and, if the user has a cookie with ZH, displays this translation; otherwise it falls back to the key value.
Now, I have tried the following things, which have not worked:
Add a meta tag
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
Set the Response.CharSet = "UTF-8"
Set the Response.ContentType = "text/html"
Set Session.CodePage (and Response.CodePage) to 65001 (UTF-8)
I have confirmed that the text in translations.txt is definitely in UTF-8 and has no byte order mark
The browser is picking up that the page is Unicode UTF-8, but the page is displaying gobbledegook.
The Scripting.FileSystemObject OpenTextFile(<filename>, <iomode>, <create>, <format>) method returns the same incorrect text regardless of the format parameter.
Here is a sample of what I want to be displayed in China (ZH):
首页
帮助
But the following is displayed:
首页
帮助
This occurs in all tested browsers - Google Chrome, IE 7/8, and Firefox 4. The font definitely has Chinese glyphs, and I do have East Asian languages installed.
--
I have tried pasting the original value into the HTML, which did work (but note this is a hard-coded value).
首页
首页
However, this is odd.
首页 --(as UTF-8 bytes, in hex)--> E9 A6 96 E9 A1 B5 --(read as Windows-1252 chars)--> é¦–é¡µ
Any ideas what I am missing?
In order to read the UTF-8 file, you'll probably need to use the ADODB.Stream object. I don't claim to be an expert on character encoding, but this test worked for me:
test.txt (saved as UTF-8 without BOM):
首页
帮助
test.vbs
Option Explicit

Const adTypeText = 2
Const adReadLine = -2

Dim stream : Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = adTypeText
stream.Charset = "UTF-8"
stream.LoadFromFile "test.txt"

Do Until stream.EOS
    WScript.Echo stream.ReadText(adReadLine)
Loop

stream.Close
Whatever part of the process is reading the translations.txt file does not seem to understand that the file is in UTF-8. It looks like it is reading it in as some other encoding. You should specify encoding in whatever process is opening and reading that file. This will be different from the encoding of your web page.
Inserting the byte order mark at the beginning of that file may also be a solution.
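To see why a wrong-encoding read produces exactly the garbage shown in the question, here is a hedged, standalone sketch (Windows-only, run with cscript.exe): it writes the two characters as UTF-8 and then re-decodes the same bytes as Windows-1252.

```vbscript
Option Explicit

' Reproduce the mojibake: encode two Chinese characters as UTF-8,
' then reinterpret the same bytes as Windows-1252.
Dim s : Set s = CreateObject("ADODB.Stream")
s.Open
s.Type = 2                                ' adTypeText
s.Charset = "UTF-8"
s.WriteText ChrW(&H9996) & ChrW(&H9875)   ' U+9996 U+9875, the two characters above
s.Position = 0
s.Charset = "Windows-1252"                ' re-decode the buffered bytes
WScript.Echo s.ReadText                   ' several Latin-1 garbage characters, not 2
s.Close
```

Note that ADODB.Stream also writes a UTF-8 byte order mark when writing text, so the echoed garbage may include a BOM artifact in front of the é¦–é¡µ-style characters.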
Scripting.FileSystemObject's OpenTextFile does not understand UTF-8 at all. It can only read the current ANSI codepage or UTF-16 ("Unicode" in Windows parlance). As you can see from the number of bytes used for some character sets, UTF-8 is also quite inefficient, so I would recommend Unicode for this sort of data.
You should save the file as Unicode (in Windows parlance) and then open it with:
Dim fso : Set fso = CreateObject("Scripting.FileSystemObject")
Dim stream : Set stream = fso.OpenTextFile(yourFilePath, 1, False, -1) ' -1 = TristateTrue, open as Unicode
Just use the script below at the top of your page
Response.CodePage=65001
Response.CharSet = "UTF-8"

Get window title with AppleScript in Unicode

I'm stuck with the following problem:
I have a script which retrieves the title from the Firefox window:
tell application "Firefox"
    if the (count of windows) is not 0 then
        set window_name to name of front window
    end if
end tell
It works well as long as the title contains only English characters, but when the title contains some non-ASCII characters (Cyrillic in my case) it produces UTF-8 garbage. I've analyzed this garbage a bit, and it seems that my Cyrillic characters are converted to UTF-8 without any regard for the codepage, i.e. instead of using the Cyrillic codepage for the conversion it uses no codepage at all, and I end up with UTF-8 text whose characters differ from those in the window title.
My question is: how can I retrieve the window title in UTF-8 directly, without any conversion?
I can achieve this goal using the AX API, but I want to do it with AppleScript, because the AX API needs an option turned on in the system.
UPD:
It works fine in AppleScript Editor, but I'm compiling it from C++ code via OSACompile -> OSAExecute -> OSADisplay.
I don't know the guts of AppleScript Editor, so maybe it has some inside knowledge about how to encode the characters.
I found the answer while writing the update. Sometimes it is good to ask a question just to understand it better :)
So, for future searchers: if you want the Unicode result of the script execution, pass typeUnicodeText to OSADisplay; you will then get the result as UTF-16LE in the resulting AEDesc.

UTF8 charset, diacritical elements, conversion problems - and Zend Framework form escaping

I am writing a webapp in ZF and am having serious issues with UTF-8. It uses multilingual content through Zend Form, and it seems that ZF heavily escapes all of these characters: it basically just won't show a field if there are diacritical characters like 'é', and if I use the HTML entity equivalent, e.g. &eacute;, it gets escaped so that the user will see '&eacute;'.
Zend Form allows for non-escaped data, but trying to use that is confusing, and it seems it would need to be used all over the place.
So, I have been told that if the page and the text are in UTF-8, no conversion to HTML entities is required. Is this true?
And if that is true, then how do I convert the source text to UTF-8? I am comfortable setting up Apache so that it sends a default UTF-8 charset header, and also adding the charset meta tag to the HTML, but doing this I am still getting messed-up encoding. I have also tried opening the translation CSV file in TextWrangler on OSX as UTF-8, but it changed nothing.
Thanks!
L
'é' and if I use the HTML entity equivalent, e.g. &eacute;, it gets escaped so that the user will see '&eacute;'.
This I don't understand. Can you show an example of how it is displayed, as opposed to how it should be displayed?
So, I have been told that if the page and the text are in UTF-8, no conversion to HTML entities is required. Is this true?
Yup. In more detail: If the data you're displaying and the encoding of the HTML page are both UTF-8, the multi-byte special characters will be displayed correctly.
And if that is true, then how do I convert the source text to UTF-8?
Advanced editors and IDEs enable you to define what encoding the source file is saved in. You would need to open the file in its current encoding (with special characters being displayed correctly) and save it as UTF-8.
If the content is messed up when you have the right content-type header and/or meta tag specified, then the content is not UTF-8 yet. If you don't get it sorted, post an example of what it looks like here.

Encoding problem classic ASP

I have a problem with classic ASP. The encoding is wrong when I send data with XMLHttp.send. The response is a PDF file, but the “ÆØÅ” characters come out wrong; the “Ø” is read as “øy”, for example. It looks like a conversion mistake from UTF-8 to ISO-8859-1, but it should already be ISO-8859-1. I have <%@CODEPAGE="28591"%> at the top of the page and ISO-8859-1 as the encoding in the XML file, and I have checked that the file is valid ISO-8859-1. I don’t have access to the server I am sending this data to, but I fixed the same problem in a VB6 program that uses the same logic with:
aPostBody = StrConv(strBody, vbFromUnicode)
WinHttpReq.SetTimeouts 100000, 100000, 100000, 1000000
WinHttpReq.Send aPostBody
And in a C# program that also uses the same logic with:
// ISO-8859-1
byte[] bytes = Encoding.GetEncoding(28591).GetBytes(data);
But in classic ASP I need some help finding a way to change the encoding of a string to ISO-8859-1.
Try:
Session.CodePage = 28591
There is some good information here, and I got the CodePage number here.
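If Session.CodePage alone doesn't fix the bytes that XMLHttp.send puts on the wire, a hedged alternative (the helper name is illustrative, not from the question) is to convert the string to ISO-8859-1 bytes explicitly with ADODB.Stream, mirroring the VB6 StrConv and C# Encoding.GetEncoding(28591) approaches shown in the question, and send the byte array instead of the string:

```vbscript
' Convert a VBScript string to ISO-8859-1 bytes.
' Constants: adTypeText = 2, adTypeBinary = 1.
Function ToLatin1Bytes(text)
    Dim s : Set s = CreateObject("ADODB.Stream")
    s.Open
    s.Type = 2                 ' adTypeText
    s.Charset = "ISO-8859-1"
    s.WriteText text
    s.Position = 0
    s.Type = 1                 ' adTypeBinary: read back the raw bytes
    ToLatin1Bytes = s.Read
    s.Close
End Function
```

Usage with the XMLHttp object would then be along the lines of xmlHttp.send ToLatin1Bytes(strBody), so the request body bytes are ISO-8859-1 regardless of the session codepage.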
Have you tried using Response.Charset and setting it like so:
<% Response.Charset="ISO-8859-1"%>
Check the encoding of the .asp file itself and of all the .asp files included with #include.
I once had a problem when I created a new .asp file in VS that was encoded in UTF-8. That file was included by others, and its encoding "overrode" all the other encoding commands.
AFAIK this is a known problem with WinHttpReq / XMLHTTPRequest, hope someone proves me wrong.
Reference this as well: How do I set the character set using XMLHttp Object for a POST in classic ASP?
I have used this component in both ASP and JavaScript; for JavaScript I found the resolution for this issue here: http://squio.nl/blog/2006/06/27/xmlhttprequest-and-character-encoding/
The solution:
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
Response.CodePage = 65001
Response.CharSet = "UTF-8"
Complete:
https://pt.stackoverflow.com/questions/80886/encoding-asp-cl%C3%A1ssico/81418#81418
Have you tried using the meta tag equivalent to what you are doing?
Example:
Response.Write("<meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1' />")
Note: I use a Response.Write to spit out the charset, because Visual Studio will attempt to save the file with a different encoding if, for example, the charset is UTF-8.