I have a problem with classic ASP. The encoding is wrong when I send data with XMLHttp.send. The response is a PDF file, but the “ÆØÅ” characters come out wrong; the “Ø” is read as “øy”, for example. It looks like a conversion mistake from UTF-8 to ISO-8859-1, but everything should be ISO-8859-1 already: I have <%@ CODEPAGE="28591" %> at the top of the page and ISO-8859-1 as the encoding in the XML file, and I have checked that the file is valid ISO-8859-1.

I don't have access to the server I am sending this data to, but I fixed the same problem in a VB6 program that uses the same logic with:
aPostBody = StrConv(strBody, vbFromUnicode)
WinHttpReq.SetTimeouts 100000, 100000, 100000, 1000000
WinHttpReq.Send aPostBody
And in a C# program that also uses the same logic with:
// ISO-8859-1
byte[] bytes = Encoding.GetEncoding(28591).GetBytes(data);
But in classic ASP I need some help finding a way to change the encoding of a string to ISO-8859-1.
Try:
Session.CodePage = 28591
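' 28591 is the code page identifier for ISO-8859-1 (the same number the C# fix above uses)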
There is some good information here, and I got the CodePage number here.
Have you tried using Response.Charset and setting it like so:
<% Response.Charset="ISO-8859-1"%>
Check the encoding of the .ASP file and all the .ASP files included with #include.
Once I had a problem when I created a new .ASP file in VS that was encoded in UTF-8. This file was included by others, and the file's encoding "overwrote" all the other encoding commands.
AFAIK this is a known problem with WinHttpReq / XMLHTTPRequest, hope someone proves me wrong.
Reference this as well: How do I set the character set using XMLHttp Object for a POST in classic ASP?
I have used this component in both ASP and JavaScript, and on the JavaScript side I found the resolution for this issue here: http://squio.nl/blog/2006/06/27/xmlhttprequest-and-character-encoding/
The solution:
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
Response.CodePage = 65001
Response.CharSet = "UTF-8
The complete answer (in Portuguese) is here:
https://pt.stackoverflow.com/questions/80886/encoding-asp-cl%C3%A1ssico/81418#81418
Have you tried using the meta tag equivalent to what you are doing?
Example:
Response.Write("<meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1' />")
Note: I use a Response.Write to spit out the charset, because Visual Studio will attempt to save the file with a different encoding if, for example, the charset is UTF-8.
I am working on files whose encoding is unknown at first, but I get the encoding with these lines in Java:
InputStream in = new FileInputStream(new File("D:\\lbl2\\1 (26).LBL"));
InputStreamReader inputStreamReader = new InputStreamReader(in);
System.out.print(inputStreamReader.getEncoding());
and we get UTF8 as output.
But the problem is that when I try to view the file's content in a browser or a text editor like Notepad++, I can't see the characters correctly. When I change the encoding to Windows-1256, however, all of the characters display correctly and are readable.
Am I making a mistake somewhere?
Java does not attempt to detect the encoding of a file. getEncoding returns the encoding that was selected in the InputStreamReader constructor. If you don't use one of the constructors that take a character set parameter, you get the 'platform default charset', according to Oracle's documentation.
This question discusses what the platform default charset is, and how you can change it.
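To see this in action, here is a short sketch (reusing the file path from the question): getEncoding simply reports whatever charset the reader was constructed with.

InputStream in = new FileInputStream(new File("D:\\lbl2\\1 (26).LBL"));
// No charset argument: the reader uses, and reports, the platform default.
System.out.println(new InputStreamReader(in).getEncoding());
InputStream in2 = new FileInputStream(new File("D:\\lbl2\\1 (26).LBL"));
// Explicit charset: getEncoding now reports (the historical name of) Windows-1256.
System.out.println(new InputStreamReader(in2, "Windows-1256").getEncoding());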
If you know in advance that this file is Windows-1256, you can use:
InputStreamReader inputStreamReader = new InputStreamReader(in, "Windows-1256");
Attempting to detect the encoding of a file usually fails - see for example the Bush hid the facts issue in Windows Notepad.
Unfortunately there is no 100% reliable way to detect the encoding of a file, and, as the other answer points out, Java by default doesn't try. It simply assumes the platform's default encoding.
If you know the files are all in a single encoding then great, you can just specify that encoding and life is good.
If you know that some files are in UTF-8 and some files are in a single legacy encoding then you can generally get away with trying a strict* UTF-8 decode first. If the strict UTF-8 decode errors out then you move on to your legacy encoding.
If you have a wider mix of encodings, things get considerably harder; you may have to resort to some quite complex language processing to sort them out.
* I believe that to get a strict decode in Java you first get the Charset, then its CharsetDecoder, and then use the onMalformedInput method to put it into strict (reporting) mode; see the sketch below.
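A minimal sketch of that try-strict-UTF-8-then-fall-back approach (the class and method names, and the Windows-1256 fallback, are illustrative):

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadWithFallback {
    // Try a strict UTF-8 decode first; if the bytes are not valid UTF-8,
    // decode them with the single legacy encoding instead.
    static String read(String path, Charset legacy) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, legacy);                     // not valid UTF-8, use legacy
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(read("D:\\lbl2\\1 (26).LBL", Charset.forName("windows-1256")));
    }
}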
When looking at Outlook I can find two properties telling me what codepage a MailItem has:
Internet Codepage Property
PR_MESSAGE_CODEPAGE -> 0x3FFD0003
What are the intentions of the two different codepage values?
When looking at an e-mail with an HTML body, there can also be an encoding declared in the HTML itself.
So what is the correct way to interpret the HTML body?
In a current mail which is UTF-8 encoded, InternetCodepage returns 65001 (correct), PR_MESSAGE_CODEPAGE returns 1252, and the HTML encoding says UTF-8.
Can I rely on InternetCodepage?
Another developer told me that it sometimes didn't return the correct value, but he doesn't have an example of this.
So what's the best approach to find the encoding of the HTML body and/or the subject of a mail?
See my reply at http://social.msdn.microsoft.com/Forums/en/outlookdev/thread/d0608d5a-eef3-41cb-abc5-a6296fb92b3b
If you are only dealing with the HTML body, look at the HTML header to check if it specifies the encoding.
For other properties (if the store is not Unicode enabled), I usually use PR_INTERNET_CPID. If it is not available, then I use PR_MESSAGE_CODEPAGE.
I don't know if there is a reason to prefer one over the other...
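For the "check the HTML header" step above, a rough sketch (in Java; the helper is hypothetical and simply mirrors the fallback order just described):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Look for a charset declaration in the HTML header; if there is none,
// fall back to PR_INTERNET_CPID, then to PR_MESSAGE_CODEPAGE.
static String charsetFromHtmlBody(String html) {
    Matcher m = Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9_-]+)",
            Pattern.CASE_INSENSITIVE).matcher(html);
    return m.find() ? m.group(1) : null;  // null: fall back to the MAPI properties
}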
I want to set a text in a label:
labelDemnaechst.setText(" Demnächst fällig:");
In the application's output, characters like "ä" are displayed wrong.
How can I display them correctly?
GWT assumes all source files are encoded in UTF-8 (and this is why you see lÃ¶schen if you open them with an editor that interprets them as Windows-1252 or ISO-8859-1).
GWT also generates UTF-8-encoded files, so your web server should serve them as UTF-8 (or don't specify an encoding at all, to let browsers "sniff" it) and your HTML host page should be served in UTF-8 too (best is to put a <meta charset=utf-8> at the beginning of your <head>).
If a \u00E4 in Java is displayed correctly in the browser, then you're probably in the first case, where your source files are not encoded in UTF-8.
See http://code.google.com/webtoolkit/doc/latest/FAQ_Troubleshooting.html#International_characters_don't_display_correctly
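A quick way to tell which case you are in (an illustrative check, not from the original answer): compile and run the line below. If it prints 2, the compiler read your UTF-8 source file using a single-byte encoding such as Windows-1252.

// "ä" is one char when the source is decoded correctly,
// but becomes the two chars "Ã¤" when a UTF-8 source is read as Windows-1252.
System.out.println("ä".length());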
Well, you have to encode your special characters as Unicode escapes. You can find a list of the corresponding Unicode characters here.
Your example would look like this:
labelDemnaechst.setText("Demn\u00E4chst f\u00E4llig:");
Hope this helps if no one has a better solution.
Appendix:
Thanks Thomas for your tip; you really do have to change the encoding in which Eclipse saves its source files. By default it uses something like Cp1252. If you change it to UTF-8, the example works correctly (so Demnächst is written correctly).
You can edit the save encoding if you right-click on your file --> Properties.
To get UTF-8 encoding for your entire workspace, go to Window -> Preferences. In the pop-up, start typing "encoding". You should now see Content Types, Workspace, CSS Files, HTML Files, and XML Files as results. In Content Types you can type UTF-8 in the Default encoding text box; for the other elements you can just select the encoding in their respective list boxes.
Then check the encoding for your project in Project -> Properties -> Resource.
Detailed instruction with pictures can be found here:
http://stijndewitt.wordpress.com/2010/05/05/unicode-utf-8-in-eclipse-java/
Cheers
What I did:
opened the file with Notepad (from Windows Explorer),
and saved it with the option UTF-8 instead of the proposed ANSI.
Encoding the project to UTF-8 didn't work (for me).
Cheerio
Use the ISO-8859-1 (Western Europe) character set instead of UTF-8.
I am having problems with encoding Chinese in an ASP site. The file formats are:
translations.txt - UTF-8 (to store my translations)
test.asp - UTF-8 - (to render the page)
test.asp is reading translations.txt that contains the following data:
Help|ZH|帮助
Home|ZH|首页
test.asp splits on the pipe delimiter, and if the user has a ZH cookie it will display this translation; otherwise it will just fall back to the key value.
Now, I have tried the following things, which have not worked:
Add a meta tag
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
Set the Response.CharSet = "UTF-8"
Set the Response.ContentType = "text/html"
Set the Session.CodePage (and Response) to both 65001 (UTF-8)
I have confirmed that the text in translations.txt is definitely in UTF-8 and has no byte order mark
The browser is picking up that the page is Unicode UTF-8, but the page is displaying gobbledegook.
The Scripting.FileSystemObject OpenTextFile(filename, iomode, create, format) method returns the same incorrect text regardless of the format parameter.
Here is a sample of what I want to be displayed in China (ZH):
首页
帮助
But the following is displayed:
首页
帮助
This occurs in all tested browsers: Google Chrome, IE 7/8, and Firefox 4. The font definitely has a Chinese range of glyphs, and I do have East Asian languages installed.
--
I have tried pasting the original value into the HTML, which did work (but note this is a hard-coded value).
首页
首页
However, this is odd:
首页 --(in hex)--> E9 A6 96 E9 A1 B5 --(those bytes read as Windows-1252)--> é¦–é¡µ
Any ideas what I am missing?
In order to read the UTF-8 file, you'll probably need to use the ADODB.Stream object. I don't claim to be an expert on character encoding, but this test worked for me:
test.txt (saved as UTF-8 without BOM):
首页
帮助
test.vbs
Option Explicit
Const adTypeText = 2
Const adReadLine = -2
Dim stream : Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = adTypeText
stream.Charset = "UTF-8"
stream.LoadFromFile "test.txt"
Do Until stream.EOS
    WScript.Echo stream.ReadText(adReadLine)
Loop
stream.Close
Whatever part of the process is reading the translations.txt file does not seem to understand that the file is in UTF-8. It looks like it is reading it in as some other encoding. You should specify encoding in whatever process is opening and reading that file. This will be different from the encoding of your web page.
Inserting the byte order mark at the beginning of that file may also be a solution.
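To see that this is exactly what is happening, here is a small illustration (in Java, purely to demonstrate the byte-level effect; the ASP-side fix is the ADODB.Stream code above):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// The UTF-8 bytes of "首页" (E9 A6 96 E9 A1 B5), decoded as Windows-1252,
// reproduce the gobbledegook from the question character for character.
String mojibake = new String("首页".getBytes(StandardCharsets.UTF_8),
        Charset.forName("windows-1252"));
System.out.println(mojibake);  // prints é¦–é¡µ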
Scripting.OpenTextFile does not understand UTF-8 at all. It can only read the system's default ANSI code page or UTF-16 Unicode. As you can see from the number of bytes used for some character sets, UTF-8 is also quite inefficient, so I would recommend Unicode for this sort of data.
You should save the file as Unicode (in Windows parlance) and then open with:
Dim stream : Set stream = Scripting.OpenTextFile(yourFilePath, 1, false, -1)
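' iomode 1 = ForReading; format -1 = TristateTrue, i.e. open the file as Unicode (UTF-16)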
Just use the script below at the top of your page
Response.CodePage=65001
Response.CharSet="UTF-8"
I'm modifying a mature CGI application written in Perl and the question of content encoding has come up. The browser reports that the content is iso-8859-1 encoded and the application is declaring iso-8859-1 as the charset in the HTTP headers but doesn't ever seem to actually do the encoding. None of the various encoding techniques described in the perldoc tutorials (Encode, Encoding, Open) are used in the code so I'm a little confused as to how the document is actually being encoded.
As mentioned, the application is quite mature and likely predates many of the current encoding methods. Does anyone know of any legacy or deprecated techniques I should be looking for? What encoding does Perl assume or default to when no direction is provided by the developer?
Thanks
By default Perl handles strings as being byte sequences, so if you read from a file, and print that to STDOUT, it will produce the same byte sequence. If your templates are Latin-1, your output will also be Latin-1.
If you use a string in text-string context (with uc, lc, and so on), Perl assumes Latin-1 semantics, unless the string has been decoded before.
More on Perl, charsets and encodings
Perl will not assume anything, but the browser is assuming that encoding, usually based on guesswork. If none of the encoding techniques is used, the documents are output directly, just as they were written.
You can specify the charset in the HTTP Content-Type header, e.g. Content-Type: text/html; charset=ISO-8859-1.
The first place I'd look is the server configuration. If you aren't setting the content-encoding header in the program, you're likely picking up the server's guess.
Run the script separately from the server to see what its actual output is. When the server gets the output from a CGI program (one that's not nph), it fixes up the header with anything it thinks is missing before sending it to the client.
If the browser reports the content as iso-8859-1, maybe your Perl script didn't output the correct headers to specify the charset?