Preserve encoding for included files

My website uses UTF-8 encoding and classic ASP, with VBScript as the default scripting language. I have split the pages into smaller files for easier management.
I always put the following trick in the first line of each included file to preserve the UTF-8 encoding when saving it; otherwise the non-English characters are converted to garbled characters.
mainfile.asp
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body>
<!--#include file="sub.asp"-->
</body>
</html>
sub.asp
<%if 1=2 then%>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<%end if%>
Here is some text in another language (Persian, meaning "test text in Persian"):
تست متن به زبان فارسی
This trick works well when saving files offline, and it also works when the page runs on the server, because the extra lines are never sent to the browser (the condition is always false).
Is there a better way to preserve encoding in separated files?
I use Microsoft Expression Web to edit the files.

I use TextPad to make sure that all main files and includes are saved in UTF-8 encoding. Just choose "Save As" and set the encoding dropdown in the dialog to the one you want.
Keep the meta tag as well, because the browser still relies on it to know which encoding the page uses.
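Editors that have to guess a file's encoding (the problem the meta-tag trick works around) can recognize UTF-8 reliably when the file starts with a UTF-8 byte order mark, the bytes EF BB BF, which editors such as TextPad can write when saving as UTF-8. As a minimal sketch, assuming the include files are already valid UTF-8, the hypothetical Java utility below prepends that BOM to each file named on the command line if it is missing. Whether Expression Web and your IIS/ASP setup treat a BOM in an included file the way you want is an assumption you should check on your own setup before converting a whole site.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AddUtf8Bom {
    // The UTF-8 byte order mark: EF BB BF
    static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** True if the data already starts with the UTF-8 BOM. */
    static boolean hasBom(byte[] data) {
        return data.length >= 3 && data[0] == BOM[0] && data[1] == BOM[1] && data[2] == BOM[2];
    }

    /**
     * Prepends the UTF-8 BOM if it is missing and returns true if the file was changed.
     * Assumes the existing content is already UTF-8; it does not convert other encodings.
     */
    static boolean ensureBom(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        if (hasBom(data)) {
            return false;
        }
        byte[] out = new byte[BOM.length + data.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(data, 0, out, BOM.length, data.length);
        Files.write(file, out);
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean changed = ensureBom(Path.of(name));
            System.out.println(name + (changed ? ": BOM added" : ": already has a BOM"));
        }
    }
}
```

For example, `java AddUtf8Bom.java sub.asp` (single-file launch, Java 11 or later) adds the BOM once; running it again leaves the file unchanged.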

Related

Represent encoding used for a text file

How is the encoding for a simple text file stored?
In an email there's a header:
Content-Type: text/plain; charset="UTF-8"
In HTML we have a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
That leaves me with the question of how a text editor knows which encoding is used, since we don't explicitly set it in a text file as we do in an HTML file.
If it's a structured format, like .docx or .pdf, the encoding is likely stored inside the file as some sort of property.
If it's a plain text file, like .txt or .csv, the encoding is not stored anywhere (except that a Unicode file may start with a byte order mark). A text editor uses heuristics to determine which encoding was used to save the file, but the result is only a guess.
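The kind of heuristic described above can be sketched in a few lines of Java. This is a simplified illustration, not how any particular editor works: it first looks for a byte order mark, then checks whether the bytes are valid UTF-8 using a strict decoder, and otherwise falls back to ISO-8859-1, which accepts any byte sequence. The class name EncodingGuess and the returned labels are made up for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodingGuess {
    /** True if data begins with the given unsigned byte values. */
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Returns a best guess for the encoding of the given bytes. */
    static String guess(byte[] data) {
        // 1. A byte order mark is the only explicit hint a plain text file can carry.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE (BOM)";
        // FF FE is also how the UTF-32LE BOM starts; this sketch does not tell them apart.
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE (BOM)";

        // 2. No BOM: if the bytes decode as strict UTF-8, guess UTF-8.
        //    Pure ASCII also passes, which is fine because ASCII is a subset of UTF-8.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            // 3. Otherwise fall back to a single-byte encoding that accepts any input.
            return "ISO-8859-1 (guess)";
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + guess(Files.readAllBytes(Path.of(name))));
        }
    }
}
```

Note that a Latin-1 file can occasionally also be valid UTF-8, which is exactly why the answer calls the result a guess.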
Read more:
How to detect the encoding of a file?
Heuristic to detect encoding

Perl Encoding for Japanese characters

Please help me with my Perl encoding problem.
I created an HTML form with some input fields.
I read the parameters by their input "name".
The form action is a .pl file.
When I fill in the input fields and read the parameters, I can see the data I entered, except that Japanese characters are not displayed correctly.
How do I use Encode in this case? For example, Japanese characters become ã­ã“.
You need to make sure you set the character encoding of your web page correctly, usually to UTF-8. If you're using the CGI module, you can do something like:
use CGI;
my $q = CGI->new();
print $q->header( -charset => 'utf-8' );
This assumes your form page is also generated by the Perl CGI script. If it's static HTML, you can use a meta tag to accomplish the same thing:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
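The garbled text in the question is typical of UTF-8 bytes being interpreted as a single-byte Western encoding: hiragana and katakana are encoded in UTF-8 as three bytes starting with 0xE3, which shows up as "ã" in Latin-1 or Windows-1252. The Java sketch below (class and method names are illustrative) reproduces the effect with ISO-8859-1 and shows that, because no bytes are lost, re-encoding as Latin-1 and decoding as UTF-8 recovers the text. In the Perl script, the corresponding step would be decoding each incoming parameter with Encode's decode('UTF-8', $value). The characters the asker actually typed are not known, so the two hiragana used here are only an example input.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    /** Simulates the bug: UTF-8 bytes decoded as ISO-8859-1 (Latin-1). */
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Undoes garble(): Latin-1 maps chars 0-255 back to the original bytes one-to-one. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u306d\u3053"; // hiragana "ne" + "ko"
        String garbled = garble(original);
        System.out.println(garbled.length());                 // 6: three Latin-1 chars per character
        System.out.println(garbled.charAt(0) == '\u00e3');     // true: lead byte 0xE3 shown as a-tilde
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```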

Zend_Form doesn't accept Latin characters (ú, ë, etc.)

I can't get Zend_Form to accept any entered Latin characters (ü, é, etc.).
Even without any validation, it doesn't accept them.
Does anyone know how to get this to work?
After doing a couple of tests, it seems to be a simple character encoding issue.
Your server is probably not delivering documents with UTF-8 encoding. You can easily force this in your view or layout by placing this in your <head> (preferably as its first child):
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
or, if you use an HTML5 doctype:
<meta charset="utf-8">
It probably doesn't hurt to also set the Zend_View encoding in your application config file, though this wasn't necessary in my tests (I think "UTF-8" is the default anyway):
resources.view.encoding = "utf-8"
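Whether the server really delivers UTF-8 can be checked in the Content-Type response header, for example in the browser's developer tools. The sketch below, with a hypothetical helper charsetOf, shows how the charset parameter is read out of such a header value; it handles the forms seen in this document, such as text/html;charset=UTF-8 and charset="UTF-8", but it is a simplification that does not handle semicolons inside quoted values.

```java
import java.util.Optional;

public class ContentTypeCharset {
    /** Extracts the charset parameter from a value like "text/html; charset=UTF-8". */
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) return Optional.empty();
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type itself
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) continue;
            String name = param.substring(0, eq).trim();
            if (!name.equalsIgnoreCase("charset")) continue; // parameter names are case-insensitive
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding quotes, as in charset="UTF-8"
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return Optional.of(value);
        }
        return Optional.empty();
    }

    /** True if the header declares UTF-8 (charset names are compared case-insensitively). */
    static boolean isUtf8(String contentType) {
        return charsetOf(contentType).map(cs -> cs.equalsIgnoreCase("UTF-8")).orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=UTF-8")); // Optional[UTF-8]
        System.out.println(isUtf8("text/html"));                  // false: no charset declared
    }
}
```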

How do I specify an encoding for TextCells in CellList?

I use a CellList like this:
CellList<String> cellList = new CellList<String>(new TextCell());
and then give it an ArrayList<String>.
If a String contains a "ü", I get a question mark in the browser (FF4, GWT Dev Plugin). If I use &uuml; instead, I get the literal text &uuml;.
Where can I specify the encoding so that "ü" works? (I'm not sure if it makes a difference, but the "ü" is currently hardcoded in the .java file and not read from anywhere else.)
The GWT compiler assumes that your Java files are encoded in UTF-8. Make sure that your editor is set to save in that encoding.
You should also make sure to set the encoding of the HTML page to a Unicode-capable encoding like UTF-8 (this allows you to use even more exotic characters that you won't find in other charsets):
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
...
Moreover, if you later want to retrieve the strings from a database, make sure that it is also set up to handle Unicode and that your JDBC driver connects in Unicode mode (required for some databases).
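To make the point about exotic characters concrete: without character references, a page can only carry the characters that its declared charset can represent. The Java sketch below (the class and helper names are illustrative) uses CharsetEncoder.canEncode to check three samples: "ü" fits in ISO-8859-1 but not in US-ASCII, the euro sign does not fit in ISO-8859-1, and only UTF-8 covers all three samples.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCoverage {
    /** True if every character of text can be represented in the given charset. */
    static boolean fits(String text, Charset charset) {
        // A fresh encoder per call: canEncode must not be used while an encoding is in progress.
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String umlaut = "\u00fc";         // u with diaeresis
        String euro = "\u20ac";           // euro sign
        String japanese = "\u306d\u3053"; // two hiragana characters
        Charset[] charsets = {StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8};
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": umlaut=" + fits(umlaut, cs)
                    + " euro=" + fits(euro, cs) + " japanese=" + fits(japanese, cs));
        }
    }
}
```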

Using Unicode characters in a GWT CheckBox label

How can I put a Unicode character in the label (the constructor argument) of a GWT CheckBox? If I put the character in as an HTML entity, GWT escapes the & and I end up with &euml; in the label of the checkbox instead of ë.
Unicode characters in Java String literals can be written with a special escape syntax, \uXXXX, where XXXX is the character's code in hexadecimal.
In your case, you could write it like this:
new CheckBox("H\u00ebllo")
The code for "ë" is 00eb; you can look codes up in, e.g., this table. By the way, 00eb in hexadecimal is 235 in decimal.
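If you have several such characters, looking up each code by hand gets tedious. As a small sketch (the helper name escapeNonAscii is made up for this example), the Java method below turns every non-ASCII character of a string into the \uXXXX form used above, so the result can be pasted into a string literal regardless of the source file's encoding; it also confirms that 0xEB is 235.

```java
public class JavaEscapes {
    /**
     * Replaces each non-ASCII char with a Java Unicode escape (backslash, 'u', four hex digits).
     * Characters outside the BMP are already two UTF-16 chars in a Java String,
     * so they come out as two escapes, which is also the form Java source expects.
     */
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String label = "H\u00ebllo";               // the CheckBox label from the answer
        System.out.println(escapeNonAscii(label)); // escaped form, ready to paste into a string literal
        System.out.println((int) label.charAt(1)); // 235, i.e. 0xEB
    }
}
```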
Another possibility is to save your Java files as UTF-8. Then you can write these characters in your literals without escaping them. However, this also requires the compiler to read the sources as UTF-8, for example via the JVM option -Dfile.encoding=UTF-8 or javac's -encoding UTF-8 option. Many IDEs take care of this automatically if you set the file's encoding preference to UTF-8.
Another important factor is that you should set the charset of your HTML page correctly (usually UTF-8):
<meta http-equiv="content-type" content="text/html; charset=UTF-8">