I'm using Mojolicious (not ::Lite) together with the Redis module from CPAN.
I'm storing some Japanese text in this way:
use Redis;
my $redis = Redis->new;
$redis->set("mykey",$val);
# $val contains a string which was read from a file.
# The value looks like: テスト
Later in the code I read that value back from Redis:
use Data::Dumper;

my $val = $redis->get("mykey");
print Dumper($val); # the value prints correctly in the terminal
$self->stash(
    myvalue => $val
);
$self->render(
    template => "/pages/test"
);
And the template:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<div><%= $myvalue %></div>
...
But it displays the value like: ãã¹ã.
Changing the charset manually in the browser makes no difference (it is not displayed as expected).
Why is the value displayed correctly in the terminal but not in the template?
Notes:
I used base64 encode/decode and it didn't change anything (I'm sure it's not Redis).
I have Japanese fonts and settings installed correctly (I have been working with Japanese encodings for many years, but this is the first time I've used Mojolicious templates for this task).
All files are saved in UTF-8 (no other encoding is being used).
If I write something in Japanese inside the template (hard coded) it displays correctly.
I hate to answer my own questions... but I found the solution:
use Encode qw(decode_utf8);
...
$self->stash(
    myvalue => decode_utf8($val) # decode the octets from Redis into characters
);
Simple as that. Not sure why it's displayed correctly on the terminal... Probably "Dumper" is converting it?
Why is it not displayed correctly in the template?
When you get a value from Redis, you get a sequence of bytes. You have to decode those octets into Perl's internal character representation, which is exactly what decode_utf8($val) does.
Not sure why it's displayed correctly on the terminal... Probably "Dumper" is converting it?
Your terminal expects UTF-8. When you dump the value you are passing the raw octets straight through, and the terminal itself interprets them as UTF-8, so the characters look right even though Perl never decoded them. (Had you printed an already-decoded string without a UTF-8 layer on the handle, you would have seen a "Wide character in print" warning instead.)
The main rule is: when you get bytes from an external source, you must decode them into the internal representation.
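For example, here is a minimal sketch of that rule applied to the Redis round-trip from the question (the key name "mykey" comes from the question; the rest is illustrative):
use Redis;
use Encode qw(decode);

my $redis = Redis->new;

# Redis hands back raw octets, not characters.
my $octets = $redis->get("mykey");

# Decode the octets into Perl's internal character representation.
my $chars = decode('UTF-8', $octets);

# When printing characters yourself, give the handle an encoding layer
# so Perl knows how to turn them back into bytes.
binmode STDOUT, ':encoding(UTF-8)';
print $chars, "\n";
Mojolicious then encodes the rendered page for you, so stash values should be decoded character strings, not octets.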
Here is a full list of recommendations
Related
Hi!
I don't understand why non-ASCII characters like "ç, ñ, я" from my different languages are not being displayed.
The text in question is hardcoded; it is not served from a DB.
I have seen identical questions here:
Charset=utf8 not working in my PHP page
I have seen that I should write this:
header('Content-type: text/html; charset=utf-8');
But where the heck does that go? I can't just write it into the page like that; the browser simply mirrors the words and displays them as plain text, with no parsing.
My front page's encoding declaration says this:
<head>
<meta charset="utf-8">
</head>
which is supposed to be Unicode.
I tried to validate my page at validator.w3.org and it said:
Sorry, I am unable to validate this document because on line 60 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
Line 60 actually has the word Español (Spanish) with that weird ñ.
Any hint?
Thank you, best regards.
Please help me with my Perl Encode problem.
I created an HTML form with some input fields.
I take the parameter from the "name" input.
The form action is a ".pl" file.
I fill in the input fields, take the parameters, and I can see the data I filled in, but it is not OK for Japanese characters.
How do I use Encode for this case? E.g. Japanese characters become ãã“.
You need to ensure you are setting the character encoding of your web page correctly, usually UTF-8. So if you're using the CGI module you do something like:
my $q = CGI->new();
print $q->header( -charset=> 'utf-8' );
This is assuming your form is also generated by the Perl CGI script. If it's flat HTML, there are some META tags you can use to accomplish the same thing. I think it's:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
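The charset header covers the output side; if the Japanese input itself comes back mangled, the submitted parameters also need decoding, because they arrive as raw octets. A minimal sketch, assuming a field called "name" as in the question:
use strict;
use warnings;
use CGI;
use Encode qw(decode_utf8);

my $q = CGI->new;
print $q->header( -charset => 'utf-8' );

# Form parameters arrive as UTF-8 octets; decode them before use.
my $name = decode_utf8( scalar $q->param('name') );

# Give STDOUT an encoding layer so characters go back out as bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "<p>Hello, $name</p>\n";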
International HTML files archived by wget
should contain characters like these (Hebrew and Thai examples):
אב
הם
and ยคน
Instead they are saved like this:
íäáåãéú and ÃÒ¡à§é
How do I get these displayed properly?
iconv filename.html
iconv: illegal input sequence at position 1254
SOLVED: There was nothing wrong.
I just hadn't noticed that the default php.ini sets the charset in the HTTP header. To let pages use their own charsets, like <meta http-equiv="Content-Type" content="text/html; charset=windows-874">, you need to leave default_charset empty: default_charset = "";
....
The pages aren't "saved like this"; whatever you're using to view the file is simply interpreting the encoding incorrectly. To know what encoding the file is in, you should have paid attention to the HTTP Content-Type header during download; that's gone now.
Your only other chance is to parse the equivalent HTML meta tag in the <head>, if the document has one.
Otherwise, you can only guess the encoding of the document.
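If you want to script that guess-and-convert step, a rough sketch might look like this (the regex is a crude sniff, not a real HTML parser, and the windows-1255 fallback is only an assumption for the Hebrew pages):
use strict;
use warnings;
use Encode qw(decode encode);

my $file = shift or die "usage: $0 file.html\n";
open my $fh, '<:raw', $file or die "open $file: $!";
my $octets = do { local $/; <$fh> };
close $fh;

# Crude sniff: look for a charset declaration in a meta tag.
my ($charset) = $octets =~ /charset\s*=\s*["']?([\w-]+)/i;
$charset ||= 'windows-1255'; # pure guess if no meta tag is present

# Decode with the sniffed charset, then re-encode as UTF-8.
print encode('UTF-8', decode($charset, $octets));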
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for more required background knowledge.
I use a CellList like this
CellList<String> cellList = new CellList<String>(new TextCell());
and then give it an ArrayList<String>.
If a String contains an "ü" I get a question mark in the browser (FF4, GWT Dev Plugin). If I use the HTML entity &uuml; instead, I get ü.
Where can I specify the encoding, so that "ü" works? (I'm not sure if it makes a difference, but the "ü" is currently hardcoded in the .java file and not read from somewhere else).
The GWT compiler assumes that your Java files are encoded in UTF-8. Make sure that your editor is set to save in that encoding.
You should also make sure to set the encoding of the HTML page to a Unicode-capable encoding like UTF-8 (this allows you to use even more exotic characters that you won't find in other charsets):
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
...
Moreover, if you later want to retrieve the strings from a database, make sure that it is also set up to handle Unicode, and that your JDBC driver connects in Unicode mode (required for some databases).
We built a Java EE web project and use JDBC to store our data.
The problem is that German umlauts like äöü are in use and are stored properly in the MySQL database. We don't know why, but in the browser those characters are broken, displaying weird stuff like
ö�
instead.
I've already tried setting the encoding of the JDBC connection as described in this question:
JDBC character encoding
And the encoding of the HTML page is correctly set:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
Any ideas how to fix that?
Update
connection.prepareStatement("SET CHARACTER SET utf8").execute();
won't make umlauts work.
changing the meta-tag to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
won't change anything either.
"We don't know why, but in the browser those characters are broken"
Well, that's the first thing to find out. You should trace your data at every stage:
As you fetch it out of the database (with logging)
When you inject it into the page (with logging)
On the wire (via Wireshark)
When you log, don't just log the strings: log the Unicode characters that make up the strings, as integers. Just cast each character in the string to an integer and log it. It's primitive, but it'll tell you what you need to know.
When you look on the wire, of course, you'll be seeing bytes rather than characters as such. You should work out what bytes you expect for your chosen encoding, and check those against what's actually coming across the network.
You've specified the encoding in the HTML - but have you told whatever's generating your page that you want it in ISO Latin 1? That's likely to be responsible for both setting the content-type header and performing the actual conversion from text to bytes.
Additionally, is there any reason why you're using ISO Latin 1 instead of UTF-8? Why would you deliberately restrict yourself like that? (ISO Latin 1 can only handle the first 256 characters of Unicode, instead of the full range of Unicode characters. UTF-8 can handle everything, and is just as efficient for ASCII.)