I am getting Arabic characters back as ????? in my JSON output.
Can anyone tell me how to get Arabic characters to come through correctly in JSON?
EDIT:
I am using the English language. I have also tried encoding it as UTF-8.
Many Thanks,
Naveed
I found the solution myself.
The fix is simply to issue SET NAMES utf8 right after connecting to the database, like this:
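$host_link = mysql_connect(DBASE_HOST, DBASE_USER, DBASE_PWD);
if (!$host_link) {
    // die() already terminates the script, so no separate exit is needed.
    die('Could not connect: ' . mysql_error());
}
// Ask MySQL to use UTF-8 for everything sent and received on this connection.
mysql_query("SET NAMES utf8");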
Hope this will help some others.
Naveed
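To see where the ????? can come from, here is a minimal Python sketch. It assumes (the post does not confirm this) that the connection was defaulting to a single-byte charset such as latin1, which has no Arabic letters. The sample string and variable names are made up for illustration.

```python
# Hypothetical sample text; any Arabic string behaves the same way.
text = "مرحبا"

# Assumption: the connection charset is latin1. Characters that latin1
# cannot represent are replaced with "?", one per character.
as_latin1 = text.encode("latin-1", errors="replace")

# With UTF-8 (what SET NAMES utf8 asks for) the text survives a round trip.
as_utf8 = text.encode("utf-8")

print(as_latin1)
print(as_utf8.decode("utf-8") == text)
```

Once the characters have been replaced with question marks, nothing downstream (including the JSON encoder) can recover them, which is why the fix belongs at the connection.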
Related
I am using protobuf to read and write a config file, but I found that Chinese characters are not written to the file correctly.
The encoding code:
zrd::Config cfg;
zrd::Market *market = nullptr;
market = cfg.add_market();
market->set_id("11");
market->set_name("清江冷链市场");
market->set_district("六合区");
string content;
google::protobuf::TextFormat::PrintToString(cfg, &content);
When it finishes running, content looks like this:
market {\n id: \"11\"\n name: \"\346\270\205\346\261\237\345\206\267\351\223\276\345\270\202\345\234\272\"\n district: \"\345\205\255\345\220\210\345\214\272\"\n}
Why are the Chinese characters converted this way? When I use ofstream to write content to a file, characters escaped like this are not convenient to read, although protobuf can decode them successfully.
Is there a way to save the Chinese characters in a readable form?
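The escapes in that output are not corruption: each \NNN is one byte of the UTF-8 encoding, written in octal. A minimal Python sketch that undoes the escaping for the first two characters of the name (the helper name unescape_octal is made up for this illustration):

```python
import re

# Escaped text copied from the output in the question
# (the first two characters of the market name).
escaped = r"\346\270\205\346\261\237"

def unescape_octal(s: str) -> bytes:
    """Turn each backslash + three octal digits into the byte it stands for."""
    return re.sub(
        r"\\([0-7]{3})",
        lambda m: chr(int(m.group(1), 8)),
        s,
    ).encode("latin-1")

raw = unescape_octal(escaped)
print(raw.hex())
print(raw.decode("utf-8"))
```

As far as I know, the C++ TextFormat::Printer class has a SetUseUtf8StringEscaping option intended to leave valid UTF-8 unescaped when printing; check the protobuf documentation for your version before relying on it.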
I use luasocket to GET a web page which contains Chinese characters "开奖结果" (the page itself is encoded in charset="gb2312"), as below:
require "socket"
host = '61.129.89.226'
fileformat = '/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=%s'
function getlottery(num)
c = assert(socket.connect(host, 80))
c:send('GET ' .. string.format(fileformat, num) .. " HTTP/1.0\r\n\r\n")
content = c:receive('*l')
while content do
if content and content:find('开奖结果') then -- failed
print(content)
end
content = c:receive('*l')
end
c:close()
end
--http://61.129.89.226/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=2012138
getlottery('2012138')
Unfortunately, it fails to match the expected characters:
content:find('开奖结果') -- failed
I know Lua is capable of finding unicode characters:
Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
> if string.find("This is 开奖结果", "开奖结果") then print("found!") end
found!
So I guess the cause might be how luasocket retrieves data from the web. Could anyone shed some light on this?
Thanks.
If the page is encoded in GB2312 and your script (the file itself) is encoded in UTF-8, the match cannot work. find() compares raw bytes, so it looks for the UTF-8 byte sequence of your pattern and slides right past the characters you want, because the page encodes them with different bytes:
         开      奖      结      果
GB2312   bfaa    bdb1    bde1    b9fb
UTF-16   5f00    5956    7ed3    679c
UTF-8    e5bc80  e5a596  e7bb93  e69e9c
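The byte values in the table can be reproduced, and the failed match demonstrated, with a short Python sketch. The HTML fragment in line is a made-up stand-in for a line received from the GB2312 page.

```python
needle = "开奖结果"

# The same four characters as bytes in each encoding.
gb = needle.encode("gb2312")
utf8 = needle.encode("utf-8")
print(gb.hex(), utf8.hex())

# Hypothetical line as it would arrive from a GB2312-encoded page.
line = "<td>开奖结果</td>".encode("gb2312")

# Searching for the UTF-8 bytes fails; searching for the GB2312 bytes works.
found_utf8 = utf8 in line
found_gb = gb in line

# Alternative: decode the received bytes first, then search as text.
found_decoded = needle in line.decode("gb2312")
print(found_utf8, found_gb, found_decoded)
```

In the Lua script the same two options apply: either make the pattern use GB2312 bytes (for example by saving the script file in GB2312) or convert the received text to UTF-8 before calling find.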
I have a bad words filter that uses a list of keywords saved in a local UTF-8 encoded file. This file includes both Latin and non-Latin chars (mostly English and Arabic). Everything works as expected with Latin keywords, but when the variable includes non-Latin chars, the matching does not seem to recognize these existing keywords.
How do I go about matching both Latin and non-Latin keywords?
The badwords.txt file includes one word per line, as in this example:
bad
nasty
racist
سفالة
وساخة
جنس
Code used for matching:
$badwords = file_get_contents("badwords.txt");
$badtemp = explode("\n", $badwords);
$badwords = array_unique($badtemp);
$badFlag = 0; // set to 1 once any keyword matches
$query = strtolower($query);
foreach ($badwords as $key => $val) {
    if (!empty($val)) {
        $val = trim($val);
        $regexp = "/\b" . $val . "\b/i";
        if (preg_match($regexp, $query))
            $badFlag = 1;
        if ($badFlag == 1) {
            // Bad word detected die...
        }
    }
}
I've read that iconv, the multibyte functions (mbstring), and the /u modifier might help with this, and I tried a few things but can't seem to get it right. Any help getting this to match both Latin and non-Latin keywords would be much appreciated.
The problem seems to relate to recognizing word boundaries; the \b construct is apparently not “Unicode aware.” This is what the answers to the question “php regex word boundary matching in utf-8” seem to suggest. I was able to reproduce the problem with \b even on text containing Latin letters like “é”. The problem seems to disappear (i.e., Arabic words are recognized correctly) when I set
$wstart = '(^|[^\p{L}])';
$wend = '([^\p{L}]|$)';
and modify the regexp as follows:
$regexp = "/" . $wstart . $val . $wend . "/iu";
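The same idea (define "word boundary" as "start/end of string or any non-letter" instead of relying on \b) can be sketched in Python. Python's re module has no \p{L}, so this sketch approximates "not a letter" with [\W\d_]; that substitution, and the function name has_bad_word, are my own choices for illustration.

```python
import re

# Approximation of [^\p{L}]: non-word characters, digits, and underscore.
NOT_LETTER = r"[\W\d_]"

def has_bad_word(query, bad_words):
    """Return True if any keyword appears in query as a whole word."""
    for word in bad_words:
        word = word.strip()
        if not word:
            continue
        # re.escape keeps keywords with regex metacharacters from breaking the pattern.
        pattern = rf"(?:^|{NOT_LETTER}){re.escape(word)}(?:{NOT_LETTER}|$)"
        if re.search(pattern, query, flags=re.IGNORECASE):
            return True
    return False

print(has_bad_word("هذا جنس", ["bad", "جنس"]))
print(has_bad_word("nastygram", ["nasty"]))
```

Escaping each keyword is worth carrying over to the PHP version as well (preg_quote does the same job there), since a keyword containing a character such as "." or "/" would otherwise change the pattern.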
Some string functions in PHP are not safe to use on UTF-8 strings. They were supposedly going to fix this in version 6, but for now you need to be careful what you do with a string.
It looks like strtolower() is one of them; use mb_strtolower($query, 'UTF-8') instead. If that doesn't fix it, read through the code, find every point where you process $query or badwords.txt, and check the documentation of each function used for UTF-8 problems.
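The difference between byte-oriented and character-oriented lowercasing can be shown with a small Python sketch. This is an analogy only: Python's bytes.lower() stands in for a byte-based strtolower() and str.lower() for mb_strtolower(); the exact behaviour of PHP's strtolower() depends on version and locale.

```python
word = "ÉCOLE"

# Byte-oriented lowercasing only touches the ASCII letters A-Z, so the two
# UTF-8 bytes of "É" are left alone.
bytes_lowered = word.encode("utf-8").lower().decode("utf-8")

# Character-oriented lowercasing knows that "É" lowercases to "é".
text_lowered = word.lower()

print(bytes_lowered, text_lowered)
```

If the query is lowercased one way and the keyword list another, a case-insensitive comparison can silently fail for exactly the non-ASCII words.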
As far as I know, preg_match() is OK with UTF-8 strings, but some Unicode features are disabled by default to improve performance. I don't think you need any of them.
Please also double-check that badwords.txt is a UTF-8 file and that $query contains a valid UTF-8 string (if it comes from the browser, the encoding is the one you declared for the page with a <meta> tag).
If you're trying to debug UTF-8 text, remember that most web browsers do not default to the UTF-8 text encoding, so any PHP variable you print out for debugging will not be displayed correctly by the browser unless you select UTF-8 (in my browser, with View -> Encoding -> Unicode).
You shouldn't need iconv or any of the other conversion APIs; most of them will simply replace the non-Latin characters with Latin ones, which is obviously not what you want.
I followed serhio's answer on the UTF-8 encoding problem and have spent the whole day trying different methods found on the net :( I want to show Chinese characters in the subject line, but when the mail arrives in Gmail it shows garbage characters. I tried putting
header('Content-Type: text/html; charset=utf-8');
at the top of the page, but it did not work.
I also tried adding "\r\n", which did not work either.
My code is below:
$mail->charset = 'utf-8';
$mail->body('',$strInv);
$mail->subject('"=?UTF-8?B?".base64_encode(我的问题)."?=" #'.$inquiry_no);
When I receive it in Gmail, the subject looks like this:
"=?UTF-8?B?".base64_encode(è®¢å •ç¡®è®¤)."?=" #00016
I would really appreciate any help with this. Thank you.
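In your code the whole expression sits inside a single-quoted string, so base64_encode() is never called and its name is sent literally. Once it is fixed, your subject string should look like this:
=?UTF-8?B?<base64 of the UTF-8 subject text>?=
Use echo to debug your subject string before you call
$mail->subject
or just do
$ssubject = '=?UTF-8?B?' . base64_encode('我的问题') . '?=';
$ssubject = $ssubject . ' #' . $inquiry_no;
$mail->subject($ssubject);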
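For comparison, the same encoded-word construction as a minimal Python sketch. The function name encode_subject is made up for this illustration, and the inquiry number is simply the one visible in the Gmail output in the question.

```python
import base64

def encode_subject(text: str) -> str:
    """Wrap text in an RFC 2047 base64 encoded-word declared as UTF-8."""
    b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + b64 + "?="

inquiry_no = "00016"  # hypothetical value, taken from the output in the question
subject = encode_subject("我的问题") + " #" + inquiry_no
print(subject)
```

The plain ASCII part (" #00016") stays outside the encoded-word and is separated from it by a space. Long subjects need more care, since RFC 2047 limits the length of a single encoded-word; this sketch does not handle splitting.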
good luck newbie
I use Zend Framework and I have a problem with JSON and UTF-8.
Output
\u00c3\u00ad\u00c4\u008d
ÃÄ
I use...
JavaScript (jQuery)
contentType : "application/json; charset=utf-8",
dataType : "json"
Zend Framework
$view->setEncoding('UTF-8');
$view->headMeta()->appendHttpEquiv('Content-Type', 'text/html;charset=utf-8');
header('Content-Type: application/json; charset=utf-8');
utf8_encode();
Zend_Json::encode
Database
resources.db.params.charset = "utf8"
resources.db.params.driver_options.1002 = "SET NAMES utf8"
resources.db.isDefaultTableAdapter = true
Collation
utf8_unicode_ci
Type
MyISAM
Server
PHP Version 5.2.6
What did I do wrong? Thank you for your reply!
utf8_encode();
If you've got UTF-8 strings from your database and UTF-8 strings from your browser, then you don't need to utf8_encode any more. You've already got UTF-8 strings; calling this function again will just give you the UTF-8 representation of what you'd get if you read UTF-8 bytes as ISO-8859-1 by mistake.
Pass your untouched UTF-8 strings straight to the JSON encoder.
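The double encoding can be reproduced in a few lines of Python. The original text "íč" is inferred from the escaped output in the question, not stated there, so treat it as an assumption; the latin-1 decode step simulates what utf8_encode() does to bytes that are already UTF-8.

```python
import json

original = "íč"  # assumption: inferred from the question's escaped output

# Simulate utf8_encode() on text that is already UTF-8: each UTF-8 byte is
# misread as one ISO-8859-1 character, giving four characters instead of two.
double_encoded = original.encode("utf-8").decode("latin-1")

print(json.dumps(double_encoded))  # "\u00c3\u00ad\u00c4\u008d"
print(json.dumps(original))        # "\u00ed\u010d"
```

The first line matches the output shown in the question; the second is what the encoder produces when the string is passed through untouched.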
I think this question is somehow related to yours.
My problem came up when encoding some non-Latin text (Arabic, Hebrew, or Chinese, as you might see).
It turns out that the \uXXXX Unicode escape notation, like what you are seeing, is understood by JavaScript/ECMAScript.
I hope that explains it to you in detail.
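A minimal Python sketch of that point: \uXXXX escapes and raw UTF-8 characters are two spellings of the same JSON string, and a parser returns identical text for both. The sample string is made up for illustration.

```python
import json

text = "مرحبا"  # hypothetical sample

escaped = json.dumps(text)                       # ASCII-only \uXXXX escapes
readable = json.dumps(text, ensure_ascii=False)  # raw characters kept

# Both forms parse back to the identical string.
print(json.loads(escaped) == json.loads(readable) == text)
```

So escaped output on the wire is not by itself a sign of a bug; it only matters whether the decoded value on the JavaScript side is the text you expected.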