I am working with a TCP socket from which I receive Unicode data every second; the data includes Arabic text.
I need to convert that Unicode data to UTF-8 and pass it to JSON.
How do I convert Unicode to UTF-8 in BlackBerry 10 Qt Cascades?
I am using the following code to convert it, but it is not working.
QString appTitle = QString::fromLocal8Bit(data2.toAscii());
I receive data in this format.
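For reference, here is a minimal sketch of the usual Qt approach, assuming the Arabic text is already held in a QString (the name toUtf8Json below is just a placeholder). QString stores text as UTF-16 internally, and toAscii() generally cannot represent Arabic characters, which is likely why the line above loses data; toUtf8() returns the UTF-8 byte sequence that can go into the JSON payload.
#include <QString>
#include <QByteArray>
// Sketch only: assumes data2 already contains the decoded Arabic text.
QByteArray toUtf8Json(const QString &data2)
{
    return data2.toUtf8(); // UTF-8 encoded bytes, ready to embed in JSON
}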
I am creating a file by querying data from a DB; the situation is as follows:
Database: Oracle with charset UTF-8
Application server: Resin with charset UTF-8
Application framework: NTT Intra-Mart (a Japanese framework based on Rhino that uses JavaScript as the server-side programming language)
Need: query data from Oracle and create a file in the [Shift-JIS] charset; the file is a middle file exported by one system and transferred over FTP to another system for import.
The file must have fixed byte ranges so that the destination server can locate the data to import:
e.g.
byte 1-10: [user address]
byte 11-20: [user name]
However, when I first create the file with UTF-8, all characters appear correctly, but when I write the data with charset [SJIS], some full-width characters become half-width question marks [?]. This shortens the byte widths, so the data can no longer be located correctly:
e.g.
when [user address]'s data is like 1-10-1, the data in the file becomes 1?10?1
byte 1-10: [user address], but in the current file the user address occupies only bytes 1-8
byte 11-20: [user name]
Could you please give me some advice?
You will have to use the charset name Windows-31J rather than Shift-JIS.
The data 1-10-1 was most likely typed with Microsoft IME. Microsoft IME uses U+FF0D (FULLWIDTH HYPHEN-MINUS) to represent the character -.
U+FF0D is not mapped to any character in the Shift-JIS/Unicode mapping in the JVM, so you get ? when you convert - from the JVM's internal representation (UTF-16) to Shift-JIS with the charset Shift-JIS.
U+FF0D is mapped to 0x817C in the Windows-31J/Unicode mapping in the JVM, so you get - when you convert - from the JVM's internal representation (UTF-16) to Shift-JIS with the charset Windows-31J.
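As a small illustration of the mapping difference described above, here is a hedged sketch in plain Java (the class name is arbitrary; the framework's Rhino scripts ultimately go through the same JVM charsets):
import java.nio.charset.Charset;

public class HyphenDemo {
    public static void main(String[] args) {
        String s = "1\uFF0D10\uFF0D1"; // full-width hyphens as typed via Microsoft IME

        // Shift_JIS has no mapping for U+FF0D, so each hyphen encodes to '?' (0x3F).
        byte[] sjis = s.getBytes(Charset.forName("Shift_JIS"));

        // Windows-31J maps U+FF0D to 0x81 0x7C, so the expected byte width is preserved.
        byte[] win31j = s.getBytes(Charset.forName("Windows-31J"));

        System.out.println(sjis.length);   // 6 bytes: each hyphen collapsed to a single '?'
        System.out.println(win31j.length); // 8 bytes: each hyphen kept as a two-byte sequence
    }
}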
When displaying a bytes object in Python, the print function will display a select number of values as ASCII characters instead of their numeric representation.
>>> b"1 2 \x30 a b \x80"
b'1 2 0 a b \x80'
Is there a known encoding that would allow a set of binary data containing mostly ASCII text to be put into a valid ASCII string, where the few invalid characters are replaced by a numeric representation, similar to what Python does with bytes?
Edit:
I used Python's bytes repr as an example of what the encoding would do; the project we need this for is written in C++, so something like a "language agnostic" spec would be nice.
Edit 2:
Think of this as a Base64 alternative, so that binary data that is mostly ASCII does not get altered too much.
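As a hedged sketch of the kind of scheme you describe (escapeBytes is a made-up name, not a library function): printable ASCII passes through unchanged, the backslash is escaped so the encoding stays reversible, and every other byte becomes a \xNN escape, mirroring Python's bytes repr.
#include <cstdio>
#include <string>
#include <vector>

// Sketch only: encode mostly-ASCII binary data as a readable ASCII string.
std::string escapeBytes(const std::vector<unsigned char> &data)
{
    std::string out;
    for (unsigned char b : data) {
        if (b == '\\') {
            out += "\\\\";               // escape the escape character itself
        } else if (b >= 0x20 && b <= 0x7E) {
            out += static_cast<char>(b); // printable ASCII passes through
        } else {
            char buf[5];
            std::snprintf(buf, sizeof(buf), "\\x%02x", b);
            out += buf;                  // everything else becomes \xNN
        }
    }
    return out;
}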
I have UTF-8 data in a MySQL table. I Base64 encode it as it is read from the table and transport it to a web page via PHP and AJAX. JavaScript Base64-decodes it as it is inserted into the HTML. The page receiving it is declared to be UTF-8.
My problem is that if I insert the Base64-decoded data (using atob()) into the page, any two bytes that make up a single UTF-8 character are presented as two separate Unicode code points. I have to use "decodeURIComponent(escape(window.atob(data)))" (learned from another question on this forum, thank you) to get the characters represented correctly. What this process does is convert the two UTF-8 bytes into a single byte equal to the Unicode code point for the character (also the same character under ISO 8859).
In short, to get the UTF-8 data rendered correctly in a UTF-8 page, it has to be converted to its Unicode code-point/ISO 8859 values.
An example:
The Unicode code point for lowercase e-acute is \u00e9. The UTF-8 encoding of this character is \xc3\xa9.
The following images show what is rendered for the various decodings of my Base64 encoding of this word: first plain atob(), then adding escape() to the process, then further adding decodeURIComponent(). I show the console reporting the output of each, as well as three INPUT fields populated with the three outputs ("record[6]" contains the Base64-encoded data). First the code:
console.log(window.atob(record[6]));
console.log(escape(window.atob(record[6])));
console.log(decodeURIComponent(escape(window.atob(record[6]))));
jQuery("#b64-1").val(window.atob(record[6]));
jQuery("#b64-2").val(escape(window.atob(record[6])));
jQuery("#b64-3").val(decodeURIComponent(escape(window.atob(record[6]))));
Copying and pasting the two versions of née into a hex editor reveals what has happened:
Clearly, the two bytes from the atob() decoding are the correct values for UTF-8 e-acute (\xc3\xa9), but they are initially rendered not as a single UTF-8 character but as two individual characters: C3 (uppercase A tilde) and A9 (copyright sign). The next two steps convert those two characters to the single code point for e-acute, \u00e9.
So decodeURIComponent() obviously recognises the two bytes as a single UTF-8 character (because it changes them to \u00e9), but the browser does not.
Can anyone explain to me why this needs to happen in a page declared to be UTF-8?
(I am using Chrome on W10-64)
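For what it's worth, atob() returns a binary string with one character per byte, and when that string is inserted into the page each character is taken as an individual code point (effectively ISO 8859-1) rather than as part of a UTF-8 byte sequence. A hedged alternative to the escape()/decodeURIComponent() trick, using the standard TextDecoder API and the same record[6] as above:
var binary = window.atob(record[6]);                                    // one character per byte
var bytes = Uint8Array.from(binary, function (c) { return c.charCodeAt(0); });
var text = new TextDecoder('utf-8').decode(bytes);                      // bytes interpreted as UTF-8
jQuery("#b64-3").val(text);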
I am currently using Lazarus 1.6.4 for a mobile app, and I am making an Indy TIdHTTP.Get() call to a web server, which returns a JSON string in UTF-8 format.
The string contains Unicode characters that I cannot see in Lazarus, where the response arrives with multiple ? characters (one for each Unicode character).
For example, I want to get ["id","code","name"],[2,"100","ΤΕΣΤ"], but I receive ["id","code","name"],[2,"100","????"].
With an older version of Lazarus I don't have any problems with the encoding, and I can also perform the call from other applications/browsers with no problem.
Any ideas?
Please help me change the encoding to UTF-8. The encoding is latin1 and the fields are in Latin format.
I need to change this in a MongoDB database.
How do I change the encoding of a database from latin1 to UTF-8?
db.collection.insert(array("title" => utf8_encode("Péter")));
MongoDB supports UTF-8 out of the box, so the encoding problem is not in MongoDB but in the data you insert.
A String has no encoding attached, so changing the encoding can be done by converting it to a byte array and enforcing the UTF-8 encoding.
If you are using Java 7, try this:
import static java.nio.charset.StandardCharsets.ISO_8859_1;
import static java.nio.charset.StandardCharsets.UTF_8;
byte[] titleValue = myString.getBytes(ISO_8859_1); // raw bytes, read back as Latin-1
String value = new String(titleValue, UTF_8);      // reinterpret those bytes as UTF-8