How to convert GB2312 (Chinese) characters to UTF-8 inside Weblogic 12?

We have pages that use Simplified Chinese (GB2312) in an HTML form. When we submit the form with 3 Chinese characters in a text field, the server (Weblogic 12) receives 6 other, non-Chinese characters. We save these 6 characters in the database, and when we read them back and display them on the page, the Chinese characters appear again. That works fine!
But after receiving the 6 non-Chinese characters on the server, we need to call a web service that only accepts UTF-8. How can I convert the original 3 Chinese characters (GB2312 in the HTML page) to UTF-8?
I don't know whether the characters will display in the forum, but I'll try:
In the HTML form: 陈玉珍
Received in the Weblogic server: ³ÂÓñÕä

http://www.521yy.com/tools/GB2312-UTF8.html
Since this is a Chinese website, let me explain a little in English.
Paste your Chinese characters into the form and click the left button at the bottom; it should convert them to UTF-8. The right button at the bottom resets the form. Hope it helps.
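For a programmatic route, here is a minimal sketch in Python (used only for illustration; the question itself concerns a Java server). It assumes the server decoded the raw GB2312 bytes as ISO-8859-1, which is consistent with the six characters shown in the question. Under that assumption the original bytes can be recovered without loss and decoded again as GB2312. In Java, the analogous call would be new String(received.getBytes("ISO-8859-1"), "GB2312"). The variable names are hypothetical.

```python
# The six characters the server saw (copied from the question).
received = "³ÂÓñÕä"

# Assumption: the container decoded the GB2312 bytes as ISO-8859-1.
# ISO-8859-1 maps every byte to one character, so encoding back
# recovers the original bytes unchanged.
raw_bytes = received.encode("iso-8859-1")

# Decode those bytes with the charset the browser actually used.
chinese = raw_bytes.decode("gb2312")

# Encode as UTF-8 for the web service that only accepts UTF-8.
utf8_bytes = chinese.encode("utf-8")

print(chinese)
print(utf8_bytes.hex())
```

A cleaner fix is usually to make the server decode the request as GB2312 in the first place, so that no repair step is needed.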

Related

Character Encoding Issue - Characters Being Replaced with Random Characters after Saving in Textarea

I'm working with a third-party company and I'm trying/hoping to determine the cause of a character encoding issue before I bring it up with them.
This company has a custom drag-and-drop editor for designing websites on their platform. Within the editor they have a Raw HTML widget that I can drag in and add my own content to. The problem is that when I copy HTML from someone's old website using the inspector tool and paste it into this widget, all of the apostrophes and double quotes get replaced with gibberish. I have the same issue when I first paste the content into Notepad, Notepad++, or Sublime Text and then paste it into their Raw HTML editor.
Here's a recording of the issue and a few examples:
https://streamable.com/phwn2
Here are the known characters that get replaced and what they get replaced with:
’ turns into â™
“ turns into âœ
” turns into â
+ turns into (a space)
Å turns into Ã…
" stays as "
' stays as '
Does anyone see a pattern with these characters or know what could be the cause of these characters being replaced?
The website probably uses UTF-8, and the company's editor might be using something like Windows-1252. In your first example, the right single quote has the UTF-8 encoding e2 80 99. When a program reads each of those bytes as Windows-1252, you get "small latin letter a with circumflex" (e2), the euro sign (80), which seems to have been dropped from your examples, and "trademark" (99). I haven't checked the other transformations. If this is the problem, you could work around it by first converting the copied characters to the destination encoding with iconv before pasting into the company's editor.
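The byte-level explanation can be checked with a short Python sketch. It assumes the editor really does read UTF-8 bytes as Windows-1252, which is a guess rather than something confirmed. The helper names are hypothetical. The right double quote is left out because its third byte, 0x9D, has no mapping in Python's cp1252 codec.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Simulate a program that reads UTF-8 bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(mojibake: str) -> str:
    """Undo the misreading by reversing both steps."""
    return mojibake.encode("cp1252").decode("utf-8")


# Right single quote, left double quote, A with ring above.
for ch in ["\u2019", "\u201c", "\u00c5"]:
    garbled = misread_utf8_as_cp1252(ch)
    print(repr(ch), "->", repr(garbled), "->", repr(repair(garbled)))
```

The garbled form of Å comes out as Ã…, matching the example in the question, which supports the UTF-8 versus Windows-1252 theory.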

Chinese in Japanese encoding

This may sound like a stupid question. I typed some Chinese characters into an empty text file in the VS Code editor (default UTF-8). Then I saved the file in a Japanese encoding, Shift JIS, which apparently doesn't cover all the characters I typed.
However, before I closed the file, all the Chinese characters were displayed properly in VS Code. After I closed the file and reopened it with the Shift JIS encoding, several characters are displayed as a question mark (?). I guess these are the Chinese characters not covered by the Japanese encoding?
What happened in the process? Is there any way I can get back the Chinese characters that are now shown as "?"? I don't really understand how encoding works in this scenario...
Not all encodings cover all characters. (Unicode encodings, in principle, do, but even they don't have quite everything yet.) If you save some text in an encoding which does not include all characters in that text, something has to give.
Options:
you get an error message,
nothing saves at all,
the characters which cannot be included are silently dropped,
the characters which cannot be included are converted to some other character (such as the question mark).
Once that conversion is done, the data is lost, and cannot be recovered. Why not use UTF-8 or another Unicode encoding? (GB 18030 might be the best for large amounts of Chinese text.)
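The options listed above correspond to the error handlers of Python's codecs, which makes for a small illustration. This is a sketch of the general mechanism, not of what VS Code does internally. The sample text is hypothetical: the first character exists in Shift JIS, the second (a simplified-only form) does not.

```python
text = "我们"

# Strict mode: the encoder refuses and reports an error.
try:
    text.encode("shift_jis")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "ignore": the unencodable character is silently dropped.
dropped = text.encode("shift_jis", errors="ignore").decode("shift_jis")

# "replace": the unencodable character becomes a question mark.
replaced = text.encode("shift_jis", errors="replace").decode("shift_jis")

print(strict_failed, dropped, replaced)
```

After the "replace" step the file contains only an ordinary question mark byte, so nothing in it records which character was originally there.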

MFC multibyte application shows junk "????" on pasting Chinese characters, but typing works

Our MFC application uses Multi Byte Character Set (MBCS). OS is Windows 7.
We can type Simplified Chinese characters with the virtual keyboard, but copying and pasting Chinese characters from Google Translate into an edit box in the application shows junk characters "????".
Is this a known issue with MBCS applications? Is there a workaround?
When you copy and paste into a multi-byte app, the Unicode characters are converted to the local code page. If they can't be converted, you'll get ?. You really should be compiling and distributing your app as Unicode; otherwise you'll be fighting these sorts of issues all the time.
If you can't recompile as Unicode, try catching the 'Paste' action and handling the clipboard yourself. Use GetClipboardData and read the value for CF_UNICODETEXT, which will be the valid text. You'll then need to do your own conversion to the correct multi-byte format.
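The dependence on the local code page can be illustrated outside Windows with Python's codecs standing in for the system conversion. This is only a sketch of the principle, not the MFC or clipboard API. The sample text and variable names are hypothetical.

```python
# Unicode text, as the CF_UNICODETEXT clipboard format would hold it.
pasted = "陈玉珍"

# Code page 936 (Simplified Chinese) can represent every character.
as_cp936 = pasted.encode("cp936")

# Code page 1252 (Western European) cannot, so each one becomes "?".
as_cp1252 = pasted.encode("cp1252", errors="replace")

print(as_cp936.hex())
print(as_cp1252)
```

So the manual conversion from CF_UNICODETEXT only helps if the target multi-byte code page actually contains the characters being pasted.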

special character issue in text getting from CMS

I am using an API to get data from a CMS, and we display the text the user has entered into the CMS.
My problem is that when the user enters a special character into the CMS, I am not able to get that text on the iPhone side.
Here is the link to the text the user entered in the wall description
We are using a JSON web service that encodes the string as UTF-8, so my JSON string will be
The word 'stop' isn\u0092t in your vocabulary. Run a marathon in 4.5 hours or less.
The character \u0092 is a special character that we need to display as shown in the image above
NOTE:
1) If we pass the string without encoding it as UTF-8 in the web service, I get the whole string as null.
2) I have tried [NSString stringWithCString:[textFromCms cStringUsingEncoding:NSISOLatin1StringEncoding] encoding:NSUTF8StringEncoding];
where textFromCms is the text I got from the CMS, as shown above.
3) I also tried without any conversion/encoding; it ignores the special character.
4) I also tried base64, but that did not help either.
Any help would be much appreciated.
The CMS apparently uses windows-1252, not UTF-8. The curly apostrophe is 92 (hex) in windows-1252 but U+2019 in Unicode, so when properly encoded into JSON it should be \u2019.
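If the service cannot be fixed, a client-side repair is possible. The following Python sketch (for illustration only; the question itself concerns an iPhone client) reinterprets the stray code points as windows-1252 bytes. The JSON key "text" is a hypothetical name, and the approach assumes every character in the string is at or below U+00FF; otherwise the first encode step would fail.

```python
import json

# JSON as the service currently sends it (the key name is hypothetical).
payload = r'''{"text": "The word 'stop' isn\u0092t in your vocabulary."}'''

text = json.loads(payload)["text"]

# U+0092 is a C1 control character, not an apostrophe. Turn each code
# point back into a single byte, then decode those bytes as windows-1252.
fixed = text.encode("latin-1").decode("cp1252")

print(fixed)
print(json.dumps({"text": fixed}))
```

The second print shows the escape the service would emit if it converted from windows-1252 correctly.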

Should I use hex ascii accented character code in HTML or use the actual character?

I have several huge CSVs with lots of accented characters stored as HTML hex codes: &#xE9; for é and lots of others, even &#x2013; for –, etc.
My site is a wiki where people update listings. When they are presented with a textarea for an update, the existing content is filled in, and obviously those hex codes are shown.
Should I bother replacing those codes with the actual accented characters, or just leave them as they are? I wrote a script to replace the characters, but somehow the output contains weird characters. Probably the output saved by Ruby isn't in UTF-8.
By default my site is in UTF-8, and the accented characters are displayed properly with some HTML coding in the view.
Please advise. Thanks.
Could you clarify what the problem is?
If your data (CSV) is in UTF-8, and the default encoding of your site is UTF-8, then all you would need to do is make sure that when users are editing content, that content is properly treated as UTF-8.
You may not need to display the markup to the users. Perhaps you could leverage a WYSIWYG editor package like TinyMCE?
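If you do decide to replace the hex codes with real characters, the conversion itself is a one-liner in most languages. Here is a minimal sketch in Python (the question mentions Ruby; Python is used only for illustration), assuming the CSV cells contain numeric character references such as &#xE9;. The sample cell is hypothetical.

```python
import html

# A hypothetical CSV cell containing hexadecimal character references.
cell = "caf&#xE9; &#x2013; bar"

# html.unescape turns numeric and named references into characters.
decoded = html.unescape(cell)

# Encode explicitly as UTF-8 when writing the result out, so that the
# saved data matches the site's encoding.
utf8_bytes = decoded.encode("utf-8")

print(decoded)
```

Weird characters after such a script usually mean the output was written or read with an encoding other than UTF-8, so it is worth setting the encoding explicitly on both ends.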