Russian text is not showing properly after rythm rendering - rythm

I have a rythm template which shows an error message. Error messages can be in any language, such as English, Spanish, German, Russian, etc.
I tried to pass the following Russian text to rythm, and the output I see after rendering is all ??????
Это объявление должно содержать информацию о товаре из каталога Добавьте в это объявление характеристики товара
I don't have any code which encodes or decodes. It's a plain String which is passed to the template. Any help is greatly appreciated.

Make sure you are using utf-8 for your template source code encoding.
Check out http://fiddle.rythmengine.org/#/editor/68d19008800d43ab81a6f09ef29c2426
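One common way to end up with a run of question marks is that the text is encoded somewhere along the way with a charset that has no Cyrillic letters, so every unmappable character is replaced by '?'. The Python sketch below only illustrates that failure mode. It is not Rythm code, and ISO-8859-1 is an assumed stand-in for whatever non-UTF-8 default might be in play.

```python
# Illustration only: how Cyrillic text degrades to '?' when it passes
# through a charset that cannot represent it.
text = "Это объявление"

# Assumed lossy step: encode with a Latin-only charset, replacing
# every unmappable character (Python substitutes '?').
lossy = text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

# Same round trip with UTF-8 keeps every character intact.
round_trip = text.encode("utf-8").decode("utf-8")

print(lossy)       # ??? ??????????
print(round_trip)  # Это объявление
```

If the output looks like this, the damage happens at an encode or decode step outside the template logic, which is why the encoding of the template source and of the output is the first thing to check.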

Related

Identify hidden non-UTF8 encoded characters

I am working in a PostgreSQL database and I have a text column which holds various languages like Russian, Chinese, Korean, English, etc. Although our application handles these languages well, we are having an issue dealing with non-UTF-8 characters.
For example, in the image from Notepad++, where I have done Encoding > Encode in UTF-8, all the unrecognizable characters show up clearly.
However, we are having trouble marking such records as non-processable in Postgres. A flag would do, but what I have tried so far (below) flags the valid Russian records as well, whereas Notepad++ explicitly shows only the hidden/non-UTF-8 characters.
Notepad++
The weird thing about these characters is that they do not show up in a regular select query, but when I convert them to "UTF-8", they show up like below.
Database
I tried the query below, but it does not give me the desired output. The expectation is to set a flag on records which have invalid hidden HTML references without losing valid text, like the valid Russian sentence in the snapshot. I should be able to distinctly identify only such texts.
select text, text ~ '[^[:ascii:]]', text ~ '^[\x00-\x7F]*$'
from sample_data;
Sample Data -
"Я не наркоман. Это у меня всегда, когда мне афигитительно. А если серьёзно, это интересно,…"
"Ya le dieron amor a la foto de instagram de mi #UberCALAVERITA?"
"Executive Admininstrative Assistant in Toronto, ON for a Group"
"Сегодня валютные стратеги BMO обновили прогнозы по основным валютам на ближайшие пять кварталов (на конец периода): читать далее…"
"Flicitations Gestion d'actifs pour 6 Trophes #FundGradeA+2016 de fonds communs de placement :"
This answer might help you go back to fix problems. It doesn't directly help you to go forward in the direction you are asking about.
Looking at Flicitations and F\302\202licitations, the escapes look like octal, which is possibly a presentation choice of your "IDE" and/or the convert_to function. Read as octal, \302\202 is 0xC2 0x82, and decoding that as UTF-8 gives U+0082. In Unicode, that's a control character; in ISO 8859-1 it's a non-character. Either might explain why some renderings make it invisible or give it no space.
Now, Google tells me that Flicitations is almost a French word, Félicitations. So perhaps there is a character set and encoding where é is encoded as 0x82. Wikipedia helps here: indeed there is one, IBM850, which has been used for some French text.
So it seems that someone has mishandled the user's text, causing data loss. The fundamental rule of text encoding is that text bytes must be read with the same encoding they were written with. Don't guess; ask, or reference a standard, specification, documentation, or convention. Maybe you can go back and find the misbehaving process/code; at least that would prevent future data loss.
"Dealing with non-UTF-8 characters": There aren't really any non-UTF-8 characters. UTF-8 is an encoding of the Unicode character set. There are areas with exceptions but, practically speaking, Unicode has all characters, and UTF-8 can encode them all. So, if you think there are non-UTF-8 characters, either the writer is non-compliant or the reader is using the wrong encoding.
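The byte arithmetic above can be checked with a short Python sketch. It also shows one possible way to flag rows that contain C1 control characters (U+0080 to U+009F) without touching valid Russian text. The helper names has_c1_controls and repair_cp850 are made up for this illustration, and the repair assumes the original bytes really were IBM850, which is only a guess based on the French sample.

```python
def has_c1_controls(text: str) -> bool:
    """True if the text contains any C1 control character (U+0080..U+009F)."""
    return any("\u0080" <= ch <= "\u009f" for ch in text)


def repair_cp850(text: str) -> str:
    """Reinterpret each C1 control as the IBM850 byte it was ASSUMED to be."""
    return "".join(
        bytes([ord(ch)]).decode("cp850") if "\u0080" <= ch <= "\u009f" else ch
        for ch in text
    )


# The \302\202 pair from the question, read as octal and decoded as UTF-8.
stored = bytes([0o302, 0o202]).decode("utf-8")   # U+0082
damaged = "F" + stored + "licitations"

print(has_c1_controls(damaged))                  # True
print(has_c1_controls("Сегодня валютные"))       # False
print(repair_cp850(damaged))                     # Félicitations
```

Checking for the C1 range is narrower than checking for "anything non-ASCII", which is why it does not flag the Cyrillic rows. Whether a repair is safe depends on knowing the encoding the text was originally written in.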

Print Chinese / Japanese character in Zebra Printer with ZPL

I have loaded the Mono Chinese/Japanese font onto my ZM400 printer. So far I have had no success printing both Chinese & English together in the same field.
Here is some example code:
^XA^CW1,B:ANMDS.TTF
^SEB:GB.DAT^CI14
^FO100,100^A1,50,50^FD中文English Here^FS
^XZ
Since I changed the international code to 14 (with ^CI14), it only prints the Chinese text without the English text.
I have also tried using the ^FL command, but can't seem to get it to work.
Does anyone have a working example of printing Chinese / Japanese text along with English text on the same FD (data field)?
You should probably use ^CI28 (UTF-8), and make sure that your labels are encoded in UTF-8.
As far as I know, ^CI14 only supports Asian encodings.
If anyone is looking at how to do this, I imagine what I did for Japanese will work for Chinese.
Firstly, I didn't want to purchase the Asian Font Pack because I think it's a bit of a ripoff, so I found an appropriate open source Japanese Unitype Font. I then uploaded this to the printer using Zebra Tools... make sure you upload it as a file, NOT using the font upload.
Then I managed to get it printing by escaping the characters.
So my final ZPL is
^XA
^LL150
^CI28^A@N,60,60,E:OSAKA.TTF
^FO0,0
^FH
^FD_5F_E3_81_93_E3_82_8C_E3_81_AF_E4_BD_95_E3_81_A8_E8_A8_80_E3_81_A3_E3_81_A6_E3_81_84_E3_81_BE_E3_81_99^FS
^XZ
Essentially, you have to escape each byte of the UTF-8 encoded text as _XX (the original Japanese is これは何と言っています).
You also have to put ^FH in front of ^FD so the printer knows you're escaping characters.
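The escaped payload does not have to be built by hand. Below is a minimal Python sketch of the byte-escaping step, assuming ^CI28 (UTF-8) is active and the default ^FH indicator "_" is used. The function name zpl_hex_escape is made up for this example. Note that _5F is the escape for a literal underscore, so the _5F at the start of the payload above would print a "_" before the Japanese text.

```python
def zpl_hex_escape(text: str, indicator: str = "_") -> str:
    """Escape text for a ^FH^FD field: non-ASCII UTF-8 bytes become _XX."""
    out = []
    for b in text.encode("utf-8"):
        ch = chr(b)
        # Keep printable ASCII, except ZPL control prefixes and the
        # hex indicator itself, which must be escaped too.
        if 0x20 <= b < 0x7F and ch not in "^~" + indicator:
            out.append(ch)
        else:
            out.append(f"{indicator}{b:02X}")
    return "".join(out)


print(zpl_hex_escape("これ"))           # _E3_81_93_E3_82_8C
print(zpl_hex_escape("中文English"))    # _E4_B8_AD_E6_96_87English
```

The result goes between ^FH^FD and ^FS. Mixed Chinese and English text ends up in one field this way, with the ASCII part left readable.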
Hopefully this helps the poster and anyone else who is looking to overcome problems with ZPL and Unicode fonts / characters.
I have figured out why. The Chinese text needs to be in gibberish format.
What I mean by gibberish is this: when you use Chinese in ZPL code, it needs to be text in the Windows code page format. Chinese text in that format is displayed as gibberish in an English environment.
For example, in ZPL code, your code might look like this:
^H ~!!####$ (this gibberish is actually the ASCII representation of Chinese text in Windows code page format)
However, you can't type in Unicode Chinese, because ZPL would not print it.
^H 中文 (this is Chinese text in Unicode format)

TEXT in Arabic showing wrong characters instead

I have a character encoding issue.
I have a text file written in Arabic; when I open it I get weird characters like this: åÇÜÇáÍÌÑäÇáÑÝÇÚíãäí
Is there any way to fix this and get the correct text? The text file is utf8x encoded.
As in the comment: it is not UTF-8, it is WINDOWS-1256 encoding, so you can repair it on Linux using the iconv command (here for a file named test):
jh@jh-aspire:4804~$ iconv -fwindows-1256 -tutf8 test
هاـالحجرنالرفاعيمني
(I have no idea what it means as I don't know Arabic)
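The same repair can be sketched in Python for text that has already been read with the wrong encoding. This assumes the garbled characters came from displaying windows-1256 bytes as windows-1252. The helper name reinterpret is made up for this example.

```python
def reinterpret(text: str, shown_as: str, actual: str) -> str:
    """Undo a wrong decode: recover the raw bytes, then decode them properly."""
    return text.encode(shown_as).decode(actual)


garbled = "åÇÜÇáÍÌÑäÇáÑÝÇÚíãäí"
repaired = reinterpret(garbled, shown_as="cp1252", actual="cp1256")
print(repaired)  # هاـالحجرنالرفاعيمني
```

When the original file is still available, it is simpler to read it with the right encoding in the first place, for example open(path, encoding="cp1256"), and write it back out as UTF-8.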
You can use Notepad++ for showing the Arabic texts.

special character issue in text getting from CMS

I am using an API to get data from a CMS; we display the text that the user has entered into the CMS.
My problem is that when the user enters a special character into the CMS, I am not able to get that text on the iPhone side.
Here is the link to the text the user has entered in the wall description
We are using a JSON web service, which encodes the string to UTF-8, so my JSON string will be
The word 'stop' isn\u0092t in your vocabulary. Run a marathon in 4.5 hours or less.
The UTF character \u0092 is a special character; we need to display it the same as shown in the image above.
NOTE:
1) If we pass the string without encoding to UTF-8 in the web service, I get the whole string as null.
2) I have tried [NSString stringWithCString:[textFromCms cStringUsingEncoding:NSISOLatin1StringEncoding] encoding:NSUTF8StringEncoding];
where textFromCms is the text I got from the CMS as shown above.
3) I also tried without any conversion/encoding; it ignores the special character.
4) I also tried base64, but that did not help either.
Any help would be so appreciated.
The CMS apparently uses windows-1252, not UTF-8. The curly apostrophe is 92 (hex) in windows-1252 and U+2019 in Unicode, so when properly encoded into JSON, it should be \u2019.
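If the web service cannot be fixed, a client-side workaround is to map the stray C1 code points back through windows-1252. The Python sketch below shows the idea (the question is about iOS, so this is only an illustration of the mapping, not app code). The function name c1_to_cp1252 and the shortened payload are made up for this example.

```python
import json


def c1_to_cp1252(text: str) -> str:
    """Map C1 controls (U+0080..U+009F) to their windows-1252 characters."""
    out = []
    for ch in text:
        if "\u0080" <= ch <= "\u009f":
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # byte is unassigned in windows-1252; keep it as-is
        out.append(ch)
    return "".join(out)


payload = r'{"text": "isn\u0092t"}'      # shortened stand-in for the CMS response
received = json.loads(payload)["text"]   # contains U+0092, a control character
fixed = c1_to_cp1252(received)

print(fixed)               # isn’t
print(json.dumps(fixed))   # "isn\u2019t"
```

The last line shows what a correctly encoding server would have sent for the same text.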

MSXML.DOMDocument.4.0 loadXML with Chinese Unicode characters

Currently, I'm trying to use the MSXML loadXML method in ASP to load an XML string which may contain Unicode Chinese characters like
𠮢 (U+20BA2) 4 bytes
and the XML string looks like
<City>City</City><Name>𠮢</Name>
So, in my code, I can see the XML string comes in right, but loadXML returns an error message like
Invalid unicode characters, &#55362;&#57250;
Can someone please tell me what I can do to resolve this issue?
Thanks,
Edited
The code looks like this
Set objDoc = CreateObject("MSXML2.DOMDocument")
objDoc.async = false
objDoc.setProperty "SelectionLanguage", "XPath"
objDoc.validateOnParse = false
objDoc.loadXML(strXml)
I suggest posting the exact code, XML source and error message you are getting. I cannot reproduce an error by parsing <element>𠮢</element> in MSXML 4.0 SP3; this works fine.
I certainly do get a parseError with reason "Invalid unicode character" by trying to parse <element>&#55362;&#57250;</element>, because that's not well-formed XML. If you do have this in your markup then you need to fix the serialiser that produced it, because neither MSXML nor any standards-compliant XML parser will load it.
If 𠮢 is turned into a character reference it must be &#134050; (or &#x20BA2;). Code units 55362 and 57250 are 'surrogates', reserved for encoding astral plane characters in UTF-16. They can't be included in an XML document.
&#55362;&#57250; is the entity-encoded form of 0xD842 0xDFA2, which is the UTF-16 encoded form of the Unicode 𠮢 character. Make sure that the XML is completely UTF-16 encoded, not mixed single-byte ASCII and multi-byte UTF-16.
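The surrogate arithmetic can be checked with a short Python sketch. It also shows one possible stopgap for markup that already contains surrogate pairs written as two decimal character references: merge each pair into a single reference before parsing. The function names are made up for this example and only decimal references are handled. Fixing the serialiser remains the proper solution.

```python
import re

REF = re.compile(r"&#(\d+);")


def to_surrogates(cp: int) -> tuple:
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    cp -= 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)


def from_surrogates(hi: int, lo: int) -> int:
    """Combine a high and a low surrogate into one code point."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def merge_surrogate_refs(xml: str) -> str:
    """Rewrite adjacent &#hi;&#lo; surrogate references as one &#x...; reference."""
    out, pos = [], 0
    while True:
        m = REF.search(xml, pos)
        if not m:
            break
        out.append(xml[pos:m.start()])
        hi = int(m.group(1))
        nxt = REF.match(xml, m.end())  # a reference directly after this one
        if 0xD800 <= hi <= 0xDBFF and nxt and 0xDC00 <= int(nxt.group(1)) <= 0xDFFF:
            out.append(f"&#x{from_surrogates(hi, int(nxt.group(1))):X};")
            pos = nxt.end()
        else:
            out.append(m.group(0))
            pos = m.end()
    out.append(xml[pos:])
    return "".join(out)


print(to_surrogates(0x20BA2) == (55362, 57250))                 # True
print(merge_surrogate_refs("<Name>&#55362;&#57250;</Name>"))    # <Name>&#x20BA2;</Name>
```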