I have a character encoding issue.
I have a text file written in Arabic, and when I open it I get weird characters like this: åÇÜÇáÍÌÑäÇáÑÝÇÚíãäí
Is there any way to fix this and get the correct text? The text file is utf8x encoded.
As mentioned in the comments, the file is not UTF-8; it is encoded in Windows-1256. On Linux you can repair it with the iconv command, here for a file named test:
jh#jh-aspire:4804~$ iconv -fwindows-1256 -tutf8 test
هاـالحجرنالرفاعيمني
(I have no idea what it means, as I don't know Arabic.)
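If you only have the garbled text (for example, copied out of an editor) and not the original bytes, the same repair can be sketched in Python. Re-encode the garbled string with the code page it was wrongly displayed in (assumed here to be Windows-1252, which fits characters like "å" and "Ç"), then decode those bytes as Windows-1256. The function name repair_mojibake is just for illustration.

```python
def repair_mojibake(garbled: str, shown_as: str = "cp1252", actual: str = "cp1256") -> str:
    """Undo text that was decoded with the wrong single-byte code page.

    Re-encoding with `shown_as` recovers the original bytes; decoding those
    bytes with `actual` yields the intended characters.
    """
    original_bytes = garbled.encode(shown_as)
    return original_bytes.decode(actual)


if __name__ == "__main__":
    print(repair_mojibake("åÇÜÇáÍÌÑäÇáÑÝÇÚíãäí"))
    print(repair_mojibake("äæÑ"))
```

This only works while every garbled character still maps back to a byte in the wrong code page; once text has been saved with "?" replacements, that information is lost.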
You can use Notepad++ to display the Arabic text.
I have a very old program (not a server or anything on the internet) that I think uses the ANSI (Windows-1252) encoding.
The problem is that some inputs to this program are written in Arabic.
However, when I try to read the result, the Arabic words are written with very weird characters. For example, the input "نور" is converted to "äæÑ".
The program output should contain a combination of English and Arabic words.
For example, it outputs "Name äæÑ" when the correct output should be something like "Name نور".
In general, the English words are correct and readable with both UTF-8 and ANSI, but the Arabic words are read, for example, as "���" with UTF-8 and as "äæÑ" with ANSI.
I understand that this is because ANSI doesn't support non-Latin letters.
But what should I do now? How can I convert them back to Arabic?
Note: I know the exact input and the exact output that this program should produce.
Note 2: I don't have the source code of this program. I just want to convert its output file so that it has the correct words or encoding.
I have now solved this problem by typing this in the terminal:
iconv -f WINDOWS-1256 -t utf8 < my_File.ged > result.ged
I tried to write Java code that does the same thing, but it didn't give me the result I wanted.
I also tried the same terminal command with WINDOWS-1252 instead of WINDOWS-1256, but it didn't work. So I guess it is good to try different encodings until one works.
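What iconv does here is "decode the bytes as Windows-1256, then encode the text as UTF-8". A minimal Python sketch of the same step follows. The asker's Java code isn't shown, so this is not a fix for it, but in Java the key point would likewise be to decode with the windows-1256 charset explicitly instead of relying on the platform default. The function names are hypothetical; the file names in the usage note are taken from the command above.

```python
def convert_bytes(data: bytes, src: str = "cp1256", dst: str = "utf-8") -> bytes:
    """Decode raw bytes with the source code page and re-encode them.

    Uses strict error handling, so an undecodable byte raises
    UnicodeDecodeError instead of being silently dropped.
    """
    return data.decode(src).encode(dst)


def convert_file(src_path: str, dst_path: str, src: str = "cp1256", dst: str = "utf-8") -> None:
    """Roughly equivalent to: iconv -f WINDOWS-1256 -t utf8 < src_path > dst_path"""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(convert_bytes(data, src, dst))
```

Usage would be convert_file("my_File.ged", "result.ged"). Working on raw bytes ("rb"/"wb") avoids any implicit decoding by the default text-mode encoding.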
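Trying encodings one by one can be scripted. Here is a small sketch, assuming you have a known sample to compare against (as the asker does). The candidate list is only an example and can be extended.

```python
def try_encodings(data: bytes, candidates=("utf-8", "cp1252", "cp1256")) -> dict:
    """Decode the same bytes with each candidate encoding.

    errors="replace" turns undecodable bytes into U+FFFD instead of raising,
    so every candidate produces something you can inspect.
    """
    return {enc: data.decode(enc, errors="replace") for enc in candidates}


if __name__ == "__main__":
    sample = "Name نور".encode("cp1256")  # stand-in for bytes read from the output file
    for enc, text in try_encodings(sample).items():
        print(f"{enc:>8}: {text}")
```

With the expected output known, pick the candidate whose result matches: for these bytes, cp1256 gives "نور", cp1252 reproduces "äæÑ", and UTF-8 yields replacement characters.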
I am integrating data using some flat files. The flat files are delivered to me via FTP as .csv files, exported from MS SQL by a business partner.
I asked him to encode them as UTF-8 (I thought that was simply the standard).
Now I can see that his files contain a lot of what look like UTF-8 codes, such as "&#233;", which appear as plain text when I open them in Notepad++ (and also in my ETL tool).
Before I ask him to fix this and deliver proper UTF-8, I would like to understand the issue and whether my claim is actually correct.
Shouldn't special characters be shown as special characters when I open the files in Notepad++, and not as plain-text UTF-8 codes?
Any help is much appreciated :))
Cheers
Martin
&#233; is an HTML entity (a numeric character reference for é). For some reason the text is HTML-encoded, which I wouldn't count as plain text / flat files. The file may or may not be encoded in UTF-8 in addition to that; we can't tell from the information given.
A file containing "special characters" (meaning non-ASCII characters) encoded in UTF-8, opened in a text editor that correctly interprets it as UTF-8, looks exactly like the text it is supposed to be, e.g.:
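If the export can't be fixed at the source, one possible workaround is to decode the entities after parsing the CSV. The sketch below uses Python's standard csv and html modules. It assumes the file has already been read as text with the right encoding; the function name is illustrative.

```python
import csv
import html
import io


def read_csv_unescaped(text: str) -> list:
    """Parse CSV text, then turn HTML character references such as &#233; into characters.

    Unescaping after parsing means an entity like &#44; (a comma) cannot
    break the column structure.
    """
    reader = csv.reader(io.StringIO(text))
    return [[html.unescape(field) for field in row] for row in reader]


if __name__ == "__main__":
    print(read_csv_unescaped("id,name\n1,Ren&#233;\n"))
```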
正式名称は、ISO/IEC 10646では “UCS Transformation Format 8”、Unicodeでは “Unicode Transformation Format-8” という。両者はISO/IEC 10646とUnicodeのコード重複範囲で互換性がある。RFCにも仕様がある。
Put this in a file, save it as UTF-8, open it in another application as UTF-8, and this is what the text should look like.
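At the byte level, "é" in a UTF-8 file is stored as the two bytes C3 A9, whereas the partner's file contains the six ASCII characters "&#233;". The sketch below does the round trip just described: it writes a string to a temporary file as UTF-8 and reads it back. The sample string and the function name are arbitrary.

```python
import os
import tempfile


def roundtrip_utf8(text: str):
    """Write text to a temp file as UTF-8; return (raw bytes on disk, text read back as UTF-8)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # newline="" keeps the bytes on disk identical on every platform
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        with open(path, "rb") as f:
            raw = f.read()
        with open(path, encoding="utf-8", newline="") as f:
            back = f.read()
    finally:
        os.remove(path)
    return raw, back


if __name__ == "__main__":
    raw, back = roundtrip_utf8("René")
    print(raw)   # the é is stored as two bytes, not as an entity
    print(back)
```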
I have loaded the Mono Chinese/Japanese font onto my ZM400 printer. So far I have had no success printing Chinese and English together in the same field.
Here is some example code:
^XA^CW1,B:ANMDS.TTF
^SEB:GB.DAT^CI14
^FO100,100^A1,50,50^FD中文English Here^FS
^XZ
Since I changed the international code to 14 (with ^CI14), it prints only the Chinese text and not the English text.
I have also tried using the ^FL command, but can't seem to get it to work.
Does anyone have a working example of printing Chinese / Japanese text along with English text on the same FD (data field)?
You should probably use ^CI28 (UTF-8), and make sure that your labels are encoded in UTF-8.
As far as I know, ^CI14 only supports Asian encodings.
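Here is a sketch of what "make sure your labels are encoded in UTF-8" can mean in practice: build the ZPL as text and encode the whole label to UTF-8 bytes before sending it. It reuses the font setup from the question (ANMDS.TTF on B:, assigned to font 1) and assumes that font contains both the Chinese and the Latin glyphs. The function name is hypothetical, and sending the bytes to the printer is left out.

```python
def build_utf8_label(text: str, font_path: str = "B:ANMDS.TTF") -> bytes:
    """Return a one-field ZPL label whose field data is encoded as UTF-8."""
    zpl = (
        "^XA"
        "^CI28"                                # field data is UTF-8
        f"^CW1,{font_path}"                    # map font identifier 1 to the TrueType font
        f"^FO100,100^A1N,50,50^FD{text}^FS"    # one field mixing Chinese and English
        "^XZ"
    )
    return zpl.encode("utf-8")


if __name__ == "__main__":
    print(build_utf8_label("中文English Here"))
```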
If anyone is looking for how to do this, I imagine what I did for Japanese will also work for Chinese.
First, I didn't want to purchase the Asian Font Pack because I think it's a bit of a rip-off, so I found a suitable open-source Japanese Unitype font. I then uploaded it to the printer using Zebra Tools. Make sure you upload it as a file, NOT with the font upload.
Then I managed to get it printing by escaping the characters.
So my final ZPL is:
^XA
^LL150
^CI28^A@N,60,60,E:OSAKA.TTF
^FO0,0
^FH
^FD_E3_81_93_E3_82_8C_E3_81_AF_E4_BD_95_E3_81_A8_E8_A8_80_E3_81_A3_E3_81_A6_E3_81_84_E3_81_BE_E3_81_99^FS
^XZ
Essentially, you have to escape the UTF-8 bytes of each character (the original Japanese text is これは何と言っています, roughly "What does this say?").
You also have to put ^FH in front of ^FD so the printer knows you're escaping characters.
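The escaping step can be automated. Here is a minimal sketch, assuming ^CI28 (UTF-8) as in the ZPL above and the default ^FH escape character "_": every UTF-8 byte of the text becomes "_" followed by two hex digits. The function names are illustrative.

```python
def zpl_hex_escape(text: str, indicator: str = "_") -> str:
    """Encode text as UTF-8 and write every byte as <indicator><two hex digits>."""
    return "".join(f"{indicator}{byte:02X}" for byte in text.encode("utf-8"))


def fh_field(text: str) -> str:
    """Wrap the escaped text in a ^FH^FD ... ^FS field."""
    return f"^FH^FD{zpl_hex_escape(text)}^FS"


if __name__ == "__main__":
    print(fh_field("これは何と言っています"))
```

Escaping ASCII characters as well is harmless, and it means an underscore in the text can never be mistaken for the start of an escape.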
Hopefully this helps the poster and anyone else who is looking to overcome problems with ZPL and Unicode fonts / characters.
I have figured out why. The Chinese text needs to be in gibberish format.
What I mean by gibberish is this: when you use Chinese in ZPL code, it needs to be text in the Windows code page format. Chinese text in that Windows code page format is displayed as gibberish in an English environment.
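To illustrate what that "gibberish" is, encoding Chinese text in a Windows Chinese code page produces bytes that an English (Windows-1252) environment shows as accented Latin letters. The sketch below assumes the Simplified Chinese code page 936 (GBK), which the GB.DAT table from the question suggests. That mapping is an assumption, not something confirmed here, and the function name is hypothetical.

```python
def to_code_page(text: str, code_page: str = "cp936") -> bytes:
    """Encode text in a Windows Chinese code page (cp936/GBK assumed)."""
    return text.encode(code_page)


if __name__ == "__main__":
    raw = to_code_page("中文")
    print(raw)                    # the bytes that would go into the ^FD field
    print(raw.decode("cp1252"))   # how those bytes look in an English environment
```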
For example, in ZPL your code might look like this:
^H ~!!####$ (this gibberish is actually the ASCII representation of Chinese text in Windows code page format)
However, you can't type Unicode Chinese directly, because ZPL will not print it:
^H 中文 (this is Chinese text in Unicode format)
Yesterday I wrote some text full of Unicode characters in Notepad and saved the file as ANSI. Notepad gave me a warning, which I clicked OK on without reading it fully, and then I closed Notepad.
Today, when I opened the same file in Notepad again, it was full of ??? signs. I now understand that this happened because I saved Unicode data as ANSI text. Is there a way to get this text back, maybe using a hex editor or something similar?
No. Certain characters cannot be encoded in certain encodings. "風", for example, cannot be encoded at all in ISO-8859 or any other single-byte encoding. Each ANSI encoding can only encode a certain subset of all possible characters, and characters that are not defined in a particular ANSI encoding simply cannot be stored in it.
So they're gone. You'd better pull out a backup.
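A quick way to see that the loss is permanent: once a character outside the code page has been replaced with "?", different characters end up as identical bytes, so nothing in the file (a hex editor included) records which one was there. The sketch below assumes Windows-1252 as the ANSI code page and "?" as the replacement character, matching the ??? the asker saw. The function name is hypothetical.

```python
def save_as_ansi(text: str, code_page: str = "cp1252") -> bytes:
    """Encode text in a single-byte code page, replacing unmappable characters with '?'."""
    return text.encode(code_page, errors="replace")


if __name__ == "__main__":
    print(save_as_ansi("風"))  # b'?'
    print(save_as_ansi("雨"))  # also b'?': the original character cannot be recovered
```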
I have a problem with the encoding of a text file.
If I open it with *nix terminal tools like less, cat or more, accented characters are shown correctly.
But if I open it with any editor (e.g. vim), accented characters are scrambled.
My terminal locale is set to UTF-8, and my editor (vim) has its default encoding set to UTF-8. If I open textfile.txt with vim, I see scrambled accents whether I set vim's encoding to UTF-8 or to ISO-8859-1.
The output of the file utility is:
$ file textfile.txt
textfile.txt: ISO-8859 English text, with very long lines
I already tried the following with iconv:
iconv -f iso-8859-1 -t utf-8 textfile.txt > textfile.utf8.txt
I get this:
$ file textfile.utf8.txt
textfile.utf8.txt: UTF-8 Unicode English text, with very long lines
Opening it with vim still shows scrambled accents, and this time they are scrambled even when I use cat or more.
My goal is to get this file in UTF-8 format, obviously with the accented characters displayed correctly.
[The brute-force way to do this is to copy every single output screen of the more command and paste it into an editor. There must be a smarter way.]
Thanks for any help.
It turned out that the file contained characters in two different encodings. That's why the text was scrambled in every case, and why iconv couldn't convert the file successfully. Thanks, everyone, anyway.
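For a file like this, a single whole-file conversion can't work. Converting from ISO-8859-1 double-encodes the UTF-8 parts (the "scrambled even with cat" effect), and converting from UTF-8 fails on the ISO-8859-1 bytes. One possible repair is sketched below, assuming each line is entirely in one encoding (UTF-8 or ISO-8859-1): try UTF-8 per line, and fall back to ISO-8859-1 when that fails. A Latin-1 line that happens to be valid UTF-8 would be misread, but that is rare for ordinary accented text. The function name is hypothetical.

```python
def decode_mixed(data: bytes) -> str:
    """Decode bytes whose lines are each either UTF-8 or ISO-8859-1."""
    parts = []
    for line in data.splitlines(keepends=True):
        try:
            parts.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            # ISO-8859-1 maps every byte to a character, so this never fails
            parts.append(line.decode("iso-8859-1"))
    return "".join(parts)


if __name__ == "__main__":
    mixed = b"caf\xc3\xa9 (UTF-8)\ncaf\xe9 (ISO-8859-1)\n"
    print(mixed.decode("iso-8859-1"))  # whole-file conversion: the UTF-8 line becomes "cafÃ©"
    print(decode_mixed(mixed))         # both lines read "café"
```

The repaired text can then be written out with open(path, "w", encoding="utf-8").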