ImpEx Import of Arabic Text

ImpEx Import of Arabic Text - encoding

I am trying an impex import in Hybris Administration Console to add arabic text for some products.
The arabic text is in UTF-16 LE format and English text is in UTF-8 format. Therefore, I am always getting an error:
{1=ValueEntry('عدّة البدء Single'=null,unresolved=null,ignore=false)}] - cannot import, unknown type '1234546' in line ValueLine
[,line 3 at main script,1234546,HeaderDescriptor[line 2 at main script, insert_update, Product, {}, [code, endUserShort]}
I tried setting my scrip encoding as UTF 16 LE, after which editor can not recognize the english text and produces same error.
Here is my code:
$productCatalog=Master
$catalogVersion=catalogversion(catalog(id[default=$productCatalog]),version[default='Master'])
insert_update Product;code[unique=true]; endUserShort[lang=ar_Arab]
1234546; عدّة البدء Single
Any suggestions are welcome :)

You are missing a ; on the beginning of your impex should be like this i think :
$productCatalog=Master
$catalogVersion=catalogversion(catalog(id[default=$productCatalog]),version[default='Master'])
insert_update Product;code[unique=true]; endUserShort[lang=ar_Arab];$catalogVersion
;1234546; عدّة البدء Single
Hope this helps

Related

I need to remove a specific unicode in my existing subtitle text file

I basically work on subtitles and I have this arabic file and when I open it up on notepad and right click and select SHOW UNICODE CONTROL CHARACTERS I give me some weird characters on the left of every line. I tried so many ways to remove it but failed I also tried NOTEPAD++ but failed.
Notepad ++
SUBTITLE EDIT
EXCEL
WORD
288
00:24:41,960 --> 00:24:43,840
‫أتعلم، قللنا من شأنك فعلاً‬
289
00:24:44,000 --> 00:24:47,120
‫كان علينا تجنيدك لتكون جاسوساً‬
‫مكان (كاي سي)‬
290
00:24:47,280 --> 00:24:51,520
‫لا تعلمون كم أنا سعيد‬
‫لسماع ذلك‬
291
00:24:54,800 --> 00:24:58,160
‫لا تقلق، سيستيقظ نشيطاً غداً‬
292
00:24:58,320 --> 00:25:00,800
‫ولن يتذكر ما حصل‬
‫في الساعات الـ٦‬
the unicodes are not showing in this the unicode is U+202B which shows a ¶ sign, after googling it I think it's called PILCROW.
The issue with this is that it doesn't display subtitles correctly on ps4 app.
I need this PILCROW sign to go away. with this website I can see the issue in this file https://www.soscisurvey.de/tools/view-chars.php

The PILCROW ¶ is used by various software and publishers to show the end of a line in a document. The actual Unicode character does not exist in your file so you can't get rid of it.

The Unicode characters in these lines are 'RIGHT-TO-LEFT EMBEDDING'
(code \u202b) and 'POP DIRECTIONAL FORMATTING' (code \u202c) -
these are used in the text to indicate that the included text should be rendered
right-to-left instead of the ocidental left-to-right direction.
Now, these characters are included as hints to the application displaying the text, rather than to actually perform the text reversing - so they likely can be removed without compromising the text displaying itself.
Now this a programing Q&A site, but you did not indicate any programming language you are familiar with - enough for at least running a program. So it is very hard to know how give an answer that is suitable to you.
Python can be used to create a small program to filter such characters from a file, but I am not willing to write a full fledged GUI program, or an web app that you could run there just as an answer here.
A program that can work from the command line just to filter out a few characters is another thing - as it is just a few lines of code.
You have to store the follwing listing as a file named, say "fixsubtitles.py" there, and, with a terminal ("cmd" if you are on Windows) type python3 fixsubtitles.py \path\to\subtitlefile.txt and press enter.
That, of course, after installing Python3 runtime from http://python.org
(if you are on Mac or Linux that is already pre-installed)
import sys
from pathlib import Path
encoding = "utf-8"
remove_set = str.maketrans("\u202b\u202c")
if len(sys.argv < 2):
print("Usage: python3 fixsubtitles.py [filename]", file=sys.stderr)
exit(1)
path = Path(sys.argv[1])
data = path.read_text(encoding=encoding)
path.write_text(data.translate("", "", remove_set), encoding=encoding)
print("Done")
You may need to adjust the encoding - as Windows not always use utf-8 (the files can be in, for example "cp1256" - if you get an unicode error when running the program try using this in place of "utf-8") , and maybe add more characters to the set of characters to be removed - the tool you linked in the question should show you other such characters if any. Other than that, the program above should work

Print Chinese / Japanese character in Zebra Printer with ZPL

I have loaded the Mono Chinese/ Japanese font onto my ZM400 printer. So far I have no success printing both Chinese & English together on the same field.
Here is some example code:
^XA^CW1,B:ANMDS.TTF
^SEB:GB.DAT^CI14
^FO100,100^A1,50,50^FD中文English Here^FS
^XZ
Since I change the international code to 14 (with ^CI14), it only prints the Chinese text without the English text.
I have also try using the ^FL command, but can't seen to get it to work.
Does anyone have a working example of printing Chinese / Japanese text along with English text on the same FD (data field)?

You should probably use ^CI28 (UTF-8), and make sure that your labels are encoded in UTF-8.
As far as I know, ^CI14 only supports Asian encodings.

If anyone is looking at how to do this, I imagine what I did for Japanese will work for Chinese.
Firstly, I didn't want to purchase the Asian Font Pack because I think it's a bit of a ripoff, so I found an appropriate open source Japanese Unitype Font. I then uploaded this to the printer using Zebra Tools... make sure you upload it as a file, NOT using the font upload.
Then I managed to get it printing by escaping the characters.
So my final ZPL is
^XA
^LL150
^CI28^A#N,60,60,E:OSAKA.TTF
^FO0,0
^FH
^FD_5F_E3_81_93_E3_82_8C_E3_81_AF_E4_BD_95_E3_81_A8_E8_A8_80_E3_81_A3_E3_81_A6_E3_81_84_E3_81_BE_E3_81_99^FS
^XZ
Essentially you have to escape the bytes of each value (original Japanese これは何と言っています)
You also have to put ^FH in front of ^FD so it knows you're escaping characters.
Hopefully this helps the poster and anyone else who is looking to overcome problems with ZPL and Unicode fonts / characters.

I have figured out why. The Chinese text needs to be in gibberish format.
What I meant by gibberish is that. When you use Chinese in ZPL code, it needs to be in the windows codepage format text. This windows codepage format Text that is Chinese will be displayed as gibberish in English environment.
For example. In ZPL Code, your code might look like this:
^H ~!!####$ (this gibberish is actual the ASCII representation of Chinese text in windows code page format)
However, you can't type in unicode Chinese because ZPL would not print it.
^H 中文 (this is Chinese text in unicode format)

Importing German values in openerp 7

I would like to import German language values in Openerp 7. Currently, the import fails due to special characters. Importing English text works perfectly.
Some of the sample values are:
beschränkt
öffentlich nach Einzelgewerken
Do I need to change the language preference to German first, before importing?
Also, do I need to know German to accomplish this task?
Any advice?

Changing the language of your User account to German is only useful if you'd like to load the German version of translatable fields. For example, if you were importing a list of Products as a CSV file, it would allow you to load the German translation of the product names. If you don't, the names will simply be stored as the master translation (English).
However it is very likely that your import fails due to encoding issue. Encoding comes into the picture because of German special characters, such as Umlauts. In that case you basically need to make sure that you are importing the CSV file using the same Encoding setting that was used to export it.
If your CSV was produced on Windows using Excel or something similar, there is a good chance it was produced with Windows-1252 encoding. By default OpenERP will select utf-8, so you will need to change that in the Encoding selection box of the File Format Options that appear after you select the CSV file to import (in OpenERP 7.0).

Convert non english characters into Unicode (UTF-8)

I copied large amount of text from another system to my PC. When I viewed the text in my PC, it looked weird. So I copied all the fonts from the other PC and installed them in mine too. Now the text looks okay, but actually it seems that is not in Unicode. For example, if I copy the text and paste in another UTF-8 supported editor such as Notepad++, I get English characters ("bgah;") only like shown below.
How to convert this whole text into unicode text, like the one below. So I can copy the text and paste anywhere else.
பெயர்
The above text was manually obtained using http://www.google.com/transliterate/indic/Tamil
I need this conversion to be done, so I can copy them into database tables.

'Ja-01' is a font with a custom 'visual encoding'.
That is to say, the sequence of characters really is "bgah;" and it only looks like Tamil to you because the font's shapes for the Latin characters bg look like பெ.
This is always to be avoided, because by storing the content as "bgah;" you lose the ability to search and process it as real Tamil, but this approach was common in the pre-Unicode days especially for less-widespread scripts without mature encoding standards. This application probably predates widespread use of TSCII.
Because it is a custom encoding not shared by any other font, it is very unlikely you will be able to find a tool to convert content in this encoding to proper Unicode characters. It does not appear to be any standard character ordering, so you will have to look at the font (eg in charmap.exe) and note down every character, find the matching character in Unicode and map between them.
For example here's a trivial Python script to replace characters in a file:
mapping= {
u'a': u'\u0BAF', # Tamil letter Ya
u'b': u'\u0BAA', # Tamil letter Pa
u'g': u'\u0BC6', # Tamil vowel sign E (combining)
u'h': u'\u0BB0', # Tamil letter Ra
u';': u'\u0BCD', # Tamil sign virama (combining)
# fill in the rest of the mapping information here!
}
with open('ja01data.txt', 'rb') as fp:
data= fp.read().decode('utf-8')
for char in mapping:
data= data.replace(char, mapping[char])
with open('utf8data.txt', 'wb') as fp:
fp.write(data.encode('utf-8'))

The font you found is getting you into trouble. The actual cell text is "bgah;", it gets rendered to பெயர் because you found a font that can work with 8-bit non-Unicode characters. So reading it or pasting it into Notepad++ is going to produce "bgah;" since that's the real text. It can only ever be rendered properly again by forcing the program that displays the string to use that same font.
Ditch the font and enter Unicode so it looks like this:

"bgah" looks like a Baamini based system, which is pre-unicode. It was popular in Canada (and the SL Tamil diaspora in general) in the 90s.
As the others mentioned, it looks like a custom visual encoding that mimics the performance of a foreign script while maintaining ASCII encoding.
Google "Baamini to unicode convertor". The University of Colombo seems to have put one up: http://www.ucsc.cmb.ac.lk/ltrl/services/feconverter/?maps=t_b-u.xml
Let me know if this works. If not, I can ask around and get something for you.

You could first check whether the encoding is TSCII, as this sounds most probable. It is an 8-bit encoding, and the fonts you copied are probably based on that encoding. Check out whether the TSCII to UTF-8 converter at SourceForge is suitable. The project there is called “Any Tamil Encoding to Unicode” but they say that only TSCII is supported for now.

Get window title with AppleScript in Unicode

I've stuck with the following problem:
I have a script which is retrieving title form the Firefox window:
tell application "Firefox"
if the (count of windows) is not 0 then
set window_name to name of front window
end if
end tell
It works well as long as the title contains only English characters but when title contains some non-ASCII characters(Cyrillic in my case) it produces some utf-8 garbage. I've analyzed this garbage a bit and it seems that my Cyrillic character is converted to the Utf-8 without any concerning about codepage i.e instead of using Cyrillic codepage for conversion it uses non codepages at all and I have utf-8 text with characters different from those in the window title.
My question is: How can I retrieved the window title in utf-8 directly without any conversion?
I can achieve this goal by using AXAPI but I want to achieve this by AppleScript because AXAPI needs some option turned on in the system.
UPD:
It works fine in the AppleScript Editor. But I'm compiling it through the C++ code via OSACompile->OSAExecute->OSADisplay
I don't know the guts of the AppleScript Editor so maybe it has some inside information about how to encode the characters

I've found the answer when wrote update. Sometimes it is good to ask a question for better it understanding :)
So for the future searchers: If you want to use unicode result of the script execution you should provide typeUnicodeText to the OSADisplay then you will have result in the UTF-16LE in the result AEDesc

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

ImpEx Import of Arabic Text - encoding

Related

I need to remove a specific unicode in my existing subtitle text file

Print Chinese / Japanese character in Zebra Printer with ZPL

Importing German values in openerp 7

Convert non english characters into Unicode (UTF-8)

Get window title with AppleScript in Unicode

Categories

Resources