FontLab Glyph Name to Unicode Mapping Not Working - unicode

I have created my own font pack with a custom encoding file which worked great. I am just having trouble using the "Generate Unicodes" feature. I want the Glyphs to automatically take on corresponding unicodes based on the mapping file.
I created a file called standard.nam and place it in "FontLab VI.app/Contents/Resources" as suggested. I even created a custom file and pointed the app to it.
No matter what I do the Glyph names never match to their corresponding Unicodes. Nothing happens at all when I click the little refresh unicode name.
How do I get this to work? I want my Glyph using my custom encoding file to get assigned their unicodes from the ".nam" database file. Below is my current file:
%FONTLAB NAMETABLE[: Database_name]
0x0020 !visiblespace
0x0020 space
0x0021 exclam
0x0022 quotedbl
0x0023 numbersign
0x0024 dollar
HELP PLEASE!!!

Thanks to the Fontlab team I was able to fix it. The updated code is below and I hope it helps someone else:
%%FONTLAB NAMETABLE: Database Name
0x0020 !visiblespace
0x0020 space
0x0021 exclam
0x0022 quotedbl
0x0023 numbersign
0x0024 dollar

Related

Unicode converted text isn't shown properly in MS-Word

In a mapping editor, the display is correct after the legacy to unicode conversion for DEVANAGARI text shown using a unicode font (Arial Unicode MS). However, in MS-WORD, the display isn't as expected for the same unicode text in the unicode font (Arial Unicode MS) or any other Devanagari unicode fonts. The expected sequence of unicodes are provided as per the documentation. The sequence can be seen on the left-hand side table.
Please let me know where I am going wrong.
Thanks for your help!
Does your map have to insert the zero_width_joiner? The halant (virama) by itself is enough to get the half-consonant (for some combinations) and in particular, it may be that Word is using the presence of the ZWJ to keep them separate.
If getting rid of the ZWJ doesn't help, another possibility is that Word may be treating the individual characters of the text string as individual "runs" of text.
If those first 4 characters are not in a single run, this can happen.
[aside: the way to tell if it's being treated as a single run, is to save the document as an xml file and then open it with something like notepad++ and look at the xml "w:t" element (IIRC) associated with these characters. If they're all in separate w:t elements, it means they're in separate runs. In that case, you might need to copy the text from Word to some other tool (e.g. Notepad++) and then copy it from there and paste it back in Word -- that might cause it to be imported into Word in a single run.

some nonASCII Characters are coming like a thick vertical line in talend 5.6

I have a file with name "→Ψjohn.txt" and now i want to remove those special characters from the file name and update the file name as "john.txt". But talend is recognizing those characters as thick vertical line so it is not recognizing the source file in the physical location.can anyone please suggest a solution.
I have this file in database as well as physical location and when I am reading the file from the database it has to remove the special characters and update the same in the database as well as physical location.
in database the file looks like this
database
when I am reading from database using talend it looks like following
talend
Thanks in advance
You dont have to specify the exact file name, you can use tFilelist and get all file in a specific directory, you can also regex to mask some names, for example iterate over all *john.txt.
Once you get the actual filename, use a regex to remove unwanted characters for example : \W for non word characters and rename the file through a system command or using tFileCopy.
After seeing the pictures this is not a Talend issue but has to do with the font used in the Talend (Eclipse) console and the enconding setting of Java.
Those rectangles (better visible with a bigger font size) show that the font cannot represent your characters - it has no symbols for it.
Talend (Eclipse) settings
In Talend, navigate to Window / Preferences and choose General / Appearance / Colors and Fonts (as described in the Eclipse help). Check what font you are using for Debug / Console font. I've had good results with the font Consolas, which is set from Talend version 6 onwards. Beforehand it was Courier New.
Java encoding
You should check that Java uses UTF-8 encoding to display the characters. The Console Enconding has to be set to UTF-8. See this answer for an explanation how to do this.
Alternative
Alternatively you could store all the log data into a file and open this file in e.g. Notepad++ to see if the output is generated correctly and only displayed wrong.

Found some square boxes in a xliff file and not sure what they are?

I'm looking at a xliff file and found some weird boxes which I don't know what they are? (Please see screenshot)
Do you guys have any ideas what the weird bug boxes are?
Thank you very much and I'm looking forward to your reply!
I have never seen that character, but here is how I would go about finding out what it is:
The first thing to do is to check the source and target language of the XLIFF file, which should be defined in the XLIFF header. Perhaps this character is a valid character in either the source or the target language script.
The next step depends on whether you can contact the person who created the XLIFF file. If yes, you can show them what the file looks like for you and ask them if the file has perhaps been garbled during transmission.
If not, you could check the encoding of the XLIFF file. If it is UTF-16, just open the file in a hex editor, find the code point for this character, and look it up on unicode.org. If the file is encoded as UTF-8 open it in Notepad++ (or any other text editor that allows you to change the encoding), convert it to UTF-16, then proceed as described above.
If you don't know the encoding of the file it becomes a matter of guessing. You can look at some other <trans-units> (assuming that there are more than this one in your XLIFF file): if they contain other extended characters and they are displayed correctly your editor has probably guessed the right encoding, and you can convert to Unicode and look up the character code. Different text editors have different ways of guessing encodings: try a few.
It's possible that those characters are the result of an encoding conversion error, which are commonly called mojibake.
It's also possible this is some sort of emoji or unusual glyph that's not rendering correctly in your editor. This would be unusual, but given that it appears to be a UI string, it might be possible.

MS Access Convert from Unicode when Reading from Text File

So, I have an Access database where I import data from a text file. The file is semi-colon delimited. Occasionally (and will become more frequent) I receive a file from one of our affiliates from Russia. The file has unicode (I think) characters like "Ìèðîøíèêîâ" instead of "Мирошников". Ultimately, I'd like to translate those into English upon import, but for now, I'll accept the Russian characters.
How should I go about doing this? Currently, I'm reading each line of the file, using the SPLIT function to separate each field by the ";" separator into an array, and sticking each array element into a table. Would changing the system Keyboard Layout to Russian prior to this work, or is it more complicated than that.
Does any of this make sense, or should I just bag it and go grab a beer (or some Vodka)?
Thanks!
You should be able to create an "Import Specification" that will tell Access how to convert the character data. Follow the procedure here...
Importing a text with separators using VBA
...and choose the appropriate character set from the "Code Page" combo box.
If you need to perform the imports from VBA code then you can save the specification (using the "Save As..." button) and then re-use that specification in a DoCmd.TransferText statement.

Apostrophe issue in RTF

I have a function within a custom CRM web application (old VB.Net circa 2003) that takes a set of fields from a database and merges them with palceholders in a set of RTF based template documents. These generate merged letters and documentation. The code essentially loops through each line of the RTF template file and replaces any instances of the placeholder values with text from a database record. The issue I'm having is that users have pasted a certain type of apostrophe into the web app (and therefore into the database) that is not rendering correctly in the resulting RTF file. It is rendering like this - ’.
I need a way to spot this invalid apostrophe in the code and replace it with a valid one. Unfortunately when I paste the invalid apostrophe into the Visual Studio editor it gets converted into the correct one. So I need another way to express this invalid apostrophe's value. Unfortunately I do not know a great deal about unicode and other encodings so I'm calling out for help with this.
Any ideas?
If you really just want to figure out what the character is you might want to try and paste it into a text editor like ultraedit. It has a hex mode that you can flip to to see the actual underlying bytes.
In order to do the replace once you've figured out the character you'd do something like this in Vb,
text.Replace(ChrW(2001), "'")
Note that you might not be able to figure it out easily using the text editor because it might also get mangled by paste from the clipboard. You might want to either print some debug of the ascii values from code. You can use the AscW function to do that.
I can't help but think that it may actually simply be a case of specifying the correct encoding to use when you write out the stream though. Assuming you're using a StreamWriter you can specify it on the constructor. I'm guessing you actually want ASCII given your requirement.
oWriter = New System.IO.StreamWriter(path, False, System.Text.Encoding.ASCII)
It looks like you probably want to encode characters out of the 8 bit range (>255).
You can do that using \uNNNN according to the wikipedia article.