Different glyph index for same glyph in Word and InDesign - ms-word

When selecting a character from the Wingdings font to insert into a document, MS Word 2015 and InDesign CS 5.5 show different glyph indices for at least some characters.
According to Word, the character I am looking for, Unicode 25A0 (black square), has a decimal Wingdings gid of 110.
The two online resources I found, http://www.alanwood.net/demos/wingdings.html and http://www.csn.ul.ie/~caolan/wingdings/proposal/, confirm that number.
Still, InDesign displays a gid of 132.
Why does InDesign show a different gid for the same glyph of the same font?

You are confusing the assigned Unicode codepoint or character set encoding with the glyph index.
This font uses a custom character set, and it's the character n (U+006E) that actually displays a square -- not the Unicode character U+25A0. If you insert an Arial Unicode MS black square in your text and then change the font to Wingdings, you will see the character disappear.
The glyph index as reported by InDesign has nothing to do with character sets, Unicode, or anything else. It is simply a running number assigned to each separate glyph in a font, of which there may be a few, some, or a great many. Glyph indexes always start at 0, which is the 'missing character' glyph (and so is not useful), and every subsequent glyph is numbered in turn. The font's encoding table maps Unicode, or another character set, onto these numbers. The numbering in itself means nothing: a font may contain 'A', 'B', 'C' as its first 3 glyphs, or it may contain '#', '☺', 'ó', or anything else.
This glyph order is entirely irrelevant to the working of the font itself, and for all practical end-user operations you may as well forget there is such a thing. (It's something else if you try to change the font of a certain glyph -- you may get to see another character because there is absolutely no need for two fonts to have the same glyph in the exact same position.)
You are looking in the wrong place in InDesign: the Glyphs panel shows all glyphs and always shows their glyph index (because every glyph has one, by definition of 'glyph index'), but it shows a Unicode codepoint only if one is assigned.
InDesign is somewhat smarter than Word in how it processes dingbat/symbol fonts: rather than lying and saying 'this "is" the letter n', it assigns a private Unicode codepoint; in this case, U+F06E. That way you can never accidentally assign another font and get the letter n.
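The numbers in the question line up once character codes and glyph indexes are kept apart. Here is a minimal sketch, assuming the usual Windows convention that a symbol font exposes its 8-bit character code N as the Private Use Area code point U+F000 + N. The helper name `symbol_code_to_pua` is made up for illustration.

```python
def symbol_code_to_pua(code: int) -> int:
    """Map an 8-bit symbol-font character code to a Private Use Area code point.

    Assumption: the font follows the common Windows symbol-font convention,
    where character code N is exposed as U+F000 + N.
    """
    if not 0x20 <= code <= 0xFF:
        raise ValueError(f"not a printable 8-bit symbol code: {code}")
    return 0xF000 + code


word_code = 110  # the decimal value Word reports for the square
print(chr(word_code))  # n  (a character slot, not a glyph index)
print(f"U+{symbol_code_to_pua(word_code):04X}")  # U+F06E
```

Neither value is the glyph index (132 in InDesign's Glyphs panel). That number depends only on the order in which the glyphs happen to be stored in the font file.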

Related

Font whose glyphs all display as missing characters, might be named 'nosuchglyph'

I could use a font that might be named 'nosuchglyph'. This could be used for testing a font on the web with a font stack specified something like
style="font-family:'Some Font Regular', 'nosuchglyph'"
Every glyph would be an appropriate missing-character glyph. It could be minimally the same glyph for every single character -- something like a black rectangle or a rectangle outline with a '?' in the middle, or some similar image to convey that this glyph is missing. More ideal would be a rectangle per Unicode character code with a tiny display of that character code, something like this ASCII art for, say, the character code U+4f4f:
[4f4f]
Whatever it is, for any glyph missing in "Some Font Regular" (in this example) the glyph in the output would come from "nosuchglyph".
This is useful for testing: for a given font, say "Some Font Regular" as in the example, you can see which characters come from that font and which are missing. The goal is to avoid the normal substitution for missing characters, which would show the glyph from a font later in the stack or else from some default fallback font.

What is different between encoding and font

An encoding is a mapping that gives characters or symbols a unique value.
If a character is not present in the encoding, it will not display correctly no matter which font you use, such as Lucida Console, Arial, or Terminal.
The problem is that the Terminal font shows line-drawing characters, but the other fonts do not.
My question is: why does Terminal behave differently from the other fonts?
Please note:
Windows 7
Locale: English
For the impatient, the relevant link is at the bottom of this answer.
An encoding is a mapping that gives characters or symbols a unique value.
No, that describes a character set, which maps characters to code points (in Unicode terminology). An encoding then maps those code points to bytes. Let's ignore this for now.
If a character is not present in the encoding, it will not display correctly no matter which font you use, such as Lucida Console, Arial, or Terminal.
Font formats map Unicode code points to glyphs. A given font may not map every code point, since somebody has to draw all these symbols. Again, let's ignore this.
Not every byte value maps to a code point within a given character set; this is possibly what you mean.
The problem is that the Terminal font shows line-drawing characters, but the other fonts do not.
Your terminal seems to operate on a different character set, probably the "OEM" or "IBM PC" character set, instead of a Unicode-compliant character set or Windows-1252 / ISO 8859-1 / Latin-1.
If it is the latter, then you are out of luck unless you can set your output terminal to another character set, as Windows-1252 doesn't support the box drawing characters at all.
Solutions:
If possible try and set the output to OEM / IBM PC character set.
If it is Unicode, you can try to convert the output to Unicode: read it in (decode it) using the OEM character set and then re-encode it as Unicode, where these characters live in the Box Drawing block (U+2500 to U+257F).
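The decode-then-re-encode step can be sketched in a few lines of Python. This assumes the OEM code page is 437 (the original IBM PC set); a machine with another OEM code page, such as 850, would need a different codec name. The sample bytes are made up for illustration.

```python
# Bytes as a DOS-era program might emit them for the top edge of a box.
raw = bytes([0xC9, 0xCD, 0xCD, 0xBB])

as_oem = raw.decode("cp437")    # interpreted with the OEM / IBM PC character set
as_ansi = raw.decode("cp1252")  # the same bytes interpreted as Windows-1252

print(as_oem)   # box drawing characters
print(as_ansi)  # accented letters and a guillemet instead

# Re-encode the correctly decoded text for a Unicode-aware terminal.
utf8_bytes = as_oem.encode("utf-8")
print([f"U+{ord(ch):04X}" for ch in as_oem])
```

The same four bytes become box corners and lines under one character set and Latin letters under the other, which is why the result depends on the terminal's character set rather than on the font alone.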

Invisible characters - ASCII

Are there any invisible characters? I have checked Google for invisible characters and ended up with many answers but I'm not sure about those. Can someone on Stack Overflow tell me more about this?
Also I have checked a profile on Facebook and found that the user didn't have any name to his profile? How can this be possible? Is it some database issue? Hacking or something?
When I searched over Internet, I found that 200D is an ASCII value with an invisible character. Is it true?
I just went through the character map to get these.
They are all in Calibri.
Number  Name                   HTML Code  Appearance
------  ---------------------  ---------  ----------
U+2000  En Quad                &#8192;    " "
U+2001  Em Quad                &#8193;    " "
U+2002  En Space               &#8194;    " "
U+2003  Em Space               &#8195;    " "
U+2004  Three-Per-Em Space     &#8196;    " "
U+2005  Four-Per-Em Space      &#8197;    " "
U+2006  Six-Per-Em Space       &#8198;    " "
U+2007  Figure Space           &#8199;    " "
U+2008  Punctuation Space      &#8200;    " "
U+2009  Thin Space             &#8201;    " "
U+200A  Hair Space             &#8202;    " "
U+200B  Zero-Width Space       &#8203;    "​"
U+200C  Zero Width Non-Joiner  &#8204;    "‌"
U+200D  Zero Width Joiner      &#8205;    "‍"
U+200E  Left-To-Right Mark     &#8206;    "‎"
U+200F  Right-To-Left Mark     &#8207;    "‏"
U+202F  Narrow No-Break Space  &#8239;    " "
How a character is represented is up to the renderer, but the server may also strip out certain characters before sending the document.
You can also have untitled YouTube videos like https://www.youtube.com/watch?v=dmBvw8uPbrA by using the Unicode character ZERO WIDTH NON-JOINER (U+200C), or &zwnj; in HTML. The code block below should contain that character:
‌‌
There is actually a truly invisible character: U+FEFF.
This character is called the Byte Order Mark, or BOM for short. It is placed at the start of a text stream to signal the byte order of a Unicode encoding such as UTF-16, and it also serves as an optional signature for UTF-8. It is an invisible character that doesn't take up any space. You can copy the character below, between the > and <.
Here is the character:
> <
How to catch this character in action:
Copy the character between the > and <.
Write a line of text, then put your caret at a random position in that line.
Paste the character into the line.
Go to the beginning of the line and press and hold the right arrow key.
You will notice that when your caret reaches the place where you pasted the character, it briefly stops for around half a second. This is because the caret is passing over the invisible character. Even though you can't see it, it is still there. The caret still registers a character at the position where you pasted the BOM and steps over it. Since the BOM is invisible, the caret looks as if it has paused for a moment. You can paste the BOM several times in one place and repeat the steps above to make the effect more obvious. Good luck!
EDIT: Sadly, Stack Overflow doesn't like the character. Here is an example from w3.org: https://www.w3.org/International/questions/examples/phpbomtest.php
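Because the character is easy to lose when copying from a web page, it can be simpler to produce and inspect it in code. This is a small sketch using only Python's standard library; the sample text "hello" is arbitrary.

```python
import codecs

BOM = "\ufeff"  # U+FEFF, the Byte Order Mark

# In UTF-8 the BOM is the three-byte sequence EF BB BF.
print(BOM.encode("utf-8") == codecs.BOM_UTF8)  # True

data = codecs.BOM_UTF8 + "hello".encode("utf-8")

plain = data.decode("utf-8")         # the plain codec keeps the BOM as a character
stripped = data.decode("utf-8-sig")  # the "-sig" variant drops a leading BOM

print(len(plain), len(stripped))  # 6 5
print(repr(plain[0]))             # '\ufeff'
```

The extra character in `plain` is the same one the caret steps over in the experiment above.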
Other answers are correct - whether a character is invisible or not depends on what font you use. This seems to be a pretty good list to me of characters that are truly invisible (not even space). It contains some chars that the other lists are missing.
'\u2060', // Word Joiner
'\u2061', // FUNCTION APPLICATION
'\u2062', // INVISIBLE TIMES
'\u2063', // INVISIBLE SEPARATOR
'\u2064', // INVISIBLE PLUS
'\u2066', // LEFT-TO-RIGHT ISOLATE
'\u2067', // RIGHT-TO-LEFT ISOLATE
'\u2068', // FIRST STRONG ISOLATE
'\u2069', // POP DIRECTIONAL ISOLATE
'\u206A', // INHIBIT SYMMETRIC SWAPPING
'\u206B', // ACTIVATE SYMMETRIC SWAPPING
'\u206C', // INHIBIT ARABIC FORM SHAPING
'\u206D', // ACTIVATE ARABIC FORM SHAPING
'\u206E', // NATIONAL DIGIT SHAPES
'\u206F', // NOMINAL DIGIT SHAPES
'\u200B', // Zero-Width Space
'\u200C', // Zero Width Non-Joiner
'\u200D', // Zero Width Joiner
'\u200E', // Left-To-Right Mark
'\u200F', // Right-To-Left Mark
'\u061C', // Arabic Letter Mark
'\uFEFF', // Byte Order Mark
'\u180E', // Mongolian Vowel Separator
'\u00AD' // soft-hyphen
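Rather than keeping such a list by hand, one option is to ask the Unicode character database. The sketch below assumes that "truly invisible" can be approximated by General Category Cf (Format), which covers the entries above in the Unicode data shipped with current Python versions. The function names are made up for illustration.

```python
import unicodedata


def find_format_chars(text):
    """Return (index, code point, name) for every General Category Cf character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_format_chars(text):
    """Remove every General Category Cf character from the text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


sample = "ab\u200bc\u2060d"
for hit in find_format_chars(sample):
    print(hit)
print(strip_format_chars(sample))  # abcd
```

Stripping blindly is not always safe: zero width joiners and non-joiners are part of normal text in some scripts and in emoji sequences, so reporting the matches is often the better default.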
The question about invisible characters in Unicode deserves a more thorough explanation.
Short answer - there are lots
Here are 134 invisible characters →­؜᠎​‌‍‎‏‪‫‬‭‮⁠⁡⁢⁣⁤⁧⁦⁨⁩𝅳𝅴𝅵𝅶𝅷𝅸𝅹𝅺󠀁󠀠󠀡󠀢󠀣󠀤󠀥󠀦󠀧󠀨󠀩󠀪󠀫󠀬󠀭󠀮󠀯󠀰󠀱󠀲󠀳󠀴󠀵󠀶󠀷󠀸󠀹󠀺󠀻󠀼󠀽󠀾󠀿󠁀󠁁󠁂󠁃󠁄󠁅󠁆󠁇󠁈󠁉󠁊󠁋󠁌󠁍󠁎󠁏󠁐󠁑󠁒󠁓󠁔󠁕󠁖󠁗󠁘󠁙󠁚󠁛󠁜󠁝󠁞󠁟󠁠󠁡󠁢󠁣󠁤󠁥󠁦󠁧󠁨󠁩󠁪󠁫󠁬󠁭󠁮󠁯󠁰󠁱󠁲󠁳󠁴󠁵󠁶󠁷󠁸󠁹󠁺󠁻󠁼󠁽󠁾󠁿← and here is their escaped ASCII representation: U+00AD U+061C U+180E U+200B U+200C U+200D U+200E U+200F U+202A U+202B U+202C U+202D U+202E U+2060 U+2061 U+2062 U+2063 U+2064 U+2067 U+2066 U+2068 U+2069 U+206A U+206B U+206C U+206D U+206E U+206F U+FEFF U+1D173 U+1D174 U+1D175 U+1D176 U+1D177 U+1D178 U+1D179 U+1D17A U+E0001 U+E0020 U+E0021 U+E0022 U+E0023 U+E0024 U+E0025 U+E0026 U+E0027 U+E0028 U+E0029 U+E002A U+E002B U+E002C U+E002D U+E002E U+E002F U+E0030 U+E0031 U+E0032 U+E0033 U+E0034 U+E0035 U+E0036 U+E0037 U+E0038 U+E0039 U+E003A U+E003B U+E003C U+E003D U+E003E U+E003F U+E0040 U+E0041 U+E0042 U+E0043 U+E0044 U+E0045 U+E0046 U+E0047 U+E0048 U+E0049 U+E004A U+E004B U+E004C U+E004D U+E004E U+E004F U+E0050 U+E0051 U+E0052 U+E0053 U+E0054 U+E0055 U+E0056 U+E0057 U+E0058 U+E0059 U+E005A U+E005B U+E005C U+E005D U+E005E U+E005F U+E0060 U+E0061 U+E0062 U+E0063 U+E0064 U+E0065 U+E0066 U+E0067 U+E0068 U+E0069 U+E006A U+E006B U+E006C U+E006D U+E006E U+E006F U+E0070 U+E0071 U+E0072 U+E0073 U+E0074 U+E0075 U+E0076 U+E0077 U+E0078 U+E0079 U+E007A U+E007B U+E007C U+E007D U+E007E U+E007F
Are there more? Yes.
Are there invisible characters in the ASCII range? Depends on the font.
Long answer - ready? set. go!
The Unicode Standard enables anyone to read and write in their own language. To do that, it lists unique code points󠁗󠁲󠁩󠁴󠁴󠁥󠁮󠀠󠁢󠁹󠀠󠁚󠁶󠁩󠀠󠁁󠁺󠁲󠁡󠁮󠀠󠀻󠀩 (U+hex), that are categorized into letters (D,ž,Dž,ʶ,愛,𓂀), symbols (+∊≠,£¥₪,҂˚˟˿), marks (ם֑֟֯ ,ী,◌҉ ), separators ( , , , ,  ), emojis (😊,🙏,👍), and much more. ASCII/Basic Latin is the very beginning of the table and more code points are added every update.
Simply listing unique numbers for characters is not enough. Characters can change their shape or change the sentence depending on the context. To support that, every code point comes with a list of properties . These properties may define the width (AA), its role in the sentence (-“.), its direction (cכ), and much more.
Most invisible characters have the property General_Category=Format (other answers here included spaces as well). These characters play a supporting role in a word or sentence. Here are some examples:
General Punctuation Block -
Invisible characters that are an integral part of some writing systems and emojis. Common ones are Zero width joiner (U+200D), Zero width non joiner (U+200C), Word joiner (U+2060)
Explicit Bidirectional Formatting characters - 12 invisible characters󠁗󠁲󠁩󠁴󠁴󠁥󠁮󠀠󠁢󠁹󠀠󠁚󠁶󠁩󠀠󠁁󠁺󠁲󠁡󠁮󠀠󠀻󠀩 used to enforce different direction constraints on the sentence. Helping present text to more than 300 million speakers of right-to-left languages e.g. Hebrew or Arabic.
Tags - 97 invisible characters that mirror ASCII (just drop the E and you get characters in the ASCII range). These are used as emoji modifiers and digital signatures to prove who copied your text.
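The "drop the E" relationship for tag characters is plain arithmetic: a tag character is its ASCII counterpart plus 0xE0000. Here is a minimal sketch of hiding and recovering a short message this way; the function names and sample strings are made up for illustration.

```python
TAG_BASE = 0xE0000  # tag characters mirror ASCII at this offset


def hide_in_tags(message):
    """Turn printable ASCII into the corresponding invisible tag characters."""
    return "".join(
        chr(TAG_BASE + ord(ch)) for ch in message if 0x20 <= ord(ch) <= 0x7E
    )


def reveal_tags(text):
    """Extract any tag characters from the text and map them back to ASCII."""
    return "".join(
        chr(ord(ch) - TAG_BASE) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )


visible = "plain text"
marked = visible[:5] + hide_in_tags("signed") + visible[5:]

print(marked == visible)    # False, although most renderers draw both the same
print(reveal_tags(marked))  # signed
```

Running `reveal_tags` over text copied from elsewhere is a quick way to check whether it carries such a hidden marker.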
All of this leads to the topic of exploiting invisible characters for homograph attacks and visual spoofing. Sometimes it is harmless, as with invisible names and titles, but in many cases the characters are used maliciously. For example, U+202E (RIGHT-TO-LEFT OVERRIDE) is one invisible character that has been doing more harm than good for decades.
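A simple defensive check is to flag strings that contain any explicit bidirectional control. The sketch below assumes the twelve characters meant above are the three marks, the five embeddings and overrides, and the four isolates. The file name is a made-up example.

```python
# Assumed set of explicit bidirectional formatting characters.
BIDI_CONTROLS = {
    "\u061c", "\u200e", "\u200f",                      # ALM, LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def has_bidi_controls(text):
    """Return True if the text contains any explicit bidi control character."""
    return any(ch in BIDI_CONTROLS for ch in text)


# A name whose tail is reversed on screen by many renderers because of U+202E.
suspicious = "invoice_\u202egpj.exe"

print(has_bidi_controls(suspicious))     # True
print(has_bidi_controls("invoice.jpg"))  # False
```

Whether to reject, strip, or merely warn about such input is a policy choice; legitimate right-to-left text can contain some of these characters.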
Last point, there is another way to make invisible characters using fonts. Fonts are files that store glyphs (pictures of characters), that present the characters' look. If the font does not contain a glyph for a codepoint, a substitute/replacement󠁗󠁲󠁩󠁴󠁴󠁥󠁮󠀠󠁢󠁹󠀠󠁚󠁶󠁩󠀠󠁁󠁺󠁲󠁡󠁮󠀠󠀻󠀩 character is displayed (e.g. �, □). But if the font contains a transparent glyph for a codepoint, then the character is invisible, only when displayed by that font. This is the only way to have invisible characters in the ASCII range (for example can you see →``← U+000C Form Feed).
Hope you find this explanation helpful and may you check strings for invisible characters more often 󠁗󠁲󠁩󠁴󠁴󠁥󠁮󠀠󠁢󠁹󠀠󠁚󠁶󠁩󠀠󠁁󠁺󠁲󠁡󠁮󠀠󠀻󠀩😉
Yes you can use invisible or blank name on facebook by using some HTML code/symbols.
Method 1:
Copy and paste (ﹺ                         ﹺ) symbols without brackets in your first and last name field.
Method 2:
Click on edit name. Now copy and paste following symbol in first and last name.
ՙՙ ՙՙ
An invisible character is the zero-width space, U+200B:
​

Why is � not listed in the Windows character table?

So there is "�", the replacement character 0xFFFD, the symbol for a byte-sequence that is not represented as a character in Unicode (right?).
Well, I wonder what 'is' this 'thing' actually, as I can't 'see'/'find' it in Windows' character table, neither searching for the symbol itself, nor searching for FFFD. But after all it is a character, right? So it should be in there. I am confused ...
The Arial font probably does not contain a glyph for that character. Try a different font, such as Arial Unicode or Arial Unicode MS.
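As background to the question, U+FFFD is an ordinary code point that a decoder substitutes when it meets bytes it cannot interpret. A small sketch in Python; the sample bytes are arbitrary.

```python
import unicodedata

bad = b"caf\xe9"  # Latin-1 bytes for "café"; the last byte is not valid UTF-8

text = bad.decode("utf-8", errors="replace")

print(text)                        # caf followed by the replacement character
print(f"U+{ord(text[-1]):04X}")    # U+FFFD
print(unicodedata.name(text[-1]))  # REPLACEMENT CHARACTER
```

So the character exists regardless of the font. Whether a character map lists it depends on whether the selected font has a glyph for it.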

FreeType 2 - Unicode Character Codes?

The documentation for FreeType2 says that the default character map used is the Unicode map... However, when I attempt to retrieve character code for Unicode 'T', it gives me Unicode 'Z' using:
glyph_index = FT_Get_Char_Index(face, text[n]);
What I really need is a way to find out how many glyphs are in the font face and what their Unicode value maps to per each one. Is there any way to do this. I've tried almost every FreeType function and can't get good results.
Thanks
I know this is old but...
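What you ask is not possible in general. You can read the number of glyphs from face->num_glyphs and walk the active character map with FT_Get_First_Char and FT_Get_Next_Char, but the result is not a one-to-one table. There are glyphs that don't correspond to any Unicode codepoint, and there are Unicode codepoints that map to multiple glyphs, depending on the neighboring glyph. For example, the "ff" in many fonts is a special ligature glyph to make typesetting work better. Unicode does have a compatibility codepoint for it (U+FB00), but text normally contains two separate "f" characters, and many other ligatures have no codepoint at all. It's up to your layout system to decide whether to use an "ff" glyph or not.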
However, if you're asking for the 'T' character and getting a 'Z' glyph index, then there are probably issues with your font.