I am looking for help in printing the club symbol from the Arial font in PostScript.
Its Unicode code point is 9827 (2663 hexadecimal).
The ampersand is Unicode 38 (26 hexadecimal).
This PostScript code snippet
%!PS
/ArialMT findfont
12 scalefont setfont
72 72 moveto
<26> show
showpage
produces the ampersand symbol when I run it through Adobe Distiller. It appears that PostScript understands Unicode with UTF-8 encoding by default.
I am unable to do the same with the club symbol.
My research indicates that I have to use Character Encoding, and this is where I am lost.
Could some kind soul show me (hopefully fairly short) how to show the club symbol using Character Encoding?
Alternatively if you could point me to a simple tutorial it would be greatly appreciated.
Reading the reference manual leaves my head spinning.
PostScript does not understand Unicode at all, or at least not as standard, though there are ways to deal with it.
Section 5.3 of the PostScript Language Reference Manual contains complete information on Character Encoding. You really need to read this in detail; what you are asking is a deceptively simple question with no simple answer.
The way this works for PostScript fonts is that the characters in the document have a character code which lies between 0 and 255. When processing text, the interpreter takes the character code and looks it up in the Encoding attached to the font. If you didn't supply an Encoding to the font, then it will normally have a pre-defined StandardEncoding.
StandardEncoding agrees with ASCII (and therefore with single-byte UTF-8) for most character codes at 0x7F and below, but it is not identical: for example, 0x27 maps to /quoteright and 0x60 to /quoteleft.
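To see why the ampersand appears to "work" while the club symbol cannot, it helps to look at the bytes involved. The sketch below is only an illustration in Python (the helper name describe is made up here): the ampersand is a single byte in UTF-8 that happens to equal its ASCII code, while U+2663 needs three bytes, and a simple PostScript font consumes one byte per character code.

```python
def describe(ch):
    """Return (code point, UTF-8 bytes) for a single character."""
    return ord(ch), ch.encode("utf-8")

for ch in ("&", "\u2663"):  # ampersand and the club symbol
    cp, utf8 = describe(ch)
    # A one-byte result can be passed to `show` as a character code;
    # a multi-byte result would be read as several separate codes.
    print(f"U+{cp:04X} ({cp}): {len(utf8)} UTF-8 byte(s): {utf8.hex()}")
```

So <26> show only matches by coincidence; nothing in the interpreter is decoding UTF-8.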
The Encoding maps the character code to a glyph name. For example, 0x41 in StandardEncoding maps to /A (that's a name in PostScript). Note that this is not UTF-8 or anything else; it's just a mapping. It's entirely possible, and common practice for subset fonts, to map the first character used to character code 1, the second to character code 2, and so on.
So if we applied that scheme to 'Hello World' we would use an Encoding which maps
0x01->/H
0x02->/e
0x03->/l
0x04->/o
0x05->/space
0x06->/W
0x07->/r
0x08->/d
and then we might draw the text by :
<0102030304050604070308> show
Which, as you can see, bears no relation to UTF-8 at all.
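The subset scheme above can be sketched in a few lines of Python. This is only an illustration of the numbering idea (the function name subset_encoding is made up here); it is not how any particular PostScript producer is implemented.

```python
def subset_encoding(text):
    """Assign codes 1, 2, 3, ... to characters in order of first use."""
    codes = {}
    for ch in text:
        if ch not in codes:
            codes[ch] = len(codes) + 1
    return codes

text = "Hello World"
codes = subset_encoding(text)

# Build the hex string that would be handed to `show`.
hex_string = "".join(f"{codes[ch]:02X}" for ch in text)
print(f"<{hex_string}> show")  # prints <0102030304050604070308> show
```

The character codes depend only on the order in which characters first appear, which is why they have nothing to do with ASCII or UTF-8.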
Anyway, having retrieved the glyph name, the interpreter then looks in the CharStrings dictionary in the font and locates the key matching that glyph name. So for StandardEncoding we would map 0x41 to /A and then look in the CharStrings dictionary for the /A key. We then take the value associated with that key, which will be a PostScript glyph program, and run it.
Your problem is that you are trying to use a TrueType font. PostScript does not support TrueType fonts in that way. It does support them when they are defined as Type 42 fonts, because a Type 42 font carries around some additional information which allows the PostScript interpreter to treat them, broadly speaking, the same way as PostScript fonts.
Many modern PostScript interpreters will load a TrueType font and create a Type42 font from it for you, but this involves guessing at the additional information, and there's no real way to tell in advance how any given interpreter will deal with this. I suspect that Adobe Distiller will behave similarly to Ghostscript and attempt to map the type42 to a StandardEncoding.
Essentially, in a Type 42 font the Encoding maps the character code to a key in the CharStrings dictionary, and the value associated with that key is the GID. The GID is used to index the glyf table in the TrueType font, and the TrueType rasteriser then reads that glyph program.
So in order to create a Type 42 font with an Encoding which will map a character code to a club symbol, you would need to know what the GID of the club symbol in the font actually is. This can be derived from one of the cmap subtables in the TrueType font, which is how PostScript interpreters such as Ghostscript build the required Encoding when they load a TrueType font as a Type 42. You would then need to alter the CharStrings dictionary in the Type 42 font so that it maps to the correct GID. You would also need to alter the Encoding: choose a character code that you want to use, and map that character code to the key in the CharStrings dictionary.
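As a rough mental model of that two-step lookup, here is a toy Python sketch. Everything in it is an assumption made for illustration: the dictionary layout only imitates a Type 42 font, the GID values are invented, and code 0x80 is an arbitrary choice of unused slot.

```python
# Toy stand-in for a Type 42 font dictionary (all values illustrative).
font = {
    "Encoding": [".notdef"] * 256,                        # code -> glyph name
    "CharStrings": {".notdef": 0, "A": 36, "club": 389},  # name -> GID
}
font["Encoding"][0x41] = "A"

def glyph_id(font, code):
    """Character code -> glyph name -> GID, falling back to .notdef."""
    name = font["Encoding"][code]
    charstrings = font["CharStrings"]
    return charstrings.get(name, charstrings[".notdef"])

# "Re-encoding" means pointing a character code at the wanted key.
font["Encoding"][0x80] = "club"

print(glyph_id(font, 0x41))  # GID stored under /A
print(glyph_id(font, 0x80))  # GID stored under /club
print(glyph_id(font, 0x42))  # unmapped code falls back to .notdef
```

The hard part in practice is not this lookup but finding out which GID belongs in CharStrings, which is the cmap work described above.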
You would have to determine what kind of keys the Encoding and CharStrings dictionary are using. They might be names, or integers, or anything else. You could figure that out, of course, by looking at the content of the Encoding array.
In all honesty, unless you know a lot about TrueType fonts, I think it would be hard for you to reverse-engineer the font to retrieve the correct GID and then re-encode the font that gets loaded by the interpreter. You would also need to examine the contents of the font dictionary returned by findfont to see what the existing mapping is. And, crucially, you may need to modify the CharStrings dictionary to map the key to the GID. It may be that Distiller returns a dictionary which is defined as no-access, which will prevent you from looking inside it (or at least, inside parts of it). You might be able to get away with looking at the Encoding in the font dictionary and modifying that, if the CharStrings dictionary already contains a key for every glyph in the font, which it may well do.
I could probably guide you through doing this with Ghostscript, but I have no idea how Adobe Distiller defines TrueType fonts loaded from disk.
You could use a CIDFont instead. These are defined in section 5.11.1 and it may be that if you were to use something like the pre-defined Identity-H or UCS2 CMaps you could create a CID-Keyed instance of ArialMT with TrueType outlines which would work for your Unicode code point.
But that would mean defining the font yourself, so you would need to include the whole TrueType font as part of your PostScript program. Again this isn't simple.
There is some good information here: Show Unicode characters in PostScript
I also have the ArialMT.ttf and made the ArialMT.ttf.t42 just to look inside. I found the /club glyph with GID 389 as described by KenS and tried this as described in the linked post with good results:
%!
100 300 moveto
/ArialMT.ttf 46 selectfont (ArialMT) show
100 200 moveto /club glyphshow
showpage
Note: I use ArialMT.ttf because the TrueType font wasn't installed in the Ghostscript Fontmap, just in the current directory, so I ran gs -P for that reason. The normal /ArialMT findfont should work when the TrueType font is already installed in the search path. This is my first attempt with these glyphs, and I was just using trial and error.
There is a comprehensive Adobe list of glyphs that map many of the Unicode characters: https://github.com/adobe-type-tools/agl-aglfn/blob/master/glyphlist.txt.
If the desired Unicode character is in that list, say club;2663 or clubsuitblack;2663 or clubsuitwhite;2667, all one needs to say is
/club glyphshow and most modern fonts will know what to do. But @KenS says this "can cause problems".
Instead, the preferred scheme that emerges from the recommended references is to:
create a composite font in the preamble, one for each of the fonts you are using;
include the lower 256 characters as Font0;
add whatever Unicode characters you are planning to use, in chunks of 256 characters, as Font1, Font2 etc.;
remap each special Unicode character onto a two-byte sequence: the sub-font index within the composite font, followed by the byte that indexes the character within that sub-font.
The following is a complete example of both methods.
I follow http://www.acumentraining.com/Acumen_Journal/AcumenJournal_May2002.zip, but here Font1 is a custom remapping of a portion of the same font as Font0, re-using the codes of some well-known ASCII characters.
This is a complete file.eps:
%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 792 612
%%LanguageLevel: 2
%%EndComments
%%BeginProlog
userdict begin
%%EndProlog
%%BeginSetup
% The following encodes a few useful Unicode glyphs, if only a few are needed.
% Based on https://stackoverflow.com/questions/54840594/show-unicode-characters-in-postscript
% Usage: /Times-Roman /Times-Roman-Uni UniVec new-font-encoding
/new-font-encoding { <<>> begin
  /newcodesandnames exch def
  /newfontname exch def
  /basefontname exch def
  /basefontdict basefontname findfont def % Get the font dictionary on which to base the re-encoded version.
  /newfont basefontdict maxlength dict def % Create a dictionary to hold the description for the re-encoded font.
  basefontdict
  { exch dup /FID ne % Copy all the entries in the base font dictionary to the new dictionary except for the FID field.
    { dup /Encoding eq
      { exch dup length array copy % Make a copy of the Encoding field.
        newfont 3 1 roll put }
      { exch newfont 3 1 roll put }
      ifelse
    }
    { pop pop } % Ignore the FID pair.
    ifelse
  } forall
  newfont /FontName newfontname put % Install the new name.
  newcodesandnames aload pop % Modify the encoding vector. First load the new encoding and name pairs onto the operand stack.
  newcodesandnames length 2 idiv
  { newfont /Encoding get 3 1 roll put }
  repeat % For each pair on the stack, put the new name into the designated position in the encoding vector.
  newfontname newfont definefont pop % Now make the re-encoded font description into a PostScript font.
  % Ignore the modified dictionary returned on the operand stack by the definefont operator.
end } def
/Helvetica /Helvetica-Uni [
16#43 /club % character code hex 43 (ASCII C) now selects /club
] new-font-encoding
/Helv
<<
/FontType 0
/FontMatrix [ 1 0 0 1 0 0 ]
/FDepVector [
/Helvetica findfont % this is Font0
/Helvetica-Uni findfont % this is Font1
]
/Encoding [ 0 1 ]
/FMapType 3
>> definefont pop
%%EndSetup
%%BeginScript
/Helv 20 selectfont
72 300 moveto
(The club character is \377\001C\377\000 a part of the string.) show
/Helvetica findfont 20 scalefont setfont
263 340 moveto
/club glyphshow
showpage
%%EOF
Which produces this
Obviously, this can be extended to more characters, but only 256 per sub-font. I am not aware of a "standard" convention for such re-encoding, although I would imagine a set of Greek letters alpha, beta, gamma... would map pretty obviously onto a, b, c... Perhaps somebody else is aware of such an implementation for all of the Unicode characters from the Adobe glyph list using multiple custom sub-fonts, and can provide a pointer.
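Writing the \377 escapes by hand gets tedious, so the string could be generated by a small script instead. The Python sketch below is one possible way to do it, not part of the referenced article: the function name escape_mapped and the remap table are made up for illustration, and it assumes the escape character is 255 (\377) and that sub-font 0 is active at the start of the string, as in the example above.

```python
ESCAPE = 0xFF  # escape byte assumed for the FMapType 3 composite font above

def escape_mapped(text, remap):
    """Build a PostScript string literal for an escape-mapped composite font.

    remap maps a special character to (sub-font index, byte code);
    every other character is shown from sub-font 0 with its own code.
    """
    out = []
    current = 0  # sub-font assumed active at the start of the string
    for ch in text:
        font, code = remap.get(ch, (0, ord(ch)))
        if not 0 <= code <= 255:
            raise ValueError(f"no one-byte code for {ch!r}")
        if font != current:
            out.append("\\%03o\\%03o" % (ESCAPE, font))  # switch sub-font
            current = font
        c = chr(code)
        # Parentheses and backslashes must be escaped inside ( ... ).
        out.append("\\" + c if c in "()\\" else c)
    return "(" + "".join(out) + ")"

remap = {"\u2663": (1, 0x43)}  # club symbol lives at code hex 43 in Font1
print(escape_mapped("The club character is \u2663 a part of the string.", remap) + " show")
```

For the sentence used in the example file this reproduces the same string that was typed by hand there.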
Encoding is a mapping that gives characters or symbols a unique value.
If a character is not present in the encoding, it won't display correctly no matter what font you use,
such as Lucida Console, Arial, or Terminal.
But the problem is that the Terminal font shows line-drawing characters while the other fonts do not.
My question is: why does Terminal behave differently from the other fonts?
Please note:
Windows 7
Locale English
For the impatient, the relevant link is at the bottom of this answer.
Encoding is a mapping that gives characters or symbols a unique value.
No, that describes a character set, which maps certain characters to code points (using the Unicode terminology). Let's ignore that for now.
If a character is not present in the encoding, it won't display correctly no matter what font you use, such as Lucida Console, Arial, or Terminal.
Font formats map Unicode code points to glyphs. Not all code points may be mapped in a specific font, since somebody has to create all these symbols. Again, let's ignore this.
Not all binary encodings may map to code points within a certain character set; this is possibly what you mean.
But the problem is that the Terminal font shows line-drawing characters while the other fonts do not.
Your terminal seems to operate on a different character set, probably the "OEM" or "IBM PC" character set instead of a Unicode compliant character set or Windows-1252 / ISO 8859-1 / Latin-1.
If it is the latter, then you are out of luck unless you can set your output terminal to another character set, as Windows-1252 doesn't support the box drawing characters at all.
Solutions:
If possible, try to set the output to the OEM / IBM PC character set.
If the target is Unicode, you can try to convert the output: read it in (decode it) using the OEM character set and then re-encode it as Unicode, which has code points for the box drawing characters.
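The decode-then-re-encode step can be illustrated in Python. This is a minimal sketch that assumes the OEM character set in question is code page 437; the actual OEM code page depends on the system locale, and the function name oem_to_unicode is made up here.

```python
def oem_to_unicode(data, oem_codec="cp437"):
    """Decode bytes written for an OEM code page into a Unicode string."""
    return data.decode(oem_codec)

raw = bytes([0xC9, 0xCD, 0xBB])  # top edge of a double-line box in CP437

# Interpreted as OEM bytes, these are box drawing characters ...
print(oem_to_unicode(raw))
# ... but the same bytes read as Windows-1252 are accented letters,
# which is the kind of mismatch described above.
print(raw.decode("cp1252"))
```

Once the text is a Unicode string it can be written out in any encoding that covers those characters, for example UTF-8.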
I'm looking into the format of GSM SMS messages.
When PDU mode is used, the TP-UD field is said to use one of three encodings: 7-bit for ASCII symbols, 8-bit for data, and UCS2 for Unicode text such as Japanese.
There is an example: Hello! has the TP-UD field C8 32 9B FD 0E 01. Why? It's not ASCII, and not the GSM 03.38 basic character set.
And what if the user data is a mix of ASCII characters and Japanese? Is it Unicode for all of it?
Thanks.
The short message content encoding (7-bit, 8-bit, 16-bit, etc.) is chosen according to the data coding scheme parameter value. If the message content mixes characters from the GSM default alphabet with other Unicode characters (e.g. Russian, Arabic, Japanese), the data coding scheme must be set to 16-bit (UCS-2).
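For mixed content, that means every character, including the plain ASCII ones, takes two octets. A minimal Python sketch, assuming all characters are in the Basic Multilingual Plane so that UCS-2 and big-endian UTF-16 coincide (the function name encode_ucs2 is made up for illustration):

```python
def encode_ucs2(text):
    """Encode text as big-endian UCS-2 user data (two octets per character)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the BMP cannot be sent as UCS-2")
    return text.encode("utf-16-be")

mixed = "Hi \u65e5\u672c"  # "Hi " followed by two Japanese characters
data = encode_ucs2(mixed)
print(len(mixed), "characters ->", len(data), "octets")
print(data.hex().upper())
```

Here "H" becomes 00 48 rather than a single septet, which is why a UCS-2 message holds far fewer characters than a 7-bit one.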
The GSM 7-bit default alphabet is for English and several European languages. A limited number of languages, such as Portuguese, Spanish, and Turkish, may use 7-bit encoding with the national language shift tables defined in 3GPP TS 23.038. 8-bit encoding is dedicated to binary short messages.
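As for the Hello! example in the question: those octets are the 7-bit codes packed back to back, so that eight characters would fit in seven octets. The Python sketch below illustrates the packing; it assumes every character has the same code in the GSM default alphabet as in ASCII, which holds for these six characters but not in general, and the function name pack_gsm7 is made up here.

```python
def pack_gsm7(septets):
    """Pack 7-bit values into octets, least significant bits first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for s in septets:
        acc |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial octet, zero padded
    return bytes(out)

# Assumption: for these characters the GSM code equals the ASCII code.
septets = [ord(ch) for ch in "Hello!"]
packed = pack_gsm7(septets)
print(" ".join(f"{b:02X}" for b in packed))  # prints C8 32 9B FD 0E 01
```

Six septets are 42 bits, which is why the result is six octets with the top bits of the last one unused.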
Try the Cloudhopper Java SMPP API's CharsetUtil utility class:
byte[] msgChars = CharsetUtil.encode("öàß", CharsetUtil.CHARSET_GSM);
msgChars = CharsetUtil.encode("Точно так и было!", CharsetUtil.CHARSET_UCS_2);
I'm currently doing a project on language translation where I'm converting English text to Hindi. I'm trying to send the converted Hindi text to a mobile phone, but the message could not be displayed on my phone as there is no Hindi font. But I have seen mobile network operators sending their promos in Hindi, which my mobile displays perfectly. I would like to know if there is any Unicode or other conversion of the text so that the Hindi text will be displayed on my phone.
I also thought of starting such a program. I keep the Unicode code points (in decimal) of all the Hindi letters in a file:
2305
2309
2309 अआइईउऊऋएऐओऔअंअँअंअः
2325 कखगघङ
2330 चछजझञ
2335 टठडढण
2340 तथदधनऩ
2346 पफबभम
2351 यरऱलळऴव
2358 शषसह 2361
2364 च़चऽचाचि ---- -upto च॔ 2388
2392 क़ख़ग़ज़ड़ड़ढ़फ़य़ॠॡ 2401
2402 चॢचॣ। ॥ 2405
2406 ०१२३४५६७८९
I hope you are using 16-bit characters to store the Hindi characters.
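The decimal values in the table can be checked, and turned into 16-bit units, with a few lines of Python. This is only a sketch: the helper names are made up, and the sample word is written with escapes so that the code points are unambiguous.

```python
import unicodedata

def code_points(text):
    """Return the decimal Unicode code point of each character."""
    return [ord(ch) for ch in text]

def to_16bit_hex(text):
    """Store each character as one big-endian 16-bit unit, shown in hex."""
    return text.encode("utf-16-be").hex().upper()

# The row starting at 2325 in the table above covers five consonants.
row = "".join(chr(cp) for cp in range(2325, 2330))
print(row, code_points(row))
print(unicodedata.name(chr(2325)))

# A sample Hindi word (namaste), given by code point.
word = "\u0928\u092e\u0938\u094d\u0924\u0947"
print(to_16bit_hex(word))
```

Every Devanagari code point in the table is below 65536, so one 16-bit unit per character is enough.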
I have a TrueType font and I want to merge it with a PostScript font file. When I merge them using FontForge I get an error:
TrueType font file is 2 byte Encoded and Postscript is 1 byte
I want to know whether there is any method by which we can merge 2-byte encoded font files with 1-byte encoded files, or whether there is any way to convert TrueType fonts (2-byte encoding) to a PostScript file (1-byte encoding). For example, a Korean font file uses 2-byte encoding and I want to merge it with a 1-byte encoded PostScript file.
If you are using more than 255 glyphs, then you would need to convert the TrueType font into a CIDFont with TrueType outlines, and supply a suitable CMap to map from character codes to CIDs.
Alternatively, or if you only want to use up to 255 glyphs, you can convert the font into a PostScript Type 42 font, which is a PostScript method for wrapping a TrueType font so that it can be used. If you want to use more than 255 glyphs using this method, then you need to split the original TrueType font into multiple Type 42 fonts and switch fonts as required to use the glyphs.
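The bookkeeping for such a split might look like the following Python sketch. It is only an illustration of the idea: the function name is made up, and reserving code 0 in each font for .notdef is an assumption made here to match the 255-glyph limit mentioned above.

```python
def split_into_type42_fonts(glyph_ids, per_font=255):
    """Assign each glyph to (font index, character code 1..per_font)."""
    assignment = {}
    for i, gid in enumerate(glyph_ids):
        font_index, offset = divmod(i, per_font)
        assignment[gid] = (font_index, offset + 1)  # code 0 kept for .notdef
    return assignment

# Example: a font with 600 glyphs needs three Type 42 fonts.
assignment = split_into_type42_fonts(range(600))
fonts_needed = max(font for font, _ in assignment.values()) + 1
print(fonts_needed)
print(assignment[254], assignment[255])  # last code of font 0, first of font 1
```

The document text then has to switch fonts whenever consecutive glyphs fall in different groups.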
But basically you can't combine a TrueType font and a Type 1 (or CFF) font successfully; the technologies are quite different.
Why do you want to do this anyway?