how to generate Chinese Characters using Postscript? - unicode

Does anyone know how to generate Chinese characters using PostScript or related tools? I'd like to use Unicode to represent Chinese characters, but it seems that PostScript doesn't support Unicode yet. In addition, I'd like to specify several fonts to generate the same character.
Thus, I have two questions:
1. How do I use Unicode in PostScript? Or how do I enumerate the Chinese character set the PostScript way?
2. How do I specify the font configuration using PostScript?
Finally, in case PostScript cannot do this job, what tools should I turn to for my purpose?
Thank you very much!
-Jin

In Adobe's official PostScript language specification there is no specific support for Unicode fonts. (And this is the final version of the spec for PS Level 3, valid since its publication in 1999 -- PostScript as a language is no longer developed...)
However, PostScript supports (since Level 2) multi-byte fonts (2-, 3- and 4-bytes) in a generic way (see 'CID'). All PostScript fonts need an "encoding": an encoding basically is a table telling at which index position of a font which glyph description for a given character can be found. So while there are no Unicode fonts as such, there are multi-byte CID fonts which provide ranged subsets of Unicode.
Also, there are no freely re-distributable CMaps. (A CMap maps character codes to the CIDs that select glyphs in a CID-keyed font.) If you need a CMap, you have to derive it from the Windows codepage and the matching Adobe CMap.
If you just look for a "super-simple" method to use Unicode text strings with no need of checking for ranges, language etc.: sorry to disappoint you. There is no way. That would be a pipe dream.
Have a look at CID-keyed fonts instead. These are designed to include a large number of glyphs. (Page 364ff in PLRM)
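To make the idea of an "encoding" concrete, here is a minimal Python sketch (not PostScript) of the two lookups described above: a single-byte encoding vector that maps a code to a glyph name, and a CMap-style range table that maps a two-byte code to a CID. The names `ENCODING`, `CID_RANGES` and `code_to_cid`, and all the range values, are made up for illustration and are not taken from any real Adobe CMap.

```python
from typing import Optional

# Single-byte encoding: index position -> glyph name.
# Only two slots are filled in; everything else falls back to ".notdef".
ENCODING = [".notdef"] * 256
ENCODING[65] = "A"
ENCODING[66] = "B"

# CMap-style ranges: (first_code, last_code, first_cid).
# These numbers are hypothetical, chosen only to show the mechanism.
CID_RANGES = [
    (0x4E00, 0x4E0F, 1000),
    (0x6200, 0x62FF, 2000),
]


def code_to_cid(code: int) -> Optional[int]:
    """Return the CID for a two-byte code, or None if no range covers it."""
    for first, last, first_cid in CID_RANGES:
        if first <= code <= last:
            return first_cid + (code - first)
    return None


print(ENCODING[65])         # glyph name for a one-byte code
print(code_to_cid(0x6211))  # CID for a code inside the second range
print(code_to_cid(0x0041))  # None: not covered by any range
```

The point of the range form is that a font with thousands of glyphs does not need one table entry per character; a run of consecutive codes maps to a run of consecutive CIDs.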

Related

Font for "math bold script" unicode charset

I can't believe I have been stuck on this for an hour, but it seems that fonts for extended Unicode characters are not easily available as TTF / OTF for use on computers, especially with graphics software where Unicode fallback doesn't work.
Specifically, I am looking for the so-called Math bold script,
something like: 𝓓𝓮𝓶𝓸 𝓯𝓸𝓷𝓽 𝓐𝓑𝓒𝓖𝓟 𝓮𝓻𝓽𝓷𝓭 (<- those are extended chars)
as in https://textfancy.com/font-converter/
or as an image at: https://snipboard.io/fNYd7w.jpg
(because I am not sure we all see the same glyphs)
Note: what I am looking for is a standard TTF font whose normal glyphs look like those extended glyphs, meaning that the A looks like the 𝓐, B like 𝓑, and so on. That way I could use the font as a normal font in any software.
The STIX math fonts support the Unicode Mathematical Alphanumeric Symbols block.
https://www.stixfonts.org/
https://github.com/stipub/stixfonts
(Note: the variable fonts don't include support for that block of characters; only the static fonts do.)
Please note the intended use of those Unicode characters, as pointed out in the STIX project:
The sans serif, fraktur, script, etc., alphabets in Plane 1 (U+1D400-U+1D4FF) are intended to be used only as technical symbols.

Where are the unicode characters on the disk and what's the mapping process?

Several Unicode-related questions have been confusing me for some time.
For the following reasons, I think the Unicode characters exist on disk:
Execute echo "\u6211" in terminal, it will print the glyph corresponding to the unicode code point U+6211.
There is the concept of the UCD (Unicode Character Database), and we can download its latest version. UCD latest
Some new version unicode characters like latest emojis can not display on my mac until I upgrade macOS version.
So if the Unicode characters do exist on disk, then:
Where is it ?
How can I upgrade it ?
What's the process of mapping the unicode code point to a glyph ?
If I use a specific font, then what's the process of mapping the unicode code point to a glyph ?
If not, then what's the process of mapping the unicode code point to a glyph ?
I would greatly appreciate it if someone could shed light on these questions.
Execute echo "\u6211" in terminal, it will print the glyph corresponding to the unicode code point U+6211.
That's echo -e in bash.
› echo "\u6211"
\u6211
› echo -e "\u6211"
我
Where is it ?
In the font file.
Some new version unicode characters like latest emojis can not display on my mac until I upgrade macOS version.
How can I upgrade it ?
Installing/upgrading a suitable font with the emojis should be enough. I don't have macOS, so I cannot verify this.
I use "Noto Color Emoji" version 2.011/20180424, it works fine.
What's the process of mapping the unicode code point to a glyph ?
The application (e.g. text editor) provides the font rendering subsystem (Quartz? on macOS) with Unicode text and a font name. The font renderer analyses the codepoints of the text and decides whether this is simple text (e.g. Latin, Chinese, stand-alone emojis) or complex text (e.g. Latin with many marks, Thai, Arabic, emojis with zero-width joiners). The renderer finds the corresponding outlines in the font file. If the file does not have the required glyph, the renderer may use a similar font, or use a configured fallback font for a poor substitute (white box, black question mark etc.). Then the outlines undergo shaping to compose a complex glyph and line-breaking. Finally, the font renderer hands off the result to the display system.
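The lookup-and-fallback step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the font tables in `FONTS`, the glyph names, and the functions `find_glyph` and `layout` are hypothetical, and shaping and line-breaking are left out entirely.

```python
# Hypothetical fonts: each maps code points to a glyph identifier.
FONTS = {
    "LatinFont": {0x41: "A.glyph", 0x42: "B.glyph"},
    "CJKFont": {0x6211: "uni6211.glyph"},
}
MISSING_GLYPH = ".notdef"  # usually drawn as a box or question mark


def find_glyph(code_point, requested, fallbacks):
    """Try the requested font first, then each fallback font in order."""
    for name in [requested, *fallbacks]:
        glyph = FONTS.get(name, {}).get(code_point)
        if glyph is not None:
            return name, glyph
    # No font has the glyph: report the placeholder from the requested font.
    return requested, MISSING_GLYPH


def layout(text, requested, fallbacks):
    """Resolve every character of the text to a (font, glyph) pair."""
    return [find_glyph(ord(ch), requested, fallbacks) for ch in text]


for font, glyph in layout("A\u6211\U0001F600", "LatinFont", ["CJKFont"]):
    print(font, glyph)
```

Here the Latin letter comes from the requested font, U+6211 comes from the fallback font, and the emoji ends up as the placeholder because neither hypothetical font contains it.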
Apart from the shaping, very little of this has to do with Unicode or encoding. Font rendering already worked this way before Unicode existed, although font files and rendering were much simpler 30 years ago. Encoding only matters when someone wants to load or save text from an application.
Summary: investigate
Truetype/Opentype font editing software so you can see what's contained in the files
font renderers, on Linux look at the libraries pango and freetype.
Generally speaking, operating system components that use text use the Unicode character set. In particular, font files use the Unicode character set. But, not all font files support all the Unicode codepoints.
When a codepoint is not supported by one font, the system might fallback to another that does. This is particularly true of web browsers. But ultimately if the codepoint is not supported, an unfilled rectangle is rendered. (There is no character for that because it's not a character. In fact, if you were able to copy and paste it as text, it should be the original character that couldn't be rendered.)
In web development, the web page can either supply or give the location of fonts that should work for the codepoints it uses.
Other programs typically use the operating system's rendering facilities and therefore the fonts available through it. How to install a font in an operating system is not a programming question (unless you are including a font in an installer for your program). For more information on that, you could see if the question fits with the Ask Different (Apple) Stack Exchange site.

Why Julia returns "\uf8ff" when I use  (Apple logo) unicode?

I thought Julia supports raw unicode input, such as:
julia> test = "π£¢∞§"
"π£¢∞§"
julia> 😘 = 1 ;
julia> print(😘 )
1
However, it seems julia does not support  (Apple logo).
julia>  = 123
ERROR: syntax: invalid character ""
julia> test = ""
"\uf8ff"
I wonder what's the underlying reason for that, and whether there is a way I can use  character in Julia?
I believe this link more properly explains the case of the unicode character that you see as apple's logo.
The problem is that the unicode value used is one of several that are set aside for private use. That means that each operating system, or application, or implementation is free to use those unicode characters for anything they want. It just so happens that Apple has chosen to use unicode character U+F8FF (decimal value 63743, or on the web as either &#63743; or &#xF8FF;) as the Apple Logo. But some Windows fonts put in a Windows logo. And some other fonts put in a Klingon Mummification glyph. Or elven script. Or anything they want. And if it isn't defined in your local font, you'll just see a square.
My opinion is that Julia simply doesn't use this special value for anything. This also explains why your "π£¢∞§" characters work nicely - they are proper unicode characters, more largely supported by different platforms.
As a side note, I too see a simple square instead of the Apple logo in this instance.
Edit
Here is a list of unicode characters supported by Julia.
To expand on Alex's answer...
Apple's logo () isn't an official Unicode symbol. I think there are very few commercial logos and symbols in the main Unicode tables.
However, Unicode provides some 'anything goes' areas (called PUAs - private use areas) that companies and individuals can fill with their own symbols, so that their users can access certain special glyphs. The main PUA is U+E000 to U+F8FF. Depending on which font you're using, you'll find all kinds of stuff assigned to these codes. On a Mac, I can usually get the Apple logo at "\uf8ff", with the right font selected, but not the Ubuntu symbol or the Windows logo, unless I choose another font. (There's also a fallback mechanism, whereby if you request a code point that the current font doesn't have, the OS will find a suitable substitute in another font and use that.)
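Whether a code point belongs to a private use area can be checked from the Unicode character database. The following is a small Python sketch (Python rather than Julia, and the helper name `is_private_use` is my own); it only shows the classification and says nothing about Julia's own identifier rules.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


# U+F8FF (shown as the Apple logo on a Mac), then two ordinary characters.
for ch in ["\uf8ff", "\u03c0", "\u221e"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_private_use(ch))
```

Characters in category Co have no standard meaning, so what you see for them depends entirely on the font in use.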
In Julia, you can only use certain Unicode characters for variable names. Julia wouldn't allow anything from the private use area anyway, unless some fonts were distributed to every computer and everyone agreed on who had which Unicode point. (Mathematica makes extensive use of PUA symbols in their notebooks, because they can and do install their own fonts, and can then access various glyphs from the PUA in the notebook with guaranteed results.)
You are allowed to use emoji characters as variable names, so you could try the Emoji apple, rather than the Apple apple:

Why isn't there a font that contains all Unicode glyphs?

Pretty much as the title says. Rendering all of the unicode format correctly what with composite characters and characters that affect other characters and ligatures is really hard, I understand that. We have fonts that seem to be designed for maximum Unicode symbol support (Symbola, Code2001, others) and specialized fonts for certain planes or character ranges (BabelStone Han, others).
I don't know much about the underlying technical details for fonts. Is there a maximum size? Is it a copyright problem? Is essentially redrawing all ~110,000 extant glyphs too hard? I understand style concerns, but why not fall back to a 'default' font that had glyphs for everything? They're on unicode.org, redrawing them all would be pretty hard work but then you'd have a guaranteed fallback font for everything. If you got rights to some pre-existing fonts you could just composite them and that should help a lot. Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.
What is that reason?
"Why would you even want that?" questions aside, from a programming perspective there's a very simple reason: the OpenType spec only affords an addressable glyph index space of one USHORT, so one font can only support 16 bits worth of glyph identifiers, or 65,536 glyphs max. (And note the terminology: a "glyph" is not the same as a "character" or "letter")
The current version of Unicode, v8 as of this answer, contains 120,737 assigned code points, or almost twice as many as fit in a modern font (2021 edit: v13 upped this number to 143,859). In fact, Unicode hasn't been able to fit in a modern OpenType font since 2001, with the release of Unicode 3.1, which upped the number of code points from 49,259 to 94,205.
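The arithmetic behind that limit is easy to check. The sketch below uses only the numbers quoted above; the function name `fonts_needed` is my own, and the one-glyph-per-code-point assumption is a simplification, since real fonts often need more glyphs than code points.

```python
import math

MAX_GLYPHS = 2 ** 16  # a 16-bit glyph index can address 65,536 glyphs

# Code point counts quoted in the answer above.
ASSIGNED = {"Unicode 8": 120_737, "Unicode 13": 143_859}


def fonts_needed(code_points: int, per_font: int = MAX_GLYPHS) -> int:
    """Lower bound on fonts needed if every code point got exactly one glyph."""
    return math.ceil(code_points / per_font)


for version, count in ASSIGNED.items():
    print(version, count, fonts_needed(count))
```

Under that assumption Unicode 8 needs at least two font files and Unicode 13 at least three, which is why full coverage is delivered as font families rather than as a single font.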
"So what about font collections?" I hear you ask. Why not use multiple fonts and support all unicode that way? Well now, you've just described Adobe's Sans Pro, and Google's Noto (which are the same font).
As for the "how hard can it be": a uniform style for all glyphs in Unicode, across 129 established written scripts on this planet, each with their own typesetting rules? Incredibly hard. You may think fonts are just files with pictures for letters, and someone types a letter, that picture shows up: that is not how fonts work, and isn't how fonts have worked since the late 1980's.
Modern fonts are the typographic equivalent of a game ROM: sure, it's not much use without the hardware or software to run that ROM on, but all the things that actually matter are in the ROM. Similarly, modern fonts contain all the information for typesetting. Not just pictures, they contain the metadata, the metrics, the positioning and substitution rules for arbitrary sequences, with separate rule sets for each written script that OpenType supports, mandatory and optional ligatures, language-specific character replacements for letters at the start/middle/final position in a word, or in isolation, character repositioning relative to arbitrarily complex sequences of other characters either before or after it, arbitrarily complex sequence replacements with other arbitrarily complex sequences, possible bitmap fallbacks for small-point rendering, hinting instructions on how to properly rasterize vector graphics that are inherently not aligned to any particular pixel grid, and more. A modern font is a ridiculously complex application, that a font engine consults to figure out how to typeset sequences of code points.
Making a (set of) Unicode-encompassing font(s) that looks good for all contexts is a vast team effort.
So: "Why isn't there a font that contains all Unicode glyphs?", because that's been technically impossible since 2001. We can, and do, make font families that cover all of Unicode, but with 129 different scripts all with their own typesetting rules, it's a lot of work, and almost (almost) not worth the effort compared to only covering a subset of all languages.
And as for this:
Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.
Just because you didn't know about them, doesn't mean they don't exist, with millions of people who are familiar with them. They exist =)
They're even open source, go out and thank the people who made them!
There is GNU Unifont. It aims to contain all Unicode, except Apple Emoji.
You will probably find what you are looking for at the following links.
Unicode Character Table
HTML Character Entity References
Huge List of Unicode Symbols
List of Unicode Characters of Category “Other Symbol”
This one is fun for finding a particular character, since you can draw what you are searching for:
Unicode Character Recognition
Can't enter unicode character with Alt+ even with EnableHexNumpad
Basic Questions
Q: How many characters are in Unicode?
A: The short answer is that as of Version 13.0, the Unicode Standard contains 143,859 characters. The long answer is rather more complicated, because of all the different kinds of characters that people might be interested in counting.
Unicode font
A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet.
Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters (143,859 characters, with Unicode 13.0).
...
No single "Unicode font" includes all the characters defined in the present revision of ISO 10646 (Unicode) standard, as more and more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (about half the number of characters encoded in Unicode).
As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages.
Enjoy!

Dynamically generating Ge'ez unicodes

Hi. If you look at the image above, you will see a set of very weird-looking characters displayed along with some Latin characters. The weird ones are Eritrean characters. They are the characters we use in my country. So, to go straight to the point, I am hoping to create even the simplest possible bit of software or maybe even a batch file (if possible) to help me make these characters usable on the web and make PCs understand and display them when they are typed, just as Arabic, Hindu, Chinese... characters are used. I think, since the question of 'creating a language' is rare or because I may not know the correct term to use, when I searched the internet for a tutorial or even a freelancer or anything, all I got was... nothing. So I am hoping that if anyone can give me a step-by-step guide, or even just a clue about how to create this, it would be very helpful.
Thanks.
Your question asks "how to create a language", so I will describe all the pieces that need to be in place for a new language (or more accurately, writing system). You ask specifically about the Eritrean alphabet, so I will provide specific examples of how that is supported on modern systems, and try to provide you pointers for the pieces you are missing. The answer is long, and provides lots of links, to support the two explanations.
To work with a script like Ge'ez (also known as Ethiopic, the script used to write Amharic in Ethiopia and Tigrinya in Eritrea) you need a few things. The first is a way to encode the characters; a set of numbers representing each character, that the computer can use to represent the text. Luckily, Unicode has become widespread, and Unicode is designed to be a universal character set that includes all of the world's languages. Unicode 3.0 introduced Ethiopic in the range U+1200-U+137F, and later versions added supplements of more obscure characters in the ranges U+1380-U+139F, U+2D80-U+2DDF and U+AB00-U+AB2F. If you wanted to support a language that Unicode didn't yet support, you would either need to use the private use area and define your own mapping of characters to code points, or submit a proposal to have your script added to Unicode; for example, see the proposal for Ethiopic.
Now, Unicode is just a character set; an abstract mapping between characters and numbers. To actually transmit these characters as a sequence of bytes, you use a character encoding. There are many encodings; some of them, like ASCII and ISO-8859-1 only cover a subset of the full Unicode character set, while others, like UTF-8 and UTF-16, cover the full range. For documents on the web, UTF-8 is the recommended character encoding; you should never use anything else if you can help it. In UTF-8, you can write Ge'ez directly in the document, for example: ኤርትራ. One thing to watch out for is that some programs (especially on Windows) will offer you "Unicode" as an encoding, when they mean UTF-16; you want to make sure to choose UTF-8, as it's more efficient and more compatible with a wider variety of software.
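As a quick illustration of the difference between the character set and its encodings, here is a short Python sketch that encodes the example word in UTF-8 and UTF-16. The variable names and the tiny HTML fragment are my own; the byte counts follow from the encodings themselves.

```python
text = "ኤርትራ"  # the example word from the paragraph above

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # UTF-16 without a byte order mark

# Four code points: three bytes each in UTF-8, two bytes each in UTF-16.
print(len(text), len(utf8), len(utf16))
print(utf8.hex(" "))

# Once ASCII markup surrounds the word, UTF-8 catches up, because
# every ASCII character costs one byte in UTF-8 and two in UTF-16.
page = "<p>" + text + "</p>"
print(len(page.encode("utf-8")), len(page.encode("utf-16-le")))
```

For the bare word UTF-16 is actually shorter; it is the ASCII-heavy markup of a typical web page that tips the balance towards UTF-8.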
If you are using encodings that don't cover the full range of Unicode, or you don't have a good way to type those characters, and you are writing HTML or XML, you can use numeric character references instead. To do this, you write the Unicode code point of the character you want to refer to between &# and ;. You can write the number in decimal, or in hexadecimal prefixed with an x. For example, ሀ can be written &#4608; or &#x1200; (the semicolon at the end is important; it wasn't working for you in the comments because you were missing it).
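If you need to produce such references for a whole string, a few lines of Python will do it. This is a minimal sketch; the function name `to_numeric_refs` is my own, and it simply escapes every non-ASCII character, which is more than strictly necessary in a UTF-8 document.

```python
import html


def to_numeric_refs(text: str, hexadecimal: bool = False) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII is left untouched
        elif hexadecimal:
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(f"&#{ord(ch)};")
    return "".join(out)


print(to_numeric_refs("\u1200"))                    # decimal form
print(to_numeric_refs("\u1200", hexadecimal=True))  # hexadecimal form
print(html.unescape("&#4608;") == "\u1200")         # the reverse direction
```

The standard library's `html.unescape` handles the reverse direction, so the round trip can be checked without a browser.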
Now that you have a character set, and a way of encoding it, you need a way to display it. Some scripts are easier to display than others. For all scripts, you need a font; a file defining how each character looks. A font contains a collection of glyphs, or drawings of each character. Some scripts, like the Latin alphabet (the alphabet used for English and most European languages) are relatively simple; each character is a separate glyph, and how they are drawn doesn't depend on what characters come before or after (though diacritics and ligatures can make it a little more complicated). Others, like Arabic and Indic scripts are written in cursive, where letters join to each other so how they are drawn can depend on the characters near them. These languages require special rendering support like Uniscribe or DirectWrite on Windows, Pango on Linux, or advanced font technology like Apple Advanced Typography or Graphite.
Luckily, Ge'ez is a fairly simple writing system, that doesn't require any specialized rendering support or advanced font systems. Each of the characters is a separate glyph, and it doesn't require any reordering. So a normal OpenType font, displayed with the rendering systems already available on most computers, will do the job. But you still need the font in order to be able to display the characters. To create your own font, you can use FontForge (a free/open source tool), Fontographer, FontLab Studio, or other similar software.
For Ethiopic, you don't need to create your own. There are numerous fonts available that include the Ethiopic characters, but one that I would recommend is Abyssinica SIL from SIL (the Summer Institute of Linguistics), which does a lot of great work for minority languages and writing systems. Their fonts are available under a free license, that allows you to use the font, redistribute the font, and modify the font, so their fonts are quite flexible and can be used in a wide variety of situations. Windows ships with Nyala, which includes Ethiopic characters, since Windows Vista, and Ebrima, which added support for Ethiopic characters in Windows 8; so people on Windows Vista or later should be able to view Ethiopic characters already. Mac OS X ships with Kefa as of 10.6.
Once you have the font, you will be able to view Ethiopic characters. But other people reading your documents might not have those fonts (if they are using an older version of Windows or Mac OS X, if they didn't install all of the fonts that came with Windows, or the like), in which case the characters will probably show up as boxes or question marks on their machine. You could give those people a redistributable font like Abyssinica SIL, or they could buy a font that includes Ethiopic characters, but that can be inconvenient. For working with word processor documents or plain text, that's probably the best you can do; they will need the font installed on their computer to be able to display the text. If you create a PDF on your computer, it should embed the fonts that it needs to display the text, so creating a PDF can be a convenient way to include uncommon fonts with your document.
On a web page, you can use web fonts to link to a font from your stylesheet, allowing the user's web browser to load that font for that web page. Web fonts are supported all the way back to IE 6, and in recent versions of most other web browsers, so they are actually quite widely supported. Different web browsers support different font file formats (EOT, TTF, OpenType, SVG, and WOFF), and slightly different syntaxes for the CSS (older versions of IE are based on an older draft), so it can be a bit tricky to make a page that is compatible with all browsers. Luckily, people have automated that process. Some web fonts are available online from Google Web Fonts or FontSquirrel, but sadly, I couldn't find any Ethiopic fonts already hosted. However, you can upload a font to FontSquirrel, and it will convert it into all of the major formats, and provide example CSS that will work on all modern browsers. Note that you should only do this with fonts that allow web embedding; not all fonts do. Since Abyssinica SIL is available under the Open Font License, you can use it, and I've run it through FontSquirrel for you; you can see how it works (check out the Glyphs & Languages tab), or download the kit. To use it, just put the font files (.ttf, .eot, .svg, and .woff) on your server in the same directory as your CSS, and include the following in your CSS:
@font-face {
font-family: 'abyssinica_silregular';
src: url('abyssinicasil-r.eot');
src: url('abyssinicasil-r.eot?#iefix') format('embedded-opentype'),
url('abyssinicasil-r.woff') format('woff'),
url('abyssinicasil-r.ttf') format('truetype'),
url('abyssinicasil-r.svg#abyssinica_silregular') format('svg');
font-weight: normal;
font-style: normal;
}
Now that you know how to encode Ethiopic, view Ethiopic characters, and share documents containing Ethiopic characters, you are probably going to want to type them into documents. If you are using HTML, you could just type the numeric character reference described above. In other documents, you could just copy and paste the characters from a chart of all of them, like the Wikipedia page. But that would become pretty cumbersome. Depending on your system and settings, you can also use Unicode Hex Input to enter arbitrary Unicode characters, but that is also cumbersome.
To fully support typing a script on your computer, you need a keyboard layout or input method. Some scripts can be typed with a simple keyboard layout, which says which keys correspond to which characters. If a script has more characters than there are keys on the keyboard, Shift and Alt (or Option on the Mac) can be used to map to more characters. Dead keys can also be used to expand the range of characters that you type; dead keys are sequences of two or more keystrokes that produce a single glyph; for example, on Mac OS X, to type "á", you can type Option-E A. To create a keyboard layout on Windows, you can use the Microsoft Keyboard Layout Creator. Mac OS X uses an XML format for keyboard layouts, so you can create one directly, or use Ukelele from SIL to create one more easily. On systems using X11 (like Linux), you can create your own XKB layouts.
If you need more characters than can be supported with modifiers and dead keys, like typing Chinese or Japanese, then you need a full-fledged input method. An input method allows you to run arbitrary code to map what someone types into the text it produces; for example, in a Japanese input method, you may type a phonetic representation of what you are writing, and it will show you a drop down list of possible characters that match that representation, allowing you to choose the appropriate ones. Windows provides the Input Method Manager for writing input methods, Mac OS X the Input Method Kit, and X11 has a few ways to do it, such as SCIM and iBus.
The standard input method for Ethiopic makes extensive use of dead keys. It looks like the most popular existing input method for Ethiopic is Keyman, which is a commercial input method that works on Mac and Windows, and in addition there's a free variant, KMFL, that works on Linux. SIL has keyboard downloads for this input method; they also have a keyboard layout for Mac OS X which uses dead keys to achieve the same thing. Mac OS X has more extensive dead key support, so it doesn't require an input method to support this form of input, while on Windows you need to use an input method like Keyman to be able to enter input this way. Google has a free input method for Windows, Google Input Tools for Windows, which supports Amharic, and allows you to customize its input schemes; you could try adapting their Amharic support for Tigrinya.
If you just need to support input on a web site, you could do this in JavaScript, by writing an input method in JavaScript that transliterates from what someone types into Ethiopic. I do not know of any existing frameworks for doing this; however, I have found Korean and Japanese input methods implemented in JavaScript. You could take a look at how those are implemented. Upon looking further, I've found that Tavultesoft, who make Keyman, also have KeymanWeb, a JavaScript based input method that you can buy and embed in your site. MediaWiki also has an input method extension Narayam, that includes a JavaScript based input method for MediaWiki based sites like Wikipedia, which includes an experimental Amharic input method. There is also a draft W3C IME API, which helps provide an interface between web apps and native IMEs, as well as JavaScript based IMEs. Given that it's still a draft, I don't know if it is yet supported anywhere.
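The core of such a transliterating input method is a table lookup with longest-match-first scanning. Here is a minimal sketch of that idea in Python rather than JavaScript. The table covers only one consonant row, and its Latin keys are simply taken from the Unicode character names (ETHIOPIC SYLLABLE YA, YU, and so on); it is a hypothetical scheme, not the layout used by Keyman or any other existing input method.

```python
# Hypothetical romanization table for a single consonant row (U+12E8..U+12EE).
TABLE = {
    "ya": "\u12E8", "yu": "\u12E9", "yi": "\u12EA", "yaa": "\u12EB",
    "yee": "\u12EC", "ye": "\u12ED", "yo": "\u12EE",
}
LONGEST = max(len(key) for key in TABLE)


def transliterate(typed: str) -> str:
    """Convert typed Latin text to Ethiopic, trying the longest key first."""
    out = []
    i = 0
    while i < len(typed):
        for size in range(min(LONGEST, len(typed) - i), 0, -1):
            chunk = typed[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += size
                break
        else:
            out.append(typed[i])  # pass through anything the table lacks
            i += 1
    return "".join(out)


print(transliterate("yaayo"))
print(transliterate("ya yu"))
```

Trying the longest key first matters because "yaa" must win over "ya" followed by a stray "a". A real input method would also need to handle editing and partial input as the user types.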
With all the above (a character set, encoding, fonts, rendering support, and an input method), you will be able to create, share, and view documents in your script. If that's all you need, great; the above will allow you to work with documents in a given script. But for full support for a language on your computer, not just its script or writing system, there are two more pieces that you need: a locale, and your software to be localized (translated and adapted) for your language.
A locale specifies how programs should manipulate text in a given script, language, culture, and/or encoding. There are many common text processing operations that programs do: displaying numbers, displaying dates and times, sorting strings or names, and so on. How these should work can differ based on the language, script, and culture of the person using the program; for instance, in Swedish "ü" is sorted along with "y", while in English and German it's sorted along with "u". Differences may not be based on language: both Mexico and Spain use Spanish, but in Mexico numbers are displayed with . as the decimal separator (1½ is written "1.5"), while in Spain , is used as the decimal separator (1½ is written "1,5"). A locale specifies all of these rules. Because the locale can vary based on language, culture, and sometimes other factors, the language and country are usually used to specify the locale, and other information can be used as well.
The most widely used standard for naming locales is RFC 4646 (BCP 47). Locales are usually specified as "ln-CC" with the language code ln and country code CC: US English is en-US, British English is en-GB, and French in France is fr-FR. If more information needs to be specified, it can be included. For instance, Serbian can be written with either Latin or Cyrillic, and so Serbian in Serbia can be either sr-Latn-CS or sr-Cyrl-CS. Tigrinya in Eritrea is written ti-ER.
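To show how these tags break down, here is a small Python sketch that splits a tag into language, script and region. It handles only the simple forms used in the examples above, and the names `LocaleTag` and `parse_tag` are my own; a full BCP 47 parser also has to deal with variants, extensions, numeric region codes and more.

```python
from typing import NamedTuple, Optional


class LocaleTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LocaleTag:
    """Split a simple language[-Script][-REGION] tag; other subtags are ignored."""
    parts = tag.split("-")
    language = parts[0].lower()
    script = region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha() and script is None:
            script = part.title()   # scripts are four letters, e.g. Latn
        elif len(part) == 2 and part.isalpha() and region is None:
            region = part.upper()   # regions are two letters, e.g. ER
    return LocaleTag(language, script, region)


for tag in ["en-US", "sr-Latn-CS", "sr-Cyrl-CS", "ti-ER"]:
    print(tag, parse_tag(tag))
```

The subtag lengths are what distinguish a script from a region here, which is why the order of the parts after the language does not need to be hard-coded.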
There are a variety of different formats for defining the rules that a particular locale has. Windows uses NLP files, a custom format that can be created with Microsoft Locale Builder. POSIX (Unix/Linux) locales can be created using localedef. Many systems these days are moving towards the Unicode Common Locale Data Repository (CLDR), which specifies a standardized format for locale data as well as a comprehensive database of locales for many of the world's languages. ICU is a library for C and Java (and used by many other environments) for manipulating Unicode text according to Unicode rules and locale data; they have a good browser for the data from the CLDR and their own locale data. For example, take a look at their entry for ti-ER.
Finally, for full support of a language, you need to translate the software itself into that language. There are, of course, many pieces of software, and each one contains many strings that need to be translated. Some software is not designed to be translated; it has not been internationalized. Some software can only be translated by whoever created it; the strings are built into the program and cannot be easily modified by a third party. But it is possible to localize some software, translating it to your language and culture. If the software has already been localized for several other languages and cultures, it is likely to be flexible enough to support a new language, and if it uses formats that are easily modifiable for localization information, it can be modified by third parties.
For instance, applications on Mac OS X store their localization data in separate files within the application bundle. There is a tool called AppleGlot (you need to register for the Mac Developer Program and go to the downloads area to find it) which can help you extract that data, provide a file with all of the strings which need to be translated, and allow you to combine the translations with the application again once you have finished them. For open source software, such as much of the software available on Linux, you can work with the developers to provide a translation. Some software uses gettext for translation strings, which uses the PO file format that you can edit using poedit. Some uses Qt, for which you can use Qt Linguist. Or, for dealing with a wide variety of formats, you can use a commercial offering like Swordfish or Transifex.
Of course, no one person can do all of the above; it takes many people working together to build support for a new language on modern computer systems. This is all intended to be a high-level tour of all of the components that go into language support for a given language, with references that will help you follow up on whichever aspects you would like to work on, as well as demonstrate what already works for Tigrinya and the Ge'ez script.
If they are Unicode characters they should be displayable just like characters of any other language. I googled it and found this, hopefully they're the same ones you're asking about:
የ ዩ ዪ ያ ዬ ይ ዮ
ዸ ዺ ዻ ዼ ዽ ዾ
See? No extra work required to display them on web browsers or other programs.
These are characters from the Unicode Ethiopic block (U+1200..U+137F), encoded in UTF-8:
Line 1:
የ = 0xE1 0x8B 0xA8 = U+12E8 = ETHIOPIC SYLLABLE YA
ዩ = 0xE1 0x8B 0xA9 = U+12E9 = ETHIOPIC SYLLABLE YU
ዪ = 0xE1 0x8B 0xAA = U+12EA = ETHIOPIC SYLLABLE YI
ያ = 0xE1 0x8B 0xAB = U+12EB = ETHIOPIC SYLLABLE YAA
ዬ = 0xE1 0x8B 0xAC = U+12EC = ETHIOPIC SYLLABLE YEE
ይ = 0xE1 0x8B 0xAD = U+12ED = ETHIOPIC SYLLABLE YE
ዮ = 0xE1 0x8B 0xAE = U+12EE = ETHIOPIC SYLLABLE YO
Line 2:
ዸ = 0xE1 0x8B 0xB8 = U+12F8 = ETHIOPIC SYLLABLE DDA
ዺ = 0xE1 0x8B 0xBA = U+12FA = ETHIOPIC SYLLABLE DDI
ዻ = 0xE1 0x8B 0xBB = U+12FB = ETHIOPIC SYLLABLE DDAA
ዼ = 0xE1 0x8B 0xBC = U+12FC = ETHIOPIC SYLLABLE DDEE
ዽ = 0xE1 0x8B 0xBD = U+12FD = ETHIOPIC SYLLABLE DDE
ዾ = 0xE1 0x8B 0xBE = U+12FE = ETHIOPIC SYLLABLE DDO
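If you want to check a table like this yourself, a short Python sketch can regenerate it from the characters alone. The helper name describe is just something I made up for the example; the character names come from Python's bundled unicodedata module, so they reflect whatever Unicode version your Python ships with.

```python
import unicodedata

def describe(ch):
    """Format one character as: char = UTF-8 bytes = code point = Unicode name."""
    utf8 = " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))
    return f"{ch} = {utf8} = U+{ord(ch):04X} = {unicodedata.name(ch)}"

if __name__ == "__main__":
    # The two lines of Ethiopic syllables from the post above.
    for line in ("የዩዪያዬይዮ", "ዸዺዻዼዽዾ"):
        for ch in line:
            print(describe(ch))
        print()
```

Every Ethiopic syllable takes three bytes in UTF-8, which is why each row starts with 0xE1.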
Using Ethiopic characters on web pages is mostly a matter of fonts these days. (You may also have a problem with entering them conveniently, but this depends on your authoring environment.) People using e.g. Windows 7 have at least one font containing them, but old computers typically lack such fonts. The following fonts contain them (there may be others):
Code2000, which was freeware; the author has disappeared, so its status is unclear
Unifont, a free bitmap font
FreeSerif, a free font
Nyala, distributed with some versions of Windows
SunExt-A, a free font
Fixedsys Excelsior, a free bitmap font I suppose (haven’t tested)
I would probably use FreeSerif as a downloadable font, with @font-face.
I just came across the same problem, but there is an easy solution: Google now provides web fonts for many languages, including Ethiopic:
http://www.google.com/fonts/earlyaccess
To write Amharic or Tigrinya in web forms you can simply use the Any Key Firefox add-on, https://addons.mozilla.org/en-US/firefox/addon/any-key/, and there is one for Chrome too.
But to create an editor using JavaScript, you can look at this site, http://www.lexilogos.com/keyboard/amharic.htm, and try to figure out how they implemented it.
You probably want to look at
http://senamirmir.org/
which unless I am wrong has done what you want to do.
If you don't like their fonts, Abyssinica SIL should be fine too (but it only includes one writing style).
Keyboard layout support will vary from system to system. To target *nix-like systems, you need a layout merged into
http://www.freedesktop.org/wiki/Software/XKeyboardConfig/
@Samaya, by now you have probably got the answer you were looking for, but let me add what I think. Based on your original question, I think you are trying to develop a small piece of software that can be selected as a utility (a language feature) and used to display the Ge'ez alphabet without installing a separate Ge'ez application. For that, I reckon, the utility should be developed so that it can be selected as a language feature in the operating system (like Amharic in Windows, for instance). However, your subsequent comments seem to focus more on displaying Ge'ez characters on the web. As many have suggested, we already have that functionality. But if you still want to develop an application for it, I would suggest keeping an array of Unicode code points (U+1260 for በ, for instance) and a matching array of keyboard transcriptions of your choice ("be" for በ, for instance). Your application would then look up the transcription as keys are entered and match it to the code point to show the right Ge'ez letter. I am not sure I fully understood what you're looking for, but my colleagues and I did a project that included this type of work for one particular application. By the way, do you have to install Ge'ez software to view a website written in Tigrinya/Ge'ez? If so, check your browser version.
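A rough Python sketch of that transcription-table idea is below. The TRANSCRIPTION table and the transliterate function are hypothetical names for illustration, and the Latin key sequences are only one possible convention (built around the "be" for በ example); only the b-row of the script is filled in, so a real table would be much larger.

```python
# Hypothetical transcription table: Latin key sequences -> Ethiopic syllables.
# The key spellings are an assumed convention, not a standard.
TRANSCRIPTION = {
    "be": "\u1260",   # በ
    "bu": "\u1261",   # ቡ
    "bi": "\u1262",   # ቢ
    "ba": "\u1263",   # ባ
    "bie": "\u1264",  # ቤ
    "b": "\u1265",    # ብ
    "bo": "\u1266",   # ቦ
}

# Longest key length, so the matcher knows how far ahead to look.
MAX_KEY = max(len(key) for key in TRANSCRIPTION)

def transliterate(text):
    """Replace known key sequences with Ethiopic syllables, longest match first."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in TRANSCRIPTION:
                out.append(TRANSCRIPTION[chunk])
                i += size
                break
        else:
            # No table entry starts here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(transliterate("be bu bie"))
```

Matching the longest sequence first matters because "b", "bi" and "bie" all start the same way; a live input method would have to do the same thing incrementally as each key arrives.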