I am using Tesseract 4.0.0-beta.1-370-g8b64 on Ubuntu 16.04, built from source. I have a directory of font files, and according to the documentation on fonts, custom fonts need to be listed in training/language_specific.sh and langdata/font_properties. It also seems that fonts are listed in font_properties in a particular format, but I can't find that format documented anywhere. Is there a link or set of instructions explaining how to do it?
It's described in the Tesseract Training Wiki:
https://github.com/tesseract-ocr/tessdoc/blob/master/tess3/Training-Tesseract-3.03%E2%80%933.05.md#the-font_properties-file
Each line of the font_properties file is formatted as follows: fontname italic bold fixed serif fraktur
where fontname is a string naming the font (no spaces allowed!), and italic, bold, fixed, serif and fraktur are all simple 0 or 1 flags indicating whether the font has the named property.
Example:
timesitalic 1 0 0 1 0
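As a rough illustration of the format quoted above, here is a minimal Python sketch that builds one font_properties line. The helper name `font_properties_line` is hypothetical and not part of Tesseract; it only assumes the five-flag layout described in the wiki.

```python
def font_properties_line(fontname, italic=False, bold=False,
                         fixed=False, serif=False, fraktur=False):
    """Build one line: fontname italic bold fixed serif fraktur."""
    # The wiki states that the font name must not contain spaces.
    if not fontname or any(ch.isspace() for ch in fontname):
        raise ValueError("fontname must be non-empty and contain no whitespace")
    flags = (italic, bold, fixed, serif, fraktur)
    # Each property is written as a simple 0 or 1 flag.
    return " ".join([fontname] + [str(int(bool(flag))) for flag in flags])


if __name__ == "__main__":
    print(font_properties_line("timesitalic", italic=True, serif=True))
```

Calling it for an italic serif font reproduces the example line from the wiki; one such line per custom font would go into langdata/font_properties.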
I can't believe I have been stuck on this for an hour, but it seems that fonts for extended Unicode characters are not easily available as TTF/OTF for use on computers, especially with graphics software where Unicode fallback doesn't work.
Specifically, I am looking for the so-called Math bold script,
something like: 𝓯𝓸𝓷𝓽 (<- those are extended chars)
as in https://textfancy.com/font-converter/
as an image at: https://snipboard.io/fNYd7w.jpg
(because I am not sure we all see the same glyphs)
Note: what I am looking for is a standard TTF font whose normal glyphs look like those extended glyphs, meaning that the A looks like 𝓐, the B like 𝓑, and so on, so that I could use it as a normal font in any software.
The STIX math fonts support the Unicode Mathematical Alphanumeric Symbols block.
https://www.stixfonts.org/
https://github.com/stipub/stixfonts
(Note: the variable fonts don't include support for that block of characters; only the static fonts do.)
Please note the intended use of those Unicode characters, as pointed out in the STIX project:
The sans serif, fraktur, script, etc., alphabets in Plane 1 (U+1D400-U+1D4FF) are intended to be used only as technical symbols.
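To make concrete which code points are involved: in Unicode the Mathematical Bold Script letters form two contiguous runs, starting at U+1D4D0 for capital A and U+1D4EA for small a. The sketch below maps plain ASCII letters onto them, which is presumably what converter pages like the one linked in the question do. The function name `to_math_bold_script` is made up for illustration.

```python
BOLD_SCRIPT_CAPITAL_A = 0x1D4D0  # MATHEMATICAL BOLD SCRIPT CAPITAL A
BOLD_SCRIPT_SMALL_A = 0x1D4EA    # MATHEMATICAL BOLD SCRIPT SMALL A


def to_math_bold_script(text):
    """Replace ASCII letters with their Mathematical Bold Script code points."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_SCRIPT_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SCRIPT_SMALL_A + ord(ch) - ord("a")))
        else:
            # Digits, spaces and punctuation are left untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_math_bold_script("Demo font"))
```

Whether the result displays correctly still depends on an installed font that covers this block, such as the static STIX fonts mentioned above.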
I want to create a website that is readable not only in English, but I have problems with special characters. I've tried ASCII HTML.
Any ideas?
If you have trouble with the text component, there are three ways I can think of:
1) The proper way: find or generate a font from a fontset containing those characters. The docs describe how to use custom fonts:
<a-entity text="text: Hello World; font: ../fonts/CustomFnt.fnt;
fontImage: ../fonts/CustomFnt.png"></a-entity>
But you need to have a font file + a .png grid with the font images.
The docs provide a link to a tool for generating fonts, as well as a tutorial.
2) Check out Don McCurdy's custom font generator!
3) The workaround: you could make a transparent image containing your text and put it on an <a-plane>, like I did here.
I'm writing a Tcl/Tk application in which I would like to use Font Awesome icons.
In principle, this works nicely: just display a string/label with the correct Unicode character, and if the proper fonts are installed, it will render.
Now, on my dev machine I have Font Awesome installed as an ordinary font.
I cannot expect that on the deployment machines.
So I would like to find out whether the system can render a given character or whether it just uses a glyph-not-found placeholder. In the latter case, I would just fall back to some less-nice representation...
(I don't want my users to have to answer a question like "does this string look correct?")
E.g. the symbol "\uf16c" (U+F16C) displays as the stackoverflow-icon in my application; in my browser it is rendered as a glyph-not-found.
So, is there a way to find out programmatically, if any of the (used) system fonts provides the glyph for a given character?
Unfortunately, there isn't; it's outright missing functionality. The closest you can get is to get the actual font info for a character (which requires 8.6, I think) or to measure its width, but that doesn't really help:
% font actual TkFixedFont
-family Monaco -size 11 -weight normal -slant roman -underline 0 -overstrike 0
% font actual TkFixedFont \uf16c
-family Monaco -size 11 -weight normal -slant roman -underline 0 -overstrike 0
% font measure TkFixedFont \uf16c
14
(The character renders as the glyph-not-found symbol on this system with that font.)
xmgrace is wonderful, but it has some problems when dealing with miscellaneous characters.
How can I produce the script small l ($\ell$ in LaTeX) in xmgrace?
I believe the only way to do this is to specify a script-like system font. None of the standard ones are suitable so you will have to make sure that a suitable font is installed on your system.
You can change to any font by enclosing the name in
\f{}
e.g.
\f{Symbol}
or
\f{Century-Schoolbook-L-Bold_italic}
You can see a list of the available fonts (and their labels) by going to the Font tool in the Window menu of the xmgrace GUI.
After typing the special character you can return to your original font in a similar way, or by using \0 to get back to the default font 0.
This is wrt iText 2.1.6.
I have a string containing characters from different languages, for which I'd like to pick a single font (among the registered fonts) that has glyphs for all these characters. I would like to avoid a situation where different substrings in the string are printed using different fonts, if I already have one font that can display all these glyphs.
If there's no such single font, I would still like to pick a minimal set of fonts that covers the characters in my string.
I'm aware of FontSelector, but it doesn't seem to try to find a minimal set of fonts for the given text. Correct? How do I do this?
iText 2.1.6 is obsolete. Please stop using it: http://itextpdf.com/salesfaq
I see two questions in one:
Is there a font that contains all characters for all languages?
Allow me to explain why this is impossible:
There are 1,114,112 code points in Unicode. Not all of these code points are used, but the possible number of different glyphs is huge.
A simple font only contains 256 characters (1 byte per character), while a composite font uses CIDs from 0 to 65,535.
65,535 is much smaller than 1,114,112, which means that it is technically impossible to have a single font that contains all possible glyphs.
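The numbers above can be checked with a few lines of arithmetic. This is only a worked calculation; the constant names are made up for illustration.

```python
# Unicode code points run from U+0000 to U+10FFFF inclusive.
UNICODE_CODE_POINTS = 0x10FFFF + 1

# A simple font addresses glyphs with one byte; a composite font uses CIDs 0..65,535.
SIMPLE_FONT_SLOTS = 2 ** 8
COMPOSITE_FONT_SLOTS = 0xFFFF + 1

if __name__ == "__main__":
    print(UNICODE_CODE_POINTS)                          # 1114112
    print(SIMPLE_FONT_SLOTS, COMPOSITE_FONT_SLOTS)      # 256 65536
    # Number of full composite fonts needed to address every code point.
    print(UNICODE_CODE_POINTS // COMPOSITE_FONT_SLOTS)  # 17
```

Even ignoring unassigned code points, the full range is 17 times what a single composite font can address.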
FontSelector doesn't find a minimal set of fonts!
FontSelector doesn't look for a minimal set of fonts. You have to tell FontSelector which fonts you want to use and in which order! Suppose that you have this code:
FontSelector selector = new FontSelector();
selector.addFont(font1);
selector.addFont(font2);
selector.addFont(font3);
In this case, FontSelector will first look at font1 for each specific glyph. If it's not there, it will look at font2, and so on. Obviously font1, font2 and font3 will have some characters in common, each with a different glyph. For instance, the letter a may exist in all three fonts, drawn in three different typefaces. Which glyph is used depends on the order in which you added the fonts.
Bottom line:
Select a wide range of fonts that cover all the glyphs you need and add them to a FontSelector instance. Don't expect to find one single font that contains all the glyphs you need.
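Since FontSelector will not choose the fonts for you, the selection has to happen before the fonts are added. Below is a minimal, language-neutral sketch in Python of a greedy heuristic for that step. It is not part of iText: the function `pick_fonts` and the `coverage` mapping (font name to the set of characters the font has glyphs for) are assumptions for illustration, and the coverage data would have to be collected from the registered fonts first. Finding a truly minimal set is an instance of the set cover problem, so the greedy result is small but not guaranteed to be the smallest.

```python
def pick_fonts(text, coverage):
    """Greedily choose fonts until every character of text is covered.

    coverage: dict mapping a font name to the set of characters it can render
              (hypothetical input, gathered from the fonts beforehand).
    Returns (chosen font names in pick order, characters no font covers).
    """
    remaining = set(text)
    chosen = []
    while remaining:
        # Pick the font that covers the most still-uncovered characters.
        best = max(coverage,
                   key=lambda name: len(coverage[name] & remaining),
                   default=None)
        if best is None or not (coverage[best] & remaining):
            break  # no font helps with what is left
        chosen.append(best)
        remaining -= coverage[best]
    return chosen, remaining


if __name__ == "__main__":
    fonts = {
        "latin": set("abc"),
        "greek": set("αβ"),
        "both": set("abcαβ"),
    }
    print(pick_fonts("abαβ", fonts))
```

If a single font covers the whole string, the first pick already returns it alone. Otherwise the chosen fonts can be added to the FontSelector in the returned order, so that the font covering the most characters is consulted first.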