How to remove characters from a font file?

I've downloaded the open-source DejaVu font and want to use it as a web font, but even after converting it I get a large file. Since the website will only be in a few languages (Arabic, French, Amazigh), I don't need many of the characters.
So is there a way to browse the font file and delete the Unicode ranges I won't need?

Using FontForge, you may open Element->Font Info->Unicode Ranges. You will see all available ranges and you can select a whole Unicode range with a single click. Then, you can tune your selection and delete using Encoding->Detach & Remove Glyphs.
Also, you can use Edit->Select->Select by Script.

The easiest method I found is to use the pyftsubset tool from fontTools. Here's an example:
$ pyftsubset NotoSans-Regular.ttf \
--unicodes=U+0400-045F,U+0490-0491,U+04B0-04B1,U+2116 \
--output-file=NotoSans-Regular.cyrillic.woff2 \
--flavor=woff2
Note: WOFF2 output requires the Brotli library (e.g. pip install brotli).
I wrote a simple script around it which automates the whole process including generation of a CSS file after splitting the font file. You may find it here: https://github.com/johncf/ttf2web
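The same subsetting can be driven from Python through fontTools' subset module, which is what pyftsubset uses. A minimal sketch, assuming fontTools and brotli are installed; the range list and the helper names are hypothetical choices for Arabic, French and Tifinagh text, not a definitive list. Latin-script Amazigh also uses letters such as ɛ (U+025B) and ɣ (U+0263), which you would need to add if your content uses them.

```python
"""Build a pyftsubset --unicodes value and subset a font with fontTools."""

# Hypothetical selection of inclusive code-point ranges for the site's languages.
SITE_RANGES = [
    (0x0020, 0x007E),  # Basic Latin (printable ASCII)
    (0x00A0, 0x00FF),  # Latin-1 Supplement: French accented letters, « »
    (0x0152, 0x0153),  # Œ œ
    (0x0600, 0x06FF),  # Arabic
    (0x2000, 0x206F),  # General Punctuation: ’ … and similar
    (0x2D30, 0x2D7F),  # Tifinagh (Amazigh)
]


def merge_ranges(ranges):
    """Sort inclusive (start, end) ranges and merge overlapping or adjacent ones."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unicodes_arg(ranges):
    """Format ranges in the style of the example above, e.g. U+0400-045F,U+2116."""
    parts = []
    for start, end in merge_ranges(ranges):
        if start == end:
            parts.append(f"U+{start:04X}")
        else:
            parts.append(f"U+{start:04X}-{end:04X}")
    return ",".join(parts)


def subset_to_woff2(src_path, dst_path, ranges):
    """Keep only the given ranges and save the result as WOFF2."""
    # Imported here so the helpers above work even without fontTools installed.
    from fontTools import subset

    options = subset.Options()
    options.flavor = "woff2"  # needs the brotli module
    font = subset.load_font(src_path, options)
    subsetter = subset.Subsetter(options=options)
    codepoints = [cp for start, end in merge_ranges(ranges)
                  for cp in range(start, end + 1)]
    subsetter.populate(unicodes=codepoints)
    subsetter.subset(font)
    subset.save_font(font, dst_path, options)
    font.close()
```

For example, paste the output of unicodes_arg(SITE_RANGES) into the pyftsubset command line, or call subset_to_woff2("DejaVuSans.ttf", "DejaVuSans.subset.woff2", SITE_RANGES) directly (the file names are only examples).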

Related

How can a MATLAB program test whether MATLAB can render a particular font?

I would like to use some special characters in a MATLAB figure. How can my program ensure the fonts are available before using them?
listfonts() is not reliable. It claims "Zapf Dingbats" is available on my machine, but it is not (and text() renders using a default font instead). listfonts() always includes the standard PostScript fonts. I suppose that's because they are always available for PostScript output, but I'm interested in displayed figures. Likewise, uisetfont() and MATLAB -> Preferences -> Fonts -> Custom both list "Zapf Dingbats" but render the sample using a default font.
Just looking for the font file doesn't work, either. For example, "Webdings" works fine on my main machine. However, on a second machine, "Webdings" is installed (there's a file /Library/Fonts/Webdings.ttf, and Word can use it), but MATLAB substitutes a default font.
I thought of one test: Create a small figure with one marker symbol, use print() to write it to a .png file, read in the file as data, compute a hash, and compare that hash with a stored value. Is there a less clumsy method?
I found Unicode equivalents for most of the symbols I need that work on both of my test machines. However, they too apparently depend on having the right fonts installed. For example, there are many Unicode versions of a square: hex codes 2588, 25a0, and 25fc work here, but 25fe, 2b1b, and 2bc0 render as blank. Is there a way to tell whether these characters are available?
I'm running R2017b under macOS version 10.13.5, and "set | grep LANG" displays "LANG=en_US.UTF-8".
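MATLAB's own font substitution cannot be queried this way, but as a first filter you can check whether a given font file maps a code point at all. A minimal sketch in Python, assuming the fontTools package is installed; the helper names are hypothetical. It only inspects the Unicode cmap of a single .ttf/.otf file, so it cannot tell you whether MATLAB will actually use that font (the Webdings case above shows the two can differ), and symbol-encoded fonts may have no Unicode cmap at all.

```python
def missing_codepoints(cmap, codepoints):
    """Return the code points that have no entry in a {codepoint: glyph_name} map."""
    return [cp for cp in codepoints if cp not in cmap]


def font_cmap(path):
    """Load the preferred Unicode cmap of a font file as a plain dict."""
    from fontTools.ttLib import TTFont  # pip install fonttools

    font = TTFont(path)
    try:
        # getBestCmap() returns None when the font has no Unicode cmap subtable.
        return dict(font.getBestCmap() or {})
    finally:
        font.close()
```

For example, missing_codepoints(font_cmap("/Library/Fonts/SomeFont.ttf"), [0x2588, 0x25A0, 0x25FE]) lists which of those squares that (hypothetical) font lacks.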

Cleaning up text files with sed?

I have a bunch of text files that need cleaning up. Example:
`E..4B?#.#...
..9J5.....P0.z.n9.9.. ........
.k#a..5
E...y^#.r...J5..
E...y_#.r...J5..
..9.P..n9..0.z............
….2..3..9…n7…..#.yr`
Is there any way sed can do this, for example by noticing weird patterns?

For this answer, I will assume that you have access to standard Unix/Linux tools.
Your file might be in some word-processor format. If so, the best way to get rid of the junk is to open it with that program. You may be able to find out which program with file:
$ file mysteryfile
mysteryfile: Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.1 ....
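file works largely by recognising "magic" byte signatures at the start of a file. A small Python sketch of the same idea, checking only a few signatures relevant to word-processor output; the table is an illustrative subset, not file's actual database, and the function name is hypothetical.

```python
# A few well-known leading byte signatures (illustrative, far from complete).
MAGIC = [
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "OLE2 compound document (e.g. .doc, .xls)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .odt)"),
    (b"%PDF-", "PDF document"),
    (b"{\\rtf", "RTF document"),
]


def sniff(data):
    """Return a description for the first signature that prefixes the data."""
    for magic, description in MAGIC:
        if data.startswith(magic):
            return description
    return "unknown"
```

For example, read the first few bytes with open("mysteryfile", "rb").read(16) and pass them to sniff.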
If that doesn't work, there is a standard Unix utility for extracting text from binary files. It is called strings:
$ strings mysteryfile
Some
Recovered Text
...
The behavior of strings can be fine-tuned with several options. See man strings.
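To show roughly what strings does, here is a minimal Python sketch that pulls out runs of at least four printable ASCII characters (four is also the default minimum length of strings). It is an approximation, not a reimplementation: it ignores other encodings and the options described in man strings, and the function name is hypothetical.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII (space to ~) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

For example, print("\n".join(extract_strings(open("mysteryfile", "rb").read()))).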

How to find parameters supported in Tesseract OCR config file

I want to know what parameters the config file used by Tesseract OCR accepts, how to write a config file, etc.
I can't find any documentation about this on their site. How can I determine what parameters are supported, and what they mean?

I found these instructions in the link below. They are about writing the config file and where to place it:
The config file is a simple text file without a BOM and with Unix end-of-line marks (on Windows you can use an advanced text editor such as Notepad++ to achieve this).
If you use the tesseract executable, this is the only way to change Tesseract parameters.
The config file should be located in your tessdata/configs directory. Have a look there for some examples.
There is a list of all the variables, plus a description of each one, at http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version. Note that it's for Tesseract 3.02; things may be different in other versions.

Tesseract v3.04 now offers the command line option --print-parameters, so you can call tesseract --print-parameters to get a list of the 678 (!) configurable parameters, their default values, and a short description:
Tesseract parameters:
editor_image_xpos 590 Editor image X Pos
editor_image_ypos 10 Editor image Y Pos
editor_image_menuheight 50 Add to image height for menu bar
editor_image_word_bb_color 7 Word bounding box colour
editor_image_blob_bb_color 4 Blob bounding box colour
editor_image_text_color 2 Correct text colour
...and many, many more
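To search or filter that long list, you can parse the output. A minimal Python sketch, assuming each line has the form name, value, description separated by whitespace, as in the excerpt above; a parameter whose default is an empty string would be misparsed by this whitespace split, so treat the result as approximate. The function name is hypothetical.

```python
def parse_parameters(text):
    """Parse 'name value description' lines into {name: (value, description)}."""
    params = {}
    for line in text.splitlines():
        fields = line.split(None, 2)
        # Skip blank lines and the "Tesseract parameters:" header.
        if len(fields) < 2 or line.endswith(":"):
            continue
        name, value = fields[0], fields[1]
        description = fields[2] if len(fields) == 3 else ""
        params[name] = (value, description)
    return params
```

For example, pass it the stdout of subprocess.run(["tesseract", "--print-parameters"], capture_output=True, text=True), assuming your build prints the list to standard output.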

It's just a plain text file containing space-delimited key/value pairs for Tesseract config variables, each on a separate line; for instance:
interactive_display_mode T
tessedit_display_outwords T
There are several standard config files -- such as digits and hocr -- in Tesseract's tessdata/configs folder.
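Since the format is just one key/value pair per line, a config file can also be generated from code. A minimal Python sketch (the function name is hypothetical) that writes UTF-8 without a BOM and forces Unix line endings even on Windows, matching the requirements quoted earlier:

```python
def write_tesseract_config(path, params):
    """Write one 'key value' pair per line, UTF-8 without BOM, '\\n' line endings."""
    # newline="\n" disables newline translation, so Windows also gets '\n'.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for key, value in params.items():
            f.write(f"{key} {value}\n")
```

Saved as tessdata/configs/myconfig, the file can then be used with something like tesseract image.png out myconfig.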

Create Numbers file and open it with Numbers on iPad

I would like to do a task that is quite simple on other operating systems but not so trivial on iOS: I want to create a file and open it in Numbers.
I can preview the file with UIDocumentInteractionController and then offer it to the user to open.
This seems a reasonable solution to me. However, I need to offer a proper file format. I suppose CSV and XLS would be reasonable to implement and would most probably work, but I would still like to use the native Numbers format if possible. However, I can't find any info about this file format.
Basically, this task is about exporting data to another app and then working further with them.

I don't know of a library that can create native Numbers files. There are, however, some libraries that allow creating XLS files. Since Numbers fully supports XLS, this is probably the way to go.
There is a commercial library available that might work on the iPhone (costs $200): http://www.libxl.com/
As for free XLS libraries, I only know xlwt, a Python module. You could set up a web service that creates an XLS file for your app, using xlwt on the server side.

If you want to pass information to Numbers, you can probably also use CSV files, but you must be aware of a few things. There are two kinds of CSV files: the comma-separated version (used in English-speaking countries) and the semicolon-separated version (used in continental Europe).
Comma-separated CSV files look like this, for example:
"ID","First Name","Last Name","Salary"
1,"John","Malkovich",3400.20
2,"Fred","Astaire",2000.60
The second kind of CSV file is semicolon-separated and uses a comma as the decimal mark. It looks like this:
"ID";"First Name";"Last Name";"Salary"
1;"John";"Malkovich";3400,20
2;"Fred";"Astaire";2000,60
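The two dialects differ only in the field separator and the decimal mark, so one writer can produce both. A minimal sketch in Python (hypothetical helper names; the logic ports easily to your iOS code): text is always quoted, numbers are written bare, and Decimal values keep their trailing zeros, so 3400.20 stays 3400.20 instead of becoming 3400.2. It is hand-rolled rather than built on Python's csv module because that module has no setting for a locale-specific decimal mark.

```python
from decimal import Decimal


def format_field(value, decimal_mark):
    """Quote text (doubling embedded quotes); write numbers bare."""
    if isinstance(value, str):
        return '"' + value.replace('"', '""') + '"'
    if isinstance(value, Decimal):
        return str(value).replace(".", decimal_mark)
    return str(value)


def to_csv(rows, separator=","):
    """Build CSV text; a ';' separator implies ',' as the decimal mark."""
    decimal_mark = "," if separator == ";" else "."
    lines = [separator.join(format_field(v, decimal_mark) for v in row) for row in rows]
    return "\n".join(lines) + "\n"
```

Lines end with \n here; RFC 4180 specifies CRLF, so change the join if the consumer insists on it.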
On the Macintosh, Numbers expects a different format depending on the Region setting. If you have your Region set to the US, it will expect the first kind. If you choose Germany, it will expect the second kind.
I don't know what kind of files Numbers on the iPad expects.
Another alternative would be using copy and paste. Try to copy tab separated text into the clipboard.

I hope this may help you. I've contacted libxl team and they responded with the link to the demo version of their iPhone library: http://www.libxl.com/download/libxl-iphone.zip

How can I search and replace in a PDF document using Perl?

Does anyone know of a free Perl program (command line preferable), module, or any other way to search and replace text in a PDF file without opening it in an editor?
Basically I want to write a program (preferably in Perl) to automate replacing certain words (e.g. our old address) in a few hundred PDF files. I could use any program that supports command-line arguments. I know there are many modules on CPAN that manipulate or create PDFs, but none that I've seen offers any sort of simple search and replace.
Thanks in advance for any and all advice!

Take a look at CAM::PDF. More specifically the changeString method.

How did you generate those PDFs in the first place? Searching and replacing in the original sources and regenerating the PDFs seems more viable. Editing PDFs directly can be very difficult, and I'm not aware of any free tools that make it easy.
