Errors using ps2ascii on some files - operating-system

What does FC_WEIGHT refer to? Please advise: Although a text file was produced it is large and consists largely of numbers which makes it hard to proofread. I need relatively good confidence the output matches the input. If there is a fix please point me to it and bring joy to my dull drab existence.
I entered the command:
ps2ascii /Users/dwstclair/Desktop/untitled3/stmt_20181130.pdf a.txt
The result was:
DEBUG: FC_WEIGHT didn't match
On the off chance a default font was missing on my system
I added DroidSansFallback.ttf (no joy)

Basically, I wouldn't use ps2ascii. It's long been deprecated and doesn't even ship in more recent versions of Ghostscript.
Instead, consider using the txtwrite device. It works with a wider range of input (in particular it can use ToUnicode CMaps in PDF files, which ps2ascii cannot) and is capable of producing output in encodings other than ASCII, which is quite useful. Even if you aren't working with non-Latin languages, the ability to preserve ligatures (e.g. fi, ffi, ffl) is convenient.
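As a rough illustration, the txtwrite invocation can be wrapped in a small Python helper so the arguments are explicit. The function names and file names here are made up for the example, and the sketch assumes the Ghostscript executable is on the PATH as gs; the switches themselves are ordinary Ghostscript options.

```python
import subprocess


def txtwrite_command(pdf_path, txt_path, gs="gs"):
    """Build a Ghostscript command line that extracts text via txtwrite."""
    return [
        gs,                          # assumed executable name; may differ on Windows
        "-q",                        # suppress the startup banner
        "-dNOPAUSE",                 # do not pause between pages
        "-dBATCH",                   # exit after the last input file
        "-sDEVICE=txtwrite",         # the text extraction device
        f"-sOutputFile={txt_path}",  # where the extracted text is written
        pdf_path,
    ]


def extract_text(pdf_path, txt_path):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit."""
    subprocess.run(txtwrite_command(pdf_path, txt_path), check=True)
```

Calling extract_text("stmt_20181130.pdf", "a.txt") would then stand in for the original ps2ascii command.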
The actual answer to your question is 'don't worry about it'.
FC_WEIGHT refers to the weight of a font (light, regular, bold, ExtraBold etc). This message can only arise when you are using FontConfig and Ghostscript is enumerating the available fonts from FontConfig, trying to find a match for a missing font in the input. It means that a candidate font did not match the target font's weight.
Since you aren't going to use the font, it doesn't affect you.

Related

Swift: Unicode transformations: How to generate a rainbow infinity symbol

In Xcode, when developing for iOS, "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}" is a rainbow flag.
"\u{1F3F3}" is a white flag, and "\u{1F308}" is a rainbow. The middle symbols "\u{FE0F}\u{200D}" are invisible symbols used to join these two together to make the rainbow flag symbol.
I am trying to combine unicode characters to make a rainbow infinity symbol, but not exactly sure how to implement this.
I'm not sure whether there is an existing Unicode character or Apple API I can use for this, but I would appreciate learning how to do it.
As an alternative, I wouldn't mind having an infinity symbol over the rainbow flag either (like the Apple anti-LGBT flag incident).
Emoji fonts are still just fonts. If they don’t contain a specific glyph, then they cannot display that glyph. The reason “🏳️‍🌈” looks like a rainbow flag is because someone drew a picture of a rainbow flag and then defined their font in such a way that the sequence <U+1F3F3, U+FE0F, U+200D, U+1F308> would be displayed using that specific image. Much like how someone first had to define the precise shape of the letter “A” in their font and then apply that glyph to the codepoint U+0041.
There is no image-rendering code that instinctively knows how to apply the colours of 🌈 to the shape of 🏳️ and then automatically generates a new glyph on the fly. It’s all explicitly pre-defined.
U+200D is the so-called Zero Width Joiner (ZWJ), so emoji sequences using that character are appropriately named Zero Width Joiner Sequences. They were originally invented by Apple to support emoji that weren’t part of the Unicode standard (in particular, variants of 💏, 💑, and 👪️ with different gender configurations), but later other vendors jumped on board as well and nowadays they are officially part of Unicode as an alternative way for defining new emoji without having to encode entirely new characters. Currently, about a third of all officially recommended emoji are ZWJ sequences.
In theory, any person can make up their own ZWJ sequences just by joining existing characters together (as was their original intent). In your case, “♾️+ZWJ+🌈” or <U+267E, U+FE0F, U+200D, U+1F308> would be an obvious sequence for a rainbow-coloured infinity symbol. You just have to create your own font containing the glyph you want, and then distribute that font to other people so that they can see the same glyph as you. There are just a few problems:
Making fonts with colourful glyphs is not easy. I couldn’t tell you whether there even exist freely available tools for that task.
There are four different formats for emoji fonts (used by Apple, Google, Microsoft, and Mozilla respectively) and they generally do not work on each other’s platforms, so you would need to create not just one, but several fonts unless you don’t care about people on other operating systems.
Installing your own fonts is not possible on most mobile phones, so your custom emoji would mostly only be available to desktop users.
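To make the structure of such sequences concrete, here is a small Python sketch that assembles the rainbow flag and the proposed rainbow infinity sequence and lists their code points. The helper name describe is invented for the example. The infinity sequence is only the hypothetical one suggested above: without a font that defines a glyph for it, it will most likely show as two separate symbols.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
VS16 = "\uFE0F"  # VARIATION SELECTOR-16, requests emoji presentation


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# The existing, font-supported sequence for the rainbow flag.
rainbow_flag = "\U0001F3F3" + VS16 + ZWJ + "\U0001F308"

# Hypothetical sequence: valid text, but no standard font draws it as one glyph.
rainbow_infinity = "\u267E" + VS16 + ZWJ + "\U0001F308"

for code_point, name in describe(rainbow_infinity):
    print(code_point, name)
```

Building the string is the easy part; whether it displays as a single image depends entirely on the font, as described above.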

How UTF8/Unicode adapt to new writing systems?

An example to clarify my question:
The Hongkongers' native language is Cantonese; however, we all write in a different language: Mandarin Chinese. The two languages are kind of similar, and Hongkongers are educated to write in Mandarin Chinese.
Cantonese doesn't have a writing system. Although we are still happy with Mandarin as our written language, suppose Hongkongers one day decided to develop a 'Cantonese script' containing characters that do not yet exist. How should UTF-8/Unicode/fonts change to accommodate these new characters?
I mean, who would change the UTF-8/Unicode/font standards? How exactly would Linux/Windows have to be modified in order to display these newly created characters?
(The example is just to make my question clear. We're not talking about politics ;D )
The Unicode coding space has over 1,000,000 code points, and only about 10% of them have been allocated, so there is a lot of room for new characters (even though some areas of the coding space have been set apart for use other than added characters). The Unicode Consortium, working in close cooperation with the relevant body at ISO, assigns code points to new characters on the basis of proposals that demonstrate actual usage or, in some cases, plans with a solid basis and widespread support.
Thus, if a new script were designed and there was a large community that would seriously use it, it would be added, with its characters, into Unicode after due proposals and discussion.
It would then be up to font manufacturers to add glyphs for such characters. This might take a long time, but if there is strong enough need, new fonts and enhancements to existing fonts would emerge.
No change to UTF-8 or other Unicode transfer encodings would be needed. They already encode the entire coding space, whether code points are assigned to characters or not.
Rendering software would need no modifications, unless there are some specialties in the writing system. Normal characters would be rendered just fine, as soon as suitable fonts are available.
However, if the characters added were outside the Basic Multilingual Plane (BMP), the “16-bit subset of Unicode”, both rendering and processing (and input) would be problematic. Many programming languages and programs effectively treat Unicode as if it were a 16-bit code and run into problems (possibly solvable, but still) when characters outside the BMP are used. If the writing system had, say, 10,000 characters, it is quite possible that it would have to be allocated outside the BMP.
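The two points above (UTF-8 already covers the whole coding space, and characters outside the BMP need two 16-bit units) can be checked with a short Python sketch. The function name encoded_sizes is invented for the example, and U+50000 is used on the assumption that it is still unassigned at the time of writing.

```python
def encoded_sizes(code_point):
    """Return (UTF-8 byte count, UTF-16 code unit count) for one code point."""
    ch = chr(code_point)
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-le")) // 2  # two bytes per code unit
    return utf8_bytes, utf16_units


print(encoded_sizes(0x4E2D))   # a BMP character (a CJK ideograph)
print(encoded_sizes(0x20000))  # first code point of Plane 2, outside the BMP
print(encoded_sizes(0x50000))  # assumed unassigned, yet it still encodes
```

The code points outside the BMP take four bytes in UTF-8 and a surrogate pair in UTF-16, which is where software that assumes one 16-bit unit per character goes wrong.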
The Unicode committee adds new characters as they see fit. Then fonts add support for the new characters. Operating systems should not require changes simply to display the new characters. Typing the characters would generally require updates or plug-ins to an operating system's input methods.

How can I substitute one glyph for another in an OpenType PostScript OTF font file?

I'm trying to use fonts from the Nitti Basic family for programming. These fonts are packaged as OpenType PostScript OTF files.
Its U+002D (HYPHEN-MINUS) glyph works well as a hyphen, but not so well as a minus.
For example, it doesn't line up with the horizontal bar of the plus sign.
On the other hand, Nitti's glyph for U+2212 (MINUS) is perfect as a minus (of course), and this is what I need when programming. It's not feasible for me to actually use codepoint U+2212; after all, U+002D is what you get when you press the minus sign on the keyboard and it's what programming languages use for subtraction.
So instead I'd like to steal the glyph from U+2212 and use it for U+002D, so that that character looks like a minus sign.
How can I do it?
Update: Yes, it is possible to use U+002D as a hyphen in source code.
As mentioned above, a minus sign is what I need.
I agree with Jukka, there are tools to do this.
However, please don't forget that a font is usually protected by contracts very similar to those for software. In this case, for example, the link you provided points to a legal document that reads (among much else):
"Except as permitted herein, you may not rename, modify, adapt,
translate, reverse engineer, decompile, disassemble, alter or
otherwise copy the Bold Monday Font Software."
Notice the fact that you're not permitted legally to change this font. If you read the rest of the agreement you'll see a lot of restrictions on the actual use of the font as well. Make sure you're not breaking your license by what you are doing...
For posterity, here's how to do it:
Obtain Adobe's AFDKO font tools and install them.
Put the OTF files into an empty directory.
Run ttx *.otf to convert the OTF files to TTX (XML).
Edit each TTX file in a text editor:
In the cmap section, change occurrences of hyphen to minus. This table maps characters to glyphs. Character U+002D was originally mapped to the hyphen glyph; this change maps it to the minus glyph.
Over the whole file, change occurrences of NittiBasic to NittiBasicM and Nitti Basic to Nitti Basic M. This will distinguish the modified version of the font from the original once it's installed.
Rename the TTX files, replacing Nitti Basic with Nitti Basic M.
Run ttx -b *.ttx to convert the TTX files back to OTF.
Finally, install the newly-created OTF files.
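The cmap change in the steps above can also be sketched directly in Python with the fontTools library (the same project that provides ttx). This is only an illustration under assumptions: the function names are invented, fontTools must be installed separately, and the sketch does not perform the renaming step, so the result would still carry the original font name.

```python
def remap_code_point(cmap, source_cp, target_cp):
    """Make target_cp use the glyph currently mapped from source_cp.

    cmap is a dict of code point -> glyph name, as found in a fontTools
    cmap subtable. Returns True if the mapping was changed.
    """
    if source_cp not in cmap:
        return False
    cmap[target_cp] = cmap[source_cp]
    return True


def hyphen_to_minus(in_path, out_path):
    """Rewrite every Unicode cmap subtable so U+002D shows the U+2212 glyph."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(in_path)
    for table in font["cmap"].tables:
        if table.isUnicode():
            remap_code_point(table.cmap, 0x2212, 0x002D)
    font.save(out_path)
```

Keeping the dictionary logic separate from the file handling makes it easy to try the remapping on a plain dict before touching a font file.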
Tools like FontForge can be used to edit a font in a simple manner.
Note that in programming, too, HYPHEN-MINUS has multiple uses: as a minus sign, but also (in some languages) as allowed in identifiers, as well as in comments, where it usually appears in the role of hyphen. In some uses, a HYPHEN glyph will look odd.

How to prevent line breaks with jasper-reports HTML export when using textfield truncation?

Using iReport 4.5.0, I'm setting these two properties and values:
net.sf.jasperreports.text.truncate.at.char=true
net.sf.jasperreports.text.truncate.suffix=...
The intent is to add "..." to the end of textfields whenever they must be truncated, and that the truncation determination happens at the character level, rather than at the word level. This works as expected when exporting to PDF. However, when exporting to HTML, the last truncated token (with the suffix appended) will often, though not always, wrap incorrectly. (It does this even though StretchType is set to No Stretch.) Example:
If I change net.sf.jasperreports.text.truncate.at.char=false (so that it breaks on words instead of characters) it seems to work more often, but only because word breaks usually leave more space for the suffix. The unexpected line wrapping still occurs with word breaks, especially if I increase the length of the given suffix.
My best guess is that the HTML exporter measurement isn't precisely calculating the width required by the given suffix (if it's calculating it at all).
Can anyone confirm?
Any suggestions as to a workaround?
It seems that, with StretchType set to No Stretch, the HTML exporter should probably also set white-space:nowrap. However, although that would prevent the line from wrapping, the end of the suffix would be partially hidden (due to overflow:hidden styling).
"My best guess is that the HTML exporter measurement isn't precisely calculating the width required by the given suffix (if it's calculating it at all)."
I confirm that this is surely the reason.
But there's not really a simple workaround. Your PDF is good, so you're doing something right. Well... you're doing lots of things right. ;-)
In HTML you don't know--in a very fundamental way--the precise details of the font that will render the text. You can certainly specify the font. But the client machine might not have it. Or it might have one that is the same... but not quite the same. Or the client might choose to use a different font or different size via various client-side override mechanisms.
If you try different fonts, you should notice slightly different results. You may be able to find one that works better more often. (Clearly, this isn't 100% perfect.)
If you aren't using Font Extensions, then you should. If you are using Font Extensions, then you can specify the list of fonts in descending preference that ought to be used in the HTML. This should give you enough control to get behavior that is good in a large number of cases. Often you can make it perfect in all of the cases that you care about.

Where to get a reference image for any unicode code point?

I am looking for an online service (or collection of images) that can return an image for any unicode code point.
Unicode.org does not have an image for each one; consider, for example:
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=31cf
EDIT: I need to use these images programmatically, so the code chart PDFs provided at unicode.org are not useful.
The images in the PDF are copyrighted, so there are legal issues around extracting them. (I am not a lawyer.) I suspect that those legal issues prevent a simple solution from being provided, unless someone wants to go to the trouble of drawing all of those images. It might happen, but seems unlikely.
Your best bet is to download a selection of fonts that collectively cover the entire range of characters, and display the characters using those fonts. There are two difficulties with this approach: combining characters and invisible characters.
The combining characters can easily be detected from the Unicode database, and you can supply a base character (such as NBSP) to use for displaying them. (There is a special code point intended for this purpose, but I can't find it at the moment.)
Invisible characters could be displayed with a dotted square box containing the abbreviation for the character. Those you may have to locate manually and construct the necessary abbreviations. I am not aware of any shortcuts for that.
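The handling of the two difficult cases can be sketched in Python with the standard unicodedata module. The function name display_form and the bracketed label format are invented for the example, and the full character name is used in place of a hand-made abbreviation; actually rendering the returned string to an image with suitable fonts is left out.

```python
import unicodedata

NBSP = "\u00A0"  # base character for showing combining marks, as suggested above


def display_form(code_point):
    """Return a string suitable for rendering as the image of a code point."""
    ch = chr(code_point)
    category = unicodedata.category(ch)
    if category.startswith("M"):
        # Mn, Mc, Me: combining marks need a base character to attach to.
        return NBSP + ch
    if category in ("Cc", "Cf", "Zs", "Zl", "Zp"):
        # Controls, format characters and spaces have no visible glyph,
        # so show a label; fall back to U+XXXX when there is no name.
        return "[" + unicodedata.name(ch, f"U+{code_point:04X}") + "]"
    return ch
```

For example, display_form(0x0301) returns the combining acute accent placed on a no-break space, while display_form(0x200D) returns a label for the zero width joiner.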