How does one get all characters of a font with CTFontCopyCharacterSet() in Swift, for macOS?
The issue occurred when implementing the approach from an OSX: CGGlyph to UniChar answer in Swift.
func createUnicodeFontMap() {
// Get all characters of the font with CTFontCopyCharacterSet().
let cfCharacterSet: CFCharacterSet = CTFontCopyCharacterSet(ctFont)
//
let cfCharacterSetStr = "\(cfCharacterSet)"
print("CFCharacterSet: \(cfCharacterSet)")
// Map all Unicode characters to corresponding glyphs
var unichars = [UniChar](…NYI…) // NYI: lacking unichars for CFCharacterSet
var glyphs = [CGGlyph](repeating: 0, count: unichars.count)
guard CTFontGetGlyphsForCharacters(
ctFont, // font: CTFont
&unichars, // characters: UnsafePointer<UniChar>
&glyphs, // UnsafeMutablePointer<CGGlyph>
unichars.count // count: CFIndex
)
else {
return
}
// For each Unicode character and its glyph,
// store the mapping glyph -> Unicode in a dictionary.
// ... NYI
}
What to do with CFCharacterSet to retrieve the actual characters has been elusive. Autocompletion on the cfCharacterSet instance shows no relevant methods.
And Core Foundation > CFCharacterSet appears to have methods for creating another CFCharacterSet, but nothing that provides an array|list|string of unichars from which to create a mapped dictionary.
Note: I'm looking for a solution which is not specific to iOS as in Get all available characters from a font which uses UIFont.
CFCharacterSet is toll-free bridged with the Cocoa Foundation counterpart NSCharacterSet, and can be bridged to the corresponding Swift value type CharacterSet:
let charset = CTFontCopyCharacterSet(ctFont) as CharacterSet
Then the approach from NSArray from NSCharacterSet can be used to enumerate all Unicode scalar values of that character set (including non-BMP points, i.e. Unicode scalar values greater than U+FFFF).
CTFontGetGlyphsForCharacters() expects non-BMP characters as surrogate pairs, i.e. as an array of UTF-16 code units.
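For instance (a quick check, not part of the final function), a non-BMP scalar splits into two UTF-16 code units:
let heart: UnicodeScalar = "💖" // U+1F496
print(Array(heart.utf16)) // [55357, 56470], i.e. the surrogate pair 0xD83D 0xDC96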
Putting it together, the function would look like this:
func createUnicodeFontMap(ctFont: CTFont) -> [CGGlyph : UnicodeScalar] {
let charset = CTFontCopyCharacterSet(ctFont) as CharacterSet
var glyphToUnicode = [CGGlyph : UnicodeScalar]() // Start with empty map.
// Enumerate all Unicode scalar values from the character set:
for plane: UInt8 in 0...16 where charset.hasMember(inPlane: plane) {
for unicode in UTF32Char(plane) << 16 ..< UTF32Char(plane + 1) << 16 {
if let uniChar = UnicodeScalar(unicode), charset.contains(uniChar) {
// Get glyph for this `uniChar` ...
let utf16 = Array(uniChar.utf16)
var glyphs = [CGGlyph](repeating: 0, count: utf16.count)
if CTFontGetGlyphsForCharacters(ctFont, utf16, &glyphs, utf16.count) {
// ... and add it to the map.
glyphToUnicode[glyphs[0]] = uniChar
}
}
}
}
return glyphToUnicode
}
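For example (a hypothetical call site; Helvetica is just a stand-in font):
let font = CTFontCreateWithName("Helvetica" as CFString, 12.0, nil)
let glyphToUnicode = createUnicodeFontMap(ctFont: font)
print(glyphToUnicode.count) // number of glyph -> scalar mappings found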
You can do something like this.
let cs = CTFontCopyCharacterSet(font) as NSCharacterSet
let bitmapRepresentation = cs.bitmapRepresentation
The format of the bitmap is defined in the reference page for CFCharacterSetCreateWithBitmapRepresentation.
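A minimal sketch of decoding it, assuming we only look at the BMP (the first 8192 bytes, where bit (n & 7) of byte (n >> 3) is set when code point n is a member):
import CoreText
import Foundation

let font = CTFontCreateWithName("Helvetica" as CFString, 12.0, nil)
let cs = CTFontCopyCharacterSet(font) as NSCharacterSet
let bitmap = cs.bitmapRepresentation // Data

var members: [UnicodeScalar] = []
for (byteIndex, byte) in bitmap.prefix(8192).enumerated() where byte != 0 {
    for bit in 0..<8 where (byte >> bit) & 1 != 0 {
        // Surrogate code points are never members, so the failable init
        // only skips values that cannot form a UnicodeScalar anyway.
        if let scalar = UnicodeScalar(UInt16(byteIndex * 8 + bit)) {
            members.append(scalar)
        }
    }
}
print(members.count)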
Related
In a class conforming to NSLayoutManagerDelegate I implement this method:
func layoutManager(_ layoutManager: NSLayoutManager,
shouldGenerateGlyphs glyphs: UnsafePointer<CGGlyph>,
properties props: UnsafePointer<NSLayoutManager.GlyphProperty>,
characterIndexes charIndexes: UnsafePointer<Int>,
font aFont: UIFont,
forGlyphRange glyphRange: NSRange) -> Int {
// First, make sure we'll be able to access the NSTextStorage.
guard let textStorage = layoutManager.textStorage
else { return 0 }
// Get the first and last characters indexes for this glyph range,
// and from that create the characters indexes range.
let firstCharIndex = charIndexes[0]
let lastCharIndex = charIndexes[glyphRange.length - 1]
let charactersRange = NSRange(location: firstCharIndex, length: lastCharIndex - firstCharIndex + 1)
var bulletPointRanges = [NSRange]()
var hiddenRanges = [NSRange]()
let finalGlyphs = UnsafeMutablePointer<CGGlyph>(mutating: glyphs)
// Generate the Middle Dot glyph using aFont.
let middleDot: [UniChar] = [0x00B7] // Middle Dot: U+00B7
var myGlyphs: [CGGlyph] = [0]
// Get glyphs for `middleDot` character
guard CTFontGetGlyphsForCharacters(aFont, middleDot, &myGlyphs, middleDot.count) == true
else { fatalError("Failed to get the glyphs for characters \(middleDot).") }
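// ... (the rest of the method, including its return value, is omitted here)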
}
The problem is that CTFontGetGlyphsForCharacters returns false when I type an emoji into the textview. I think it might have something to do with UTF-8 vs. UTF-16 but I'm kind of out of my depth a little here. Little help?
The font you are using does not have a glyph for that particular character.
The system maintains a list of "font fallbacks" for times when the specific font you are trying to look at does not have a glyph but another font might.
The list of fallbacks is given by CTFontCopyDefaultCascadeListForLanguages, but since you're at the point where you are being asked for the glyph from a particular font, it seems that fallback generation should be handled higher up in the chain.
You should probably return 0 to indicate that the layout manager should use its default behavior.
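As a quick illustration of the fallback mechanism (a sketch with stand-in values, separate from the delegate method), CTFontCreateForString() walks the cascade list and returns a font that can actually render the given string:
import CoreText

let baseFont = CTFontCreateWithName("Helvetica" as CFString, 12.0, nil)
let emoji = "💖" as CFString
let resolved = CTFontCreateForString(baseFont, emoji, CFRangeMake(0, CFStringGetLength(emoji)))
print(CTFontCopyPostScriptName(resolved)) // e.g. "AppleColorEmoji"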
As a generic solution, how can we get the unicode code point/s for a character or a string in Swift?
Consider the following:
let A: Character = "A" // "\u{0041}"
let Á: Character = "Á" // "\u{0041}\u{0301}"
let sparklingHeart = "💖" // "\u{1F496}"
let SWIFT = "SWIFT" // "\u{0053}\u{0057}\u{0049}\u{0046}\u{0054}"
If I am not mistaken, the desired function might return an array of strings, for instance:
extension Character {
func getUnicodeCodePoints() -> [String] {
//...
}
}
A.getUnicodeCodePoints()
// the output should be: ["\u{0041}"]
Á.getUnicodeCodePoints()
// the output should be: ["\u{0041}", "\u{0301}"]
sparklingHeart.getUnicodeCodePoints()
// the output should be: ["\u{1F496}"]
SWIFT.getUnicodeCodePoints()
// the output should be: ["\u{0053}", "\u{0057}", "\u{0049}", "\u{0046}", "\u{0054}"]
Suggestions for a more elegant approach would be appreciated.
Generally, the unicodeScalars property of a String returns a collection
of its unicode scalar values. (A Unicode scalar value is any
Unicode code point except high-surrogate and low-surrogate code points.)
Example:
print(Array("Á".unicodeScalars)) // ["A", "\u{0301}"]
print(Array("💖".unicodeScalars)) // ["\u{0001F496}"]
Up to Swift 3 there is no way to access
the unicode scalar values of a Character directly; it has to be
converted to a String first (for the Swift 4 status, see below).
If you want to see all Unicode scalar values as hexadecimal numbers
then you can access the value property (which is a UInt32 number)
and format it according to your needs.
Example (using the U+NNNN notation for Unicode values):
extension String {
func getUnicodeCodePoints() -> [String] {
return unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
}
}
extension Character {
func getUnicodeCodePoints() -> [String] {
return String(self).getUnicodeCodePoints()
}
}
print("A".getUnicodeCodePoints()) // ["U+41"]
print("Á".getUnicodeCodePoints()) // ["U+41", "U+301"]
print("💖".getUnicodeCodePoints()) // ["U+1F496"]
print("SWIFT".getUnicodeCodePoints()) // ["U+53", "U+57", "U+49", "U+46", "U+54"]
print("🇯🇴".getUnicodeCodePoints()) // ["U+1F1EF", "U+1F1F4"]
Update for Swift 4:
As of Swift 4, the unicodeScalars of a Character can be
accessed directly,
see SE-0178 Add unicodeScalars property to Character. This makes the conversion to a String
obsolete:
let c: Character = "🇯🇴"
print(Array(c.unicodeScalars)) // ["\u{0001F1EF}", "\u{0001F1F4}"]
I'm trying to use a Swift 3 CharacterSet to filter characters out of a String but I'm getting stuck very early on. A CharacterSet has a method called contains
func contains(_ member: UnicodeScalar) -> Bool
Test for membership of a particular UnicodeScalar in the CharacterSet.
But testing this doesn't produce the expected behaviour.
let characterSet = CharacterSet.capitalizedLetters
let capitalAString = "A"
if let capitalA = capitalAString.unicodeScalars.first {
print("Capital A is \(characterSet.contains(capitalA) ? "" : "not ")in the group of capital letters")
} else {
print("Couldn't get the first element of capitalAString's unicode scalars")
}
I'm getting Capital A is not in the group of capital letters yet I'd expect the opposite.
Many thanks.
CharacterSet.capitalizedLetters
returns a character set containing the characters in Unicode General Category Lt aka "Letter, titlecase". Those are
"Ligatures containing uppercase followed by lowercase letters (e.g., Dž, Lj, Nj, and Dz)" (compare Wikipedia: Unicode character property or
Unicode® Standard Annex #44 – Table 12. General_Category Values).
You can find a list here: Unicode Characters in the 'Letter, Titlecase' Category.
You can also use the code from
NSArray from NSCharacterset to dump the contents of the character
set:
extension CharacterSet {
func allCharacters() -> [Character] {
var result: [Character] = []
for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
result.append(Character(uniChar))
}
}
}
return result
}
}
let characterSet = CharacterSet.capitalizedLetters
print(characterSet.allCharacters())
// ["Dž", "Lj", "Nj", "Dz", "ᾈ", "ᾉ", "ᾊ", "ᾋ", "ᾌ", "ᾍ", "ᾎ", "ᾏ", "ᾘ", "ᾙ", "ᾚ", "ᾛ", "ᾜ", "ᾝ", "ᾞ", "ᾟ", "ᾨ", "ᾩ", "ᾪ", "ᾫ", "ᾬ", "ᾭ", "ᾮ", "ᾯ", "ᾼ", "ῌ", "ῼ"]
What you probably want is CharacterSet.uppercaseLetters which
Returns a character set containing the characters in Unicode General Category Lu and Lt.
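A quick check with that set:
let uppercase = CharacterSet.uppercaseLetters
if let capitalA = "A".unicodeScalars.first {
    print(uppercase.contains(capitalA)) // true
}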
This is working well for English:
public static func posOf(needle: String, haystack: String) -> Int {
return haystack.distance(from: haystack.startIndex, to: (haystack.range(of: needle)?.lowerBound)!)
}
But for foreign characters the returned value is always too small. For example, "का" is considered one unit instead of two.
posOf(needle: "काम", haystack: "वह बीना की खुली कोयला खदान में काम करता था।") // 21
I later use the 21 in NSRange(location:length:) where it needs to be 28 to make NSRange work properly.
A Swift String is a collection of Characters, and each Character
represents an "extended Unicode grapheme cluster".
NSString is a collection of UTF-16 code units.
Example:
print("का".characters.count) // 1
print(("का" as NSString).length) // 2
Swift String ranges are represented as Range<String.Index>,
and NSString ranges are represented as NSRange.
Your function counts the number of Characters from the start
of the haystack to the start of the needle, and that is different
from the number of UTF-16 code units.
If you need a "NSRange compatible"
character count then the easiest method would be use the
range(of:) method of NSString:
let haystack = "वह बीना की खुली कोयला खदान में काम करता था।"
let needle = "काम"
if let range = haystack.range(of: needle) {
let pos = haystack.distance(from: haystack.startIndex, to: range.lowerBound)
print(pos) // 21
}
let nsRange = (haystack as NSString).range(of: needle)
if nsRange.location != NSNotFound {
print(nsRange.location) // 31
}
Alternatively, use the utf16 view of the Swift string to
count UTF-16 code units:
if let range = haystack.range(of: needle) {
let lower16 = range.lowerBound.samePosition(in: haystack.utf16)
let pos = haystack.utf16.distance(from: haystack.utf16.startIndex, to: lower16)
print(pos) // 31
}
(See for example
NSRange to Range<String.Index> for more methods to convert between Range<String.Index>
and NSRange).
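As a side note (assuming Swift 4 or later, which goes beyond the Swift 3 of the question), Foundation can convert a Range<String.Index> into an NSRange directly:
if let range = haystack.range(of: needle) {
    let nsRange = NSRange(range, in: haystack)
    print(nsRange.location) // 31
}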
In Obj-C I can successfully append bytes enclosed inside two quotation marks like so:
[commands appendBytes:"\x1b\x61\x01"
length:sizeof("\x1b\x61\x01") - 1];
In Swift I supposed I would do something like:
commands.appendBytes("\x1b\x61\x01", length: sizeof("\x1b\x61\x01") - 1)
But this throws the error "invalid escape sequence in literal", how do I escape bytes in Swift?
As already said, in Swift a string stores Unicode characters, and not – as in (Objective-)C – an arbitrary (NUL-terminated) sequence of char, which is a signed
or unsigned byte on most platforms.
Now theoretically you can retrieve a C string from a Swift string:
let commands = NSMutableData()
let cmd = "\u{1b}\u{61}\u{01}"
cmd.withCString {
commands.appendBytes($0, length: 3)
}
println(commands) // <1b6101>
But this does not produce the expected result for non-ASCII characters:
let commands = NSMutableData()
let cmd = "\u{1b}\u{c4}\u{01}"
cmd.withCString {
commands.appendBytes($0, length: 3)
}
println(commands) // <1bc384>
Here \u{c4} is "Ä" which has the UTF-8 representation C3 84.
A Swift string cannot represent an arbitrary sequence of bytes.
Therefore it is better to work with a UInt8 array for (binary) control sequences:
let commands = NSMutableData()
let cmd : [UInt8] = [ 0x1b, 0x61, 0xc4, 0x01 ]
commands.appendBytes(cmd, length: cmd.count)
println(commands) // <1b61c401>
For text you have to know which encoding the printer expects.
As an example, NSISOLatin1StringEncoding is the ISO-8859-1 encoding, which is intended for "Western European" languages:
let text = "123Ö\n"
if let data = text.dataUsingEncoding(NSISOLatin1StringEncoding) {
commands.appendData(data)
println(commands) // <313233d6 0a>
} else {
println("conversion failed")
}
Unicode characters in Swift are entered differently - you need to add curly braces around the hex number:
"\u{1b}\u{61}\u{01}"
To avoid duplicating the literal, define a constant for it. Note that a Swift String has no length property; use withCString as shown above, together with the UTF-8 byte count (which, unlike sizeof in C, does not include a trailing NUL):
let toAppend = "\u{1b}\u{61}\u{01}"
toAppend.withCString {
    commands.appendBytes($0, length: toAppend.utf8.count)
}