greek char functions (for fun) - swift

Why can I write (in Swift)
func β(a: Double, b: Double) -> Double { exp( lgamma(a) + lgamma(b) - lgamma(a + b) ) }
or
func Γ(_ x: Double) -> Double { tgamma(x) }
but not
func √(_ x: Double) -> Double { return sqrt(x) }

See Identifiers in the Swift Language Reference:
Identifiers begin with an uppercase or lowercase letter A through Z, an underscore (_), a noncombining alphanumeric Unicode character in the Basic Multilingual Plane, or a character outside the Basic Multilingual Plane that isn’t in a Private Use Area. After the first character, digits and combining Unicode characters are also allowed.
β and Γ are each a "noncombining alphanumeric Unicode character in the Basic Multilingual Plane." √ is not (nor does it meet any of the other requirements).
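For example (π as the function name is my own illustration, not from the question), any letter-like BMP character works the same way:
// Fine: π (U+03C0) is a noncombining alphanumeric character in the BMP.
func π() -> Double { Double.pi }

// Rejected: √ (U+221A) is classified as a symbol, not alphanumeric,
// so it cannot start an identifier.
// func √(_ x: Double) -> Double { sqrt(x) }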
That said, √ is a valid operator, so you can write:
prefix operator √
prefix func √(_ x: Double) -> Double { return sqrt(x) }
print(√2)
The basic rules for Operators (from the document above) are:
Custom operators can begin with one of the ASCII characters /, =, -, +, !, *, %, <, >, &, |, ^, ?, or ~, or one of the Unicode characters defined in the grammar below (which include characters from the Mathematical Operators, Miscellaneous Symbols, and Dingbats Unicode blocks, among others). After the first character, combining Unicode characters are also allowed.
√ is included in "Mathematical Operators."

The square root character appears to be a valid operator identifier in Swift.
Character   Unicode Value   Unicode Name
√           U+221A          SQUARE ROOT
Have you tried the declaration without the return keyword? (Single-expression function bodies return implicitly.)
func √(_ x: Double) -> Double { sqrt(x) }

Related

Swift custom operators with Unicode combining characters

TL;DR
Can I coax the compiler to accept a combining character as a postfix operator?
The references at Swift.org and GitHub and this useful gist suggest that combining characters (e.g. U+0300 ff.) may serve as operators in Swift.
With judicious implementation (omitted here) I can say “Fiat Lux” and there is
prefix operator ‖ // Find the norm.
postfix operator ‖ // Does nothing.
func / // Scalar division.
which allows
let vHat = v / ‖v‖ // Readable as math.
or even
let v̂ = v / ‖v‖ // Loving it.
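For concreteness, here is one possible shape of the omitted implementation. The Vector struct and its components property are my own minimal assumptions, not the asker's actual code:
import Foundation

struct Vector {
    var components: [Double]
}

prefix operator ‖   // Find the norm.
postfix operator ‖  // Does nothing; it only closes the "bracket".

// Postfix operators bind tighter than prefix ones, so ‖v‖ parses as ‖(v‖).
postfix func ‖ (v: Vector) -> Vector { v }

prefix func ‖ (v: Vector) -> Double {
    sqrt(v.components.reduce(0) { $0 + $1 * $1 }) // Euclidean norm.
}

func / (v: Vector, s: Double) -> Vector {
    Vector(components: v.components.map { $0 / s }) // Scalar division.
}

let v = Vector(components: [3, 4])
let vHat = v / ‖v‖ // components are [0.6, 0.8]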
The OCD in me now wants to use the combining circumflex as a (topfix) operator like this:
let normalizedV = v̂ // Combining char is really a postfix.
So I leap in and try to write:
postfix operator ^ // Want this to be *combining* circumflex.
postfix func ^(v: Vector) -> Vector { v / ‖v‖ }
and can do it with plain old U+005E circumflex, but get (various) compiler errors when I try with the combining circumflex U+0302.
An operator name (or any other identifier) cannot start with the U+0302 character. Like all combining marks, it is an allowed “operator-character” but not an allowed “operator-head”. From Lexical Structure > Operators in “The Swift Programming Language”:
GRAMMAR OF OPERATORS
operator → operator-head operator-characters_opt
...
operator-character → U+0300–U+036F
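Since a combining mark is a legal operator-character, one workaround (my own sketch, reusing the hypothetical Vector above) is to lead with the plain U+005E circumflex as the head and let U+0302 tag along after it:
postfix operator ^̂ // U+005E (a valid operator-head) followed by U+0302 (allowed after the head).
postfix func ^̂ (v: Vector) -> Vector { v / ‖v‖ }

let unitV = v^̂ // Not quite topfix, but close.
Whether the combining mark renders sensibly over the caret depends entirely on the font, which may be reason enough to stop here.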

Is there an equivalent of Java's isSpaceChar function on Swift's Character class?

Is there an equivalent of Java's isSpaceChar function on the Character class in Swift's standard library?
In Java, this function returns true or false based on a character's ASCII value.
For example, for the space character " " the ASCII value is 32.
Unicode properties for Character and UnicodeScalar were introduced with Swift 5, see
SE-0211 Add Unicode Properties to Unicode.Scalar and
SE-0221 Character Properties
In particular, Character.isWhitespace (and the corresponding Unicode.Scalar.Properties.isWhitespace) is documented as:
A Boolean value indicating whether this character represents whitespace, including newlines.
Example for characters:
let char: Character = " "
if char.isWhitespace {
    // ...
}
Example for Unicode scalar values:
let value: UInt32 = 32
if let uc = UnicodeScalar(value), uc.properties.isWhitespace {
    // ...
}
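If you want Java's exact isSpaceChar semantics (true only for the Unicode categories Zs, Zl, and Zp, which is narrower than isWhitespace), you can build it from the scalar's general category. This extension is my own sketch, not a standard library API:
extension Unicode.Scalar {
    // Mirrors java.lang.Character.isSpaceChar: space, line, and paragraph separators.
    var isSpaceChar: Bool {
        switch properties.generalCategory {
        case .spaceSeparator, .lineSeparator, .paragraphSeparator:
            return true
        default:
            return false
        }
    }
}

let space: Unicode.Scalar = " "
space.isSpaceChar // true
let tab: Unicode.Scalar = "\t"
tab.isSpaceChar   // false: tab is whitespace, but not a space separator in Java's sense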

Swift string indexing combines "\r\n" as one char instead of two

I am dealing with strings containing \r\n in Swift 4.2. I ran into some strange-looking behavior of Swift string indexing: it appears that \r\n is treated as one character instead of two by the indexing methods. I wrote a piece of code to demonstrate this behavior:
var text = "ABC\r\n\r\nDEF"
func printChar(_ lower: Int, _ upper: Int) {
    let start = text.index(text.startIndex, offsetBy: lower)
    let end = text.index(text.startIndex, offsetBy: upper)
    print("\"" + text[start..<end] + "\"")
}
printChar(0, 1) // "A"
printChar(1, 2) // "B"
printChar(2, 3) // "C"
printChar(3, 4) // new line
printChar(4, 5) // new line (okay, what's going on here?)
printChar(5, 6) // "D"
printChar(6, 7) // "E"
printChar(7, 8) // "F"
The print result will be
"A"
"B"
"C"
"
"
"
"
"D"
"E"
"F"
Any idea why it's like this?
TLDR: \r\n is a grapheme cluster and is treated as a single Character in Swift because Unicode.
Swift treats \r\n as one Character.
Objective-C NSString treats it as two characters (in terms of the result from length).
On the swift-users forum someone wrote:
– "\r\n" is a single Character. Is this the correct behaviour?
– Yes, a Character corresponds to a Unicode grapheme cluster, and "\r\n" is considered a single grapheme cluster.
And the subsequent response posted a link to Unicode documentation, check out this table which officially states CRLF is a grapheme cluster.
Take a look at the Apple documentation on Characters and Grapheme Clusters.
It's common to think of a string as a sequence of characters, but when working with NSString objects, or with Unicode strings in general, in most cases it is better to deal with substrings rather than with individual characters. The reason for this is that what the user perceives as a character in text may in many cases be represented by multiple characters in the string.
The Swift documentation on Strings and Characters is also worth reading.
This overview from objc.io is interesting as well.
NSString represents UTF-16-encoded text. Length, indices, and ranges are all based on UTF-16 code units.
Another example of this is an emoji like 👍🏻. This single character is actually two Unicode scalars, U+1F44D U+1F3FB, which UTF-16 encodes as the four code units D83D DC4D D83C DFFB. But if you called count on a string with just that emoji you'd (correctly) get 1.
If you wanted to see the scalars you could iterate them as follows:
for scalar in text.unicodeScalars {
    print("\(scalar.value) ", terminator: "")
}
Which for "\r\n" would give you 13 10
In the Swift documentation you'll find why NSString is different:
The count of the characters returned by the count property isn’t always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
Thus this isn't really "strange" behaviour of Swift string indexing, but rather a result of how Unicode treats these characters and how String in Swift is designed. Swift string indexing goes by Character and \r\n is a single Character.
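If you need NSString-style counting (where CRLF is two units), count one of the lower-level views instead of Characters. A quick sketch using the question's string:
import Foundation

let text = "ABC\r\n\r\nDEF"
print(text.count)                // 8:  Characters; each \r\n is one grapheme cluster
print(text.unicodeScalars.count) // 10: \r and \n counted separately
print(text.utf16.count)          // 10: UTF-16 code units
print((text as NSString).length) // 10: same as utf16.count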

Why is there space at end of method names ending with an operator?

I've been learning Scala recently, and learned that for method names, if the method name ends in an operator symbol (such as defining unary_- for a class), and we specify the return type, we need a space between the final character of the method name and the : that lets us specify the return type.
def unary_-: Rational = new Rational(-numer, denom)
The reasoning I have heard for this is that : is also a legal part of an identifier, so we need a way of separating the identifier and the end of the method name. But letters are legal parts of identifiers too, so why don't we need a space if we just have a method name that is all letters?
To quote the language spec (p. 12, also available as HTML):
First, an identifier can start with a letter which can be followed by an arbitrary sequence of letters and digits. This may be followed by underscore ‘_’ characters and another string composed of either letters and digits or of operator characters.
That is, to include operator characters into identifiers, they must be joined with an underscore.
Looking at def unary_-: Rational = new Rational(-numer, denom), with the underscore joining unary with -:, the colon is interpreted as part of the method name if there is no space. Therefore, with the colon absorbed into the method name, the parser can't find the colon that precedes the return type.
scala> def test_-: Int = 1 // the method name is `test_-:`
<console>:1: error: '=' expected but identifier found.
scala> def test_- : Int = 1 // now the method name is `test_-`, and this is okay.
test_$minus: Int
If you want the colon to be part of the method name, it would have to look like this:
scala> def test_-: : Int = 1
test_$minus$colon: Int
Method names consisting only of letters don't have this problem, because an operator character like : can only be joined to the name after an underscore; with no underscore, the colon ends the name cleanly.

How can I get the Unicode code point(s) of a Character?

How can I extract the Unicode code point(s) of a given Character without first converting it to a String? I know that I can use the following:
let ch: Character = "A"
let s = String(ch).unicodeScalars
s[s.startIndex].value // returns 65
but it seems like there should be a more direct way to accomplish this using just Swift's standard library. The Language Guide sections "Working with Characters" and "Unicode" only discuss iterating through the characters in a String, not working directly with Characters.
From what I can gather in the documentation, they want you to get Character values from a String because it gives context. Is this Character encoded with UTF8, UTF16, or 21-bit code points (scalars)?
If you look at how a Character is defined in the Swift framework, it is actually an enum value. This is probably done due to the various representations from String.utf8, String.utf16, and String.unicodeScalars.
It seems they do not expect you to work with Character values but rather Strings and you as the programmer decide how to get these from the String itself, allowing encoding to be preserved.
That said, if you need to get the code points in a concise manner, I would recommend an extension like such:
extension Character
{
    func unicodeScalarCodePoint() -> UInt32
    {
        let characterString = String(self)
        let scalars = characterString.unicodeScalars
        return scalars[scalars.startIndex].value
    }
}
Then you can use it like so:
let char: Character = "A"
char.unicodeScalarCodePoint()
In summary, string and character encoding is a tricky thing when you factor in all the possibilities. In order to allow each possibility to be represented, they went with this scheme.
Also, remember this is a 1.0 release; I'm sure they will expand Swift's syntactic sugar soon.
I think there are some misunderstandings about Unicode here. Unicode itself is NOT an encoding; it does not transform grapheme clusters (or "characters", from a human reader's perspective) into any sort of binary sequence. Unicode is just a big table which collects all the grapheme clusters used by all the languages on Earth (unofficially, it even includes Klingon). The grapheme clusters are organized and indexed by code points (a 21-bit number in Swift, written like U+D55C), and you can find where a character sits in the big Unicode table by its code point.
Meanwhile, UTF-8, UTF-16, and UTF-32 are actual encodings. Yes, there is more than one way to encode Unicode characters into binary sequences. Which one to use depends on the project you are working on, but most web pages are encoded in UTF-8 (you can actually check this right now).
Concept 1: A Unicode code point is called a Unicode scalar in Swift.
A Unicode scalar is any Unicode code point in the range U+0000 to U+D7FF inclusive or U+E000 to U+10FFFF inclusive. Unicode scalars do not include the Unicode surrogate pair code points, which are the code points in the range U+D800 to U+DFFF inclusive.
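You can see the excluded surrogate range directly in the failable initializer (the values here are my own examples):
print(Unicode.Scalar(0xD800 as UInt32) == nil) // true: surrogate code points are not scalars
print(Unicode.Scalar(0x1F431 as UInt32)!)      // 🐱: a valid scalar outside the BMP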
Concept 2: A code unit is the abstract representation of the encoding.
Consider the following code snippet:
let theCat = "Cat!🐱"

for codeUnit in theCat.utf8 {
    print("\(codeUnit) ", terminator: "") // Each UTF-8 code unit, in decimal
}
print("")

for codeUnit in theCat.utf8 {
    print("\(String(codeUnit, radix: 2)) ", terminator: "") // Each UTF-8 code unit, in binary
}
print("")

for codeUnit in theCat.utf16 {
    print("\(codeUnit) ", terminator: "") // Each UTF-16 code unit, in decimal
}
print("")

for codeUnit in theCat.utf16 {
    print("\(String(codeUnit, radix: 2)) ", terminator: "") // Each UTF-16 code unit, in binary
}
print("")

for scalar in theCat.unicodeScalars {
    print("\(scalar.value) ", terminator: "") // Each Unicode scalar value (UTF-32 code unit), in decimal
}
print("")

for scalar in theCat.unicodeScalars {
    print("\(String(scalar.value, radix: 2)) ", terminator: "") // Each Unicode scalar value, in binary
}
"Abstract representation" means: a code unit is written as the base-10 (decimal) number equal to its base-2 encoding (the binary sequence). The encoding is made for machines; the code unit is for humans, as it is easier to read than a binary sequence.
Concept 3: The same visible character may be built from different code point sequences, depending on how its grapheme cluster is composed (this is why I spoke of "characters" from a human reader's perspective at the beginning).
Consider the following code snippet:
let precomposed: String = "\u{D55C}"
let decomposed: String = "\u{1112}\u{1161}\u{11AB}"
print(precomposed.count) // prints "1"
print(decomposed.count)  // prints "1": one Character, even though it is three code points
print(precomposed) // prints "한"
print(decomposed)  // prints "한"
The precomposed and the decomposed strings are visually and linguistically equal, but they consist of different code points, and therefore of different code units when encoded with the same encoding (see the following example):
for preCha in precomposed.utf16 {
    print("\(preCha) ", terminator: "") // prints 54620
}
print("")
for deCha in decomposed.utf16 {
    print("\(deCha) ", terminator: "") // prints 4370 4449 4523
}
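Note that despite the different code points and code units, Swift's String equality compares canonical equivalence, so the two forms are still equal:
print(precomposed == decomposed) // true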
Extra example
var word = "cafe"
print("the number of characters in \(word) is \(word.count)")
// prints "the number of characters in cafe is 4"
word += "\u{301}" // COMBINING ACUTE ACCENT, U+0301
print("the number of characters in \(word) is \(word.count)")
// prints "the number of characters in café is 4"
Summary: code points, a.k.a. the position indices of characters in Unicode, are independent of the UTF-8, UTF-16, and UTF-32 encoding schemes.
Further Readings:
http://www.joelonsoftware.com/articles/Unicode.html
http://kunststube.net/encoding/
https://www.mikeash.com/pyblog/friday-qa-2015-11-06-why-is-swifts-string-api-so-hard.html
I think the issue is that Character doesn't represent a Unicode code point. It represents a "Unicode grapheme cluster", which can consist of multiple code points.
Instead, UnicodeScalar represents a Unicode code point.
I agree with you, there should be a way to get the code directly from character. But all I can offer is a shorthand:
let ch: Character = "A"
for code in String(ch).utf8 { print(code) }
#1. Using Unicode.Scalar's value property
With Swift 5, Unicode.Scalar has a value property that has the following declaration:
var value: UInt32 { get }
A numeric representation of the Unicode scalar.
The following Playground sample code shows how to iterate over the unicodeScalars property of a Character and print the value of each Unicode scalar that composes it:
let character: Character = "A"
for scalar in character.unicodeScalars {
    print(scalar.value)
}
/*
prints: 65
*/
As an alternative, you can use the sample code below if you only want to print the value of the first unicode scalar of a Character:
let character: Character = "A"
let scalars = character.unicodeScalars
let firstScalar = scalars[scalars.startIndex]
print(firstScalar.value)
/*
prints: 65
*/
#2. Using Character's asciiValue property
If what you really want is to get the ASCII encoding value of a character, you can use Character's asciiValue. asciiValue has the following declaration:
var asciiValue: UInt8? { get }
Returns the ASCII encoding value of this Character, if ASCII.
The Playground sample code below shows how to use asciiValue:
let character: Character = "A"
print(String(describing: character.asciiValue))
/*
prints: Optional(65)
*/
let character: Character = "П"
print(String(describing: character.asciiValue))
/*
prints: nil
*/
Have you tried:
import Foundation

let characterString: String = "abc"
var numbers: [Int] = []
for character in characterString.utf8 {
    numbers.append(Int(character)) // each UTF-8 code unit as an Int
}
numbers
Output:
[97, 98, 99]
It may also be only one Character in the String.