Swift custom operators with Unicode combining characters

TL;DR
Can I coax the compiler to accept a combining character as a postfix operator?
The references at Swift.org and GitHub and this useful gist suggest that combining characters (e.g. U+0300 ff.) may serve as operators in Swift.
With judicious implementation (omitted here) I can say “Fiat Lux” and there is
prefix operator ‖ // Find the norm.
postfix operator ‖ // Does nothing.
func / (v: Vector, s: Double) -> Vector // Scalar division.
which allows
let vHat = v / ‖v‖ // Readable as math.
or even
let v̂ = v / ‖v‖ // Loving it.
The OCD in me wants now to use the combining circumflex as a (topfix) operator like this:
let normalizedV = v̂ // Combining char is really a postfix.
So I leap in and try to write:
postfix operator ^ // Want this to be *combining* circumflex.
postfix func ^(v: Vector) -> Vector { v / ‖v‖ }
and can do it with plain old U+005E circumflex, but get (various) compiler errors when I try with the combining circumflex U+0302.

An operator name (or any other identifier) cannot start with the U+0302 character. Like all combining marks, it is an allowed “operator-character” but not an allowed “operator-head”. From Lexical Structure > Operators in “The Swift Programming Language”:
GRAMMAR OF OPERATORS
operator → operator-head operator-characters_opt
...
operator-character → U+0300–U+036F
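Since combining marks are valid operator-characters but not operator-heads, one workaround is to lead with a character that is a legal head and let U+0302 ride along after it. A minimal sketch, assuming a toy Vector type and picking • (U+2022 BULLET, a legal operator-head) as the lead character; neither choice is from the original post:
struct Vector {
    var x = 0.0, y = 0.0
}

prefix operator ‖   // Find the norm.
prefix func ‖ (v: Vector) -> Double { (v.x * v.x + v.y * v.y).squareRoot() }

postfix operator ‖  // Does nothing; lets ‖v‖ read as math.
postfix func ‖ (v: Vector) -> Vector { v }

func / (v: Vector, s: Double) -> Vector { Vector(x: v.x / s, y: v.y / s) }

// • is a valid operator-head, and the combining circumflex U+0302
// is a valid operator-character after the first character.
postfix operator •̂
postfix func •̂ (v: Vector) -> Vector { v / ‖v‖ }

let v = Vector(x: 3, y: 4)
let vHat = v•̂   // Vector(x: 0.6, y: 0.8)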

Related

Greek char functions (for fun)

Why can I write (in Swift)
func β(a: Double, b: Double) -> Double { exp( lgamma(a) + lgamma(b) - lgamma(a + b) ) }
or
func Γ(_ x: Double) -> Double { tgamma(x) }
but not
func √(_ x: Double) -> Double { return sqrt(x) }
See Identifiers in the Swift Language Reference:
Identifiers begin with an uppercase or lowercase letter A through Z, an underscore (_), a noncombining alphanumeric Unicode character in the Basic Multilingual Plane, or a character outside the Basic Multilingual Plane that isn’t in a Private Use Area. After the first character, digits and combining Unicode characters are also allowed.
β and Γ are each a "noncombining alphanumeric Unicode character in the Basic Multilingual Plane." √ is not (nor does it meet any of the other requirements).
That said, √ is a valid operator, so you can write:
prefix operator √
prefix func √(_ x: Double) -> Double { return sqrt(x) }
print(√2)
The basic rules for Operators (from the document above) are:
Custom operators can begin with one of the ASCII characters /, =, -, +, !, *, %, <, >, &, |, ^, ?, or ~, or one of the Unicode characters defined in the grammar below (which include characters from the Mathematical Operators, Miscellaneous Symbols, and Dingbats Unicode blocks, among others). After the first character, combining Unicode characters are also allowed.
√ is included in "Mathematical Operators."
The square root character appears to be a valid operator character in Swift.
Character   Unicode Value   Unicode Name
√           U+221A          SQUARE ROOT
Note that the declaration also works without the return keyword, thanks to implicit returns for single-expression functions:
prefix func √(_ x: Double) -> Double { sqrt(x) }

Swift string indexing combines "\r\n" as one char instead of two

I am dealing with strings containing \r\n in Swift 4.2, and I ran into some strange-looking behavior of Swift's string indexing: it appears \r\n is treated as one character instead of two by Swift's indexing methods. I wrote a piece of code to demonstrate this behavior:
var text = "ABC\r\n\r\nDEF"
func printChar(_ lower: Int, _ upper: Int) {
    let start = text.index(text.startIndex, offsetBy: lower)
    let end = text.index(text.startIndex, offsetBy: upper)
    print("\"" + text[start..<end] + "\"")
}
printChar(0, 1) // "A"
printChar(1, 2) // "B"
printChar(2, 3) // "C"
printChar(3, 4) // new line
printChar(4, 5) // new line (okay, what's going on here?)
printChar(5, 6) // "D"
printChar(6, 7) // "E"
printChar(7, 8) // "F"
The print result will be
"A"
"B"
"C"
"
"
"
"
"D"
"E"
"F"
Any idea why it's like this?
TLDR: \r\n is a grapheme cluster and is treated as a single Character in Swift because Unicode.
Swift treats \r\n as one Character.
Objective-C NSString treats it as two characters (in terms of the result from length).
On the swift-users forum someone wrote:
– "\r\n" is a single Character. Is this the correct behaviour?
– Yes, a Character corresponds to a Unicode grapheme cluster, and "\r\n" is considered a single grapheme cluster.
And the subsequent response posted a link to the Unicode documentation; check out this table, which officially states that CRLF is a grapheme cluster.
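You can verify this directly in a playground; note that order matters, since only CR followed by LF forms a single cluster:
print("\r\n".count)   // 1 (CR LF is a single grapheme cluster)
print("\n\r".count)   // 2 (LF CR is two)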
Take a look at the Apple documentation on Characters and Grapheme Clusters.
It's common to think of a string as a sequence of characters, but when working with NSString objects, or with Unicode strings in general, in most cases it is better to deal with substrings rather than with individual characters. The reason for this is that what the user perceives as a character in text may in many cases be represented by multiple characters in the string.
The Swift documentation on Strings and Characters is also worth reading.
This overview from objc.io is interesting as well.
NSString represents UTF-16-encoded text. Length, indices, and ranges are all based on UTF-16 code units.
Another example of this is an emoji like 👍🏻. This single character is actually two Unicode scalars, U+1F44D (THUMBS UP SIGN) followed by U+1F3FB (EMOJI MODIFIER FITZPATRICK TYPE-1-2), encoded in UTF-16 as four code units (0xD83D 0xDC4D 0xD83C 0xDFFB). But if you called count on a string with just that emoji you'd (correctly) get 1.
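A quick playground check of the three views of that emoji:
let thumbsUp = "👍🏻"
print(thumbsUp.count)                 // 1 (one Character, i.e. one grapheme cluster)
print(thumbsUp.unicodeScalars.count)  // 2 (U+1F44D, U+1F3FB)
print(thumbsUp.utf16.count)           // 4 (0xD83D 0xDC4D 0xD83C 0xDFFB)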
If you wanted to see the scalars you could iterate them as follows:
for scalar in text.unicodeScalars {
    print("\(scalar.value) ", terminator: "")
}
Which for "\r\n" would give you 13 10
In the Swift documentation you'll find why NSString is different:
The count of the characters returned by the count property isn’t always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
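For the string from the question, the two measures differ exactly as described (Foundation imported for NSString):
import Foundation

let s = "ABC\r\n\r\nDEF"
print(s.count)                  // 8 (each \r\n counts as one Character)
print((s as NSString).length)   // 10 (\r and \n are separate UTF-16 code units)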
Thus this isn't really "strange" behaviour of Swift string indexing, but rather a result of how Unicode treats these characters and how String in Swift is designed. Swift string indexing goes by Character and \r\n is a single Character.

What does the ?= operator do in Swift?

I just came across some code that looks like this:
var msg:String = "";
msg ?= err["ErrorMessage"].text;
The err variable is from SwiftyXMLParser from what I can see in the code. I'm at a loss about the meaning of the ?= (questionmark-equals) operator. I cannot find documentation about it. What is it doing?
This is quite an interesting corner of the Swift language. In other programming languages the closest concept is operator overloading; in Swifty terms it is called a custom operator. Swift has its own standard operators, but we can add additional ones too. Swift has four operator positions; the first three are available for custom operators:
Infix: Used between two values, like the addition operator (e.g. 1 + 2)
Prefix: Added before a value, like the negative operator (e.g. -3).
Postfix: Added after a value, like the force-unwrap operator (e.g. objectNil!)
Ternary: Two symbols inserted between three values.
Custom operators can begin with one of the ASCII characters /, =, -, +, !, *, %, <, >, &, |, ^, ?, or ~, or one of the Unicode characters.
New operators are declared at a global level using the operator keyword, and are marked with the prefix, infix or postfix modifiers:
Here is an example in a playground (Swift 4):
infix operator ?=
func ?= (base: inout String, with: String) {
    base = base + " " + with
}
var str = "Stack"
str ?= "Overflow"
print(str)
Output:
Stack Overflow
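Note that the definition above is just a demonstration of declaring ?=. Given how the operator is used in the question (assigning an optional parse result to a non-optional String), the project's actual definition most likely assigns only when the right-hand side is non-nil. A hedged sketch of that guess; the generic signature is mine, not SwiftyXMLParser's code:
infix operator ?=

// Assign only if the right-hand side is non-nil (hypothetical definition).
func ?= <T>(lhs: inout T, rhs: T?) {
    if let value = rhs {
        lhs = value
    }
}

var msg = ""
msg ?= Optional("some error text")   // msg is now "some error text"
msg ?= nil                           // msg is unchanged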
See the Advanced Operators chapter in the Apple documentation for more.

Is there a clean way to specify character literals in Swift?

Swift seems to be trying to deprecate the notion of a string being composed of an array of atomic characters, which makes sense for many uses, but there's an awful lot of programming that involves picking through data structures that are ASCII for all practical purposes, particularly with file I/O. The absence of a built-in language feature to specify a character literal seems like a gaping hole; i.e., there is no analog of the C/Java/etc.-esque:
String foo="a"
char bar='a'
This is rather inconvenient, because even if you convert your strings into arrays of characters, you can't do things like:
let ch:unichar = arrayOfCharacters[n]
if ch >= 'a' && ch <= 'z' {...whatever...}
One rather hacky workaround is to do something like this:
let LOWCASE_A = ("a" as NSString).characterAtIndex(0)
let LOWCASE_Z = ("z" as NSString).characterAtIndex(0)
if ch >= LOWCASE_A && ch <= LOWCASE_Z {...whatever...}
This works, but obviously it's pretty ugly. Does anyone have a better way?
Characters can be created from Strings as long as those Strings are only made up of a single character. And, since Character implements ExtendedGraphemeClusterLiteralConvertible, Swift will do this for you automatically on assignment. So, to create a Character in Swift, you can simply do something like:
let ch: Character = "a"
Then, you can use the contains method of an IntervalType (generated with the Range operators) to check if a character is within the range you're looking for:
if ("a"..."z").contains(ch) {
/* ... whatever ... */
}
Example:
let ch: Character = "m"
if ("a"..."z").contains(ch) {
println("yep")
} else {
println("nope")
}
Outputs:
yep
Update: As @MartinR pointed out, the ordering of Swift characters is based on Unicode Normalization Form D, which is not the same order as ASCII character codes. In your specific case, there are more characters between a and z than in straight ASCII (ä, for example). See @MartinR's answer here for more info.
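A quick check of that pitfall:
let umlautA: Character = "ä"
print(("a"..."z").contains(umlautA))   // true (Character ordering is not ASCII ordering)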
If you need to check if a character is in between two ASCII character codes, then you may need to do something like your original workaround. However, you'll also have to convert ch to a unichar and not a Character for it to work (see this question for more info on Character vs unichar):
let a_code = ("a" as NSString).character(at: 0)
let z_code = ("z" as NSString).character(at: 0)
let ch_code = (String(ch) as NSString).character(at: 0)
if (a_code...z_code).contains(ch_code) {
    print("yep")
} else {
    print("nope")
}
Or, the even more verbose way without using NSString:
let startCharScalars = "a".unicodeScalars
let startCode = startCharScalars[startCharScalars.startIndex]
let endCharScalars = "z".unicodeScalars
let endCode = endCharScalars[endCharScalars.startIndex]
let chScalars = String(ch).unicodeScalars
let chCode = chScalars[chScalars.startIndex]
if (startCode...endCode).contains(chCode) {
    print("yep")
} else {
    print("nope")
}
Note: Both of those examples only work if the character only contains a single code point, but, as long as we're limited to ASCII, that shouldn't be a problem.
If you need C-style ASCII literals, you can just do this:
let chr = UInt8(ascii:"A") // == UInt8( 0x41 )
Or if you need 32-bit Unicode literals you can do this:
let unichr1 = UnicodeScalar("A").value // == UInt32( 0x41 )
let unichr2 = UnicodeScalar("é").value // == UInt32( 0xe9 )
let unichr3 = UnicodeScalar("😀").value // == UInt32( 0x1f600 )
Or 16-bit:
let unichr1 = UInt16(UnicodeScalar("A").value) // == UInt16( 0x41 )
let unichr2 = UInt16(UnicodeScalar("é").value) // == UInt16( 0xe9 )
All of these initializers will be evaluated at compile time, so it really is using an immediate literal at the assembly instruction level.
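This makes the byte-range test from the question straightforward. A minimal sketch over a string's UTF-8 view (the names are illustrative):
let bytes = Array("Hello, world".utf8)   // [UInt8]
let ch = bytes[1]                        // 101, i.e. "e"
if ch >= UInt8(ascii: "a") && ch <= UInt8(ascii: "z") {
    print("lowercase ASCII letter")
}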
The feature you want was proposed for Swift 5.1, but that proposal was rejected for a few reasons:
Ambiguity
The proposal as written, in the current Swift ecosystem, would have allowed for expressions like 'x' + 'y' == "xy", which was not intended (the proper syntax would be "x" + "y" == "xy").
Amalgamation
The proposal was two in one.
First, it proposed a way to introduce single-quote literals into the language.
Second, it proposed that these would be convertible to numerical types to deal with ASCII values and Unicode codepoints.
These are both good proposals, and it was recommended that this be split into two and re-proposed. Those follow-up proposals have not yet been formalized.
Disagreement
Consensus was never reached on whether the default type of 'x' should be Character or Unicode.Scalar. The proposal went with Character, citing the Principle of Least Surprise, despite that lack of consensus.
You can read the full rejection rationale here.
The proposed syntax would have looked like this:
let myChar = 'f' // Type is Character, value is solely the unicode U+0066 LATIN SMALL LETTER F
let myInt8: Int8 = 'f' // Type is Int8, value is 102 (0x66)
let myUInt8Array: [UInt8] = [ 'a', 'b', '1', '2' ] // Type is [UInt8], value is [ 97, 98, 49, 50 ] ([ 0x61, 0x62, 0x31, 0x32 ])
switch someUInt8 {
case 'a' ... 'f': return "Lowercase hex letter"
case 'A' ... 'F': return "Uppercase hex letter"
case '0' ... '9': return "Hex digit"
default: return "Non-hex character"
}
It also looks like you can use the following syntax:
Character("a")
This will create a Character from the specified single character string.
I have only tested this in Swift 4 and Xcode 10.1
Why do I exhume 7-year-old posts? Fun, I guess? Seriously though, I think I can add to the discussion.
It is not a gaping hole, or rather, it is a deliberate gaping hole that explicitly discourages conflating a string of text with a sequence of ASCII bytes.
You absolutely can pick apart a String. A String implements BidirectionalCollection and has many ways to manipulate the atoms. See: https://developer.apple.com/documentation/swift/string.
But you have to get used to the more generalized notion of a String. It can be picked apart from the user perspective, as a sequence of grapheme clusters, each (usually) with a visually separable appearance; or from the encoding perspective, which can be one of several (UTF-32, UTF-16, UTF-8).
At the risk of overanalyzing the wording of your question:
A data structure is conceptual, and independent of encoding in storage
A data structure encoded as an ASCII string is just one kind of ASCII string
By design the encoding of ASCII values 0-127 will have an identical encoding in UTF-8, so loading that stream with a UTF-8 API is fine (see the one-line check just after this list)
A data structure encoded as a string where fields of the structure have UTF-8 Unicode string values is not an ASCII string, but a UTF-8 string itself
A string is either ASCII-encoded or not; "for practical purposes" isn't a meaningful qualifier. A UTF-8 database field where 99.99% of the text falls in the ASCII range (where encodings will match), but occasionally doesn't, will present some nasty bug opportunities.
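A one-line check of that ASCII/UTF-8 equivalence:
print(Array("hello".utf8))   // [104, 101, 108, 108, 111], the same bytes as ASCII "hello"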
Instead of a terse and low-level equivalence of fixed-width integers and English-only text, Swift has a richer API that forces more explicit naming of the involved categories and entities. If you want to deal with ASCII, there's a name (method) for that, and if you want to deal with human sub-categories, there's a name for that, too, and they're totally independent of one another. There is a strong move away from ASCII and the English-centric string handling model of C. This is factual, not evangelizing, and it can present an irksome learning curve.
(This is aimed at newcomers, acknowledging the OP probably has years of experience with this by now.)
For what you're trying to do there, consider:
let foo = "abcDeé@¶œŎO!@#"
foo.forEach { c in
    print((c.isASCII ? "\(c) is ascii with value \(c.asciiValue ?? 0); " : "\(c) is not ascii; ")
        + (c.isLetter ? "\(c) is a letter" : "\(c) is not a letter"))
}
a is ascii with value 97; a is a letter
b is ascii with value 98; b is a letter
c is ascii with value 99; c is a letter
D is ascii with value 68; D is a letter
e is ascii with value 101; e is a letter
é is not ascii; é is a letter
@ is ascii with value 64; @ is not a letter
¶ is not ascii; ¶ is not a letter
œ is not ascii; œ is a letter
Ŏ is not ascii; Ŏ is a letter
O is ascii with value 79; O is a letter
! is ascii with value 33; ! is not a letter
@ is ascii with value 64; @ is not a letter
# is ascii with value 35; # is not a letter

Scala string pattern matching for mathematical symbols

I have the following code:
val z: String = tree.symbol.toString
z match {
  case "method +" | "method -" | "method *" | "method ==" =>
    println("no special op")
    false
  case "method /" | "method %" =>
    println("we have the special div operation")
    true
  case _ =>
    false
}
Is it possible to create a match for the primitive operations in Scala:
"method *".matches("(method) (+-*==)")
I know that the (+-*) signs are used as quantifiers. Is there a way to match them anyway?
Thanks from an avid Scala scholar!
Sure.
val z: String = tree.symbol.toString
val noSpecialOp = "method (?:[-+*]|==)".r
val divOp = "method [/%]".r
z match {
  case noSpecialOp() =>
    println("no special op")
    false
  case divOp() =>
    println("we have the special div operation")
    true
  case _ =>
    false
}
Things to consider:
I chose to match against single characters using [abc] instead of (?:a|b|c).
Note that - has to be the first character when using [], or it will be interpreted as a range. Likewise, ^ cannot be the first character inside [], or it will be interpreted as negation.
I'm using (?:...) instead of (...) because I don't want to extract the contents. If I did want to extract the contents -- so I'd know what was the operator, for instance, then I'd use (...). However, I'd also have to change the matching to receive the extracted content, or it would fail the match.
It is important not to forget the () on the extractors -- like divOp(). If you forget them, the name is treated as a simple variable-binding pattern that matches anything (and Scala will complain about unreachable code).
And, as I said, if you are extracting something, then you need something inside those parenthesis. For instance, "method ([%/])".r would match divOp(op), but not divOp().
Much the same as in Java. To escape a character in a regular expression, you prefix the character with \. However, backslash is also the escape character in standard Java/Scala strings, so to pass it through to the regular expression processing you must again prefix it with a backslash. You end up with something like:
scala> "+".matches("\\+")
res1: Boolean = true
As James Iry points out in the comment below, Scala also has support for 'raw strings', enclosed in three quotation marks: """Raw string in which I don't need to escape things like \!""" This allows you to avoid the second level of escaping, that imposed by Java/Scala strings. Note that you still need to escape any characters that are treated as special by the regular expression parser:
scala> "+".matches("""\+""")
res1: Boolean = true
Escaping characters in Strings works like in Java.
If you have larger Strings which need a lot of escaping, consider Scala's """ raw strings.
E.g. """String without needing to escape anything \n \d"""
If you put three """ around your regular expression, you don't need to escape anything anymore.