What is the difference between char::is_digit and char::is_numeric?

I notice that a general numeric character gives an invalid digit error when converting to a number; is it possible to get the numeric value of a numeric character? Is that a valid thing to do?

char::is_numeric checks whether a character is numeric according to Unicode (specifically, whether it falls under the Unicode General Categories Nd, Nl or No), while char::is_digit checks whether a character is a valid digit in the given radix, which can be anything up to 36, e.g. hexadecimal a-f for radix 16.
Example difference:
assert!(char::is_numeric('a')); // fails: 'a' is not numeric
assert!(char::is_digit('a', 10)); // fails: 'a' is not a decimal digit
assert!(char::is_digit('a', 16)); // passes: 'a' is a valid hexadecimal digit
It's ok to obtain numeric values of characters - you just need to provide the right radix:
println!("{}", 'a'.to_digit(16).unwrap()); // 10
println!("{}", 'z'.to_digit(36).unwrap()); // 35

According to the Rust docs, a 'digit' is defined to be only the following characters: 0-9, a-z, A-Z (depending on the radix).
The is_numeric function just checks whether the value is in fact a number; there are some cool examples in the docs.

Related

Swift string indexing combines "\r\n" as one char instead of two

I am dealing with strings containing \r\n in Swift 4.2. I ran into some strange behavior of Swift's string indexing: it appears that \r\n is treated as one character instead of two by Swift's indexing methods. I wrote a piece of code to demonstrate this behavior:
var text = "ABC\r\n\r\nDEF"
func printChar(_ lower: Int, _ upper: Int) {
    let start = text.index(text.startIndex, offsetBy: lower)
    let end = text.index(text.startIndex, offsetBy: upper)
    print("\"" + text[start..<end] + "\"")
}
printChar(0, 1) // "A"
printChar(1, 2) // "B"
printChar(2, 3) // "C"
printChar(3, 4) // new line
printChar(4, 5) // new line (okay, what's going on here?)
printChar(5, 6) // "D"
printChar(6, 7) // "E"
printChar(7, 8) // "F"
The print result will be
"A"
"B"
"C"
"
"
"
"
"D"
"E"
"F"
Any idea why it's like this?
TLDR: \r\n is a grapheme cluster and is treated as a single Character in Swift because Unicode.
Swift treats \r\n as one Character.
Objective-C NSString treats it as two characters (in terms of the result from length).
On the swift-users forum someone wrote:
– "\r\n" is a single Character. Is this the correct behaviour?
– Yes, a Character corresponds to a Unicode grapheme cluster, and "\r\n" is considered a single grapheme cluster.
And the subsequent response posted a link to the Unicode documentation; check out the table there, which officially states that CRLF is a grapheme cluster.
Take a look at the Apple documentation on Characters and Grapheme Clusters.
It's common to think of a string as a sequence of characters, but when working with NSString objects, or with Unicode strings in general, in most cases it is better to deal with substrings rather than with individual characters. The reason for this is that what the user perceives as a character in text may in many cases be represented by multiple characters in the string.
The Swift documentation on Strings and Characters is also worth reading.
This overview from objc.io is interesting as well.
NSString represents UTF-16-encoded text. Length, indices, and ranges are all based on UTF-16 code units.
Another example of this is an emoji like 👍🏻. This single character is actually two Unicode scalars, U+1F44D (thumbs up) followed by U+1F3FB (skin-tone modifier), which are encoded as four UTF-16 code units (D83D DC4D D83C DFFB). But if you called count on a string containing just that emoji you'd (correctly) get 1.
If you wanted to see the scalars you could iterate them as follows:
for scalar in text.unicodeScalars {
    print("\(scalar.value) ", terminator: "")
}
Which for "\r\n" would give you 13 10
In the Swift documentation you'll find why NSString is different:
The count of the characters returned by the count property isn’t always the same as the length property of an NSString that contains the same characters. The length of an NSString is based on the number of 16-bit code units within the string’s UTF-16 representation and not the number of Unicode extended grapheme clusters within the string.
Thus this isn't really "strange" behaviour of Swift string indexing, but rather a result of how Unicode treats these characters and how String in Swift is designed. Swift string indexing goes by Character and \r\n is a single Character.
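To see both views side by side, here is a small sketch (Swift 4.2+, with Foundation imported for the NSString bridge) contrasting Character counts with Unicode scalar and UTF-16 code-unit counts:
import Foundation

let crlf = "\r\n"
print(crlf.count)                   // 1: one grapheme cluster, one Character
print(crlf.unicodeScalars.count)    // 2: CR and LF
print((crlf as NSString).length)    // 2: NSString counts UTF-16 code units

let thumbs = "👍🏻"
print(thumbs.count)                 // 1: one Character
print(thumbs.unicodeScalars.count)  // 2: U+1F44D and U+1F3FB
print(thumbs.utf16.count)           // 4: two surrogate pairs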

Why is Swift Decimal Returning Number from String Containing Letters?

I am working with Swift's Decimal type, trying to ensure that a user-entered String is a valid Decimal.
I have two String values, each including a letter, within my Playground file. One of the values contains a letter at the start, while the other contains a letter at the end. I initialize a Decimal using each value, and only one Decimal initialization fails: the Decimal initialized with the value that contains the letter at the beginning.
Why does the Decimal initialized with a value that contains a letter at the end return a valid Decimal? I expect nil to be returned.
Attached is a screenshot from my Playground file.
It works this way because Decimal parses any numeric characters at the start of the string and stops at the first letter; the letter acts as a terminator, and anything after it is ignored. So in your example:
12a = 12 ( a is the terminator in position 3 )
a12 = nil ( a is the terminator in position 1 )
If you want both to be invalid whenever the string contains a letter, you could use Float instead, since its string initializer requires the entire string to be a valid number.
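A quick sketch of that behavior in a Playground (the strings are just examples; Decimal(string:) comes from Foundation):
import Foundation

print(Decimal(string: "12a") as Any)  // Optional(12): parsing stops at "a"
print(Decimal(string: "a12") as Any)  // nil: "a" terminates parsing immediately
print(Float("12a") as Any)            // nil: Float needs the whole string to be numeric
print(Float("a12") as Any)            // nil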

Converting a hex to string in Swift formatted to keep the same number of digits

I'm trying to create a string from the hex values in an array, but whenever a hex value in the array starts with a zero, the leading zero disappears from the resulting string.
I use String(value:radix:uppercase) to create the string.
An example:
Here's an array: [0x13245678, 0x12345678, 0x12345678, 0x12345678].
Which gives me the string: 12345678123456781234567812345678 (32 characters)
But the following array: [0x02345678, 0x12345678, 0x02345678, 0x12345678] (notice that I replaced two 1's with zeroes).
Gives me the string: 234567812345678234567812345678 (30 characters)
I'm not sure why it removes the zeroes. I know the value is correct; how can I format it to keep the zero if it was there?
The number 0x01234567 is really just 0x1234567. Leading zeros in number literals don't mean anything (unless you are using the leading 0 for octal number literals).
Instead of using String(value:radix:uppercase), use String(format:).
let num = 0x1234567
let str = String(format: "%08X", num)
Explanation of the format:
The 0 means to pad the left end of the string with zeros as needed.
The 8 means you want the result to be at least 8 characters long.
The X means you want the number converted to uppercase hex. Use x if you want lowercase hex.
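Applied to the original array, a quick sketch (String(format:) comes from Foundation, so it needs to be imported):
import Foundation

let values = [0x02345678, 0x12345678, 0x02345678, 0x12345678]
// Format each value as 8 uppercase hex digits, then join them.
let hex = values.map { String(format: "%08X", $0) }.joined()
print(hex)  // 02345678123456780234567812345678 (32 characters)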

String to Integer (atoi) [Leetcode] gave wrong answer?

String to Integer (atoi)
This problem is to implement atoi, which converts a string to an integer.
When test input = " +0 123"
My code returns 123
But why is the expected answer 0?
======================
And if test input = " +0123"
My code returns 123
Now the expected answer is 123.
So is that answer wrong?
I think this is the expected result, as the problem statement says:
Requirements for atoi:
The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value.
Your first test case has a space between two different digit groups, and atoi only considers the first group, which is '0', so it converts to the integer 0.
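A minimal Swift sketch of those rules (myAtoi is just an illustrative name; the overflow clamping required by the full LeetCode problem is omitted):
func myAtoi(_ s: String) -> Int {
    // 1. Discard leading whitespace.
    var rest = s.drop(while: { $0 == " " })

    // 2. Optional leading plus or minus sign.
    var sign = 1
    if let first = rest.first, first == "+" || first == "-" {
        sign = (first == "-") ? -1 : 1
        rest = rest.dropFirst()
    }

    // 3. Take as many digits as possible; stop at the first non-digit.
    var result = 0
    for c in rest {
        guard c.isASCII, let digit = c.wholeNumberValue else { break }
        result = result * 10 + digit
    }
    return sign * result
}

print(myAtoi(" +0 123"))  // 0 (parsing stops at the space after "0")
print(myAtoi(" +0123"))   // 123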

Format String to truncate a number to a specific number of digits

Is there a format string to truncate a number to a specific number of digits?
For example, I would like to truncate any number longer than 5 digits down to its first 3 digits.
132456 -> 132
5000000 -> 500
@Erik: Format specifiers like %2d are specific to a language, aren't they? I actually want to use this in JavaScript.
Pseudo-Code
Function returning a String, receiving a String representing a Number as a parameter
    IF the String has more than 5 characters
        RETURN a substring containing the first 3 characters
    ELSE
        RETURN the String received as a parameter
    END IF
END Function
I assume you refer to printf format strings. I couldn't find anything that will truncate an integer argument (i.e. %d), but you can cap the length of a string argument by using a string conversion and specifying lengths via "%<MinLength>.<MaxLength>s".
So in your case you could turn your number arguments into strings and then use "%3.3s".
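A sketch of the substring approach (shown here in Swift; truncate(_:) is just an illustrative name, and the same logic translates directly to JavaScript):
func truncate(_ number: Int) -> String {
    let digits = String(number)
    // Anything longer than 5 digits is cut down to its first 3 digits.
    return digits.count > 5 ? String(digits.prefix(3)) : digits
}

print(truncate(132456))   // "132"
print(truncate(5000000))  // "500"
print(truncate(12345))    // "12345" (5 digits or fewer are left unchanged)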