How to convert String to UTF-8 to Integer in Swift

I'm trying to take each character (individual number, letter, or symbol) from a file name string, without the extension, and put each one into an array index as an integer holding its UTF-8 code (i.e. if the file name is "A1" without the extension, I would want "A" as the int 0x41 in the first index, and "1" as the int 0x31 in the second index).
Here is the code I have but I'm getting this error "No exact matches in call to instance method 'append'", my guess is because .utf8 still keeps it as a string type:
for i in allNoteFiles {
    var CharacterArray : [Int] = []
    for character in i {
        var utf8Character = String(character).utf8
        CharacterArray.append(utf8Character) //error is here
    }
    .... //more code down here within the for in loop using CharacterArray indexes
I'm sure the answer is probably simple, but I'm very new to Swift.
I've tried appending var number instead with:
var number = Int(utf8Character)
and
var number = (utf8Character).IntegerValue
but I get errors "No exact matches in call to initializer" and "Value of type 'String.UTF8View' has no member 'IntegerValue'"
Any help at all would be greatly appreciated. Thanks!

The reason
var utf8Character = String(character).utf8
CharacterArray.append(utf8Character)
doesn't work for you is that utf8Character is not a single integer but a UTF8View: a lightweight way to iterate over the UTF-8 code units (bytes) in a string. Every Character in a String can be made up of any number of UTF-8 bytes (individual integers). While ASCII characters like "A" and "1" map to a single UTF-8 byte, the vast majority of characters do not: every Unicode code point is encoded as between 1 and 4 individual bytes, and a single Character can consist of several code points. The Encoding section of UTF-8 on Wikipedia has a few very illustrative examples of how this works.
Now, assuming that you do want to split a string into individual UTF-8 bytes (either because you can guarantee your original string is ASCII-only, so the assumption that "character = byte" holds, or because you actually care about the bytes [though this is rarely the case]), there's a short and idiomatic solution to what you're looking for.
String.UTF8View is a Sequence of UInt8 values (individual bytes), and as such, you can use the Array initializer which takes a Sequence:
let characterArray: [UInt8] = Array(i.utf8)
If you need an array of Int values instead of UInt8, you can map the individual bytes ahead of time:
let characterArray: [Int] = Array(i.utf8.lazy.map { Int($0) })
(The .lazy avoids creating and storing an array of values in the middle of the operation.)
However, do note that if you aren't careful (e.g., your original string is not ASCII), you're bound to get very unexpected results from this operation, so keep that in mind.
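As a rough illustration of that caveat, the sketch below compares an ASCII-only string with a string holding one accented character. The strings and variable names are made up for the example and do not come from the question's files.

```swift
// Example inputs only; not taken from the question's file list.
let asciiName = "A1"
let asciiBytes: [Int] = asciiName.utf8.map { Int($0) }
print(asciiBytes)            // [65, 49], i.e. 0x41 and 0x31

// U+00E9 (e with an acute accent) is one Character but two UTF-8 bytes,
// so "one array element per character" no longer holds.
let accented = "\u{E9}"
let accentedBytes: [UInt8] = Array(accented.utf8)
print(accented.count)        // 1
print(accentedBytes)         // [195, 169]
```

For ASCII-only names the array has one element per character; for anything else the element count is the byte count, not the character count.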

Related

convert ByteArray to String to ByteArray

I want to convert a ByteArray to a string and then convert the string back to a ByteArray, but the values change during the conversion. Can someone help me solve this problem?
person.proto:
syntax = "proto3";
message Person{
    string name = 1;
    int32 age = 2;
}
After sbt compile it gives case class Person (created by google protobuf while compiling)
My MainClass:
val newPerson = Person(
    name = "John Cena",
    age = 44 //output
)
println(newPerson.toByteArray) //[B@50da041d
val l = newPerson.toByteArray.toString
println(l) //[B@7709e969
val l1 = l.getBytes
println(l1) //[B@f44b405
Why did the values change? How do I convert correctly?
[B@... is the format that a JVM byte array's .toString returns: [B (which means "byte array"), an @, and a hex string (the array's identity hash code) which is analogous to the memory address at which the array resides (I'm deliberately not calling it a pointer but it's similar; the precise mapping of that hex string to a memory address is JVM-dependent and could be affected by things like which garbage collector is in use). The important thing is that two different arrays with the same bytes in them will have different .toStrings. Note that in some places (e.g. the REPL), Scala will print something like Array(-127, 0, 0, 1) instead of calling .toString: this may cause confusion.
It appears that toByteArray emits a new array each time it's called. So the first time you call newPerson.toByteArray, you get an array at a location corresponding to 50da041d. The second time you call it you get a byte array with the same contents at a location corresponding to 7709e969 and you save the string [B@7709e969 into the variable l. When you then call getBytes on that string (saving it in l1), you get a byte array which is an encoding of the string "[B@7709e969" at the location corresponding to f44b405.
So at the locations corresponding to 50da041d and 7709e969 you have two different byte arrays which happen to contain the same elements (those elements being the bytes in the proto representation of newPerson). At the location corresponding to f44b405 you have a byte array where the bytes encode [B@7709e969 in the platform's default character set (typically UTF-8), because that is what getBytes without an argument uses.
Because a proto isn't really a string, there's no general way to get a useful string (depending on what definition of useful you're dealing with). You could try interpreting a byte array from toByteArray as a string with a given character encoding, but there's no guarantee that any given proto will be valid in an arbitrary character encoding.
An encoding which is purely 8-bit, like ISO-8859-1, is guaranteed to at least be decodable from a byte array, but there could be non-printable or control characters, so it's not likely to be that useful:
val iso88591Representation = new String(newPerson.toByteArray, java.nio.charset.StandardCharsets.ISO_8859_1)
Alternatively, you might want a representation like how the Scala REPL will (sometimes) render it:
"Array(" + newPerson.toByteArray.mkString(", ") + ")"

Getting character ASCII value as an Integer in Swift

I have been trying to get the character ascii code as an int so as then I can modify it and change the character by doing some math. However I am finding it difficult to do so as I get conversion errors between the different types of integers and can't seem to find an answer
var n:Character = pass[I] //using the string protocol extension
if n.isASCII
{
    var tempo:Int = Int(n.asciiValue)
    temp += (tempo | key) //key and temp are of type int
}
In Swift, a Character is not necessarily an ASCII one. It would make no sense, for example, to return the ASCII value of "🪂", which requires a multi-byte Unicode encoding. This is why the asciiValue property has an optional UInt8 value, written UInt8?.
The simplest solution
Since you already checked that the character isASCII, you can safely go for an unconditional unwrapping with !:
var tempo:Int = Int(n.asciiValue!) // <--- just change this line
A more elegant alternative
You could also take advantage of optional binding that uses the fact that the optional is nil when there is no ascii value (i.e. n was not an ASCII character):
if let tempo = n.asciiValue // is true only if there is an ascii value
{
    temp += (Int(tempo) | key)
}
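Since the question also wants to turn the modified number back into a character, here is a minimal sketch of the round trip. The helper names combine(_:with:) and asciiCharacter(from:) are invented for this example, and the bitwise OR simply mirrors the operation used in the question.

```swift
/// Hypothetical helper: ORs a key into a character's ASCII code.
/// Returns nil when the character is not ASCII.
func combine(_ character: Character, with key: Int) -> Int? {
    guard let ascii = character.asciiValue else { return nil }
    return Int(ascii) | key
}

/// Hypothetical helper: converts a code back into a Character,
/// but only if the code is still inside the ASCII range.
func asciiCharacter(from code: Int) -> Character? {
    guard (0...127).contains(code) else { return nil }
    return Character(Unicode.Scalar(UInt8(code)))
}

let code = combine("A", with: 0x20)            // 65 | 32
let back = code.flatMap(asciiCharacter(from:)) // back to a Character, if still ASCII
print(code as Any, back as Any)
```

Returning an optional from both helpers keeps the "not ASCII" case explicit instead of relying on a forced unwrap.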

If a sequence of code points forms a Unicode character, does every non-empty prefix of that sequence also form a valid character?

The problem I have is that given a sequence of bytes, I want to determine its longest prefix which forms a valid Unicode character (extended grapheme cluster) assuming UTF8 encoding.
I am using Swift, so I would like to use Swift's built-in function(s) to do so. But these functions only decode a complete sequence of bytes. So I was thinking to convert prefixes of the byte sequence via Swift and take the last prefix that didn't fail and consists of 1 character only. Obviously, this might lead to trying out the entire sequence of bytes, which I want to avoid. A solution would be to stop trying out prefixes after 4 prefixes in a row failed. If the property asked in my question holds, this would then guarantee that all longer prefixes must also fail.
I find the Unicode Text Segmentation Standard unreadable, otherwise I would try to directly implement boundary detection of extended grapheme clusters...
After taking a long hard look at the specification for computing the boundaries of extended grapheme clusters (EGCs) at https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules, it is obvious that the rules for EGCs all have the shape of describing when it is allowed to append a code point to an existing EGC to form a longer EGC. From that fact alone the answers to my two questions follow: 1) Yes, every non-empty prefix of code points which form an EGC is also an EGC. 2) No, by adding a code point to a valid Unicode string you will not decrease its length in terms of the number of EGCs it consists of.
So, given this, the following Swift code will extract the longest Unicode character from the start of a byte sequence (or return nil if there is no valid Unicode character there):
import Foundation // needed for String(bytes:encoding:)

func lex<S : Sequence>(_ input : S) -> (length : Int, out: Character)? where S.Element == UInt8 {
    // This code works under three assumptions, all of which are true:
    // 1) If a sequence of codepoints does not form a valid character, then appending codepoints to it does not yield a valid character
    // 2) Appending codepoints to a sequence of codepoints does not decrease its length in terms of extended grapheme clusters
    // 3) a codepoint takes up at most 4 bytes in an UTF8 encoding
    var chars : [UInt8] = []
    var result : String = ""
    var resultLength = 0
    // Returns the longest single character decoded so far, if any.
    func value() -> (length : Int, out : Character)? {
        guard let character = result.first else { return nil }
        return (length: resultLength, out: character)
    }
    var length = 0
    var iterator = input.makeIterator()
    // Stop once more than 4 bytes have been read past the last successful decode.
    while length - resultLength <= 4 {
        guard let char = iterator.next() else { return value() }
        chars.append(char)
        length += 1
        // An incomplete or invalid UTF-8 prefix fails to decode; keep reading.
        guard let s = String(bytes: chars, encoding: .utf8) else { continue }
        // More than one character means the previous prefix was the longest one.
        guard s.count == 1 else { return value() }
        result = s
        resultLength = length
    }
    return value()
}
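The two properties the function relies on can be spot-checked with the standard library alone. The sketch below is only an illustration on one hand-picked sequence of code points, not a proof: it appends the code points one at a time and records the character count after each step.

```swift
// Example sequence chosen for illustration: "e", a combining acute accent, then "x".
let scalars: [Unicode.Scalar] = ["e", "\u{301}", "x"]

var text = ""
var counts: [Int] = []
for scalar in scalars {
    text.unicodeScalars.append(scalar)  // grow the string by one code point
    counts.append(text.count)           // number of extended grapheme clusters so far
}
print(counts)  // [1, 1, 2]
```

The prefix "e" is already a single character, appending the combining accent keeps the count at one, and the count never decreases as code points are added.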

How can I convert a single Character type to uppercase?

All I want to do is convert a single Character to uppercase without the overhead of converting to a String and then calling .uppercased(). Is there any built-in way to do this, or a way for me to call the toupper() function from C without any bridging? I really don't think I should have to go out of my way for something so simple.
To call the C toupper() you need to get the Unicode code point of the Character. But Character has no method for getting its code point (a Character may consist of multiple code points), so you have to convert the Character into a String to obtain any of its code points.
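So you really have to convert to String to get anywhere, unless you store the character as a UnicodeScalar instead of a Character. In that case you can do this (toupper comes from the C library, which import Foundation makes available):
import Foundation
assert(unicodeScalar.isASCII) // toupper argument must be "representable as an unsigned char"
let uppercase = UnicodeScalar(UInt8(toupper(CInt(unicodeScalar.value)))) // toupper returns a CInt; for ASCII input the result fits in a UInt8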
But this isn't really more readable than simply using String:
let uppercase = Character(String(character).uppercased())
Just add this to your program:
extension Character {
    //converts a character to uppercase
    func convertToUpperCase() -> Character {
        if self.isUppercase {
            return self
        }
        return Character(self.uppercased())
    }
}
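One caveat with both String-based approaches: uppercasing can produce more than one character (for example "ß" becomes "SS"), and Character(_:) traps when given a multi-character string. A possible variation, sketched below with the invented property name singleUppercased, falls back to the original character in that case.

```swift
extension Character {
    /// Hypothetical helper: the uppercase form of this character, or the
    /// character itself when uppercasing yields more than one character.
    var singleUppercased: Character {
        let upper = self.uppercased()  // Character.uppercased() returns a String
        return upper.count == 1 ? Character(upper) : self
    }
}

let a: Character = "a"
let sharpS: Character = "\u{DF}"  // "ß", whose uppercase mapping is "SS"
print(a.singleUppercased, sharpS.singleUppercased)
```

Whether falling back to the original character is the right behavior depends on the use case; returning an optional or a String are equally reasonable designs.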

Convert Character to Integer in Swift

I am creating an iPhone app and I need to convert a single digit number into an integer.
My code has a variable called char that has a type Character, but I need to be able to do math with it, therefore I think I need to convert it to a string, however I cannot find a way to do that.
In the latest Swift versions (at least in Swift 5) there is a more straightforward way of converting Character instances. Character has a wholeNumberValue property, which tries to convert a character to an Int and returns nil if the character does not represent an integer.
let char: Character = "5"
if let intValue = char.wholeNumberValue {
    print("Value is \(intValue)")
} else {
    print("Not an integer")
}
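Because wholeNumberValue returns an optional, it combines naturally with compactMap when the goal is to do math over the digits of a whole string. A small sketch, with digitSum(of:) being a name made up for this example:

```swift
/// Hypothetical helper: adds up every digit character in a string,
/// silently skipping characters that have no whole-number value.
func digitSum(of text: String) -> Int {
    text.compactMap { $0.wholeNumberValue }.reduce(0, +)
}

print(digitSum(of: "a1b2c3"))  // 6
```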
With a Character you can create a String. And with a String you can create an Int.
let char: Character = "1"
if let number = Int(String(char)) {
    // use number
}
The String middleman type conversion isn't necessary if you use the unicodeScalars property of Swift 4.0's Character type.
let myChar: Character = "3"
myChar.unicodeScalars.first!.value - Unicode.Scalar("0")!.value // 3: UInt32
This uses a trick commonly seen in C code of subtracting the value of the char '0' literal to convert from ASCII values to decimal values. See this site for the conversions: https://www.asciitable.com
Also there are some forced unwraps in my answer. To avoid those, you can validate that you have a decimal digit with CharacterSet.decimalDigits, and/or use guard lets around the first property. You can also subtract 48 directly rather than converting "0" through Unicode.Scalar.
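The same subtraction trick can be written without forced unwraps. The sketch below is one possible way to do it, using Character's asciiValue property in place of unicodeScalars and CharacterSet; asciiDigitValue is a hypothetical name for this example.

```swift
/// Hypothetical helper: the value of an ASCII digit ("0" through "9"),
/// or nil for any other character.
func asciiDigitValue(_ character: Character) -> Int? {
    // 48 and 57 are the ASCII codes of "0" and "9".
    guard let ascii = character.asciiValue, (48...57).contains(ascii) else {
        return nil
    }
    return Int(ascii) - 48
}

print(asciiDigitValue("3") as Any)
print(asciiDigitValue("x") as Any)
```

Unlike wholeNumberValue, this version deliberately accepts only the ten ASCII digits.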