Character is not convertible to String - swift

How can I handle this error? I am trying to get the ASCII code of every character in a string, but I can't convert a Character back to a String in order to check whether each symbol is one I need.
Here is my code:
var n = "KNjNKJbbsibdcjkdcn___*(&0786"
let r = n.characters.count
for i in stride(from: 0, to: r, by: 1) {
    let t = n.characters.index(n.startIndex, offsetBy: i)
    String?(n[t])
}
The output should be each character of the string, separately, as a String.

This bit of code will convert a string to an array of ASCII character codes (excluding characters that have no ASCII code):
let str = "KNjNKJbbsibdcjkdcn___*(&0786"
let charCodes = str.unicodeScalars
    .filter({ $0.isASCII })
    .map({ $0.value })
print(charCodes)
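And if the goal is each symbol as its own String (what the question's String?(n[t]) seems to attempt), no failable initializer is needed, since String(_: Character) always succeeds. A minimal sketch; char.asciiValue assumes Swift 5, so use unicodeScalars.first on older versions:
let str = "KNjNKJbbsibdcjkdcn___*(&0786"
for char in str {
    let s = String(char)             // Character -> String always succeeds
    if let code = char.asciiValue {  // Swift 5+; nil for non-ASCII characters
        print("\(s): \(code)")
    }
}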

Related

Swift Strings and [Character]

I have this code:
let txt = "over 100MB+ of text..."
let tokenizedText = Array(txt)
let regex = try NSRegularExpression(pattern: "(?s)<tu>.*?</tu>")
let r = regex.matches(in: txt, range: NSRange(txt.startIndex..<txt.endIndex, in: txt))
for match in r {
    let begOfMatch = match.range.lowerBound
    let endOfMatch = match.range.lowerBound + match.range.length
    // check the result
    if tokenizedText[begOfMatch] != "<" {
        print("error") // from time to time!!!!
    }
}
=> regex.matches produces integer ranges that are not always in sync with the characters array.
I know that UTF-8 does not have a one-to-one correspondence between bytes and characters, but how do I keep Strings and [Character] in sync? I would need to:
-- retrieve the sequence of characters inside the matching sequences as [Character]
-- insert a tag (e.g. <found> ... </found>) around each matching sequence in the buffer (string)
How can I do that?
The issue there is that NSRange is based on UTF-16, so the location of the resulting NSRange is not necessarily the same as the character position in the array of characters (not every Character can be represented by a single UTF-16 code unit). You need to convert the resulting NSRange to Range<String.Index> and check the original string using the lower bound of that range:
let txt = "over 100MB+ of text... <tu>whatever</tu>"
let tokenizedText = Array(txt)
let regex = try NSRegularExpression(pattern: "(?s)<tu>.*?</tu>")
let r = regex.matches(in: txt, range: NSRange(txt.startIndex..<txt.endIndex, in: txt))
for match in r {
    if let range = Range(match.range, in: txt) {
        print(txt[range])
        if txt[range.lowerBound] == "<" {
            print(true)
        } else {
            print(false)
        }
    }
}
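The second requirement (inserting a tag around each matching sequence) is not covered above; here is a minimal sketch, assuming the <found>...</found> tags from the question. Iterating the matches in reverse keeps the earlier ranges valid while the string is mutated:
var text = "over 100MB+ of text... <tu>whatever</tu> and <tu>more</tu>"
let tagRegex = try NSRegularExpression(pattern: "(?s)<tu>.*?</tu>")
let found = tagRegex.matches(in: text, range: NSRange(text.startIndex..<text.endIndex, in: text))
for match in found.reversed() {
    if let range = Range(match.range, in: text) {
        // Replacing from the end first leaves the preceding indices untouched
        text.replaceSubrange(range, with: "<found>" + String(text[range]) + "</found>")
    }
}
print(text) // over 100MB+ of text... <found><tu>whatever</tu></found> and <found><tu>more</tu></found>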

Convert UTF-8 (Bytes) Emoji Code to Emoji icon as a text

I am getting the string below as a response from a web service API when it sends an emoji as a string:
let strTemp = "Hii \\xF0\\x9F\\x98\\x81"
I want it to be converted to the emoji icon like this -> Hii 😁
I think it is coming in UTF-8 format.
I tried decoding it online using a UTF-8 decoder, and the emoticon was decoded successfully.
But the issue here is I do not know how to work with it in Swift.
I referred to the following link, but it did not work for me.
Swift Encode/decode emojis
Any help would be appreciated.
Thanks.
As the converter tool you linked shows, this is plain UTF-8 encoding and decoding. You have a UTF-8 encoded string, so here is an example of UTF-8 decoding.
Objective-C
const char *ch = [@"Hii \xF0\x9F\x98\x81" cStringUsingEncoding:NSUTF8StringEncoding];
NSString *decode_string = [NSString stringWithUTF8String:ch];
NSLog(@"%@", decode_string);
Output: Hii 😁
Swift
I'm able to convert \\xF0\\x9F\\x98\\x81 to 😁 in Swift.
First I converted the hex string into Data, and then back to a String using UTF-8 encoding.
var str = "\\xF0\\x9F\\x98\\x81"
if let data = data(fromHexaStr: str) {
    print(String(data: data, encoding: String.Encoding.utf8) ?? "")
}
Output: 😁
Below is the function I used to convert the hex string into Data. I followed this answer.
func data(fromHexaStr hexaStr: String) -> Data? {
    var data = Data(capacity: hexaStr.count / 2)
    let regex = try! NSRegularExpression(pattern: "[0-9a-f]{1,2}", options: .caseInsensitive)
    regex.enumerateMatches(in: hexaStr, range: NSMakeRange(0, hexaStr.utf16.count)) { match, flags, stop in
        let byteString = (hexaStr as NSString).substring(with: match!.range)
        var num = UInt8(byteString, radix: 16)!
        data.append(&num, count: 1)
    }
    guard data.count > 0 else { return nil }
    return data
}
Note: the problem with the above code is that it handles hex-only strings, not mixed strings that combine plain text and hex escapes.
FINAL WORKING SOLUTION: SWIFT
I did this with a for loop instead of the [0-9a-f]{1,2} regex, because the regex also matches 81, 9F, or any other two-character hex sequence in the plain text, which is obviously wrong.
For example: I have 81 INR \\xF0\\x9F\\x98\\x81.
/// This line converts "F0" into a hex byte
let byte = UInt8("F0", radix: 16)
I made a String extension that scans the string character by character: once it has seen the prefix \x and collected 4 characters in total, it converts the last two characters to a byte using the radix initializer shown above.
extension String {
    func hexaDecodedString() -> String {
        var newData = Data()
        var emojiStr: String = ""
        for char in self {
            let str = String(char)
            if str == "\\" {
                /// Possible start of a "\x" escape
                emojiStr = str
            }
            else if emojiStr == "\\" && str.lowercased() == "x" {
                emojiStr.append(str)
            }
            else if emojiStr.hasPrefix("\\x") || emojiStr.hasPrefix("\\X") {
                emojiStr.append(str)
                if emojiStr.count == 4 {
                    /// It can be a hex value
                    let value = String(emojiStr.dropFirst(2))
                    if let byte = UInt8(value, radix: 16) {
                        newData.append(byte)
                    }
                    else {
                        newData.append(emojiStr.data(using: .utf8)!)
                    }
                    /// Reset emojiStr
                    emojiStr = ""
                }
            }
            else {
                /// Flush any partial escape, then append the character as it is
                if !emojiStr.isEmpty {
                    newData.append(emojiStr.data(using: .utf8)!)
                    emojiStr = ""
                }
                newData.append(str.data(using: .utf8)!)
            }
        }
        /// Flush anything still pending at the end of the string
        if !emojiStr.isEmpty {
            newData.append(emojiStr.data(using: .utf8)!)
        }
        let decodedString = String(data: newData, encoding: String.Encoding.utf8)
        return decodedString ?? ""
    }
}
USAGE:
var hexaStr = "Hi \\xF0\\x9F\\x98\\x81 81"
print(hexaStr.hexaDecodedString())
Hi 😁 81
hexaStr = "Welcome to SP19!\\xF0\\x9f\\x98\\x81"
print(hexaStr.hexaDecodedString())
Welcome to SP19!😁
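And a quick check with the 81 INR example from above, which the regex-based approach would have mangled:
hexaStr = "I have 81 INR \\xF0\\x9F\\x98\\x81"
print(hexaStr.hexaDecodedString())
I have 81 INR 😁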
I fixed your issue, but it needs more work to make it general. The problem here is that your emoji is represented by hex bytes such as \xF0 and \x9F, so we have to extract those hex bytes, convert them to Data, and at last decode the Data as a UTF-8 String.
Final result: Hii 😁. Please read the comments.
let strTemp = "Hii \\xF0\\x9F\\x98\\x81"
let regex = try! NSRegularExpression(pattern: "[0-9a-f]{1,2}", options: .caseInsensitive)
// get all matched hex pairs: F0, 9F, etc.
let matches = regex.matches(in: strTemp, options: [], range: NSMakeRange(0, strTemp.utf16.count))
// Data that will handle converting the hex bytes to UTF-8
var emojiData = Data(capacity: strTemp.count / 2)
matches.enumerated().forEach { (offset, check) in
    let byteString = (strTemp as NSString).substring(with: check.range)
    var num = UInt8(byteString, radix: 16)!
    emojiData.append(&num, count: 1)
}
let subStringEmoji = String(data: emojiData, encoding: String.Encoding.utf8)!
// now we have the emoji text 😁; replace its escape codes in the original string
// using the `first` and `last` matched ranges, i.e. the full range of
// \\xF0\\x9F\\x98\\x81 inside "Hii \\xF0\\x9F\\x98\\x81"
if let start = matches.first?.range.location, let end = matches.last?.range.location, let endLength = matches.last?.range.length {
    let startLocation = start - 2
    let length = end - startLocation + endLength
    let sub = (strTemp as NSString).substring(with: NSRange(location: startLocation, length: length))
    print(strTemp.replacingOccurrences(of: sub, with: subStringEmoji))
    // Hii 😁
}

Swift 4 hex string to binary string

I noted that the old method to convert a hex string to a binary string has been removed from Swift, i.e.: String(hex, radix: 2) -> binary string
What is an alternative in swift 4?
First you need to convert your hex string to an array of bytes, [UInt8]. Then you can use String(_:radix:) to convert each byte to binary. Note that if you would like to return a single String instead of an array of strings [String], you need to add leading zeros to make your binary strings a consistent length (8 characters):
extension String {
    typealias Byte = UInt8
    var hexaToBytes: [Byte] {
        var start = startIndex
        return stride(from: 0, to: count, by: 2).compactMap { _ in  // use flatMap for older Swift versions
            let end = index(after: start)
            defer { start = index(after: end) }
            return Byte(self[start...end], radix: 16)
        }
    }
    var hexaToBinary: String {
        return hexaToBytes.map {
            let binary = String($0, radix: 2)
            return repeatElement("0", count: 8 - binary.count) + binary
        }.joined()
    }
}
let hexString = "00ff01fe"
hexString.hexaToBinary // "00000000111111110000000111111110"
I don't recall any function that would convert a hex string to another string of arbitrary radix. Perhaps you are thinking about the initializer functions that convert between strings and integer values (and vice versa) using an arbitrary radix:
let hex = "00ff01fe"
let value = UInt64(hex, radix: 16)!
let binary = String(value, radix: 2)
let paddedBinary = repeatElement("0", count: 64 - binary.count) + binary
That only applies when the hex string represents a 64-bit value, but it illustrates the basic idea: convert to some integer type, and then convert back to binary, padding it out with zeros.
If you have a hex string that is longer than that, you cannot use the above. But you can map the individual characters of your hex string to numeric values, build a binary representation of each, zero-pad them, and use joined() to concatenate them together:
let hex = "ffeeddccbbaa99887766554433221100"
let result = hex.compactMap { c -> String? in
    guard let value = Int(String(c), radix: 16) else { return nil }
    let string = String(value, radix: 2)
    return repeatElement("0", count: 4 - string.count) + string
}.joined()

Why does swift substring with range require a special type of Range

Consider this function to build a string of random characters:
func makeToken(length: Int) -> String {
    let chars: String = "abcdefghijklmnopqrstuvwxyz0123456789!?@#$%ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    var result: String = ""
    for _ in 0..<length {
        let idx = Int(arc4random_uniform(UInt32(chars.characters.count)))
        let idxEnd = idx + 1
        let range: Range = idx..<idxEnd
        let char = chars.substring(with: range)
        result += char
    }
    return result
}
This throws an error on the substring method:
Cannot convert value of type 'Range<Int>' to expected argument
type 'Range<String.Index>' (aka 'Range<String.CharacterView.Index>')
I'm confused why I can't simply provide a Range with 2 integers, and why it's making me go the roundabout way of making a Range<String.Index>.
So I have to change the Range creation to this very over-complicated way:
let idx = Int(arc4random_uniform(UInt32(chars.characters.count)))
let start = chars.index(chars.startIndex, offsetBy: idx)
let end = chars.index(chars.startIndex, offsetBy: idx + 1)
let range: Range = start..<end
Why isn't it good enough for Swift for me to simply create a range with 2 integers and the half-open range operator? (..<)
Quite a contrast to "swift": in JavaScript I can simply do chars.substr(idx, 1).
I suggest converting your String to [Character] so that you can index it easily with Int:
func makeToken(length: Int) -> String {
    let chars = Array("abcdefghijklmnopqrstuvwxyz0123456789!?@#$%ABCDEFGHIJKLMNOPQRSTUVWXYZ".characters)
    var result = ""
    for _ in 0..<length {
        let idx = Int(arc4random_uniform(UInt32(chars.count)))
        result += String(chars[idx])
    }
    return result
}
Swift takes great care to provide a fully Unicode-compliant, type-safe, String abstraction.
Indexing a given Character, in an arbitrary Unicode string, is far from a trivial task. Each Character is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character. In particular, hiding all this complexity behind a simple Int based indexing scheme might result in the wrong performance mental model for programmers.
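To make that concrete, here is a small illustration (not from the original answer) of why Int subscripting is hidden: a single Character may span several Unicode scalars, so the nth Character cannot be located without walking the string from the start.
let flag: Character = "🇯🇵"                 // one human-visible character
print(String(flag).unicodeScalars.count)   // 2: two regional-indicator scalars
let accented = "e\u{0301}"                 // "e" + combining acute accent = "é"
print(accented.count)                      // 1 Character
print(accented.unicodeScalars.count)       // 2 scalars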
Having said that, you can always convert your string to an Array<Character> once for easy (and fast!) indexing. For instance:
let chars: String = "abcdefghijklmnop"
var charsArray = Array(chars.characters)
...
let resultingString = String(charsArray)
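As an aside, on Swift 4.2 or later the whole token builder collapses to a randomElement() call over that same array; a sketch under that version assumption, not the answer's original code:
func makeToken(length: Int) -> String {
    let chars = Array("abcdefghijklmnopqrstuvwxyz0123456789!?@#$%ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    // randomElement() is only nil for an empty collection, so the force-unwrap is safe here
    return String((0..<length).map { _ in chars.randomElement()! })
}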

Fit Swift string in database VARCHAR(255)

I'm trying to get a valid substring of at most 255 UTF8 code units from a Swift string (the idea is to be able to store it in a database VARCHAR(255) field).
The standard way of getting a substring is this:
let string: String = "Hello world!"
let startIndex = string.startIndex
let endIndex = string.startIndex.advancedBy(255, limit: string.endIndex)
let databaseSubstring1 = string[startIndex..<endIndex]
But obviously that would give me a string of 255 characters that may require more than 255 bytes in UTF8 representation.
For UTF8 I can write this:
let utf8StartIndex = string.utf8.startIndex
let utf8EndIndex = utf8StartIndex.advancedBy(255, limit: string.utf8.endIndex)
let databaseSubstringUTF8View = string.utf8[utf8StartIndex..<utf8EndIndex]
let databaseSubstring2 = String(databaseSubstringUTF8View)
But I run the risk of having half a character at the end, which means my UTF8View would not be a valid UTF8 sequence.
And as expected databaseSubstring2 is an optional string because the initializer can fail (it is defined as public init?(_ utf8: String.UTF8View)).
So I need some way of stripping invalid UTF8 code points at the end, or – if possible – a builtin way of doing what I'm trying to do here.
EDIT
Turns out that databases understand characters, so I should not try to count UTF8 code units, but rather how many characters the database will count in my string (which will probably depend on the database).
According to @OOPer, MySQL counts characters as UTF-16 code units. I have come up with the following implementation:
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
    // Start by clipping to 255 characters
    let startIndex = string.startIndex
    let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
    var string = string[startIndex..<endIndex]
    // Remove characters from the end one by one until we have no more
    // than the maximum number of UTF-16 code units
    while (string.utf16.count > maxLength) {
        let startIndex = string.startIndex
        let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
        string = string[startIndex..<endIndex]
    }
    return string
}
The idea is to count UTF-16 code units, but remove whole characters from the end (that is, what Swift thinks a Character is).
EDIT 2
Still according to @OOPer, PostgreSQL counts characters as Unicode scalars, so this should probably work:
private func databaseStringForString(string: String, maxLength: Int = 255) -> String
{
    // Start by clipping to 255 characters
    let startIndex = string.startIndex
    let endIndex = startIndex.advancedBy(maxLength, limit: string.endIndex)
    var string = string[startIndex..<endIndex]
    // Remove characters from the end one by one until we have no more
    // than the maximum number of Unicode scalars
    while (string.unicodeScalars.count > maxLength) {
        let startIndex = string.startIndex
        let endIndex = string.endIndex.advancedBy(-1, limit: startIndex)
        string = string[startIndex..<endIndex]
    }
    return string
}
As I write in my comment, you may need your databaseStringForString(_:maxLength:) to truncate your string to match the length limit of your DBMS. PostgreSQL with utf8, MySQL with utf8mb4.
And I would write the same functionality as your EDIT 2:
func databaseStringForString(string: String, maxUnicodeScalarLength: Int = 255) -> String {
    let start = string.startIndex
    for index in start..<string.endIndex {
        if string[start..<index.successor()].unicodeScalars.count > maxUnicodeScalarLength {
            return string[start..<index]
        }
    }
    return string
}
This may be less efficient, but a little bit shorter.
let s = "abc\u{1D122}\u{1F1EF}\u{1F1F5}" //->"abc𝄢🇯🇵"
let dbus = databaseStringForString(s, maxUnicodeScalarLength: 5) //->"abc𝄢"(=="abc\u{1D122}")
So, someone who works with MySQL with utf8(=utf8mb3) needs something like this:
func databaseStringForString(string: String, maxUTF16Length: Int = 255) -> String {
    let start = string.startIndex
    for index in start..<string.endIndex {
        if string[start..<index.successor()].utf16.count > maxUTF16Length {
            return string[start..<index]
        }
    }
    return string
}
let dbu16 = databaseStringForString(s, maxUTF16Length: 4) //->"abc"
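For what it's worth, on Swift 4 or later the same truncation can be written with prefix(_:) and removeLast(); a minimal sketch assuming MySQL-style UTF-16 counting, not part of the original answers:
func databaseString(_ string: String, maxUTF16Length: Int = 255) -> String {
    // prefix(_:) counts Characters, so this is only a cheap first clip
    var result = String(string.prefix(maxUTF16Length))
    // Drop whole Characters until the UTF-16 length fits; this never splits a scalar
    while result.utf16.count > maxUTF16Length {
        result.removeLast()
    }
    return result
}
print(databaseString("abc\u{1D122}\u{1F1EF}\u{1F1F5}", maxUTF16Length: 4)) //->"abc" (𝄢 alone needs 2 units)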