Swift Strings and [Character] - swift

I have this code:
let txt = "over 100MB+ of text..."
let tokenizedText = Array (txt)
let regex = try NSRegularExpression (pattern: "(?s)<tu>.*?</tu>")
let r = regex.matches (in: txt, range: NSRange (txt.startIndex..<txt.endIndex, in: txt))
for match in r {
let begOfMatch = match.range.lowerBound
let endOfMatch = match.range.lowerBound + match.range.length
// check the result
if tokenizedText[begOfMatch] != "<" {
print ("error") // from time to time!!!!
}
}
=> regex.matches produces integer ranges that are not always in sync with the character array.
I know that UTF-8 does not have a one-to-one correspondence between bytes and characters, but how do I keep a String and a [Character] in sync? I need to:
-- retrieve the sequence of characters inside the matching sequences as [Character]
-- insert a tag (e.g. <found> ... </found>) around each matching sequence in the buffer (string)
How can I do that?

The issue is that NSRange is based on UTF-16 code units, so the location of the resulting NSRange is not necessarily the same as the character's position in the array of characters (not every Character is a single UTF-16 code unit; an emoji, for example, takes two or more). You need to convert the resulting NSRange to a Range<String.Index> and index the original string with that range's lowerBound:
let txt = "over 100MB+ of text... <tu>whatever</tu>"
let tokenizedText = Array (txt)
let regex = try NSRegularExpression (pattern: "(?s)<tu>.*?</tu>")
let r = regex.matches (in: txt, range: NSRange (txt.startIndex..<txt.endIndex, in: txt))
for match in r {
if let range = Range(match.range, in: txt) {
print (txt[range])
if txt[range.lowerBound] == "<" {
print(true)
} else {
print(false)
}
}
}
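The two follow-up tasks from the question (getting each match as [Character], and wrapping each match in a tag) can be sketched along the same lines. This is a minimal sketch, not a tuned solution for 100MB inputs: the sample text is made up, and the names matchedCharacters and tagged are hypothetical. It works in UTF-16 offsets for the insertion, because that is the unit NSRange already uses.

```swift
import Foundation

// Made-up sample input; the emoji in front shifts UTF-16 offsets
// away from Character offsets, which is the situation in the question.
let txt = "😀 intro <tu>first</tu> mid <tu>second</tu>"
let regex = try! NSRegularExpression(pattern: "(?s)<tu>.*?</tu>")
let matches = regex.matches(in: txt, range: NSRange(txt.startIndex..., in: txt))

// 1. The characters of each match as [Character],
//    obtained by converting NSRange to Range<String.Index> first.
let matchedCharacters: [[Character]] = matches.compactMap { match in
    Range(match.range, in: txt).map { Array(txt[$0]) }
}

// 2. Wrap each match in <found> ... </found>.
//    NSMutableString is indexed in UTF-16 units, like NSRange.
//    Walking the matches from last to first keeps earlier offsets valid.
let tagged = NSMutableString(string: txt)
for match in matches.reversed() {
    tagged.insert("</found>", at: match.range.location + match.range.length)
    tagged.insert("<found>", at: match.range.location)
}
let result = tagged as String
print(result)
```

If only the tagged string is needed, regex.stringByReplacingMatches(in:options:range:withTemplate:) with the template "<found>$0</found>" should give the same result in one call.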

Related

Split String or Substring with Regex pattern in Swift

First let me point out... I want to split a String or Substring on any character that is not a letter, a number, @ or #. That is, I want to split on whitespace (spaces and line breaks) and on special characters or symbols, excluding @ and #.
In Android Java, I am able to achieve this with:
String[] textArr = text.split("[^\\w_@#]");
Now, I want to do the same in Swift. I added an extension to String and Substring classes
extension String {}
extension Substring {}
In both extensions, I added a method that returns an array of Substring
func splitWithRegex(by regexStr: String) -> [Substring] {
//let string = self (for String extension) | String(self) (for Substring extension)
let regex = try! NSRegularExpression(pattern: regexStr)
let range = NSRange(string.startIndex..., in: string)
return regex.matches(in: string, options: .anchored, range: range)
.map { match -> Substring in
let range = Range(match.range(at: 1), in: string)!
return string[range]
}
}
And when I tried to use it (only tested with a Substring, but I think String will give the same result):
let textArray = substring.splitWithRegex(by: "[^\\w_@#]")
print("substring: \(substring)")
print("textArray: \(textArray)")
This is the output:
substring: This,is a #random #text written for debugging
textArray: []
Can someone please help me? I don't know whether the problem comes from my regex [^\\w_@#] or from the splitWithRegex method.
The main reason the code doesn't work is range(at: 1), which returns the range of the first capture group, but the pattern does not capture anything. In addition, the .anchored option only allows a match at the very start of the search range, so no separator is ever found in this string and the result is empty.
With just range, the regex returns the ranges of the found matches (the separators), but I suppose you want the characters between them.
To accomplish that, you need a running index that starts at the first character. In the map closure, return the string from the current index to the lowerBound of the found range, then set the index to its upperBound. Finally, you have to manually append the string from the upperBound of the last match to the end.
The Substring type is a helper type for slicing strings. It should not be kept beyond a temporary scope, which is why the method below returns [String].
extension String {
func splitWithRegex(by regexStr: String) -> [String] {
guard let regex = try? NSRegularExpression(pattern: regexStr) else { return [] }
let range = NSRange(startIndex..., in: self)
var index = startIndex
var array = regex.matches(in: self, range: range)
.map { match -> String in
let range = Range(match.range, in: self)!
let result = self[index..<range.lowerBound]
index = range.upperBound
return String(result)
}
array.append(String(self[index...]))
return array
}
}
let text = "This,is a #random #text written for debugging"
let textArray = text.splitWithRegex(by: "[^\\w_@#]")
print(textArray) // ["This", "is", "a", "#random", "#text", "written", "for", "debugging"]
However, macOS 13 and iOS 16 introduced a new API that is quite similar to the Java one:
let text = "This,is a #random #text written for debugging"
let textArray = Array(text.split(separator: /[^\w_@#]/))
print(textArray)
The forward slashes delimit a regex literal.

Convert UTF-8 (Bytes) Emoji Code to Emoji icon as a text

I am getting this below string as a response from WS API when they send emoji as a string:
let strTemp = "Hii \\xF0\\x9F\\x98\\x81"
I want it to be converted to the emoji icon like this -> Hii 😁
I think it is coming in UTF-8 format.
I tried decoding it online with a UTF-8 decoder, and the emoticon was decoded successfully.
But the issue here is that I do not know how to work with it in Swift.
I referred to the following link, but it did not work for me:
Swift Encode/decode emojis
Any help would be appreciated.
Thanks.
The converter tool you linked is clearly doing UTF-8 encoding and decoding. You have a UTF-8 encoded string, so here is an example of UTF-8 decoding.
Objective-C
const char *ch = [@"Hii \xF0\x9F\x98\x81" cStringUsingEncoding:NSUTF8StringEncoding];
NSString *decode_string = [NSString stringWithUTF8String:ch];
NSLog(@"%@",decode_string);
Output: Hii 😁
Swift
I'm able to convert \\xF0\\x9F\\x98\\x81 to 😁 in Swift.
First I converted the hex string into Data and then back to a String using UTF-8 encoding.
var str = "\\xF0\\x9F\\x98\\x81"
if let data = data(fromHexaStr: str) {
print(String(data: data, encoding: String.Encoding.utf8) ?? "")
}
Output: 😁
Below is the function I used to convert the hexa string into data. I followed this answer.
func data(fromHexaStr hexaStr: String) -> Data? {
var data = Data(capacity: hexaStr.count / 2)
let regex = try! NSRegularExpression(pattern: "[0-9a-f]{1,2}", options: .caseInsensitive)
regex.enumerateMatches(in: hexaStr, range: NSMakeRange(0, hexaStr.utf16.count)) { match, flags, stop in
let byteString = (hexaStr as NSString).substring(with: match!.range)
var num = UInt8(byteString, radix: 16)!
data.append(&num, count: 1)
}
guard data.count > 0 else { return nil }
return data
}
Note: The problem with the above code is that it only handles pure hex strings, not strings that mix plain text and hex escapes.
FINAL WORKING SOLUTION: SWIFT
I have done this with a for loop instead of the [0-9a-f]{1,2} regex, because the regex also picks up 81, 9F, or any other two-digit number in the plain text, which is obviously wrong.
For example: I have 81 INR \\xF0\\x9F\\x98\\x81.
/// This line will convert "F0" into hexa bytes
let byte = UInt8("F0", radix: 16)
I made a String extension that collects up to 4 characters at a time and checks whether they start with \x and whether the last two characters can be converted into a byte using the radix initializer shown above.
extension String {
func hexaDecoededString() -> String {
var newData = Data()
var emojiStr: String = ""
for char in self {
let str = String(char)
if str == "\\" || str.lowercased() == "x" {
emojiStr.append(str)
}
else if emojiStr.hasPrefix("\\x") || emojiStr.hasPrefix("\\X") {
emojiStr.append(str)
if emojiStr.count == 4 {
/// It can be a hexa value
let value = emojiStr.replacingOccurrences(of: "\\x", with: "")
if let byte = UInt8(value, radix: 16) {
newData.append(byte)
}
else {
newData.append(emojiStr.data(using: .utf8)!)
}
/// Reset emojiStr
emojiStr = ""
}
}
else {
/// Append the data as it is
newData.append(str.data(using: .utf8)!)
}
}
let decodedString = String(data: newData, encoding: String.Encoding.utf8)
return decodedString ?? ""
}
}
USAGE:
var hexaStr = "Hi \\xF0\\x9F\\x98\\x81 81"
print(hexaStr.hexaDecoededString())
Hi 😁 81
hexaStr = "Welcome to SP19!\\xF0\\x9f\\x98\\x81"
print(hexaStr.hexaDecoededString())
Welcome to SP19!😁
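One caveat with the character loop above: it appears to treat every backslash or x as the possible start of an escape, so a plain word containing x (for example "Hex") can lose that letter. A possible alternative is to match only complete escapes with a regex. This is a minimal sketch, assuming escapes are always a lowercase \x followed by exactly two hex digits; decodeHexEscapes is a hypothetical helper name, not part of any library.

```swift
import Foundation

/// Hypothetical helper: decodes literal "\xHH" escapes as UTF-8 bytes
/// and leaves all other text untouched.
func decodeHexEscapes(_ input: String) -> String {
    // One or more consecutive escapes, each a backslash, "x", and two hex digits.
    let regex = try! NSRegularExpression(pattern: "(?:\\\\x[0-9A-Fa-f]{2})+")
    let ns = input as NSString
    var result = ""
    var cursor = 0  // UTF-16 offset of the first character not yet copied
    for match in regex.matches(in: input, range: NSRange(location: 0, length: ns.length)) {
        let range = match.range
        // Copy the plain text between the previous run and this one.
        result += ns.substring(with: NSRange(location: cursor, length: range.location - cursor))
        let run = ns.substring(with: range)
        // Splitting on "\x" leaves the two-digit hex pairs (plus one empty piece).
        let bytes = run.components(separatedBy: "\\x").compactMap { UInt8($0, radix: 16) }
        // If the bytes are not valid UTF-8, keep the original escape text.
        result += String(data: Data(bytes), encoding: .utf8) ?? run
        cursor = range.location + range.length
    }
    // Copy whatever follows the last run.
    result += ns.substring(from: cursor)
    return result
}

print(decodeHexEscapes("Hi \\xF0\\x9F\\x98\\x81 81"))
```

Decoding a whole run of escapes at once matters here, because the four bytes only form a valid UTF-8 sequence together.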
I fixed your issue, but it needs more work to be general. The problem is that your emoji is represented by hex bytes such as \x9F, so we have to parse each hex pair into a byte, collect the bytes into Data, and finally decode that Data as a UTF-8 String.
Final result: Hii 😁 (please read the comments).
let strTemp = "Hii \\xF0\\x9F\\x98\\x81"
let regex = try! NSRegularExpression(pattern: "[0-9a-f]{1,2}", options: .caseInsensitive)
// get all matched hex pairs: F0, 9F, ... etc.
// (NSRange counts UTF-16 units, so use utf16.count for the length)
let matches = regex.matches(in: strTemp, options: [], range: NSMakeRange(0, strTemp.utf16.count))
// Data that collects the decoded UTF-8 bytes
var emijoData = Data(capacity: strTemp.count / 2)
matches.enumerated().forEach { (offset , check) in
let byteString = (strTemp as NSString).substring(with: check.range)
var num = UInt8(byteString, radix: 16)!
emijoData.append(&num, count: 1)
}
let subStringEmijo = String.init(data: emijoData, encoding: String.Encoding.utf8)!
// now we have the emoji text 😁 and can substitute it for its escape code, using the `first` and `last` matched ranges
// the full range of \\xF0\\x9F\\x98\\x81 in "Hii \\xF0\\x9F\\x98\\x81" is what gets replaced by the emoji
if let start = matches.first?.range.location, let end = matches.last?.range.location , let endLength = matches.last?.range.length {
// step back 2 to include the leading "\x" of the first escape
let startLocation = start - 2
let length = end - startLocation + endLength
let sub = (strTemp as NSString).substring(with: NSRange.init(location: startLocation, length: length))
print( strTemp.replacingOccurrences(of: sub, with: subStringEmijo))
// Hii 😁
}

Character is not convertible to String

How can I handle this error? I am trying to get the ASCII code of every character in a string, but I can't convert a Character back to a String in order to check whether the symbol is needed.
Here is my code
var n = "KNjNKJbbsibdcjkdcn___*(&0786"
let r = n.characters.count
for i in stride(from: 0, to: r, by: 1) {
let t = n.characters.index(n.startIndex, offsetBy: i)
String?(n[t])
}
The output should be each character separately, as a String.
This bit of code converts a string to an array of ASCII codes (skipping characters that have no ASCII code):
let str = "KNjNKJbbsibdcjkdcn___*(&0786"
let charCodes = str.unicodeScalars
.filter({ $0.isASCII })
.map({ $0.value })
print(charCodes)
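For the other half of the question (each character as its own String), a minimal sketch is shown here. It assumes Swift 5 or later, where a String is itself a collection of Character and Character has an asciiValue property; the names pieces and codes are only illustrative.

```swift
let n = "KNjNKJbbsibdcjkdcn___*(&0786"

// Each Character converted to a one-character String.
let pieces: [String] = n.map { String($0) }

// ASCII code of each character; non-ASCII characters are skipped
// because asciiValue is nil for them.
let codes: [UInt8] = n.compactMap { $0.asciiValue }

print(pieces)
print(codes)
```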

How can I substring this string?

How can I get the 2 characters that follow a certain character in a string? For example, I have strings such as str1 = "12:34" and str2 = "12:345". I want to get the 2 characters after the colon.
I want the same code to work for both str1 and str2.
Swift's substring is complicated:
let str = "12:345"
if let range = str.range(of: ":") {
let startIndex = str.index(range.lowerBound, offsetBy: 1)
let endIndex = str.index(startIndex, offsetBy: 2)
print(str[startIndex..<endIndex])
}
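Note that the index arithmetic above traps if fewer than two characters follow the colon. A shorter variant that tolerates this can be sketched with firstIndex(of:) and prefix(_:); twoAfterColon is a hypothetical helper name, and returning nil when there is no colon is an assumption about the desired behavior.

```swift
/// Hypothetical helper: returns up to two characters after the first ":",
/// or nil if the string contains no colon.
func twoAfterColon(_ s: String) -> String? {
    guard let colon = s.firstIndex(of: ":") else { return nil }
    // Everything after the colon; may be empty if the colon is last.
    let rest = s[s.index(after: colon)...]
    // prefix(2) returns fewer characters if fewer are available.
    return String(rest.prefix(2))
}

print(twoAfterColon("12:34") ?? "no colon")
print(twoAfterColon("12:345") ?? "no colon")
```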
Using the str.index() method, as shown in @MikeHenderson's answer, is very easy. An alternative that avoids it is to iterate through the string's characters and build a new string holding the first two characters after the ":", like so:
var string1="12:458676"
var nr=0
var newString=""
for c in string1 {
if nr>0{
newString+=String(c)
nr-=1
}
if c==":" {nr=2}
}
print(newString) // prints 45
Hope this helps!
A possible solution is a regular expression.
The pattern looks for a colon followed by two digits and captures the two digits:
let string = "12:34"
let pattern = ":(\\d{2})"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {
print((string as NSString).substring(with: match.range(at: 1)))
}

Return results from regular expression pattern matching

I have a string (HTML in this example case) which contains the same pattern for displaying the results of sports games. So, the HTML tags are known, but the values for each game are not.
In Perl, we can do this:
if ( $content =~ /\<\/a\>\<br\>(\d+)\<\/span\>\<br\>(\d+)\-(\d+).+\<\/a\>\<br\>(\d+)\<\/span\>\<br\>(\d+)\-(\d+)/) {
$visitingTeamScore = $1; # $1 is the 1st matched group
$visitingTeamWins = $2; # $2 is the 2nd matched group
$visitingTeamLosses = $3; # etc.
$homeTeamScore = $4;
$homeTeamWins = $5;
$homeTeamLosses = $6;
}
which returns the digits inside the parentheses, in this case 6 total integers of varying digit lengths. We can then assign those matches to variables.
From an answer in this question: Swift Get string between 2 strings in a string, I have the following Swift code:
extension String {
func sliceFrom(start: String, to: String) -> String? {
return (rangeOfString(start)?.endIndex).flatMap { sInd in
(rangeOfString(to, range: sInd..<endIndex)?.startIndex).map { eInd in
substringWithRange(sInd..<eInd)
}
}
}
}
let firstMatch = content?.sliceFrom("</a><br>", to: "</span>") // The first integer in the string
The problem comes when getting the 4th integer, which is also between </a><br> and </span>, so the resulting match will be the first integer again.
I can manually count the characters (which itself isn't a perfect science because the digits in each integer can differ) to do something ugly like:
let newRawHTML = content![content!.startIndex.advancedBy(15)...content!.startIndex.advancedBy(5)]
Another possibility is to remove anything matched already from the string, making it shorter for each subsequent search (which I'm not sure how to implement.) What's the way to do this? Is there any way in Swift to "pluck out" the matches?
The code you have shown as a Perl example uses a regular expression.
Once the pattern gets a little complex, you are better off using NSRegularExpression directly.
let pattern = "</a><br>(\\d+)</span><br>(\\d+)-(\\d+).+</a><br>(\\d+)</span><br>(\\d+)-(\\d+)"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: content, options: [], range: NSRange(content.startIndex..., in: content)) {
let visitingTeamScore = (content as NSString).substring(with: match.range(at: 1))
let visitingTeamWins = (content as NSString).substring(with: match.range(at: 2))
let visitingTeamLosses = (content as NSString).substring(with: match.range(at: 3))
let homeTeamScore = (content as NSString).substring(with: match.range(at: 4))
let homeTeamWins = (content as NSString).substring(with: match.range(at: 5))
let homeTeamLosses = (content as NSString).substring(with: match.range(at: 6))
//...use the values
}
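To avoid repeating the substring call for every group, the capture groups can be collected into an array in one pass. This is a minimal sketch: captureGroups is a hypothetical helper, and the sample string is made up to fit the pattern rather than taken from real page content.

```swift
import Foundation

/// Hypothetical helper: returns the capture groups of the first match,
/// or nil if the pattern is invalid or does not match.
func captureGroups(of pattern: String, in text: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text))
    else { return nil }
    // Range 0 is the whole match, so the groups start at index 1.
    return (1..<match.numberOfRanges).map { i in
        // A group that did not participate in the match yields "".
        Range(match.range(at: i), in: text).map { String(text[$0]) } ?? ""
    }
}

// Made-up sample input, for illustration only.
let sample = "</a><br>3</span><br>10-5 x </a><br>12</span><br>7-8"
let scorePattern = "</a><br>(\\d+)</span><br>(\\d+)-(\\d+).+</a><br>(\\d+)</span><br>(\\d+)-(\\d+)"
let groups = captureGroups(of: scorePattern, in: sample)
print(groups ?? [])
```

The six values can then be read by position, for example groups?[0] for the visiting team's score and groups?[3] for the home team's score.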