How to run multiple NSRegularExpressions at once - Swift

I have a bunch of NSRegularExpression objects and I want to run them all in one pass. Does anyone know how to do it?
For the moment I run them in a .forEach, but for performance reasons I do not think this is the best idea.
Each NSRegularExpression needs to match a different pattern, and after the matching I need to handle each kind of match differently. For example, a match from the first regex in my array must be handled differently from a match from the second, and so on.
let text: String = "Stuff"
let range: NSRange = // a range
var regexes: [NSRegularExpression] = // all of my regexes
regexes.forEach { $0.matches(in: text, options: [], range: range) }
Thanks for your help.

You may be able to evaluate several regular expressions as one if you concatenate them using capture groups and the alternation (OR) operator.
If you want to search for the strings language, Objective-C and Swift, you should use a pattern like this: (language)|(Objective-C)|(Swift). Each capture group has an order number, so when language is found in the source string, the match object tells you which group matched.
You can use the code in this playground sample:
import Foundation
let sourceString: String = "Swift is a great language to program, but don't forget Objective-C."
let expresions = [ "language", // Expression 0
"Objective-C", // Expression 1
"Swift" // Expression 2
]
let pattern = expresions
.map { "(\($0))" }
.joined(separator: "|") // pattern is defined as : (language)|(Objective-C)|(Swift)
let regex = try? NSRegularExpression(pattern: pattern, options: [])
let matches = regex?.matches(in: sourceString, options: [],
range: NSRange(location: 0, length: sourceString.utf16.count))
// Array of (Int, String) tuples: the index of the expression and the captured string
let results = matches?.map({ (match) -> (Int, String) in
    // Go through all capture groups to find the one that took part in this match.
    // Groups that did not match have no valid range, so compactMap drops them
    // and leaves exactly one group number.
    let index = (1...match.numberOfRanges-1)
        .compactMap { Range(match.range(at: $0), in: sourceString) != nil ? $0 : nil }
        .first!
    let foundString = String(sourceString[Range(match.range(at: 0), in: sourceString)!])
    let position = match.range(at: 0).location
    let niceResponse = "\(foundString) [position: \(position)]"
    return (index - 1, niceResponse) // Subtract 1 so the index matches the zero-based expressions array
})
print("How many matches: \(results?.count ?? 0)")
results?.forEach({ result in
    print("Expression \(result.0): \(result.1)")
})
If you run it the result is:
How many matches: 3
Expression 2: Swift [position: 0]
Expression 0: language [position: 17]
Expression 1: Objective-C [position: 55]
I hope I understood your question correctly and that this code helps you.
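To handle each kind of match differently, as the question asks, the group index can drive a switch. What follows is only a minimal sketch: matchesByExpression is a hypothetical helper name (not a Foundation API), and it assumes that none of the sub-patterns contains capture groups of its own, since those would shift the group numbers.

```swift
import Foundation

// Hypothetical helper (not a Foundation API): for every match, returns the
// index of the expression that matched together with the matched text.
// Assumes the sub-patterns contain no capture groups of their own.
func matchesByExpression(_ expressions: [String], in source: String) -> [(index: Int, text: String)] {
    let pattern = expressions.map { "(\($0))" }.joined(separator: "|")
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(source.startIndex..., in: source)
    return regex.matches(in: source, range: fullRange).compactMap { match -> (index: Int, text: String)? in
        // Group 0 is the whole match; group n corresponds to expression n - 1.
        for group in 1..<match.numberOfRanges {
            if let range = Range(match.range(at: group), in: source) {
                return (index: group - 1, text: String(source[range]))
            }
        }
        return nil
    }
}

let found = matchesByExpression(["language", "Objective-C", "Swift"],
                                in: "Swift is a great language to program, but don't forget Objective-C.")
for item in found {
    // Branch on the expression index to treat each kind of match differently.
    switch item.index {
    case 0: print("plain word: \(item.text)")
    case 1, 2: print("programming language: \(item.text)")
    default: break
    }
}
```

The source string is still scanned only once; the switch is where the per-expression handling would go.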

Related

Split String or Substring with Regex pattern in Swift

First, let me point out: I want to split a String or Substring on any character that is not a letter, a digit, # or #. That is, I want to split on whitespace (spaces and line breaks) and on special characters or symbols, excluding # and #.
In Android Java, I am able to achieve this with:
String[] textArr = text.split("[^\\w_##]");
Now, I want to do the same in Swift. I added extensions to the String and Substring types:
extension String {}
extension Substring {}
In both extensions, I added a method that returns an array of Substring
func splitWithRegex(by regexStr: String) -> [Substring] {
//let string = self (for String extension) | String(self) (for Substring extension)
let regex = try! NSRegularExpression(pattern: regexStr)
let range = NSRange(string.startIndex..., in: string)
return regex.matches(in: string, options: .anchored, range: range)
.map { match -> Substring in
let range = Range(match.range(at: 1), in: string)!
return string[range]
}
}
This is how I tried to use it (only tested with a Substring, but I think String will give me the same result):
let textArray = substring.splitWithRegex(by: "[^\\w_##]")
print("substring: \(substring)")
print("textArray: \(textArray)")
This is the output:
substring: This,is a #random #text written for debugging
textArray: []
Can someone please help me? I don't know if the problem comes from my regex [^\\w_##] or from the splitWithRegex method.
The main reason why the code doesn't work is range(at: 1), which returns the range of the first captured group, but the pattern does not capture anything. The .anchored option also has to go: it only allows a match at the start of the search range, and the text starts with a word character, so nothing is found.
With just range, the regex returns the ranges of the found matches, but I suppose you want the characters between them.
To accomplish that, you need a moving index that starts at the first character. In the map closure, return the string from the current index to the lowerBound of the found range and set the index to its upperBound. Finally, you have to manually append the string from the upperBound of the last match to the end.
The Substring type is a helper type for slicing strings. It should not be used beyond a temporary scope.
extension String {
func splitWithRegex(by regexStr: String) -> [String] {
guard let regex = try? NSRegularExpression(pattern: regexStr) else { return [] }
let range = NSRange(startIndex..., in: self)
var index = startIndex
var array = regex.matches(in: self, range: range)
.map { match -> String in
let range = Range(match.range, in: self)!
let result = self[index..<range.lowerBound]
index = range.upperBound
return String(result)
}
array.append(String(self[index...]))
return array
}
}
let text = "This,is a #random #text written for debugging"
let textArray = text.splitWithRegex(by: "[^\\w_##]")
print(textArray) // ["This", "is", "a", "#random", "#text", "written", "for", "debugging"]
However, in macOS 13 and iOS 16 there is a new API that is quite similar to the Java API:
let text = "This,is a #random #text written for debugging"
let textArray = Array(text.split(separator: /[^\w_##]/))
print(textArray)
The forward slashes delimit a regex literal.

Regular expressions in swift

I'm a bit confused by NSRegularExpression in Swift. Can anyone help me?
Task 1: given ("name","john","name of john")
I should get ["name","john","name of john"]. Here I need to drop the brackets.
Task 2: given ("name"," john","name of john")
I should get ["name","john","name of john"]. Here I need to drop the brackets and the extra spaces, and end up with an array of strings.
Task 3: given key = value // comment
I should get ["key","value","comment"]. Here I need only the strings in the line, dropping = and //.
I have tried the code below for task 1, but it does not pass.
let string = "(name,john,string for user name)"
let pattern = "(?:\\w.*)"
do {
let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let matches = regex.matches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count))
for match in matches {
if let range = Range(match.range, in: string) {
let name = string[range]
print(name)
}
}
} catch {
print("Regex was bad!")
}
Thanks in advance.
RegEx in Swift
These posts might help you explore regular expressions in Swift:
Does a string match a pattern?
Swift extract regex matches
How can I use String slicing subscripts in Swift 4?
How to use regex with Swift?
Swift 3 - How do I extract captured groups in regular expressions?
How to group search regular expressions using swift?
Task 1 & 2
This expression might help you to match your desired outputs for both Task 1 and 2:
"(\s+)?([a-z\s]+?)(\s+)?"
Based on Rob's advice, you could relax the restrictions considerably, such as the character class [a-z\s]. For example, here we can also use:
"(\s+)?(.*?)(\s+)?"
or
"(\s+)?(.+?)(\s+)?"
to simply capture everything between the two " characters, ignoring leading and trailing spaces.
RegEx
If this wasn't your desired expression, you can modify/change your expressions in regex101.com.
RegEx Circuit
You can also visualize your expressions in jex.im:
JavaScript Demo
const regex = /"(\s+)?([a-z\s]+?)(\s+)?"/gm;
const str = `"name","john","name of john"
"name"," john","name of john"
" name "," john","name of john "
" name "," john"," name of john "`;
const subst = `\n$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Task 3
This expression might help you to design an expression for the third task:
(.*?)([a-z\s]+)(.*?)
const regex = /(.*?)([a-z\s]+)(.*?)/gm;
const str = `key = value // comment
key = value with some text // comment`;
const subst = `$2,`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Separate the string on non-alphanumeric characters, except white space. Then trim the white space from the elements.
extension String {
    func words() -> [String] {
        return self.components(separatedBy: CharacterSet.alphanumerics.inverted.subtracting(.whitespaces))
            // Trim first, so that pieces made only of white space become empty and get filtered out
            .map({ $0.trimmingCharacters(in: .whitespaces) })
            .filter({ !$0.isEmpty })
    }
}
let string1 = "(name,john,string for user name)"
let string2 = "(name, john,name of john)"
let string3 = "key = value // comment"
print(string1.words())//["name", "john", "string for user name"]
print(string2.words())//["name", "john", "name of john"]
print(string3.words())//["key", "value", "comment"]
Here is what I ended up with after reading all of the comments above.
let text = """
Capturing and non-capturing groups are somewhat advanced topics. You’ll encounter examples of capturing and non-capturing groups later on in the tutorial
"""
extension String {
func rex (_ expr : String)->[String] {
return try! NSRegularExpression(pattern: expr, options: [.caseInsensitive])
.matches(in: self, options: [], range: NSRange(location: 0, length: self.utf16.count)) // NSRange counts UTF-16 code units
.map {
String(self[Range($0.range, in: self)!])
}
}
}
let r = text.rex("(?:\\w+-\\w+)") // pass any rex
A single pattern that works for tests 1 to 3, in Swift.
let string =
//"(name,john,string for user name)" //test:1
//#"("name"," john","name of john")"# //test:2
"key = value // comment" //test:3
let pattern = #"(?:\w+)(?:\s+\w+)*"# //Swift 5+ only
//let pattern = "(?:\\w+)(?:\\s+\\w+)*"
do {
let regex = try NSRegularExpression(pattern: pattern)
let matches = regex.matches(in: string, range: NSRange(0..<string.utf16.count))
let matchingWords = matches.map {
String(string[Range($0.range, in: string)!])
}
print(matchingWords) //(test:3)->["key", "value", "comment"]
} catch {
print("Regex was bad!")
}
Let’s consider:
let string = "(name,José,name is José)"
I’d suggest a regex that looks for strings where:
It’s the substring either after the ( at the start of the full string or after a comma, i.e., look behind assertion of (?<=^\(|,);
It’s the substring that does not contain , within it, i.e., [^,]+?;
It’s the substring that is terminated by either a comma or ) at the end of the full string, i.e., look ahead assertion of (?=,|\)$), and
If you want to have it skip white space before and after the substrings, throw in the \s*+, too.
Thus:
let pattern = #"(?<=^\(|,)\s*+([^,]+?)\s*+(?=,|\)$)"#
let regex = try! NSRegularExpression(pattern: pattern)
regex.enumerateMatches(in: string, range: NSRange(string.startIndex..., in: string)) { match, _, _ in
if let nsRange = match?.range(at: 1), let range = Range(nsRange, in: string) {
let substring = String(string[range])
// do something with `substring` here
}
}
Note, I’m using the Swift 5 extended string delimiters (starting with #" and ending with "#) so that I don’t have to escape my backslashes within the string. If you’re using Swift 4 or earlier, you’ll want to escape those back slashes:
let pattern = "(?<=^\\(|,)\\s*+([^,]+?)\\s*+(?=,|\\)$)"
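To collect the captured substrings into an array instead of handling them inside the closure, something like the following could work. This is only a sketch: fields(in:) is a hypothetical name, and it reuses the pattern above unchanged.

```swift
import Foundation

// Hypothetical wrapper around the pattern above: returns the substrings
// between the parentheses, split on commas, without surrounding white space.
func fields(in string: String) -> [String] {
    let pattern = #"(?<=^\(|,)\s*+([^,]+?)\s*+(?=,|\)$)"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let nsRange = NSRange(string.startIndex..., in: string)
    return regex.matches(in: string, range: nsRange).compactMap { match in
        // Capture group 1 holds the text without the surrounding white space.
        Range(match.range(at: 1), in: string).map { String(string[$0]) }
    }
}

let names = fields(in: "(name,José,name is José)")
print(names)
```

Building the NSRange from the string itself keeps the ranges correct for non-ASCII text such as "José".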

How to exit a function where no regex matches

I am attempting to create a function that uses regex matches to return an array of NSRange values, for use with a UITextView, so that the user can click through the matched words with animation.
I assume the solution is to break out of the function if there is no regex match, but I cannot figure out how to do this when the function has to return an NSRange value.
Moreover, when there is no match, the regex function matches does not return nil. Instead, it returns an empty array, which appears to make the guard statement useless.
Here is the function:
func rangeOfSearchText(searchString: String, UIText: String) -> [NSRange] {
var matches:[NSTextCheckingResult]?
let regex = try! NSRegularExpression(pattern: searchString, options: .caseInsensitive)
matches = regex.matches(in: UIText, options: [], range: NSRange(location: 0, length: UIText.count))
guard let find = matches else {
//return need to find a way to break out of function if nil without returning an NSRange object...
}
var rangeArray:[NSRange] = []
for match in find {
rangeArray.append(match.range(at: 0))
}
return rangeArray
}
let sString = "z"
let longString = "I need a solution."
let test = rangeOfSearchText(searchString: sString, UIText: longString)
The above returns an empty array.
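Since matches(in:options:range:) returns a non-optional array, there is nothing to unwrap and no guard is needed: when nothing matches, the function can return the empty array and the caller can check isEmpty. A minimal sketch under that assumption follows. The parameter label in text is my own renaming, and try? is used so that an invalid pattern also yields an empty result.

```swift
import Foundation

// Sketch: returns the ranges of all matches, or an empty array when nothing
// matches (or when the pattern is not a valid regular expression).
func rangeOfSearchText(searchString: String, in text: String) -> [NSRange] {
    guard let regex = try? NSRegularExpression(pattern: searchString, options: .caseInsensitive) else {
        return []
    }
    // NSRange counts UTF-16 code units, so build it from the string itself.
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).map { $0.range }
}

let ranges = rangeOfSearchText(searchString: "z", in: "I need a solution.")
if ranges.isEmpty {
    print("No match, nothing to highlight")
}
```

The caller decides what an empty result means, for example skipping the animation entirely.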

Converting numbers to string in a given string in Swift

I am given a string like 4eysg22yl3kk and my output should be like this:
foureysgtwenty-twoylthreekk, or if I am given 0123 the output should be one hundred twenty-three. So basically, as I scan the string, I need to convert the numbers to words.
I do not know how to implement this in Swift as I iterate through the string. Any ideas?
You actually have two basic problems.
The first is converting a "number" to its "spelt out" value (i.e. 1 to one). This is actually easy to solve, as NumberFormatter has a spellOut number style:
let formatter = NumberFormatter()
formatter.numberStyle = .spellOut
let text = formatter.string(from: NSNumber(value: 1))
which will result in "one", neat.
The other issue, though, is how do you separate the numbers from the text?
While I can find any number of solutions for extracting either the numbers or the characters from a mixed String, I can't find one which returns both, split on their boundaries, so that, based on your input, we'd end up with ["4", "eysg", "22", "yl", "3", "kk"].
So, time to roll our own...
func breakApart(_ text: String, withPattern pattern: String) throws -> [String]? {
    let regex = try NSRegularExpression(pattern: pattern)
    var parts: [String] = []
    var index = text.startIndex // End of the previous match
    // Build the NSRange from the string, since NSRange counts UTF-16 code units
    for match in regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text)) {
        guard let range = Range(match.range, in: text) else {
            return nil
        }
        // Text between the previous match and this one, if any
        if index < range.lowerBound {
            parts.append(String(text[index..<range.lowerBound]))
        }
        parts.append(String(text[range]))
        index = range.upperBound
    }
    // Text after the last match, if any
    if index < text.endIndex {
        parts.append(String(text[index...]))
    }
    return parts
}
Okay, so this is a little "dirty" (IMHO), but I can't seem to think of a better approach, hopefully someone will be kind enough to provide some hints towards one ;)
Basically, it uses a regular expression to find all the groups of digits, then builds an array by cutting the string apart at the match boundaries. Like I said, it's crude, but it gets the job done.
From there, we just need to map the results, spelling out the numbers as we go...
let formatter = NumberFormatter()
formatter.numberStyle = .spellOut
let value = "4eysg22yl3kk"
let pattern = "[0-9]+" // Runs of digits
if let parts = try breakApart(value, withPattern: pattern) {
    let result = parts.map { (part) -> String in
        // Spell out the parts that are numbers, keep everything else as is
        if let number = Int(part), let text = formatter.string(from: NSNumber(value: number)) {
            return text
        }
        return part
    }.joined(separator: " ")
    print(result)
}
This will end up printing four eysg twenty-two yl three kk. If you don't want the spaces, call joined() without a separator.
I did this in Playgrounds, so it probably needs some cleaning up
I was able to solve my question with nothing more than converting my String to an array and checking it character by character. Whenever I found a digit, I saved it in a temporary String, and as soon as the next character was not a digit, I converted the collected digits to text.
let inputString = Array(string.lowercased())
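The snippet above is cut off, so what follows is only a sketch of the character-by-character approach as described, not the original code. The function name spellOutDigits is made up for illustration, and the en_US locale is an assumption to get English words. It buffers consecutive digits and spells them out with NumberFormatter once a non-digit, or the end of the string, is reached.

```swift
import Foundation

// Hypothetical helper illustrating the approach described above.
func spellOutDigits(in string: String) -> String {
    let formatter = NumberFormatter()
    formatter.numberStyle = .spellOut
    formatter.locale = Locale(identifier: "en_US") // Assumption: English output

    var result = ""
    var digits = "" // Buffer for the run of digits currently being read

    // Converts the buffered digits to words and appends them to the result.
    func flush() {
        guard !digits.isEmpty else { return }
        if let number = Int(digits), let words = formatter.string(from: NSNumber(value: number)) {
            result += words
        } else {
            result += digits // For example, a run too long to fit in an Int
        }
        digits = ""
    }

    for character in string.lowercased() {
        if "0"..."9" ~= character {
            digits.append(character)
        } else {
            flush()
            result.append(character)
        }
    }
    flush() // The string may end with digits
    return result
}

print(spellOutDigits(in: "4eysg22yl3kk")) // foureysgtwenty-twoylthreekk
```

With the input 0123 the buffered digits parse as 123, so the sketch returns one hundred twenty-three, as the question requires.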

Get numbers characters from a string [duplicate]

This question already has answers here:
Filter non-digits from string
How can I get the digit characters from a string? I don't want to convert them to Int.
var string = "string_1"
var string2 = "string_20_certified"
My result has to be formatted like this:
newString = "1"
newString2 = "20"
Pattern matching a String's unicode scalars against Western Arabic Numerals
You could pattern match the unicodeScalars view of a String to a given UnicodeScalar pattern (covering e.g. Western Arabic numerals).
extension String {
    var westernArabicNumeralsOnly: String {
        let pattern = UnicodeScalar("0")..."9"
        // compactMap replaces the deprecated optional-returning flatMap
        return String(unicodeScalars
            .compactMap { pattern ~= $0 ? Character($0) : nil })
    }
}
Example usage:
let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"
let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly
print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
Extending to matching any of several given patterns
The unicode scalar pattern matching approach above is particularly useful when extended to match any of several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:
extension String {
    var easternArabicNumeralsOnly: String {
        let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
                        "\u{06F0}"..."\u{06F9}"] // Perso-Arabic variant
        // Keep a scalar if it falls inside any of the ranges
        return String(unicodeScalars
            .compactMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
    }
}
This could be used in practice e.g. if writing an Emoji filter, as ranges of unicode scalars that cover emojis can readily be added to the patterns array in the Eastern Arabic example above.
Why use the UnicodeScalar patterns approach over Character ones?
A Character in Swift consists of an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character instances in Swift do not have a fixed size in memory, so random access to a character within a collection of sequentially (contiguously) stored characters is not available in O(1), but rather in O(n).
Each Unicode scalar in Swift, on the other hand, is a fixed-size 32-bit value, which should make scanning them cheaper. Now, I'm not entirely sure whether this is a fact or the reason for what follows, but it is a fact that when benchmarking the methods above against equivalent methods using the CharacterView (.characters property) for some test String instances, it's very apparent that the UnicodeScalar approach is faster than the Character approach; naive testing showed a factor of 10-25 difference in execution times, steadily growing with String size.
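To make the distinction concrete, here is a small sketch that counts the same string both ways. It only uses standard library properties; the variable names are mine.

```swift
// "e" followed by a combining acute accent forms one Character
// but two Unicode scalars.
let word = "Cafe\u{301}"

print(word.count)                 // 4 Characters (grapheme clusters)
print(word.unicodeScalars.count)  // 5 Unicode scalars

// The last Character is made of two scalars: U+0065 and U+0301.
let lastCharacter = word.last!
print(lastCharacter.unicodeScalars.map { String($0.value, radix: 16) }) // ["65", "301"]
```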
Knowing the limitations of working with Unicode scalars vs Characters in Swift
There are drawbacks to the UnicodeScalar approach, however, namely when working with characters that cannot be represented by a single unicode scalar, but where one of their unicode scalars is contained in the pattern we want to match.
E.g., consider a string holding the four characters "Café". The last character, "é", is represented here by two unicode scalars, "e" and "\u{301}". If we were to implement pattern matching against, say, UnicodeScalar("a")..."e", the filtering method as applied above would allow one of the two unicode scalars to pass.
extension String {
    var onlyLowercaseLettersAthroughE: String {
        let patterns = [UnicodeScalar("a")..."e"]
        return String(unicodeScalars
            .compactMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
    }
}
let str = "Cafe\u{301}"
print(str) // Café
print(str.onlyLowercaseLettersAthroughE) // ae
/* possibly we'd want "a" or "aé"
as result here */
In the particular use case the OP asks about in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character pattern matching rather than UnicodeScalar.
Edit: Updated for Swift 4 & 5
Here's a straightforward method that doesn't require Foundation:
let newstring = string.filter { "0"..."9" ~= $0 }
or, borrowing from @dfri's idea, make it a String extension:
extension String {
var numbers: String {
return filter { "0"..."9" ~= $0 }
}
}
print("3 little pigs".numbers) // "3"
print("1, 2, and 3".numbers) // "123"
import Foundation
let string = "a_1_b_2_3_c34"
let result = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
print(result)
Output:
12334
Here is a Swift 2 example:
let str = "Hello 1, World 62"
let intString = str.componentsSeparatedByCharactersInSet(
NSCharacterSet
.decimalDigitCharacterSet()
.invertedSet)
.joinWithSeparator("") // Return a string with all the numbers
This method iterates through the characters of the string and appends the digits to a new string:
class func getNumberFrom(string: String) -> String {
    var number: String = ""
    for c in string {
        // Keep the character only if it is a single decimal digit (0 through 9)
        if let n = Int(String(c)), (0...9).contains(n) {
            number.append(c)
        }
    }
    return number
}
For example with regular expression
let text = "string_20_certified"
let pattern = "\\d+"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: text, options: [], range: NSRange(location: 0, length: text.utf16.count)) {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}
If there are multiple occurrences of the pattern, use matches(in:options:range:):
let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: text.utf16.count))
for match in matches {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}