NSRegularExpressions - Non Capture Group not working - swift

Hello, I am having trouble using the non-capturing group feature of regex in NSRegularExpression.
Here's some code to capture matches:
func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}
Now for the regex. I have a string of the form workcenter:WDO-POLD. This should be easy, but the pattern ((?:workcenter:)(.{0,20})) does not return what I need.
I get no errors when running it, but the function returns the same string that I passed in. I am trying to retrieve the value after workcenter:, which is the part that (.{0,20}) should capture.

The first problem is with your regular expression. You do not want the outer capture group. Change your regular expression to:
(?:workcenter:)(.{0,20}) <-- outer capture group removed
The next problem is with how you are doing the mapping. You are accessing the full range of the match and not the desired capture group. Since you have a generalized function for handling any regular expression, it's hard to deal with all possibilities but the following change solves your immediate example and should work with regular expressions that have no capture group as well as those with one capture group.
Update your mapping line to:
return results.map {
    regex.numberOfCaptureGroups == 0 ?
        String(text[Range($0.range, in: text)!]) :
        String(text[Range($0.range(at: 1), in: text)!])
}
This checks how many capture groups are in your regular expression. If none, it returns the full match. But if there is 1 or more, it returns just the value of the first capture group.
You can also get your original mapping to work if you change your regular expression to:
(?<=workcenter:)(.{0,20})
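Putting the pieces of this answer together, here is a minimal sketch of the generalized function. It assumes you want the first capture group when the pattern has one and the whole match otherwise. Unlike the snippet above, it skips groups that did not participate instead of force-unwrapping them. The names input, viaGroup and viaLookbehind are only for illustration.

```swift
import Foundation

// Returns the first capture group of each match when the pattern defines one,
// otherwise the whole match. Groups that did not participate are skipped.
func matches(for pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let group = regex.numberOfCaptureGroups == 0 ? 0 : 1
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        Range(match.range(at: group), in: text).map { String(text[$0]) }
    }
}

let input = "workcenter:WDO-POLD"
// Non-capturing prefix plus one capture group: group 1 is returned.
let viaGroup = matches(for: "(?:workcenter:)(.{0,20})", in: input)
// Lookbehind with no capture group: the whole match is already the value.
let viaLookbehind = matches(for: "(?<=workcenter:).{0,20}", in: input)
print(viaGroup, viaLookbehind)
```

Both calls should produce the value after the prefix. With the lookbehind, the prefix is never part of the match, which is why the original full-range mapping also works with that pattern.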
There's a much simpler solution here.
You have a lot of extra groups. Remove the outermost group; the non-capturing group isn't needed either. Just use workcenter:(.{0,20}). Then you can reference the desired capture group with $1.
There's also no need for NSRegularExpression in this case. Use a simple string replacement.
let str = "workcenter:WDO-POLD"
let res = str.replacingOccurrences(of: "workcenter:(.{0,20})", with: "$1", options: .regularExpression)
This gives WDO-POLD.

Related

Regular expression in Swift

I am trying to parse a string with a regex, and I am having problems extracting all the information into substrings. I am almost done, but I am stuck at this point:
For a string like this:
[00/0/00, 00:00:00] User: This is the message text and any other stuff
I can parse Date, User and Message in Swift with this code:
let line = "[00/0/00, 00:00:00] User: This is the message text and any other stuff"
let result = line.match("(.+)\\s([\\S ]*):\\s(.*\n(?:[^-]*)*|.*)$")
extension String {
    func match(_ regex: String) -> [[String]] {
        let nsString = self as NSString
        // NSRange counts UTF-16 units, so use nsString.length rather than count.
        return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: NSMakeRange(0, nsString.length)).map { match in
            (0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
        } ?? []
    }
}
The resulting array is something like this:
[["[00/0/00, 00:00:00] User: This is the message text and any other stuff","[00/0/00, 00:00:00]","User","This is the message text and any other stuff"]]
Now my problem is this: if the message contains a ':', the resulting array no longer follows the same format, which breaks the parsing function.
I think I am missing some cases in the regex. Can anyone help me with this? Thanks in advance.
In the pattern, you are making use of parts that are very broad matches.
For example, .+ first matches to the end of the line, [\\S ]* matches any non-whitespace char or a space, and [^-]* matches any char except a -.
The reason it can break is that the broad matches first run to the end of the string. As a single : is mandatory in your pattern, the engine backtracks from the end of the string until it can match a : followed by a whitespace char, and then tries to match the rest of the pattern.
Adding another : in the message part can make the backtracking stop earlier than you would expect, which shifts the earlier groups and makes the message group shorter.
You could make the pattern a bit more precise, so that the last part can also contain : without breaking the groups.
(\[[^][]*\])\s([^:]*):\s(.*)$
(\[[^][]*\]) Match the part from an opening till closing square bracket [...] in group 1
\s Match a whitespace char
([^:]*): Match any char except : in group 2, then match the expected :
\s(.*) Match a whitespace char, and capture 0+ times any char in group 3
$ End of string
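As a rough check in Swift, the sketch below runs this pattern against a message that itself contains colons. The helper groups(of:in:) and the sample line are made up for illustration. The brackets inside the character class are written in escaped form ([^\]\[]), which is an assumption made here so the example does not depend on how the engine treats an unescaped ] at the start of a class.

```swift
import Foundation

// Sample line whose message part contains extra colons (illustrative data).
let line = "[00/0/00, 00:00:00] User: Meet at 10:30: bring notes"
// Same pattern as above, with the brackets inside the class escaped.
let pattern = #"(\[[^\]\[]*\])\s([^:]*):\s(.*)$"#

// Hypothetical helper: returns capture groups 1...n of the first match,
// using "" for a group that did not participate.
func groups(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text))
    else { return [] }
    return (1..<match.numberOfRanges).map { index in
        Range(match.range(at: index), in: text).map { String(text[$0]) } ?? ""
    }
}

let parts = groups(of: pattern, in: line)
print(parts)
```

Because group 2 is [^:]*, it stops at the first colon after the bracketed date, so the later colons stay in the message group.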

Disregard letter when finding substring - Swift

How do you make a letter count as another when finding a substring in Swift? For example, I have a string:
I like to eat apples
But I want to be able to make it where all instances of 'p' could be written as 'b'.
If the user searches "abbles", it should still return the substring "apples" from the quote. I have this issue because whenever a user searches for
اكل
but the quote contains
أكل
it should return that value. I tried fullString.range(of: string, options: [.diacriticInsensitive, .caseInsensitive]), but this does not fix it because the "ء" is not a diacritic, so أ, إ and ا all behave differently when they should be treated as the same letter. Users only type ا. How do I make it match أ and إ as well without replacing all instances of them with ا?
You could add a small String extension that uses simple regular expressions (as matt suggested in the comments) to do the actual matching. Like so:
extension String {
    func contains(substring: String, disregarding: [String]) -> Bool {
        var escapedPattern = NSRegularExpression.escapedPattern(for: substring)
        for string in disregarding {
            let replacement = String(repeating: ".", count: string.count)
            escapedPattern = escapedPattern.replacingOccurrences(of: string, with: replacement)
        }
        let regEx = ".*" + escapedPattern + ".*"
        return self.range(of: regEx,
                          options: .regularExpression) != nil
    }
}
Example output:
"I like apples".contains(substring: "apples", disregarding: ["p"]) //true
"I like abbles".contains(substring: "apples", disregarding: ["p"]) //true
"I like oranges".contains(substring: "apples", disregarding: ["p"]) //false

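For the Arabic case in the question above, a narrower variant of the same idea may fit better: instead of a wildcard, each alef form in the search term becomes a character class that lists only the forms to be treated as equal. This is a sketch rather than a tested solution for Arabic text in general. The set alefForms and the method firstRange(ofLoose:equating:) are hypothetical names, and the code assumes the hamza forms are stored as single precomposed code points.

```swift
import Foundation

// Assumed equivalence set: plain alef plus the two hamza forms from the question.
let alefForms: Set<Character> = ["\u{0627}", "\u{0623}", "\u{0625}"]

extension String {
    // Finds `substring`, letting any character in `forms` match any other one.
    func firstRange(ofLoose substring: String, equating forms: Set<Character>) -> Range<String.Index>? {
        // One character class containing every interchangeable form.
        let charClass = "[" + forms.map { NSRegularExpression.escapedPattern(for: String($0)) }.joined() + "]"
        // Replace interchangeable characters with the class; escape everything else.
        let pattern = substring.map { ch in
            forms.contains(ch) ? charClass : NSRegularExpression.escapedPattern(for: String(ch))
        }.joined()
        return range(of: pattern, options: .regularExpression)
    }
}

let quote = "\u{0623}\u{0643}\u{0644} \u{0627}\u{0644}\u{062A}\u{0641}\u{0627}\u{062D}"
let query = "\u{0627}\u{0643}\u{0644}"
if let found = quote.firstRange(ofLoose: query, equating: alefForms) {
    print(quote[found])
}
```

The quote itself is left untouched, and the returned range points at the original spelling, so the caller can highlight or extract the text as it was written.
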
Regular expressions in swift

I'm a bit confused by NSRegularExpression in Swift. Can anyone help me?
task:1 given ("name","john","name of john")
then I should get ["name","john","name of john"]. Here I should avoid the brackets.
task:2 given ("name"," john","name of john")
then I should get ["name","john","name of john"]. Here I should avoid the brackets and extra spaces and finally get array of strings.
task:3 given key = value // comment
then I should get ["key","value","comment"]. Here I should get only strings in the line by avoiding = and //
I tried the code below for task 1, but it did not pass.
let string = "(name,john,string for user name)"
let pattern = "(?:\\w.*)"
do {
let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let matches = regex.matches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count))
for match in matches {
if let range = Range(match.range, in: string) {
let name = string[range]
print(name)
}
}
} catch {
print("Regex was bad!")
}
Thanks in advance.
RegEx in Swift
These posts might help you to explore regular expressions in swift:
Does a string match a pattern?
Swift extract regex matches
How can I use String slicing subscripts in Swift 4?
How to use regex with Swift?
Swift 3 - How do I extract captured groups in regular expressions?
How to group search regular expressions using swift?
Task 1 & 2
This expression might help you to match your desired outputs for both Task 1 and 2:
"(\s+)?([a-z\s]+?)(\s+)?"
Based on Rob's advice, you could relax the restrictions considerably, for example the char list [a-z\s]. Here, we can also use:
"(\s+)?(.*?)(\s+)?"
or
"(\s+)?(.+?)(\s+)?"
to simply capture everything between the two " characters, apart from any surrounding spaces.
RegEx
If this wasn't your desired expression, you can modify/change your expressions in regex101.com.
RegEx Circuit
You can also visualize your expressions in jex.im:
JavaScript Demo
const regex = /"(\s+)?([a-z\s]+?)(\s+)?"/gm;
const str = `"name","john","name of john"
"name"," john","name of john"
" name "," john","name of john "
" name "," john"," name of john "`;
const subst = `\n$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Task 3
This expression might help you to design an expression for the third task:
(.*?)([a-z\s]+)(.*?)
const regex = /(.*?)([a-z\s]+)(.*?)/gm;
const str = `key = value // comment
key = value with some text // comment`;
const subst = `$2,`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Separate the string on non-alphanumeric characters, except whitespace. Then trim whitespace from the elements.
extension String {
    func words() -> [String] {
        return self.components(separatedBy: CharacterSet.alphanumerics.inverted.subtracting(.whitespaces))
            // Trim first so that whitespace-only pieces are dropped by the filter.
            .map({ $0.trimmingCharacters(in: .whitespaces) })
            .filter({ !$0.isEmpty })
    }
}
let string1 = "(name,john,string for user name)"
let string2 = "(name, john,name of john)"
let string3 = "key = value // comment"
print(string1.words())//["name", "john", "string for user name"]
print(string2.words())//["name", "john", "name of john"]
print(string3.words())//["key", "value", "comment"]
Here is what I came up with after working through all of the comments above.
let text = """
Capturing and non-capturing groups are somewhat advanced topics. You’ll encounter examples of capturing and non-capturing groups later on in the tutorial
"""
extension String {
    func rex(_ expr: String) -> [String] {
        // NSRange is measured in UTF-16 units, so use utf16.count rather than count.
        return try! NSRegularExpression(pattern: expr, options: [.caseInsensitive])
            .matches(in: self, options: [], range: NSRange(location: 0, length: self.utf16.count))
            .map {
                String(self[Range($0.range, in: self)!])
            }
    }
}
let r = text.rex("(?:\\w+-\\w+)") // pass any regex
A single pattern, works for test:1...3, in Swift.
let string =
//"(name,john,string for user name)" //test:1
//#"("name"," john","name of john")"# //test:2
"key = value // comment" //test:3
let pattern = #"(?:\w+)(?:\s+\w+)*"# //Swift 5+ only
//let pattern = "(?:\\w+)(?:\\s+\\w+)*"
do {
let regex = try NSRegularExpression(pattern: pattern)
let matches = regex.matches(in: string, range: NSRange(0..<string.utf16.count))
let matchingWords = matches.map {
String(string[Range($0.range, in: string)!])
}
print(matchingWords) //(test:3)->["key", "value", "comment"]
} catch {
print("Regex was bad!")
}
Let’s consider:
let string = "(name,José,name is José)"
I’d suggest a regex that looks for strings where:
It’s the substring either after the ( at the start of the full string or after a comma, i.e., look behind assertion of (?<=^\(|,);
It’s the substring that does not contain , within it, i.e., [^,]+?;
It’s the substring that is terminated by either a comma or ) at the end of the full string, i.e., look ahead assertion of (?=,|\)$), and
If you want to have it skip white space before and after the substrings, throw in the \s*+, too.
Thus:
let pattern = #"(?<=^\(|,)\s*+([^,]+?)\s*+(?=,|\)$)"#
let regex = try! NSRegularExpression(pattern: pattern)
regex.enumerateMatches(in: string, range: NSRange(string.startIndex..., in: string)) { match, _, _ in
if let nsRange = match?.range(at: 1), let range = Range(nsRange, in: string) {
let substring = String(string[range])
// do something with `substring` here
}
}
Note, I’m using the Swift 5 extended string delimiters (starting with #" and ending with "#) so that I don’t have to escape my backslashes within the string. If you’re using Swift 4 or earlier, you’ll want to escape those backslashes:
let pattern = "(?<=^\\(|,)\\s*+([^,]+?)\\s*+(?=,|\\)$)"

How to run multiple NSRegularExpressions at once

I have a bunch of NSRegularExpression objects and I want to run them all in one pass. Does anyone know how to do it?
At the moment I do it in a .forEach, but for performance reasons I do not think this is the best idea.
Each NSRegularExpression needs to match a different pattern, and after matching I need to handle each kind of match differently. For example, a match from the first regex in my array needs different handling from a match from the second, and so on.
let text: String = "Stuff"
let range: NSRange = // a range
var regexes: [NSRegularExpression] = // all of my regexes
regexes.forEach { $0.matches(in: text, options: [], range: range) }
Thanks for your help
You may be able to evaluate several regular expressions as one if you concatenate them using capture groups and an OR expression (|).
If you want to search for the strings language, Objective-C and Swift, you should use a pattern like this: (language)|(Objective-C)|(Swift). Each capture group has an index number, so when language is found in the source string, the match object tells you which group matched.
You can use the code in this playground sample:
import Foundation
let sourceString: String = "Swift is a great language to program, but don't forget Objective-C."
let expresions = [ "language", // Expression 0
"Objective-C", // Expression 1
"Swift" // Expression 2
]
let pattern = expresions
.map { "(\($0))" }
.joined(separator: "|") // pattern is defined as : (language)|(Objective-C)|(Swift)
let regex = try? NSRegularExpression(pattern: pattern, options: [])
let matches = regex?.matches(in: sourceString, options: [],
range: NSRange(location: 0, length: sourceString.utf16.count))
let results = matches?.map({ (match) -> (Int, String) in     // Array of (Int, String) tuples: the index of the
                                                             // expression and the captured string
    let index = (1...match.numberOfRanges-1)                 // Go through all ranges to test which one was used
        .map{ Range(match.range(at: $0), in: sourceString) != nil ? $0 : nil }
        .compactMap { $0 }.first!                            // The previous map returns an array of nils plus one Int
                                                             // with the correct position; compactMap extracts
                                                             // just this number
    let foundString = String(sourceString[Range(match.range(at: 0), in: sourceString)!])
    let position = match.range(at: 0).location
    let niceReponse = "\(foundString) [position: \(position)]"
    return (index - 1, niceReponse)                          // Subtract 1 from index to match the zero-based array index
})
print("Matches: \(results?.count ?? 0)\n")
results?.forEach({ result in
print("Group \(result.0): \(result.1)")
})
If you run it, the result is:
Matches: 3

Group 2: Swift [position: 0]
Group 0: language [position: 17]
Group 1: Objective-C [position: 55]
I hope I understood your question correctly and that this code helps.
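A more compact way to find which alternative matched is to look for the capture group whose range is not NSNotFound. The sketch below assumes the expressions are plain literals, so they are escaped with NSRegularExpression.escapedPattern(for:). If an expression contained capture groups of its own, the group numbers would shift and this simple mapping would no longer hold. The names literals and hits are illustrative.

```swift
import Foundation

let source = "Swift is a great language to program, but don't forget Objective-C."
let literals = ["language", "Objective-C", "Swift"]

// One capture group per literal, joined with |.
let pattern = literals
    .map { "(" + NSRegularExpression.escapedPattern(for: $0) + ")" }
    .joined(separator: "|")

var hits: [(expression: Int, text: String)] = []
if let regex = try? NSRegularExpression(pattern: pattern) {
    let range = NSRange(source.startIndex..., in: source)
    for match in regex.matches(in: source, range: range) {
        // Only the group of the alternative that matched has a valid range.
        for group in 1..<match.numberOfRanges where match.range(at: group).location != NSNotFound {
            if let r = Range(match.range(at: group), in: source) {
                hits.append((group - 1, String(source[r])))
            }
        }
    }
}
for hit in hits {
    print("Expression \(hit.expression): \(hit.text)")
}
```

Each hit carries the zero-based index of the expression that produced it, so the caller can switch on that index to handle each kind of match differently.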

Cannot find Substring "n't"

I am trying to determine whether an input string contains "n't" or "not".
For example, if the input were:
let part = "Hi, I can't be found!"
I want to find the presence of the negation.
I have tried input.contains, .range, and NSRegularExpression. All of these succeed in finding "not", but fail to find "n't". I have tried escaping the character as well.
//REGEX:
let negationPattern = "(?:n't|[Nn]ot)"
do {
    let regex = try NSRegularExpression(pattern: negationPattern)
    let results = regex.matches(in: part, range: NSRange(part.startIndex..., in: part))
    print("results are \(results)")
    negation = (results.count > 0)
} catch let error {
    print("invalid regex: \(error.localizedDescription)")
}
//.CONTAINS
if part.contains("not") || part.contains("n't"){
print("negation present in part")
negation = true
}
//.RANGE (showing .regex option; also tried without)
if part.lowercased().range(of:"not", options: .regularExpression) != nil || part.lowercased().range(of:"n't", options: .regularExpression) != nil {
print("negation present in part")
negation = true
}
Here is a picture:
This is a bit tricky, and the screenshot is actually what gives it away: your regex pattern has a plain single quote in it, but the input text has a "smart" or "curly" apostrophe in it. The difference is subtle:
Regular: '
Smart: ’
Lots of text fields will automatically replace regular single quotes with "smart" apostrophes when they think it's appropriate. Your regex, however, only matches the plain single quote, as evidenced by this tiny test:
func isNegation(input text: String) -> Bool {
let negationPattern = "(?:n't|[Nn]ot)"
let regex = try! NSRegularExpression(pattern: negationPattern)
let matches = regex.matches(in: text,range: NSRange(text.startIndex..., in: text))
return matches.count > 0
}
for input in ["not", "n't", "n’t"] {
print("\"\(input)\" is negation: \(isNegation(input: input) ? "YES" : "NO")")
}
This prints:
"not" is negation: YES
"n't" is negation: YES
"n’t" is negation: NO
If you want to continue using a regex for this problem, you'll need to modify it to match this kind of punctuation character, and avoid assuming all your input text includes "plain" single quotes.
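One possible adjustment, sketched here, is to accept either apostrophe in a character class. It only covers the straight quote and the right single quotation mark (U+2019). Other look-alike characters would need to be added to the class if your input can contain them.

```swift
import Foundation

func isNegation(_ text: String) -> Bool {
    // ['\u{2019}] accepts the plain apostrophe or the curly one (U+2019).
    let pattern = "(?:n['\u{2019}]t|[Nn]ot)"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let range = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, range: range) != nil
}

for input in ["not", "n't", "n\u{2019}t", "Hi, I can\u{2019}t be found!"] {
    print("\"\(input)\" is negation: \(isNegation(input) ? "YES" : "NO")")
}
```

With this pattern, all four sample inputs should report a negation, including the one with the curly apostrophe.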