Escaping Regex with special characters in Swift

I have a relatively complex regex that I need to run in Swift. Originally it was:
"typedef\W+struct\W+{([^}]*)}\W+(\w+);"
You can see the pattern working in JS here.
To make it compile in Swift I escaped the backslashes to:
"typedef\\W+struct\\W+{([^}]*)}\\W+(\\w+);"
At runtime the expression fails to compile with error 2048. I also tried escaping other characters and tried escapedPatternForString, but without luck. Is there a script to convert JS regexes to Swift? Thanks!

You need to escape both { and } that are outside of a character class:
let rx = "typedef\\W+struct\\W+\\{([^}]*)\\}\\W+(\\w+);"
A quick demo:
let rx = "typedef\\W+struct\\W+\\{([^}]*)\\}\\W+(\\w+);"
let str = "typedef: struct { something } text;"
print(str.range(of: rx, options: .regularExpression) != nil)
// => true
When the { and } are inside a character class they may stay unescaped (as in [^}]).
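If the project builds with Swift 5 or later (an assumption, the question does not state a version), a raw string literal avoids the doubled backslashes. A minimal sketch showing that both spellings produce the same pattern; the variable names are illustrative only:

```swift
import Foundation

// Raw string literal (Swift 5+): backslashes reach the regex engine unchanged.
let rawPattern = #"typedef\W+struct\W+\{([^}]*)\}\W+(\w+);"#
// Classic literal: every backslash has to be doubled.
let escapedPattern = "typedef\\W+struct\\W+\\{([^}]*)\\}\\W+(\\w+);"

let sample = "typedef: struct { something } text;"
let rawMatches = sample.range(of: rawPattern, options: .regularExpression) != nil
print(rawPattern == escapedPattern, rawMatches)
```

The braces still need escaping in both forms, because that is a regex rule rather than a Swift string rule.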
Using this code (answer by Confused Vorlon), you may get the first match with all capturing groups:
import Foundation
extension NSTextCheckingResult {
    func groups(testedString: String) -> [String] {
        var groups = [String]()
        for i in 0 ..< self.numberOfRanges {
            let group = String(testedString[Range(self.range(at: i), in: testedString)!])
            groups.append(group)
        }
        return groups
    }
}
let str = "typedef: struct { something } text;"
let rx = "typedef\\W+struct\\W+\\{([^}]*)\\}\\W+(\\w+);"
let MyRegex = try! NSRegularExpression(pattern: rx)
// NSRange counts UTF-16 code units, so build it from the string's indices rather than from str.count.
if let match = MyRegex.firstMatch(in: str, range: NSRange(str.startIndex..., in: str)) {
    let groups = match.groups(testedString: str)
    print(groups)
}
// => ["typedef: struct { something } text;", " something ", "text"]
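One caveat with the force unwrap in groups(testedString:): a capture group that did not take part in the match has the location NSNotFound, and converting that range returns nil. The following is a sketch of a more defensive variant; optionalGroups(in:) is a hypothetical name, not part of Foundation or of the quoted answer:

```swift
import Foundation

extension NSTextCheckingResult {
    /// Hypothetical helper: returns nil for groups that did not participate in the match.
    func optionalGroups(in string: String) -> [String?] {
        (0..<numberOfRanges).map { index in
            // Range(_:in:) yields nil when the NSRange location is NSNotFound.
            Range(range(at: index), in: string).map { String(string[$0]) }
        }
    }
}

// Only the second alternative matches here, so group 1 is absent.
let alternation = try! NSRegularExpression(pattern: "(a)|(b)")
let subject = "b"
let firstResult = alternation.firstMatch(in: subject, range: NSRange(subject.startIndex..., in: subject))
let capturedGroups = firstResult?.optionalGroups(in: subject) ?? []
print(capturedGroups)
```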

Related

Replace in string with regex

I am struggling to modify captured value with regex.
For example, I wanna change "Hello, he is hero" to "HEllo, HE is HEro" using Regex.
I know there are ways to change this without regex, but it is just an example to show the problem. I actually use the regex instead of just he, but I cannot provide it here. That is why using regex is required.
The code below somehow does not work. Are there any ways to make it work?
"Hello, he is hero".replacingOccurrences(
of: #"(he)"#,
with: "$1".uppercased(), // <- uppercased is not applied
options: .regularExpression
)
You need to use your regex in combination with Range (range(of:)) to find the matches and then replace each found range separately.
Here is a function, written as an extension to String, that does this by calling range(of:) from the start of the string and then moving the search start forward to just after the last match. The actual replacement is done by a separate function that is passed as an argument:
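extension String {
    func replace(regex: String, with replace: (Substring) -> String) -> String {
        var string = self
        var startIndex = string.startIndex
        // Re-read string.endIndex on every pass: indices taken before a mutation are no longer valid.
        while startIndex < string.endIndex,
              let range = string.range(of: regex, options: [.regularExpression], range: startIndex..<string.endIndex) {
            if range.isEmpty {
                // Step past an empty match so the loop always makes progress.
                guard range.lowerBound < string.endIndex else { break }
                startIndex = string.index(after: range.lowerBound)
                continue
            }
            let replacement = replace(string[range])
            // Remember the match position as a UTF-16 offset, which survives the mutation.
            let offset = range.lowerBound.utf16Offset(in: string)
            string.replaceSubrange(range, with: replacement)
            startIndex = String.Index(utf16Offset: offset + replacement.utf16.count, in: string)
        }
        return string
    }
}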
Here is an example that does a case-insensitive search for words starting with "he" and replaces each match with its uppercased version:
let result = "Hello, he is hero. There he is".replace(regex: #"(?i)\bhe"#) {
$0.uppercased()
}
Output
HEllo, HE is HEro. There HE is
You can try NSRegularExpression. Something like:
import Foundation
var sourceStr = "Hello, he is hero"
let regex = try! NSRegularExpression(pattern: "(he)")
let matches = regex.matches(in: sourceStr, range: NSRange(sourceStr.startIndex..., in: sourceStr))
// Replace from the last match backwards so the ranges of earlier matches stay valid.
for match in matches.reversed() {
    guard let range = Range(match.range, in: sourceStr) else { continue }
    let upper = sourceStr[range].uppercased()
    sourceStr.replaceSubrange(range, with: upper)
}
print(sourceStr)
This is the solution I can provide (note that it lowercases the whole string before replacing):
var string = "Hello, he is hero"
let occurrence = "he"
string = string.lowercased().replacingOccurrences(
of: occurrence,
with: occurrence.uppercased(),
options: .regularExpression
)
print(string)

Swift - Getting only AlphaNumeric Characters from String

I'm trying to create an internal function for the String class to get only AlphaNumeric characters and return a string. I'm running into a few errors with how to convert the matches back into a string using Regex. Can someone tell me how to fix the code or if there's an easier way?
I want something like this
let testString = "_<$abc$>_"
let alphaNumericString = testString.alphaNumeric() //abc
So far I have:
extension String {
internal func alphaNumeric() -> String {
let regex = try? NSRegularExpression(pattern: "[^a-z0-9]", options: .caseInsensitive)
let string = self as NSString
let results = regex?.matches(in: self, options: [], range: NSRange(location: 0, length: string.length))
let matches = results.map {
String(self[Range($0.range, in: self)!])
}
return matches.join()
}
}
You may directly use replacingOccurrences (that removes all non-overlapping matches from the input string) with [^A-Za-z0-9]+ pattern:
let str = "_<$abc$>_"
let pattern = "[^A-Za-z0-9]+"
let result = str.replacingOccurrences(of: pattern, with: "", options: [.regularExpression])
print(result) // => abc
The [^A-Za-z0-9]+ pattern is a negated character class that matches any char but the ones defined in the class, one or more occurrences (due to + quantifier).
See the regex demo.
Try the extension below:
extension String {
var alphanumeric: String {
return self.components(separatedBy: CharacterSet.alphanumerics.inverted).joined().lowercased()
}
}
Usage: print("alphanumeric :", "_<$abc$>_".alphanumeric)
Output : abc
You can also use CharacterSet for this, like so:
extension String {
var alphaNumeric: String {
components(separatedBy: CharacterSet.alphanumerics.inverted).joined()
}
}
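The regex and CharacterSet answers above are not interchangeable: [^A-Za-z0-9] keeps ASCII letters and digits only, while CharacterSet.alphanumerics also keeps non-ASCII letters and digits. A small sketch of the difference, using an accented letter chosen purely for illustration:

```swift
import Foundation

// "\u{E9}" is the precomposed letter é.
let mixedInput = "_<$abc\u{E9}1$>_"

// ASCII-only: the accented letter is stripped along with the punctuation.
let asciiOnly = mixedInput.replacingOccurrences(of: "[^A-Za-z0-9]+", with: "", options: .regularExpression)

// Unicode-aware: the accented letter counts as alphanumeric and is kept.
let unicodeAware = mixedInput.components(separatedBy: CharacterSet.alphanumerics.inverted).joined()

print(asciiOnly, unicodeAware)
```

Which one is right depends on whether the input is expected to contain non-ASCII text.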

Trim only trailing whitespace from end of string in Swift 3

Every example of trimming strings in Swift removes both leading and trailing whitespace, but how can only trailing whitespace be removed?
For example, if I have a string:
" example "
How can I end up with:
" example"
Every solution I've found shows trimmingCharacters(in: CharacterSet.whitespaces), but I want to retain the leading whitespace.
RegEx is a possibility, or a range can be derived to determine index of characters to remove, but I can't seem to find an elegant solution for this.
With regular expressions:
let string = " example "
let trimmed = string.replacingOccurrences(of: "\\s+$", with: "", options: .regularExpression)
print(">" + trimmed + "<")
// > example<
\s+ matches one or more whitespace characters, and $ matches
the end of the string.
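As a quick check of the pattern above, the sketch below applies \s+$ to a string that also ends in a tab and a newline, and shows the mirrored pattern ^\s+ for leading whitespace. The sample string and variable names are made up for illustration:

```swift
import Foundation

let padded = "  example \t\n"

// \s+$ removes the run of whitespace at the end, including the tab and newline.
let withoutTrailing = padded.replacingOccurrences(of: "\\s+$", with: "", options: .regularExpression)

// ^\s+ is the counterpart that removes only the leading run.
let withoutLeading = padded.replacingOccurrences(of: "^\\s+", with: "", options: .regularExpression)

print(">" + withoutTrailing + "<")
```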
In Swift 4 & Swift 5
This code will also remove trailing new lines.
It works based on a Character struct's method .isWhitespace
extension String {
    var trailingSpacesTrimmed: String {
        var newString = self
        while newString.last?.isWhitespace == true {
            newString = String(newString.dropLast())
        }
        return newString
    }
}
This short Swift 3 extension of String uses the .anchored and .backwards options of rangeOfCharacter and then calls itself recursively if it needs to loop. Because the compiler is expecting a CharacterSet as the parameter, you can just supply the static member when calling, e.g. "1234 ".trailingTrim(.whitespaces) will return "1234". (I've not done timings, but would expect it to be faster than regex.)
extension String {
func trailingTrim(_ characterSet : CharacterSet) -> String {
if let range = rangeOfCharacter(from: characterSet, options: [.anchored, .backwards]) {
return self.substring(to: range.lowerBound).trailingTrim(characterSet)
}
return self
}
}
In Foundation you can get ranges of indices matching a regular expression. You can also replace subranges. Combining this, we get:
import Foundation
extension String {
func trimTrailingWhitespace() -> String {
if let trailingWs = self.range(of: "\\s+$", options: .regularExpression) {
return self.replacingCharacters(in: trailingWs, with: "")
} else {
return self
}
}
}
You can also have a mutating version of this:
import Foundation
extension String {
mutating func trimTrailingWhitespace() {
if let trailingWs = self.range(of: "\\s+$", options: .regularExpression) {
self.replaceSubrange(trailingWs, with: "")
}
}
}
If we match against \s* (as Martin R. did at first) we can skip the if let guard and force-unwrap the optional since there will always be a match. I think this is nicer since it's obviously safe, and remains safe if you change the regexp. I did not think about performance.
Handy String extension In Swift 4
extension String {
func trimmingTrailingSpaces() -> String {
var t = self
while t.hasSuffix(" ") {
t = "" + t.dropLast()
}
return t
}
mutating func trimmedTrailingSpaces() {
self = self.trimmingTrailingSpaces()
}
}
Swift 4
extension String {
var trimmingTrailingSpaces: String {
if let range = rangeOfCharacter(from: .whitespacesAndNewlines, options: [.anchored, .backwards]) {
return String(self[..<range.lowerBound]).trimmingTrailingSpaces
}
return self
}
}
Demosthese's answer is a useful solution to the problem, but it's not particularly efficient. This is an upgrade to their answer, extending StringProtocol instead, and utilizing Substring to remove the need for repeated copying.
extension StringProtocol {
@inline(__always)
var trailingSpacesTrimmed: Self.SubSequence {
var view = self[...]
while view.last?.isWhitespace == true {
view = view.dropLast()
}
return view
}
}
No need to create a new string when dropping from the end each time.
extension String {
func trimRight() -> String {
String(reversed().drop { $0.isWhitespace }.reversed())
}
}
This operates on the collection and only converts the result back into a string once.
It's a little bit hacky :D
let message = " example "
var trimmed = ("s" + message).trimmingCharacters(in: .whitespacesAndNewlines)
trimmed = trimmed.substring(from: trimmed.index(after: trimmed.startIndex))
Without a regular expression there is no direct way to achieve that. Alternatively, you can use the function below to get the required result:
func removeTrailingSpaces(with spaces: String) -> String {
    // Count the leading spaces so they can be restored afterwards.
    var spaceCount = 0
    for character in spaces {
        if character == " " {
            spaceCount += 1
        } else {
            break
        }
    }
    // Caveat: this removes every space, including any inside the text, not only the trailing ones.
    let duplicateString = spaces.replacingOccurrences(of: " ", with: "")
    let finalString = String(repeating: " ", count: spaceCount)
    return finalString + duplicateString
}
You can use this function as follows:
let str = " Himanshu "
print(removeTrailingSpaces(with : str))
One line solution with Swift 4 & 5
As a beginner in Swift and iOS programming I really like @demosthese's solution above with the while loop, as it's very easy to understand. However, the example code seems longer than necessary. The following uses essentially the same logic but implements it as a single-line while loop.
// Remove trailing spaces from myString
while myString.last == " " { myString = String(myString.dropLast()) }
This can also be written using the .isWhitespace property, as in @demosthese's solution, as follows:
while myString.last?.isWhitespace == true { myString = String(myString.dropLast()) }
This has the benefit (or disadvantage, depending on your point of view) that this removes all types of whitespace, not just spaces but (according to Apple docs) also including newlines, and specifically the following characters:
“\t” (U+0009 CHARACTER TABULATION)
“ “ (U+0020 SPACE)
U+2029 PARAGRAPH SEPARATOR
U+3000 IDEOGRAPHIC SPACE
Note: Even though .isWhitespace is a Boolean, it can't be used directly as the while condition, because optional chaining on .last (which returns nil if the String or collection is empty) turns the result into an optional Bool?. The == true comparison gets around this since nil != true.
I'd love to get some feedback on this, esp. in case anyone sees any issues or drawbacks with this simple single line approach.
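One possible drawback of the single-line loop is that String(myString.dropLast()) builds a new string on every iteration. A sketch of an alternative that locates the last non-whitespace character once and slices a single time; removingTrailingWhitespace() is a hypothetical name, and lastIndex(where:) assumes Swift 4.2 or later:

```swift
extension String {
    /// Hypothetical helper: drops trailing whitespace with a single copy.
    func removingTrailingWhitespace() -> String {
        // If every character is whitespace (or the string is empty), nothing is kept.
        guard let lastKept = lastIndex(where: { !$0.isWhitespace }) else { return "" }
        return String(self[...lastKept])
    }
}

let trimmedExample = " example  \n".removingTrailingWhitespace()
let trimmedBlank = "   ".removingTrailingWhitespace()
print(">" + trimmedExample + "<")
```

For short strings the difference is unlikely to matter; the loop version remains easier to read.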
Swift 5
extension String {
func trimTrailingWhiteSpace() -> String {
guard self.last == " " else { return self }
var tmp = self
repeat {
tmp = String(tmp.dropLast())
} while tmp.last == " "
return tmp
}
}

Building Swift / ObjC regular expression

In a text string, I am trying to fetch everything between
[DATA FORMAT] and [/DATA FORMAT]
and
Columns Format: and [/DATA FORMAT]
To this end, I use regular expressions.
While the pattern
"\\[DATA FORMAT\\](.*?)\\[/DATA FORMAT\\]"
works as expected, the pattern
"Columns Format(*.?)\\[/DATA FORMAT\\]"
gives an error
Optional("The value “Columns Format(*.?)\\[/DATA FORMAT\\]” is invalid.")
The value “Columns Format(*.?)\[/DATA FORMAT\]” is invalid.
printed in the console (first line: localizedFailureReason, second line:localizedDescription)
What did I miss?
Code :
extension String
{
func match (pattern: String,
options: RegularExpression.Options = [.caseInsensitive, .dotMatchesLineSeparators]) -> [String]
{
do
{
let regex = try RegularExpression(pattern: pattern, options: options)
let regexAnsw = regex.matches(in: self, options: RegularExpression.MatchingOptions.withTransparentBounds, range: NSMakeRange(0, self.characters.count))
var retStrings = [String]()
for rs in regexAnsw
{
if let range = self.range(from: rs.range)
{
retStrings.append(self.substring(with: range))
}
else
{
print("match: cant' convert NSRange to range")
}
}
return retStrings
}
catch let error as NSError
{
print(error.localizedFailureReason)
print(error.localizedDescription)
return [String]()
}
}
}
You have your * and . swapped in the second regex (in the capture group right after "Columns Format"): (*.?) should be (.*?). This makes the regex invalid; the * isn't referring to anything.
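A minimal sketch of the difference, using the current NSRegularExpression name rather than the Swift 3 beta RegularExpression from the question. The sample text is invented, and the colon after "Columns Format" is added here as an assumption about the real data:

```swift
import Foundation

let sampleText = "Columns Format: a,b,c [/DATA FORMAT]"

// The quantifier * has nothing to repeat, so compiling this pattern throws.
let swappedPattern = "Columns Format(*.?)\\[/DATA FORMAT\\]"
let swappedRegex = try? NSRegularExpression(pattern: swappedPattern)

// With . and * in the right order, group 1 lazily captures the text in between.
let fixedPattern = "Columns Format:(.*?)\\[/DATA FORMAT\\]"
let fixedRegex = try! NSRegularExpression(pattern: fixedPattern,
                                          options: [.caseInsensitive, .dotMatchesLineSeparators])

var capturedText: String? = nil
if let match = fixedRegex.firstMatch(in: sampleText, range: NSRange(sampleText.startIndex..., in: sampleText)),
   let range = Range(match.range(at: 1), in: sampleText) {
    capturedText = String(sampleText[range])
}
print(swappedRegex == nil, capturedText ?? "no match")
```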

Remove all non-numeric characters from a string in swift

I have the need to parse some unknown data which should just be a numeric value, but may contain whitespace or other non-alphanumeric characters.
Is there a new way of doing this in Swift? All I can find online seems to be the old C way of doing things.
I am looking at stringByTrimmingCharactersInSet - as I am sure my inputs will only have whitespace/special characters at the start or end of the string. Are there any built in character sets I can use for this? Or do I need to create my own?
I was hoping there would be something like stringFromCharactersInSet() which would allow me to specify only valid characters to keep.
You can use trimmingCharacters with the inverted character set to remove characters from the start or the end of the string. In Swift 3 and later:
let result = string.trimmingCharacters(in: CharacterSet(charactersIn: "0123456789.").inverted)
Or, if you want to remove non-numeric characters anywhere in the string (not just the start or end), you can filter the characters, e.g. in Swift 4.2.1:
let result = string.filter("0123456789.".contains)
Or, if you want to remove characters from a CharacterSet from anywhere in the string, use:
let result = String(string.unicodeScalars.filter(CharacterSet.whitespaces.inverted.contains))
Or, if you want to only match valid strings of a certain format (e.g. ####.##), you could use regular expression. For example:
if let range = string.range(of: #"\d+(\.\d*)?"#, options: .regularExpression) {
let result = string[range] // or `String(string[range])` if you need `String`
}
The behavior of these different approaches differs slightly, so it just depends on precisely what you're trying to do. Include or exclude the decimal point depending on whether you want decimal numbers or just integers. There are lots of ways to accomplish this.
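To make the differences concrete, here is a sketch that runs three of the approaches on one made-up input. The sample string and variable names are illustrative only:

```swift
import Foundation

let rawValue = "  $12.50 USD 3 "
let allowed = "0123456789."

// Trimming only touches the two ends, so the text in the middle survives.
let trimmedEnds = rawValue.trimmingCharacters(in: CharacterSet(charactersIn: allowed).inverted)

// Filtering keeps every allowed character, wherever it appears.
let filteredAll = rawValue.filter { allowed.contains($0) }

// The regex returns only the first run that looks like a number.
var firstNumber: String? = nil
if let range = rawValue.range(of: #"\d+(\.\d*)?"#, options: .regularExpression) {
    firstNumber = String(rawValue[range])
}

print(trimmedEnds, filteredAll, firstNumber ?? "none")
```

Note how filtering glues the unrelated trailing 3 onto the amount, which is why the choice depends on what the input can look like.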
For older, Swift 2 syntax, see previous revision of this answer.
let result = string.stringByReplacingOccurrencesOfString("[^0-9]", withString: "", options: NSStringCompareOptions.RegularExpressionSearch, range:nil).stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
Swift 3
let result = string.replacingOccurrences( of:"[^0-9]", with: "", options: .regularExpression)
You can upvote this answer.
I prefer this solution, because I like extensions, and it seems a bit cleaner to me. Solution reproduced here:
extension String {
var digits: String {
return components(separatedBy: CharacterSet.decimalDigits.inverted)
.joined()
}
}
You can filter the UnicodeScalarView of the string using the pattern matching operator for ranges, pass a UnicodeScalar ClosedRange from 0 to 9 and initialise a new String with the resulting UnicodeScalarView:
extension String {
private static var digits = UnicodeScalar("0")..."9"
var digits: String {
return String(unicodeScalars.filter(String.digits.contains))
}
}
"abc12345".digits // "12345"
edit/update:
Swift 4.2
extension RangeReplaceableCollection where Self: StringProtocol {
var digits: Self {
return filter(("0"..."9").contains)
}
}
or as a mutating method
extension RangeReplaceableCollection where Self: StringProtocol {
mutating func removeAllNonNumeric() {
removeAll { !("0"..."9" ~= $0) }
}
}
Swift 5.2 • Xcode 11.4 or later
In Swift 5 we can use the Character property isWholeNumber:
extension RangeReplaceableCollection where Self: StringProtocol {
var digits: Self { filter(\.isWholeNumber) }
}
extension RangeReplaceableCollection where Self: StringProtocol {
mutating func removeAllNonNumeric() {
removeAll { !$0.isWholeNumber }
}
}
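Be aware that isWholeNumber and isNumber are Unicode-aware, so they accept more than the ASCII digits 0 to 9. The sketch below compares them with an ASCII-only filter; the sample string is invented, with U+0663 (ARABIC-INDIC DIGIT THREE) and U+00BD (VULGAR FRACTION ONE HALF) written as escapes for clarity:

```swift
let sampleDigits = "a1\u{0663}b\u{BD}2"

// isWholeNumber accepts decimal digits from any script.
let wholeNumbers = sampleDigits.filter { $0.isWholeNumber }

// isNumber additionally accepts characters such as fractions.
let anyNumbers = sampleDigits.filter { $0.isNumber }

// Restricting to ASCII keeps only 0 through 9.
let asciiDigits = sampleDigits.filter { $0.isASCII && $0.isNumber }

print(wholeNumbers, anyNumbers, asciiDigits)
```

If the result is going to be parsed with Int(_:) or similar, the ASCII-only filter is probably the safer choice.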
To allow a period as well we can extend Character and create a computed property:
extension Character {
var isDecimalOrPeriod: Bool { "0"..."9" ~= self || self == "." }
}
extension RangeReplaceableCollection where Self: StringProtocol {
var digitsAndPeriods: Self { filter(\.isDecimalOrPeriod) }
}
Playground testing:
"abc12345".digits // "12345"
var str = "123abc0"
str.removeAllNonNumeric()
print(str) //"1230"
"Testing0123456789.".digitsAndPeriods // "0123456789."
Swift 4
I found a decent way to get only the alphanumeric characters from a string.
For instance:
func getAlphaNumericValue() {
let yourString = "123456789!@#$%^&*()AnyThingYouWant"
let unsafeChars = CharacterSet.alphanumerics.inverted // Remove the .inverted to get the opposite result.
let cleanChars = yourString.components(separatedBy: unsafeChars).joined(separator: "")
print(cleanChars) // 123456789AnyThingYouWant
}
A solution using the filter function and rangeOfCharacterFromSet
let string = "sld [f]34é7*˜µ"
let alphaNumericCharacterSet = NSCharacterSet.alphanumericCharacterSet()
let filteredCharacters = string.characters.filter {
return String($0).rangeOfCharacterFromSet(alphaNumericCharacterSet) != nil
}
let filteredString = String(filteredCharacters) // -> sldf34é7µ
To filter for only numeric characters use
let string = "sld [f]34é7*˜µ"
let numericSet = "0123456789"
let filteredCharacters = string.characters.filter {
return numericSet.containsString(String($0))
}
let filteredString = String(filteredCharacters) // -> 347
or
let numericSet : [Character] = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
let filteredCharacters = string.characters.filter {
return numericSet.contains($0)
}
let filteredString = String(filteredCharacters) // -> 347
Swift 4
Without extensions or componentsSeparatedByCharactersInSet, which doesn't read as well:
let allowedCharSet = NSCharacterSet.letters.union(.whitespaces)
let filteredText = String(sourceText.unicodeScalars.filter(allowedCharSet.contains))
let string = "+1*(234) fds567#-8/90-"
let onlyNumbers = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined()
print(onlyNumbers) // "1234567890"
or
extension String {
func removeNonNumeric() -> String {
return self.components(separatedBy: CharacterSet.decimalDigits.inverted).joined()
}
}
let onlyNumbers = "+1*(234) fds567#-8/90-".removeNonNumeric()
print(onlyNumbers)// "1234567890"
Swift 3, filters all except numbers
let myString = "dasdf3453453fsdf23455sf.2234"
let result = String(myString.characters.filter { String($0).rangeOfCharacter(from: CharacterSet(charactersIn: "0123456789")) != nil })
print(result)
Swift 4.2
let numericString = string.filter { (char) -> Bool in
return char.isNumber
}
You can do something like this...
let string = "[,myString1. \"" // string : [,myString1. "
let characterSet = NSCharacterSet(charactersInString: "[,. \"")
let finalString = (string.componentsSeparatedByCharactersInSet(characterSet) as NSArray).componentsJoinedByString("")
print(finalString)
//finalString will be "myString1"
The issue with Rob's first solution is stringByTrimmingCharactersInSet only filters the ends of the string rather than throughout, as stated in Apple's documentation:
Returns a new string made by removing from both ends of the receiver characters contained in a given character set.
Instead use componentsSeparatedByCharactersInSet to first split the string on every character outside the set, which yields an array, and then join the pieces with an empty string separator:
"$$1234%^56()78*9££".componentsSeparatedByCharactersInSet(NSCharacterSet(charactersInString: "0123456789").invertedSet).joinWithSeparator("")
Which returns 123456789
Swift 3
extension String {
var keepNumericsOnly: String {
return self.components(separatedBy: CharacterSet(charactersIn: "0123456789").inverted).joined(separator: "")
}
}
Swift 4.0 version
extension String {
var numbers: String {
return String(describing: filter { String($0).rangeOfCharacter(from: CharacterSet(charactersIn: "0123456789")) != nil })
}
}
Swift 4
String.swift
import Foundation
extension String {
func removeCharacters(from forbiddenChars: CharacterSet) -> String {
let passed = self.unicodeScalars.filter { !forbiddenChars.contains($0) }
return String(String.UnicodeScalarView(passed))
}
func removeCharacters(from: String) -> String {
return removeCharacters(from: CharacterSet(charactersIn: from))
}
}
ViewController.swift
let character = "1Vi234s56a78l9"
let alphaNumericSet = character.removeCharacters(from: CharacterSet.decimalDigits.inverted)
print(alphaNumericSet) // will print: 123456789
let alphaNumericCharacterSet = character.removeCharacters(from: "0123456789")
print("no digits",alphaNumericCharacterSet) // will print: Vishal
Swift 4.2
let digitChars = yourString.components(separatedBy:
CharacterSet.decimalDigits.inverted).joined(separator: "")
Swift 3 Version
extension String
{
func trimmingCharactersNot(in charSet: CharacterSet) -> String
{
var s:String = ""
for unicodeScalar in self.unicodeScalars
{
if charSet.contains(unicodeScalar)
{
s.append(String(unicodeScalar))
}
}
return s
}
}