I have an array of strings like:
"Foo", "Foo1", "Foo$", "$Foo", "1Foo", "1$", "20$", "1$Foo", "12$$", etc.
My required format is [Any number without dots][Must end with single $ symbol] (I mean, 1$ and 20$ from the above array)
I tried the following, but it's not working.
func isValidItem(_ item: String) -> Bool {
let pattern = #"^[0-9]$"#
return (item.range(of: pattern, options: .regularExpression) != nil)
}
Can someone help me with this? Also, please share some good links for learning regex patterns if you have any.
Thank you
You can use
func isValidItem(_ item: String) -> Bool {
let pattern = #"^[0-9]+\$\z"#
return (item.range(of: pattern, options: .regularExpression) != nil)
}
let arr = ["Foo", "Foo1", "Foo$", "$Foo", "1Foo", "1$", "20$", "1$Foo", "12$$"]
print(arr.filter {isValidItem($0)})
// => ["1$", "20$"]
Here,
^ - matches the start of the string (no multiline flag is set here)
[0-9]+ - one or more ASCII digits (note that Swift regex engine is ICU and \d matches any Unicode digits in this flavor, so [0-9] is safer if you need to only match digits from the 0-9 range)
\$ - a $ char
\z - the very end of string.
See the online regex demo ($ is used instead of \z since the demo is run against a single multiline string, hence the use of the m flag at regex101.com).
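To make the two notes above concrete (\z versus $, and [0-9] versus \d), here is a minimal sketch. The helper name isMatch and the three pattern constants are made up for this illustration; only range(of:options:) with .regularExpression comes from the answer.

```swift
import Foundation

// Hypothetical helper: true if `pattern` matches somewhere in `item`.
func isMatch(_ item: String, _ pattern: String) -> Bool {
    return item.range(of: pattern, options: .regularExpression) != nil
}

let strict = #"^[0-9]+\$\z"#     // the pattern from the answer
let dollarEnd = #"^[0-9]+\$$"#   // `$` anchor instead of `\z`
let anyDigit = #"^\d+\$\z"#      // `\d` instead of `[0-9]`

print(isMatch("20$", strict))                   // true
print(isMatch("20$\n", strict))                 // false: \z allows nothing after the $
print(isMatch("20$\n", dollarEnd))              // true: $ also matches before a final newline
print(isMatch("\u{0662}\u{0660}$", strict))     // false: Arabic-Indic digits are not in [0-9]
print(isMatch("\u{0662}\u{0660}$", anyDigit))   // true: \d accepts any Unicode decimal digit
```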
Related
I am trying to parse a string with a regex, but I am having problems extracting all the information into substrings. I am almost done, but I am stuck at this point:
For a string like this:
[00/0/00, 00:00:00] User: This is the message text and any other stuff
I can parse Date, User and Message in Swift with this code:
let line = "[00/0/00, 00:00:00] User: This is the message text and any other stuff"
let result = line.match("(.+)\\s([\\S ]*):\\s(.*\n(?:[^-]*)*|.*)$")
extension String {
func match(_ regex: String) -> [[String]] {
let nsString = self as NSString
return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: NSMakeRange(0, nsString.length)).map { match in
(0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
} ?? []
}
}
The resulting array is something like this:
[["[00/0/00, 00:00:00] User: This is the message text and any other stuff","[00/0/00, 00:00:00]","User","This is the message text and any other stuff"]]
Now my problem is this: if the message has a ':' in it, the resulting array does not follow the same format and breaks the parsing function.
So I think I am missing some cases in the regex, can anyone help me with this? Thanks in advance.
In the pattern, you are making use of parts that are very broad matches.
For example, .+ will first match until the end of the line, [\\S ]* will match either a non whitespace char or a space and [^-]* matches any char except a -
The reason it could potentially break is that the broad matches first match until the end of the string. As a single : is mandatory in your pattern, it will backtrack from the end of the string until it can match a : followed by a whitespace, and then tries to match the rest of the pattern.
Adding another : in the message part may cause the backtracking to stop earlier than you would expect, making the message group shorter.
You could make the pattern a bit more precise, so that the last part can also contain : without breaking the groups.
(\[[^][]*\])\s([^:]*):\s(.*)$
(\[[^][]*\]) Match the part from an opening till closing square bracket [...] in group 1
\s Match a whitespace char
([^:]*): Match any char except : in group 2, then match the expected :
\s(.*) Match a whitespace char, and capture 0+ times any char in group 3
$ End of string
Regex demo
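As a rough sketch of how the more precise pattern could be used from Swift: the function name parseLine and the tuple labels are hypothetical, and the brackets inside the character class are written escaped ([^\]\[]) as a precaution for the ICU engine, which is an assumption on my part rather than something the answer states.

```swift
import Foundation

// Hypothetical helper: returns the three capture groups of the first match, or nil.
func parseLine(_ line: String) -> (date: String, user: String, message: String)? {
    // Same structure as the pattern above, with the class brackets escaped.
    let pattern = #"(\[[^\]\[]*\])\s([^:]*):\s(.*)$"#
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: line, range: NSRange(line.startIndex..., in: line)),
          let dateRange = Range(match.range(at: 1), in: line),
          let userRange = Range(match.range(at: 2), in: line),
          let messageRange = Range(match.range(at: 3), in: line)
    else { return nil }
    return (String(line[dateRange]), String(line[userRange]), String(line[messageRange]))
}

if let parsed = parseLine("[00/0/00, 00:00:00] User: Note: call me at 10:30") {
    print(parsed.date)     // [00/0/00, 00:00:00]
    print(parsed.user)     // User
    print(parsed.message)  // Note: call me at 10:30
}
```

Because group 2 cannot contain a colon, the first ": " after the bracketed part always ends the user name, and any later colons stay in the message.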
How do you make a letter count as another when finding a substring in Swift? For example, I have a string:
I like to eat apples
But I want to be able to make it where all instances of 'p' could be written as 'b'.
If the user searches for "abbles", it should still return the substring "apples" from the quote. I have this issue because when a user searches for
اكل
but the quote contains
أكل
the search should still return that value. I tried fullString.range(of: string, options: [.diacriticInsensitive, .caseInsensitive]), but this does not fix it since the "ء" are not diacritics, so أ إ ا all behave differently when they should all be treated the same. Users only type ا. How do I make it match أ and إ as well without replacing all instances of them with ا?
You could add a small String extension that uses simple regular expressions (as matt suggested in the comments) to do the actual matching. Like so:
extension String {
func contains(substring: String, disregarding: [String]) -> Bool {
var escapedPattern = NSRegularExpression.escapedPattern(for: substring)
for string in disregarding {
let replacement = String(repeating: ".", count: string.count)
escapedPattern = escapedPattern.replacingOccurrences(of: string, with: replacement)
}
let regEx = ".*" + escapedPattern + ".*"
return self.range(of: regEx,
options: .regularExpression) != nil
}
}
Example output:
"I like apples".contains(substring: "apples", disregarding: ["p"]) //true
"I like abbles".contains(substring: "apples", disregarding: ["p"]) //true
"I like oranges".contains(substring: "apples", disregarding: ["p"]) //false
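A variation on the same idea, sketched here under my own assumptions: instead of replacing a character with ".", each typed character can be expanded into a character class listing only the letters that should count as equivalent. The `treating:` label and the dictionary shape are hypothetical, and the sketch assumes every listed variant is a single Unicode scalar.

```swift
import Foundation

extension String {
    // Hypothetical helper: `equivalents` maps a typed character to all
    // characters it is allowed to match in the receiver.
    func contains(substring: String, treating equivalents: [Character: String]) -> Bool {
        let pattern = substring.map { ch -> String in
            if let variants = equivalents[ch] {
                // Build a character class such as [bp] from the listed variants.
                return "[" + NSRegularExpression.escapedPattern(for: variants) + "]"
            }
            return NSRegularExpression.escapedPattern(for: String(ch))
        }.joined()
        return range(of: pattern, options: .regularExpression) != nil
    }
}

// "b" typed by the user may stand for "b" or "p".
print("I like to eat apples".contains(substring: "abbles", treating: ["b": "bp"])) // true

// Plain alef (U+0627) may stand for alef with hamza above (U+0623) or below (U+0625).
let alefVariants: [Character: String] = ["\u{0627}": "\u{0627}\u{0623}\u{0625}"]
print("\u{0623}\u{0643}\u{0644}".contains(substring: "\u{0627}\u{0643}\u{0644}", treating: alefVariants)) // true
```

Unlike the "." replacement, this does not match unrelated letters in the same position, and the stored text is left untouched.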
I am trying to determine whether an input string contains "n't" or "not".
For example, if the input were:
let part = "Hi, I can't be found!"
I want to find the presence of the negation.
I have tried input.contains, .range, and NSRegularExpression. All of these succeed in finding "not", but fail to find "n't". I have tried escaping the character as well.
//REGEX:
let negationPattern = "(?:n't|[Nn]ot)"
do {
let regex = try NSRegularExpression(pattern: negationPattern)
let results = regex.matches(in: part, range: NSRange(part.startIndex..., in: part))
print("results are \(results)")
negation = (results.count > 0)
} catch let error {
print("invalid regex: \(error.localizedDescription)")
}
//.CONTAINS
if part.contains("not") || part.contains("n't"){
print("negation present in part")
negation = true
}
//.RANGE (showing .regex option; also tried without)
if part.lowercased().range(of:"not", options: .regularExpression) != nil || part.lowercased().range(of:"n't", options: .regularExpression) != nil {
print("negation present in part")
negation = true
}
Here is a picture:
This is a bit tricky, and the screenshot is actually what gives it away: your regex pattern has a plain single quote in it, but the input text has a "smart" or "curly" apostrophe in it. The difference is subtle:
Regular: '
Smart: ’
Lots of text fields will automatically replace regular single quotes with "smart" apostrophes when they think it's appropriate. Your regex, however, only matches the plain single quote, as evidenced by this tiny test:
func isNegation(input text: String) -> Bool {
let negationPattern = "(?:n't|[Nn]ot)"
let regex = try! NSRegularExpression(pattern: negationPattern)
let matches = regex.matches(in: text,range: NSRange(text.startIndex..., in: text))
return matches.count > 0
}
for input in ["not", "n't", "n’t"] {
print("\"\(input)\" is negation: \(isNegation(input: input) ? "YES" : "NO")")
}
This prints:
"not" is negation: YES
"n't" is negation: YES
"n’t" is negation: NO
If you want to continue using a regex for this problem, you'll need to modify it to match this kind of punctuation character, and avoid assuming all your input text includes "plain" single quotes.
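One possible way to do that, as a minimal sketch: put both apostrophe variants into a character class. The function name containsNegation is made up here, and the sketch only covers U+0027 and U+2019, not every apostrophe-like character.

```swift
import Foundation

// Hypothetical helper: accepts the plain apostrophe (U+0027) and the
// "smart" right single quotation mark (U+2019) in "n't".
func containsNegation(_ text: String) -> Bool {
    let pattern = "(?:n['\u{2019}]t|[Nn]ot)"
    return text.range(of: pattern, options: .regularExpression) != nil
}

print(containsNegation("Hi, I can't be found!"))         // true (plain apostrophe)
print(containsNegation("Hi, I can\u{2019}t be found!"))  // true (smart apostrophe)
print(containsNegation("Hi, I can be found!"))           // false
```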
I am trying to include the ' symbol in a regular expression.
I use this function:
func matches(for regex: String, in text: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex)
let results = regex.matches(in: text,
range: NSRange(text.startIndex..., in: text))
return results.map {
String(text[Range($0.range, in: text)!])
}
} catch let error {
print("invalid regex: \(error.localizedDescription)")
return []
}
}
and this regular expression:
let matched = matches(for: "^[‘]|[0-9]|[a-zA-Z]+$", in: string)
When I search, I can find numbers and English letters,
but not the ' symbol.
I guess that what you really want is this:
"['0-9a-zA-Z]+"
Note that I have removed the ^ (text start) and $ (text end) characters because then your whole text would have to match.
I have merged the groups because otherwise you would not match the text as a whole word. You would get separate apostrophe and then the word.
I have changed the ‘ character into the proper ' character. The automatic conversion from the simple apostrophe is caused by iOS 11 Smart Punctuation. You can turn it off on an input using:
input.smartQuotesType = .no
See https://developer.apple.com/documentation/uikit/uitextinputtraits/2865931-smartquotestype
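For reference, here is a small sketch of the merged character class in action. The matches(for:in:) function below is my own simplified variant of the one in the question (it returns an empty array instead of printing on an invalid pattern), and the sample sentence is invented.

```swift
import Foundation

// Simplified variant of the question's helper: all matches of `pattern` in `text`.
func matches(for pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Convert the NSRange back to a Swift range before slicing.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

print(matches(for: "['0-9a-zA-Z]+", in: "it's 5 o'clock"))
// ["it's", "5", "o'clock"]
```

If the input may still contain smart apostrophes, adding \u{2019} to the class ("['\u{2019}0-9a-zA-Z]+") would accept those too.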
How can I get the numeric characters from a string? I don't want to convert them to Int.
var string = "string_1"
var string2 = "string_20_certified"
My result has to be formatted like this:
newString = "1"
newString2 = "20"
Pattern matching a String's unicode scalars against Western Arabic Numerals
You could pattern match the unicodeScalars view of a String to a given UnicodeScalar pattern (covering e.g. Western Arabic numerals).
extension String {
var westernArabicNumeralsOnly: String {
let pattern = UnicodeScalar("0")..."9"
return String(unicodeScalars
.compactMap { pattern ~= $0 ? Character($0) : nil })
}
}
Example usage:
let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"
let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly
print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
Extending to matching any of several given patterns
The Unicode scalar pattern matching approach above is particularly useful when extending it to match any of several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:
extension String {
var easternArabicNumeralsOnly: String {
let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
"\u{06F0}"..."\u{06F9}"] // Perso-Arabic variant
return String(unicodeScalars
.compactMap { uc in patterns.contains { $0 ~= uc } ? Character(uc) : nil })
}
}
This could be used in practice e.g. if writing an Emoji filter, as ranges of unicode scalars that cover emojis can readily be added to the patterns array in the Eastern Arabic example above.
Why use the UnicodeScalar patterns approach over Character ones?
A Character in Swift consists of an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character instances in Swift do not have a fixed size in memory, so random access to a character within a collection of sequentially (contiguously) stored characters is not available in O(1), but rather O(n).
A Unicode scalar, on the other hand, has a fixed size (Unicode.Scalar wraps a 32-bit value), so no grapheme cluster boundaries need to be computed when iterating over scalars. Now, I'm not entirely sure whether this is the reason for what follows, but when benchmarking the methods above against equivalent methods using the CharacterView (.characters property) for some test String instances, it's very apparent that the UnicodeScalar approach is faster than the Character approach; naive testing showed a factor of 10-25 difference in execution times, steadily growing with String size.
Knowing the limitations of working with Unicode scalars vs Characters in Swift
There are drawbacks to the UnicodeScalar approach, however, namely when working with characters that cannot be represented by a single Unicode scalar, but where one of their Unicode scalars is contained in the pattern we want to match.
E.g., consider a string holding the four characters "Café". The last character, "é", is represented by two Unicode scalars, "e" and "\u{301}". If we were to implement pattern matching against, say, UnicodeScalar("a")..."e", the filtering method as applied above would allow one of the two Unicode scalars to pass.
extension String {
    var onlyLowercaseLettersAthroughE: String {
        // Range matches the property name: lowercase "a" through "e"
        let patterns = [UnicodeScalar("a")..."e"]
        return String(unicodeScalars
            .compactMap { uc in patterns.contains { $0 ~= uc } ? Character(uc) : nil })
    }
}
let str = "Cafe\u{301}"
print(str) // Café
print(str.onlyLowercaseLettersAthroughE) // ae
/* possibly we'd want "a" or "aé"
as result here */
In the particular use case queried by the OP in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character pattern matching over UnicodeScalar.
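To illustrate the difference, here is a minimal side-by-side sketch of my own, filtering the same string once by Unicode scalar and once by Character. The range "a"..."e" and the variable names are chosen for this illustration only.

```swift
let str = "Cafe\u{301}" // "Café", with é stored as "e" + combining acute accent

// Scalar-based filter: the base letter "e" passes on its own,
// while the combining accent U+0301 is dropped.
let scalarRange: ClosedRange<Unicode.Scalar> = "a"..."e"
let byScalar = String(str.unicodeScalars
    .filter { scalarRange.contains($0) }
    .map { Character($0) })

// Character-based filter: "é" is tested as one unit and falls outside "a"..."e".
let characterRange: ClosedRange<Character> = "a"..."e"
let byCharacter = str.filter { characterRange.contains($0) }

print(byScalar)    // ae
print(byCharacter) // a
```

The Character version never splits a grapheme cluster, at the cost of the slower iteration discussed above.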
Edit: Updated for Swift 4 & 5
Here's a straightforward method that doesn't require Foundation:
let newstring = string.filter { "0"..."9" ~= $0 }
or, borrowing from @dfri's idea, to make it a String extension:
extension String {
var numbers: String {
return filter { "0"..."9" ~= $0 }
}
}
print("3 little pigs".numbers) // "3"
print("1, 2, and 3".numbers) // "123"
import Foundation
let string = "a_1_b_2_3_c34"
let result = string.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")
print(result)
Output:
12334
Here is a Swift 2 example:
let str = "Hello 1, World 62"
let intString = str.componentsSeparatedByCharactersInSet(
NSCharacterSet
.decimalDigitCharacterSet()
.invertedSet)
.joinWithSeparator("") // Return a string with all the numbers
This method iterates through the characters of the string and appends the digits to a new string:
func getNumberFrom(string: String) -> String {
    var number = ""
    for c in string {
        // Int(String(c)) succeeds only for a single ASCII digit, 0 through 9
        if let n = Int(String(c)), (0...9).contains(n) {
            number.append(c)
        }
    }
    return number
}
For example, with a regular expression:
let text = "string_20_certified"
let pattern = "\\d+"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: text, options: [], range: NSRange(location: 0, length: text.utf16.count)) {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}
If there are multiple occurrences of the pattern, use matches(in:options:range:):
let matches = regex.matches(in: text, options: [], range: NSRange(location: 0, length: text.utf16.count))
for match in matches {
let newString = (text as NSString).substring(with: match.range)
print(newString)
}