How can I match a line break in OCR text using regex?
For example I have this text:
"NAME JESUS LASTNAME"
I want to find a match with NAME and then get the next two lines
if (line.text.range(of: "^NAME+\\n", options: .regularExpression) != nil){
    let name = line.text
    print(name)
}
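You can use a positive lookbehind to find NAME followed by a newline, then match one line followed by a second line that ends at a newline or at the end of the string: "(?s)(?<=NAME\n).*\n.*(?=$|\n)". Note that (?s) lets . match newlines as well, so if more than two lines follow NAME, the greedy .* extends the match beyond the second line.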
Playground testing:
let str = "NAME\nJESUS\nLASTNAME"
let pattern = "(?s)(?<=NAME\n).*\n.*(?=$|\n)"
if let range = str.range(of: pattern, options: .regularExpression) {
    let text = String(str[range])
    print(text)
}
This will print
JESUS
LASTNAME
You can use
(?m)(?<=^NAME\n).*\n.*
Details:
(?m) - a multiline option making ^ match start of a line
(?<=^NAME\n) - a positive lookbehind that matches a location that is immediately preceded by the start of a line, NAME and then a line feed char
.*\n.* - two subsequent lines (.* matches zero or more chars other than line break chars as many as possible).
See the Swift fiddle:
import Foundation
let line_text = "NAME\nJESUS\nLASTNAME"
if let rng = line_text.range(of: #"(?m)(?<=^NAME\n).*\n.*"#, options: .regularExpression) {
    print(String(line_text[rng]))
}
// => JESUS
// LASTNAME
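If the two lines are needed separately rather than as one substring, capture groups are one way to get them. The following is only a sketch: it is a variant of the pattern above with two groups, the helper name twoLines(after:in:) is made up for illustration, and it assumes the OCR text uses \n as its line ending.

```swift
import Foundation

// Hypothetical helper: returns the two lines that follow a line starting with `label`
// and ending right after it.
func twoLines(after label: String, in text: String) -> (String, String)? {
    // (?m) lets ^ match at the start of every line; each (.*) captures one line.
    let pattern = "(?m)^" + NSRegularExpression.escapedPattern(for: label) + #"\n(.*)\n(.*)"#
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
          let first = Range(match.range(at: 1), in: text),
          let second = Range(match.range(at: 2), in: text) else {
        return nil
    }
    return (String(text[first]), String(text[second]))
}

if let lines = twoLines(after: "NAME", in: "NAME\nJESUS\nLASTNAME") {
    print(lines.0) // JESUS
    print(lines.1) // LASTNAME
}
```

The function returns nil when fewer than two lines follow the label, so the caller can tell an incomplete OCR block from a real match.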
I need to create a predicate that will look for the following string:
"fred\n5" where \n is a newline.
At least, this is the string that is returned when reading the metadata back.
You can do it with Regular Expression
let string = """
fred
5
"""
let predicate = NSPredicate(format: "self MATCHES %@", "fred\\n5")
predicate.evaluate(with: string) // true
It's also possible to use the pattern fred(\\n|\\r)5, which accepts either a line feed or a carriage return.
Alternatively remove the newline character (actually any whitespace and newline characters)
let trimmedString = string.replacingOccurrences(of: "\\s", with: "", options: .regularExpression)
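MATCHES succeeds only when the pattern covers the whole string. As a rough sketch of the same check without a predicate, the pattern can be wrapped in \A ... \z and passed to range(of:options:). The helper name matchesEntirely and the newline sub-pattern (which also accepts \r\n) are assumptions added for illustration.

```swift
import Foundation

// Hypothetical helper: true when `pattern` matches the whole of `text`.
func matchesEntirely(_ text: String, pattern: String) -> Bool {
    // \A and \z pin the match to the very start and very end of the string.
    let anchored = #"\A(?:"# + pattern + #")\z"#
    return text.range(of: anchored, options: .regularExpression) != nil
}

let newline = #"(?:\r\n|\n|\r)"# // CRLF, LF, or CR
print(matchesEntirely("fred\n5", pattern: "fred" + newline + "5"))   // true
print(matchesEntirely("fred\r\n5", pattern: "fred" + newline + "5")) // true
print(matchesEntirely("fred 5", pattern: "fred" + newline + "5"))    // false
```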
I'm a bit confused by NSRegularExpression in Swift. Can anyone help me?
task:1 given ("name","john","name of john")
then I should get ["name","john","name of john"]. Here I should avoid the brackets.
task:2 given ("name"," john","name of john")
then I should get ["name","john","name of john"]. Here I should avoid the brackets and extra spaces and finally get array of strings.
task:3 given key = value // comment
then I should get ["key","value","comment"]. Here I should get only strings in the line by avoiding = and //
I tried the code below for task 1, but it does not produce the expected result.
let string = "(name,john,string for user name)"
let pattern = "(?:\\w.*)"
do {
    let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    let matches = regex.matches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count))
    for match in matches {
        if let range = Range(match.range, in: string) {
            let name = string[range]
            print(name)
        }
    }
} catch {
    print("Regex was bad!")
}
Thanks in advance.
RegEx in Swift
These posts might help you to explore regular expressions in swift:
Does a string match a pattern?
Swift extract regex matches
How can I use String slicing subscripts in Swift 4?
How to use regex with Swift?
Swift 3 - How do I extract captured groups in regular expressions?
How to group search regular expressions using swift?
Task 1 & 2
This expression might help you to match your desired outputs for both Task 1 and 2:
"(\s+)?([a-z\s]+?)(\s+)?"
Following Rob's advice, you can relax the character class [a-z\s] considerably. For example, here, we can also use:
"(\s+)?(.*?)(\s+)?"
or
"(\s+)?(.+?)(\s+)?"
to simply capture everything between the two quotes, without the leading and trailing spaces.
RegEx
If this wasn't your desired expression, you can modify/change your expressions in regex101.com.
RegEx Circuit
You can also visualize your expressions in jex.im:
JavaScript Demo
const regex = /"(\s+)?([a-z\s]+?)(\s+)?"/gm;
const str = `"name","john","name of john"
"name"," john","name of john"
" name "," john","name of john "
" name "," john"," name of john "`;
const subst = `\n$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Task 3
This expression might help you to design an expression for the third task:
(.*?)([a-z\s]+)(.*?)
const regex = /(.*?)([a-z\s]+)(.*?)/gm;
const str = `key = value // comment
key = value with some text // comment`;
const subst = `$2,`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Split the string on every character that is neither alphanumeric nor white space, then trim the white space from each element.
extension String {
    func words() -> [String] {
        return self.components(separatedBy: CharacterSet.alphanumerics.inverted.subtracting(.whitespaces))
            .filter({ !$0.isEmpty })
            .map({ $0.trimmingCharacters(in: .whitespaces) })
    }
}
let string1 = "(name,john,string for user name)"
let string2 = "(name, john,name of john)"
let string3 = "key = value // comment"
print(string1.words())//["name", "john", "string for user name"]
print(string2.words())//["name", "john", "name of john"]
print(string3.words())//["key", "value", "comment"]
Here is what I ended up with after reading all of the comments above.
let text = """
Capturing and non-capturing groups are somewhat advanced topics. You’ll encounter examples of capturing and non-capturing groups later on in the tutorial
"""
extension String {
    func rex(_ expr: String) -> [String] {
        // NSRange is measured in UTF-16 code units, so build it from the string's indices
        // rather than from self.count (which counts Characters).
        return try! NSRegularExpression(pattern: expr, options: [.caseInsensitive])
            .matches(in: self, options: [], range: NSRange(startIndex..., in: self))
            .map {
                String(self[Range($0.range, in: self)!])
            }
    }
}
let r = text.rex("(?:\\w+-\\w+)") // pass any rex
A single pattern, works for test:1...3, in Swift.
let string =
//"(name,john,string for user name)" //test:1
//#"("name"," john","name of john")"# //test:2
"key = value // comment" //test:3
let pattern = #"(?:\w+)(?:\s+\w+)*"# //Swift 5+ only
//let pattern = "(?:\\w+)(?:\\s+\\w+)*"
do {
    let regex = try NSRegularExpression(pattern: pattern)
    let matches = regex.matches(in: string, range: NSRange(0..<string.utf16.count))
    let matchingWords = matches.map {
        String(string[Range($0.range, in: string)!])
    }
    print(matchingWords) //(test:3)->["key", "value", "comment"]
} catch {
    print("Regex was bad!")
}
Let’s consider:
let string = "(name,José,name is José)"
I’d suggest a regex that looks for strings where:
It’s the substring either after the ( at the start of the full string or after a comma, i.e., look behind assertion of (?<=^\(|,);
It’s the substring that does not contain , within it, i.e., [^,]+?;
It’s the substring that is terminated by either a comma or ) at the end of the full string, i.e., look ahead assertion of (?=,|\)$), and
If you want to have it skip white space before and after the substrings, throw in the \s*+, too.
Thus:
let pattern = #"(?<=^\(|,)\s*+([^,]+?)\s*+(?=,|\)$)"#
let regex = try! NSRegularExpression(pattern: pattern)
regex.enumerateMatches(in: string, range: NSRange(string.startIndex..., in: string)) { match, _, _ in
    if let nsRange = match?.range(at: 1), let range = Range(nsRange, in: string) {
        let substring = String(string[range])
        // do something with `substring` here
    }
}
Note, I’m using the Swift 5 extended string delimiters (starting with #" and ending with "#) so that I don’t have to escape my backslashes within the string. If you’re using Swift 4 or earlier, you’ll want to escape those back slashes:
let pattern = "(?<=^\\(|,)\\s*+([^,]+?)\\s*+(?=,|\\)$)"
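To collect the substrings into an array, the enumeration can be wrapped in a small function. This is a minimal sketch around the same pattern; the name fields(in:) is hypothetical, and it assumes the fields are not themselves wrapped in quotes (quotes would be returned as part of each field).

```swift
import Foundation

// Hypothetical helper: returns the trimmed fields of a parenthesised,
// comma-separated list such as "(a, b ,c)".
func fields(in list: String) -> [String] {
    let pattern = #"(?<=^\(|,)\s*+([^,]+?)\s*+(?=,|\)$)"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let whole = NSRange(list.startIndex..., in: list)
    return regex.matches(in: list, range: whole).compactMap { match in
        // Capture group 1 holds the field without the surrounding white space.
        Range(match.range(at: 1), in: list).map { String(list[$0]) }
    }
}

print(fields(in: "(name, José ,name is José)"))
// ["name", "José", "name is José"]
```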
I referred this SO post to remove whitespaces and newline characters from a string. But in my string, I may have extra whitespaces as well as extra newline characters. I want to remove the unnecessary \n's and whitespaces from that string.
But if there is a string like so..."This \n is a st\tri\rng" then I don't want Thisisastring as the result but instead something like this..
This is a string
To collapse each run of white space (spaces, tabs, newlines) into a single space, replace every match of the regular expression \s+ with a single space:
let str = "This \n\n is a string"
if let regex = try? NSRegularExpression(pattern: "\\s+") {
    // NSRange lengths are counted in UTF-16 code units, so derive the range from the string's indices
    let result = regex.stringByReplacingMatches(in: str, options: [], range: NSRange(str.startIndex..., in: str), withTemplate: " ")
    print(result) //output: "This is a string"
}
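The same replacement can also be written without NSRegularExpression by using replacingOccurrences(of:with:options:) with the .regularExpression option. The sketch below additionally trims the ends; the function name collapsingWhitespace is made up for illustration.

```swift
import Foundation

// Hypothetical helper: collapses every run of white space (spaces, tabs,
// newlines) into one space and trims both ends of the result.
func collapsingWhitespace(_ text: String) -> String {
    return text
        .replacingOccurrences(of: #"\s+"#, with: " ", options: .regularExpression)
        .trimmingCharacters(in: .whitespacesAndNewlines)
}

print(collapsingWhitespace("  This \n\n is a\tstring \n"))
// This is a string
```

Note that this only merges white space; it cannot tell whether a tab or newline inside a word (as in "st\tri\rng") should be removed rather than turned into a space.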
I'm trying to pull out the parts of a string that are in quotation marks, i.e. in "Rouge One" is an awesome movie I want to extract Rouge One.
This is what I have so far but can't figure out where to go from here: I create a copy of the text so that I can remove the first quotation mark so that I can get the index of the second.
if text.contains("\"") {
    guard let firstQuoteMarkIndex = text.range(of: "\"") else {return}
    var textCopy = text
    let textWithoutFirstQuoteMark = textCopy.replacingCharacters(in: firstQuoteMarkIndex, with: "")
    let secondQuoteMarkIndex = textCopy.range(of: "\"")
    let stringBetweenQuotes = text.substring(with: Range(start: firstQuoteMarkIndex, end: secondQuoteMarkIndex))
}
There is no need to create copies or to replace substrings for this task.
Here is a possible approach:
Use text.range(of: "\"") to find the first quotation mark.
Use text.range(of: "\"", range:...) to find the second quotation mark, i.e. the first one after the range found in step 1.
Extract the substring between the two ranges.
Example:
let text = " \"Rouge One\" is an awesome movie"
if let r1 = text.range(of: "\""),
   let r2 = text.range(of: "\"", range: r1.upperBound..<text.endIndex) {
    // substring(with:) is deprecated since Swift 4; slice the string and convert instead
    let stringBetweenQuotes = String(text[r1.upperBound..<r2.lowerBound])
    print(stringBetweenQuotes) // Rouge One
}
Another option is a regular expression search with "positive lookbehind" and "positive lookahead" patterns:
if let range = text.range(of: "(?<=\\\").*?(?=\\\")", options: .regularExpression) {
    // String(text[range]) replaces the deprecated substring(with:)
    let stringBetweenQuotes = String(text[range])
    print(stringBetweenQuotes)
}
var rouge = "\"Rouge One\" is an awesome movie"
var separated = rouge.components(separatedBy: "\"") // ["", "Rouge One", " is an awesome movie"]
separated.dropFirst().first
I would use .components(separatedBy:)
let stringArray = text.components(separatedBy: "\"")
Check that stringArray.count is greater than 2 (there are at least 2 quotes).
Check whether stringArray.count is odd, i.e. count % 2 == 1.
If it is odd, the quotes are balanced, and every element at an odd index (1, 3, ...) lies between two quotes; these are the ones you want.
If it is even, the last quote is never closed: the elements at odd indices are still the quoted ones, except the last element, which follows the unmatched opening quote.
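The steps above can be sketched as follows. This is only an illustration under the stated assumptions: indices are 0-based, so the quoted pieces sit at odd positions, and the function name quotedParts(of:) is made up.

```swift
import Foundation

// Hypothetical helper: returns every piece of `text` enclosed in a pair of double quotes.
func quotedParts(of text: String) -> [String] {
    let pieces = text.components(separatedBy: "\"")
    // With 0-based indices, the pieces at odd positions follow an opening quote.
    var quoted = pieces.enumerated().filter { $0.offset % 2 == 1 }.map { $0.element }
    // An even piece count means an odd number of quotes: the last piece
    // follows an opening quote that is never closed, so drop it.
    if pieces.count % 2 == 0 { quoted.removeLast() }
    return quoted
}

print(quotedParts(of: "\"Rogue One\" is a \"Star Wars\" movie."))
// ["Rogue One", "Star Wars"]
```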
This will allow you to also capture multiple sets of quoted strings, like:
"Rogue One" is a "Star Wars" movie.
Another option is to use regular expressions to find pairs of quotes:
let pattern = try! NSRegularExpression(pattern: "\\\"([^\"]+)\\\"")
// Small helper methods making it easier to work with enumerateMatches(in:...)
extension String {
    subscript(utf16Range range: Range<Int>) -> String? {
        get {
            let start = utf16.index(utf16.startIndex, offsetBy: range.lowerBound)
            let end = utf16.index(utf16.startIndex, offsetBy: range.upperBound)
            return String(utf16[start..<end])
        }
    }
    var fullUTF16Range: NSRange {
        return NSRange(location: 0, length: utf16.count)
    }
}
// Loop through *all* quoted substrings in the original string.
let str = "\"Rogue One\" is an awesome movie"
pattern.enumerateMatches(in: str, range: str.fullUTF16Range) { (result, flags, stop) in
    // range(at: 1) is the range representing the characters in the 1st
    // capture group of the regular expression: ([^"]+)
    // (Swift 4 and later; Swift 3 spelled this rangeAt(1).toRange())
    if let result = result, let range = Range<Int>(result.range(at: 1)) {
        print("This was in quotes: \(str[utf16Range: range] ?? "<bad range>")")
    }
}
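In Swift 4 and later, Range(_:in:) converts an NSRange into a range of String indices directly, so the UTF-16 helper subscript is not strictly needed. A minimal sketch under that assumption, with the hypothetical name quotedSubstrings(in:), collecting the matches into an array instead of printing them:

```swift
import Foundation

// Hypothetical helper: collects the contents of every "..." pair in `text`.
func quotedSubstrings(in text: String) -> [String] {
    // Same pattern as above: a quote, one or more non-quote characters, a quote.
    guard let regex = try? NSRegularExpression(pattern: "\"([^\"]+)\"") else { return [] }
    var found: [String] = []
    regex.enumerateMatches(in: text, range: NSRange(text.startIndex..., in: text)) { result, _, _ in
        // Capture group 1 is the text between the quotes.
        if let result = result, let range = Range(result.range(at: 1), in: text) {
            found.append(String(text[range]))
        }
    }
    return found
}

print(quotedSubstrings(in: "\"Rogue One\" is a \"Star Wars\" movie."))
// ["Rogue One", "Star Wars"]
```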