Swift String Tokenizer / Parser [closed] - swift

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Hello there fellow Swift devs!
I am a junior dev, and as an exercise I'm trying to figure out the best way to tokenize/parse a Swift String.
What I have is a string which looks like this:
let string = "This is a {B}string{/B} and this is a substring."
What I would like to do is tokenize the string and change the strings/tokens inside the tags you see.
I can see using NSRegularExpression and its matches, but it feels too generic. I would like to have only, say, 2 of these tags that change the text. What would be the best approach in Swift 5.2+?
if let regex = try? NSRegularExpression(pattern: "\\{[a-z0-9]+\\}", options: .caseInsensitive) {
let string = self as NSString
return regex.matches(in: self, options: [], range: NSRange(location: 0, length: string.length)).map {
// now $0 is the result? but it won't work for enclosing the tags :/
}
}

If using HTML tags instead of {B}{/B} is acceptable, you can use the StringEx library that I wrote for this purpose.
You can select a substring inside the html tag and replace it with another string like this:
let string = "This is a <b>string</b> and this is a substring."
let ex = string.ex
ex[.tag("b")].replace(with: "some value")
print(ex.rawString) // This is a <b>some value</b> and this is a substring.
print(ex.string) // This is a some value and this is a substring.
If necessary, you can also style the selected substrings and get an NSAttributedString:
ex[.tag("b")].style([
.font(.boldSystemFont(ofSize: 16)),
.color(.black)
])
myLabel.attributedText = ex.attributedString

Not sure whether you have already solved it with NLTokenizer, but you can certainly solve it with a regex. Here is how (I have implemented it generically, so if in future you have to handle different kinds of tags and substitute a different string for each, a small tweak to the logic should do the job):
override func viewDidLoad() {
super.viewDidLoad()
let regexStr = "(\\{B\\}(\\s*\\w+\\s*)*\\{\\/B\\})"
let regex = try! NSRegularExpression(pattern: regexStr)
let string = "Sandeep {B}Bhandaari{/B} is here{B}Sandeep{/B}"
var foundRanges = [NSRange]()
// NSRange is UTF-16 based, so measure the string with utf16.count rather than count
regex.enumerateMatches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count)) { (match, flag, stop) in
if let matchRange = match?.range(at: 1) {
foundRanges.append(matchRange)
}
}
let substituteString = "abcd"
var replacedString = string as NSString
let foundRangesCount = foundRanges.count
var currentRange = 0
while foundRangesCount > currentRange {
let range = foundRanges[currentRange]
replacedString = replacedString.replacingCharacters(in: range, with: substituteString) as NSString
// Shift the remaining ranges by the difference in UTF-16 length
reEvaluateAllRanges(ranges: &foundRanges, byOffset: range.length - (substituteString as NSString).length)
currentRange += 1
}
debugPrint(replacedString)
}
func reEvaluateAllRanges(ranges: inout [NSRange], byOffset: Int) {
var newFoundRange = [NSRange]()
for range in ranges {
newFoundRange.append(NSMakeRange(range.location - byOffset, range.length))
}
ranges = newFoundRange
}
Input: "Sandeep {B}Bhandaari{/B} is here"
Output: Sandeep abcd is here
Input: "Sandeep {B}Bhandaari{/B} is here{B}Sandeep{/B}"
Output: Sandeep abcd is hereabcd
It handles these edge cases: longer strings replaced by shorter substitute strings and vice versa, and detection of strings enclosed in tags with or without surrounding spaces.
EDIT 1:
The regex (\\{B\\}(\\s*\\w+\\s*)*\\{\\/B\\}) should be self-explanatory; in case you need help understanding it, use a regex cheat sheet.
regex.enumerateMatches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count)) { (match, flag, stop) in
if let matchRange = match?.range(at: 1) {
foundRanges.append(matchRange)
}
}
I could have modified the string right here, but if there is more than one match, mutating the string would invalidate the ranges already found. Hence I save all found ranges into an array and apply the replacement to each of them later.
let substituteString = "abcd"
var replacedString = string as NSString
let foundRangesCount = foundRanges.count
var currentRange = 0
while foundRangesCount > currentRange {
let range = foundRanges[currentRange]
replacedString = replacedString.replacingCharacters(in: range, with: substituteString) as NSString
reEvaluateAllRanges(ranges: &foundRanges, byOffset: range.length - (substituteString as NSString).length)
currentRange += 1
}
Here I iterate through all the found match ranges and replace the characters in each range with the substitute string. You can always put a switch or an if/else ladder inside the while loop to look for different types of tags and pass a different substitute string for each tag.
func reEvaluateAllRanges(ranges: inout [NSRange], byOffset: Int) {
var newFoundRange = [NSRange]()
for range in ranges {
newFoundRange.append(NSMakeRange(range.location - byOffset, range.length))
}
ranges = newFoundRange
}
This function shifts every range in the array by the offset. Remember that only each range's location changes; its length stays the same.
One possible optimisation is to remove ranges from the array once their substitute strings have been applied.
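One way to avoid the offset bookkeeping altogether (a swapped-in technique, not the approach above) is to apply the matches from last to first, so a replacement never moves a range that still has to be processed. The sketch below also swaps the switch/if-else idea for a dictionary lookup per tag. `replaceTaggedSpans(in:substitutes:)` and the sample tag table are hypothetical, and the lazy `.*?` assumes tags are not nested.

```swift
import Foundation

/// Replaces each {TAG}...{/TAG} span (tags included) with the substitute
/// registered for TAG in `substitutes`. Hypothetical helper, not a library API.
func replaceTaggedSpans(in text: String, substitutes: [String: String]) -> String {
    // Build an alternation of the known tag names, escaped so they match literally.
    let names = substitutes.keys
        .map { NSRegularExpression.escapedPattern(for: $0) }
        .joined(separator: "|")
    // Group 1 captures the opening tag name; \1 requires the same name in the
    // closing tag. The lazy .*? assumes tags are not nested.
    let regex = try! NSRegularExpression(pattern: "\\{(\(names))\\}.*?\\{/\\1\\}")

    var result = text
    let fullRange = NSRange(text.startIndex..., in: text)
    let matches = regex.matches(in: text, options: [], range: fullRange)
    // Last match first: replacing a later span never shifts the UTF-16
    // offsets of an earlier one, so no range re-evaluation is needed.
    for match in matches.reversed() {
        guard let whole = Range(match.range, in: result),
              let tag = Range(match.range(at: 1), in: result),
              let substitute = substitutes[String(result[tag])] else { continue }
        result.replaceSubrange(whole, with: substitute)
    }
    return result
}

// Hypothetical tag table: each supported tag gets its own substitute.
let output = replaceTaggedSpans(in: "Sandeep {B}Bhandaari{/B} is here{I}Sandeep{/I}",
                                substitutes: ["B": "abcd", "I": "wxyz"])
print(output)  // Sandeep abcd is herewxyz
```

Unknown tags and mismatched pairs such as {B}x{/I} do not match the pattern, so they are left untouched.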

Related

Split String or Substring with Regex pattern in Swift

First, let me point out: I want to split a String or Substring on any character that is not a letter, a number, # or #. That means I want to split on whitespace (spaces and line breaks) and on special characters or symbols, excluding # and #.
In Android Java, I am able to achieve this with:
String[] textArr = text.split("[^\\w_##]");
Now I want to do the same in Swift. I added an extension to the String and Substring types:
extension String {}
extension Substring {}
In both extensions, I added a method that returns an array of Substring
func splitWithRegex(by regexStr: String) -> [Substring] {
//let string = self (for String extension) | String(self) (for Substring extension)
let regex = try! NSRegularExpression(pattern: regexStr)
let range = NSRange(string.startIndex..., in: string)
return regex.matches(in: string, options: .anchored, range: range)
.map { match -> Substring in
let range = Range(match.range(at: 1), in: string)!
return string[range]
}
}
And when I tried to use it (only tested with a Substring, but I think a String will give the same result):
let textArray = substring.splitWithRegex(by: "[^\\w_##]")
print("substring: \(substring)")
print("textArray: \(textArray)")
This is the output:
substring: This,is a #random #text written for debugging
textArray: []
Can someone please help me? I don't know if the problem is in my regex [^\\w_##] or in the splitWithRegex method.
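The main reason the code doesn't work is range(at: 1), which returns the range of the first captured group, but the pattern does not capture anything. The .anchored option is a second problem: it limits matches to the start of the search range, and since the text starts with "T", which the pattern does not match, matches(in:) finds nothing. That is why textArray is empty.
With just range, the regex returns the ranges of the found matches, but I suppose you want the characters in between.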
To accomplish that you need a dynamic index starting at the first character. In the map closure return the string from the current index to the lowerBound of the found range and set the index to its upperBound. Finally you have to add manually the string from the upperBound of the last match to the end.
The Substring type is a helper type for slicing strings. It should not be used beyond a temporary scope.
extension String {
func splitWithRegex(by regexStr: String) -> [String] {
guard let regex = try? NSRegularExpression(pattern: regexStr) else { return [] }
let range = NSRange(startIndex..., in: self)
var index = startIndex
var array = regex.matches(in: self, range: range)
.map { match -> String in
let range = Range(match.range, in: self)!
let result = self[index..<range.lowerBound]
index = range.upperBound
return String(result)
}
array.append(String(self[index...]))
return array
}
}
let text = "This,is a #random #text written for debugging"
let textArray = text.splitWithRegex(by: "[^\\w_##]")
print(textArray) // ["This", "is", "a", "#random", "#text", "written", "for", "debugging"]
However, in macOS 13 and iOS 16 there is a new API quite similar to the Java API:
let text = "This,is a #random #text written for debugging"
let textArray = Array(text.split(separator: /[^\w_##]/))
print(textArray)
The forward slashes delimit a regex literal (Swift 5.7 and later).

Swift 5.1 - is there a clean way to deal with locations of substrings/ pattern matches

I'm very, very new to Swift and admittedly struggling with some of its constructs. I have to work with a text file and do many manipulations - here's an example to illustrate the point:
Let's say I have a text like this (multi-line):
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
x----------------x
I want to be able to do simple things like find the location of #name, then split it to get the name, and so on. I've done this in JavaScript and it was pretty simple with substr and regex matches.
In swift, which is supposed to be swift and easy and what not, I'm finding this exceedingly confusing.
Can someone help with how one might:
Find the location of the start of a substring
Extract all text from the end of a substring to the end of the text
Sorry if this is trivial, but the Apple documentation feels very complicated, and lots of examples are years old. I also can't seem to find simple applications of regex.
You can use the String method range(of:) to find the range of your string, take its upperBound, and search for the end of the line from that position in the string:
Playground testing:
let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""
if let start = sentence.range(of: "#name:")?.upperBound,
let end = sentence[start...].range(of: "\n")?.lowerBound {
let substring = sentence[start..<end]
print("name:", substring)
}
If you need to get the string from there to the end of the string you can use PartialRangeFrom:
if let start = sentence.range(of: "#summary:")?.upperBound {
let substring = sentence[start...]
print("summary:", substring)
}
If you find yourself using that a lot you can extend StringProtocol and create your own method:
extension StringProtocol {
func substring<S:StringProtocol,T:StringProtocol>(between start: S, and end: T, options: String.CompareOptions = []) -> SubSequence? {
guard
let lower = range(of: start, options: options)?.upperBound,
let upper = self[lower...].range(of: end, options: options)?.lowerBound
else { return nil }
return self[lower..<upper]
}
func substring<S:StringProtocol>(after string: S, options: String.CompareOptions = []) -> SubSequence? {
guard
let lower = range(of: string, options: options)?.upperBound else { return nil }
return self[lower...]
}
}
Usage:
let name = sentence.substring(between: "#name:", and: "\n") // " a name"
let summary = sentence.substring(after: "#summary:") // " a paragraph of text\n{{something}}\na whole bunch of multi-line text"
You can use regular expressions as well:
let name = sentence.substring(between: "#\\w+:", and: "\\n", options: .regularExpression) // " a name"
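If the goal is to read every #key: value line in one pass, capture groups can do the splitting. A minimal sketch, assuming each key starts its own line and its value runs to the end of that line; `fieldRegex` and the `fields` dictionary are hypothetical names.

```swift
import Foundation

let sentence = """
Mary had a little lamb
#name: a name
#summary: a paragraph of text
{{something}}
a whole bunch of multi-line text
"""

// ^ and $ match at every line break because of .anchorsMatchLines.
// Group 1 = the key after "#", group 2 = the rest of that line.
let fieldRegex = try! NSRegularExpression(pattern: #"^#(\w+):[ \t]*(.*)$"#,
                                          options: .anchorsMatchLines)
var fields: [String: String] = [:]
let fullRange = NSRange(sentence.startIndex..., in: sentence)
for match in fieldRegex.matches(in: sentence, options: [], range: fullRange) {
    if let key = Range(match.range(at: 1), in: sentence),
       let value = Range(match.range(at: 2), in: sentence) {
        fields[String(sentence[key])] = String(sentence[value])
    }
}
print(fields["name"] ?? "")     // a name
print(fields["summary"] ?? "")  // a paragraph of text
```

Using [ \t]* rather than \s* after the colon keeps an empty value from swallowing the line break and grabbing the next line.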
You can do this with range(of:) and distance(from:to:):
let str = "Example string"
let range = str.range(of: "amp")!
print(str.distance(from: str.startIndex, to: range.lowerBound)) // 2
let lastStr = str[range.upperBound...]
print(lastStr) // "le string"

Regular expressions in swift

I'm a bit confused by NSRegularExpression in Swift; can anyone help me?
task:1 given ("name","john","name of john")
then I should get ["name","john","name of john"]. Here I should avoid the brackets.
task:2 given ("name"," john","name of john")
then I should get ["name","john","name of john"]. Here I should drop the brackets and the extra spaces and end up with an array of strings.
task:3 given key = value // comment
then I should get ["key","value","comment"]. Here I should get only the words in the line, skipping = and //.
I have tried the code below for task 1, but it does not pass.
let string = "(name,john,string for user name)"
let pattern = "(?:\\w.*)"
do {
let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let matches = regex.matches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count))
for match in matches {
if let range = Range(match.range, in: string) {
let name = string[range]
print(name)
}
}
} catch {
print("Regex was bad!")
}
Thanks in advance.
RegEx in Swift
These posts might help you explore regular expressions in Swift:
Does a string match a pattern?
Swift extract regex matches
How can I use String slicing subscripts in Swift 4?
How to use regex with Swift?
Swift 3 - How do I extract captured groups in regular expressions?
How to group search regular expressions using swift?
Task 1 & 2
This expression might help you to match your desired outputs for both Task 1 and 2:
"(\s+)?([a-z\s]+?)(\s+)?"
Based on Rob's advice, you could relax the boundaries considerably, for example by replacing the character class [a-z\s]. Here we could also use:
"(\s+)?(.*?)(\s+)?"
or
"(\s+)?(.+?)(\s+)?"
to simply capture everything between two quotes, ignoring any surrounding spaces.
RegEx
If this wasn't your desired expression, you can modify it on regex101.com.
RegEx Circuit
You can also visualize your expressions in jex.im:
JavaScript Demo
const regex = /"(\s+)?([a-z\s]+?)(\s+)?"/gm;
const str = `"name","john","name of john"
"name"," john","name of john"
" name "," john","name of john "
" name "," john"," name of john "`;
const subst = `\n$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
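The demo above is JavaScript; a rough Swift counterpart for tasks 1 and 2 might look like this. The pattern is a variation of the ones above (assumption: every field is double-quoted and contains no embedded quotes), and `quotedFields(in:)` is a hypothetical helper name.

```swift
import Foundation

/// Returns the double-quoted fields in a line such as ("name"," john","name of john"),
/// with the spaces just inside the quotes trimmed away.
func quotedFields(in line: String) -> [String] {
    // Opening quote, optional spaces, lazily captured quote-free content,
    // optional spaces, closing quote. Group 1 holds the trimmed field.
    let regex = try! NSRegularExpression(pattern: #""\s*([^"]*?)\s*""#)
    let range = NSRange(line.startIndex..., in: line)
    return regex.matches(in: line, options: [], range: range).compactMap { match -> String? in
        guard let field = Range(match.range(at: 1), in: line) else { return nil }
        return String(line[field])
    }
}

print(quotedFields(in: #"("name"," john","name of john")"#))
// ["name", "john", "name of john"]
```

Using [^"] instead of . keeps a match from running past its own closing quote into the next field.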
Task 3
This expression might help you to design an expression for the third task:
(.*?)([a-z\s]+)(.*?)
const regex = /(.*?)([a-z\s]+)(.*?)/gm;
const str = `key = value // comment
key = value with some text // comment`;
const subst = `$2,`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Split the string on non-alphanumeric characters other than whitespace, then trim whitespace from each element.
extension String {
func words() -> [String] {
return self.components(separatedBy: CharacterSet.alphanumerics.inverted.subtracting(.whitespaces))
.map({ $0.trimmingCharacters(in: .whitespaces) }) // trim first, so whitespace-only pieces become empty
.filter({ !$0.isEmpty }) // then drop the empty pieces
}
}
let string1 = "(name,john,string for user name)"
let string2 = "(name, john,name of john)"
let string3 = "key = value // comment"
print(string1.words())//["name", "john", "string for user name"]
print(string2.words())//["name", "john", "name of john"]
print(string3.words())//["key", "value", "comment"]
Here is what I came up with after going through all of the comments above:
let text = """
Capturing and non-capturing groups are somewhat advanced topics. You’ll encounter examples of capturing and non-capturing groups later on in the tutorial
"""
extension String {
func rex(_ expr: String) -> [String] {
return try! NSRegularExpression(pattern: expr, options: [.caseInsensitive])
.matches(in: self, options: [], range: NSRange(location: 0, length: self.utf16.count)) // NSRange is UTF-16 based
.map {
String(self[Range($0.range, in: self)!])
}
}
}
let r = text.rex("(?:\\w+-\\w+)") // pass any rex
A single pattern that works for tests 1 to 3, in Swift:
let string =
//"(name,john,string for user name)" //test:1
//#"("name"," john","name of john")"# //test:2
"key = value // comment" //test:3
let pattern = #"(?:\w+)(?:\s+\w+)*"# //Swift 5+ only
//let pattern = "(?:\\w+)(?:\\s+\\w+)*"
do {
let regex = try NSRegularExpression(pattern: pattern)
let matches = regex.matches(in: string, range: NSRange(0..<string.utf16.count))
let matchingWords = matches.map {
String(string[Range($0.range, in: string)!])
}
print(matchingWords) //(test:3)->["key", "value", "comment"]
} catch {
print("Regex was bad!")
}
Let’s consider:
let string = "(name,José,name is José)"
I’d suggest a regex that looks for strings where:
It’s the substring either after the ( at the start of the full string or after a comma, i.e., look behind assertion of (?<=^\(|,);
It’s the substring that does not contain , within it, i.e., [^,]+?;
It’s the substring that is terminated by either a comma or ) at the end of the full string, i.e., look ahead assertion of (?=,|\)$), and
If you want to have it skip white space before and after the substrings, throw in the \s*+, too.
Thus:
let pattern = #"(?<=^\(|,)\s*+([^,]+?)\s*+(?=,|\)$)"#
let regex = try! NSRegularExpression(pattern: pattern)
regex.enumerateMatches(in: string, range: NSRange(string.startIndex..., in: string)) { match, _, _ in
if let nsRange = match?.range(at: 1), let range = Range(nsRange, in: string) {
let substring = String(string[range])
// do something with `substring` here
}
}
Note: I’m using Swift 5 extended string delimiters (starting with #" and ending with "#) so that I don’t have to escape the backslashes within the string. If you’re using Swift 4 or earlier, you’ll need to escape those backslashes:
let pattern = "(?<=^\\(|,)\\s*+([^,]+?)\\s*+(?=,|\\)$)"

Converting numbers to string in a given string in Swift

I am given a string like 4eysg22yl3kk and my output should look like this:
foureysgtwenty-twoylthreekk, or if I am given 0123 the output should be one hundred twenty-three. So basically, as I scan the string, I need to convert numbers to words.
I don't know how to implement this in Swift while iterating through the string. Any ideas?
You actually have two basic problems.
The first is converting a "number" to a "spelt out" value (i.e. 1 to "one"). This is actually easy to solve, as NumberFormatter has a spellOut number style:
let formatter = NumberFormatter()
formatter.numberStyle = .spellOut
let text = formatter.string(from: NSNumber(value: 1))
which will result in "one", neat.
The other issue, though, is how do you separate the numbers from the text?
While I can find any number of solutions for "extract" numbers or characters from a mixed String, I can't find one which return both, split on their boundaries, so, based on your input, we'd end up with ["4", "eysg", "22", "yl", "3", "kk"].
So, time to roll our own...
func breakApart(_ text: String, withPattern pattern: String) throws -> [String]? {
// Use the pattern that was passed in (e.g. "[0-9]+" for runs of digits)
let regex = try NSRegularExpression(pattern: pattern)
var parts: [String] = []
// End of the previous match; starts at the beginning so leading text is kept
var currentIndex = text.startIndex
// NSRange is UTF-16 based, so build it from the String's own indices
for match in regex.matches(in: text, options: [], range: NSRange(text.startIndex..., in: text)) {
guard let range = Range(match.range, in: text) else {
return nil
}
// Text between the previous match (or the start) and this match
if currentIndex < range.lowerBound {
parts.append(String(text[currentIndex..<range.lowerBound]))
}
// The matched number itself
parts.append(String(text[range]))
currentIndex = range.upperBound
}
// Any text after the last match (or the whole string if nothing matched)
if currentIndex < text.endIndex {
parts.append(String(text[currentIndex...]))
}
return parts
}
Okay, so this is a little "dirty" (IMHO), but I can't think of a better approach; hopefully someone will be kind enough to provide some hints towards one ;)
Basically, it uses a regular expression to find all the groups of numbers, then builds an array by cutting the string apart at the match boundaries. Like I said, it's crude, but it gets the job done.
From there, we just need to map the results, spelling out the numbers as we go...
let formatter = NumberFormatter()
formatter.numberStyle = .spellOut
let value = "4eysg22yl3kk"
let pattern = "[0-9]+" // runs of one or more digits
if let parts = try breakApart(value, withPattern: pattern) {
let result = parts.map { (part) -> String in
if let number = Int(part), let text = formatter.string(from: NSNumber(value: number)) {
return text
}
return part
}.joined(separator: " ")
print(result)
}
This will end up printing four eysg twenty-two yl three kk. If you don't want the spaces, just call joined() without a separator.
I did this in Playgrounds, so it probably needs some cleaning up
I was able to solve it without anything more than converting my String to an array and checking it character by character. When I found a digit, I appended it to a temporary String, and as soon as the next character was not a digit, I converted the buffered digits to text.
let inputString = Array(string.lowercased())
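The character-by-character approach described above can be sketched as follows. This is a reconstruction, not the poster's actual code: `spellOutNumbers(in:)` is a hypothetical name, and the formatter locale is pinned to en_US on the assumption that English words are wanted regardless of device settings.

```swift
import Foundation

/// Replaces each run of ASCII digits in `input` with its spelled-out form,
/// scanning the string one character at a time.
func spellOutNumbers(in input: String) -> String {
    let formatter = NumberFormatter()
    formatter.numberStyle = .spellOut
    formatter.locale = Locale(identifier: "en_US")  // assumption: English output

    var output = ""
    var digits = ""  // temporary buffer for the current run of digits

    // Converts the buffered digits (if any) to words and appends them.
    func flushDigits() {
        guard !digits.isEmpty else { return }
        if let value = Int(digits), let words = formatter.string(from: NSNumber(value: value)) {
            output += words
        } else {
            output += digits  // e.g. a run too long to fit in an Int
        }
        digits = ""
    }

    for character in input {
        if character.isASCII && character.isNumber {
            digits.append(character)
        } else {
            flushDigits()  // the number ended, so spell it out first
            output.append(character)
        }
    }
    flushDigits()  // the string may end with a number
    return output
}

print(spellOutNumbers(in: "4eysg22yl3kk"))  // foureysgtwenty-twoylthreekk
```

Because Int("0123") is 123, a leading zero is simply dropped, which gives the "one hundred twenty-three" the question asks for.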

How to search array using unknown characters - Swift 3 for Mac

I am looking for a way to search an Array of strings (containing filenames with extensions) for dots: if a string contains characters, a dot, then more characters, print that string. To do that I think I have to use something like wildcards (.).
So I tried this :
let testString = "*.*"
if Array[x].countains(testString)
{
print (Array[x])
}
or
if Array[x].range(of:testString) != nil
{
print (Array[x])
}
But it does not work. I guess I have to declare it differently but I don't know how and I have not found the right example.
Could someone show some examples? Thank you.
Using this helper method on String:
extension String {
func contains(regex: NSRegularExpression) -> Bool {
let length = self.utf16.count // NSRanges are UTF-16 based!
let wholeString = NSRange(location: 0, length: length)
let matchCount = regex.numberOfMatches(in: self, range: wholeString)
return matchCount > 0
}
}
Then try this:
let fileNameWithExtension = try! NSRegularExpression(pattern: "\\w+[.]\\w+")
if Array[x].contains(regex: fileNameWithExtension) {
print(Array[x])
}
You may need to tweak my pattern above in order to match all cases you have in mind. This NSRegularExpression cheat sheet might help you there ;-)
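A self-contained sketch of the answer above, assuming the filenames live in a plain [String]. `fileNames` and its contents are hypothetical sample data, and this version of the helper uses firstMatch(in:options:range:) instead of numberOfMatches(in:options:range:) so it can stop at the first hit. The question's contains("*.*") fails because contains(_:) searches for the literal text *.*, not a wildcard.

```swift
import Foundation

extension String {
    /// True if `regex` matches anywhere in the string.
    func contains(regex: NSRegularExpression) -> Bool {
        let wholeString = NSRange(location: 0, length: utf16.count)  // NSRange is UTF-16 based
        return regex.firstMatch(in: self, options: [], range: wholeString) != nil
    }
}

// Hypothetical sample data standing in for the question's array.
let fileNames = ["notes.txt", "README", "archive.tar.gz", ".hidden", "photo.jpeg"]

// One or more word characters, a literal dot, one or more word characters.
let fileNameWithExtension = try! NSRegularExpression(pattern: #"\w+[.]\w+"#)

for name in fileNames where name.contains(regex: fileNameWithExtension) {
    print(name)
}
// Prints notes.txt, archive.tar.gz and photo.jpeg, one per line
```

".hidden" is skipped because the pattern requires at least one word character before the dot.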