Parsing Infix Mathematical Expressions in Swift Using Regular Expressions

I would like to convert a string formatted as an infix mathematical expression into an array of tokens, using regular expressions. I'm very new to regular expressions, so forgive me if the answer to this question turns out to be trivial.
For example:
"31+2--3*43.8/1%(1*2)" -> ["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"]
I've already implemented a method that achieves this task, but it consists of many lines of code and a few nested loops. I figured that once I define more operators or functions, some of which may consist of multiple characters, such as log or cos, it would be easier to edit a regex string than to add many more lines of code to my working function. Are regular expressions the right tool for this, and if so, where am I going wrong? Or am I better off adding to my working parser?
I've already referred to the following SO posts:
How to split a string, but also keep the delimiters?
This one was very helpful, but I don't believe I'm using 'lookahead' correctly.
Validate mathematical expressions using regular expression?
The solution to the question above doesn't convert the string into an array of tokens. Rather, it checks to see if the given string is a valid mathematical expression.
My code is as follows:
func convertToInfixTokens(expression: String) -> [String]?
{
do
{
let pattern = "^(((?=[+-/*]))(-)?\\d+(\\.\\d+)?)*"
let regex = try NSRegularExpression(pattern: pattern)
let results = regex.matches(in: expression, range: NSRange(expression.startIndex..., in: expression))
return results.map
{
String(expression[Range($0.range, in: expression)!])
}
}
catch
{
return nil
}
}
When I do pass a valid infix expression to this function, it returns nil. Where am I going wrong with my regex string?
NOTE: I haven't even gotten to the point of trying to parse parentheses as individual tokens. I'm still figuring out why it won't work on this expression:
"-99+44+2+-3/3.2-6"
Any feedback is appreciated, thanks!

Your pattern does not work because it only matches text at the start of the string (see the ^ anchor). The (?=[+-/*]) positive lookahead then requires the next character to be an operator from the specified set, but a lookahead consumes nothing, and the only operator your pattern actually consumes is an optional -. So, when * tries to match the enclosed pattern sequence a second time in -99+44+2+-3/3.2-6, it is positioned at +44 and -?\d+ fails there, because nothing in the pattern can consume the +. Note also that +-/ inside the character class is a range (from + to /), so the class matches , and . as well.
You may tokenize the expression using
let pattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"
See the regex demo
Details
(?<!\d) - there should be no digit immediately to the left of the current position
-? - an optional -
\d+ - 1 or more digits
(?:\.\d+)? - an optional sequence of . and 1+ digits
| - or
[-+*/%()] - any one of the characters -, +, *, /, %, ( or ).
Output using your function:
Optional(["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"])
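For reference, the pieces can be put together roughly as follows. This is a minimal sketch, not the asker's function: the name tokenize is made up for illustration, and it returns an empty array instead of nil if the pattern fails to compile.

```swift
import Foundation

// Hypothetical helper (the name is not from the question); the pattern is the one given above.
func tokenize(_ expression: String) -> [String] {
    let pattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(expression.startIndex..., in: expression)
    // Each match is one token; convert its NSRange back to a Swift string range.
    return regex.matches(in: expression, range: fullRange).compactMap { match in
        Range(match.range, in: expression).map { String(expression[$0]) }
    }
}

let tokens = tokenize("31+2--3*43.8/1%(1*2)")
print(tokens)
```

One caveat worth checking against your grammar: the lookbehind only rules out a preceding digit, so a - directly after a closing parenthesis is read as a sign. For example, "(1*2)-3" ends with the tokens ")" and "-3".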

Related

StringTransform to sanitize strings in Swift

I'm attempting to sanitize a string in Swift using a single StringTransform.
I'm using this example string: "Mom's \t Famous \"Ćevapčići\"!"
And the expected result is: "moms-famous-cevapcici"
So far, I've been able to achieve this using a combination of StringTransform and NSRegularExpression:
"Mom's \t Famous \"Ćevapčići\"!"
.applyingTransform(StringTransform("Latin-ASCII; Lower; [:Punctuation:] Remove;"))?
// produces: Optional("moms famous\tcevapcici")
.replacingMatches(
by: try! NSRegularExpression(pattern: "[^a-z0-9]+", options: []),
withTemplate: "-"
)
// produces: Optional("moms-famous-cevapcici")
Is there a way to do this using only StringTransform?
So far, I've only figured out how to remove certain characters, but not replace them. Eg.:
StringTransform("Latin-ASCII; Lower; [:Punctuation:] Remove; [^a-z0-9] Remove;")
The above transform produces "momsfamouscevapcici".
I'd also like to avoid this result: moms---famous-cevapcici. Ideally, this transform could replace several consecutive characters with one dash.

Count leading tabs in Swift string

I need to count the number of leading tabs in a Swift string. I know there are fairly simple solutions (e.g. looping over the string until a non-tab character is encountered) but I am looking for a more elegant solution.
I have attempted to use a regex such as ^\\t* along with the .numberOfMatches method but this detects all the tab characters as one match. For example, if the string has three leading tabs then that method just returns 1. Is there a way to write a regex that treats each individual tab character as a single match?
Also open to other ways of approaching this without using a regex.
Here is a non-regex solution
let count = someString.prefix(while: {$0 == "\t"}).count
You may use
\G\t
See the regex demo.
Here,
\G - matches a string start position or end of the previous match position, and
\t - matches a single tab.
Swift test:
let string = "\t\t123"
let regex = try! NSRegularExpression(pattern: "\\G\t", options: [])
let numberOfOccurrences = regex.numberOfMatches(in: string, range: NSRange(string.startIndex..., in: string))
print(numberOfOccurrences) // => 2

How to remove spaces from a string in Swift?

I have the need to remove leading and trailing spaces around a punctuation character.
For example: Hello , World ... I 'm a newbie iOS Developer.
And I'd like to have: Hello, World... I'm a newbie iOS Developer.
How can I do this? I tried to get components of the string and enumerate it by sentences. But that is not what I need
Rob's answer is great, but you can trim it down quite a lot by taking advantage of the \p{Po} regular expression class. Getting rid of the spaces around punctuation then becomes a single regular expression replace:
import Foundation
let input = "Hello , World ... I 'm a newbie iOS Developer."
let result = input.replacingOccurrences(of: "\\s*(\\p{Po}\\s?)\\s*",
with: "$1",
options: [.regularExpression])
print(result) // "Hello, World... I'm a newbie iOS Developer."
Rob's answer also tries to trim leading/trailing spaces, but your input doesn't have any of those. If you do care about that you can just call result.trimmingCharacters(in: .whitespacesAndNewlines) on the result.
Here's an explanation for the regular expression. Removing the double-escapes it looks like
\s*(\p{Po}\s?)\s*
This is made up of the following components:
\s* - Match zero or more whitespace characters (and throw them away)
(…) - Capturing group. Anything inside this group is preserved by the replacement (the $1 in the replacement refers to this group).
\p{Po} - Match a single character in the "Other_Punctuation" unicode category. This includes things like ., ', and …, but excludes things like ( or -.
\s? - Match a single optional whitespace character. This preserves the space after periods (or ellipses).
\s* - Once again, match zero or more whitespace characters (and throw them away). This is what collapses a run of several spaces after a comma into the single space kept by \s?.
For Swift 3 or 4 you can use:
let trimmedString = string.trimmingCharacters(in: .whitespaces)
This is a really wonderful problem and a shame that it isn't easier to do in Swift today (someday it will be, but not today).
I kind of hate this code, but I'm getting on a plane for 20 hours, and don't have time to make it nicer. This may at least get you started using NSMutableString. It'd be nice to work in String, and Swift hates regular expressions, so this is kind of hideous, but at least it's a start.
import Foundation
let input = "Hello , World ... I 'm a newbie iOS Developer."
let adjustments = [
(pattern: "\\s*(\\.\\.\\.|\\.|,)\\s*", replacement: "$1 "), // ellipsis or period or comma has trailing space
(pattern: "\\s*'\\s*", replacement: "'"), // apostrophe has no extra space
(pattern: "^\\s+|\\s+$", replacement: ""), // remove leading or trailing space
]
let mutableString = NSMutableString(string: input)
for (pattern, replacement) in adjustments {
let re = try! NSRegularExpression(pattern: pattern)
re.replaceMatches(in: mutableString,
options: [],
range: NSRange(location: 0, length: mutableString.length),
withTemplate: replacement)
}
mutableString // "Hello, World... I'm a newbie iOS Developer."
Regular expressions can be very confusing when you first encounter them. A few hints at reading these:
The specific language Foundation uses is described by ICU.
Backslash (\) means "the next character is special" for a regex. But inside a Swift string, backslash also means "the next character is special" for the string itself. So you have to double them all.
\s means "a whitespace character"
\s* means "zero or more whitespace characters"
\s+ means "one or more whitespace characters"
$1 means "the thing we matched in parentheses"
| means "or"
^ means "start of string"
$ means "end of string"
. means "any character" so to mean "an actual dot" you have to type "\\." in a Swift string.
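On the doubling point: Swift 5 and later also accept raw string literals (#"..."#), in which a backslash is not a string escape, so the pattern can be written the way the regex engine sees it. A small sketch comparing the two spellings of the first pattern above (the sample text is made up for illustration):

```swift
import Foundation

// The same pattern twice: with doubled backslashes, and as a raw string literal (Swift 5+).
let escaped = "\\s*(\\.\\.\\.|\\.|,)\\s*"
let raw = #"\s*(\.\.\.|\.|,)\s*"#
print(escaped == raw) // true

// Either spelling can be passed to the regex APIs.
let sample = "Hello , World ... done"
let fixed = sample.replacingOccurrences(of: raw, with: "$1 ", options: .regularExpression)
print(fixed)
```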
Notice that I check for both "..." and "." in the same regular expression. You kind of have to do something like that, or else the "." will match three times inside the "...". Another approach would be to first replace "..." with "…" (the single ellipsis character, typed on a Mac by pressing Opt-;). Then "…" is a one-character punctuation mark. (You could also decide to re-expand all ellipses back to dot-dot-dot at the end of the process.)
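The ellipsis-first idea could look something like this. It is only a sketch: the function name tidyPunctuation is made up, and it handles just the punctuation from the question (period, comma, ellipsis, apostrophe).

```swift
import Foundation

// Hypothetical helper: collapse "..." to "…" first, so every punctuation mark is one character.
func tidyPunctuation(_ input: String) -> String {
    var s = input.replacingOccurrences(of: "...", with: "…")
    // Ellipsis, period or comma: no space before, exactly one space after.
    s = s.replacingOccurrences(of: "\\s*([….,])\\s*", with: "$1 ", options: .regularExpression)
    // Apostrophe: no space on either side.
    s = s.replacingOccurrences(of: "\\s*'\\s*", with: "'", options: .regularExpression)
    // Drop the space that the first rule leaves after a final period.
    s = s.trimmingCharacters(in: .whitespaces)
    // Re-expand the single ellipsis character back to three dots.
    return s.replacingOccurrences(of: "…", with: "...")
}

let tidied = tidyPunctuation("Hello , World ... I 'm a newbie iOS Developer.")
print(tidied)
```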
Something like this is probably how I'd do it in real life, get it done and shipped, but it may be worth the pain/practice to try to build this as a character-by-character state machine, walking one character at a time, and keeping track of your current state.
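For what it's worth, a character-by-character version might start out like the sketch below. The state names, the function name tidySpacing, and the set of punctuation that "attaches" to the previous word are all assumptions made for illustration; it never inserts a space that wasn't in the input, it only decides which existing spaces to keep.

```swift
// Hypothetical sketch of a tiny state machine that decides which spaces to keep.
func tidySpacing(_ input: String) -> String {
    enum State { case start, inText, sawSpace }
    // Punctuation that should sit directly against the preceding word (assumed set).
    let attachesLeft: Set<Character> = [".", ",", "'"]
    var state = State.start
    var output = ""
    for ch in input {
        if ch == " " {
            // Remember that a space was seen, but don't emit it yet.
            if state == .inText { state = .sawSpace }
            continue
        }
        // Emit the pending space unless this character attaches to the left
        // or the previous character was an apostrophe.
        if state == .sawSpace && !attachesLeft.contains(ch) && output.last != "'" {
            output.append(" ")
        }
        output.append(ch)
        state = .inText
    }
    return output
}

let walked = tidySpacing("Hello , World ... I 'm a newbie iOS Developer.")
print(walked)
```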
You can try something like
string.replacingOccurrences(of: " ,", with: ",") for every punctuation...
Interesting problem; here's my stab at a non-Regex approach:
func correct(input: String) -> String {
typealias Correction = (punctuation: String, replacement: String)
let corrections: [Correction] = [
(punctuation: "...", replacement: "... "),
(punctuation: "'", replacement: "'"),
(punctuation: ",", replacement: ", "),
]
var transformed = input
for correction in corrections {
transformed = transformed
.components(separatedBy: correction.punctuation)
.map({ $0.trimmingCharacters(in: .whitespaces) })
.joined(separator: correction.replacement)
}
return transformed
}
let testInput = "Hello , World ... I 'm a newbie iOS Developer."
let testOutput = correct(input: testInput)
// Hello, World... I'm a newbie iOS Developer.
If you were doing this manually by processing characters arrays, you would merely need to check the previous and next characters around spaces. You can achieve the same result using functional style programming with zip, filter and map:
let testInput = "Hello , World ... I 'm a newbie iOS Developer."
let punctuation = Set(".\',")
let previousNext = zip( [" "] + testInput, String(testInput.dropFirst()) + [" "] )
let filteredChars = zip(Array(previousNext),testInput)
.filter{ $1 != " "
|| !($0.0 != " " && punctuation.contains($0.1))
}
let filteredInput = String(filteredChars.map{$1})
print(testInput) // Hello , World ... I 'm a newbie iOS Developer.
print(filteredInput) // Hello, World... I'm a newbie iOS Developer.
Swift 4, 4.2 and 5
let str = " Akbar Code "
let trimmedString = str.trimmingCharacters(in: .whitespaces)

Could I specify pattern match priority in lex code?

I have a related thread on this site (My lex pattern doesn't work to match my input file, how to correct it?).
The problem I ran into is how greedily lex matches patterns. For example, here is my lex file:
$ cat b.l
%{
#include<stdio.h>
%}
%%
"12" {printf("head\n");}
"34" {printf("tail\n");}
.* {printf("content\n");}
%%
What I want is: on "12", print "head"; on "34", print "tail"; otherwise, print "content" for the longest match that contains neither "12" nor "34".
In practice, though, ".*" is a greedy match, so whatever I input, it prints "content".
My requirement is, when I use
12sdf2dfsd3sd34
as input, the output should be
head
content
tail
So there seem to be 2 possible ways:
1. Specify a match priority for ".*", so that it applies only when neither "12" nor "34" matches. Does lex support "priority"?
2. Change the 3rd expression to match any contiguous string that contains neither the substring "12" nor "34". But how do I write this regular expression?
Does (f)lex support priority?
(F)lex always produces the longest possible match. If more than one rule matches the same longest match, the first one is chosen, so in that case it supports priority. But it does not support priority for shorter matches, nor does it implement non-greedy matching.
How to match a string which does not contain one or more sequences?
You can, with some work, create a regular expression which matches a string not containing specified substrings, but it is not particularly easy and (f)lex does not provide a simple syntax for such regular expressions.
A simpler (but slightly less efficient) solution is to match the string in pieces. As a rough outline, you could do the following:
"12" { return HEAD; }
"34" { if (yyleng > 2) {
yyless(yyleng - 2);
return CONTENT;
}
else
return TAIL;
}
.|\n { yymore(); }
This could be made more efficient by matching multiple characters when there is no chance of skipping a delimiter; change the last rule to:
.|[^13]+ { yymore(); }
yymore() causes the current token to be retained, so that the next match appends to the current token rather than starting a new token. yyless(x) returns all but the first x characters to the input stream; in this case, that is used to cause the end delimiter 34 to be rescanned after the CONTENT token is identified.
(That assumes you actually want to tokenize the input stream, rather than just print a debugging message, which is why I called it an outline solution.)
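To make the piece-by-piece idea concrete outside of flex, here is a rough Swift sketch of the same scanning logic. It is not a translation of the rules above: the Token type and the scan function are made up for illustration, the pending buffer plays the role of the text accumulated by yymore(), and, following the question's requirement, it ends the content at either delimiter.

```swift
import Foundation

// Hypothetical token type for the three kinds of output in the question.
enum Token: Equatable {
    case head
    case tail
    case content(String)
}

// Walk the input, accumulating ordinary characters until a delimiter is seen.
func scan(_ input: String) -> [Token] {
    var tokens: [Token] = []
    var pending = ""              // text collected so far, like yymore() retaining yytext
    var rest = Substring(input)
    while !rest.isEmpty {
        if rest.hasPrefix("12") || rest.hasPrefix("34") {
            // Emit the accumulated content first, then the delimiter itself.
            if !pending.isEmpty {
                tokens.append(.content(pending))
                pending = ""
            }
            tokens.append(rest.hasPrefix("12") ? .head : .tail)
            rest = rest.dropFirst(2)
        } else {
            pending.append(rest.removeFirst())
        }
    }
    if !pending.isEmpty { tokens.append(.content(pending)) }
    return tokens
}

let scanned = scan("12sdf2dfsd3sd34")
print(scanned)
```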

Scala string pattern matching for mathematical symbols

I have the following code:
val z: String = tree.symbol.toString
z match {
case "method +" | "method -" | "method *" | "method ==" =>
println("no special op")
false
case "method /" | "method %" =>
println("we have the special div operation")
true
case _ =>
false
}
Is it possible to create a match for the primitive operations in Scala:
"method *".matches("(method) (+-*==)")
I know that the (+-*) signs are used as quantifiers. Is there a way to match them anyway?
Thanks from an avid Scala scholar!
Sure.
val z: String = tree.symbol.toString
val noSpecialOp = "method (?:[-+*]|==)".r
val divOp = "method [/%]".r
z match {
case noSpecialOp() =>
println("no special op")
false
case divOp() =>
println("we have the special div operation")
true
case _ =>
false
}
Things to consider:
I choose to match against single characters using [abc] instead of (?:a|b|c).
Note that - has to be the first (or last) character when using [], or it will be interpreted as a range. Likewise, ^ cannot be the first character inside [], or it will be interpreted as negation.
I'm using (?:...) instead of (...) because I don't want to extract the contents. If I did want to extract the contents -- so I'd know what was the operator, for instance, then I'd use (...). However, I'd also have to change the matching to receive the extracted content, or it would fail the match.
It is important not to forget () on the matches -- like divOp(). If you forget them, a simple assignment is made (and Scala will complain about unreachable code).
And, as I said, if you are extracting something, then you need something inside those parentheses. For instance, "method ([%/])".r would match divOp(op), but not divOp().
Much the same as in Java. To escape a character in a regular expression, you prefix the character with \. However, backslash is also the escape character in standard Java/Scala strings, so to pass it through to the regular expression processing you must again prefix it with a backslash. You end up with something like:
scala> "+".matches("\\+")
res1 : Boolean = true
As James Iry points out in the comment below, Scala also has support for 'raw strings', enclosed in three quotation marks: """Raw string in which I don't need to escape things like \!""" This allows you to avoid the second level of escaping, that imposed by Java/Scala strings. Note that you still need to escape any characters that are treated as special by the regular expression parser:
scala> "+".matches("""\+""")
res1 : Boolean = true
Escaping characters in Strings works like in Java.
If you have larger Strings which need a lot of escaping, consider Scala's """.
E. g. """String without needing to escape anything \n \d"""
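If you put three quotation marks (""") around your regular expression, you no longer need to double the backslashes; characters that are special to the regex engine still need a single backslash.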