Scala string pattern matching for mathematical symbols - scala

I have the following code:
val z: String = tree.symbol.toString
z match {
case "method +" | "method -" | "method *" | "method ==" =>
println("no special op")
false
case "method /" | "method %" =>
println("we have the special div operation")
true
case _ =>
false
}
Is it possible to create a match for the primitive operations in Scala:
"method *".matches("(method) (+-*==)")
I know that the (+-*) signs are used as quantifiers. Is there a way to match them anyway?
Thanks from an avid Scala scholar!

Sure.
val z: String = tree.symbol.toString
val noSpecialOp = "method (?:[-+*]|==)".r
val divOp = "method [/%]".r
z match {
case noSpecialOp() =>
println("no special op")
false
case divOp() =>
println("we have the special div operation")
true
case _ =>
false
}
Things to consider:
I chose to match against single characters using [abc] instead of (?:a|b|c).
Note that - has to be the first (or last) character when using [], or it will be interpreted as a range. Likewise, ^ must not be the first character inside [], or it will be interpreted as negation.
I'm using (?:...) instead of (...) because I don't want to extract the contents. If I did want to extract the contents -- to know which operator was matched, for instance -- then I'd use (...). However, I'd also have to change the match to receive the extracted content, or the match would fail.
It is important not to forget the () on the matches -- like divOp(). If you forget them, the name is treated as a new variable that matches anything and is simply bound to the value (and Scala will complain about unreachable code).
And, as I said, if you are extracting something, then you need something inside those parentheses. For instance, "method ([%/])".r would match divOp(op), but not divOp().
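As a minimal sketch of the capturing-group variant described above (the names opPattern and classify are made up for illustration and are not part of the original code), the operator can be extracted and inspected in the same match:

```scala
import scala.util.matching.Regex

// Hypothetical pattern: a capturing group (...) so the operator can be extracted.
// The - comes first inside [] so it is not read as a range.
val opPattern: Regex = "method ([-+*/%]|==)".r

// Hypothetical helper: classifies the symbol string and reports the operator.
def classify(symbol: String): String = symbol match {
  case opPattern(op @ ("/" | "%")) => s"special div operation: $op"
  case opPattern(op)               => s"no special op: $op"
  case _                           => "not an operator method"
}
```

Because the pattern has exactly one group, every case has to bind exactly one value; opPattern() would never match here.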

Much the same as in Java. To escape a character in a regular expression, you prefix the character with \. However, backslash is also the escape character in standard Java/Scala strings, so to pass it through to the regular expression processing you must again prefix it with a backslash. You end up with something like:
scala> "+".matches("\\+")
res1 : Boolean = true
As James Iry points out in the comment below, Scala also has support for 'raw strings', enclosed in three quotation marks: """Raw string in which I don't need to escape things like \!""" This allows you to avoid the second level of escaping, that imposed by Java/Scala strings. Note that you still need to escape any characters that are treated as special by the regular expression parser:
scala> "+".matches("""\+""")
res1 : Boolean = true
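To make the two levels of escaping concrete, here is a small sketch (the value names are only illustrative). It also uses java.util.regex.Pattern.quote, which is not mentioned above but is a standard way to treat a whole string literally:

```scala
import java.util.regex.Pattern

// The same regex written with a doubled backslash and as a raw string.
val escaped = "\\+"
val raw = """\+"""
val sameLiteral = escaped == raw

// Without escaping, + is a quantifier applied to the preceding space.
val unescapedMatch = "method +".matches("method +")
val escapedMatch = "method +".matches("""method \+""")

// Pattern.quote wraps the text so that every character is matched literally.
val quotedMatch = "+-*==".matches(Pattern.quote("+-*=="))
```

Here sameLiteral, escapedMatch and quotedMatch are true, while unescapedMatch is false.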

Escaping characters in Strings works like in Java.
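If you have larger Strings which need a lot of escaping, consider Scala's """.
E.g. """String without needing to escape anything \n \d"""
If you put three """ around your regular expression, you no longer need to double the backslashes for the String literal. Characters that are special to the regular expression engine (such as + or *) still have to be escaped with a single backslash.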

Related

How to replace pound sign £ in scala

In the sales column I have values with a pound sign, such as £1200. The DataFrame in Scala cannot read them as numbers. I want the column value as a double, 1200. I am using the method below, but it's not working.
def getRemovedDollarValue = udf(
(actualSales: String) => {
val actualSalesDouble = actualSales
.replace(",", "")
.replace("$", "")
.replace("\\u00A3","")
.replace("\\U00A3","")
.replaceAll("\\s", "_").trim().toDouble
java.lang.Double.parseDouble(actualSalesDouble.toString)
}
)
You need to write .replace("\u00A3","") instead of the escaped .replace("\\u00A3","").
But I prefer just .replace("£", "") because it is more readable.
I think the proposed solutions and comments all work but don't address the confusion behind why your code isn't working.
From the Pattern docs:
Thus the strings "\u2014" and "\\u2014", while not equal, compile into the same pattern, which matches the character with hexadecimal value 0x2014.
replace and replaceAll both replace all occurrences in a String, but only replaceAll takes a regular expression. You're passing "\\u00A3", which works as a regex pattern but not as a Unicode string literal, because of the added backslash. As already suggested, either use replace with a Unicode literal or the actual symbol, or change to replaceAll.
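The difference can be checked in plain Scala, without Spark. This is only a sketch with made-up value names, and it assumes the source file is saved as UTF-8 so that the £ literal survives:

```scala
val sales = "£1,200"

// replace with the symbol itself, or with the Unicode escape in a normal String.
val viaSymbol = sales.replace("£", "").replace(",", "").toDouble
val viaEscape = sales.replace("\u00A3", "").replace(",", "").toDouble

// replaceAll takes a regex, so the doubled backslash reaches the regex engine
// and is interpreted there as the pound sign.
val viaRegex = sales.replaceAll("\\u00A3", "").replace(",", "").toDouble

// replace with the doubled backslash searches for six literal characters
// (a backslash, u, 0, 0, A, 3) and therefore changes nothing.
val unchanged = sales.replace("\\u00A3", "")
```

The first three values are all 1200.0, while unchanged is still "£1,200".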

Scala string formatting exercises error: not compiling

I am working on the exercises from https://www.scala-exercises.org/std_lib/formatting
For the following question, my answer seems incorrect but I do not know why.
val c = 'a' //unicode for a
val d = '\141' //octal for a
val e = '\"'
val f = '\\'
"%c".format(c) should be("a") //my answers
"%c".format(d) should be("a")
"%c".format(e) should be(")
"%c".format(f) should be(\)
Your answer should be enclosed in quotes:
"%c".format(e) should be("\"")
"%c".format(f) should be("\\")
because it isn't recognized as a String unless it's enclosed in quotes.
Your last two lines are invalid Scala code and cannot be compiled:
// These are wrong
"%c".format(e) should be(")
"%c".format(f) should be(\)
The be() function needs to be passed a String, and neither of those calls is being passed a String. A String needs to start and end with a double-quote (there are some exceptions).
// In this case you started a String with a double-quote, but you are never
// closing the string with a second double-quote
"%c".format(e) should be(")
// In this case you are missing both double-quotes
"%c".format(f) should be(\)
In this case the code should be:
"%c".format(e) should be("\"")
"%c".format(f) should be("\\")
If you want a character to be treated literally in a String, you need to "escape" it with a backslash. So if you want to literally show a double-quote, you need to prefix it with a backslash:
\"
And as a String:
"\""
Similarly for a backslash:
\\
As a String:
"\\"
Using an IDE makes this easier to see. Using IntelliJ the String is green but the special non-literal characters are highlighted in orange.
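The escaped forms can also be checked without the test framework. A small sketch in plain Scala (the result names are made up for illustration):

```scala
val e = '\"' // a double-quote character
val f = '\\' // a backslash character

// %c formats a single character; each result is a one-character String.
val quoteResult: String = "%c".format(e)
val backslashResult: String = "%c".format(f)

// The expected values, written as String literals with escapes.
val quoteOk = quoteResult == "\""
val backslashOk = backslashResult == "\\"
```

Both quoteOk and backslashOk are true, and each result has length 1, because the backslash in the source is only the escape marker and not part of the value.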
Check quote signs.
https://www.tutorialspoint.com/scala/scala_strings.htm
https://docs.scala-lang.org/overviews/core/string-interpolation.html
https://learnxinyminutes.com/docs/scala/
You can run Scala code online and check yourself here:
https://scastie.scala-lang.org
https://ideone.com/

How to strip everything except digits from a string in Scala (quick one liners)

This is driving me nuts... there must be a way to strip out all non-digit characters (or perform other simple filtering) in a String.
Example: I want to turn a phone number ("+72 (93) 2342-7772" or "+1 310-777-2341") into a simple numeric String (not an Int), such as "729323427772" or "13107772341".
I tried "[\\d]+".r.findAllIn(phoneNumber) which returns an Iterator and then I would have to recombine the matches into a String somehow... seems horribly wasteful.
I also came up with: phoneNumber.filter("0123456789".contains(_)) but that becomes tedious for other situations. For instance, removing all punctuation... I'm really after something that works with a regular expression so it has wider application than just filtering out digits.
Anyone have a fancy Scala one-liner for this that is more direct?
You can use filter, treating the string as a character sequence and testing the character with isDigit:
"+72 (93) 2342-7772".filter(_.isDigit) // res0: String = 729323427772
You can use replaceAll and Regex.
"+72 (93) 2342-7772".replaceAll("[^0-9]", "") // res1: String = 729323427772
Another approach, define the collection of valid characters, in this case
val d = '0' to '9'
and so for val a = "+72 (93) 2342-7772", filter on collection inclusion for instance with either of these,
for (c <- a if d.contains(c)) yield c
a.filter(d.contains)
a.collect{ case c if d.contains(c) => c }
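Since the question asks for something regex-based that carries over to other cases, here is a short sketch along those lines (value names are illustrative). It uses the predefined classes \D (non-digit) and \p{Punct} (ASCII punctuation) from java.util.regex, and shows that the findAllIn result can be rejoined with mkString:

```scala
val phone = "+72 (93) 2342-7772"

// Remove every non-digit character.
val digitsOnly = phone.replaceAll("\\D", "")

// Same idea for a different filter: remove all ASCII punctuation, keep the rest.
val noPunctuation = phone.replaceAll("\\p{Punct}", "")

// The findAllIn attempt from the question, rejoined into one String.
val rejoined = "\\d+".r.findAllIn(phone).mkString
```

Here digitsOnly and rejoined are both "729323427772", and noPunctuation is "72 93 23427772".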

Implement Scala-style String Interpolation In Scala

I want to implement a Scala-style string interpolation in Scala. Here is an example,
val str = "hello ${var1} world ${var2}"
At runtime I want to replace "${var1}" and "${var2}" with some runtime strings. However, when trying to use Regex.replaceAllIn(target: CharSequence, replacer: (Match) ⇒ String), I ran into the following problem:
import scala.util.matching.Regex
val placeholder = new Regex("""(\$\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
java.lang.IllegalArgumentException: No group with name {var1}
at java.util.regex.Matcher.appendReplacement(Matcher.java:800)
at scala.util.matching.Regex$Replacement$class.replace(Regex.scala:722)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:700)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.collection.Iterator$class.foreach(Iterator.scala:743)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:410)
... 32 elided
However, when I removed '$' from the regular expression, it worked:
val placeholder = new Regex("""(\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
res2: String = hello $A{var1}B world $A{var2}B
So my question is whether this is a bug in Scala Regex. And if so, are there other elegant ways to achieve the same goal (other than brute-force replaceAllLiterally on all placeholders)?
$ is treated specially in the replacement string. This is described in the documentation of replaceAllIn:
In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.
(Actually, that doesn't mention named group references, so I guess it's only sort of documented.)
Anyway, the takeaway here is that you need to escape the $ characters in the replacement string if you don't want them to be treated as references.
new scala.util.matching.Regex("""(\$\{\w+\})""")
.replaceAllIn("hello ${var1} world ${var2}", m => s"A\\${m.matched}B")
// "hello A${var1}B world A${var2}B"
It's hard to tell what behavior you're expecting. The issue is that s"${m.matched}" is turning into "${var1}" (and "${var2}"). The '$' is a special character that says "place the group with name {var1} here instead".
For example:
scala> placeholder.replaceAllIn(str, m => "$1")
res0: String = hello ${var1} world ${var2}
It replaces the match with the first capturing group (which is m itself).
It's hard to tell exactly what you're doing, but you could escape any $ like so:
scala> placeholder.replaceAllIn(str, m => s"${m.matched.replace("$","\\$")}")
res1: String = hello ${var1} world ${var2}
If what you really want to do is evaluate var1/var2 as variables in the local scope of the method, that's not possible. In fact, the s"Hello, $name" pattern is actually converted into new StringContext("Hello, ", "").s(name) at compile time.
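If the runtime values are available in a lookup table, the escaping advice can be combined with Regex.quoteReplacement, which the quoted documentation mentions. The following is only a sketch: the interpolate helper and the Map-based lookup are assumptions for illustration, not something from the question:

```scala
import scala.util.matching.Regex

// Group 1 captures the name between ${ and }.
val placeholder: Regex = """\$\{(\w+)\}""".r

// Hypothetical helper: replaces each ${name} with values(name).
// Unknown names are left as they are. quoteReplacement escapes any $ or \
// in the replacement, so they are not read as group references.
def interpolate(template: String, values: Map[String, String]): String =
  placeholder.replaceAllIn(
    template,
    m => Regex.quoteReplacement(values.getOrElse(m.group(1), m.matched))
  )
```

For example, interpolate("hello ${var1} world ${var2}", Map("var1" -> "A", "var2" -> "$B")) gives "hello A world $B".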

Why is there space at end of method names ending with an operator?

I've been learning Scala recently, and learned that for method names, if the method name ends in an operator symbol (such as defining unary_- for a class), and we specify the return type, we need a space between the final character of the method and the : which lets us specify the return type.
def unary_-: Rational = new Rational(-numer, denom)
The reasoning I have heard for this is that : is also a legal part of an identifier, so we need a way of separating the identifier and the end of the method name. But letters are legal parts of identifiers too, so why don't we need a space if we just have a method name that is all letters?
To quote the language spec (p. 12) or html:
First, an identifier can start with a letter which can be followed by an arbitrary sequence of letters and digits. This may be followed by underscore ‘_’ characters and another string composed of either letters and digits or of operator characters.
That is, to combine letters and operator characters in one identifier, they must be joined with an underscore.
Looking at def unary_-: Rational = new Rational(-numer, denom), with the underscore joining unary with -:, the colon is interpreted as part of the method name if there is no space. Therefore, with the colon being part of the method name, the compiler can't find the colon that precedes the return type.
scala> def test_-: Int = 1 // the method name is `test_-:`
<console>:1: error: '=' expected but identifier found.
scala> def test_- : Int = 1 // now the method name is `test_-`, and this is okay.
test_$minus: Int
If you want the colon to be part of the method name, it would have to look like this:
scala> def test_-: : Int = 1
test_$minus$colon: Int
Method names with just letters will not have this problem, because the colon isn't absorbed into the name following an underscore.
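For completeness, here is a minimal version of the class from the question with the space in place. The constructor parameters are assumed from the names numer and denom used in the question's method body:

```scala
// Minimal sketch of the Rational class; only the parts needed for unary_- are shown.
class Rational(val numer: Int, val denom: Int) {
  // The space before the colon keeps the method name as `unary_-`.
  def unary_- : Rational = new Rational(-numer, denom)
}

val half = new Rational(1, 2)
val negatedHalf = -half // calls half.unary_-
```

After this, negatedHalf.numer is -1 and negatedHalf.denom is 2.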