Replace " with \" - scala

How do I replace " with \"?
Here is what I'm trying:
def main(args: Array[String]) = {
  val line: String = "replace \" quote"
  println(line)
  val updatedLine = line.replaceAll("\"", "\\\"")
  println(updatedLine)
}
Output:
replace " quote
replace " quote
The output should be:
replace " quote
replace \" quote

Use the replaceAllLiterally method of the StringOps class. It replaces all literal occurrences of the argument, with no regex interpretation:
scala> val line:String = "replace \" quote"
line: String = replace " quote
scala> line.replaceAllLiterally("\"", "\\\"")
res8: String = replace \" quote
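As far as I know, replaceAllLiterally is marked deprecated from Scala 2.13.2 on, with the plain java.lang.String replace method suggested as the replacement. A minimal sketch, assuming that is acceptable for your Scala version (the object and method names here are made up for illustration):

```scala
object LiteralReplace {
  // String.replace(CharSequence, CharSequence) treats both arguments literally:
  // no regex is involved, so only the compiler's escaping layer applies.
  // "\\\"" in source is the two-character string \"
  def escapeQuotes(s: String): String = s.replace("\"", "\\\"")
}
```

Calling LiteralReplace.escapeQuotes(line) on the line from the question should give replace \" quote, the same as replaceAllLiterally.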

Two more backslashes do the job:
scala> line.replaceAll("\"" , "\\\\\"");
res5: java.lang.String = replace \" quote
The problem here is that there are two 'layers' escaping the strings. The first layer is the compiler, which we can easily see in the REPL:
scala> "\""
res0: java.lang.String = "
scala> "\\"
res1: java.lang.String = \
scala> "\\\""
res2: java.lang.String = \"
scala> val line:String = "replace \" quote";
line: String = replace " quote
The second layer is the regular expression interpreter. This one is harder to see, but it shows up when we apply your example:
scala> line.replaceAll("\"" , "\\\"");
res5: java.lang.String = replace " quote
What the regular expression interpreter really receives as the replacement is \", which it interprets as just ". So we need it to receive \\". To make the compiler give us one \, we need to write \\.
Let's see the unescaping:
The right case: for \\\\\" the compiler produces \\", and the regular expression replacement yields \".
The wrong case: for \\\" the compiler produces \", and the regular expression replacement yields ".
It can be a bit confusing despite being very straightforward.
As pointed out by @sschaef, another alternative is to use """ triple-quoting; strings in this form aren't unescaped by the compiler:
scala> line.replaceAll("\"" , """\\"""");
res6: java.lang.String = replace \" quote
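If counting backslashes feels error-prone, the JDK can apply the second (regex replacement) layer of escaping for you via java.util.regex.Matcher.quoteReplacement. The sketch below compares both routes; the object and value names are only illustrative:

```scala
import java.util.regex.Matcher

object QuoteEscaping {
  val line = "replace \" quote"

  // Five backslashes in source: the compiler yields the 3 characters \\"
  // and the replacement layer turns \\ into one literal backslash.
  val manual: String = line.replaceAll("\"", "\\\\\"")

  // quoteReplacement escapes backslashes and dollar signs for the replacement
  // layer, so we only write the text we actually want: \" (2 characters).
  val quoted: String = line.replaceAll("\"", Matcher.quoteReplacement("\\\""))
}
```

Both values should come out as replace \" quote.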

@pedrofurla nicely explains why you saw the behavior you did. Another solution to your problem would be to use a raw string with Scala's triple-quote syntax. Anything between a pair of triple-quotes is treated as a raw string with no escape processing by the Scala compiler. Thus:
scala> line.replaceAll("\"", """\\"""")
res1: String = replace \" quote
Used in conjunction with stripMargin, triple-quotes are a powerful way to embed raw strings into your code. For example:
val foo = """
|hocus
|pocus""".stripMargin
yields the string: "\nhocus\npocus"

Related

How to convert a space delimited file to a CSV file in Scala Spark?

I have a CSV file.
This is my input:
a _ \_ \ b_c b\_c "
Now, I want to convert a space delimited file to a CSV file. What should I do?
Fields not specified are considered "String 0" and are not enclosed
in quotes.
These are the specifications:
1. The string "_" by itself is converted to a null string.
   ( -n option changes "_" )
2. The string \c is converted to c.
3. The backslash character \ by itself is converted to a space.
4. The underscore is converted to a space if it occurs in a string.
   ( -s option changes "_" )
5. \n at the end of a line is converted automatically to \r\n.
6. Within String 1, " is converted to "".
The desired output is shown below. Please help me.
"a","","_"," ","b c","b_c",""""
The requirements are a little bit confusing to me, but you can try with this (which produces the expected output):
import scala.util.matching.Regex
val input = "a _ \\_ \\ b_c b\\_c \""
// Ordered list of replacements (the first one in the list is applied first)
val replacements: List[(Regex, String)] = List(
  ("""^_$""".r, ""),          // a lone "_" becomes the empty string
  ("""(?<!\\)_""".r, " "),    // an unescaped "_" becomes a space
  ("""\\(.)""".r, "$1"),      // \c becomes c
  ("""\\""".r, " "),          // a remaining lone backslash becomes a space
  (""""""".r, "\"\""))        // " becomes ""
def applyReplacements(inputString: String, replacements: List[(Regex, String)]): String =
  replacements match {
    case Nil =>
      inputString
    case replacement :: tail =>
      applyReplacements(
        replacement._1.replaceAllIn(inputString, replacement._2),
        tail)
  }
def processLine(input: String): String = {
  val inputArray = input.split(" ")
  val outputArray = inputArray.map(x => applyReplacements(x, replacements))
  val finalLine = outputArray.map(x => s"""\"${x}\"""").mkString(",")
  // Use s"${finalLine}\r\n" instead if you need the '\r\n' ending
  finalLine
}
processLine(input)
// output:
// String = "a","","_"," ","b c","b_c",""""
Probably you will have to apply some modifications to fully adapt it to your requirements (which are not fully clear to me).
If you need to apply this over a Spark RDD, you will have to put processLine in a map so that it processes every line in the RDD.
Hope it helps.
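The recursive applyReplacements helper above is essentially a left fold over the replacement list. A sketch of the same idea with foldLeft follows; the object name is made up, and the rules are copied from the answer rather than derived independently from the specification:

```scala
import scala.util.matching.Regex

object SpaceDelimitedToCsv {
  // Same ordered replacement rules as in the answer above.
  val replacements: List[(Regex, String)] = List(
    ("""^_$""".r, ""),
    ("""(?<!\\)_""".r, " "),
    ("""\\(.)""".r, "$1"),
    ("""\\""".r, " "),
    ("\"".r, "\"\"")
  )

  // Thread the field through every rule, in list order.
  def applyReplacements(field: String): String =
    replacements.foldLeft(field) { case (acc, (regex, replacement)) =>
      regex.replaceAllIn(acc, replacement)
    }

  // Split on single spaces, rewrite each field, quote it, join with commas.
  def processLine(line: String): String =
    line.split(" ").map(f => "\"" + applyReplacements(f) + "\"").mkString(",")
}
```

For the sample input this should produce the same line as processLine in the answer.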

Scala: Replace all space characters with %20

I need to replace all space characters with %20. I wrote this in Scala:
strToConvert.map(c => if (Character.isSpaceChar(c)) "%20" else c).mkString
Is there any better way to do this in Scala?
[Edit]
Let's assume replaceAll is not available and we'd like to implement an algorithm similar to the replaceAll method.
You can use String.replaceAll(what_to_replace, with_what).
E.g., to replace every single space with %20:
scala> val input = "this is my http request          execute me"
input: String = this is my http request          execute me
scala> input.replaceAll(" ", "%20")
res1: String = this%20is%20my%20http%20request%20%20%20%20%20%20%20%20%20%20execute%20me
Or use the \\s regex (matches a single whitespace character):
scala> input.replaceAll("\\s", "%20")
res2: String = this%20is%20my%20http%20request%20%20%20%20%20%20%20%20%20%20execute%20me
If you want a run of several whitespace characters to be replaced by one single %20, use \\s+, which matches a sequence of one or more whitespace characters:
scala> input.replaceAll("\\s+", "%20")
res3: String = this%20is%20my%20http%20request%20execute%20me
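For the edit in the question (no replaceAll available), one possible approach is a single pass over the characters with a StringBuilder. This is only a sketch with made-up names. Note that Character.isSpaceChar, which the question uses, matches Unicode space separators but not tab or newline, so it is not equivalent to \\s:

```scala
object SpaceEncoder {
  // Walk the input once, appending "%20" for space characters
  // and the original character otherwise.
  def encodeSpaces(s: String): String = {
    val sb = new StringBuilder(s.length)
    s.foreach { c =>
      if (Character.isSpaceChar(c)) sb.append("%20") else sb.append(c)
    }
    sb.toString
  }
}
```

Unlike the map-then-mkString version in the question, this avoids building an intermediate collection that mixes String and Char elements.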

Scala Regex with $ and String Interpolation

I am writing a regex in Scala:
val regex = "^foo.*$".r
this is great but if I want to do
var x = "foo"
val regex = s"""^$x.*$""".r
Now we have a problem because $ is ambiguous. Is it possible to use string interpolation and still write a regex?
I can do something like
val x = "foo"
val regex = ("^" + x + ".*$").r
but I don't like using +.
You can use $$ to have a literal $ in an interpolated string.
You should use the raw interpolator when enclosing a string in triple-quotes as the s interpolator will re-enable escape sequences that you might expect to be interpreted literally in triple-quotes. It doesn't make a difference in your specific case but it's good to keep in mind.
So:
val regex = raw"""^$x.*$$""".r
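A small sketch of the $$ approach is below. The startsWithLiteral helper is hypothetical and goes beyond what was asked: it assumes you may want the interpolated value matched literally even when it contains regex metacharacters, which java.util.regex.Pattern.quote handles:

```scala
import java.util.regex.Pattern

object InterpolatedRegex {
  val x = "foo"

  // $$ yields a literal $, so the pattern text is ^foo.*$
  val regex = raw"""^$x.*$$""".r

  // Hypothetical helper: quote the prefix so characters like . or *
  // in it are matched literally instead of being read as regex syntax.
  def startsWithLiteral(prefix: String) =
    raw"""^${Pattern.quote(prefix)}.*$$""".r
}
```

Here InterpolatedRegex.regex.findFirstIn("foobar") should match the whole input, while startsWithLiteral("a.b") should match "a.bc" but not "axb".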
Using %s should work.
var x = "foo"
val regex = """^%s.*$""".format(x).r
In the off chance that you need a literal %s in the regex itself, just do
val regex = """^%s.*%s$""".format(x, "%s").r

Regex error with \S in Scala: scala.StringContext$InvalidEscapeException: invalid escape '\S'

I'm trying to collect only the domains from a list of URLs with this regex:
val regDomains = s"""(?im)^http[s]*://[^/]+/(?!\S)""".r
In the docs http://www.scala-lang.org/api/2.12.x/scala/util/matching/Regex.html
there is a statement:
using triple quotes avoids having to escape the backslash character
but I got exception
scala.StringContext$InvalidEscapeException: invalid escape '\S' not
one of [\b, \t, \n, \f, \r, \, \", \']
whenever I try to run my program. Why is this so?
There is no need to use string interpolation s"""..."""; doing so breaks the regex, because the s interpolator processes backslash escapes even inside triple quotes:
scala> val myRegex = """\S""".r
myRegex: scala.util.matching.Regex = \S
scala> myRegex.findAllIn("a b c 1 2 3").toList
res0: List[String] = List(a, b, c, 1, 2, 3)
scala> val myRegex = s"""\S""".r
scala.StringContext$InvalidEscapeException: invalid escape '\S' not one of [\b, \t, \n, \f, \r, \\, \", \'] at index 0 in "\S". Use \\ for literal \.
If you need to insert variable info, then the raw interpolator is useful:
raw"""(?im)^http[s]*://[^/]+/(?!\S)$myVar""".r
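To illustrate the difference, the sketch below builds the same pattern twice: once with raw, and once with s and a doubled backslash. The scheme value is a made-up stand-in for myVar, placed at a different position in the pattern:

```scala
object RawInterpolation {
  val scheme = "https?" // hypothetical interpolated fragment

  // raw leaves \S untouched, so the regex engine sees it as written.
  val viaRaw = raw"""(?im)^${scheme}://[^/]+/(?!\S)""".r

  // s processes escapes, so the backslash must be doubled to survive.
  val viaS = s"""(?im)^${scheme}://[^/]+/(?!\\S)""".r
}
```

Both values should carry the same pattern text, (?im)^https?://[^/]+/(?!\S).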

Split function difference between char and string arguments

I try the following code in scala REPL:
"ASD-ASD.KZ".split('.')
res7: Array[String] = Array(ASD-ASD, KZ)
"ASD-ASD.KZ".split(".")
res8: Array[String] = Array()
Why do these function calls have different results?
The two calls resolve to different overloads.
The split function is overloaded, and this is the implementation from the Scala source code:
/** For every line in this string:
 *  Strip a leading prefix consisting of blanks or control characters
 *  followed by | from the line.
 */
def stripMargin: String = stripMargin('|')
private def escape(ch: Char): String = "\\Q" + ch + "\\E"
@throws(classOf[java.util.regex.PatternSyntaxException])
def split(separator: Char): Array[String] = toString.split(escape(separator))
@throws(classOf[java.util.regex.PatternSyntaxException])
def split(separators: Array[Char]): Array[String] = {
  val re = separators.foldLeft("[")(_+escape(_)) + "]"
  toString.split(re)
}
So when you're calling split() with a char, you ask to split by that specific char:
scala> "ASD-ASD.KZ".split('.')
res0: Array[String] = Array(ASD-ASD, KZ)
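When you call split() with a string, the argument is treated as a regular expression. In a regex, . matches any character, so every character of the input counts as a separator; all the resulting pieces are empty strings, and split drops trailing empty strings, which leaves an empty array. To get the same result as with the char while using double quotes, escape the dot: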
scala> "ASD-ASD.KZ".split("\\.")
res2: Array[String] = Array(ASD-ASD, KZ)
Where:
The first \ escapes the second \ for the Scala compiler, so the regex engine receives \.
The second \ escapes the dot, which is a regex metacharacter, so that it is matched as a literal character
. - the character to split the string by
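The variants discussed above can be compared side by side. This is a small sketch with made-up names; Pattern.quote is an extra option not mentioned in the answer, and it builds the same \Q...\E form that the escape helper in the quoted source uses:

```scala
import java.util.regex.Pattern

object SplitDemo {
  val s = "ASD-ASD.KZ"

  val byChar = s.split('.').toList                  // Char overload: literal dot
  val byAnyChar = s.split(".").toList               // regex: . matches any character
  val byEscaped = s.split("\\.").toList             // regex: escaped, literal dot
  val byQuoted = s.split(Pattern.quote(".")).toList // regex: \Q.\E, literal dot
}
```

byChar, byEscaped and byQuoted should all be List(ASD-ASD, KZ), while byAnyChar should be empty.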