Split function difference between char and string arguments - scala

I tried the following code in the Scala REPL:
"ASD-ASD.KZ".split('.')
res7: Array[String] = Array(ASD-ASD, KZ)
"ASD-ASD.KZ".split(".")
res8: Array[String] = Array()
Why do these function calls give different results?

The two calls resolve to different overloads of split.
The Char and Array[Char] overloads are added by Scala, while the String overload comes from java.lang.String and takes a regex. This is the implementation from the source code of Scala:
/** For every line in this string:
Strip a leading prefix consisting of blanks or control characters
followed by | from the line.
*/
def stripMargin: String = stripMargin('|')
private def escape(ch: Char): String = "\\Q" + ch + "\\E"
@throws(classOf[java.util.regex.PatternSyntaxException])
def split(separator: Char): Array[String] = toString.split(escape(separator))
@throws(classOf[java.util.regex.PatternSyntaxException])
def split(separators: Array[Char]): Array[String] = {
  val re = separators.foldLeft("[")(_+escape(_)) + "]"
  toString.split(re)
}
So when you're calling split() with a char, you ask to split by that specific char:
scala> "ASD-ASD.KZ".split('.')
res0: Array[String] = Array(ASD-ASD, KZ)
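And when you're calling split() with a string, the argument is treated as a regex. In a regex, . matches any character, so every character of the input becomes a separator, every piece is empty, and split drops trailing empty strings, which leaves an empty array. So to get the same result using double quotes, you need to escape the dot: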
scala> "ASD-ASD.KZ".split("\\.")
res2: Array[String] = Array(ASD-ASD, KZ)
Where:
\\ is how a Scala string literal spells a single backslash, so the regex engine receives \.
\. is the regex for a literal dot instead of "any character"
. is the character to split the string by
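To put the overloads side by side, here is a minimal sketch. The object name SplitDemo and the val names are made up for illustration; Pattern.quote is the standard java.util.regex helper for building a literal pattern and is not part of the answer above.

```scala
import java.util.regex.Pattern

object SplitDemo {
  val input = "ASD-ASD.KZ"

  // Char overload: the separator is taken literally.
  val byChar: Array[String] = input.split('.')

  // String overload: "." is a regex that matches every character,
  // so all pieces are empty and the trailing empties are dropped.
  val byRegexDot: Array[String] = input.split(".")

  // Escaping the dot by hand makes the regex match a literal dot.
  val byEscaped: Array[String] = input.split("\\.")

  // Pattern.quote builds the literal pattern without manual escaping.
  val byQuoted: Array[String] = input.split(Pattern.quote("."))

  def main(args: Array[String]): Unit = {
    println(byChar.mkString("[", ", ", "]"))    // [ASD-ASD, KZ]
    println(byRegexDot.length)                  // 0
    println(byEscaped.mkString("[", ", ", "]")) // [ASD-ASD, KZ]
    println(byQuoted.mkString("[", ", ", "]"))  // [ASD-ASD, KZ]
  }
}
```

Pattern.quote is handy when the separator comes from user input and you can't know in advance whether it contains regex metacharacters.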

Related

Converting from Array[String] to Seq[String] in Scala

In the following Scala code I attempt to convert from a String that contains elements separated by "|" to a sequence Seq[String]. However the result is a WrappedArray of characters. How to make this work?
val array = "t1|t2".split("|")
println(array.toSeq)
results in:
WrappedArray(t, 1, |, t, 2)
What I need is:
Seq(t1,t2)
The following works, i.e. split by the pipe character ('|') instead of the pipe string ("|").
split("|") calls the overload that takes a regex string, where the pipe is a metacharacter (alternation). That gives the incorrect result shown in the question.
scala> "t1|t2".split('|').toSeq
res10: Seq[String] = WrappedArray(t1, t2)
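If you do want the String overload, escaping the pipe works too. A small sketch, with PipeSplitDemo and the val names invented for illustration:

```scala
object PipeSplitDemo {
  val input = "t1|t2"

  // Char overload: '|' is taken literally.
  val viaChar: Seq[String] = input.split('|').toSeq

  // String overload with the pipe escaped for the regex engine.
  val viaEscapedRegex: Seq[String] = input.split("\\|").toSeq

  // toList gives a plain immutable List instead of an array wrapper.
  val asList: List[String] = input.split('|').toList

  def main(args: Array[String]): Unit = {
    println(viaChar.mkString(","))         // t1,t2
    println(viaEscapedRegex.mkString(",")) // t1,t2
    println(asList)                        // List(t1, t2)
  }
}
```

The concrete class behind toSeq depends on the Scala version (WrappedArray in the output above), but the elements are the same either way.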

Concatenate characters in Scala

Suppose we have a string "code". How would we concatenate any two characters? Say, for example, we need to concatenate the last two characters:
str.init.last + str.last gives 201. How would we get de instead?
You can use string interpolation to make any combination of characters:
scala> val code = "code"
code: String = code
scala> s"${code(1)}${code(3)}"
res0: String = oe
"code".init.last.toString + "code".last.toString
val res7: String = de
(use toString first to convert char to String and then concatenate with +)
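The 201 comes from numeric addition: + on two Char values adds their code points ('d' is 100, 'e' is 101) and yields an Int. A short sketch of the options, where CharConcatDemo and the val names are only illustrative and takeRight is an extra alternative not mentioned in the answers:

```scala
object CharConcatDemo {
  val code = "code"

  // Char + Char is integer addition: 100 + 101.
  val asInt: Int = code.init.last + code.last

  // Converting the first Char to String makes + mean concatenation.
  val viaToString: String = code.init.last.toString + code.last

  // String interpolation concatenates any combination of characters.
  val viaInterpolation: String = s"${code.init.last}${code.last}"

  // For the specific case of the last two characters, takeRight is enough.
  val viaTakeRight: String = code.takeRight(2)

  def main(args: Array[String]): Unit = {
    println(asInt)            // 201
    println(viaToString)      // de
    println(viaInterpolation) // de
    println(viaTakeRight)     // de
  }
}
```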

How to combine raw with string interpolation in Scala?

Just started Scala and have a question.
val num = 10
val str = "Learning\t${num}Scala"
Now I am trying to print str without escaping \t but with num interpolation. Is this possible? I tried a couple of variations below, but they didn't work:
scala> s"${str}"
scala> s"""${str}"""
scala> raw"""${str}"""
The question is: how do I print Learning\t10Scala?
Here is one way to do it with about the same amount of code.
Write a function called times that repeats a string, and interpolate its result into the middle of another string:
scala> def times(n: Int)(str: String): String = List.fill(n)(str).mkString("")
times: (n: Int)(str: String)String
scala> s"""hello${times(3)("\t")}world"""
res0: String = hello world
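For the literal Learning\t10Scala output the question asks about, the standard raw interpolator applied directly to the literal may be closer to what is wanted. This is a sketch under that reading of the question, not the approach of the answer above; RawDemo and the val names are made up. Note that in the question, str is a plain string literal, so \t is already a tab and ${num} is never interpolated by the time str is passed to s or raw.

```scala
object RawDemo {
  val num = 10

  // raw interpolates ${num} but leaves \t as the two characters '\' and 't'.
  val rawText: String = raw"Learning\t${num}Scala"

  // s interpolates ${num} and also turns \t into a tab character.
  val cooked: String = s"Learning\t${num}Scala"

  def main(args: Array[String]): Unit = {
    println(rawText) // Learning\t10Scala
    println(cooked)  // Learning, a tab, then 10Scala
  }
}
```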

How do I escape tilde character in scala?

Given that I have a file that looks like this:
CS~84~Jimmys Bistro~Jimmys
...
Using tilde (~) as a delimiter, how can I split it?
val company = dataset.map(k=>k.split(""\~"")).map(
k => Company(k(0).trim, k(1).toInt, k(2).trim, k(3).trim)
The above doesn't work.
Hmmm, I don't see where it needs to be escaped.
scala> val str = """CS~84~Jimmys Bistro~Jimmys"""
str: String = CS~84~Jimmys Bistro~Jimmys
scala> str.split('~')
res15: Array[String] = Array(CS, 84, Jimmys Bistro, Jimmys)
And the array elements don't need to be trimmed unless you know that errant spaces can be part of the input.
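The tilde is not a regex metacharacter, so both split('~') and split("~") work unescaped. Below is a rough sketch of parsing one line into a case class. The field names of Company are guesses, since the question only shows the constructor call, and TildeDemo and parse are hypothetical names; malformed lines give None instead of throwing.

```scala
import scala.util.Try

object TildeDemo {
  // Hypothetical shape of Company; the real field names are not in the question.
  final case class Company(code: String, id: Int, name: String, shortName: String)

  // Split on the literal '~' and accept only lines with exactly four fields
  // whose second field parses as an Int.
  def parse(line: String): Option[Company] =
    line.split('~') match {
      case Array(code, id, name, shortName) =>
        Try(id.trim.toInt).toOption
          .map(n => Company(code.trim, n, name.trim, shortName.trim))
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(parse("CS~84~Jimmys Bistro~Jimmys")) // Some(Company(CS,84,Jimmys Bistro,Jimmys))
    println(parse("not a valid line"))           // None
  }
}
```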

Replace " with \"

How do I replace " with \"?
Here is what I'm trying:
def main(args:Array[String]) = {
val line:String = "replace \" quote";
println(line);
val updatedLine = line.replaceAll("\"" , "\\\"");
println(updatedLine);
}
output :
replace " quote
replace " quote
The output should be :
replace " quote
replace \" quote
Use the replaceAllLiterally method of the StringOps class. It replaces all literal occurrences of the first argument, with no regex processing:
scala> val line:String = "replace \" quote"
line: String = replace " quote
scala> line.replaceAllLiterally("\"", "\\\"")
res8: String = replace \" quote
Two more backslashes do the job:
scala> line.replaceAll("\"" , "\\\\\"");
res5: java.lang.String = replace \" quote
The problem here is that there are two layers of unescaping applied to the strings. The first layer is the compiler, which we can easily see in the REPL:
scala> "\""
res0: java.lang.String = "
scala> "\\"
res1: java.lang.String = \
scala> "\\\""
res2: java.lang.String = \"
scala> val line:String = "replace \" quote";
line: String = replace " quote
The second layer is the regular expression interpreter. This one is harder to see, but can be seen by applying your example:
scala> line.replaceAll("\"" , "\\\"");
res5: java.lang.String = replace " quote
What the regex engine really receives as the replacement is \", which it interprets as just ". So we need the regex engine to receive \\". To make the compiler give us each \ we need to write \\ in the source.
Let's see the unescaping:
The right case: for \\\\\" in the source, the compiler produces \\", and the regular expression produces \".
The wrong case: for \\\" in the source, the compiler produces \", and the regular expression produces ".
It can be a bit confusing despite being very straightforward.
As pointed out by @sschaef, another alternative is to use """ triple-quoting; strings in this form aren't unescaped by the compiler:
scala> line.replaceAll("\"" , """\\"""");
res6: java.lang.String = replace \" quote
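The two layers can be checked side by side with a small sketch. QuoteEscapeDemo and the val names are invented for illustration. Matcher.quoteReplacement and String.replace are standard Java APIs not used in the answers above; as far as I know, replaceAllLiterally is deprecated in newer Scala versions (2.13.2 onward) in favor of replace.

```scala
import java.util.regex.Matcher

object QuoteEscapeDemo {
  val line = "replace \" quote"

  // After compilation the replacement is \" ; the regex engine reads that as a plain ".
  val unchanged: String = line.replaceAll("\"", "\\\"")

  // After compilation the replacement is \\" ; the regex engine emits \".
  val escapedByHand: String = line.replaceAll("\"", "\\\\\"")

  // quoteReplacement escapes the backslash for the regex layer on our behalf.
  val escapedByQuote: String = line.replaceAll("\"", Matcher.quoteReplacement("\\\""))

  // String.replace does no regex processing at all.
  val literal: String = line.replace("\"", "\\\"")

  def main(args: Array[String]): Unit = {
    println(unchanged)      // replace " quote
    println(escapedByHand)  // replace \" quote
    println(escapedByQuote) // replace \" quote
    println(literal)        // replace \" quote
  }
}
```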
@pedrofurla nicely explains why you saw the behavior you did. Another solution to your problem would be to use a raw string with Scala's triple-quote syntax. Anything between a pair of triple-quotes is treated as a raw string, with no escape processing by the Scala compiler. Thus:
scala> line.replaceAll("\"", """\\"""")
res1: String = replace \" quote
Used in conjunction with stripMargin, triple-quotes are a powerful way to embed raw strings into your code. For example:
val foo = """
|hocus
|pocus""".stripMargin
yields the string: "\nhocus\npocus"
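A quick way to check both claims (no escape processing, and the margin stripping) is the sketch below. StripMarginDemo and the val names are made up, and the expected strings assume the source file uses \n line endings.

```scala
object StripMarginDemo {
  // Triple-quoted strings keep the backslash: this is '\' followed by 't'.
  val noEscape: String = """\t"""

  // Leading whitespace up to and including '|' is removed from each line.
  val foo: String =
    """
      |hocus
      |pocus""".stripMargin

  // stripMargin also accepts a custom margin character.
  val custom: String =
    """#hocus
      #pocus""".stripMargin('#')

  def main(args: Array[String]): Unit = {
    println(noEscape.length)         // 2
    println(foo == "\nhocus\npocus") // true
    println(custom == "hocus\npocus") // true
  }
}
```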