Scala - one-line conversion of a split string into vals

I saw the following answer: Scala split string to tuple, but in that question the OP is asking how to convert a string to a List. I would like to take a string, split it by some character, and convert it to a tuple so the parts can be saved as vals:
val (a,b,c) = "A.B.C".split(".").<toTupleMagic>
Is this possible? This would be a conversion from an Array[String] to a Tuple3 of (String, String, String).

It is unnecessary:
val Array(a, b, c) = "A.B.C".split('.')
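Note that I changed the argument to split from a String to a Char. If you pass a String, it is treated as a regex pattern, and . matches any character. Every piece between the matches is then empty, and since String.split drops trailing empty strings, you get an empty array back (so the pattern match above would fail with a MatchError).
If you truly want to convert it to a tuple, you can use Shapeless.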
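A minimal sketch of the three split variants discussed above, plus a plain pattern-match alternative to Shapeless for getting an actual Tuple3. The object name SplitToVals and the helper toTuple3 are hypothetical, chosen only for illustration; toTuple3 is not a library method.

```scala
object SplitToVals {
  // Char overload: '.' is taken literally, so this binds a = "A", b = "B", c = "C"
  val Array(a, b, c) = "A.B.C".split('.')

  // String overload: the argument is a regex, so a literal dot must be escaped
  val escaped: List[String] = "A.B.C".split("\\.").toList

  // Unescaped ".": every character is a delimiter, all pieces are empty,
  // and String.split drops trailing empty strings, leaving an empty array
  val unescaped: Array[String] = "A.B.C".split(".")

  // Hypothetical helper: turn exactly three parts into a Tuple3 without Shapeless
  def toTuple3(parts: Array[String]): Option[(String, String, String)] = parts match {
    case Array(x, y, z) => Some((x, y, z))
    case _              => None // wrong number of parts
  }
}
```

Returning an Option from the helper avoids the MatchError that the `val Array(a, b, c) = ...` form throws when the input does not split into exactly three parts.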

Related

Not able to understand a Scala code snippet with `mkString` on `Array[Any]`

I am a beginner exploring Scala. The following is a Scala function:
def printArray[K](array:Array[K]) = array.mkString("Array(" , ", " , ")")
val array2 = Array("a", 2, true)
printArray(array2)
The output is
Array(a, 2, true)
My questions:
Here we have given the array type as K. What does K mean? Does it mean all types?
How is the function mkString able to produce the output Array(a, 2, true)?
Basically, I don't understand the concatenation part.
I'd appreciate your help.
The mkString method called as
arr.mkString(prefix, separator, suffix)
will invoke toString on all array elements, prepend the prefix, then concatenate all strings separating them by the separator, and finally append the suffix.
The type parameter K in printArray[K] is not used for anything; it could be replaced by an existential type. It's just a method with a bad name and a confusing signature.
When you store values of primitive types (like Int) together with values of types that extend AnyRef (like String) in the same array, the least upper bound is inferred to be Any, so in
printArray(array2)
the K is set to Any, and the mkString works as described above, gluing together
Array(   prefix
a        "a".toString
,        separator ", "
2        2.toString
,        separator ", "
true     true.toString
)        suffix
yielding the string Array(a, 2, true).
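A short sketch of the gluing described above: the method from the question, and the same string built by hand from the toString of each element. The array is annotated as Array[Any] explicitly, which is what Scala 2 infers for it (Scala 3 may infer a different common type); the object name MkStringDemo and the val byHand are illustrative only.

```scala
object MkStringDemo {
  // Same method as in the question; K is just a type parameter
  def printArray[K](array: Array[K]): String = array.mkString("Array(", ", ", ")")

  // Mixing String, Int and Boolean: the common supertype is Any
  val array2: Array[Any] = Array("a", 2, true)

  // mkString(prefix, separator, suffix) spelled out: call toString on each
  // element, join with the separator, then wrap in prefix and suffix
  val byHand: String = "Array(" + array2.map(_.toString).reduce(_ + ", " + _) + ")"
}
```

Because K is a free type parameter, printArray also accepts, for example, an Array[Int], in which case K is inferred as Int.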
K is not a type here; it is a type parameter. For more intuition, have a look at the other question, Type parameter in scala.
In this specific example, K is inferred to be Any: the most specific type that satisfies all three values "a", 2 and true.
val array2: Array[Any] = Array("a", 2, true)
The mkString function joins all items of a collection into a single string. It inserts a separator between the items and adds a start string at the beginning and an end string at the end. (See the mkString documentation.)
If you look at your array2 definition in the REPL, you will see that array2 has type Array[Any]; Any is the parent type of all other types in Scala:
scala> val array2 = Array("a", 2, true)
//array2: Array[Any] = Array(a, 2, true)
So when you call the function def printArray[K](array:Array[K]) = array.mkString("Array(" , ", " , ")"), K is treated as Any, and the call returns a string that starts with Array(, ends with ), and has all the values separated by ", ". In the standard library, mkString delegates to addString:
def mkString(start: String, sep: String, end: String): String =
addString(new StringBuilder(), start, sep, end).toString
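To make the StringBuilder-based joining concrete, here is a hypothetical re-implementation, join, that mimics what mkString(start, sep, end) produces. It is a sketch for illustration only, not the actual library code behind addString.

```scala
object JoinSketch {
  // Hypothetical stand-in for mkString(start, sep, end)
  def join[A](xs: Seq[A], start: String, sep: String, end: String): String = {
    val sb = new StringBuilder(start) // begin with the prefix
    var first = true
    for (x <- xs) {
      if (!first) sb.append(sep)      // separator only between elements
      sb.append(x.toString)           // each element contributes its toString
      first = false
    }
    sb.append(end).toString           // finish with the suffix
  }
}
```

For example, JoinSketch.join(Seq("a", 2, true), "Array(", ", ", ")") gives the same string as printArray(array2); an empty input yields just the start and end strings.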

How to refer Spark RDD element multiple times using underscore notation?

How can I refer to a Spark RDD element multiple times using underscore notation?
For example, I need to convert an RDD[String] to an RDD[(String, Int)]. I can create an anonymous function with a named parameter, but I would like to do this using underscore notation. How can I achieve this?
Please find the sample code below.
val x = List("apple", "banana")
val rdd1 = sc.parallelize(x)
// Working
val rdd2 = rdd1.map(x => (x, x.length))
// Not working
val rdd3 = rdd1.map((_, _.length))
Why does the last line above not work?
An underscore, more formally called placeholder syntax, stands for a single input parameter. It's nice to use for simple functions, but it can get tricky to get right with two or more parameters.
You can find the definitive answer in the Scala language specification's Placeholder Syntax for Anonymous Functions:
An expression (of syntactic category Expr) may contain embedded underscore symbols _ at places where identifiers are legal. Such an expression represents an anonymous function where subsequent occurrences of underscores denote successive parameters.
Note that one underscore references one input parameter, two underscores are for two different input parameters and so on.
With that said, you cannot use the placeholder twice and expect that they'll reference the same input parameter. That's not how it works in Scala and hence the compiler error.
// Not working
val rdd3 = rdd1.map((_, _.length))
The above is equivalent to the following:
// Not working
val rdd3 = rdd1.map { (a: String, b: String) => (a, b.length) }
which is clearly incorrect as map expects a function of one input parameter.
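A small sketch of the distinction, using a plain List as a stand-in for the RDD (an assumption made so the example runs without Spark; RDD.map likewise takes a one-argument function). When one parameter is needed twice, name it; placeholders fit when each underscore is a different, single-use parameter.

```scala
object PlaceholderDemo {
  // Stand-in for rdd1: a local collection instead of an RDD
  val words: List[String] = List("apple", "banana")

  // One parameter used twice: give it a name
  val pairs: List[(String, Int)] = words.map(x => (x, x.length))

  // Two underscores are two different parameters: (_ + _) means (a, b) => a + b
  val totalLength: Int = words.map(_.length).reduce(_ + _)
}
```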

Converting from Array[String] to Seq[String] in Scala

In the following Scala code I attempt to convert a String that contains elements separated by "|" to a sequence Seq[String]. However, the result is a WrappedArray of single characters. How can I make this work?
val array = "t1|t2".split("|")
println(array.toSeq)
results in:
WrappedArray(t, 1, |, t, 2)
What I need is:
Seq(t1,t2)
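The following works: split by the pipe character ('|') instead of the pipe string ("|").
split("|") calls the overload that takes a regex string, and in a regex the pipe is a metacharacter (alternation). On its own, "|" matches the empty string at every position, so the string is split between every character, which gives the incorrect result shown in the question.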
scala> "t1|t2".split('|').toSeq
res10: Seq[String] = WrappedArray(t1, t2)
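If you need the String overload (for example because the delimiter is a longer string), the pipe has to be escaped or quoted. A minimal sketch of three equivalent options; the object name PipeSplit is illustrative, and the results are converted with toList for easy comparison.

```scala
import java.util.regex.Pattern

object PipeSplit {
  // Char overload: '|' is taken literally
  val viaChar: List[String] = "t1|t2".split('|').toList

  // String overload is a regex, so escape the metacharacter...
  val viaEscapedRegex: List[String] = "t1|t2".split("\\|").toList

  // ...or let Pattern.quote build a literal pattern for you
  val viaQuote: List[String] = "t1|t2".split(Pattern.quote("|")).toList
}
```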

Scala method toLowerCase in spark

val file = sc.textFile(filePath)
val sol1=file.map(x=>x.split("\t")).map(x=>Array(x(4),x(5),x(1)))
val sol2=sol1.map(x=>x(2).toLowerCase)
In sol1 I have created an RDD[Array[String]], and for every array I want to convert the 3rd string element to lower case. So I call the method toLowerCase, which should do that, but instead it seems to transform the string into lowercase chars?
I assume you want to convert the 3rd array element to lower case:
val sol1=file.map(x=>x.split("\t"))
.map(x => Array(x(4),x(5),x(1).toLowerCase))
In your code, sol2 is an RDD[String] (one lowercased string per row), not an RDD[Array[String]].
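A sketch of the difference using a local List in place of sc.textFile(filePath), so it runs without Spark. The two tab-separated input lines are hypothetical sample data, not from the question.

```scala
object LowerThird {
  // Hypothetical stand-in for sc.textFile(filePath)
  val lines: List[String] = List(
    "id1\tFOO\tx\ty\tA\tB",
    "id2\tBAR\tx\ty\tC\tD"
  )

  // Answer's version: still one Array per row, only the 3rd element lowercased
  val sol1: List[Array[String]] =
    lines.map(_.split("\t")).map(x => Array(x(4), x(5), x(1).toLowerCase))

  // Question's version: mapping to x(2).toLowerCase keeps only that String per row
  val sol2: List[String] =
    lines.map(_.split("\t")).map(x => Array(x(4), x(5), x(1))).map(x => x(2).toLowerCase)
}
```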

Scala updating Map error

I'm trying to write a method addWordToMap that should add a word w to the list stored in the map under the key occ. I don't understand why the compiler says that map.updated(occ, map.apply(occ)++w) returns Map[Occurrences, List[Any]]. My guess is that there is some trouble with the concatenation, but it looks correct to me. Thank you!
type Word = String
type Occurrences = List[(Char, Int)]
def addWordToMap(map: Map[Occurrences, List[Word]],
w: Word, occ: Occurrences): Map[Occurrences, List[Word]] = {
map.updated(occ, map.apply(occ)++w)
}
You're looking for :+, not ++.
The ++ expression type-checks for a combination of unpleasant reasons: it looks like you're trying to concatenate two collections, so the compiler implicitly converts the string to a collection of characters, and you end up with a collection whose element type is the least upper bound of Char and String, which is Any. That List[Any] then no longer matches the declared return type.
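A minimal sketch of the fix with :+, plus a demonstration of what ++ does with a string. The getOrElse variant is an assumption beyond the question (it avoids a NoSuchElementException when occ is not yet a key), and the conversion of the string to characters is written out explicitly with toList here, whereas in the question it happens implicitly.

```scala
object AddWordDemo {
  type Word = String
  type Occurrences = List[(Char, Int)]

  // Fixed version: :+ appends a single element, so the list stays List[Word]
  def addWordToMap(map: Map[Occurrences, List[Word]],
                   w: Word, occ: Occurrences): Map[Occurrences, List[Word]] =
    map.updated(occ, map.apply(occ) :+ w)

  // Hypothetical variant: start from Nil when occ is not yet in the map
  def addWordToMapSafe(map: Map[Occurrences, List[Word]],
                       w: Word, occ: Occurrences): Map[Occurrences, List[Word]] =
    map.updated(occ, map.getOrElse(occ, Nil) :+ w)

  // What ++ with a string amounts to: String elements plus Char elements, i.e. List[Any]
  val mixed: List[Any] = List("ate") ++ "eat".toList
}
```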