Why does println on my RDD print object references instead of the elements? - scala

When I try to print the contents of my RDD, it prints something like the output below. How can I print the actual contents?
Thanks!
scala> lines
res15: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[3] at filter at <console>:23
scala> lines.take(5).foreach(println)
[Ljava.lang.String;@6d3db5d1
[Ljava.lang.String;@6e6be45e
[Ljava.lang.String;@6d5e0ff4
[Ljava.lang.String;@3a699444
[Ljava.lang.String;@69851a51

This happens because println uses the toString implementation of the given object. An Array has no toString of its own, so it falls back to the default from java.lang.Object, which prints the JVM type name and the hash code. If you convert the array to a List, the output is much more readable, because List's toString prints the elements.
scala> println(Array("foo"))
[Ljava.lang.String;@HASH
scala> println(Array("foo").toList)
List(foo)

Depending on how you want to print them out you can change your line that prints the elements to:
scala> lines.take(5).foreach(indvArray => indvArray.foreach(println))
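To make the two options above concrete, here is a rough sketch in plain Scala without Spark. It assumes the result of lines.take(5) is an ordinary local Array[Array[String]] (take returns a local array, not an RDD). The sample rows and the names RowPrinting and render are made up for illustration.

```scala
object RowPrinting {
  // Hypothetical stand-in for the result of lines.take(5).
  val rows: Array[Array[String]] = Array(
    Array("a", "b", "c"),
    Array("d", "e", "f")
  )

  // Build one line of text per row, joining the fields with a separator.
  def render(rows: Array[Array[String]], sep: String = ","): Array[String] =
    rows.map(_.mkString(sep))

  def main(args: Array[String]): Unit = {
    render(rows).foreach(println)            // one row per line: a,b,c then d,e,f
    rows.foreach(row => println(row.toList)) // List(a, b, c) then List(d, e, f)
  }
}
```

Joining with mkString keeps each row on one line, whereas the nested foreach in the answer prints one element per line, so which one fits depends on the output you want.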

Related

How to print array values in Scala? I am getting different values

Code:
object Permutations extends App
{
val ar=Array(1,2,3).combinations(2).foreach(println(_))
}
Output:
[I@378fd1ac
[I@49097b5d
[I@6e2c634b
I am trying to run this, but I get these values instead of the array contents.
How do I print array values in Scala? Can anyone help?
Use mkString
object Permutations extends App {
Array(1,2,3).combinations(2).foreach(x => println(x.mkString(", ")))
}
Scala REPL
scala> Array(1,2,3).combinations(2).foreach(x => println(x.mkString(", ")))
1, 2
1, 3
2, 3
When an array instance is passed directly to println, the array's toString method is called, which produces output like [I@49097b5d. Use mkString to convert the array to a readable string instead.
Scala REPL
scala> println(Array(1, 2))
[I@2aadeb31
scala> Array(1, 2).mkString
res12: String = 12
scala> Array(1, 2).mkString(" ")
res13: String = 1 2
scala>
You can't print an array directly. If you try, it prints the array's type tag and hash code instead of its elements.
You are almost there. Just iterate over the combinations, then over each individual array, and print the elements as below:
Array(1,2,3).combinations(2).foreach(_.foreach(println))
Or just convert each array to a string and print that, as below:
Array(1,2,3).combinations(2).foreach(x=>println(x.mkString(" ")))
I hope this helps.
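If you want brackets around each array, two further options might help. This is only a sketch: the object name ArrayRendering and the method names are made up, while mkString(start, sep, end) and java.util.Arrays.toString are standard library calls.

```scala
object ArrayRendering {
  // The three-argument mkString adds a prefix and suffix around the joined elements.
  def bracketed(xs: Array[Int]): String = xs.mkString("[", ", ", "]")

  // java.util.Arrays.toString renders a primitive array in the same bracketed form.
  def viaJava(xs: Array[Int]): String = java.util.Arrays.toString(xs)

  def main(args: Array[String]): Unit =
    Array(1, 2, 3).combinations(2).foreach(c => println(bracketed(c)))
}
```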

Can someone explain this line of Scala code to me?

Scala syntax has been driving me nuts. Below is a line of Scala from a Spark driver program. I get most of it except the very end.
val ratings = lines.map(x => x.toString().split("\t")(2))
The (2) just floating out there doesn't make sense. I understand intellectually that it's the third item in the RDD, but why is there not a dot or something connecting it to the rest of the statement?
It's Scala's syntax for accessing an Array element.
x.toString().split("\t")
The above returns an Array. Adding the (2) returns the third element in that array. This is syntactic sugar for calling .apply(2) on the array, which gives you the element at the supplied index.
An example:
val numbers = Array("beaver", "aardvark", "warthog")
numbers(0) // "beaver"; same as numbers.apply(0)
numbers(1) // "aardvark"
numbers(2) // "warthog"
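The same sugar applies to any object that defines apply, not just arrays. Here is a small sketch to illustrate; the class Squares, the helper thirdField, and the sample tab-separated line are hypothetical and not taken from the question's data.

```scala
object ApplySugar {
  // Any class with an apply method can be "called" with parentheses.
  class Squares {
    def apply(i: Int): Int = i * i
  }

  // Same shape as the line in the question: split on tabs, then index the result.
  def thirdField(line: String): String = line.split("\t")(2)

  def main(args: Array[String]): Unit = {
    val squares = new Squares
    println(squares(4))                           // same as squares.apply(4)
    println(thirdField("196\t242\t3\t881250949")) // third tab-separated field
  }
}
```

Note that the index applies to the array produced by split for each line, so the expression picks the third field of every line rather than the third item of the RDD.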
Because the string x is split into an array, and (2) is the syntax for accessing an element of that array.
Here is what I observed when experimenting:
val fruits = Array("Apple", "Banana", "Orange");
fruits.map(x => x.toString().split("\t")(0))
Array[String] = Array(Apple, Banana, Orange)
fruits.map(x => x.toString().split("\t"))
Array[Array[String]] = Array(Array(Apple), Array(Banana), Array(Orange))
fruits.map(x => x.toString())
Array[String] = Array(Apple, Banana, Orange)

Converting from Array[String] to Seq[String] in Scala

In the following Scala code I attempt to convert from a String that contains elements separated by "|" to a sequence Seq[String]. However the result is a WrappedArray of characters. How to make this work?
val array = "t1|t2".split("|")
println(array.toSeq)
results in:
WrappedArray(t, 1, |, t, 2)
What I need is:
Seq(t1,t2)
The code below works: split on the pipe character ('|') instead of the pipe string ("|").
split("|") calls the overload that takes a regex string, and in a regex the pipe is a metacharacter (alternation). The pattern "|" therefore matches the empty string between every pair of characters, which gives the incorrect result shown in the question.
scala> "t1|t2".split('|').toSeq
res10: Seq[String] = WrappedArray(t1, t2)
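If the delimiter has to stay a String (for example because it comes from configuration), the pipe can be escaped or quoted so that the regex overload treats it literally. A minimal sketch, with the object and method names invented for illustration:

```scala
import java.util.regex.Pattern

object PipeSplit {
  // Char overload: the pipe is taken literally.
  def byChar(s: String): Seq[String] = s.split('|').toSeq

  // String overload is a regex, so the pipe is escaped with a backslash.
  def byEscapedRegex(s: String): Seq[String] = s.split("\\|").toSeq

  // Pattern.quote builds a regex that matches the given text literally.
  def byQuotedRegex(s: String): Seq[String] = s.split(Pattern.quote("|")).toSeq
}
```

All three return the same two elements for "t1|t2".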

How to save a two-dimensional array into HDFS in spark?

Something like:
val arr : Array[Array[Double]] = new Array(featureSize)
sc.parallelize(arr, 100).saveAsTextFile(args(1))
Then Spark stores the data type in HDFS rather than the array values.
Array in Scala corresponds exactly to a Java array. In particular, it is a mutable type, and its toString method returns the array's type tag and hash code rather than its elements. When you save this RDD as a text file, Spark invokes toString on each element of the RDD, which gives you gibberish. To output the actual elements, you first have to turn each array into a string, for example by applying mkString(",") to each array. Example from the Spark shell:
scala> Array(1,2,3).toString
res11: String = [I@31cba915
scala> Array(1,2,3).mkString(",")
res12: String = 1,2,3
For nested (two-dimensional) arrays:
scala> sc.parallelize(Array( Array(1,2,3), Array(4,5,6), Array(7,8,9) )).collect.mkString("\n")
res15: String =
[I@41ff41b0
[I@5d31aba9
[I@67fd140b
scala> sc.parallelize(Array( Array(1,2,3), Array(4,5,6), Array(7,8,9) ).map(_.mkString(","))).collect.mkString("\n")
res16: String =
1,2,3
4,5,6
7,8,9
So, your code should be:
sc.parallelize(arr.map(_.mkString(",")), 100).saveAsTextFile(args(1))
or
sc.parallelize(arr, 100).map(_.mkString(",")).saveAsTextFile(args(1))
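The row-to-line conversion can be tried without a cluster. The sketch below uses plain Scala on an Array[Array[Double]]; the object MatrixLines and both method names are hypothetical, and fromLines assumes every line holds at least one comma-separated number.

```scala
object MatrixLines {
  // One comma-separated line per row, as each saved line of text would look.
  def toLines(m: Array[Array[Double]]): Array[String] =
    m.map(_.mkString(","))

  // Inverse direction: parse each line back into a row of doubles.
  def fromLines(lines: Array[String]): Array[Array[Double]] =
    lines.map(line => line.split(',').map(_.toDouble))
}
```

The same functions could in principle be passed to map on an RDD, for instance after reading the file back as lines of text, but that part is not shown here.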

Scala: How to print all values in a long vector without ellipsis

I'd like to print all values of a vector which has about 700 elements. By default, a relatively small number (maybe 100 or so) are printed, and then an ellipsis (...). Is there a way to print all the values?
Of course, I could go through the elements one by one, but I'm hoping to avoid that.
EDIT: I am printing stuff via println. Unless I've misunderstood something, changing maxPrintString doesn't seem to affect println output (or toString, since I suppose println must be calling toString).
If you're using Scala's REPL, it prints the value of whatever expression you type in, but if the toString of that value would be unreasonably long, it truncates it and appends an ellipsis (...).
If you want the whole thing, you just need to print it explicitly. Use println.
scala> val list = List.fill(700)('a')
list: List[Char] = List(a, a, /*omitting some for brevity*/, a, ...
scala> println(list)
// it actually prints everything
// or you could print individual elements
scala> list foreach println
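As a quick check that the truncation comes from the REPL's echo and not from toString itself, here is a small sketch. The object FullPrint and the 700-element sample vector are made up to mirror the size in the question.

```scala
object FullPrint {
  // Hypothetical 700-element vector, similar in size to the one in the question.
  val values: Vector[Int] = (0 until 700).toVector

  // mkString gives control over the separator when one element per line is too much.
  def oneLine(v: Vector[Int]): String = v.mkString(", ")

  def main(args: Array[String]): Unit = {
    println(values)          // full toString, no ellipsis
    println(oneLine(values)) // same elements without the Vector(...) wrapper
  }
}
```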