How do I convert an Array[String] to a Set[String]? - scala

I have an array of strings. What's the best way to turn it into an immutable set of strings?
I presume this is a single method call, but I can't find it in the Scala docs.
I'm using Scala 2.8.1.

The method is called toSet, e.g.:
scala> val arr = Array("a", "b", "c")
arr: Array[java.lang.String] = Array(a, b, c)
scala> arr.toSet
res1: scala.collection.immutable.Set[java.lang.String] = Set(a, b, c)
Strictly speaking, the toSet method does not exist on Array itself; it becomes available through an implicit conversion to ArrayOps.
In such cases I'd advise you to look in Predef, where you will normally find a suitable implicit conversion. Here genericArrayOps is the one that applies; genericWrapArray could also be used, but it has lower priority.
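To see the conversion at work, you can invoke it explicitly; the REPL echo below is illustrative, and the exact type names vary between Scala versions:
scala> Predef.genericArrayOps(arr).toSet
res2: scala.collection.immutable.Set[java.lang.String] = Set(a, b, c)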

scala> val a = Array("a", "b", "c")
a: Array[java.lang.String] = Array(a, b, c)
scala> Set(a: _*)
res0: scala.collection.immutable.Set[java.lang.String] = Set(a, b, c)
// OR
scala> a.toSet
res1: scala.collection.immutable.Set[java.lang.String] = Set(a, b, c)
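For context, the a: _* in the first variant is Scala's repeated-parameter (vararg) ascription: it passes the array's elements to Set.apply as individual arguments rather than as a single array. The same ascription works with any vararg method; concat below is just a hypothetical definition for illustration:
scala> def concat(xs: String*) = xs.mkString
concat: (xs: String*)String
scala> concat(a: _*)
res2: String = abc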

Related

Scala: map(f) and map(_.f)

I thought that in Scala map(f) is the same as map(_.f), i.e. map(x => x.f), but it turns out it is not:
scala> val a = List(1,2,3)
val a: List[Int] = List(1, 2, 3)
scala> a.map(toString)
val res7: List[Char] = List(l, i, n)
scala> a.map(_.toString)
val res8: List[String] = List(1, 2, 3)
What happens when a.map(toString) is called? Where did the three characters l, i, and n come from?
map(f) is not the same as map(_.f()). It's the same as map(f(_)). That is, it's going to call f(x), not x.f(), for each x in the list.
So a.map(toString) should be an error because the normal toString method does not take any arguments. My guess is that in your REPL session you've defined your own toString method that takes an argument and that's the one that's being called.
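To make the distinction concrete, here is a minimal sketch; double is just a locally defined method for the example:
scala> def double(x: Int) = x * 2
double: (x: Int)Int
scala> List(1, 2, 3).map(double)     // eta-expanded to x => double(x)
res0: List[Int] = List(2, 4, 6)
scala> List(1, 2, 3).map(_.toString) // calls x.toString on each element
res1: List[String] = List(1, 2, 3)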

How to pad a Scala collection for minimum size?

How to make sure a Seq is of certain minimum length, in Scala?
The below code does what I want (adds empty strings until arr has three entries), but it feels clumsy.
scala> val arr = Seq("a")
arr: Seq[String] = List(a)
scala> arr ++ Seq.fill(3-arr.size)("")
res2: Seq[String] = List(a, "", "")
One way to achieve this would be to merge two sequences: take from the first, but if it runs out, continue from the second. What was such a method called...?
I find this slightly better:
scala> (arr ++ Seq.fill(3)("")).take(3)
res4: Seq[String] = List(a, "", "")
And even better, thanks @thomas-böhm:
scala> arr.toArray.padTo(3,"")
res5: Array[String] = Array(a, "", "")
arr.padTo(3, "")
It was too trivial.
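Note that padTo only ever pads: if the sequence already has at least the requested length, it is returned unchanged, so it is safe to apply unconditionally:
scala> Seq("a", "b", "c", "d").padTo(3, "")
res6: Seq[String] = List(a, b, c, d)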

Define a 2D list and append lists to it in a for loop, Scala

I want to define a 2D list before a for loop and then append 1D lists to it inside the loop, like so:
var 2dEmptyList: listOf<List<String>>
for (element<-elements){
///do some stuff
2dEmptyList.plusAssign(1dlist)
}
The code above does not work, but I can't seem to find a solution for something this simple!
scala> val elements = List("a", "b", "c")
elements: List[String] = List(a, b, c)
scala> val twoDimensionalList: List[List[String]] = List.empty[List[String]]
twoDimensionalList: List[List[String]] = List()
scala> val res = for (element <- elements) yield twoDimensionalList ::: List(element)
res: List[List[java.io.Serializable]] = List(List(a), List(b), List(c))
Better still:
scala> twoDimensionalList ::: elements.map(List(_))
res8: List[List[String]] = List(List(a), List(b), List(c))
If you want 2dEmptyList to be mutable, consider using scala.collection.mutable.ListBuffer:
scala> val ll = scala.collection.mutable.ListBuffer.empty[List[String]]
ll: scala.collection.mutable.ListBuffer[List[String]] = ListBuffer()
scala> ll += List("Hello")
res7: ll.type = ListBuffer(List(Hello))
scala> ll += List("How", "are", "you?")
res8: ll.type = ListBuffer(List(Hello), List(How, are, you?))
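Once you are done appending, the buffer converts back to an immutable list with toList:
scala> ll.toList
res9: List[List[String]] = List(List(Hello), List(How, are, you?))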

How to flatten a collection with Spark/Scala?

In Scala I can flatten a collection using:
val array = Array(List("1,2,3").iterator, List("1,4,5").iterator)
//> array: Array[Iterator[String]] = Array(non-empty iterator, non-empty iterator)
array.toList.flatten //> res0: List[String] = List(1,2,3, 1,4,5)
But how can I do something similar in Spark?
Reading the API doc http://spark.apache.org/docs/0.7.3/api/core/index.html#spark.RDD, there does not seem to be a method that provides this functionality.
Use flatMap with identity from Predef; this is more readable than x => x, e.g.
myRdd.flatMap(identity)
Try flatMap with an identity map function (y => y):
scala> val x = sc.parallelize(List(List("a"), List("b"), List("c", "d")))
x: org.apache.spark.rdd.RDD[List[String]] = ParallelCollectionRDD[1] at parallelize at <console>:12
scala> x.collect()
res0: Array[List[String]] = Array(List(a), List(b), List(c, d))
scala> x.flatMap(y => y)
res3: org.apache.spark.rdd.RDD[String] = FlatMappedRDD[3] at flatMap at <console>:15
scala> x.flatMap(y => y).collect()
res4: Array[String] = Array(a, b, c, d)
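For completeness, the same approach applied to the data from the question (this assumes a live SparkContext named sc; the exact REPL echo is illustrative):
scala> sc.parallelize(Seq(List("1,2,3"), List("1,4,5"))).flatMap(identity).collect()
res5: Array[String] = Array(1,2,3, 1,4,5)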