Spark Scala def with yield - scala

In SO question 33655920 I came across the PySpark snippet below, which works fine:
rdd = sc.parallelize([1, 2, 3, 4], 2)
def f(iterator): yield sum(iterator)
rdd.mapPartitions(f).collect()
In Scala, I cannot seem to write the def in the same shorthand way. What is the equivalent? I have searched and tried, but to no avail.
Thanks in advance.

In Python, yield sum(iterator) turns f into a generator that yields a single value: the sum of the elements of the iterator. A similar way of doing this in Scala would be:
val rdd = sc.parallelize(Array(1, 2, 3, 4), 2)
rdd.mapPartitions(it => Iterator(it.sum)).collect()

If you want to sum values in the partition you can write something like
val rdd = sc.parallelize(1 to 4, 2)
def f(i: Iterator[Int]) = Iterator(i.sum)
rdd.mapPartitions(f).collect()
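The function passed to mapPartitions can be tried without a Spark cluster. The sketch below is only an emulation: the name partitions and the split into (1, 2) and (3, 4) are assumptions standing in for what parallelize with two slices would typically produce, and plain Scala collections replace the RDD.

```scala
// Same shape as the function handed to mapPartitions:
// consume one partition's iterator, return an iterator with a single element.
def f(i: Iterator[Int]): Iterator[Int] = Iterator(i.sum)

// Assumed partitioning of 1 to 4 into two slices (hypothetical stand-in for the RDD).
val partitions = Seq(Seq(1, 2), Seq(3, 4))

// Apply f to each partition and concatenate the results, as collect() would.
val sums = partitions.flatMap(p => f(p.iterator))

println(sums)
```

Iterator(x) plays the role of Python's single yield here: both produce an iterator with exactly one element per partition.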

Related

Why copying a Scala array using keyword yield gives me a vector

I tried to copy a Scala array using the yield keyword, but I got a Vector in the end. Why, and how can I get a copied array using yield?
scala> val s=Array(1,2,3,4,5); val copied_s=for (i<-0 until s.size) yield s(i)
The console returns
s: Array[Int] = Array(1, 2, 3, 4, 5)
copied_s: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3, 4, 5)
Use clone instead:
val c = s.clone
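0 until ... creates a Range, and the for comprehension iterates over that Range, not over the Array. The result type is therefore derived from the Range (an IndexedSeq, for which Vector is the default implementation), not from the Array.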
scala> 0 until 4
res4: scala.collection.immutable.Range = Range(0, 1, 2, 3)
Wrapping the whole comprehension in parentheses and calling toArray works too:
(for (i<-0 until s.size) yield s(i)).toArray
but clone is much shorter.
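If the goal is specifically to use yield, one option is to iterate over the array itself rather than over its indices, so that the source collection is the Array. A minimal sketch (the name copied is illustrative):

```scala
val s = Array(1, 2, 3, 4, 5)

// The generator is the Array itself, so the comprehension builds a new Array.
val copied: Array[Int] = for (x <- s) yield x

// The copy is independent of the original.
copied(0) = 99
println(s.mkString(", "))
println(copied.mkString(", "))
```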

How to efficiently delete all elements from ListBuffer in Scala?

I have a ListBuffer with a thousand elements. After the program has finished its calculations I want to fill it with new data. Is there a way to empty it, like free() in C? Or is it better to assign null to my ListBuffer and let the garbage collector do all the work?
The method clear does just that.
scala> val xs = scala.collection.mutable.ListBuffer(1,2,3,4,5)
xs: scala.collection.mutable.ListBuffer[Int] = ListBuffer(1, 2, 3, 4, 5)
scala> xs.clear()
scala> xs
res2: scala.collection.mutable.ListBuffer[Int] = ListBuffer()
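A short sketch of the refill step the question asks about. clear() empties the buffer in place, so every reference to it sees the change and the same buffer can be reused immediately; the names results and alias are illustrative only.

```scala
import scala.collection.mutable.ListBuffer

val results = ListBuffer(1, 2, 3, 4, 5)
val alias = results            // a second reference to the same buffer

results.clear()                // empties the buffer in place
results ++= Seq(10, 20, 30)    // refill with new data

println(alias)                 // the alias sees the new contents too
```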

How can I get a sum of arrays of tuples in scala

I have a simple array of tuples
val arr = Array((1,2), (3,4),(5,6),(7,8),(9,10))
I wish to get the tuple (1+3+5+7+9, 2+4+6+8+10) as the answer.
What is the best way to get the sum as a tuple, similar to summing a regular array? I tried
val res = arr.foldLeft(0,0)(_ + _)
This does not work.
Sorry about not writing the context. I was using it in scalding with algebird. Algebird allows sums of tuples and I assumed this would work. That was my mistake.
There is no such thing as Tuple addition, so that can't work. You would have to operate on each component of the Tuple:
val res = arr.foldLeft((0, 0)) { case (sum, next) => (sum._1 + next._1, sum._2 + next._2) }
res: (Int, Int) = (25,30)
This should work nicely:
arr.foldLeft((0,0)){ case ((a0,b0),(a1,b1)) => (a0+a1,b0+b1) }
Addition isn't defined for tuples.
Use Scalaz, which provides a semigroup instance for tuples, allowing you to use the append operator |+|:
import scalaz._
import Scalaz._
arr.fold((0,0))(_ |+| _)
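To illustrate the idea without the Scalaz dependency, here is a minimal standard-library sketch. The implicit class IntPairOps and its |+| method are hypothetical helpers written for this example; they are not part of Scalaz or the standard library, and they cover only (Int, Int).

```scala
// Hypothetical helper: component-wise addition for pairs of Ints.
implicit class IntPairOps(p: (Int, Int)) {
  def |+|(q: (Int, Int)): (Int, Int) = (p._1 + q._1, p._2 + q._2)
}

val arr = Array((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))

// (0, 0) is the identity element for this operation.
val total = arr.foldLeft((0, 0))(_ |+| _)

println(total)
```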
Yet another alternative
val (a, b) = arr.unzip
//> a : Array[Int] = Array(1, 3, 5, 7, 9)
//| b : Array[Int] = Array(2, 4, 6, 8, 10)
(a.sum, b.sum)
//> res0: (Int, Int) = (25,30)

Index with Many Indices

Is there a quick Scala idiom to retrieve multiple elements of a traversable using indices?
I am looking for something like
val L = (1 to 4).toList
L(List(1,2)) //doesn't work
I have been using map so far, but wondering if there was a more "scala" way
List(1,2) map {L(_)}
Thanks in advance
Since a List is also a function from index to element, you can write just
List(1,2) map L
Although, if you're going to be looking things up by index, you should probably use an IndexedSeq like Vector instead of a List.
You could add an implicit class that adds the functionality:
implicit class RichIndexedSeq[T](seq: IndexedSeq[T]) {
  def apply(i0: Int, i1: Int, is: Int*): Seq[T] = (i0 +: i1 +: is) map seq
}
You can then use the sequence's apply method with one index or multiple indices:
scala> val data = Vector(1,2,3,4,5)
data: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)
scala> data(0)
res0: Int = 1
scala> data(0,2,4)
res1: Seq[Int] = ArrayBuffer(1, 3, 5)
You can do it with a for comprehension but it's no clearer than the code you have using map.
scala> val indices = List(1,2)
indices: List[Int] = List(1, 2)
scala> for (index <- indices) yield L(index)
res0: List[Int] = List(2, 3)
I think the most readable would be to implement your own function takeIndices(indices: List[Int]) that takes a list of indices and returns the values of a given List at those indices. e.g.
L.takeIndices(List(1,2))
List[Int] = List(2,3)
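One possible way to provide such a method is an implicit class. This is only a sketch: takeIndices does not exist in the standard library, and the wrapper name TakeIndicesOps is made up for the example.

```scala
// Hypothetical extension: adds takeIndices to any List.
implicit class TakeIndicesOps[A](xs: List[A]) {
  // A List is a function Int => A, so it can be passed to map directly.
  // Throws IndexOutOfBoundsException for an index outside the list.
  def takeIndices(indices: List[Int]): List[A] = indices.map(xs)
}

val L = (1 to 4).toList
val picked = L.takeIndices(List(1, 2))

println(picked)
```

Each lookup on a List is linear in the index, so for many lookups it may be worth converting to a Vector first.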

Scala: Yielding from one type of collection to another

Concerning the yield keyword in Scala and the following example:
val values = Set(1, 2, 3)
val results = for {v <- values} yield (v * 2)
Can anyone explain how Scala knows which type of collection to yield into? I know it is based on values, but how would I go about writing code that replicates yield?
Is there any way for me to change the type of the collection to yield into? In the example I want results to be of type List instead of Set.
Failing this, what is the best way to convert from one collection to another? I know about : _*, but as a Set is not a Seq this does not work. The best I could find thus far is val listResults = List() ++ results.
P.S. I know the example does not follow the recommended functional way (which would be to use map), but it is just an example.
For comprehensions are translated by the compiler into map/flatMap/filter calls using this scheme.
This excellent answer by Daniel answers your first question.
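As a rough illustration of that translation, the sketch below pairs two comprehensions with hand-written equivalents. The value names are made up for the example, and the desugared forms are written by hand to mirror what the compiler produces rather than copied from compiler output.

```scala
val values = Set(1, 2, 3)

// A single generator with yield corresponds to map on the source collection,
// which is why the result is again a Set.
val viaFor = for (v <- values) yield v * 2
val viaMap = values.map(v => v * 2)

// Several generators plus a guard correspond to flatMap, withFilter and map.
val pairsFor = for { a <- List(1, 2); b <- List(10, 20) if b > a } yield a + b
val pairsDesugared =
  List(1, 2).flatMap(a => List(10, 20).withFilter(b => b > a).map(b => a + b))

println(viaFor == viaMap)
println(pairsFor == pairsDesugared)
```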
To change the type of the result collection, you can use collection.breakOut (also explained in the post I linked above):
scala> val xs = Set(1, 2, 3)
xs: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
scala> val ys: List[Int] = (for(x <- xs) yield 2 * x)(collection.breakOut)
ys: List[Int] = List(2, 4, 6)
You can convert a Set to a List using one of following ways:
scala> List.empty[Int] ++ xs
res0: List[Int] = List(1, 2, 3)
scala> xs.toList
res1: List[Int] = List(1, 2, 3)
Recommended read: The Architecture of Scala Collections
If you use map/flatMap/filter instead of for comprehensions, you can use scala.collection.breakOut to create a different type of collection:
scala> val result:List[Int] = values.map(2 * _)(scala.collection.breakOut)
result: List[Int] = List(2, 4, 6)
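breakOut was removed in Scala 2.13, so on newer versions a different route is needed. One sketch that should work on old and new versions alike is to map over an iterator and build the List at the end, which also avoids the intermediate Set; on 2.13 and later, values.view.map(_ * 2).to(List) is another option.

```scala
val values = Set(1, 2, 3)

// Map lazily over the elements, then build the List directly.
// No intermediate Set is created, so duplicate results would be kept.
val result: List[Int] = values.iterator.map(_ * 2).toList

println(result)
```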
If you wanted to build your own collection classes (which is the closest thing to "replicating yield" that makes any sense to me), you should have a look at this tutorial.
Try this:
val values = Set(1, 2, 3)
val results = (for {v <- values} yield v * 2).toList