In Spark, how do I transform my RDD into a list of differences between RDD items? - scala

Suppose I have an RDD of integers that looks like this:
10, 20, 30, 40, 50, 60, 70, 80 ...
(i.e. there is a stream of different integers)
and I want to transform the RDD so it looks like this:
15, 25, 35, 45, 55, 65, 75, 85...
(i.e. each item in the new RDD is derived from two adjacent items of the RDD above.)
My question is: In Spark, how do I transform my RDD into a list of differences between RDD items?

You can use the RDD's sliding function from MLlib, as shown below:
import org.apache.spark.mllib.rdd.RDDFunctions._
val rdd = sc.parallelize(List(10, 20, 30, 40, 50, 60, 70, 80))
rdd.sliding(2).map(_.sum / 2).collect
// output
res14: Array[Int] = Array(15, 25, 35, 45, 55, 65, 75)
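If what you actually want is the difference between adjacent items rather than their midpoint, the same sliding window can be mapped differently. A minimal sketch (assuming the same SparkContext sc as above):
import org.apache.spark.mllib.rdd.RDDFunctions._

val rdd = sc.parallelize(List(10, 20, 30, 40, 50, 60, 70, 80))
// each window is an Array of two adjacent items; subtract them
rdd.sliding(2).map { case Array(a, b) => b - a }.collect
// Array(10, 10, 10, 10, 10, 10, 10)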

Related

Pyspark RDD filter behaviour when filter condition variable is modified

When the following code is run:
A = ss.sparkContext.parallelize(range(1, 100))
t = 50
B = A.filter(lambda x: x < t)
print(B.count())
t = 10
C = B.filter(lambda x: x > t)
print(C.count())
The output is:
49
0
Which is incorrect, as there are 39 values between 10 and 49. It seems like changing t from 50 to 10 affected the first filter as well: it got re-evaluated, so when both filters are applied consecutively the condition effectively becomes x < 10 (giving 1, 2, 3, 4, 5, 6, 7, 8, 9) followed by x > 10, which results in an empty RDD.
But when I add debug prints to the code, the result is not what I expect, and I am looking for an explanation:
A = ss.sparkContext.parallelize(range(1, 100))
t = 50
B = A.filter(lambda x: x < t)
print(B.collect())
t = 10
print(B.collect())
print(B.count())
C = B.filter(lambda x: x > t)
print(C.collect())
print(C.count())
The output is:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
9
[]
0
How come the count is 9 after t = 10, but print(B.collect()) shows the expected RDD with values from 1 to 49? If triggering collect after changing t re-did the filter, then shouldn't collect() show values from 1 to 9?
I am new to PySpark; I suspect this has to do with Spark's lazy operations and caching. Can someone explain what is going on behind the scenes?
Thanks!
Your assumption is correct: the observed behaviour is related to Spark's lazy evaluation of transformations.
When B.count() is executed, Spark simply applies the filter x < t with t = 50 and prints the expected value of 49.
When C.count() is executed, Spark sees two filters in the execution plan of C, namely x < t and x > t. At this point in time t has been set to 10, and no element of the RDD satisfies both conditions (being smaller and larger than 10 at the same time). Spark ignores the fact that the first filter has already been evaluated: when a Spark action is called, all transformations in the history of the current RDD are executed (unless some intermediate result has been cached, see below).
A way to examine this behaviour in a bit more detail is to switch to Scala and print toDebugString for both RDDs.[1]
println(B.toDebugString)
prints
(4) MapPartitionsRDD[1] at filter at SparkStarter.scala:23 []
| ParallelCollectionRDD[0] at parallelize at SparkStarter.scala:19 []
while
println(C.toDebugString)
prints
(4) MapPartitionsRDD[2] at filter at SparkStarter.scala:28 []
| MapPartitionsRDD[1] at filter at SparkStarter.scala:23 []
| ParallelCollectionRDD[0] at parallelize at SparkStarter.scala:19 []
Here we can see that for rdd B one filter is applied and for rdd C two filters are applied.
How to fix the issue?
If the result of the first filter is cached, the expected result is printed. When t is then changed and the second filter is applied, C.count() only triggers the second filter, based on the cached result of B:
A = ss.sparkContext.parallelize(range(1, 100))
t = 50
B = A.filter(lambda x: x < t).cache()
print(B.count())
t = 10
C = B.filter(lambda x: x > t)
print(C.count())
prints the expected result:
49
39
[1] Unfortunately this works only in the Scala version of Spark. PySpark seems to "condense" the output of toDebugString (version 3.1.1).
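For what it's worth, the behaviour is not specific to PySpark. A Scala filter closure also captures the variable t (not its value) and is only shipped to the executors when an action runs, so the following sketch (my own, untested, assuming a SparkContext named sc) should show the same surprise:
val A = sc.parallelize(1 to 99)
var t = 50
val B = A.filter(x => x < t)   // the closure captures the variable t, not the value 50
println(B.count())             // 49, evaluated while t == 50

t = 10
val C = B.filter(x => x > t)
println(C.count())             // 0: both filters are re-evaluated with t == 10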

Extra bytes being added to serialization with BooPickle and RocksDb

So I'm using BooPickle to serialize Scala classes before writing them to RocksDB. To serialize a class,
case class Key(a: Long, b: Int) {
  def toStringEncoding: String = s"${a}-${b}"
}
I have this implicit class
implicit class KeySerializer(key: Key) {
  def serialize: Array[Byte] =
    Pickle.intoBytes(key.toStringEncoding).array
}
The method toStringEncoding is necessary because BooPickle wasn't serializing the case class in a way that worked well with RocksDB's requirements on key ordering. I then write a bunch of key-value pairs to several SST files and ingest them into RocksDB. However, when I go to look up the keys from the db, they're not found.
If I iterate over all of the keys in the db, I find the keys are successfully written; however, extra bytes appear in their byte representation in the db. For example, if key.serialize outputs something like this:
Array[Byte] = Array( 25, 49, 54, 48, 53, 55, 52, 52, 48, 48, 48, 45, 48, 45, 49, 54, 48, 53, 55, 52, 52, 48, 51, 48, 45, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)
What I actually find in the db is something like this:
Array[Byte] = Array( 25, 49, 54, 48, 53, 55, 52, 52, 48, 48, 48, 45, 48, 45, 49, 54, 48, 53, 55, 52, 52, 48, 51, 48, 45, 48, 51, 101, 52, 97, 49, 100, 102, 48, 50, 53, 5, 8, ...)
Extra non-zero bytes replace the zero bytes at the end of the byte array. In addition, the sizes of the byte arrays are different: when I call the serialize method the byte array has size 512, but when I retrieve the key from the db its size is 4112. Does anyone know what might be causing this?
I have no experience with RocksDB or BooPickle, but I would guess that the problem is in calling ByteBuffer.array: it returns the whole array backing the byte buffer rather than just the relevant part. See, for example, Gets byte array from a ByteBuffer in java for how to properly extract the data from a ByteBuffer.
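To illustrate the difference, here is a small standalone sketch (not tied to BooPickle or RocksDB):
import java.nio.ByteBuffer

val buf = ByteBuffer.allocate(512)
buf.put("10-5".getBytes("UTF-8"))
buf.flip()

buf.array().length   // 512: the whole backing array, including unused capacity

val out = new Array[Byte](buf.remaining)
buf.get(out)
out.length           // 4: only the bytes that were actually written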
The BooPickle docs suggest the following for getting BooPickled data as a byte array:
val data: Array[Byte] = Array.ofDim[Byte](buf.remaining)
buf.get(data)
So in your case it would be something like:
def serialize: Array[Byte] = {
  val buf = Pickle.intoBytes(key.toStringEncoding)
  val arr = Array.ofDim[Byte](buf.remaining)
  buf.get(arr)
  arr
}
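As a quick sanity check of the round trip (a sketch based on the BooPickle README; the sample key value is only illustrative):
import java.nio.ByteBuffer
import boopickle.Default._

val key = Key(1605744000L, 0)
val bytes = key.serialize                                      // trimmed Array[Byte]
val decoded = Unpickle[String].fromBytes(ByteBuffer.wrap(bytes))
assert(decoded == key.toStringEncoding)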

Nested Looping over dynamic list of list in scala [duplicate]

Given the following list:
List(List(1,2,3), List(4,5))
I would like to generate all the possible combinations. Using yield, it can be done as follows:
scala> for (x <- l.head; y <- l.last) yield (x,y)
res17: List[(Int, Int)] = List((1,4), (1,5), (2,4), (2,5), (3,4), (3,5))
But the problem I have is that the List[List[Int]] is not fixed; it can grow and shrink in size, so I never know how many for loops I will need in advance. What I would like is to be able to pass that list into a function which will dynamically generate the combinations regardless of the number of lists I have, so:
def generator(x: List[List[Int]]): List[List[Int]]
Is there a built-in library function that can do this? If not, how do I go about doing it? Any pointers and hints would be great.
UPDATE:
The answer by @DNA blows the heap with the following (not so big) nested List structure:
List(
List(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300),
List(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300),
List(0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300),
List(0, 50, 100, 150, 200, 250, 300),
List(0, 100, 200, 300),
List(0, 200),
List(0)
)
Calling the generator2 function as follows:
generator2(
List(
List(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300),
List(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300),
List(0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300),
List(0, 50, 100, 150, 200, 250, 300),
List(0, 100, 200, 300),
List(0, 200),
List(0)
)
)
Is there a way to generate the Cartesian product without blowing the heap? This is the error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at scala.LowPriorityImplicits.wrapRefArray(LowPriorityImplicits.scala:73)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:82)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at recfun.Main$.recfun$Main$$generator$1(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at recfun.Main$$anonfun$recfun$Main$$generator$1$1.apply(Main.scala:83)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
Here's a recursive solution:
def generator(x: List[List[Int]]): List[List[Int]] = x match {
  case Nil => List(Nil)
  case h :: _ => h.flatMap(i => generator(x.tail).map(i :: _))
}
which produces:
val a = List(List(1, 2, 3), List(4, 5))
val b = List(List(1, 2, 3), List(4, 5), List(6, 7))
generator(a) //> List(List(1, 4), List(1, 5), List(2, 4),
//| List(2, 5), List(3, 4), List(3, 5))
generator(b) //> List(List(1, 4, 6), List(1, 4, 7), List(1, 5, 6),
//| List(1, 5, 7), List(2, 4, 6), List(2, 4, 7),
//| List(2, 5, 6), List(2, 5, 7), List(3, 4, 6),
//| List(3, 4, 7), List(3, 5, 6), List(3, 5, 7))
Update: the second case can also be written as a for comprehension, which may be a little clearer:
def generator2(x: List[List[Int]]): List[List[Int]] = x match {
  case Nil => List(Nil)
  case h :: t => for (j <- generator2(t); i <- h) yield i :: j
}
Update 2: for larger datasets, if you run out of memory, you can use Streams instead (if it makes sense to process the results incrementally). For example:
def generator(x: Stream[Stream[Int]]): Stream[Stream[Int]] =
  if (x.isEmpty) Stream(Stream.empty)
  else x.head.flatMap(i => generator(x.tail).map(i #:: _))

// NB pass in the data as a Stream of Streams, not a List of Lists
generator(input).take(3).foreach(x => println(x.toList))
>List(0, 0, 0, 0, 0, 0, 0)
>List(0, 0, 0, 0, 0, 200, 0)
>List(0, 0, 0, 0, 100, 0, 0)
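On Scala 2.13 and later, Stream is deprecated in favour of LazyList; the same idea translates directly (a sketch, assuming 2.13+ and the same input value):
def generatorLazy(x: LazyList[LazyList[Int]]): LazyList[LazyList[Int]] =
  if (x.isEmpty) LazyList(LazyList.empty)
  else x.head.flatMap(i => generatorLazy(x.tail).map(i #:: _))

// convert the List[List[Int]] input to a LazyList of LazyLists before passing it in
generatorLazy(input.to(LazyList).map(_.to(LazyList))).take(3).foreach(x => println(x.toList))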
Feels like your problem can be described in terms of recursion:
If you have n lists of ints, say list1 of size m, plus list2, ..., listn:
generate the combinations for list2 to listn (so n-1 lists);
for each such combination, generate m new ones, one for each value of list1.
The base case is a list containing a single list of ints: you just split its elements into singleton lists.
So with List(List(1,2), List(3), List(4, 5)),
the result of the recursive call is List(List(3,4), List(3,5)), and for each of those you add 2 combinations: List(1,3,4), List(2,3,4), List(1,3,5), List(2,3,5). A sketch of this follows below.
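A minimal sketch of that description (the function name is mine, not from the answer above):
def combos(lists: List[List[Int]]): List[List[Int]] = lists match {
  case Nil           => Nil
  case single :: Nil => single.map(List(_))          // base case: split into singleton lists
  case head :: rest  =>
    val tailCombos = combos(rest)                     // combinations of list2 ... listn
    for (c <- tailCombos; v <- head) yield v :: c     // m new combinations per tail combination
}

combos(List(List(1, 2), List(3), List(4, 5)))
// List(List(1, 3, 4), List(2, 3, 4), List(1, 3, 5), List(2, 3, 5))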
Ezekiel has exactly what I was looking for. This is just a minor tweak of it to make it generic.
def generateCombinations[T](x: List[List[T]]): List[List[T]] = {
  x match {
    case Nil => List(Nil)
    case h :: _ => h.flatMap(i => generateCombinations(x.tail).map(i :: _))
  }
}
Here is another solution based on Ezekiel's, more verbose but tail-recursive (stack-safe). Note that it needs the @tailrec annotation from scala.annotation:
import scala.annotation.tailrec

def generateCombinations[A](in: List[List[A]]): List[List[A]] =
  generate(in, List.empty)

@tailrec
private def generate[A](in: List[List[A]], acc: List[List[A]]): List[List[A]] = in match {
  case Nil => acc
  case head :: tail => generate(tail, generateAcc(acc, head))
}

private def generateAcc[A](oldAcc: List[List[A]], as: List[A]): List[List[A]] = {
  oldAcc match {
    case Nil => as.map(List(_))
    case nonEmptyAcc =>
      for {
        a <- as
        xs <- nonEmptyAcc
      } yield a :: xs
  }
}
I realize this is old, but it seems like no other answer provided the non-recursive solution with fold.
def generator[A](xs: List[List[A]]): List[List[A]] =
  xs.foldRight(List(List.empty[A])) { (next, combinations) =>
    for (a <- next; as <- combinations) yield a +: as
  }
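A quick usage example of the fold version (output order as the foldRight above should produce it):
generator(List(List(1, 2, 3), List(4, 5)))
// List(List(1, 4), List(1, 5), List(2, 4), List(2, 5), List(3, 4), List(3, 5))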

Set operation to divide lists by one another in Scala

Right now I have 2 lists in Scala:
val one = List(50, 10, 17, 8, 16)
val two = List(582, 180, 174, 159, 158)
These lists are going to be of the same length, and right now I'm looking to divide each element of the first list by a corresponding element in the second. In other words, I want a list that consists of:
List(50/582, 10/180, etc...)
Is there a set operation that accomplishes this without explicit looping?
Thank you!
You can use the zip function.
val one = List(50, 10, 17, 8, 16)
val two = List(582, 180, 174, 159, 158)
one.zip(two).map {
  case (a, b) => a.toDouble / b.toDouble
}
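If you are on Scala 2.13 or later, lazyZip avoids building the intermediate list of pairs; a small sketch:
val ratios: List[Double] = (one lazyZip two).map((a, b) => a.toDouble / b)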
