Merge Option sequences in Scala

I want to merge val A = Option(Seq(1,2)) and val B = Option(Seq(3,4)) to yield a new option sequence:
val C = Option(Seq(1,2,3,4))
This
val C = Option(A.getOrElse(Nil) ++ B.getOrElse(Nil))
seems faster and more idiomatic than
val C = Option(A.toList.flatten ++ B.toList.flatten)
But is there a better way? And am I right that getOrElse is faster and lighter than toList.flatten?

What about a neat for comprehension:
val Empty = Some(Nil)
val C = for {
a <- A orElse Empty
b <- B orElse Empty
} yield a ++ b
This creates fewer intermediate options.
Or, you could just do a somewhat cumbersome pattern matching:
(A, B) match {
case (None, None) => Some(Nil)
case (None, sb@Some(b)) => sb
case (sa@Some(a), None) => sa
case (Some(a), Some(b)) => Some(a ++ b)
}
I think this at least creates fewer intermediate collections than the double flatten.
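The variants above differ mainly in what they return when one or both inputs are None. A minimal sketch of the two behaviours, assuming hypothetical helper names mergeKeepSome and mergeOrNone (neither appears in the question):

```scala
object MergeOptions {
  // Hypothetical helper: treats a missing sequence as empty and always returns Some,
  // like Option(A.getOrElse(Nil) ++ B.getOrElse(Nil)) in the question.
  def mergeKeepSome[T](a: Option[Seq[T]], b: Option[Seq[T]]): Option[Seq[T]] =
    Some(a.getOrElse(Nil) ++ b.getOrElse(Nil))

  // Hypothetical variant: returns None only when both inputs are None.
  def mergeOrNone[T](a: Option[Seq[T]], b: Option[Seq[T]]): Option[Seq[T]] =
    (a, b) match {
      case (None, None) => None
      case _            => Some(a.getOrElse(Nil) ++ b.getOrElse(Nil))
    }
}
```

Which of the two is preferable depends on whether callers need to distinguish "nothing was supplied" from "an empty sequence was supplied".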

Your first case:
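// In this case getOrElse is not needed, as both options are clearly not `None`.
// (Note that `get` throws a NoSuchElementException on `None`, so this only works for options known to be defined.)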
// So, you can replace the following:
val C = Option(A.getOrElse(Nil) ++ B.getOrElse(Nil))
// By this:
val C = Option(A.get ++ B.get) // A simple concatenation of two sequences.
C: Option[Seq[Int]] = Some(List(1, 2, 3, 4))
Your second case/option is wrong for multiple reasons.
val C = Option(A.toList.flatten ++ B.toList.flatten)
C: Option[List[Int]] = Some(List(1, 2, 3, 4))
It returns the incorrect type Option[List[Int]] instead of Option[Seq[Int]]
It needlessly invokes toList on A & B. You could simply add the options and invoke flatten on them.
It is not DRY and redundantly calls flatten on both A.toList & B.toList whereas it could call flatten on (A ++ B)
Instead of this, you could do this more efficiently:
val E = Option((A ++ B).flatten.toSeq)
E: Option[Seq[Int]] = Some(List(1, 2, 3, 4))
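As a hedged illustration of the difference between get and getOrElse once one of the options is empty, the sketch below wraps the get-based version in scala.util.Try so the failure can be observed. The names a, missing, safe and unsafe are illustrative only:

```scala
import scala.util.Try

object GetVersusGetOrElse {
  val a: Option[Seq[Int]] = Some(Seq(1, 2))
  val missing: Option[Seq[Int]] = None

  // getOrElse falls back to an empty list, so this always succeeds.
  val safe: Seq[Int] = a.getOrElse(Nil) ++ missing.getOrElse(Nil)

  // get throws NoSuchElementException on None; Try captures that as a Failure.
  val unsafe: Try[Seq[Int]] = Try(a.get ++ missing.get)
}
```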

Using foldLeft
Seq(Some(List(1, 2)), None).foldLeft(List.empty[Int])(_ ++ _.getOrElse(List.empty[Int]))
result: List[Int] = List(1, 2)
Using flatten twice
Seq(Some(Seq(1, 2, 3)), Some(Seq(4, 5, 6)), None).flatten.flatten
result: Seq[Int] = List(1, 2, 3, 4, 5, 6)
Scala REPL
scala> val a = Some(Seq(1, 2, 3))
a: Some[Seq[Int]] = Some(List(1, 2, 3))
scala> val b = Some(Seq(4, 5, 6))
b: Some[Seq[Int]] = Some(List(4, 5, 6))
scala> val c = None
c: None.type = None
scala> val d = Seq(a, b, c).flatten.flatten
d: Seq[Int] = List(1, 2, 3, 4, 5, 6)
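The double flatten generalises to any number of optional sequences. A small sketch, assuming a hypothetical helper named mergeAll that returns a plain Seq rather than an Option:

```scala
object MergeAll {
  // Hypothetical helper: the first flatten drops the None entries,
  // the second concatenates the remaining sequences.
  def mergeAll[T](xs: Seq[Option[Seq[T]]]): Seq[T] =
    xs.flatten.flatten
}
```

If an Option result is required, the returned sequence can be wrapped in Some afterwards.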

Related

Merge two collections by interleaving values

How can I merge two lists / Seqs so it takes 1 element from list 1, then 1 element from list 2, and so on, instead of just appending list 2 at the end of list 1?
E.g
[1,2] + [3,4] = [1,3,2,4]
and not [1,2,3,4]
Any ideas? Most concat methods I've looked at seem to do the latter and not the former.
Another way:
List(List(1,2), List(3,4)).transpose.flatten
So maybe your collections aren't always the same size. Using zip in that situation would create data loss.
def interleave[A](a :Seq[A], b :Seq[A]) :Seq[A] =
if (a.isEmpty) b else if (b.isEmpty) a
else a.head +: b.head +: interleave(a.tail, b.tail)
interleave(List(1, 2, 17, 27), Vector(3, 4))
//res0: Seq[Int] = List(1, 3, 2, 4, 17, 27)
You can do:
val l1 = List(1, 2)
val l2 = List(3, 4)
l1.zip(l2).flatMap { case (a, b) => List(a, b) }
Try
List(1,2)
.zip(List(3,4))
.flatMap(v => List(v._1, v._2))
which outputs
res0: List[Int] = List(1, 3, 2, 4)
Also consider the following implicit class
implicit class ListIntercalate[T](lhs: List[T]) {
def intercalate(rhs: List[T]): List[T] = lhs match {
case head :: tail => head :: (rhs.intercalate(tail))
case _ => rhs
}
}
List(1,2) intercalate List(3,4)
List(1,2,5,6,6,7,8,0) intercalate List(3,4)
which outputs
res2: List[Int] = List(1, 3, 2, 4)
res3: List[Int] = List(1, 3, 2, 4, 5, 6, 6, 7, 8, 0)
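The recursive answers above are not tail-recursive, which may matter for long lists. Here is a sketch of a tail-recursive variant that keeps the leftover elements of the longer list. It is one possible approach, not the method of any answer above, and the inner helper name loop is arbitrary:

```scala
object Interleave {
  def interleave[A](a: List[A], b: List[A]): List[A] = {
    // Takes the head of the first list, then swaps the two lists so the
    // other one contributes the next element. acc is built in reverse.
    @annotation.tailrec
    def loop(x: List[A], y: List[A], acc: List[A]): List[A] = x match {
      case Nil    => acc.reverse ::: y   // x is exhausted: append whatever is left of y
      case h :: t => loop(y, t, h :: acc)
    }
    loop(a, b, Nil)
  }
}
```

For example, Interleave.interleave(List(1, 2, 17, 27), List(3, 4)) keeps 17 and 27, which the zip-based versions would drop.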

Error: type mismatch flatMap

I am new to Spark programming and Scala, and I am not able to understand the difference between map and flatMap.
I tried the code below, expecting both calls to work, but got an error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As I understand it, flatMap turns an RDD into a collection for a String/Int RDD.
I thought that both calls should work without error in this case. Please let me know where I am making the mistake.
Thanks
You need to look at how the signatures of these methods are defined:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]]] and you want to produce an RDD[List[Int]], you can flatMap it using identity.
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: List[String] = List(11, 21, 41, 51) // concat
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
The function passed to map returns a U, whereas the function passed to flatMap must return a TraversableOnce[U] (that is, a collection).
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)
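One way to summarise the answers above: flatMap behaves like map followed by flatten, so the function must return something that can be flattened. A minimal sketch on a plain List (no Spark required), with illustrative value names:

```scala
object MapVsFlatMap {
  val b = List("1", "2", "4", "5")

  // Wrapping the tuple in a List gives flatMap a collection to flatten.
  val viaFlatMap: List[(String, Int)] = b.flatMap(x => List((x, 1)))

  // The same result written as map followed by flatten.
  val viaMapFlatten: List[(String, Int)] = b.map(x => List((x, 1))).flatten

  // An Option-returning function also works: None contributes no elements.
  val digits: List[Int] =
    List("1", "x", "3").flatMap(s => scala.util.Try(s.toInt).toOption)
}
```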

How do I do a Map comprehension with Scala?

With Python, I can do something like
listOfLists = [('a', -1), ('b', 0), ('c', 1)]
my_dict = {foo: bar for foo, bar in listOfLists}
my_dict == {'a': -1, 'b': 0, 'c': 1} => True
I know this as a dictionary comprehension. When I look for this operation with Scala, I find this incomprehensible document (pun intended).
Is there an idiomatic way to do this with Scala?
Bonus question: Can I filter with this operation as well like my_dict = {foo: bar for foo, bar in listOfLists if bar > 0}?
First, let's parse your Python code to figure out what it's doing.
my_dict = {
foo: bar <-- Key, value names
for foo, bar <-- Destructuring a list
in listOfLists <-- This is where they came from
}
So you can see that even in this very short example there's actually considerable redundancy and plenty of potential for failure if listOfLists isn't actually what it says it is.
If listOfLists actually is a list of pairs (key, value), then in Scala it's trivial:
listOfPairs.toMap
If, on the other hand, it really is lists, and you want to pull off the first one to make the key and save the rest as a value, it would be something like
listOfLists.map(x => x.head -> x.tail).toMap
You can select some of them by using collect instead. For instance, maybe you only want the lists of length 2 (you could use a guard like if x.head > 0 to get your example), in which case you could write
listOfLists.collect{
case x if x.length == 2 => x.head -> x.last
}.toMap
or if it is literally a List, you could also
listOfLists.collect{
case key :: value :: Nil => key -> value
}.toMap
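Applied to the exact data from the question, including the bonus filter, this might look like the following sketch. The value names are illustrative:

```scala
object DictComprehension {
  val listOfPairs = List(("a", -1), ("b", 0), ("c", 1))

  // Equivalent of {foo: bar for foo, bar in listOfLists}
  val myDict: Map[String, Int] = listOfPairs.toMap

  // Equivalent of {foo: bar for foo, bar in listOfLists if bar > 0}
  val positive: Map[String, Int] =
    (for ((foo, bar) <- listOfPairs if bar > 0) yield foo -> bar).toMap

  // The same filter expressed with collect.
  val positiveViaCollect: Map[String, Int] =
    listOfPairs.collect { case (foo, bar) if bar > 0 => foo -> bar }.toMap
}
```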
I'll compare list comprehensions in Scala 2.x and Python 3.x.
1. Sequence
In python:
xs = [x*x for x in range(5)]
#xs = [0, 1, 4, 9, 16]
ys = list(map(lambda x: x*x, range(5)))
#ys = [0, 1, 4, 9, 16]
In Scala:
scala> val xs = for(x <- 0 until 5) yield x*x
xs: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 4, 9, 16)
scala> val ys = (0 until 5) map (x => x*x)
ys: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 4, 9, 16)
Or, if you really want a list (note that breakOut was removed in Scala 2.13, so the breakOut variants need 2.12 or earlier):
scala> import collection.breakOut
scala> val xs: List[Int] = (for(x <- 0 until 5) yield x*x)(breakOut)
xs: List[Int] = List(0, 1, 4, 9, 16)
scala> val ys: List[Int] = (0 until 5).map(x => x*x)(breakOut)
ys: List[Int] = List(0, 1, 4, 9, 16)
scala> val zs = (for(x <- 0 until 5) yield x*x).toList
zs: List[Int] = List(0, 1, 4, 9, 16)
2. Set
In Python
s1 = { x//2 for x in range(10) }
#s1 = {0, 1, 2, 3, 4}
s2 = set(map(lambda x: x//2, range(10)))
#s2 = {0, 1, 2, 3, 4}
In Scala
scala> val s1 = (for(x <- 0 until 10) yield x/2).toSet
s1: scala.collection.immutable.Set[Int] = Set(0, 1, 2, 3, 4)
scala> val s2: Set[Int] = (for(x <- 0 until 10) yield x/2)(breakOut)
s2: Set[Int] = Set(0, 1, 2, 3, 4)
scala> val s3: Set[Int] = (0 until 10).map(_/2)(breakOut)
s3: Set[Int] = Set(0, 1, 2, 3, 4)
scala> val s4 = (0 until 10).map(_/2).toSet
s4: scala.collection.immutable.Set[Int] = Set(0, 1, 2, 3, 4)
3. Dict
In Python:
pairs = [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
d1 = {k: v*2 for k, v in pairs}
#d1 = {1: 'aa', 2: 'bb', 3: 'cc', 4: 'dd'}
d2 = dict([(k*2, v) for k, v in pairs])
#d2 = {2: 'a', 4: 'b', 6: 'c', 8: 'd'}
In Scala
scala> val pairs = Seq(1->"a", 2->"b", 3->"c", 4->"d")
pairs: Seq[(Int, String)] = List((1,a), (2,b), (3,c), (4,d))
scala> val d1 = (for((k, v) <- pairs) yield (k, v*2)).toMap
d1: scala.collection.immutable.Map[Int,String] = Map(1 -> aa, 2 -> bb, 3 -> cc, 4 -> dd)
scala> val d2 = Map(pairs map { case(k, v) => (k*2, v) } :_*)
d2: scala.collection.immutable.Map[Int,String] = Map(2 -> a, 4 -> b, 6 -> c, 8 -> d)
scala> val d3 = pairs map { case(k, v) => (k*2, v) } toMap
d3: scala.collection.immutable.Map[Int,String] = Map(2 -> a, 4 -> b, 6 -> c, 8 -> d)
scala> val d4: Map[Int, String] = (for((k, v) <- pairs) yield (k, v*2))(breakOut)
d4: Map[Int,String] = Map(1 -> aa, 2 -> bb, 3 -> cc, 4 -> dd)
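The point of breakOut was to build the target collection directly, without an intermediate one. A sketch of a version-independent alternative is to map over an iterator and convert at the end. This is a substitute technique rather than what the answer above uses, and the object name is illustrative:

```scala
object WithoutBreakOut {
  val pairs = Seq(1 -> "a", 2 -> "b", 3 -> "c", 4 -> "d")

  // The iterator is consumed lazily, so no intermediate Seq of pairs is built.
  val d2: Map[Int, String] =
    pairs.iterator.map { case (k, v) => (k * 2, v) }.toMap

  // The same idea for a List of squares.
  val xs: List[Int] = (0 until 5).iterator.map(x => x * x).toList
}
```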
Here are a few examples:
val listOfLists = Vector(Vector(1,2), Vector(3,4), Vector(5,6))
val m1 = listOfLists.map { case Seq(a,b) => (a,b) }.toMap
val m2 = listOfLists.collect { case Seq(a,b) if b>0 => (a,b) }.toMap
val m3 = (for (Seq(a,b) <- listOfLists) yield (a,b)).toMap
val m4 = (for (Seq(a,b) <- listOfLists if b>0) yield (a,b)).toMap
val m5 = Map(listOfLists.map { case Seq(a,b) => (a,b) }: _*)
val m6 = Map(listOfLists.collect { case Seq(a,b) => (a,b) }: _*)
val m7 = Map((for (Seq(a,b) <- listOfLists) yield (a,b)): _*)
val m8 = Map((for (Seq(a,b) <- listOfLists if b>0) yield (a,b)): _*)
You can create a Map using .toMap or Map(xs: _*). The collect method lets you filter as you map. And a for-comprehension uses syntax most similar to your example.

Inserting at position in List

This insert function is taken from :
http://aperiodic.net/phil/scala/s-99/p21.scala
def insertAt[A](e: A, n: Int, ls: List[A]): List[A] = ls.splitAt(n) match {
case (pre, post) => pre ::: e :: post
}
I want to insert an element at every second element of a List so I use :
val sl = List("1", "2", "3", "4", "5") //> sl : List[String] = List(1, 2, 3, 4, 5)
insertAt("'a", 2, insertAt("'a", 4, sl)) //> res0: List[String] = List(1, 2, 'a, 3, 4, 'a, 5)
This is a very basic implementation; I want to use one of the functional constructs. I think I need to use a foldLeft?
Group the list into Lists of size 2, then combine those into lists separated by the separation character:
val sl = List("1","2","3","4","5") //> sl : List[String] = List(1, 2, 3, 4, 5)
val grouped = sl grouped(2) toList //> grouped : List[List[String]] = List(List(1, 2), List(3, 4), List(5))
val separatedList = grouped flatMap (_ :+ "a") //> separatedList : List[String] = List(1, 2, a, 3, 4, a, 5, a)
Edit
Just saw that my solution has a trailing token that isn't in the question. To get rid of that do a length check:
val separatedList2 = grouped flatMap (l => if(l.length == 2) l :+ "a" else l)
//> separatedList2 : List[String] = List(1, 2, a, 3, 4, a, 5)
You could also use sliding:
val sl = List("1", "2", "3", "4", "5")
def insertEvery(n:Int, el:String, sl:List[String]) =
sl.sliding(n, n).foldRight(List.empty[String])((xs, acc) => if (xs.length == n) xs ::: el :: acc else xs ::: acc)
insertEvery(2,"x",sl) // res1: List[String] = List(1, 2, x, 3, 4, x, 5)
Forget about insertAt, use pure foldLeft:
def insertAtEvery[A](e: A, n: Int, ls: List[A]): List[A] =
ls.foldLeft[(Int, List[A])]((0, List.empty)) {
case ((pos, result), elem) =>
((pos + 1) % n, if (pos == n - 1) e :: elem :: result else elem :: result)
}._2.reverse
Recursion and pattern matching are functional constructs. Insert the new elem by pattern matching on the output of splitAt then recurse with the remaining input. Seems easier to read but I'm not satisfied with the type signature for this one.
def insertEvery(xs: List[Any], n: Int, elem: String): List[Any] = xs.splitAt(n) match {
// :+ appends elem as a single element; ++ would split a String into its characters
case (xs, List()) => if (xs.size >= n) xs :+ elem else xs
case (xs, ys) => (xs :+ elem) ++ insertEvery(ys, n, elem)
}
Sample runs.
scala> val xs = List("1","2","3","4","5")
xs: List[String] = List(1, 2, 3, 4, 5)
scala> insertEvery(xs, 1, "a")
res1: List[Any] = List(1, a, 2, a, 3, a, 4, a, 5, a)
scala> insertEvery(xs, 2, "a")
res2: List[Any] = List(1, 2, a, 3, 4, a, 5)
scala> insertEvery(xs, 3, "a")
res3: List[Any] = List(1, 2, 3, a, 4, 5)
An implementation using recursion:
Note that n must not be larger than the size of the list, or else an exception is thrown.
scala> def insertAt[A](e: A, n: Int, ls: List[A]): List[A] = n match {
| case 0 => e :: ls
| case _ => ls.head :: insertAt(e, n-1, ls.tail)
| }
insertAt: [A](e: A, n: Int, ls: List[A])List[A]
scala> insertAt("'a", 2, List("1", "2", "3", "4"))
res0: List[String] = List(1, 2, 'a, 3, 4)
Consider indexing list positions with zipWithIndex, and so
sl.zipWithIndex.flatMap { case(v,i) => if (i % 2 == 0) List(v) else List(v,"a") }
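Several of the answers above append the separator after every full group, which leaves a trailing separator when the list length is a multiple of n. A sketch of a variant that only places the separator between groups, assuming that is the desired behaviour. The name intersperseGroups is hypothetical:

```scala
object InsertEvery {
  // Hypothetical helper: splits xs into chunks of n and puts sep between chunks,
  // so the result never starts or ends with sep.
  def intersperseGroups[A](xs: List[A], n: Int, sep: A): List[A] = {
    require(n > 0, "n must be positive")
    xs.grouped(n).toList match {
      case Nil           => Nil
      case first :: rest => first ::: rest.flatMap(group => sep :: group)
    }
  }
}
```

With List("1", "2", "3", "4", "5"), n = 2 and "a" as the separator this gives the list from the question. With four elements it stops at "4" instead of appending another "a".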

How to generate the power set of a set in Scala

I have a Set of items of some type and want to generate its power set.
I searched the web and couldn't find any Scala code that addresses this specific task.
This is what I came up with. It allows you to restrict the cardinality of the sets produced by the length parameter.
def power[T](set: Set[T], length: Int) = {
var res = Set[Set[T]]()
res ++= set.map(Set(_))
for (i <- 1 until length)
res = res.map(x => set.map(x + _)).flatten
res
}
This will not include the empty set. To include it, simply change the last line of the method to res + Set().
Any suggestions how this can be accomplished in a more functional style?
Looks like no-one knew about it back in July, but there's a built-in method: subsets.
scala> Set(1,2,3).subsets foreach println
Set()
Set(1)
Set(2)
Set(3)
Set(1, 2)
Set(1, 3)
Set(2, 3)
Set(1, 2, 3)
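The question also asked to restrict the cardinality of the produced sets. The standard library has an overload subsets(len) that yields only the subsets of exactly that size, so a bounded power set could be sketched as below. The helper name subsetsUpTo is hypothetical:

```scala
object SubsetsBySize {
  // Hypothetical helper: all subsets with at most maxSize elements,
  // including the empty set.
  def subsetsUpTo[T](set: Set[T], maxSize: Int): Set[Set[T]] =
    (0 to math.min(maxSize, set.size)).flatMap(k => set.subsets(k)).toSet
}
```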
Notice that if you have a set S and another set T where T = S ∪ {x} (i.e. T is S with one element added) then the powerset of T - P(T) - can be expressed in terms of P(S) and x as follows:
P(T) = P(S) ∪ { p ∪ {x} | p ∈ P(S) }
That is, you can define the powerset recursively (notice how this gives you the size of the powerset for free - i.e. adding 1-element doubles the size of the powerset). So, you can do this tail-recursively in scala as follows:
scala> def power[A](t: Set[A]): Set[Set[A]] = {
| @annotation.tailrec
| def pwr(t: Set[A], ps: Set[Set[A]]): Set[Set[A]] =
| if (t.isEmpty) ps
| else pwr(t.tail, ps ++ (ps map (_ + t.head)))
|
| pwr(t, Set(Set.empty[A])) //Powerset of ∅ is {∅}
| }
power: [A](t: Set[A])Set[Set[A]]
Then:
scala> power(Set(1, 2, 3))
res2: Set[Set[Int]] = Set(Set(1, 2, 3), Set(2, 3), Set(), Set(3), Set(2), Set(1), Set(1, 3), Set(1, 2))
It actually looks much nicer doing the same with a List (i.e. a recursive ADT):
scala> def power[A](s: List[A]): List[List[A]] = {
| @annotation.tailrec
| def pwr(s: List[A], acc: List[List[A]]): List[List[A]] = s match {
| case Nil => acc
| case a :: as => pwr(as, acc ::: (acc map (a :: _)))
| }
| pwr(s, Nil :: Nil)
| }
power: [A](s: List[A])List[List[A]]
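The recurrence P(T) = P(S) ∪ { p ∪ {x} | p ∈ P(S) } can also be written as a fold over the elements, which makes the doubling at each step easy to see. A minimal sketch, assuming the object name PowerSetFold purely for illustration:

```scala
object PowerSetFold {
  // Start from P(∅) = {∅}; each element x doubles the accumulated set
  // by adding a copy of every existing subset with x included.
  def power[A](s: Set[A]): Set[Set[A]] =
    s.foldLeft(Set(Set.empty[A]))((ps, x) => ps ++ ps.map(_ + x))
}
```

For a set of n distinct elements the result has 2^n members, matching the size argument above.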
Here's one of the more interesting ways to write it:
import scalaz._, Scalaz._
def powerSet[A](xs: List[A]) = xs filterM (_ => true :: false :: Nil)
Which works as expected:
scala> powerSet(List(1, 2, 3)) foreach println
List(1, 2, 3)
List(1, 2)
List(1, 3)
List(1)
List(2, 3)
List(2)
List(3)
List()
See for example this discussion thread for an explanation of how it works.
(And as debilski notes in the comments, ListW also pimps powerset onto List, but that's no fun.)
Use the built-in combinations function:
val xs = Seq(1,2,3)
(0 to xs.size) flatMap xs.combinations
// Vector(List(), List(1), List(2), List(3), List(1, 2), List(1, 3), List(2, 3),
// List(1, 2, 3))
Note, I cheated and used a Seq, because for reasons unknown, combinations is defined on SeqLike. So with a set, you need to convert to/from a Seq:
val xs = Set(1,2,3)
(0 to xs.size).flatMap(xs.toSeq.combinations).map(_.toSet).toSet
//Set(Set(1, 2, 3), Set(2, 3), Set(), Set(3), Set(2), Set(1), Set(1, 3),
//Set(1, 2))
Can be as simple as:
def powerSet[A](xs: Seq[A]): Seq[Seq[A]] =
xs.foldLeft(Seq(Seq[A]())) {(sets, set) => sets ++ sets.map(_ :+ set)}
Recursive implementation:
def powerSet[A](xs: Seq[A]): Seq[Seq[A]] = {
def go(xsRemaining: Seq[A], sets: Seq[Seq[A]]): Seq[Seq[A]] = xsRemaining match {
case Seq() => sets
case y +: ys => go(ys, sets ++ sets.map(_ :+ y))
}
go(xs, Seq[Seq[A]](Seq[A]()))
}
All the other answers seemed a bit complicated, here is a simple function:
def powerSet (l:List[_]) : List[List[Any]] =
l match {
case Nil => List(List())
case x::xs =>
val a = powerSet(xs)
a.map(n => n:::List(x)):::a
}
so
powerSet(List('a','b','c'))
will produce the following result
res0: List[List[Any]] = List(List(c, b, a), List(b, a), List(c, a), List(a), List(c, b), List(b), List(c), List())
Here's another (lazy) version... since we're collecting ways of computing the power set, I thought I'd add it:
def powerset[A](s: Seq[A]) =
Iterator.range(0, 1 << s.length).map(i =>
Iterator.range(0, s.length).withFilter(j =>
(i >> j) % 2 == 1
).map(s)
)
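To make the bitmask idea concrete: each integer from 0 to 2^n - 1 is read as a mask, and bit j decides whether element j is included. The sketch below is a variant that materialises each subset as a List so the results can be printed or compared. It assumes fewer than 31 elements, since the mask is an Int:

```scala
object BitmaskPowerset {
  // The outer Iterator stays lazy; each subset is built only when requested.
  def powerset[A](s: IndexedSeq[A]): Iterator[List[A]] =
    Iterator.range(0, 1 << s.length).map { mask =>
      // Keep index j when bit j of the mask is set.
      s.indices.filter(j => ((mask >> j) & 1) == 1).map(j => s(j)).toList
    }
}
```

Calling BitmaskPowerset.powerset(IndexedSeq(1, 2, 3)).toList forces all eight subsets, starting with the empty list for mask 0.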
Here's a simple, recursive solution using a helper function:
def concatElemToList[A](a: A, list: List[A]): List[Any] = (a,list) match {
case (x, Nil) => List(List(x))
case (x, ((h:List[_]) :: t)) => (x :: h) :: concatElemToList(x, t)
case (x, (h::t)) => List(x, h) :: concatElemToList(x, t)
}
def powerSetRec[A] (a: List[A]): List[Any] = a match {
case Nil => List()
case (h::t) => powerSetRec(t) ++ concatElemToList(h, powerSetRec (t))
}
so the call of
powerSetRec(List("a", "b", "c"))
will give the result
List(List(c), List(b, c), List(b), List(a, c), List(a, b, c), List(a, b), List(a))