How do I do a Map comprehension with Scala? - scala

With Python, I can do something like
listOfLists = [('a', -1), ('b', 0), ('c', 1)]
my_dict = {foo: bar for foo, bar in listOfLists}
my_dict == {'a': -1, 'b': 0, 'c': 1} => True
I know this as a dictionary comprehension. When I look for this operation with Scala, I find this incomprehensible document (pun intended).
Is there an idiomatic way to do this with Scala?
Bonus question: Can I filter with this operation as well like my_dict = {foo: bar for foo, bar in listOfLists if bar > 0}?

First, let's parse your Python code to figure out what it's doing.
my_dict = {
foo: bar <-- Key, value names
for foo, bar <-- Destructuring a list
in listOfLists <-- This is where they came from
}
So you can see that even in this very short example there's actually considerable redundancy and plenty of potential for failure if listOfLists isn't actually what it says it is.
If listOfLists actually is a list of pairs (key, value), then in Scala it's trivial:
listOfPairs.toMap
If, on the other hand, it really is lists, and you want to pull off the first one to make the key and save the rest as a value, it would be something like
listOfLists.map(x => x.head -> x.tail).toMap
You can select some of them by using collect instead. For instance, maybe you only want the lists of length 2 (you could if x.head > 0 to get your example), in which case you
listOfLists.collect{
case x if x.length == 2 => x.head -> x.last
}.toMap
or if it is literally a List, you could also
listOfLists.collect{
case key :: value :: Nil => key -> value
}.toMap

I'll compare list comprehension in Scala2.x and Python 3.x
1. Sequence
In python:
xs = [x*x for x in range(5)]
#xs = [0, 1, 4, 9, 16]
ys = list(map(lambda x: x*x, range(5)))
#ys = [0, 1, 4, 9, 16]
In Scala:
scala> val xs = for(x <- 0 until 5) yield x*x
xs: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 4, 9, 16)
scala> val ys = (0 until 5) map (x => x*x)
ys: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 4, 9, 16)
Or you really want a list:
scala> import collection.breakOut
scala> val xs: List[Int] = (for(x <- 0 until 5) yield x*x)(breakOut)
xs: List[Int] = List(0, 1, 4, 9, 16)
scala> val ys: List[Int] = (0 until 5).map(x => x*x)(breakOut)
ys: List[Int] = List(0, 1, 4, 9, 16)
scala> val zs = (for(x <- 0 until 5) yield x*x).toList
zs: List[Int] = List(0, 1, 4, 9, 16)
2. Set
In Python
s1 = { x//2 for x in range(10) }
#s1 = {0, 1, 2, 3, 4}
s2 = set(map(lambda x: x//2, range(10)))
#s2 = {0, 1, 2, 3, 4}
In Scala
scala> val s1 = (for(x <- 0 until 10) yield x/2).toSet
s1: scala.collection.immutable.Set[Int] = Set(0, 1, 2, 3, 4)
scala> val s2: Set[Int] = (for(x <- 0 until 10) yield x/2)(breakOut)
s2: Set[Int] = Set(0, 1, 2, 3, 4)
scala> val s3: Set[Int] = (0 until 10).map(_/2)(breakOut)
s3: Set[Int] = Set(0, 1, 2, 3, 4)
scala> val s4 = (0 until 10).map(_/2).toSet
s4: scala.collection.immutable.Set[Int] = Set(0, 1, 2, 3, 4)
3. Dict
In Python:
pairs = [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
#d1 = {1: 'aa', 2: 'bb', 3: 'cc', 4: 'dd'}
d2 = dict([(k*2, v) for k, v in pairs])
#d2 = {2: 'a', 4: 'b', 6: 'c', 8: 'd'}
In Scala
scala> val pairs = Seq(1->"a", 2->"b", 3->"c", 4->"d")
pairs: Seq[(Int, String)] = List((1,a), (2,b), (3,c), (4,d))
scala> val d1 = (for((k, v) <- pairs) yield (k, v*2)).toMap
d1: scala.collection.immutable.Map[Int,String] = Map(1 -> aa, 2 -> bb, 3 -> cc, 4 -> dd)
scala> val d2 = Map(pairs map { case(k, v) => (k*2, v) } :_*)
d2: scala.collection.immutable.Map[Int,String] = Map(2 -> a, 4 -> b, 6 -> c, 8 -> d)
scala> val d3 = pairs map { case(k, v) => (k*2, v) } toMap
d3: scala.collection.immutable.Map[Int,String] = Map(2 -> a, 4 -> b, 6 -> c, 8 -> d)
scala> val d4: Map[Int, String] = (for((k, v) <- pairs) yield (k, v*2))(breakOut)
d4: Map[Int,String] = Map(1 -> aa, 2 -> bb, 3 -> cc, 4 -> dd)

Here are a few examples:
val listOfLists = Vector(Vector(1,2), Vector(3,4), Vector(5,6))
val m1 = listOfLists.map { case Seq(a,b) => (a,b) }.toMap
val m2 = listOfLists.collect { case Seq(a,b) if b>0 => (a,b) }.toMap
val m3 = (for (Seq(a,b) <- listOfLists) yield (a,b)).toMap
val m4 = (for (Seq(a,b) <- listOfLists if b>0) yield (a,b)).toMap
val m5 = Map(listOfLists.map { case Seq(a,b) => (a,b) }: _*)
val m6 = Map(listOfLists.collect { case Seq(a,b) => (a,b) }: _*)
val m7 = Map((for (Seq(a,b) <- listOfLists) yield (a,b)): _*)
val m8 = Map((for (Seq(a,b) <- listOfLists if b>0) yield (a,b)): _*)
You can create a Map using .toMap or Map(xs: _*). The collect method lets you filter as you map. And a for-comprehension uses syntax most similar to your example.

Related

Looking for an FP ranking implementation which handles ties (i.e. equal values)

Starting from a sorted sequence of values, my goal is to assign a rank to each value, using identical ranks for equal values (aka ties):
Input: Vector(1, 1, 3, 3, 3, 5, 6)
Output: Vector((0,1), (0,1), (1,3), (1,3), (1,3), (2,5), (3,6))
A few type aliases for readability:
type Rank = Int
type Value = Int
type RankValuePair = (Rank, Value)
An imperative implementation using a mutable rank variable could look like this:
var rank = 0
val ranked1: Vector[RankValuePair] = for ((value, index) <- values.zipWithIndex) yield {
if ((index > 0) && (values(index - 1) != value)) rank += 1
(rank, value)
}
// ranked1: Vector((0,1), (0,1), (1,3), (1,3), (1,3), (2,5), (3,6))
To hone my FP skills, I was trying to come up with a functional implementation:
val ranked2: Vector[RankValuePair] = values.sliding(2).foldLeft((0 , Vector.empty[RankValuePair])) {
case ((rank: Rank, rankedValues: Vector[RankValuePair]), Vector(currentValue, nextValue)) =>
val newRank = if (nextValue > currentValue) rank + 1 else rank
val newRankedValues = rankedValues :+ (rank, currentValue)
(newRank, newRankedValues)
}._2
// ranked2: Vector((0,1), (0,1), (1,3), (1,3), (1,3), (2,5))
It is less readable, and – more importantly – is missing the last value (due to using sliding(2) on an odd number of values).
How could this be fixed and improved?
This works well for me:
// scala
val vs = Vector(1, 1, 3, 3, 3, 5, 6)
val rank = vs.distinct.zipWithIndex.toMap
val result = vs.map(i => (rank(i), i))
The same in Java 8 using Javaslang:
// java(slang)
Vector<Integer> vs = Vector(1, 1, 3, 3, 3, 5, 6);
Function<Integer, Integer> rank = vs.distinct().zipWithIndex().toMap(t -> t);
Vector<Tuple2<Integer, Integer>> result = vs.map(i -> Tuple(rank.apply(i), i));
The output of both variants is
Vector((0, 1), (0, 1), (1, 3), (1, 3), (1, 3), (2, 5), (3, 6))
*) Disclosure: I'm the creator of Javaslang
This is nice and concise but it assumes that your Values don't go negative. (Actually it just assumes that they can never start with -1.)
val vs: Vector[Value] = Vector(1, 1, 3, 3, 3, 5, 6)
val rvps: Vector[RankValuePair] =
vs.scanLeft((-1,-1)){ case ((r,p), v) =>
if (p == v) (r, v) else (r + 1, v)
}.tail
edit
Modification that makes no assumptions, as suggested by #Kolmar.
vs.scanLeft((0,vs.headOption.getOrElse(0))){ case ((r,p), v) =>
if (p == v) (r, v) else (r + 1, v)
}.tail
Here's an approach with recursion, pattern matching and guards.
The interesting part is where the head and head of the tail (h and ht respectively) are de-constructed from the list and an if checks if they are equal. The logic for each case adjusts the rank and proceeds on the remaining part of the list.
def rank(xs: Vector[Value]): List[RankValuePair] = {
def rankR(xs: List[Value], acc: List[RankValuePair], rank: Rank): List[RankValuePair] = xs match{
case Nil => acc.reverse
case h :: Nil => rankR(Nil, (rank, h) :: acc, rank)
case h :: ht :: t if (h == ht) => rankR(xs.tail, (rank, h) :: acc, rank)
case h :: ht :: t if (h != ht) => rankR(xs.tail, (rank, h) :: acc, rank + 1)
}
rankR(xs.toList, List[RankValuePair](), 0)
}
Output:
scala> rank(xs)
res14: List[RankValuePair] = List((0,1), (0,1), (1,3), (1,3), (1,3), (2,5), (3,6))
This is a modification of the solution by #jwvh, that doesn't make any assumptions about the values:
val vs = Vector(1, 1, 3, 3, 3, 5, 6)
vs.sliding(2).scanLeft(0, vs.head) {
case ((rank, _), Seq(a, b)) => (if (a != b) rank + 1 else rank, b)
}.toVector
Note, that it would throw if vs is empty, so you'd have to use vs.headOption getOrElse 0, or check if the input is empty beforehand: if (vs.isEmpty) Vector.empty else ...
import scala.annotation.tailrec
type Rank = Int
// defined type alias Rank
type Value = Int
// defined type alias Value
type RankValuePair = (Rank, Value)
// defined type alias RankValuePair
def rankThem(values: List[Value]): List[RankValuePair] = {
// Assumes that the "values" are sorted
#tailrec
def _rankThem(currentRank: Rank, currentValue: Value, ranked: List[RankValuePair], values: List[Value]): List[RankValuePair] = values match {
case value :: tail if value == currentValue => _rankThem(currentRank, value, (currentRank, value) +: ranked, tail)
case value :: tail if value > currentValue => _rankThem(currentRank + 1, value, (currentRank + 1, value) +: ranked, tail)
case Nil => ranked.reverse
}
_rankThem(0, Int.MinValue, List.empty[RankValuePair], values.sorted)
}
// rankThem: rankThem[](val values: List[Value]) => List[RankValuePair]
val valueList = List(1, 1, 3, 3, 5, 6)
// valueList: List[Int] = List(1, 1, 3, 3, 5, 6)
val rankValueList = rankThem(valueList)[RankedValuePair], values: Vector[Value])
// rankValueList: List[RankValuePair] = List((1,1), (1,1), (2,3), (2,3), (3,5), (4,6))
val list = List(1, 1, 3, 3, 5, 6)
val result = list
.groupBy(identity)
.mapValues(_.size)
.toArray
.sortBy(_._1)
.zipWithIndex
.flatMap(tuple => List.fill(tuple._1._2)((tuple._2, tuple._1._1)))
result: Array[(Int, Int)] = Array((0,1), (0,1), (1,3), (1,3), (2,5), (3,6))
The idea is using groupBy to find identical elements and find their occurrences and then sort and then flatMap. Time complexity I would say is O(nlogn), groupBy is O(n), sort is O(nlogn), fl

Error: type mismatch flatMap

I am new to spark programming and scala and i am not able to understand the difference between map and flatMap.
I tried below code as i was expecting both to work but got error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As per my understanding flatmap make Rdd in to collection for String/Int Rdd.
I was thinking that in this case both should work without any error.Please let me know where i am making the mistake.
Thanks
You need to look at how the signatures defined these methods:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]] and you want to produce a RDD[List[Int]] you can flatMap it using identity.
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: scala.collection.immutable.Set[String] = List(11, 21, 41, 51) // concat
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
Map operation return type is U where as flatMap return type is TraversableOnce[U](means collections)
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)

scala merge option sequences

Want to merge val A = Option(Seq(1,2)) and val B = Option(Seq(3,4)) to yield a new option sequence
val C = Option(Seq(1,2,3,4))
This
val C = Option(A.getOrElse(Nil) ++ B.getOrElse(Nil)),
seems faster and more idiomatic than
val C = Option(A.toList.flatten ++ B.toList.flatten)
But is there a better way? And am I right that getOrElse is faster and lighter than toList.flatten?
What about a neat for comprehension:
val Empty = Some(Nil)
val C = for {
a <- A orElse Empty
b <- B orElse Empty
} yield a ++ b
Creates less intermediate options.
Or, you could just do a somewhat cumbersome pattern matching:
(A, B) match {
case (None, None) => Nil
case (None, sb#Some(b)) => sb
case (sa#Some(a), None) => sa
case (Some(a), Some(b)) => Some(a ++ b)
}
I think this at least creates less intermediate collections than the double flatten.
Your first case:
// In this case getOrElse is not needed as the option is clearly not `None`.
// So, you can replace the following:
val C = Option(A.getOrElse(Nil) ++ B.getOrElse(Nil))
// By this:
val C = Option(A.get ++ B.get) // A simple concatenation of two sequences.
C: Option[Seq[Int]] = Some(List(1, 2, 3, 4))
Your second case/option is wrong for multiple reasons.
val C = Option(A.toList.flatten ++ B.toList.flatten)
Option[List[Int]] = Some(List(1, 2, 3, 4))
It returns the incorrect type Option[List[Int]] instead of Option[Seq[Int]]
It needlessly invokes toList on A & B. You could simply add the options and invoke flatten on them.
It is not DRY and redundantly calls flatten on both A.toList & B.toList whereas it could call flatten on (A ++ B)
Instead of this, you could do this more efficiently:
val E = Option((A ++ B).flatten.toSeq)
E: Option[Seq[Int]] = Some(List(1, 2, 3, 4))
Using foldLeft
Seq(Some(List(1, 2)), None).foldLeft(List.empty[Int])(_ ++ _.getOrElse(List.empty[Int]))
result: List[Int] = List(1, 2)
Using flatten twice
Seq(Some(Seq(1, 2, 3)), Some(4, 5, 6), None).flatten.flatten
result: Seq(1, 2, 3, 4, 5, 6)
Scala REPL
scala> val a = Some(Seq(1, 2, 3))
a: Some[Seq[Int]] = Some(List(1, 2, 3))
scala> val b = Some(Seq(4, 5, 6))
b: Some[Seq[Int]] = Some(List(4, 5, 6))
scala> val c = None
c: None.type = None
scala> val d = Seq(a, b, c).flatten.flatten
d: Seq[Int] = List(1, 2, 3, 4, 5, 6)

How to convert two consecutive elements from List to entries in Map?

I have a list:
List(1,2,3,4,5,6)
that I would like to to convert to the following map:
Map(1->2,3->4,5->6)
How can this be done?
Mostly resembles #Vakh answer, but with a nicer syntax:
val l = List(1,2,3,4,5,6)
val m = l.grouped(2).map { case List(key, value) => key -> value}.toMap
// Map(1 -> 2, 3 -> 4, 5 -> 6)
Try:
val l = List(1,2,3,4,5,6)
val m = l.grouped(2).map(l => (l(0), l(1))).toMap
if the list is guaranteed to be of even length:
val l = List(1,2,3,4,5,6)
val m = l.grouped(2).map { x => x.head -> x.tail.head }.toMap
// Map(1 -> 2, 3 -> 4, 5 -> 6)
but if list may be of odd length, use headOption:
val l = List(1,2,3,4,5,6,7)
val m = l.grouped(2).map(x => x.head -> x.tail.headOption).toMap
// Map(1 -> Some(2), 3 -> Some(4), 5 -> Some(6), 7 -> None)
Without using grouped that appears ubiquitous in the answers so far.
scala> val l = (1 to 6).toList
l: List[Int] = List(1, 2, 3, 4, 5, 6)
scala> l.zip(l.tail).zipWithIndex.collect { case (e, pos) if pos % 2 == 0 => e }.toMap
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 3 -> 4, 5 -> 6)
You may also use sliding and foldLeft as follows:
scala> l.sliding(2,2).foldLeft(Map.empty[Int,Int]){ case (m, List(l, r)) => m + (l -> r) }
res1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 3 -> 4, 5 -> 6)

Inserting at position in List

This insert function is taken from :
http://aperiodic.net/phil/scala/s-99/p21.scala
def insertAt[A](e: A, n: Int, ls: List[A]): List[A] = ls.splitAt(n) match {
case (pre, post) => pre ::: e :: post
}
I want to insert an element at every second element of a List so I use :
val sl = List("1", "2", "3", "4", "5") //> sl : List[String] = List(1, 2, 3, 4, 5)
insertAt("'a", 2, insertAt("'a", 4, sl)) //> res0: List[String] = List(1, 2, 'a, 3, 4, 'a, 5)
This is a very basic implementation, I want to use one of the functional constructs. I think I need
to use a foldLeft ?
Group the list into Lists of size 2, then combine those into lists separated by the separation character:
val sl = List("1","2","3","4","5") //> sl : List[String] = List(1, 2, 3, 4, 5)
val grouped = sl grouped(2) toList //> grouped : List[List[String]] = List(List(1, 2), List(3, 4), List(5))
val separatedList = grouped flatMap (_ :+ "a") //> separatedList : <error> = List(1, 2, a, 3, 4, a, 5, a)
Edit
Just saw that my solution has a trailing token that isn't in the question. To get rid of that do a length check:
val separatedList2 = grouped flatMap (l => if(l.length == 2) l :+ "a" else l)
//> separatedList2 : <error> = List(1, 2, a, 3, 4, a, 5)
You could also use sliding:
val sl = List("1", "2", "3", "4", "5")
def insertEvery(n:Int, el:String, sl:List[String]) =
sl.sliding(2, 2).foldRight(List.empty[String])( (xs, acc) => if(xs.length == n)xs:::el::acc else xs:::acc)
insertEvery(2,"x",sl) // res1: List[String] = List(1, 2, x, 3, 4, x, 5)
Forget about insertAt, use pure foldLeft:
def insertAtEvery[A](e: A, n: Int, ls: List[A]): List[A] =
ls.foldLeft[(Int, List[A])]((0, List.empty)) {
case ((pos, result), elem) =>
((pos + 1) % n, if (pos == n - 1) e :: elem :: result else elem :: result)
}._2.reverse
Recursion and pattern matching are functional constructs. Insert the new elem by pattern matching on the output of splitAt then recurse with the remaining input. Seems easier to read but I'm not satisfied with the type signature for this one.
def insertEvery(xs: List[Any], n: Int, elem: String):List[Any] = xs.splitAt(n) match {
case (xs, List()) => if(xs.size >= n) xs ++ elem else xs
case (xs, ys) => xs ++ elem ++ insertEvery(ys, n, elem)
}
Sample runs.
scala> val xs = List("1","2","3","4","5")
xs: List[String] = List(1, 2, 3, 4, 5)
scala> insertEvery(xs, 1, "a")
res1: List[Any] = List(1, a, 2, a, 3, a, 4, a, 5, a)
scala> insertEvery(xs, 2, "a")
res2: List[Any] = List(1, 2, a, 3, 4, a, 5)
scala> insertEvery(xs, 3, "a")
res3: List[Any] = List(1, 2, 3, a, 4, 5)
An implementation using recursion:
Note n must smaller than the size of List, or else an Exception would be raised.
scala> def insertAt[A](e: A, n: Int, ls: List[A]): List[A] = n match {
| case 0 => e :: ls
| case _ => ls.head :: insertAt(e, n-1, ls.tail)
| }
insertAt: [A](e: A, n: Int, ls: List[A])List[A]
scala> insertAt("'a", 2, List("1", "2", "3", "4"))
res0: List[String] = List(1, 2, 'a, 3, 4)
Consider indexing list positions with zipWithIndex, and so
sl.zipWithIndex.flatMap { case(v,i) => if (i % 2 == 0) List(v) else List(v,"a") }