I wish to add up a list of tuples of integers, i.e., given an input list of tuples of arity k, produce a tuple of arity k whose fields are the sums of the corresponding fields of the tuples in the list.
Input
List( (1,2,3), (2,3,-3), (1,1,1))
Output
(4, 6, 1)
I was trying to use foldLeft, but I am not able to get it to compile. Right now, I am using a for loop, but I was looking for a more concise solution.
This can be done type-safely and very concisely using shapeless:
scala> import shapeless._, syntax.std.tuple._
import shapeless._
import syntax.std.tuple._
scala> val l = List((1, 2, 3), (2, 3, -1), (1, 1, 1))
l: List[(Int, Int, Int)] = List((1,2,3), (2,3,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 3)
Notice that, unlike solutions which rely on casts, this approach is type safe: any type errors are detected at compile time rather than at runtime.
scala> val l = List((1, 2, 3), (2, "foo", -1), (1, 1, 1))
l: List[(Int, Any, Int)] = List((1,2,3), (2,foo,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
<console>:15: error: could not find implicit value for parameter num: Numeric[Any]
l.map(_.toList).transpose.map(_.sum)
^
scala> val tuples = List( (1,2,3), (2,3,-3), (1,1,1))
tuples: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> tuples.map(t => t.productIterator.toList.map(_.asInstanceOf[Int])).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 1)
Type information is lost when calling productIterator on Tuple3 so you have to convert from Any back to an Int.
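As a possible alternative to the cast, a pattern match can recover the Int values from productIterator. This is only a sketch; the names triple, mixed, ints and fromMixed are illustrative. Note that collect silently drops any element that is not an Int instead of failing, which may or may not be what you want.

```scala
// productIterator is typed as Iterator[Any], so the element type has to be recovered.
val triple = (1, 2, 3)
val ints: List[Int] = triple.productIterator.collect { case i: Int => i }.toList
// ints == List(1, 2, 3)

// With a mixed tuple, the non-Int field is skipped rather than causing a ClassCastException.
val mixed = (2, "foo", -1)
val fromMixed: List[Int] = mixed.productIterator.collect { case i: Int => i }.toList
// fromMixed == List(2, -1)
```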
If the tuples are always going to contain the same type, I would suggest using another collection such as List. A tuple is better suited to disparate types. When the elements share one type and you don't lose the type information through productIterator, the solution is more elegant.
scala> val tuples = List(List(1,2,3), List(2,3,-3), List(1,1,1))
tuples: List[List[Int]] = List(List(1, 2, 3), List(2, 3, -3), List(1, 1, 1))
scala> tuples.transpose.map(_.sum)
res1: List[Int] = List(4, 6, 1)
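If you prefer not to transpose, one alternative sketch is to add the rows pairwise with zip. The names rows and sums are illustrative. Two assumptions apply: the outer list is non-empty (reduce fails on an empty list), and all rows have the same length (zip truncates to the shorter row).

```scala
val rows = List(List(1, 2, 3), List(2, 3, -3), List(1, 1, 1))

// Add two rows element by element, then fold that over all rows.
val sums = rows.reduce((xs, ys) => xs.zip(ys).map { case (x, y) => x + y })
// sums == List(4, 6, 1)
```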
scala> val list = List( (1,2,3), (2,3,-3), (1,1,1))
list: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> list.foldRight( (0, 0, 0) ){ case ((a, b, c), (a1, b1, c1)) => (a + a1, b + b1, c + c1) }
res0: (Int, Int, Int) = (4,6,1)
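Since the question asked about foldLeft, here is the same idea as a minimal sketch with foldLeft. Compared with foldRight, the accumulator comes first in the pattern. The names triples and total are illustrative.

```scala
val triples = List((1, 2, 3), (2, 3, -3), (1, 1, 1))

// The first pattern is the running total, the second is the current element.
val total = triples.foldLeft((0, 0, 0)) {
  case ((a, b, c), (x, y, z)) => (a + x, b + y, c + z)
}
// total == (4, 6, 1)
```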
Related
I have two Seq.
1 has Seq[String] and another has Seq[(String,Double)]
a -> ["a","b","c"] and
b-> [1,2,3]
I want to create output as
[("a",1),("b",2),("c",3)]
I have a code
a.zip(b) is actually creating a seq of those two elements instead of creating a map
Can anyone suggest how to do that in scala?
You simply need .toMap, which transforms a List[(String, Int)] into a Map[String, Int].
scala> val seq1 = List("a", "b", "c")
seq1: List[String] = List(a, b, c)
scala> val seq2 = List(1, 2, 3)
seq2: List[Int] = List(1, 2, 3)
scala> seq1.zip(seq2)
res0: List[(String, Int)] = List((a,1), (b,2), (c,3))
scala> seq1.zip(seq2).toMap
res1: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
also see
How to convert a Seq[A] to a Map[Int, A] using a value of A as the key in the map?
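Two edge cases are worth keeping in mind with zip followed by toMap. The sketch below uses made-up data (keys, values) to show them: zip stops at the shorter sequence, and toMap keeps the last value seen for a repeated key.

```scala
val keys = List("a", "b", "a", "c")
val values = List(1, 2, 3)

// "c" has no partner in values, so zip drops it.
val pairs = keys.zip(values)
// pairs == List(("a", 1), ("b", 2), ("a", 3))

// "a" appears twice; the later pair overwrites the earlier one.
val lookup = pairs.toMap
// lookup == Map("a" -> 3, "b" -> 2)
```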
I am new to Spark programming and Scala, and I am not able to understand the difference between map and flatMap.
I tried the code below, expecting both calls to work, but got an error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As per my understanding, flatMap turns an RDD into a collection, for a String/Int RDD.
I was thinking that in this case both should work without any error. Please let me know where I am making the mistake.
Thanks
You need to look at the signatures of these methods:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function taking type T to a TraversableOnce[U], which is a trait Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections. For example, if you had an RDD[List[List[Int]]] and wanted to produce an RDD[List[Int]], you could flatMap it using identity.
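The flatMap-with-identity idea can be tried on plain Scala lists, without Spark. This is only an illustration on a local List (the names nested and oneLevel are made up); it removes exactly one level of nesting.

```scala
val nested: List[List[List[Int]]] = List(List(List(1, 2), List(3)), List(List(4)))

// identity returns each inner List[List[Int]] unchanged; flatMap concatenates them.
val oneLevel: List[List[Int]] = nested.flatMap(identity)
// oneLevel == List(List(1, 2), List(3), List(4))
```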
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
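To make "0 or more output items" concrete, here is a small sketch on a local List rather than an RDD (the names counts, nestedCopies and flatCopies are illustrative). Each number n is mapped to n copies of itself, so 0 contributes nothing.

```scala
val counts = List(0, 1, 2, 3)

// map keeps one output per input, so the result stays nested.
val nestedCopies = counts.map(n => List.fill(n)(n))
// nestedCopies == List(List(), List(1), List(2, 2), List(3, 3, 3))

// flatMap concatenates the per-element results; the empty list for 0 disappears.
val flatCopies = counts.flatMap(n => List.fill(n)(n))
// flatCopies == List(1, 2, 2, 3, 3, 3)
```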
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: List[String] = List(11, 21, 41, 51) // string concatenation
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
The function passed to map returns a U, whereas the function passed to flatMap returns a TraversableOnce[U] (that is, a collection).
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)
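A common use of this is parsing, where each input either produces a value or is skipped. The following is a hedged sketch with made-up input (raw); it relies on scala.util.Try to turn a failed conversion into None.

```scala
import scala.util.Try

val raw = List("1", "x", "3")

// Try(...).toOption is Some(n) when parsing succeeds and None when it throws.
val parsed: List[Int] = raw.flatMap(s => Try(s.toInt).toOption)
// parsed == List(1, 3)
```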
Can someone tell me why I am getting different results when using Tuple2[List,List] and List[List] as my Product in the code below? Specifically I would like to know why the second value of the list of lists gets wrapped in another list?
scala> val a = List(1,2,3)
a: List[Int] = List(1, 2, 3)
scala> val b = List(4,5,6)
b: List[Int] = List(4, 5, 6)
scala> val c = List(a,b)
c: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6))
scala> c.productIterator.foreach( println(_) )
List(1, 2, 3)
List(List(4, 5, 6)) // <-- Note this
scala> val d = (a,b)
d: (List[Int], List[Int]) = (List(1, 2, 3),List(4, 5, 6))
scala> d.productIterator.foreach( println(_) )
List(1, 2, 3)
List(4, 5, 6) // <-- Compared to this
(I have read the (absolutely minimal) description of Scala's Product and the productIterator method on http://www.scala-lang.org/api/current/index.html#scala.Product )
Basically, Tuple means a product between all of its elements, but a non-empty List is a product between its head and tail.
This happens because all case classes extend Product and, like tuples, represent a product of all their fields. A non-empty List is defined as a case class with exactly two fields, head and tail: final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B], so it inherits the default Product implementation that every case class gets.
You can observe more of this behaviour with other Lists with 1 or more than 2 elements:
scala> List(a).productIterator.foreach(println)
List(1, 2, 3)
List()
scala> List(a, a).productIterator.foreach(println)
List(1, 2, 3)
List(List(1, 2, 3))
scala> List(a, a, a).productIterator.foreach(println)
List(1, 2, 3)
List(List(1, 2, 3), List(1, 2, 3))
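The head/tail structure can also be checked directly. The sketch below matches on the cons cell explicitly, which is an assumption made here so that it does not depend on whether List itself is typed as a Product in your Scala version. The name fields is illustrative.

```scala
val a = List(1, 2, 3)
val b = List(4, 5, 6)
val c = List(a, b)

// A non-empty list is a :: cell; its product elements are the head and the tail.
val fields: List[Any] = c match {
  case cons: ::[List[Int]] => cons.productIterator.toList
  case Nil                 => Nil
}
// fields == List(c.head, c.tail), i.e. List(List(1, 2, 3), List(List(4, 5, 6)))
```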
Given the following list of tuples...
val list = List((1, 2), (1, 2), (1, 2))
... how do I sum all the values and obtain a single tuple like this?
(3, 6)
Using the foldLeft method. Please look at the scaladoc for more information.
scala> val list = List((1, 2), (1, 2), (1, 2))
list: List[(Int, Int)] = List((1,2), (1,2), (1,2))
scala> list.foldLeft((0, 0)) { case ((accA, accB), (a, b)) => (accA + a, accB + b) }
res0: (Int, Int) = (3,6)
Using unzip. Not as efficient as the above solution. Perhaps more readable.
scala> list.unzip match { case (l1, l2) => (l1.sum, l2.sum) }
res1: (Int, Int) = (3,6)
Very easy: (list.map(_._1).sum, list.map(_._2).sum).
You can solve this using Monoid.combineAll from the cats library:
import cats.instances.int._ // For monoid instances for `Int`
import cats.instances.tuple._ // for Monoid instance for `Tuple2`
import cats.Monoid.combineAll
def main(args: Array[String]): Unit = {
val list = List((1, 2), (1, 2), (1, 2))
val res = combineAll(list)
println(res)
// Prints:
// (3,6)
}
You can see more about this in the cats documentation or Scala with Cats.
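To show roughly what a monoid-based combineAll does, here is a hand-rolled sketch that uses no library. SimpleMonoid and combineAllSketch are hypothetical names invented for this illustration; they are not the cats API, and the real library is organised differently.

```scala
// A monoid supplies a neutral element and an associative combine operation.
trait SimpleMonoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object SimpleMonoid {
  // Int under addition.
  implicit val intAddition: SimpleMonoid[Int] = new SimpleMonoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  // A pair is a monoid whenever both of its components are.
  implicit def pair[A, B](implicit ma: SimpleMonoid[A], mb: SimpleMonoid[B]): SimpleMonoid[(A, B)] =
    new SimpleMonoid[(A, B)] {
      def empty: (A, B) = (ma.empty, mb.empty)
      def combine(x: (A, B), y: (A, B)): (A, B) =
        (ma.combine(x._1, y._1), mb.combine(x._2, y._2))
    }
}

// Fold the list, starting from the neutral element.
def combineAllSketch[A](xs: List[A])(implicit m: SimpleMonoid[A]): A =
  xs.foldLeft(m.empty)(m.combine)

val summed = combineAllSketch(List((1, 2), (1, 2), (1, 2)))
// summed == (3, 6)
```

The pair instance is derived from the Int instance, which is why no tuple-specific summing code is needed at the call site.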
I am answering this question while trying to understand the aggregate function in Spark.
scala> val list = List((1, 2), (1, 2), (1, 2))
list: List[(Int, Int)] = List((1,2), (1,2), (1,2))
scala> list.aggregate((0, 0))((x, y) => (x._1 + y._1, x._2 + y._2), (x, y) => (x._1 + y._1, x._2 + y._2))
res89: (Int, Int) = (3,6)
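The two function arguments of aggregate play different roles: the first folds elements into a per-partition accumulator, and the second merges accumulators from different partitions. The sketch below only mimics that on a local List by splitting it into chunks with grouped; it is not how Spark is implemented, and the names seqOp, combOp, partitions, partials and merged are illustrative.

```scala
val list = List((1, 2), (1, 2), (1, 2))

// Folds one element into the accumulator of a partition.
val seqOp = (acc: (Int, Int), x: (Int, Int)) => (acc._1 + x._1, acc._2 + x._2)
// Merges the accumulators of two partitions.
val combOp = (l: (Int, Int), r: (Int, Int)) => (l._1 + r._1, l._2 + r._2)

// Pretend the data lives in partitions of at most two elements.
val partitions = list.grouped(2).toList
val partials = partitions.map(_.foldLeft((0, 0))(seqOp))
// partials == List((2, 4), (1, 2))

val merged = partials.reduce(combOp)
// merged == (3, 6)
```

On a sequential List the merge function is never needed, which is why a mistake in it can go unnoticed until the same code runs on partitioned data.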
The SO Q&A that helped me understand and answer this: "Explain the aggregate functionality in Spark".
Scalaz solution (suggested by Travis in an answer that was deleted for some reason):
import scalaz._
import Scalaz._
val list = List((1, 2), (1, 2), (1, 2))
list.suml
which outputs
res0: (Int, Int) = (3,6)
You can also use reduce:
val list = List((1, 2), (1, 2), (1, 2))
val res = list.reduce((x, y) => (x._1 + y._1, x._2 + y._2))
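One caveat: reduce throws an exception on an empty list. If the list may be empty, a possible variation is reduceOption with a fallback value, sketched here with illustrative names (addPairs, fromEmpty, fromData).

```scala
val addPairs = (x: (Int, Int), y: (Int, Int)) => (x._1 + y._1, x._2 + y._2)

// reduceOption returns None for an empty list instead of throwing.
val fromEmpty = List.empty[(Int, Int)].reduceOption(addPairs).getOrElse((0, 0))
// fromEmpty == (0, 0)

val fromData = List((1, 2), (1, 2), (1, 2)).reduceOption(addPairs).getOrElse((0, 0))
// fromData == (3, 6)
```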
What's the best way to convert a List of Lists in scala (2.9)?
I have a list:
List[List[A]]
which I want to convert into
List[A]
How can that be achieved recursively? Or is there any other better way?
List has the flatten method. Why not use it?
List(List(1,2), List(3,4)).flatten
> List(1,2,3,4)
.flatten is obviously the easiest way, but for completeness you should also know about flatMap
val l = List(List(1, 2), List(3, 4))
println(l.flatMap(identity))
and the for-comprehension equivalent
println(for (list <- l; x <- list) yield x)
flatten is obviously a special case of flatMap, which can do so much more.
Given the above example, I'm not sure you need recursion. Looks like you want List.flatten instead.
e.g.
scala> List(1,2,3)
res0: List[Int] = List(1, 2, 3)
scala> List(4,5,6)
res1: List[Int] = List(4, 5, 6)
scala> List(res0,res1)
res2: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6))
scala> res2.flatten
res3: List[Int] = List(1, 2, 3, 4, 5, 6)
If your structure can be further nested, like:
List(List(1, 2, 3, 4, List(5, 6, List(7, 8))))
This function should give you the desired result:
def f[U](l: List[U]): List[U] = l match {
case Nil => Nil
case (x: List[U]) :: tail => f(x) ::: f(tail)
case x :: tail => x :: f(tail)
}
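For reference, here is a variant sketch of the same recursion typed on List[Any], which avoids matching on the erased type parameter. The name deepFlatten is illustrative, and the function is not tail-recursive, so it assumes the input is not deeply nested or very long.

```scala
def deepFlatten(xs: List[Any]): List[Any] = xs match {
  case Nil                      => Nil
  // A nested list: flatten it, then continue with the rest.
  case (inner: List[_]) :: tail => deepFlatten(inner) ::: deepFlatten(tail)
  // A plain element: keep it and continue.
  case x :: tail                => x :: deepFlatten(tail)
}

val deeplyNested = List(List(1, 2, 3, 4, List(5, 6, List(7, 8))))
val flat = deepFlatten(deeplyNested)
// flat == List(1, 2, 3, 4, 5, 6, 7, 8)
```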
You don't need recursion but you can use it if you want:
def flatten[A](list: List[List[A]]): List[A] =
  if (list.isEmpty) Nil
  else list.head ++ flatten(list.tail)
This works like the flatten method built into List. Example:
scala> flatten(List(List(1,2), List(3,4)))
res0: List[Int] = List(1, 2, 3, 4)
If you want to use flatMap, here is the way.
Suppose that you have a List of List[Int] named ll, and you want to flatten it to a List[Int].
Many people have already given you answers such as flatten, which is the easy way. I assume that you are asking how to do it with the flatMap method. If that is the case, here is the way:
ll.flatMap(_.map(o=>o))
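As a quick check, the sketch below puts the variants mentioned in this thread side by side on sample data (the value of ll is made up). Mapping each element to itself is the same as passing identity, so all four produce the same list.

```scala
val ll = List(List(1, 2), List(3, 4))

val viaMap = ll.flatMap(_.map(o => o))             // the form shown above
val viaIdentity = ll.flatMap(identity)             // same thing without the inner map
val viaFlatten = ll.flatten                        // the built-in shortcut
val viaFor = for (inner <- ll; x <- inner) yield x // for-comprehension form

// All four are List(1, 2, 3, 4).
```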