Traverse a list in Scala and group all the first elements and second elements

This might be a pretty simple question. I have a list named "List1" that contains integer pairs, as below.
List1 = List((1,2), (3,4), (9,8), (9,10))
Output should be:
r1 = (1,3,9,9) //List((1,2), (3,4), (9,8), (9,10))
r2 = (2,4,8,10) //List((1,2), (3,4), (9,8), (9,10))
Array r1 (Array[Int]) should contain the first integer of each pair in the list.
Array r2 (Array[Int]) should contain the second integer of each pair.

Just use unzip:
scala> List((1,2), (3,4), (9,8), (9,10)).unzip
res0: (List[Int], List[Int]) = (List(1, 3, 9, 9),List(2, 4, 8, 10))
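Since the question asks for arrays, here is a minimal sketch of the same idea with a conversion at the end. The names `pairs`, `firsts`, `seconds`, `r1`, and `r2` are only illustrative.

```scala
// Sample input from the question.
val pairs: List[(Int, Int)] = List((1, 2), (3, 4), (9, 8), (9, 10))

// unzip splits a list of pairs into a pair of lists.
val (firsts, seconds) = pairs.unzip

// Convert to Array[Int] only if an array is really needed.
val r1: Array[Int] = firsts.toArray
val r2: Array[Int] = seconds.toArray
```

If lists are acceptable, `firsts` and `seconds` can be used directly and the `toArray` calls dropped.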

Use foldLeft
val (alist, blist) = list1.foldLeft((List.empty[Int], List.empty[Int])) { (r, c) => (r._1 ++ List(c._1), r._2 ++ List(c._2))}
Scala REPL
scala> val list1 = List((1, 2), (3, 4), (5, 6))
list1: List[(Int, Int)] = List((1,2), (3,4), (5,6))
scala> val (alist, blist) = list1.foldLeft((List.empty[Int], List.empty[Int])) { (r, c) => (r._1 ++ List(c._1), r._2 ++ List(c._2))}
alist: List[Int] = List(1, 3, 5)
blist: List[Int] = List(2, 4, 6)
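Appending with `++ List(...)` copies the accumulated list on every step. A possible variation, shown here only as a sketch, folds from the right and prepends with `::` instead; the names `firstsAcc` and `secondsAcc` are made up for this example.

```scala
val list1: List[(Int, Int)] = List((1, 2), (3, 4), (5, 6))

// foldRight visits the last pair first, so prepending with :: rebuilds
// both lists in their original order without repeated appends.
val (alist, blist) =
  list1.foldRight((List.empty[Int], List.empty[Int])) {
    case ((a, b), (firstsAcc, secondsAcc)) => (a :: firstsAcc, b :: secondsAcc)
  }
```

For short lists the difference is unlikely to matter, and `unzip` remains the simplest option.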

Related

Scala grouping list into list tuples with one shared element

What would be short functional way to split list
List(1, 2, 3, 4, 5) into List((1,2), (2, 3), (3, 4), (4, 5))
(assuming you don't care if your nested pairs are Lists and not Tuples)
Scala collections have a sliding window function:
# val lazyWindow = List(1, 2, 3, 4, 5).sliding(2)
lazyWindow: Iterator[List[Int]] = non-empty iterator
To realize the collection:
# lazyWindow.toList
res1: List[List[Int]] = List(List(1, 2), List(2, 3), List(3, 4), List(4, 5))
You can even do more "funcy" windows, like of length 3 but with step 2:
# List(1, 2, 3, 4, 5).sliding(3,2).toList
res2: List[List[Int]] = List(List(1, 2, 3), List(3, 4, 5))
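If tuples are wanted after all, the windows can be converted afterwards. This is a small sketch, with `numbers` and `adjacentPairs` as illustrative names:

```scala
val numbers = List(1, 2, 3, 4, 5)

// sliding(2) yields List windows; collect keeps only the windows that
// have exactly two elements and turns each one into a tuple.
val adjacentPairs: List[(Int, Int)] =
  numbers.sliding(2).collect { case List(a, b) => (a, b) }.toList
```

Using `collect` rather than `map` means a single-element input produces an empty result instead of a match error.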
You can zip the list with its tail:
val list = List(1, 2, 3, 4, 5)
// list: List[Int] = List(1, 2, 3, 4, 5)
list zip list.tail
// res6: List[(Int, Int)] = List((1,2), (2,3), (3,4), (4,5))
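One thing to keep in mind is that `tail` throws on an empty list. A sketch of a variant that tolerates empty input, using `drop(1)` instead (the helper name `adjacent` is hypothetical):

```scala
// drop(1) returns an empty list for empty input, whereas tail would throw.
def adjacent[A](xs: List[A]): List[(A, A)] =
  xs.zip(xs.drop(1))
```

For example, `adjacent(List(1, 2, 3))` pairs each element with its successor, and `adjacent(Nil)` simply returns an empty list.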
I have always been a big fan of pattern matching. So you could also do:
val list = List(1, 2, 3, 4, 5, 6)
def splitList(list: List[Int], result: List[(Int, Int)] = List()): List[(Int, Int)] = {
  list match {
    case Nil => result
    case _ :: Nil => result
    // Extra parentheses make the appended element an explicit tuple.
    case x1 :: x2 :: ls => splitList(x2 :: ls, result :+ ((x1, x2)))
  }
}
splitList(list)
//List((1,2), (2,3), (3,4), (4,5), (5,6))

Error: type mismatch flatMap

I am new to Spark programming and Scala, and I am not able to understand the difference between map and flatMap.
I tried the code below expecting both calls to work, but got an error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As I understand it, flatMap turns an RDD of collections into an RDD of String/Int elements.
I expected both calls to work without error in this case. Please let me know where I am making the mistake.
Thanks
You need to look at how the signatures of these methods are defined:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
It expects a function from type T to a TraversableOnce[U], which is a trait that Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, i.e. if you had an RDD[List[List[Int]]] and wanted to produce an RDD[List[Int]], you could flatMap it using identity.
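The flattening described above can be sketched with plain Scala lists instead of RDDs, so it runs without Spark. The values `nested` and `flattened` are illustrative only.

```scala
// A list of lists of lists stands in for RDD[List[List[Int]]].
val nested: List[List[List[Int]]] =
  List(List(List(1, 2), List(3)), List(List(4)))

// flatMap with identity removes exactly one level of nesting.
val flattened: List[List[Int]] = nested.flatMap(identity)
```

Each element of `nested` is already a collection, which is why `identity` satisfies the function type that flatMap expects, while a bare tuple such as `(x, 1)` does not.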
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: List[String] = List(11, 21, 41, 51) // string concatenation
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
The function passed to map returns a single U, whereas the function passed to flatMap must return a TraversableOnce[U] (that is, a collection):
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)
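As a side note, the same result can be had with `flatten`, and `collect` allows a transformation in the same pass. This is just a sketch; `present` and `doubled` are names chosen for the example.

```scala
val c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)

// flatten drops the None entries and unwraps the Some values,
// giving the same result as c.flatMap(x => x).
val present: List[Int] = c.flatten

// collect keeps only the Some values and transforms them as it goes.
val doubled: List[Int] = c.collect { case Some(n) => n * 2 }
```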

Grouping a list

I want to group elements of a list such as :
val lst = List(1,2,3,4,5)
On transformation it should return a new list as:
val newlst = List(List(1), List(1,2), List(1,2,3), List(1,2,3,4), List(1,2,3,4,5))
You can do it this way:
lst.inits.toList.reverse.tail
(1 to lst.size map lst.take).toList should do it.
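Another way to build the growing prefixes, offered here as a sketch rather than a recommendation, is `scanLeft`, which keeps every intermediate accumulator. The name `prefixes` is illustrative.

```scala
val lst = List(1, 2, 3, 4, 5)

// scanLeft emits the seed followed by each intermediate result;
// tail drops the empty seed so only the non-empty prefixes remain.
val prefixes: List[List[Int]] =
  lst.scanLeft(List.empty[Int])(_ :+ _).tail
```

For an empty input this yields an empty list, since the only element produced is the seed that `tail` removes.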
Not as pretty or short as others, but gotta have some tail recursion for the soul:
import scala.annotation.tailrec

def createFromElements(list: List[Int]): List[List[Int]] = {
  @tailrec
  def createFromElements(l: List[Int], p: List[List[Int]]): List[List[Int]] =
    l match {
      case x :: xs =>
        // Extend the most recent prefix by x and push it onto the accumulator.
        createFromElements(xs, (p.headOption.getOrElse(List()) ++ List(x)) :: p)
      case Nil => p.reverse
    }
  createFromElements(list, Nil)
}
And now:
scala> createFromElements(List(1,2,3,4,5))
res10: List[List[Int]] = List(List(1), List(1, 2), List(1, 2, 3), List(1, 2, 3, 4), List(1, 2, 3, 4, 5))
Doing a foldLeft seems to be more efficient, though ugly:
(lst.foldLeft((List[List[Int]](), List[Int]()))((x,y) => {
val z = x._2 :+ y;
(x._1 :+ z, z)
}))._1

Scala generate unique pairs from list

Input :
val list = List(1, 2, 3, 4)
Desired output :
Iterator((1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4))
This code works :
for (cur1 <- 0 until list.size; cur2 <- (cur1 + 1) until list.size)
yield (list(cur1), list(cur2))
but it does not seem optimal. Is there a better way of doing it?
There's a .combinations method built-in:
scala> List(1,2,3,4).combinations(2).toList
res0: List[List[Int]] = List(List(1, 2), List(1, 3), List(1, 4), List(2, 3), List(2, 4), List(3, 4))
It returns an Iterator, but I added .toList just for the purpose of printing the result. If you want your results in tuple form, you can do:
scala> List(1,2,3,4).combinations(2).map{ case Seq(x, y) => (x, y) }.toList
res1: List[(Int, Int)] = List((1,2), (1,3), (1,4), (2,3), (2,4), (3,4))
You mentioned uniqueness as well, so you could apply .distinct to your input list if uniqueness isn't a precondition of your function, because .combinations will not remove repeated input values for you.
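To make the effect of repeated input values concrete, here is a small sketch; the list `withDuplicates` is made up for illustration.

```scala
val withDuplicates = List(1, 1, 2)

// Equal elements are treated as interchangeable, so List(1, 2) appears
// only once, but the pair of equal values List(1, 1) is still produced.
val raw: List[List[Int]] = withDuplicates.combinations(2).toList

// Applying distinct first leaves only pairs of different values.
val deduped: List[List[Int]] = withDuplicates.distinct.combinations(2).toList
```

Here `raw` contains `List(1, 1)` and `List(1, 2)`, while `deduped` contains only `List(1, 2)`.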
.combinations is the proper way to generate unique groups of any size. An alternative that does not check uniqueness in the first place is to use foldLeft like this:
val list = (1 to 10).toList
val header :: tail = list
tail.foldLeft((header, tail, List.empty[(Int, Int)])) {
case ((header, tail, res), elem) =>
(elem, tail.drop(1), res ++ tail.map(x => (header, x)))
}._3
Will produce:
res0: List[(Int, Int)] = List((1,2), (1,3), (1,4), (1,5), (1,6), (1,7), (1,8), (1,9), (1,10), (2,3), (2,4), (2,5), (2,6), (2,7), (2,8), (2,9), (2,10), (3,4), (3,5), (3,6), (3,7), (3,8), (3,9), (3,10), (4,5), (4,6), (4,7), (4,8), (4,9), (4,10), (5,6), (5,7), (5,8), (5,9), (5,10), (6,7), (6,8), (6,9), (6,10), (7,8), (7,9), (7,10), (8,9), (8,10), (9,10))
If you expect duplicates, you can turn the output list into a set and back into a list, but you will then lose the ordering. So this is not the recommended approach if you want uniqueness, but it should be preferred if you want to generate all pairs, including pairs of equal elements.
For example, I used it in machine learning to generate the products between each pair of variables in the feature space. If two or more variables have the same value, I still want to produce a new variable corresponding to their product, even though those newly generated "interaction variables" will contain duplicates.
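The same position-based pairing, where equal values are kept, can also be sketched with `tails`. This is a different technique from the foldLeft above, and the helper name `pairsByPosition` is hypothetical.

```scala
// Pair each element with every element that comes after it,
// regardless of whether the values are equal.
def pairsByPosition[A](xs: List[A]): List[(A, A)] =
  xs.tails.flatMap {
    case head :: rest => rest.map(other => (head, other))
    case Nil          => Nil
  }.toList
```

With an input such as `List(1, 1, 2)`, this keeps both `(1, 2)` pairs as well as `(1, 1)`, which is the behavior wanted for the interaction-variable use case.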

Add list of tuples of integers in Scala

I wish to add a list of tuples of integers i.e. given an input list of tuples of arity k, produce a tuple of arity k whose fields are sums of corresponding fields of the tuples in the list.
Input
List( (1,2,3), (2,3,-3), (1,1,1))
Output
(4, 6, 1)
I was trying to use foldLeft, but I am not able to get it to compile. Right now, I am using a for loop, but I was looking for a more concise solution.
This can be done type safely and very concisely using shapeless:
scala> import shapeless._, syntax.std.tuple._
import shapeless._
import syntax.std.tuple._
scala> val l = List((1, 2, 3), (2, 3, -1), (1, 1, 1))
l: List[(Int, Int, Int)] = List((1,2,3), (2,3,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 3)
Notice that unlike solutions which rely on casts, this approach is type safe, and any type errors are detected at compile time rather than at runtime:
scala> val l = List((1, 2, 3), (2, "foo", -1), (1, 1, 1))
l: List[(Int, Any, Int)] = List((1,2,3), (2,foo,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
<console>:15: error: could not find implicit value for parameter num: Numeric[Any]
l.map(_.toList).transpose.map(_.sum)
^
scala> val tuples = List( (1,2,3), (2,3,-3), (1,1,1))
tuples: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> tuples.map(t => t.productIterator.toList.map(_.asInstanceOf[Int])).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 1)
Type information is lost when calling productIterator on Tuple3 so you have to convert from Any back to an Int.
If the tuples are always going to contain the same type I would suggest using another collection such as List. The Tuple is better suited for disparate types. When you have the same types and don't lose the type information by using productIterator the solution is more elegant.
scala> val tuples = List(List(1,2,3), List(2,3,-3), List(1,1,1))
tuples: List[List[Int]] = List(List(1, 2, 3), List(2, 3, -3), List(1, 1, 1))
scala> tuples.transpose.map(_.sum)
res1: List[Int] = List(4, 6, 1)
scala> val list = List( (1,2,3), (2,3,-3), (1,1,1))
list: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> list.foldRight( (0, 0, 0) ){ case ((a, b, c), (a1, b1, c1)) => (a + a1, b + b1, c + c1) }
res0: (Int, Int, Int) = (4,6,1)
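Since the question mentions foldLeft specifically, here is a sketch of the same fold from the left for tuples of arity 3. Note that the accumulator comes first in the pattern; the names `triples`, `total`, `sa`, `sb`, and `sc` are illustrative.

```scala
val triples = List((1, 2, 3), (2, 3, -3), (1, 1, 1))

// With foldLeft the accumulator is the first argument and the
// current element the second, the reverse of foldRight.
val total: (Int, Int, Int) =
  triples.foldLeft((0, 0, 0)) {
    case ((sa, sb, sc), (a, b, c)) => (sa + a, sb + b, sc + c)
  }
```

Like the foldRight version, this returns `(0, 0, 0)` for an empty list, and it is tied to a fixed arity.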