Scala - how to sort tuples by both attributes in different orders?

I want to sort a List[(String, Int)] so that the Ints are first sorted in descending order and then the Strings are sorted alphabetically. With my current implementation I achieved sorting Ints as expected. But the Strings are sorted in reverse order. I suppose this is due to the reverse ordering applied to the whole tuple.
How should I correct this to get the Strings sorted alphabetically?
val list: List[(String, Int)] = List(("x", 1), ("a", 1), ("c", 1), ("a", 2), ("b", 2), ("b", 1), ("a", 5), ("c", 5))
val sortedList = list.sortBy(x => (x._2, x._1))(implicitly[Ordering[(Int, String)]].reverse)
// Prints List((c,5), (a,5), (b,2), (a,2), (x,1), (c,1), (b,1), (a,1))
println(sortedList)
Expected: List((a,5), (c,5), (a,2), (b,2), (a,1), (b,1), (c,1), (x,1))

scala> val sortedList = list.sortBy(x => (-x._2.toLong, x._1))
sortedList: List[(String, Int)] = List((a,5), (c,5), (a,2), (b,2), (a,1), (b,1), (c,1), (x,1))
The toLong conversion makes the negation work properly for arbitrary Int values, including Int.MinValue, whose negation overflows back to itself:
scala> Int.MinValue == -Int.MinValue
res0: Boolean = true
scala> Int.MinValue.toLong == -Int.MinValue.toLong
res1: Boolean = false
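To see why the conversion matters, here is a small sketch comparing plain negation with the toLong variant. The sample list xs is made up for illustration and is not taken from the question.

```scala
// Hypothetical sample data containing the problematic Int.MinValue.
val xs: List[(String, Int)] = List(("a", Int.MinValue), ("b", 0), ("c", 1))

// Plain Int negation: -Int.MinValue overflows back to Int.MinValue,
// so ("a", Int.MinValue) wrongly ends up first instead of last.
val naive = xs.sortBy(x => (-x._2, x._1))

// Widening to Long before negating keeps the descending order intact.
val safe = xs.sortBy(x => (-x._2.toLong, x._1))
```

With this input, naive puts the smallest value first, while safe lists the values from largest to smallest.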
For fewer allocations and better runtime efficiency, consider using sorted with a custom ordering function:
scala> :paste
// Entering paste mode (ctrl-D to finish)
list.sorted((x: (String, Int), y: (String, Int)) => {
if (y._2 > x._2) 1
else if (y._2 < x._2) -1
else x._1.compareTo(y._1)
})
// Exiting paste mode, now interpreting.
res2: List[(String, Int)] = List((a,5), (c,5), (a,2), (b,2), (a,1), (b,1), (c,1), (x,1))
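Another option, sketched here on the assumption that composing standard-library orderings is acceptable, is to keep the original sortBy key and reverse only the Int component instead of the whole tuple. The name countDescNameAsc is made up for this example.

```scala
val list: List[(String, Int)] =
  List(("x", 1), ("a", 1), ("c", 1), ("a", 2), ("b", 2), ("b", 1), ("a", 5), ("c", 5))

// Tuple ordering built from a reversed Int ordering (descending numbers)
// and the default String ordering (alphabetical).
val countDescNameAsc: Ordering[(Int, String)] =
  Ordering.Tuple2(Ordering.Int.reverse, Ordering.String)

// Same key as in the question, but with the composed ordering passed explicitly.
val sortedList = list.sortBy(x => (x._2, x._1))(countDescNameAsc)
```

This avoids negation entirely, so there is no overflow concern for Int.MinValue, and it also works for key types that cannot be negated.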

Related

How to merge two Seq[String], Seq[Double] to Seq[(String,Double)]

I have two Seqs.
One is a Seq[String] and the other is a Seq[Double]:
a -> ["a","b","c"] and
b-> [1,2,3]
I want to create output as
[("a",1),("b",2),("c",3)]
My code, a.zip(b), creates a Seq of pairs instead of a Map.
Can anyone suggest how to do that in Scala?
You simply need .toMap, which transforms a List[(String, Int)] into a Map[String, Int]:
scala> val seq1 = List("a", "b", "c")
seq1: List[String] = List(a, b, c)
scala> val seq2 = List(1, 2, 3)
seq2: List[Int] = List(1, 2, 3)
scala> seq1.zip(seq2)
res0: List[(String, Int)] = List((a,1), (b,2), (c,3))
scala> seq1.zip(seq2).toMap
res1: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
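Two caveats may be worth checking before relying on zip followed by toMap: zip stops at the shorter sequence, and toMap keeps only the last value for a repeated key. A minimal sketch with made-up values (names and values are hypothetical):

```scala
// Hypothetical inputs: a repeated key "a" and one extra value.
val names = Seq("a", "b", "a")
val values = Seq(1.0, 2.0, 3.0, 4.0)

// zip pairs elements by position and drops the unmatched 4.0.
val pairs: Seq[(String, Double)] = names.zip(values)

// toMap keeps the last pair seen for each key, so "a" maps to 3.0.
val asMap: Map[String, Double] = pairs.toMap
```

If every pair has to be preserved, staying with the Seq of tuples is probably the safer choice.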
also see
How to convert a Seq[A] to a Map[Int, A] using a value of A as the key in the map?

traverse list in scala and group all the first elements and second elements

This might be a pretty simple question. I have a list named "List1" that contains integer pairs, as below.
List1 = List((1,2), (3,4), (9,8), (9,10))
Output should be:
r1 = (1,3,9,9) //List((1,2), (3,4), (9,8), (9,10))
r2 = (2,4,8,10) //List((1,2), (3,4), (9,8), (9,10))
Array r1 (Array[Int]) should contain the first integer of each pair in the list.
Array r2 (Array[Int]) should contain the second integer of each pair.
Just use unzip:
scala> List((1,2), (3,4), (9,8), (9,10)).unzip
res0: (List[Int], List[Int]) = (List(1, 3, 9, 9),List(2, 4, 8, 10))
Alternatively, use foldLeft:
val (alist, blist) = list1.foldLeft((List.empty[Int], List.empty[Int])) { (r, c) => (r._1 ++ List(c._1), r._2 ++ List(c._2))}
Scala REPL
scala> val list1 = List((1, 2), (3, 4), (5, 6))
list1: List[(Int, Int)] = List((1,2), (3,4), (5,6))
scala> val (alist, blist) = list1.foldLeft((List.empty[Int], List.empty[Int])) { (r, c) => (r._1 ++ List(c._1), r._2 ++ List(c._2))}
alist: List[Int] = List(1, 3, 5)
blist: List[Int] = List(2, 4, 6)
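Because ++ on a List copies its left operand, the foldLeft version does more copying as the list grows. One possible alternative, shown only as a sketch, is foldRight with prepending, followed by toArray since the question asks for arrays. The names firsts, seconds, r1 and r2 are chosen for this example.

```scala
val list1 = List((1, 2), (3, 4), (9, 8), (9, 10))

// foldRight visits the pairs from the right, so prepending with ::
// rebuilds both lists in their original order.
val (firsts, seconds) =
  list1.foldRight((List.empty[Int], List.empty[Int])) {
    case ((a, b), (xs, ys)) => (a :: xs, b :: ys)
  }

// Convert to arrays, as requested in the question.
val r1: Array[Int] = firsts.toArray
val r2: Array[Int] = seconds.toArray
```

For most cases unzip remains the shortest way to get the same two lists.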

Error: type mismatch flatMap

I am new to Spark programming and Scala, and I am not able to understand the difference between map and flatMap.
I tried the code below, expecting both calls to work, but got an error.
scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))
scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))
As per my understanding, flatMap turns an RDD into a collection for a String/Int RDD.
I was thinking that in this case both should work without any error. Please let me know where I am making the mistake.
Thanks
You need to look at how the signatures of these methods are defined:
def map[U: ClassTag](f: T => U): RDD[U]
map takes a function from type T to type U and returns an RDD[U].
On the other hand, flatMap:
def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]
Expects a function from type T to a TraversableOnce[U], a trait that Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, e.g. if you have an RDD[List[List[Int]]] and want to produce an RDD[List[Int]], you can flatMap it using identity.
map(func) Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
The following example might be helpful.
scala> val b = List("1", "2", "4", "5")
b: List[String] = List(1, 2, 4, 5)
scala> b.map(x=>Set(x,1))
res69: List[scala.collection.immutable.Set[Any]] =
List(Set(1, 1), Set(2, 1), Set(4, 1), Set(5, 1))
scala> b.flatMap(x=>Set(x,1))
res70: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x,1))
res71: List[Any] = List(1, 1, 2, 1, 4, 1, 5, 1)
scala> b.flatMap(x=>List(x+1))
res75: List[String] = List(11, 21, 41, 51) // string concatenation: "1" + 1 is "11"
scala> val x = sc.parallelize(List("aa bb cc dd", "ee ff gg hh"), 2)
scala> val y = x.map(x => x.split(" ")) // split(" ") returns an array of words
scala> y.collect
res0: Array[Array[String]] = Array(Array(aa, bb, cc, dd), Array(ee, ff, gg, hh))
scala> val y = x.flatMap(x => x.split(" "))
scala> y.collect
res1: Array[String] = Array(aa, bb, cc, dd, ee, ff, gg, hh)
The function passed to map returns a U, whereas the function passed to flatMap returns a TraversableOnce[U] (that is, a collection):
val b = List("1", "2", "4", "5")
val mapRDD = b.map { input => (input, 1) }
mapRDD.foreach(f => println(f._1 + " " + f._2))
val flatmapRDD = b.flatMap { input => List((input, 1)) }
flatmapRDD.foreach(f => println(f._1 + " " + f._2))
map does a 1-to-1 transformation, while flatMap converts a list of lists to a single list:
scala> val b = List(List(1,2,3), List(4,5,6), List(7,8,90))
b: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 90))
scala> b.map(x => (x,1))
res1: List[(List[Int], Int)] = List((List(1, 2, 3),1), (List(4, 5, 6),1), (List(7, 8, 90),1))
scala> b.flatMap(x => x)
res2: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 90)
Also, flatMap is useful for filtering out None values if you have a list of Options:
scala> val c = List(Some(1), Some(2), None, Some(3), Some(4), None)
c: List[Option[Int]] = List(Some(1), Some(2), None, Some(3), Some(4), None)
scala> c.flatMap(x => x)
res3: List[Int] = List(1, 2, 3, 4)
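The same idea applies when the Option is produced inside the function rather than stored in the list. A small sketch, assuming made-up input strings and using scala.util.Try to turn a failed parse into None:

```scala
import scala.util.Try

// Hypothetical input where one element is not a valid number.
val raw = List("1", "two", "3")

// Each element maps to Some(n) or None; flatMap keeps only the Some values.
val parsed: List[Int] = raw.flatMap(s => Try(s.toInt).toOption)
```

Here map would instead give a List[Option[Int]] with a None in the middle.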

Understanding Scala code: (-_._2)

Help me understand this Scala code:
sortBy(-_._2)
I understand that the first underscore (_) is a placeholder. I understand that _2 means the second member of a Tuple.
But what does a minus (-) stand for in this code?
It reverses the order (i.e. descending): you sort by the negated second field of the tuple.
The underscore is an anonymous parameter, so -_ is basically the same as x => -x
Some examples in plain Scala:
scala> List(1,2,3).sortBy(-_)
res0: List[Int] = List(3, 2, 1)
scala> List("a"->1,"b"->2, "c"->3).sortBy(-_._2)
res1: List[(String, Int)] = List((c,3), (b,2), (a,1))
scala> List(1,2,3).sortBy(x => -x)
res2: List[Int] = List(3, 2, 1)
sortBy sorts in ascending order by default. To invert the order, a minus sign (-) can be prepended, as already explained by @TrustNoOne.
So sortBy(-_._2) sorts by the second value of a Tuple2 but in reverse order.
A longer example:
scala> Map("a"->1,"b"->2, "c"->3).toList.sortBy(-_._2)
res1: List[(String, Int)] = List((c,3), (b,2), (a,1))
is the same as
scala> Map("a"->1,"b"->2, "c"->3).toList sortBy { case (key,value) => - value }
res1: List[(String, Int)] = List((c,3), (b,2), (a,1))
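Negation only works for numeric keys. As a sketch of an alternative that also covers non-numeric keys, the ordering itself can be reversed and passed explicitly to sortBy (the value names here are made up):

```scala
val pairs = List("a" -> 1, "b" -> 2, "c" -> 3)

// Descending by the Int field, using negation as in the question.
val viaNegation = pairs.sortBy(-_._2)

// Same result with a reversed ordering instead of a negated key.
val viaReverse = pairs.sortBy(_._2)(Ordering[Int].reverse)

// Strings cannot be negated, but their ordering can be reversed.
val byNameDesc = pairs.sortBy(_._1)(Ordering[String].reverse)
```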

Add list of tuples of integers in Scala

I wish to add a list of tuples of integers i.e. given an input list of tuples of arity k, produce a tuple of arity k whose fields are sums of corresponding fields of the tuples in the list.
Input
List( (1,2,3), (2,3,-3), (1,1,1))
Output
(4, 6, 1)
I was trying to use foldLeft, but I am not able to get it to compile. Right now, I am using a for loop, but I was looking for a more concise solution.
This can be done type safely and very concisely using shapeless,
scala> import shapeless._, syntax.std.tuple._
import shapeless._
import syntax.std.tuple._
scala> val l = List((1, 2, 3), (2, 3, -1), (1, 1, 1))
l: List[(Int, Int, Int)] = List((1,2,3), (2,3,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 3)
Notice that unlike solutions which rely on casts, this approach is type safe, and any type errors are detected at compile time rather than at runtime:
scala> val l = List((1, 2, 3), (2, "foo", -1), (1, 1, 1))
l: List[(Int, Any, Int)] = List((1,2,3), (2,foo,-1), (1,1,1))
scala> l.map(_.toList).transpose.map(_.sum)
<console>:15: error: could not find implicit value for parameter num: Numeric[Any]
l.map(_.toList).transpose.map(_.sum)
^
scala> val tuples = List( (1,2,3), (2,3,-3), (1,1,1))
tuples: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> tuples.map(t => t.productIterator.toList.map(_.asInstanceOf[Int])).transpose.map(_.sum)
res0: List[Int] = List(4, 6, 1)
Type information is lost when calling productIterator on Tuple3 so you have to convert from Any back to an Int.
If the tuples are always going to contain the same type I would suggest using another collection such as List. The Tuple is better suited for disparate types. When you have the same types and don't lose the type information by using productIterator the solution is more elegant.
scala> val tuples = List(List(1,2,3), List(2,3,-3), List(1,1,1))
tuples: List[List[Int]] = List(List(1, 2, 3), List(2, 3, -3), List(1, 1, 1))
scala> tuples.transpose.map(_.sum)
res1: List[Int] = List(4, 6, 1)
scala> val list = List( (1,2,3), (2,3,-3), (1,1,1))
list: List[(Int, Int, Int)] = List((1,2,3), (2,3,-3), (1,1,1))
scala> list.foldRight( (0, 0, 0) ){ case ((a, b, c), (a1, b1, c1)) => (a + a1, b + b1, c + c1) }
res0: (Int, Int, Int) = (4,6,1)
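Since the question mentions foldLeft, here is a sketch of the same idea with foldLeft for tuples of arity 3. The only difference from foldRight is that the accumulator is the first element of the pattern. The name total is chosen for this example.

```scala
val tuples = List((1, 2, 3), (2, 3, -3), (1, 1, 1))

// The accumulator (a, b, c) comes first in foldLeft; (x, y, z) is the current tuple.
val total: (Int, Int, Int) =
  tuples.foldLeft((0, 0, 0)) {
    case ((a, b, c), (x, y, z)) => (a + x, b + y, c + z)
  }
```

This stays fully typed, but it is fixed to arity 3. For arbitrary arity k, the shapeless or productIterator approaches are more general.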