For comprehension with multiple generators to handle a Seq - Scala

I'm new to Scala, and I want to make a Seq[(Int,Int)] unique by the first component. My code is as follows:
val seq = Seq((1,1), (0,1), (2,1), (0, 1), (3,1), (2,1))
val prev = -1
val uniqueSeq = for(tuple <- seq.sortBy(_._1) if !tuple._1.equals(prev); prev = tuple._1) yield tuple
But why is the result the following?
uniqueSeq: Seq[(Int, Int)] = List((0,1), (0,1), (1,1), (2,1), (2,1), (3,1))

I would take a different approach:
It is a good idea to group them first. Then you can get the head of each of the groups:
seq.groupBy {
  case (x, _) => x
}.map {
  case (_, head :: _) => head
}.toList
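One caveat (my addition, not part of the original answer): groupBy returns an unordered Map, so the order of the result above is not guaranteed. If you want the output sorted by key, you can append a sortBy. A sketch:

```scala
val seq = Seq((1,1), (0,1), (2,1), (0,1), (3,1), (2,1))
val unique = seq.groupBy { case (x, _) => x } // Map: key -> all tuples with that key
  .map { case (_, group) => group.head }      // keep one tuple per key
  .toList
  .sortBy(_._1)                               // restore ordering lost by the Map
// unique == List((0,1), (1,1), (2,1), (3,1))
```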

The prev in prev = tuple._1 is a completely different variable from val prev = -1! Note that it compiles even though the first prev is a val, i.e. immutable (it can't be changed).
If you want to use this approach, you can:
val seq = Seq((1,1), (0,1), (2,1), (0, 1), (3,1), (2,1))
var prev = -1
val uniqueSeq = for(tuple <- seq.sortBy(_._1) if !tuple._1.equals(prev)) yield { prev = tuple._1; tuple }
but it isn't the idiomatic one in Scala. I'll leave that to someone else, since I don't have enough time right now.
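One immutable option (my own sketch, not part of the original answer) replaces the var with a foldLeft that carries the accumulated list and compares each tuple against the last one kept:

```scala
val seq = Seq((1,1), (0,1), (2,1), (0,1), (3,1), (2,1))
val uniqueSeq = seq.sortBy(_._1).foldLeft(List.empty[(Int, Int)]) {
  case (acc, t) if acc.headOption.exists(_._1 == t._1) => acc // same key as last kept: skip
  case (acc, t)                                        => t :: acc
}.reverse // foldLeft prepends, so reverse to restore sorted order
// uniqueSeq == List((0,1), (1,1), (2,1), (3,1))
```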

Alexey already explained the mistake you're making with the prev variable.
A more idiomatic implementation of what you're trying to do (if I got it right) is
val seq = Seq((1,1), (0,1), (2,1), (0, 1), (3,1), (2,1))
seq.sortBy(_._1).reverse.toMap.toList // List((0,1), (1,1), (2,1), (3,1))
The caveat is that, going through a Map, the duplicate keys disappear.
The reverse is necessary, since the last occurrence of a "key" is the one preserved in the Map.
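For completeness (my addition): since Scala 2.13, Seq has a distinctBy method that keeps the first occurrence per key, so no Map detour and no reverse are needed:

```scala
val seq = Seq((1,1), (0,1), (2,1), (0, 1), (3,1), (2,1))
// distinctBy keeps the first element seen for each key and preserves order
val uniqueSeq = seq.sortBy(_._1).distinctBy(_._1)
// uniqueSeq == List((0,1), (1,1), (2,1), (3,1))
```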

Related

How can I split a list of tuples in Scala

I have this list in Scala (which in reality has length 500):
List((1,List(1,2,3)), (2,List(1,2,3)), (3, List(1,2,3)))
What could I do so that I can make a new list which contains the following:
List((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3))
Basically I want to have a new list of tuples which contains the first element of the old tuple paired with each element of the list inside that tuple. I am not sure how to start implementing this, which is why I have posted no code to show my attempt. I am really sorry, but I can't grasp this. I appreciate any help you can provide.
Exactly the same as @Andriy's answer but using a for comprehension, which in the end desugars to the same thing but is more readable IMHO.
val result = for {
  (x, ys) <- xs
  y <- ys
} yield (x, y) // You can also use x -> y
(Again, I would recommend you follow a tutorial; this is a basic exercise, and if you understood how map & flatMap work you shouldn't have any problem.)
scala> val xs = List((1,List(1,2,3)), (2,List(1,2,3)), (3, List(1,2,3)))
xs: List[(Int, List[Int])] = List((1,List(1, 2, 3)), (2,List(1, 2, 3)), (3,List(1, 2, 3)))
scala> xs.flatMap { case (x, ys) => ys.map(y => (x, y)) }
res0: List[(Int, Int)] = List((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3))
It's probably worth mentioning that the solution by Andriy Plokhotnyuk can also be re-written as a for-comprehension:
val list = List((1,List(1,2,3)), (2,List(1,2,3)), (3, List(1,2,3)))
val pairs = for {
  (n, nestedList) <- list
  m <- nestedList
} yield (n, m)
assert(pairs == List((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)))
The compiler will effectively re-write the for-comprehension to a flatMap/map chain as described in another answer.

How to get the indices of the duplicate pairs in a Scala list

I have a Scala list like the one below:
val slist = List("a","b","c","a","d","c","a")
I want to get the index pairs of equal elements in this list.
For example, the result for this slist is
(0,3),(0,6),(3,6),(2,5)
where (0,3) means slist(0) == slist(3),
(0,6) means slist(0) == slist(6),
and so on.
So is there any method to do this in Scala?
Thanks very much.
There are simpler approaches, but starting with zipWithIndex leads down this path. zipWithIndex pairs each letter with its index in a Tuple2. From there we groupBy the letter to get a map from each letter to its indexed occurrences and filter for letters with more than one occurrence. At that point the values are MapLike.DefaultValuesIterable(List((a,0), (a,3), (a,6)), List((c,2), (c,5))),
from which we take the indices and make combinations.
scala> slist.zipWithIndex.groupBy(zipped => zipped._1).filter(t => t._2.size > 1).values.flatMap(xs => xs.map(t => t._2).combinations(2))
res40: Iterable[List[Int]] = List(List(0, 3), List(0, 6), List(3, 6), List(2, 5))
Indexing a List is rather inefficient so I recommend a transition to Vector and then back again (if needed).
val svec = slist.toVector
svec.indices
.map(x => (x,svec.indexOf(svec(x),x+1)))
.filter(_._2 > 0)
.toList
//res0: List[(Int, Int)] = List((0,3), (2,5), (3,6))
val v = slist.toVector
val s = v.size
for (i <- 0 to s - 1; j <- 0 to s - 1; if i < j && v(i) == v(j)) yield (i, j)
In Scala REPL:
scala> for(i<-0 to s-1;j<-0 to s-1;if(i<j && v(i)==v(j))) yield (i,j)
res34: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((0,3), (0,6), (2,5), (3,6))
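The one-liner from the first answer can also be written in a more readable multi-line form (my own sketch; a final sorted is added because Map iteration order is not guaranteed, so the pairs come out in key order rather than the order shown in the question):

```scala
val slist = List("a", "b", "c", "a", "d", "c", "a")
val pairs = slist.zipWithIndex
  .groupBy(_._1)      // letter -> List((letter, index), ...)
  .values
  .filter(_.size > 1) // keep only letters that repeat
  .flatMap(_.map(_._2).combinations(2).map { case List(i, j) => (i, j) })
  .toList
  .sorted             // deterministic order, since Map ordering is unspecified
// pairs == List((0,3), (0,6), (2,5), (3,6))
```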

Calculation on consecutive array elements

I have this:
val myInput:ArrayBuffer[(String,String)] = ArrayBuffer(
(a,timestampAStr),
(b,timestampBStr),
...
)
I would like to calculate the duration between each two consecutive timestamps from myInput and retrieve those like the following:
val myOutput = ArrayBuffer(
(a,durationFromTimestampAToTimestampB),
(b,durationFromTimestampBToTimestampC),
...
)
This is a paired evaluation, which led me to think something with foldLeft() might do the trick, but after giving this a little more thought, I could not come up with a solution.
I have already put something together with some for loops and .indices, but this does not seem as clean and concise as it could be. I would appreciate if somebody had a better option.
You can use zip and sliding to achieve what you want. For example, if you have a collection
scala> List(2,3,5,7,11)
res8: List[Int] = List(2, 3, 5, 7, 11)
The list of differences is res8.sliding(2).map{case List(fst,snd)=>snd-fst}.toList, which you can zip with the original list.
scala> res8.zip(res8.sliding(2).map{case List(fst,snd)=>snd-fst}.toList)
res13: List[(Int, Int)] = List((2,1), (3,2), (5,2), (7,4))
You can zip your array with itself after dropping the first item (matching each item with the one that follows it), and then map to the calculated result:
val myInput:ArrayBuffer[(String,String)] = ArrayBuffer(
("a","1000"),
("b","1500"),
("c","2500")
)
val result: ArrayBuffer[(String, Int)] = myInput.zip(myInput.drop(1)).map {
case ((k1, v1), (k2, v2)) => (k1, v2.toInt - v1.toInt)
}
result.foreach(println)
// prints:
// (a,500)
// (b,1000)
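A third option (my own sketch, not from the answers above) uses sliding(2), which pairs each element with its successor directly and drops the unpaired last element for free:

```scala
import scala.collection.mutable.ArrayBuffer

val myInput: ArrayBuffer[(String, String)] = ArrayBuffer(
  ("a", "1000"),
  ("b", "1500"),
  ("c", "2500")
)
// sliding(2) yields each consecutive window of two elements
val result = myInput.sliding(2).collect {
  case Seq((k1, t1), (_, t2)) => (k1, t2.toInt - t1.toInt)
}.toList
// result == List(("a", 500), ("b", 1000))
```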

How to access second element in a Sequence in scala

val k = Seq((0,1),(1,2),(2,3),(3,4))
k: Seq[(Int, Int)] = List((0,1), (1,2), (2,3), (3,4))
Given the above statement, I need to do addition at even positions and subtraction at odd positions; how can I access them? To be clear:
(0,1) has to become (0,(1+2))
(1,2) has to become (1,(1-2))
(2,3) has to become (2,(3+4))
(3,4) has to become (3,(3-4))
Do you mean something like this?
val transformed = k.grouped(2).flatMap{
case Seq((i, x), (j, y)) => Seq((i, x + y), (j, x - y))
}
transformed.toList
// List[(Int, Int)] = List((0,3), (1,-1), (2,7), (3,-1))
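Note (my addition): grouped(2) on a sequence of odd length produces a final window of size 1, which the pattern above would not match. A sketch that handles a trailing unpaired element, assuming it should pass through unchanged:

```scala
val k = Seq((0,1), (1,2), (2,3), (3,4), (4,5))
val transformed = k.grouped(2).flatMap {
  case Seq((i, x), (j, y)) => Seq((i, x + y), (j, x - y))
  case single              => single // trailing unpaired element is left as-is
}.toList
// transformed == List((0,3), (1,-1), (2,7), (3,-1), (4,5))
```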

Spark: produce RDD[(X, X)] of all possible combinations from RDD[X]

Is it possible in Spark to implement the .combinations function from Scala collections?
/** Iterates over combinations.
 *
 *  @return An Iterator which traverses the possible n-element combinations of this $coll.
 *  @example `"abbbc".combinations(2) = Iterator(ab, ac, bb, bc)`
 */
For example, how can I get from RDD[X] to RDD[List[X]] or RDD[(X,X)] for combinations of size 2? And let's assume that all values in the RDD are unique.
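For reference, this is what combinations produces on a plain Scala collection (no Spark involved; the answers below aim to reproduce this for an RDD):

```scala
val xs = List(1, 2, 3, 4)
// combinations(2) yields each 2-element subset once, in order of first appearance
val pairs = xs.combinations(2).map { case List(a, b) => (a, b) }.toList
// pairs == List((1,2), (1,3), (1,4), (2,3), (2,4), (3,4))
```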
Cartesian product and combinations are two different things: the cartesian product will create an RDD of size rdd.size() ^ 2, while combinations will create an RDD of size rdd.size() choose 2.
val rdd = sc.parallelize(1 to 5)
val combinations = rdd.cartesian(rdd).filter { case (a, b) => a < b }
combinations.collect()
Note this will only work if an ordering is defined on the elements of the list, since we use <. This one only works for choosing two, but it can easily be extended by making sure the relationship a < b holds for all a and b in each combination.
This is supported natively by a Spark RDD with the cartesian transformation.
e.g.:
val rdd = sc.parallelize(1 to 5)
val cartesian = rdd.cartesian(rdd)
cartesian.collect
Array[(Int, Int)] = Array((1,1), (1,2), (1,3), (1,4), (1,5),
(2,1), (2,2), (2,3), (2,4), (2,5),
(3,1), (3,2), (3,3), (3,4), (3,5),
(4,1), (4,2), (4,3), (4,4), (4,5),
(5,1), (5,2), (5,3), (5,4), (5,5))
As discussed, cartesian will give you n^2 elements of the cartesian product of the RDD with itself.
This algorithm computes the combinations (n,2) of an RDD without having to compute the n^2 elements first. (String is used as the element type; generalizing to a type T takes some plumbing with ClassTags that would obscure the purpose here.)
This is probably less time efficient than cartesian plus filtering, due to the iterative count and take actions that force computation of the RDD, but it is more space efficient, as it calculates only the C(n,2) = n!/(2!*(n-2)!) = n*(n-1)/2 elements instead of the n^2 elements of the cartesian product.
import org.apache.spark.rdd._

def combs(rdd: RDD[String]): RDD[(String, String)] = {
  val count = rdd.count // compute once; each call to count triggers a Spark job
  if (count < 2) {
    sc.makeRDD[(String, String)](Seq.empty)
  } else if (count == 2) {
    val values = rdd.collect
    sc.makeRDD[(String, String)](Seq((values(0), values(1))))
  } else {
    val elem = rdd.take(1)
    val elemRdd = sc.makeRDD(elem)
    val subtracted = rdd.subtract(elemRdd)
    val comb = subtracted.map(e => (elem(0), e)) // pair the chosen element with the rest
    comb.union(combs(subtracted))                // recurse on the remainder
  }
}
This creates all combinations (n, 2) and works for any RDD without requiring any ordering on the elements of RDD.
val rddWithIndex = rdd.zipWithIndex
rddWithIndex.cartesian(rddWithIndex).filter{case(a, b) => a._2 < b._2}.map{case(a, b) => (a._1, b._1)}
a._2 and b._2 are the indices, while a._1 and b._1 are the elements of the original RDD.
Example:
Note that no ordering is defined on the maps here.
val m1 = Map('a' -> 1, 'b' -> 2)
val m2 = Map('c' -> 3, 'a' -> 4)
val m3 = Map('e' -> 5, 'c' -> 6, 'b' -> 7)
val rdd = sc.makeRDD(Array(m1, m2, m3))
val rddWithIndex = rdd.zipWithIndex
rddWithIndex.cartesian(rddWithIndex).filter{case(a, b) => a._2 < b._2}.map{case(a, b) => (a._1, b._1)}.collect
Output:
Array((Map(a -> 1, b -> 2),Map(c -> 3, a -> 4)), (Map(a -> 1, b -> 2),Map(e -> 5, c -> 6, b -> 7)), (Map(c -> 3, a -> 4),Map(e -> 5, c -> 6, b -> 7)))