Scala generate unique pairs from list - scala

Input :
val list = List(1, 2, 3, 4)
Desired output :
Iterator((1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4))
This code works :
for (cur1 <- 0 until list.size; cur2 <- (cur1 + 1) until list.size)
yield (list(cur1), list(cur2))
but it not seems optimal, is there any better way of doing it ?

There's a .combinations method built-in:
scala> List(1,2,3,4).combinations(2).toList
res0: List[List[Int]] = List(List(1, 2), List(1, 3), List(1, 4), List(2, 3), List(2, 4), List(3, 4))
It returns an Iterator, but I added .toList just for the purpose of printing the result. If you want your results in tuple form, you can do:
scala> List(1,2,3,4).combinations(2).map{ case Seq(x, y) => (x, y) }.toList
res1: List[(Int, Int)] = List((1,2), (1,3), (1,4), (2,3), (2,4), (3,4))
You mentioned uniqueness as well, so you could apply .distinct to your input list uniqueness isn't a precondition of your function, because .combination will not deduplicate for you.

.combinations is the proper way for generating unique arbitrary groups of any size, another alternative solution that does not check the uniqueness in first place is using foldLeft that way:
val list = (1 to 10).toList
val header :: tail = list
tail.foldLeft((header, tail, List.empty[(Int, Int)])) {
case ((header, tail, res), elem) =>
(elem, tail.drop(1), res ++ tail.map(x => (header, x)))
}._3
Will produce:
res0: List[(Int, Int)] = List((1,2), (1,3), (1,4), (1,5), (1,6), (1,7), (1,8), (1,9), (1,10), (2,3), (2,4), (2,5), (2,6), (2,7), (2,8), (2,9), (2,10), (3,4), (3,5), (3,6), (3,7), (3,8), (3,9), (3,10), (4,5), (4,6), (4,7), (4,8), (4,9), (4,10), (5,6), (5,7), (5,8), (5,9), (5,10), (6,7), (6,8), (6,9), (6,10), (7,8), (7,9), (7,10), (8,9), (8,10), (9,10))
If you expect there to be duplicates then you can turn the output list into a set and bring it back into a list, but you will lose the ordering then. Thus not the recommended way if you want to have uniqueness, but should be preferred if you want to generate all of the pairs included equal elements.
E.g. I used it in the field of machine learning for generating all of the products between each pair of variables in the feature space and if two or more variables have the same value I still want to produce a new variable corresponding to their product even though those newly generated "interaction variables" will have duplicates.

Related

spark iterate consecutive elements in list to rows

I have a spark rdd with a column like
List(1, 3, 4, 8)
List(2, 3)
List(1, 5, 6)
I would like to get a new rdd with consecutive elements in each list to rows, like
(1, 3)
(3, 4)
(4, 8)
(2, 3)
(1, 5)
(5, 6)
How can I achieve this with scala?
Consider:
using a complementary (plain Scala) function with signature List[Int] => List[(Int, Int)] to achieve the desired result for the single list
and
passing this function to your RDD's flatMap method.
This complementary function may look like this:
def makeTuples(l: List[Int],
acc: List[(Int, Int)] = List.empty): List[(Int, Int)] =
l match {
case Nil | _ :: Nil => acc.reverse
case a :: b :: rest => makeTuples(b :: rest, (a, b) :: acc)
}

How can I split a list of tuples scala

I have this list in Scala (which in reality has length 500):
List((1,List(1,2,3)), (2,List(1,2,3)), (3, List(1,2,3)))
What could I do so that I can make a new list which contains the following:
List((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3))
Basically I wanna have a new list of tuples which will contain the first element of the old tuple and each element of the list inside the tuple. I am not sure how to start implementing this and this is why I have posted no code to show my attempt. I am really sorry, but I cant grasp this. I appreciate any help you can provide.
Exactly the same as #Andriy but using a for comprehension.
Which in the end is exactly the same but is more readable IMHO.
val result = for {
(x, ys) <- xs
y <- ys
} yield (x, y) // You can also use x -> y
(Again, I would recommend you to follow any tutorial, this is a basic exercise which if you had understand how map & flatMap works you shouldn't have any problem)
scala> val xs = List((1,List(1,2,3)), (2,List(1,2,3)), (3, List(1,2,3)))
xs: List[(Int, List[Int])] = List((1,List(1, 2, 3)), (2,List(1, 2, 3)), (3,List(1, 2, 3)))
scala> xs.flatMap { case (x, ys) => ys.map(y => (x, y)) }
res0: List[(Int, Int)] = List((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3))
It's probably worth mentioning that the solution by Andriy Plokhotnyuk can also be re-written as a for-comprehension:
val list = List((1,List(1,2,3)), (2,List(1,2,3)), (3, List(1,2,3)))
val pairs = for {
(n, nestedList) <- list
m <- nestedList
} yield (n, m)
assert(pairs == List((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)))
The compiler will effectively re-write the for-comprehension to a flatMap/map chain as described in another answer.

How the get the index of the duplicate pair in the scala list

I have a scala list like this below:
slist = List("a","b","c","a","d","c","a")
I want to get the index of the same element pair in this list.
For example,the result of this slist is
(0,3),(0,6),(3,6),(2,5)
which (0,3) means the slist(0)==slist(3)
(0,6) means the slist(0)==slist(6)
and so on.
So is there any method to do this in scala?
Thanks very much
There's simpler approaches but starting with zipWithIndex leads down this path. zipWithIndex returns a Tuple2 with the index and one of the letters. From there we groupBy the letter to get a map of the letter to it's indices and filter the ones with more than one value. Lastly, we have this MapLike.DefaultValuesIterable(List((a,0), (a,3), (a,6)), List((c,2), (c,5)))
which we take the indices from and make combinations.
scala> slist.zipWithIndex.groupBy(zipped => zipped._1).filter(t => t._2.size > 1).values.flatMap(xs => xs.map(t => t._2).combinations(2))
res40: Iterable[List[Int]] = List(List(0, 3), List(0, 6), List(3, 6), List(2, 5))
Indexing a List is rather inefficient so I recommend a transition to Vector and then back again (if needed).
val svec = slist.toVector
svec.indices
.map(x => (x,svec.indexOf(svec(x),x+1)))
.filter(_._2 > 0)
.toList
//res0: List[(Int, Int)] = List((0,3), (2,5), (3,6))
val v = slist.toVector; val s = v.size
for(i<-0 to s-1;j<-0 to s-1;if(i<j && v(i)==v(j))) yield (i,j)
In Scala REPL:
scala> for(i<-0 to s-1;j<-0 to s-1;if(i<j && v(i)==v(j))) yield (i,j)
res34: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((0,3), (0,6), (2,5), (3,6))

traverse list in scala and group all the first elements and second elements

This might be pretty simple questions. I have a list named "List1" that contain list of integer pairs as below.
List1 = List((1,2), (3,4), (9,8), (9,10))
Output should be:
r1 = (1,3,9,9) //List((1,2), (3,4), (9,8), (9,10))
r2 = (2,4,8,10) //List((1,2), (3,4), (9,8), (9,10))
array r1(Array[int]) should contains set of all first integers of each pair in the list.
array r2(Array[int]) should contains set of all second integers of each pair
Just use unzip:
scala> List((1,2), (3,4), (9,8), (9,10)).unzip
res0: (List[Int], List[Int]) = (List(1, 3, 9, 9),List(2, 4, 8, 10))
Use foldLeft
val (alist, blist) = list1.foldLeft((List.empty[Int], List.empty[Int])) { (r, c) => (r._1 ++ List(c._1), r._2 ++ List(c._2))}
Scala REPL
scala> val list1 = List((1, 2), (3, 4), (5, 6))
list1: List[(Int, Int)] = List((1,2), (3,4), (5,6))
scala> val (alist, blist) = list1.foldLeft((List.empty[Int], List.empty[Int])) { (r, c) => (r._1 ++ List(c._1), r._2 ++ List(c._2))}
alist: List[Int] = List(1, 3, 5)
blist: List[Int] = List(2, 4, 6)

How do I yield pairwise combinations of a collection, ignoring order?

Let's assume I have a collection (let's use a set):
scala> val x = Set(1, 2, 3)
x: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
I can get all pairwise combinations with the following code:
scala> for {
| a <- x
| b <- x
| if a != b
| } yield (a, b)
res9: scala.collection.immutable.Set[(Int, Int)] = Set((3,1), (3,2), (1,3), (2,3), (1,2), (2,1))
The problem is that I only want to get all pairwise combinations where order is ignored (so the combination (1, 2) is equivalent to (2, 1)). So I'd like to return Set((3, 1), (3, 2), (1, 2)).
Do not assume that the elements of the collection will be integers. They may be any arbitrary type.
Any ideas?
Edit: Python's itertools.combinations performs the exact functionality I'm looking for. I just want an idiomatic way to do it in Scala :)
Scala has a combinations method too, but it's only defined on Seq, not Set. So turn your set into a Seq first, and the following will give you an Iterator[Seq[Int]]:
x.toSeq.combinations(2)
If you really want tuples, add map {case Seq(a,b) => (a,b)} to the above.
If you don't mind having a Set of Sets instead of a Set of tuples it is easy:
scala> val s3 = for {
| a <- x
| b <- x
| if(a != b)
| } yield ( Set(a,b))
s3: scala.collection.immutable.Set[scala.collection.immutable.Set[Int]] = Set(Set(1, 2), Set(1, 3), Set(2, 3))