How do I yield pairwise combinations of a collection, ignoring order? - scala

Let's assume I have a collection (let's use a set):
scala> val x = Set(1, 2, 3)
x: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
I can get all pairwise combinations with the following code:
scala> for {
| a <- x
| b <- x
| if a != b
| } yield (a, b)
res9: scala.collection.immutable.Set[(Int, Int)] = Set((3,1), (3,2), (1,3), (2,3), (1,2), (2,1))
The problem is that I only want to get all pairwise combinations where order is ignored (so the combination (1, 2) is equivalent to (2, 1)). So I'd like to return Set((3, 1), (3, 2), (1, 2)).
Do not assume that the elements of the collection will be integers. They may be any arbitrary type.
Any ideas?
Edit: Python's itertools.combinations performs the exact functionality I'm looking for. I just want an idiomatic way to do it in Scala :)

Scala has a combinations method too, but it's only defined on Seq, not Set. So turn your set into a Seq first, and the following will give you an Iterator[Seq[Int]]:
x.toSeq.combinations(2)
If you really want tuples, add map {case Seq(a,b) => (a,b)} to the above.
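Since the question asks for arbitrary element types, here is a minimal generic sketch of the same idea. The name unorderedPairs is made up for illustration; it only relies on toSeq, combinations and toSet from the standard library.

```scala
// Hypothetical helper: all 2-element combinations of a Set, as tuples.
// Which element lands first in each tuple depends on the Set's iteration order.
def unorderedPairs[A](xs: Set[A]): Set[(A, A)] =
  xs.toSeq
    .combinations(2)                      // Iterator[Seq[A]], each of length 2
    .map { case Seq(a, b) => (a, b) }     // convert each 2-element Seq to a tuple
    .toSet

val pairs = unorderedPairs(Set("a", "b", "c"))
```

Because a Set has no duplicates, each unordered pair appears exactly once in the result.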

If you don't mind having a Set of Sets instead of a Set of tuples it is easy:
scala> val s3 = for {
| a <- x
| b <- x
| if(a != b)
| } yield ( Set(a,b))
s3: scala.collection.immutable.Set[scala.collection.immutable.Set[Int]] = Set(Set(1, 2), Set(1, 3), Set(2, 3))

Related

flatMap() in Scala during recursion

I was going through a Scala-99 problem to reduce a complex nested list into a flat list. Code given below:
def flatten(l: List[Any]): List[Any] = l flatMap {
case ms:List[_] => flatten(ms)
case l => List(l)
}
val L = List(List(1, 1), 2, List(3, List(5, 8)))
val flattenedList = flatten(L)
For the given input above L, I understood this problem by drawing a tree (given below)
List(List(1, 1), 2, List(3, List(5, 8))) (1)
| \ \
List(1, 1) List(2) List(3, List(5, 8)) (2)
| \ | \
List(1) List(1) List(3) List(5, 8) (3)
| \
List(5) List(8) (4)
What I've understood is that, the program results in the leaf nodes being added in a list maintained by Scala internally, like:
li = List(List(1), List(1), List(2), List(3), List(5), List(8))
and then the result is passed to the flatten method which results in the final answer:
List(1, 1, 2, 3, 5, 8)
Is my understanding correct?
EDIT: I'm sorry, I forgot to add this:
I wanted to ask: if my understanding is correct, then why does replacing flatMap with map in the definition of flatten above produce this list:
List(List(List(1), List(1)), List(2), List(List(3), List(List(5), List(8))))
I mean, isn't flatMap just map followed by flatten? Shouldn't I be getting the list I mentioned above:
li = List(List(1), List(1), List(2), List(3), List(5), List(8))
You're right that flatMap is just map followed by flatten, but note that this flatten is not the same as the flatten you define: for a List it only concatenates the inner lists, one level deep.
One very useful way to unpack these is to use the substitution model, just like in maths.
Suppose I define it like this (calling it f to avoid confusion between the flatten here and flatten in the standard library):
def f(l: List[Any]): List[Any] = l map {
case ms:List[_] => f(ms)
case l => List(l)
}
then
f(List( List(1, 1), 2))
= List(f(List(1, 1)), f(2)) // apply f to element of the outer most list
= List(List(f(1), f(1)), f(2)) // apply f to element of the inner list
= List(List(List(1), List(1)), List(2)) // no more recursion
Notice that map doesn't change the structure of your list; it only applies the function to each element. This explains the result you get if you replace flatMap with map.
Now if you have flatMap instead of map, then the flatten step is simply concatenation:
def f(l: List[Any]): List[Any] = l flatMap {
case ms:List[_] => f(ms)
case l => List(l)
}
then
f(List(List(1,1), 2))
= f(List(1,1)) ++ f(2) // apply f to each element and concatenate
= (f(1) ++ f(1)) ++ f(2)
= (List(1) ++ List(1)) ++ List(2)
= List( 1,1) ++ List(2)
= List(1,1,2)
or, put another way, using flatten instead of ++:
f( List( List(1,1), 2))
= flatten(List( f( List( 1, 1)) , f(2))) // map and flatten
= flatten(List( flatten(List(f(1), f(1))), f(2))) // again map and flatten
= flatten(List( flatten(List(List(1), List(1))), List(2)))
Now you can see that flatten is called multiple times, at every level where you recursively apply f, which collapses your tree one level at a time into just one big list.
To answer your comment about why List(1,1) is turned into flatten(List(List(1), List(1))): it looks redundant in this simple case, but consider List(1, List(2)), where f is applied to 1 and to List(2). Because the next step is to flatten (the stdlib one), both 1 and List(2) must be turned into a List so that the result is in the right shape.
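The two substitutions above can be checked directly. This is a small sketch; the names flattenDeep and nestOnly are made up here to keep them apart from the standard library's flatten.

```scala
// flatMap version: each recursive result is concatenated into the parent list.
def flattenDeep(l: List[Any]): List[Any] = l.flatMap {
  case inner: List[_] => flattenDeep(inner)
  case elem           => List(elem)
}

// map version: the structure is kept, and every leaf gets wrapped in a List.
def nestOnly(l: List[Any]): List[Any] = l.map {
  case inner: List[_] => nestOnly(inner)
  case elem           => List(elem)
}

val nested = List(List(1, 1), 2, List(3, List(5, 8)))
val flat   = flattenDeep(nested)            // one flat list of leaves
val kept   = nestOnly(List(List(1, 1), 2))  // same shape, leaves wrapped
```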

How to get the indices of duplicate pairs in a Scala list

I have a scala list like this below:
val slist = List("a","b","c","a","d","c","a")
I want to get the index of the same element pair in this list.
For example, the result for this slist is
(0,3),(0,6),(3,6),(2,5)
where (0,3) means slist(0) == slist(3),
(0,6) means slist(0) == slist(6),
and so on.
So is there any method to do this in scala?
Thanks very much
There are simpler approaches, but starting with zipWithIndex leads down this path. zipWithIndex pairs each letter with its index as a Tuple2. From there we groupBy the letter to get a map from each letter to its indexed occurrences, and filter to keep only the letters that occur more than once. At that point we have MapLike.DefaultValuesIterable(List((a,0), (a,3), (a,6)), List((c,2), (c,5)))
from which we take the indices and make combinations.
scala> slist.zipWithIndex.groupBy(zipped => zipped._1).filter(t => t._2.size > 1).values.flatMap(xs => xs.map(t => t._2).combinations(2))
res40: Iterable[List[Int]] = List(List(0, 3), List(0, 6), List(3, 6), List(2, 5))
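If tuples in a predictable order are preferred over an Iterable of lists, the same pipeline can be wrapped up as below. This is only a sketch; duplicateIndexPairs is a name chosen for illustration, and the final sorted is there because the iteration order of the grouped map is not guaranteed.

```scala
// Hypothetical helper: index pairs (i, j) with i < j and xs(i) == xs(j).
def duplicateIndexPairs[A](xs: Seq[A]): Seq[(Int, Int)] =
  xs.zipWithIndex                          // (element, index)
    .groupBy(_._1)                         // element -> its (element, index) pairs
    .values
    .flatMap(group => group.map(_._2).combinations(2))
    .collect { case Seq(i, j) => (i, j) }  // each combination has exactly two indices
    .toSeq
    .sorted                                // grouping order is unspecified, so sort

val dupPairs = duplicateIndexPairs(List("a", "b", "c", "a", "d", "c", "a"))
```

Elements that occur only once yield no combinations of size 2, so the explicit filter step is not needed here.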
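Indexing a List is rather inefficient, so I recommend converting to a Vector and then back again (if needed). Note that this version pairs each element only with its next occurrence, so (0,6) is not reported.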
val svec = slist.toVector
svec.indices
.map(x => (x,svec.indexOf(svec(x),x+1)))
.filter(_._2 > 0)
.toList
//res0: List[(Int, Int)] = List((0,3), (2,5), (3,6))
val v = slist.toVector; val s = v.size
for(i<-0 to s-1;j<-0 to s-1;if(i<j && v(i)==v(j))) yield (i,j)
In Scala REPL:
scala> for(i<-0 to s-1;j<-0 to s-1;if(i<j && v(i)==v(j))) yield (i,j)
res34: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((0,3), (0,6), (2,5), (3,6))

How to sum adjacent elements in scala

I want to sum adjacent elements in scala and I'm not sure how to deal with the last element.
So I have a list:
val x = List(1,2,3,4)
And I want to sum adjacent elements using indices and map:
val size = x.indices.size
val y = x.indices.map(i =>
if (i < size - 1)
x(i) + x(i+1))
The problem is that this approach creates a Unit element at the end, which widens the element type to AnyVal:
res1: scala.collection.immutable.IndexedSeq[AnyVal] = Vector(3, 5, 7, ())
and if I try to sum the elements or another numeric method of the collection, it doesn't work:
error: could not find implicit value for parameter num: Numeric[AnyVal]
I tried to filter out the element using:
y diff List(Unit) or y diff List(AnyVal)
but it doesn't work.
Is there a better approach in Scala to do this type of adjacent sum without using a for loop?
For a more functional solution, you can use sliding to group the elements together in twos (or any number of them), then map to their sum.
scala> List(1, 2, 3, 4).sliding(2).map(_.sum).toList
res80: List[Int] = List(3, 5, 7)
What sliding(2) will do is create an intermediate iterator of lists like this:
Iterator(
List(1, 2),
List(2, 3),
List(3, 4)
)
So when we chain map(_.sum), we map each inner List to its own sum. toList converts the Iterator back into a List.
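One edge case worth knowing: when the list is shorter than the window, sliding returns a single partial window, so a one-element list produces its own value rather than an empty result. A small sketch of a variant that avoids this by zipping the list with itself shifted by one (adjacentSums is a made-up name):

```scala
// Hypothetical helper: sums of neighbouring elements, empty for fewer than two elements.
def adjacentSums(xs: List[Int]): List[Int] =
  xs.zip(xs.drop(1)).map { case (a, b) => a + b }

val normal     = adjacentSums(List(1, 2, 3, 4))
val single     = adjacentSums(List(1))
// For comparison: sliding yields the partial window List(1) here.
val viaSliding = List(1).sliding(2).map(_.sum).toList
```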
You can try pattern matching and tail recursion also.
import scala.annotation.tailrec
@tailrec
def f(l:List[Int],r :List[Int]=Nil):List[Int] = {
l match {
case x :: xs :: xss =>
f(l.tail, r :+ (x + xs))
case _ => r
}
}
scala> f(List(1,2,3,4))
res4: List[Int] = List(3, 5, 7)
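Appending with :+ copies the accumulator on every step, so the version above is quadratic in the length of the list. A possible variation, sketched here under the name sumAdjacent (not from the original answer), prepends instead and reverses once at the end.

```scala
import scala.annotation.tailrec

// Hypothetical variant: prepend to the accumulator, reverse once when done.
def sumAdjacent(xs: List[Int]): List[Int] = {
  @tailrec
  def loop(rest: List[Int], acc: List[Int]): List[Int] = rest match {
    case a :: (tail @ (b :: _)) => loop(tail, (a + b) :: acc) // keep b for the next pair
    case _                      => acc.reverse                // fewer than two elements left
  }
  loop(xs, Nil)
}

val sums = sumAdjacent(List(1, 2, 3, 4))
```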
With a for comprehension by zipping two lists, the second with the first item dropped,
for ( (a,b) <- x zip x.drop(1) ) yield a+b
which results in
List(3, 5, 7)

Scala generate unique pairs from list

Input :
val list = List(1, 2, 3, 4)
Desired output :
Iterator((1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4))
This code works :
for (cur1 <- 0 until list.size; cur2 <- (cur1 + 1) until list.size)
yield (list(cur1), list(cur2))
but it does not seem optimal. Is there a better way of doing it?
There's a .combinations method built-in:
scala> List(1,2,3,4).combinations(2).toList
res0: List[List[Int]] = List(List(1, 2), List(1, 3), List(1, 4), List(2, 3), List(2, 4), List(3, 4))
It returns an Iterator, but I added .toList just for the purpose of printing the result. If you want your results in tuple form, you can do:
scala> List(1,2,3,4).combinations(2).map{ case Seq(x, y) => (x, y) }.toList
res1: List[(Int, Int)] = List((1,2), (1,3), (1,4), (2,3), (2,4), (3,4))
You mentioned uniqueness as well, so you could apply .distinct to your input list if uniqueness isn't a precondition of your function, because .combinations will not deduplicate for you.
.combinations is the proper way to generate unique groups of any size. An alternative that does not check uniqueness in the first place is to use foldLeft like this:
val list = (1 to 10).toList
val header :: tail = list
tail.foldLeft((header, tail, List.empty[(Int, Int)])) {
case ((header, tail, res), elem) =>
(elem, tail.drop(1), res ++ tail.map(x => (header, x)))
}._3
Will produce:
res0: List[(Int, Int)] = List((1,2), (1,3), (1,4), (1,5), (1,6), (1,7), (1,8), (1,9), (1,10), (2,3), (2,4), (2,5), (2,6), (2,7), (2,8), (2,9), (2,10), (3,4), (3,5), (3,6), (3,7), (3,8), (3,9), (3,10), (4,5), (4,6), (4,7), (4,8), (4,9), (4,10), (5,6), (5,7), (5,8), (5,9), (5,10), (6,7), (6,8), (6,9), (6,10), (7,8), (7,9), (7,10), (8,9), (8,10), (9,10))
If you expect duplicates, you can turn the output list into a set and back into a list, but you will then lose the ordering. So this is not the recommended way if you want uniqueness, but it should be preferred if you want to generate all of the pairs, including those of equal elements.
E.g. I used it in the field of machine learning for generating all of the products between each pair of variables in the feature space and if two or more variables have the same value I still want to produce a new variable corresponding to their product even though those newly generated "interaction variables" will have duplicates.
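To make the difference concrete when the input contains repeated values, here is a small comparison. The helper allPairs is a made-up name for a position-based variant; it treats elements at different positions as distinct, whereas combinations works on values.

```scala
// Hypothetical helper: one pair per pair of positions, so equal values still pair up.
def allPairs[A](xs: Seq[A]): Seq[(A, A)] = {
  val v = xs.toVector                    // Vector for cheap indexed access
  for {
    i <- v.indices
    j <- (i + 1) until v.length
  } yield (v(i), v(j))
}

val input       = List(1, 1, 2)
val byPosition  = allPairs(input)                // keeps both (1,2) pairs
val byValue     = input.combinations(2).toList   // each distinct group once
```

With List(1, 1, 2), the position-based version returns three pairs, while combinations(2) returns only List(1, 1) and List(1, 2).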

verifying a probability distribution with variable arguments sums to 1

I was wondering how you would write a method in Scala that takes a function f and a list of arguments args where each arg is a range. Suppose I have three arguments (Range(0,2), Range(0,10), and Range(1, 5)). Then I want to iterate over f with all the possibilities of those three arguments.
var sum = 0.0
for (a <- args(0)) {
  for (b <- args(1)) {
    for (c <- args(2)) {
      sum += f(a, b, c)
    }
  }
}
However, I want this method to work for functions with a variable number of arguments. Is this possible?
Edit: is there any way to do this when the function does not take a list, but rather takes a standard parameter list or is curried?
That's a really good question!
You want to run flatMap in sequence over a list of elements of arbitrary size. When you don't know how long your list is, you can process it with recursion, or equivalently, with a fold.
scala> def sequence[A](lss: List[List[A]]) = lss.foldRight(List(List[A]())) {
| (m, n) => for (x <- m; xs <- n) yield x :: xs
| }
scala> sequence(List(List(1, 2), List(4, 5), List(7)))
res2: List[List[Int]] = List(List(1, 4, 7), List(1, 5, 7), List(2, 4, 7), List(2, 5, 7))
(If you can't figure out the code, don't worry, learn how to use Hoogle and steal it from Haskell)
You can do this with Scalaz. In general it starts with an F[G[X]] and returns a G[F[X]], given that the type constructors F and G have the Traverse and Applicative capabilities respectively.
scala> import scalaz._
import scalaz._
scala> import Scalaz._
import Scalaz._
scala> List(List(1, 2), List(4, 5), List(7)).sequence
res3: List[List[Int]] = List(List(1, 4, 7), List(1, 5, 7), List(2, 4, 7), List(2, 5, 7))
scala> Seq(some(1), some(2)).sequence
res4: Option[Seq[Int]] = Some(List(1, 2))
scala> Seq(some(1), none[Int]).sequence
res5: Option[Seq[Int]] = None
That would more or less do the job (without applying f, which you can do separately)
def crossProduct[A](xxs: Seq[A]*) : Seq[Seq[A]]
= xxs.foldLeft(Vector(Vector[A]())){(res, xs) =>
for(r <- res; x <- xs) yield r :+ x
}
You can then just map your function on that. I'm not sure it's a very efficient implementation though.
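To tie this back to the original question, here is a minimal sketch that maps a function over the cross product and sums the results. The names cartesian and totalMass are invented for illustration, and f is assumed to take its arguments as a Seq[Int].

```scala
// Cross product of any number of sequences, built left to right.
def cartesian[A](xss: Seq[Seq[A]]): Seq[Seq[A]] =
  xss.foldLeft(Seq(Seq.empty[A])) { (acc, xs) =>
    for (prefix <- acc; x <- xs) yield prefix :+ x
  }

// Hypothetical helper: sum f over every combination of the given ranges.
def totalMass(f: Seq[Int] => Double, ranges: Range*): Double =
  cartesian[Int](ranges).map(f).sum

// A uniform distribution over 2 * 10 * 4 = 80 points should sum to about 1.
val uniform: Seq[Int] => Double = _ => 1.0 / 80
val total = totalMass(uniform, Range(0, 2), Range(0, 10), Range(1, 5))
```

Because the sum is floating point, compare it to 1 with a tolerance rather than with ==. A function with a fixed parameter list, say g(a, b, c), could be adapted with a wrapper such as { case Seq(a, b, c) => g(a, b, c) }.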
Here is an answer from a recursive perspective. Unfortunately, it is not as short as the others.
def foo(f: List[Int] => Int, args: Range*) = {
var sum = 0.0
def rec(ranges: List[Range], ints: List[Int]): Unit = {
if (ranges.length > 0)
for (i <- ranges.head)
rec(ranges.tail, i :: ints)
else
sum += f(ints.reverse) // ints is built back to front, so restore the argument order
}
rec(args.toList, List[Int]())
sum
}
Have a look at this answer. I use this code for exactly this purpose. It's slightly optimized. I think I could produce a faster version if you need one.