Find the indices at which two arrays intersect - scala

I'm wondering if in Scala there's a decent way to get the indices at which two arrays intersect.
So given arrays:
a1 = [0, 5, 10, 15, 20, 25, 30]
a2 = [10, 20, 30, 40, 50]
Ideally, taking advantage of the fact that both arrays are ordered and contain no duplicates.
These share common elements: a1.intersect(a2) = [10, 20, 30]. The index (position) at which these elements occur is different for each array.
I would like to produce a sequence of tuples with the positions from each list where they intersect:
intersectingIndices(a1, a2) = [(2, 0), (4, 1), (6, 2)]
While intersect gives the intersecting values, I need to know the original indices and would prefer not to have to do an O(N) scan to find each one - as these arrays get very long (millions of elements). I also suspect the complexity of intersect is unnecessarily large given both arrays will always be sorted in advance, so a single-pass option would be preferable.

If both lists are sorted, this is fairly straightforward: it is a slight variation of the "merge" phase of merge sort.
@annotation.tailrec
def intersect(
left: List[Int],
right: List[Int],
lidx: Int = 0,
ridx: Int = 0,
result: List[(Int, Int)] = Nil
): List[(Int, Int)] = (left, right) match {
case (Nil, _) | (_, Nil) => result.reverse
case (l::tail, r::_) if l < r => intersect(tail, right, lidx+1, ridx, result)
case (l::_, r::tail) if l > r => intersect(left, tail, lidx, ridx+1, result)
case (l::ltail, r::rtail) => intersect(ltail, rtail, lidx+1, ridx+1, (lidx, ridx) :: result)
}
Or just hash one of the lists and then scan the other. This is still O(n) and much simpler, though somewhat more expensive because of the hashing:
val hashed = left.zipWithIndex.toMap
right.zipWithIndex.flatMap { case (x, idx) => hashed.get(x).map(_ -> idx) } // yields (left index, right index)
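Since the question is about arrays with millions of elements, the same merge idea can also be written with two index cursors over the arrays directly, which avoids converting to List. This is only a minimal sketch: the object name SortedIntersection is made up for illustration, and it assumes both arrays are sorted ascending and contain no duplicates, as stated in the question.

```scala
import scala.collection.mutable.ArrayBuffer

object SortedIntersection {
  // Walks both arrays once, always advancing the cursor that points at the smaller value.
  // Assumes a1 and a2 are sorted ascending and duplicate-free.
  def intersectingIndices(a1: Array[Int], a2: Array[Int]): Vector[(Int, Int)] = {
    val out = ArrayBuffer.empty[(Int, Int)]
    var i = 0
    var j = 0
    while (i < a1.length && j < a2.length) {
      if (a1(i) < a2(j)) i += 1
      else if (a1(i) > a2(j)) j += 1
      else {
        out += ((i, j)) // same value on both sides: record both positions
        i += 1
        j += 1
      }
    }
    out.toVector
  }
}
```

With the arrays from the question, SortedIntersection.intersectingIndices(a1, a2) gives Vector((2,0), (4,1), (6,2)).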

Related

Count how many times numbers from a list occur in a list of tupled intervals in Scala

Say I have a list of tuples:
val ranges= List((1,4), (5,8), (9,10))
and a list of numbers
val nums = List(2,2,3,7,8,9)
I want to make a map from tuple in ranges to how many times a given number from nums fall into the interval of that tuple.
Output:
Map ((1,4) -> 3, (5,8) -> 2, (9,10) -> 1)
What is the best way to go about it in Scala?
I have been trying to use for loops and keeping a counter but am falling short.
Something like this:
val ranges = List((1, 4), (5, 8), (9, 10))
val nums = List(2, 2, 3, 7, 8, 9)
val occurences = ranges.map { case (l, r) => nums.count((l to r) contains _) }
val map = (ranges zip occurences).toMap
println(map) // Map((1,4) -> 3, (5,8) -> 2, (9,10) -> 1)
Basically it first calculates the number of occurrences, [3, 2, 1]. From there it's easy to construct a map. And the way it calculates the occurrences is:
go through the list of ranges
transform each range into number of occurrences for that range, which is done like this :
how many numbers from the list nums are contained in that range?
Here is a more direct solution that builds the map in a single expression (it still scans nums once per range):
ranges
.map(r => r -> nums.count(n => n >= r._1 && n <= r._2))
.toMap
This avoids the overhead of creating a list of numbers and then zipping them with the ranges in a separate step.
This is a version that uses more Scala features but is a bit too fancy:
(for {
r <- ranges
range = r._1 to r._2
} yield r -> nums.count(range.contains)
).toMap
This is also less efficient because contains has to allow for ranges with a step value and is therefore more complicated.
And here is an even more efficient version that avoids any temporary data structures:
val result: Map[(Int, Int), Int] =
ranges.map(r => r -> nums.count(n => n >= r._1 && n <= r._2))(collection.breakOut)
See an explanation of breakOut if you are not familiar with it. Using breakOut means that the map call builds the Map directly rather than creating a List that then has to be converted to a Map using toMap. Note that breakOut only exists up to Scala 2.12; it was removed in Scala 2.13.
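On versions without breakOut, a similar effect (no intermediate List) can probably be had by mapping over an iterator before calling toMap. A minimal sketch, where RangeCounts and countInRanges are names invented for this example:

```scala
object RangeCounts {
  // Maps lazily over an iterator, so no intermediate List of pairs is built before toMap.
  def countInRanges(ranges: List[(Int, Int)], nums: List[Int]): Map[(Int, Int), Int] =
    ranges.iterator
      .map { case r @ (lo, hi) => r -> nums.count(n => n >= lo && n <= hi) }
      .toMap
}
```

For the ranges and nums from the question this produces Map((1,4) -> 3, (5,8) -> 2, (9,10) -> 1).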

What's the idiomatic way to take top n values according to some criteria?

I have the following code:
Sighting.all
.iterator
.map(s => (s, haversineDistance(s, ourLocation)))
.toSeq
.sortBy(_._2)
.take(5)
As expected, it returns the 5 sightings closest to ourLocation.
However, for a very large number of sightings, it does not scale well. We can instead just go through all sightings O(N) and find the 5 closest ones, instead of sorting them all and thus doing O(N*logN). How to do so idiomatically?
As with your previous questions, fold might be of use. In this case I'd be tempted to fold over a PriorityQueue initialized to values larger than the expected data set.
import scala.collection.mutable.PriorityQueue
...
.iterator
.foldLeft(PriorityQueue((999,"x"),(999,"x"),(999,"x"),(999,"x"),(999,"x")){
case (pq, s) => pq.+=((haversineDistance(s, ourLocation), s)).tail
}
The result is a PriorityQueue of 5 (distance, sighting) tuples, but only the 5 smallest distances.
You can avoid sorting the big list by iterating through each of the elements in the list just once while maintaining a 5-element list as follows:
Keep the 5-element list sorted by distance in descending order so that its head element has the longest distance (Note that since 5 is small the cost of sorting is negligible)
In each iteration, if the current element in the original list has its distance shorter than that of the head element in the 5-element list, replace the head element with the current element; otherwise keep the current 5-element list
Upon completing the iterations, the 5-element list will consist of elements with the shortest distances and a final sorting by distance in ascending order will give the top5 list:
val list = Sighting.all.
iterator.
map(s => (s, haversineDistance(s, ourLocation))).
toSeq
// For example, with sample (name, distance) pairs standing in for the sightings:
val list = List(
("a", 5), ("b", 2), ("c", 12), ("d", 9), ("e", 6), ("f", 15),
("g", 9), ("h", 7), ("i", 6), ("j", 3), ("k", 10), ("l", 5)
)
val top5 = list.drop(5).
foldLeft( list.take(5).sortWith(_._2 > _._2) )(
(l, e) => if (e._2 < l.head._2)
(e :: l.tail).sortWith(_._2 > _._2)
else
l
).
sortBy(_._2)
// top5: List[(String, Int)] = List((b,2), (j,3), (l,5), (a,5), (e,6))
[UPDATE]
Below is a verbose version of the above top5 value assignment which hopefully makes the foldLeft expression look less overwhelming.
val initialTop5Sorted = list.take(5).sortWith(_._2 > _._2)
val originalListTail = list.drop(5)
def updateTop5Sorted = ( list: List[(String, Int)], element: (String, Int) ) => {
if (element._2 < list.head._2)
(element :: list.tail).sortWith(_._2 > _._2)
else
list
}
val top5 = originalListTail.
foldLeft( initialTop5Sorted )( updateTop5Sorted ).
sortBy(_._2)
Here's signature of foldLeft for your reference:
def foldLeft[B](z: B)(op: (B, A) => B): B
Here's a slightly different approach:
def topNBy[A, B : Ordering](xs: Iterable[A], n: Int, f: A => B): List[A] = {
val q = new scala.collection.mutable.PriorityQueue[A]()(Ordering.by(f))
for (x <- xs) {
q += x
if (q.size > n) {
q.dequeue()
}
}
q.dequeueAll.toList.reverse
}
fold is useful, and worth getting comfortable with, but if you're not creating a new object to act on in each iteration, and just modifying an existing one, it's no better than a for-loop. And I'd prefer relying on PriorityQueue to do the sorting rather than rolling your own, especially given it's an efficient O(log n) implementation. Functional purists might balk at this for being more imperative, but to me it's worth it for readability and conciseness. The mutable state is limited to a single local data structure.
You could even put it in an implicit class:
implicit class IterableWithTopN[A](xs: Iterable[A]) {
def topNBy[B : Ordering](n: Int, f: A => B): List[A] = {
...
}
}
And then use it like:
Sighting.all.topNBy(5, s => haversineDistance(s, ourLocation))
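To see the bounded-heap idea on concrete data, here is a minimal self-contained sketch. The names TopN and smallestNBy are invented for this example, and plain (name, distance) pairs stand in for Sighting and haversineDistance, which are not shown in the question.

```scala
import scala.collection.mutable

object TopN {
  // Keeps at most n elements in a max-heap keyed by f, so the head is always
  // the worst current candidate and can be evicted in O(log n).
  def smallestNBy[A, B: Ordering](xs: Iterable[A], n: Int)(f: A => B): List[A] = {
    val heap = new mutable.PriorityQueue[A]()(Ordering.by(f))
    for (x <- xs) {
      heap.enqueue(x)
      if (heap.size > n) heap.dequeue() // drop the element with the largest key
    }
    heap.toList.sortBy(f) // only n elements left, so this final sort is cheap
  }
}
```

Called as TopN.smallestNBy(list, 5)(_._2) on the sample list from the earlier answer, it returns five pairs whose distances are 2, 3, 5, 5 and 6. Which of the tied elements comes first is not something this sketch guarantees.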

How to transform a list into a list of pairs like that?

Suppose I have a list of A:
case class A(x: Int, y: Int)
val as = List(A(0, 0), A(0, 1), A(1, 0), A(1, 1))
I would like to transform it into a list of pairs (A, Set[A]) so that:
the list of the first elements of the pairs is as
in each pair (a, set) set consists of such items of as that have the same x or y as a
For example:
val pairs = List(
A(0, 0) -> Set(A(0, 1), A(1, 0)),
A(0, 1) -> Set(A(0, 0), A(1, 1)),
A(1, 0) -> Set(A(0, 0), A(1, 1)),
A(1, 1) -> Set(A(0, 1), A(1, 0))
)
"Brute force":
case class A(x: Int, y: Int)
val as = List(A(0, 0), A(0, 1), A(1, 0), A(1, 1))
val mappings = for {
a1 <- as
a2 <- as
if (a1 != a2)
if (a1.x == a2.x || a1.y == a2.y)
} yield a1 -> a2
val result = mappings.groupBy(_._1).mapValues(_.map(_._2))
With some indexing you might get something a bit faster on the average case (while paying some extra storage for the indices):
First we create an index for Xs and Ys:
val xi = as.groupBy(_.x) // O(n)
val yi = as.groupBy(_.y) // O(n)
Then we do a single pass and map each element using the indices:
// This is essentially O(n^2) worst case, but it can be much less if the toSet doesn't have to go through a lot each time
val inter = for( a <- as ) yield
(a, (xi.get(a.x).toSet ++ yi.get(a.y).toSet).flatten)
Finally, you might want to remove from the Set the same elements as the key:
inter.map(x => (x._1, x._2 - x._1)) // O(n)
So overall this is still worst case O(n^2), but in the cases where the indices are sparse enough, it can be essentially O(n).
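Put together, the indexing steps might look like the following sketch. The wrapper object Neighbours and the method name pairs are made up here; the case class A is the one from the question.

```scala
object Neighbours {
  case class A(x: Int, y: Int)

  def pairs(as: List[A]): List[(A, Set[A])] = {
    // One index per coordinate, each built in a single pass.
    val byX = as.groupBy(_.x)
    val byY = as.groupBy(_.y)
    as.map { a =>
      // Everything sharing x or y with a, minus a itself.
      val sameXorY = (byX.getOrElse(a.x, Nil) ++ byY.getOrElse(a.y, Nil)).toSet
      a -> (sameXorY - a)
    }
  }
}
```

For the four-element list in the question this yields the same pairs as the expected output shown there.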

Functional "Find pairs that add up to X" with linear time complexity

I am trying to implement the "find pairs that add up to X" functionally with linear time complexity, for which I have the following:
def pairs(nums: List[Int], sum: Int): List[(Int, Int)] = {
def pairsR(nums: List[Int], sum: Int, start: Int, end: Int, acc: List[(Int, Int)]): List[(Int, Int)] = {
val newAcc = nums(start) + nums(end) match {
case n if n == sum => ( (nums(start), nums(end)) :: acc, start + 1, end - 1)
case n if n < sum => (acc, start + 1, end)
case n if n > sum => (acc, start, end - 1)
}
if(start < end) pairsR(nums, sum, newAcc._2, newAcc._3, newAcc._1)
else newAcc._1
}
pairsR(nums, sum, 0, nums.length - 1, List())
}
Which would work if I were trying to look for the first pair that adds to X (assuming I return after finding the first occurrence). But because I am trying to find all pairs I am getting some duplicates, as seen here: (note in the list there is only a single 5, yet because the pointers arrive at 5 at the same time I am guessing they count it twice)
pairs(List(1,2,3,4,5,6,7,8,9), 10) should equalTo (List( (1, 9), (2, 8), (3, 7), (4, 6) ))
Yet I get the following failure:
List((5,5), (4,6), (3,7), (2,8), (1,9)) is not equal to List((1,9),
(2,8), (3,7), (4,6))
Is this algorithm just not possible to accomplish in linear time when you are looking for ALL pairs (and not just the first)? I know it's possible to do with a HashSet, but I wanted to know if you could take the "pointers" approach.
Here is an updated version of your code:
def pairs(nums: List[Int], sum: Int): List[(Int, Int)] = {
val numsArr = nums.toArray
def pairsR(start: Int, end: Int, acc: List[(Int, Int)]): List[(Int, Int)] =
numsArr(start) + numsArr(end) match {
case _ if start >= end => acc.reverse
case `sum` => pairsR(start + 1, end - 1, (numsArr(start), numsArr(end)) :: acc)
case n if n < sum => pairsR(start + 1, end, acc)
case n if n > sum => pairsR(start, end - 1, acc)
}
pairsR(0, numsArr.length - 1, Nil)
}
Test:
pairs((1 to 9).toList, 10)
Result:
res0: List[(Int, Int)] = List((1,9), (2,8), (3,7), (4,6))
Some notes:
When the start and end pointers meet somewhere in the middle of the array, it's time to end the recursion and return acc. This condition must come first so that the generic logic is not applied.
Because results are prepended to acc, acc ends up in reverse order, so it's reasonable to reverse it before returning.
There is no need to pass static parameters, such as nums and sum, as arguments to the inner recursive function.
As people suggested in the comments, if you need a read-only collection with constant-time indexed access, Array is the natural choice. Vector will also work, but it's slower.
As a side note, current implementation may produce incorrect results if you have duplicate elements in your input list:
pairs(List(1,1,1,2,2), 3)
Result:
res1: List[(Int, Int)] = List((1,2), (1,2))
The easiest way to fix that is to preprocess the input list by removing duplicate elements. This makes the result contain only distinct pairs. However, if you want to include all pairs with elements of the same value but different indices, then it's not possible to do that in linear time (consider an example where you have N elements with value X and you want to find all sums of 2X: the result's length will be on the order of N^2).
Also, this algorithm requires input data to be sorted. There is an easy change to the algorithm to make it work on unsorted data (using counting sort).
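For comparison, the hash-set approach mentioned in the question does not need sorted input at all. This is a different technique from the two-pointer code above, sketched under the assumption that only distinct pairs are wanted; the object name PairSums is invented for the example.

```scala
object PairSums {
  // Hash-set variant: works on unsorted input and reports each distinct pair once.
  def pairs(nums: List[Int], sum: Int): List[(Int, Int)] = {
    val distinctNums = nums.distinct
    val present = distinctNums.toSet
    // Requiring n < sum - n keeps only the (smaller, larger) orientation
    // and also rules out pairing a value with itself.
    distinctNums
      .filter(n => n < sum - n && present(sum - n))
      .map(n => (n, sum - n))
  }
}
```

PairSums.pairs((1 to 9).toList, 10) gives List((1,9), (2,8), (3,7), (4,6)), and PairSums.pairs(List(1,1,1,2,2), 3) gives List((1,2)).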

Listing combinations WITH repetitions in Scala

Trying to learn a bit of Scala and ran into this problem. I found a solution for all combinations without repetitions here and I somewhat understand the idea behind it, but some of the syntax is messing me up. I also don't think the solution is appropriate for a case WITH repetitions. I was wondering if anyone could suggest a bit of code that I could work from. I have plenty of material on combinatorics and understand the problem and iterative solutions to it; I am just looking for the scala-y way of doing it.
Thanks
I understand your question now. I think the easiest way to achieve what you want is to do the following:
def mycomb[T](n: Int, l: List[T]): List[List[T]] =
n match {
case 0 => List(List())
case _ => for(el <- l;
sl <- mycomb(n-1, l dropWhile { _ != el } ))
yield el :: sl
}
def comb[T](n: Int, l: List[T]): List[List[T]] = mycomb(n, l.distinct) // originally l.removeDuplicates, which Scala 2.8 deprecated in favour of distinct
The comb method just calls mycomb with duplicates removed from the input list. Removing the duplicates means it is then easier to test later whether two elements are 'the same'. The only change I have made to your mycomb method is that when the method is being called recursively I strip off the elements which appear before el in the list. This is to stop there being duplicates in the output.
> comb(3, List(1,2,3))
> List[List[Int]] = List(
List(1, 1, 1), List(1, 1, 2), List(1, 1, 3), List(1, 2, 2),
List(1, 2, 3), List(1, 3, 3), List(2, 2, 2), List(2, 2, 3),
List(2, 3, 3), List(3, 3, 3))
> comb(6, List(1,2,1,2,1,2,1,2,1,2))
> List[List[Int]] = List(
List(1, 1, 1, 1, 1, 1), List(1, 1, 1, 1, 1, 2), List(1, 1, 1, 1, 2, 2),
List(1, 1, 1, 2, 2, 2), List(1, 1, 2, 2, 2, 2), List(1, 2, 2, 2, 2, 2),
List(2, 2, 2, 2, 2, 2))
Meanwhile, combinations have become integral part of the scala collections:
scala> val li = List (1, 1, 0, 0)
li: List[Int] = List(1, 1, 0, 0)
scala> li.combinations (2) .toList
res210: List[List[Int]] = List(List(1, 1), List(1, 0), List(0, 0))
As we can see, it doesn't allow repetition, but allowing it is simple with combinations: enumerate the indices of your collection (0 to li.size-1) and map each index to the corresponding element in the list:
scala> (0 to li.length-1).combinations (2).toList .map (v=>(li(v(0)), li(v(1))))
res214: List[(Int, Int)] = List((1,1), (1,0), (1,0), (1,0), (1,0), (0,0))
I wrote a similar solution to the problem in my blog: http://gabrielsw.blogspot.com/2009/05/my-take-on-99-problems-in-scala-23-to.html
First I thought of generating all the possible combinations and removing the duplicates (or using sets, which take care of the duplicates themselves), but as the problem was specified with lists and all the possible combinations would be too many, I came up with a recursive solution to the problem:
to get the combinations of size n, take one element of the set and append it to all the combinations of sets of size n-1 of the remaining elements, union the combinations of size n of the remaining elements.
That's what the code does
//P26
def combinations[A](n:Int, xs:List[A]):List[List[A]]={
def lift[A](xs:List[A]):List[List[A]]=xs.foldLeft(List[List[A]]())((ys,y)=>(List(y)::ys))
(n,xs) match {
case (1,ys)=> lift(ys)
case (i,xs) if (i==xs.size) => xs::Nil
case (i,ys)=> combinations(i-1,ys.tail).map(zs=>ys.head::zs):::combinations(i,ys.tail)
}
}
How to read it:
I had to create an auxiliary function that "lifts" a list into a list of lists
The logic is in the match statement:
If you want all the combinations of size 1 of the elements of the list, just create a list of lists in which each sublist contains an element of the original one (that's the "lift" function)
If the combinations are the total length of the list, just return a list in which the only element is the element list (there's only one possible combination!)
Otherwise, take the head and tail of the list, calculate all the combinations of size n-1 of the tail (recursive call) and prepend the head to each of the resulting lists (.map(zs=>ys.head::zs)), then concatenate the result with all the combinations of size n of the tail of the list (another recursive call)
Does it make sense?
The question was rephrased in one of the answers -- I hope the question itself gets edited too. Someone else answered the proper question. I'll leave that code below in case someone finds it useful.
That solution is confusing as hell, indeed. A "combination" without repetitions is called permutation. It could go like this:
def perm[T](n: Int, l: List[T]): List[List[T]] =
n match {
case 0 => List(List())
case _ => for(el <- l;
sl <- perm(n-1, l filter (_ != el)))
yield el :: sl
}
If the input list is not guaranteed to contain unique elements, as suggested in another answer, it can be a bit more difficult. Instead of filter, which removes all elements, we need to remove just the first one.
def perm[T](n: Int, l: List[T]): List[List[T]] = {
def perm1[T](n: Int, l: List[T]): List[List[T]] =
n match {
case 0 => List(List())
case _ => for(el <- l;
(hd, tl) = l span (_ != el);
sl <- perm(n-1, hd ::: tl.tail))
yield el :: sl
}
perm1(n, l).distinct // originally removeDuplicates, which Scala 2.8 deprecated in favour of distinct
}
Just a bit of explanation. In the for, we take each element of the list, and return lists composed of it followed by the permutation of all elements of the list except for the selected element.
For instance, if we take List(1,2,3), we'll compose lists formed by 1 and perm(List(2,3)), 2 and perm(List(1,3)) and 3 and perm(List(1,2)).
Since we are doing arbitrary-sized permutations, we keep track of how long each subpermutation can be. If a subpermutation is size 0, it is important we return a list containing an empty list. Notice that this is not an empty list! If we returned Nil in case 0, there would be no element for sl in the calling perm, and the whole "for" would yield Nil. This way, sl will be assigned Nil, and we'll compose a list el :: Nil, yielding List(el).
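The difference between returning List(List()) and Nil in the base case can be seen in isolation with two tiny for comprehensions. This is just an illustrative sketch; the object and value names are made up.

```scala
object EmptyCase {
  // The inner generator yields one element (the empty list), so each el produces List(el).
  val withEmptyList: List[List[Int]] =
    for (el <- List(1, 2); sl <- List(List.empty[Int])) yield el :: sl

  // The inner generator yields nothing, so the whole comprehension is empty.
  val withNil: List[List[Int]] =
    for (el <- List(1, 2); sl <- List.empty[List[Int]]) yield el :: sl
}
```

Here EmptyCase.withEmptyList is List(List(1), List(2)), while EmptyCase.withNil is Nil.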
I was thinking about the original problem, though, and I'll post my solution here for reference. If you meant not having duplicated elements in the answer as a result of duplicated elements in the input, just add a removeDuplicates as shown below.
def comb[T](n: Int, l: List[T]): List[List[T]] =
n match {
case 0 => List(List())
case _ => for(i <- (0 to (l.size - n)).toList;
l1 = l.drop(i);
sl <- comb(n-1, l1.tail))
yield l1.head :: sl
}
It's a bit ugly, I know. I have to use toList to convert the range (returned by "to") into a List, so that "for" itself would return a List. I could do away with "l1", but I think this makes more clear what I'm doing. Since there is no filter here, modifying it to remove duplicates is much easier:
def comb[T](n: Int, l: List[T]): List[List[T]] = {
def comb1[T](n: Int, l: List[T]): List[List[T]] =
n match {
case 0 => List(List())
case _ => for(i <- (0 to (l.size - n)).toList;
l1 = l.drop(i);
sl <- comb(n-1, l1.tail))
yield l1.head :: sl
}
comb1(n, l).distinct // originally removeDuplicates, which Scala 2.8 deprecated in favour of distinct
}
Daniel -- I'm not sure what Alex meant by duplicates, it may be that the following provides a more appropriate answer:
def perm[T](n: Int, l: List[T]): List[List[T]] =
n match {
case 0 => List(List())
case _ => for(el <- l.removeDuplicates;
sl <- perm(n-1, l.slice(0, l.findIndexOf {_ == el}) ++ l.slice(1 + l.findIndexOf {_ == el}, l.size)))
yield el :: sl
}
Run as
perm(2, List(1,2,2,2,1))
this gives:
List(List(2, 2), List(2, 1), List(1, 2), List(1, 1))
as opposed to:
List(
List(1, 2), List(1, 2), List(1, 2), List(2, 1),
List(2, 1), List(2, 1), List(2, 1), List(2, 1),
List(2, 1), List(1, 2), List(1, 2), List(1, 2)
)
The nastiness inside the nested perm call is removing a single 'el' from the list, I imagine there's a nicer way to do that but I can't think of one.
This solution was posted on Rosetta Code: http://rosettacode.org/wiki/Combinations_with_repetitions#Scala
def comb[A](as: List[A], k: Int): List[List[A]] =
(List.fill(k)(as)).flatten.combinations(k).toList
It is really not clear what you are asking for. It could be one of a few different things. First would be simple combinations of different elements in a list. Scala offers that with the combinations() method from collections. If the elements are distinct, the behavior is exactly what you expect from the classical definition of "combinations". For n-element combinations of p elements there will be p!/(n!(p-n)!) combinations in the output.
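As a quick sanity check of that count, the formula can be compared against what combinations() returns for a small list of distinct elements. A minimal sketch; CombinationCount, factorial and choose are helper names introduced only for this example.

```scala
object CombinationCount {
  def factorial(k: Int): BigInt = (1 to k).foldLeft(BigInt(1))(_ * _)

  // p! / (n! (p-n)!): the number of n-element combinations of p distinct elements.
  def choose(p: Int, n: Int): BigInt = factorial(p) / (factorial(n) * factorial(p - n))

  // Count produced by the standard library for p = 5, n = 2.
  val fromLibrary: Int = List(1, 2, 3, 4, 5).combinations(2).size
}
```

Both CombinationCount.choose(5, 2) and CombinationCount.fromLibrary come out as 10.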
If there are repeated elements in the list, though, Scala will generate combinations with the item appearing more than once in the combinations. But just the different possible combinations, with the element possibly replicated as many times as they exist in the input. It generates only the set of possible combinations, so repeated elements, but not repeated combinations. I'm not sure if underlying it there is an iterator to an actual Set.
Now what you actually mean if I understand correctly is combinations from a given set of different p elements, where an element can appear repeatedly n times in the combination.
Well, coming back a little, to generate combinations when there are repeated elements in the input, and you wanna see the repeated combinations in the output, the way to go about it is just to generate it by "brute-force" using n nested loops. Notice that there is really nothing brute about it, it is just the natural number of combinations, really, which is O(p^n) for small n, and there is nothing you can do about it. You only should be careful to pick these values properly, like this:
val a = List(1,1,2,3,4)
def comb = for (i <- 0 until a.size - 1; j <- i+1 until a.size) yield (a(i), a(j))
resulting in
scala> comb
res55: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((1,1), (1,2), (1,3), (1,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4))
This generates the combinations from these repeated values in a, by first creating the intermediate combinations of 0 until a.size as (i, j)...
Now to create the "combinations with repetitions" you just have to change the indices like this:
val a = List('A','B','C')
def comb = for (i <- 0 until a.size; j <- i until a.size) yield (a(i), a(j))
will produce
List((A,A), (A,B), (A,C), (B,B), (B,C), (C,C))
But I'm not sure what's the best way to generalize this to larger combinations.
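One possible way to generalize the "j starts at i" idea to any size is a recursion that either reuses the current head or moves on to the tail. This is only a sketch and assumes the input elements are already distinct; MultiCombinations and combsWithRep are names invented here.

```scala
object MultiCombinations {
  // Chooses k elements from xs, allowing reuse. At each step we either take the
  // head again (staying on xs) or skip it for good (moving to tail), which mirrors
  // the nested loops where the inner index starts at the outer one.
  def combsWithRep[A](xs: List[A], k: Int): List[List[A]] =
    if (k == 0) List(Nil)
    else xs match {
      case Nil => Nil
      case head :: tail =>
        combsWithRep(xs, k - 1).map(head :: _) ::: combsWithRep(tail, k)
    }
}
```

MultiCombinations.combsWithRep(List('A','B','C'), 2) gives List(List(A, A), List(A, B), List(A, C), List(B, B), List(B, C), List(C, C)), matching the two-loop version above.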
Now I close with what I was looking for when I found this post: a function to generate the combinations from an input that contains repeated elements, with intermediary indices generated by combinations(). It is nice that this method produces a list instead of a tuple, so that means we can actually solve the problem using a "map of a map", something I'm not sure anyone else has proposed here, but that is pretty nifty and will make your love for FP and Scala grow a bit more after you see it!
def comb[N](p:Seq[N], n:Int) = (0 until p.size).combinations(n) map { _ map p }
results in
scala> val a = List(1,1,2,3)
scala> comb(a, 2).toList
res60: List[scala.collection.immutable.IndexedSeq[Int]] = List(Vector(1, 1), Vector(1, 2), Vector(1, 3), Vector(1, 2), Vector(1, 3), Vector(2, 3))