Zip two lists with diminishing lengths - scala

I was given the following problem, to be solved in Scala:
There are two lists, the first shorter than the second. For instance, list 1 has 2 elements and list 2 has 10 elements.
Each element of list 1 must be paired with two elements of list 2. Elements used for one element of list 1 must not be reused for another, i.e. each element takes two unused elements from the second list. The result is the list of pairs together with the remaining elements of the second list.
scala> val list1 = List(1,2)
list1: List[Int] = List(1, 2)
scala> val list2 = List(3,4,5,6,7,8,9)
list2: List[Int] = List(3, 4, 5, 6, 7, 8, 9)
Expected output:
(List((1,3), (1,4), (2,5), (2,6)), List(7,8,9))

This is the kind of problem that I personally prefer to solve using a tail-recursive approach.
/** Zips two lists together by taking multiple elements from the second list
  * for each element of the first list.
  *
  * @param l1 The first (small) list.
  * @param l2 The second (big) list.
  * @param n The number of elements to take from the second list for each element of the first list,
  *          must be greater than zero.
  * @return A pair of the zipped list with the remaining elements of the second list,
  *         wrapped in an Option to cover the possibility that the second list was consumed before finishing.
  */
def zipWithLarger[A, B](l1: List[A], l2: List[B])(n: Int): Option[(List[(A, B)], List[B])] = {
  @annotation.tailrec
  def loop(remainingA: List[A], remainingB: List[B], count: Int, acc: List[(A, B)]): Option[(List[(A, B)], List[B])] =
    (remainingA, remainingB) match {
      case (a :: as, b :: bs) =>
        val newElement = (a, b)
        if (count == n)
          loop(remainingA = as, remainingB = bs, count = 1, newElement :: acc)
        else
          loop(remainingA, remainingB = bs, count + 1, newElement :: acc)
      case (Nil, _) =>
        Some(acc.reverse -> remainingB)
      case (_, Nil) =>
        // We consumed the second list before finishing the first one.
        None
    }
  // Ensure n is positive.
  if (n >= 1) loop(remainingA = l1, remainingB = l2, count = 1, acc = List.empty)
  else None
}

Start by repeating each element of the first list n times (here n = 2) with flatMap, then zip the result with the second list:
val list1 = List(1,2)
val list2 = List(3,4,5,6,7,8,9)
val n = 2 // number of elements of list2 to pair with each element of list1
val zipped = list1
  .flatMap(i => List.fill(n)(i))
  .zip(list2)
val result = (zipped, list2.drop(zipped.size))
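As a rough generalisation of the flatMap/zip idea above, the sketch below takes n as a parameter and returns None when the second list is too short. The name zipRepeated and the Option return type are assumptions made for illustration, not part of the original answer.

```scala
// Hypothetical helper: pairs each element of `small` with `n` consecutive
// elements of `big` and returns the unused tail of `big`.
def zipRepeated[A, B](small: List[A], big: List[B], n: Int): Option[(List[(A, B)], List[B])] = {
  val needed = small.size * n
  if (n < 1 || big.size < needed) None // invalid n, or not enough elements in `big`
  else {
    val (used, rest) = big.splitAt(needed)
    val zipped = small.flatMap(a => List.fill(n)(a)).zip(used)
    Some((zipped, rest))
  }
}

val example = zipRepeated(List(1, 2), List(3, 4, 5, 6, 7, 8, 9), 2)
// Some((List((1,3), (1,4), (2,5), (2,6)), List(7, 8, 9)))
```

Checking the length up front keeps the happy path a plain zip, at the cost of one extra traversal of the second list for big.size.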


Scala: Remove duplicated integers from Vector( tuples(Int,Int) , ...)

I have a large vector (about 2000 elements) of tuples of type (Int, Int), e.g.
val myVectorEG = Vector((65,61), (29,49), (4,57), (12,49), (24,98), (21,52), (81,86), (91,23), (73,34), (97,41), ...)
I want to remove the tuples whose first element (index 0) is duplicated, i.e. if (65, xx) appears again elsewhere in the vector as (65, yy), it should be removed.
I am able to access the elements and print them out like this:
val (id1,id2) = ( allSource.foreach(i=>println(i._1)), allSource.foreach(i=>i._2))
How can I remove the duplicated integers? Or should I use another method, rather than foreach, to access the element at index 0?
To remove all duplicates, first group by the first element of each tuple, and keep only the groups that contain exactly one tuple for their key (_._1). Then flatten the result.
myVectorEG.groupBy(_._1).collect {
  case (k, v) if v.size == 1 => v
}.flatten
This returns an Iterable rather than a Vector (and groupBy does not preserve the original order), so call .toVector on it if you need a Vector.
This does the job and preserves order (unlike other solutions) but is O(n^2) so potentially slow for 2000 elements:
myVectorEG.filter(x => myVectorEG.count(_._1 == x._1) == 1)
This is more efficient for larger vectors but still preserves order:
val keep =
myVectorEG.groupBy(_._1).collect{
case (k, v) if v.size == 1 => k
}.toSet
myVectorEG.filter(x => keep.contains(x._1))
You can use distinctBy (available since Scala 2.13) to keep only the first tuple for each key.
In the case of Vector[(Int, Int)] it will look like this:
myVectorEG.distinctBy(_._1)
Update: if you need to remove every tuple whose key is duplicated:
You can use groupBy, but this will not preserve the original order.
myVectorEG.groupBy(_._1).filter(_._2.size == 1).flatMap(_._2).toVector
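The difference between keeping the first tuple per key and dropping every duplicated key is easy to miss, so here is a small side-by-side sketch. The sample data is made up, and groupMapReduce and distinctBy both assume Scala 2.13 or later.

```scala
// Sample data (hypothetical): keys 65 and 29 each appear twice.
val sample = Vector((65, 61), (29, 49), (4, 57), (65, 10), (12, 49), (29, 7))

// distinctBy keeps the first tuple seen for every key.
val firstPerKey = sample.distinctBy(_._1)
// Vector((65,61), (29,49), (4,57), (12,49))

// Count occurrences of each key, then keep only the keys seen exactly once.
// Filtering the original vector preserves its order.
val keyCounts = sample.groupMapReduce(_._1)(_ => 1)(_ + _)
val uniqueKeysOnly = sample.filter { case (key, _) => keyCounts(key) == 1 }
// Vector((4,57), (12,49))
```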
Another option, taking advantage of the fact that you want the list sorted at the end.
def sortAndRemoveDuplicatesByFirst[A : Ordering, B](input: List[(A, B)]): List[(A, B)] = {
  val sorted = input.sortBy(_._1)
  @annotation.tailrec
  def loop(remaining: List[(A, B)], previous: (A, B), repeated: Boolean, acc: List[(A, B)]): List[(A, B)] =
    remaining match {
      case x :: xs =>
        if (x._1 == previous._1)
          loop(remaining = xs, previous, repeated = true, acc)
        else if (!repeated)
          loop(remaining = xs, previous = x, repeated = false, previous :: acc)
        else
          loop(remaining = xs, previous = x, repeated = false, acc)
      case Nil =>
        // Only keep the last element if its key was not repeated.
        if (repeated) acc.reverse
        else (previous :: acc).reverse
    }
  sorted match {
    case x :: xs =>
      loop(remaining = xs, previous = x, repeated = false, acc = List.empty)
    case Nil =>
      List.empty
  }
}
Which you can test like this:
val data = List(
1 -> "A",
3 -> "B",
1 -> "C",
4 -> "D",
3 -> "E",
5 -> "F",
1 -> "G",
0 -> "H"
)
sortAndRemoveDuplicatesByFirst(data)
// res: List[(Int, String)] = List((0,H), (4,D), (5,F))
(I used List instead of Vector to make it easy and performant to write the tail-rec algorithm)

Reduce sequence by parts

I have a sequence Seq[T] and I want to do partial reduce. For example for a Seq[Int] I want to get Seq[Int] consisting of the longest partial sums of monotonic regions. For example:
val s = Seq(1, 2, 4, 3, 2, -1, 0, 6, 8)
groupMonotionic(s) = Seq(1 + 2 + 4, 3 + 2 + (-1), 0 + 6 + 8)
I was looking for a method like a conditional fold with the signature fold(z: B)((B, T) => B, (T, T) => Boolean), where the predicate states where to terminate the current sum aggregation, but there seems to be nothing like that in the Seq trait hierarchy.
What would be a solution using Scala Collection API and without using mutable variables?
Here is one way amongst many to do this (using Scala 2.13's List.unfold):
// val items = Seq(1, 2, 4, 3, 2, -1, 0, 6, 8)
items match {
  case first :: _ :: _ => // If there are at least 2 items
    List
      .unfold(items.sliding(2).toList) { // We slide over items to work on pairs of consecutive items
        case Nil => // No more items to unfold
          None // None signifies the end of the unfold
        case rest @ Seq(a, b) :: _ => // We span based on the sign of a-b
          Some(rest.span(x => (x.head - x.last).signum == (a-b).signum))
      }
      .map(_.map(_.last)) // back from the sliding pairs
      match { case head :: rest => (first :: head) :: rest }
  case _ => // If there is 0 or 1 item
    items.map(List(_))
}
// List(List(1, 2, 4), List(3, 2, -1), List(0, 6, 8))
List.unfold iterates as long as the unfolding function returns Some. It starts with an initial state, which is the list of items to unfold. At each iteration, we span the state (the remaining elements to unfold) based on the sign of the difference between the first two elements. The unfolded elements are the leading items that share the same direction, and the unfolding state becomes the remaining elements.
List#span splits a list into a tuple whose first part contains the leading elements that satisfy the predicate, up to the first element that does not. The second part of the tuple contains the rest of the elements. This fits the expected return type of List.unfold's unfolding function, which is Option[(A, S)] (in this case, an Option of two lists of pairs).
Int.signum returns -1, 0 or 1 depending on the sign of the integer it is applied to.
Note that the first item has to be put back in the result, as it has no predecessor to determine its signum (match { case head :: rest => (first :: head) :: rest }).
To apply the reducing function (in this case a sum), we can map the final result: .map(_.sum)
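For readers new to the three building blocks described above, here is a minimal sketch of each in isolation. The values are made up for illustration, and List.unfold assumes Scala 2.13 or later.

```scala
// List.unfold: the state is an Int; each step emits the state and decrements it.
val countdown = List.unfold(3)(n => if (n == 0) None else Some((n, n - 1)))
// List(3, 2, 1)

// span: the longest prefix satisfying the predicate, plus everything after it.
val (prefix, remainder) = List(1, 2, 4, 3, 2).span(_ < 4)
// prefix == List(1, 2), remainder == List(4, 3, 2)

// signum of a difference gives the direction between two neighbours.
val directions = List(1, 2, 4, 3).sliding(2).map(p => (p.last - p.head).signum).toList
// List(1, 1, -1)
```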
Works in Scala 2.13+ with cats
import scala.util.chaining._
import cats.data._
import cats.implicits._
val s = List(1, 2, 4, 3, 2, -1, 0, 6, 8)
def isLocalExtrema(a: List[Int]) =
a.max == a(1) || a.min == a(1)
implicit class ListOps[T](ls: List[T]) {
def multiSpanUntil(f: T => Boolean): List[List[T]] = ls.span(f) match {
case (h, Nil) => List(h)
case (h, t) => (h ::: t.take(1)) :: t.tail.multiSpanUntil(f)
}
}
def groupMonotionic(groups: List[Int]) = groups match {
case Nil => Nil
case x if x.length < 3 => List(groups.sum)
case _ =>
groups
.sliding(3).toList
.map(isLocalExtrema)
.pipe(false :: _ ::: List(false))
.zip(groups)
.multiSpanUntil(!_._1)
.pipe(Nested.apply)
.map(_._2)
.value
.map(_.sum)
}
println(groupMonotionic(s))
//List(7, 4, 14)
Here's one way using foldLeft to traverse the numeric list with a Tuple3 accumulator (listOfLists, prevElem, prevTrend) that stores the previous element and previous trend to conditionally assemble a list of lists in the current iteration (it assumes the list has at least two elements):
val list = List(1, 2, 4, 3, 2, -1, 0, 6, 8)
val isUpward = (a: Int, b: Int) => a < b
val initTrend = isUpward(list.head, list.tail.head)
// Seed the accumulator with the first element and fold over the rest,
// so that lol is never empty when lol.head is accessed (e.g. for a list that starts by descending).
val monotonicLists = list.tail.foldLeft( (List(List(list.head)), list.head, initTrend) ){
  case ((lol, prev, prevTrend), curr) =>
    val currTrend = isUpward(prev, curr)
    if (currTrend == prevTrend)
      ((curr :: lol.head) :: lol.tail, curr, currTrend)
    else
      (List(curr) :: lol, curr, currTrend)
}._1.reverse.map(_.reverse)
// monotonicLists: List[List[Int]] = List(List(1, 2, 4), List(3, 2, -1), List(0, 6, 8))
To sum the individual nested lists:
monotonicLists.map(_.sum)
// res1: List[Int] = List(7, 4, 14)
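The "conditional fold" asked about in the question can be approximated with a small helper built on foldLeft. The sketch below is a simpler variant than the answers above: it only groups consecutive elements while a pairwise predicate holds and does not track direction. The name groupWhile is an assumption for illustration, not a standard library method.

```scala
// Hypothetical helper: starts a new group whenever sameGroup(previous, current) is false.
def groupWhile[T](xs: List[T])(sameGroup: (T, T) => Boolean): List[List[T]] =
  xs.foldLeft(List.empty[List[T]]) {
    // The current group is non-empty and the predicate holds: extend it.
    case ((current @ (prev :: _)) :: done, x) if sameGroup(prev, x) =>
      (x :: current) :: done
    // Otherwise (first element, or predicate fails): open a new group.
    case (acc, x) =>
      List(x) :: acc
  }.reverse.map(_.reverse) // groups and their elements were built in reverse

// Non-decreasing runs, then the sum of each run.
val runs = groupWhile(List(1, 2, 2, 5, 3, 4, 1))(_ <= _)
// List(List(1, 2, 2, 5), List(3, 4), List(1))
val runSums = runs.map(_.sum)
// List(10, 7, 1)
```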

What's the idiomatic way to take top n values according to some criteria?

I have the following code:
Sighting.all
.iterator
.map(s => (s, haversineDistance(s, ourLocation)))
.toSeq
.sortBy(_._2)
.take(5)
As expected, it returns the 5 sightings closest to ourLocation.
However, for a very large number of sightings, it does not scale well. We can instead just go through all sightings O(N) and find the 5 closest ones, instead of sorting them all and thus doing O(N*logN). How to do so idiomatically?
As with your previous questions, fold might be of use. In this case I'd be tempted to fold over a PriorityQueue initialized to values larger than the expected data set.
import scala.collection.mutable.PriorityQueue
...
.iterator
.foldLeft(PriorityQueue((999,"x"),(999,"x"),(999,"x"),(999,"x"),(999,"x")){
case (pq, s) => pq.+=((haversineDistance(s, ourLocation), s)).tail
}
The result is a PriorityQueue of 5 (distance, sighting) tuples, holding only the 5 smallest distances seen.
You can avoid sorting the big list by iterating through each of the elements in the list just once while maintaining a 5-element list as follows:
Keep the 5-element list sorted by distance in descending order so that its head element has the longest distance (Note that since 5 is small the cost of sorting is negligible)
In each iteration, if the current element in the original list has its distance shorter than that of the head element in the 5-element list, replace the head element with the current element; otherwise keep the current 5-element list
Upon completing the iterations, the 5-element list will consist of elements with the shortest distances and a final sorting by distance in ascending order will give the top5 list:
val list = Sighting.all.
  iterator.
  map(s => (s, haversineDistance(s, ourLocation))).
  toList
// For example ...
// list: List[(String, Int)] = List(
//   ("a", 5), ("b", 2), ("c", 12), ("d", 9), ("e", 6), ("f", 15),
//   ("g", 9), ("h", 7), ("i", 6), ("j", 3), ("k", 10), ("l", 5)
// )
val top5 = list.drop(5).
  foldLeft( list.take(5).sortWith(_._2 > _._2) )(
    (l, e) => if (e._2 < l.head._2)
      (e :: l.tail).sortWith(_._2 > _._2)
    else
      l
  ).
  sortBy(_._2)
// top5: List[(String, Int)] = List((b,2), (j,3), (l,5), (a,5), (e,6))
[UPDATE]
Below is a verbose version of the above top5 value assignment which hopefully makes the foldLeft expression look less overwhelming.
val initialTop5Sorted = list.take(5).sortWith(_._2 > _._2)
val originalListTail = list.drop(5)
def updateTop5Sorted = ( list: List[(String, Int)], element: (String, Int) ) => {
if (element._2 < list.head._2)
(element :: list.tail).sortWith(_._2 > _._2)
else
list
}
val top5 = originalListTail.
foldLeft( initialTop5Sorted )( updateTop5Sorted ).
sortBy(_._2)
Here's the signature of foldLeft for your reference:
def foldLeft[B](z: B)(op: (B, A) => B): B
Here's a slightly different approach:
def topNBy[A, B : Ordering](xs: Iterable[A], n: Int, f: A => B): List[A] = {
val q = new scala.collection.mutable.PriorityQueue[A]()(Ordering.by(f))
for (x <- xs) {
q += x
if (q.size > n) {
q.dequeue()
}
}
q.dequeueAll.toList.reverse
}
fold is useful, and worth getting comfortable with, but if you're not creating a new object in each iteration and are just modifying an existing one, it's no better than a for-loop. I'd also prefer relying on PriorityQueue to do the ordering rather than rolling your own, especially since each insertion or removal costs only O(log n) for a queue of n elements, giving O(N log n) overall for N sightings. Functional purists might balk at this for being more imperative, but to me it's worth it for readability and conciseness. The mutable state is limited to a single local data structure.
You could even put it in an implicit class:
implicit class IterableWithTopN[A](xs: Iterable[A]) {
def topNBy[B : Ordering](n: Int, f: A => B): List[A] = {
...
}
}
And then use it like:
Sighting.all.topNBy(5, s => haversineDistance(s, ourLocation))
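Since Sighting and haversineDistance are not shown in the question, here is a self-contained sketch of the same bounded-heap idea run on made-up (name, distance) pairs. The helper name smallestN and the sample data are assumptions for illustration only.

```scala
import scala.collection.mutable

// Hypothetical helper: keeps at most `n` elements in a max-heap keyed by `key`,
// so the largest of the retained elements is always the one evicted.
def smallestN[A, B: Ordering](xs: Iterator[A], n: Int)(key: A => B): List[A] = {
  val heap = mutable.PriorityQueue.empty[A](Ordering.by(key))
  xs.foreach { x =>
    heap.enqueue(x)
    if (heap.size > n) heap.dequeue() // drop the current largest
  }
  heap.dequeueAll.toList.reverse // dequeueAll yields largest first
}

// Made-up (name, distance) pairs standing in for sightings.
val distances = List(("a", 5), ("b", 2), ("c", 12), ("d", 9), ("j", 3))
val closest3 = smallestN(distances.iterator, 3)(_._2)
// List((b,2), (j,3), (a,5))
```

Taking an Iterator means the input is consumed in a single pass and never materialised as a sorted collection.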

Find max and min values in List[List[String]] comparing by the 1st element in each sub-list

I have a data structure like List[(String, List[List[String]])]. I need to find maximum and minimum value in the List[List[String]] comparing the 1st element in each sub-list:
This is how I do it:
val timestamp_col_ind = 1
val sorted = processed.map(list => (list._1,list._2.sortWith(_.productElement(timestamp_col_ind).toString.toLong < _.productElement(timestamp_col_ind).toString.toLong)))
Then I access maximum and minimum elements using last.apply(timestamp_col_ind).toString.toLong and head.apply(timestamp_col_ind).toString.toLong, correspondingly.
But the problem is that the sub-lists do not get ordered by the 1st element. What am I doing wrong?
Like Alberto mentioned, it is always good to provide a full example to avoid confusion; see https://stackoverflow.com/help/mcve. Here is a solution to what it looks like you want:
// initialise structure
val data = List(("foo", List(List("2", "b", "b"), List("3", "c", "c"), List("1", "a", "a"))))
// sort by long value of first element of sublists
val sortedData = data.map { case (str, lst) => (str, lst.sortBy(_.head.toLong)) }
// get minimum and maximum
val min = sortedData.map { case (str, lst) => (str, lst.head) }
val max = sortedData.map { case (str, lst) => (str, lst.last) }
result:
min: List[(String, List[String])] = List((foo,List(1, a, a)))
max: List[(String, List[String])] = List((foo,List(3, c, c)))
This assumes that the sublists are never empty, i.e. they do have a head
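If only the extremes are needed, sorting can be skipped altogether. The sketch below uses minBy and maxBy on the same kind of sample structure; like the code above, it assumes non-empty sublists whose first element parses as a Long. The name sampleData is made up for illustration.

```scala
val sampleData = List(("foo", List(List("2", "b", "b"), List("3", "c", "c"), List("1", "a", "a"))))

// One linear pass per extreme instead of a full sort.
val minMax = sampleData.map { case (key, rows) =>
  (key, rows.minBy(_.head.toLong), rows.maxBy(_.head.toLong))
}
// List((foo,List(1, a, a),List(3, c, c)))
```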

fold left operation in Scala?

I am having difficulty understanding how fold left works in Scala.
The following code computes for each unique character in the list chars the number of
times it occurs. For example, the invocation
times(List('a', 'b', 'a'))
should return the following (the order of the resulting list is not important):
List(('a', 2), ('b', 1))
def times(chars: List[Char]): List[(Char, Int)] = {
def incr(acc: Map[Char,Int], c: Char) = {
val count = (acc get c).getOrElse(0) + 1
acc + ((c, count));
}
val map = Map[Char, Int]()
(map /: chars)(incr).iterator.toList
}
I am just confused as to what the last line of this function is actually doing?
Any help would be great.
Thanks.
foldLeft in Scala works like this.
Suppose you have a list of integers:
val nums = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val res = nums.foldLeft(0)((m: Int, n: Int) => m + n)
You will get res = 55.
Let's visualise it:
val res1 = nums.foldLeft(0) { (m: Int, n: Int) =>
  println("m: " + m + " n: " + n)
  m + n
}
m: 0 n: 1
m: 1 n: 2
m: 3 n: 3
m: 6 n: 4
m: 10 n: 5
m: 15 n: 6
m: 21 n: 7
m: 28 n: 8
m: 36 n: 9
m: 45 n: 10
So we can see that the initial accumulator value is passed as the first argument of foldLeft. The accumulated value is held in 'm' and each next element arrives in 'n'.
Finally, the accumulator is returned as the result.
Let's start from the "last line" which you are asking about. Because the operator /: ends in a colon, it is right-associative: map /: chars is a method call on chars, i.e. chars./:(map), and /: is simply an alias for foldLeft (inherited from TraversableOnce in Scala 2.12, and deprecated since 2.13). The code (map /: chars)(incr) therefore does a fold-left over chars, with the initial value of the accumulator being the empty mapping from characters to integers, applying incr to each intermediate value of acc and each element c of chars.
For example, when chars is List('a', 'b', 'a', 'c'), the fold-left expression (map /: chars)(incr) equals incr(incr(incr(incr(Map[Char, Int](), 'a'), 'b'), 'a'), 'c').
Now, as for what incr does: it takes an intermediate mapping acc from characters to integers, along with a character c, and increments by 1 the integer corresponding to c in the mapping. (Strictly speaking, the mapping is immutable and therefore never mutated: instead, a new, updated mapping is created and returned. Also, getOrElse(0) says that, if c does not exist in acc, the integer to be incremented is considered 0.)
As a whole, given List('a', 'b', 'a', 'c') as chars for example, the final mapping would be List(('a', 2), ('b', 1), ('c', 1)) when converted to a list by toList.
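The equivalence between the fold and the nested incr calls described above can be checked directly. This is a minimal sketch that restates incr from the question (using getOrElse on the map itself) and uses foldLeft in place of the /: alias.

```scala
// Same behaviour as incr in the question: returns a new map with c's count increased by 1.
def incr(acc: Map[Char, Int], c: Char): Map[Char, Int] =
  acc + (c -> (acc.getOrElse(c, 0) + 1))

val chars = List('a', 'b', 'a', 'c')
val empty = Map.empty[Char, Int]

val folded = chars.foldLeft(empty)(incr)
val nested = incr(incr(incr(incr(empty, 'a'), 'b'), 'a'), 'c')
// Both are Map(a -> 2, b -> 1, c -> 1)
```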
I rewrote your function in a more verbose way:
def times(chars: List[Char]): List[(Char, Int)] = {
chars
.foldLeft(Map[Char, Int]()){ (acc, c) =>
acc + ((c, acc.getOrElse(c, 0) + 1))
}
.toList
}
Let's see the first steps on times("aba".toList)
First invocation:
(Map(), 'a') => Map() ++ Map('a' -> 1)
Second invocation:
(Map('a' -> 1), 'b') => Map('a' -> 1) ++ Map('b' -> 1)
Third invocation:
(Map('a' -> 1, 'b' ->1), 'a') =>
Map('a' -> 1, 'b' ->1) ++ Map('a' -> 2) =>
Map('a' -> 2, 'b' ->1)
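The intermediate accumulators traced above can also be produced by the library itself: scanLeft takes the same arguments as foldLeft but returns every intermediate value, starting with the initial one. A minimal sketch:

```scala
// Each entry is the accumulator before/after processing one more character of "aba".
val steps = "aba".toList.scanLeft(Map.empty[Char, Int]) { (acc, c) =>
  acc + (c -> (acc.getOrElse(c, 0) + 1))
}
// List(Map(), Map(a -> 1), Map(a -> 1, b -> 1), Map(a -> 2, b -> 1))
```

The last element of steps is exactly what foldLeft would return.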
The actual implementation in the scala codebase is very concise:
def foldLeft[B](z: B)(f: (B, A) => B): B = {
var acc = z
var these = this
while (!these.isEmpty) {
acc = f(acc, these.head)
these = these.tail
}
acc
}
Let me rename stuff for clarity:
def foldLeft[B](initialValue: B)(f: (B, A) => B): B = {
  //Notice that both accumulator and collectionCopy are `var`s! They are reassigned each time in the loop.
  var accumulator = initialValue
  //a second reference to the collection (nothing is actually copied)
  var collectionCopy = this //the function is inside a collection class, so **this** is the collection
  while (!collectionCopy.isEmpty) {
    accumulator = f(accumulator, collectionCopy.head)
    collectionCopy = collectionCopy.tail
  }
  accumulator
}
Edit after comment:
Let us now revisit the OP's function and rewrite it in an imperative manner (i.e. non-functional, which apparently is the source of confusion):
(map /: chars)(incr) is exactly equivalent to chars.foldLeft(map)(incr), which can be imperatively rewritten as:
def foldLeft(initialValue: Map[Char,Int])(incrFunction: (Map[Char,Int], Char) => Map[Char,Int]): Map[Char,Int] = {
  //Notice that both accumulator and charList are `var`s! They are reassigned each time in the loop.
  var accumulator = initialValue
  //a second reference to the collection (nothing is actually copied)
  var charList: List[Char] = this //the function is inside a collection class, so **this** is the collection
  while (!charList.isEmpty) {
    accumulator = incrFunction(accumulator, charList.head)
    charList = charList.tail
  }
  accumulator
}
I hope this makes the concept of foldLeft clearer.
So it is essentially an abstraction over an imperative while loop, that accumulates some value by traversing the collection and updating the accumulator. The accumulator is updated using a user-provided function that takes the previous value of the accumulator and the current item of the collection.
Its very description hints that it is a great tool to compute all sorts of aggregates on a collection, like sum, max etc. Yeah, scala collections actually provide all these functions, but they serve as a good example use case.
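As a quick illustration of the aggregates mentioned above, here is a minimal sketch computing a sum, a maximum and a count with foldLeft. The sample values are made up; in real code the built-in sum, max and size would normally be used instead.

```scala
val values = List(3, 9, 4)

// The accumulator starts at the identity for each operation.
val total = values.foldLeft(0)((acc, n) => acc + n)                 // 16
val largest = values.foldLeft(Int.MinValue)((acc, n) => acc max n)  // 9
val howMany = values.foldLeft(0)((acc, _) => acc + 1)               // 3
```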
On the specifics of your question, let me point out that this can be easily done using groupBy:
def times(l: List[Char]) = l.groupBy(c => c).mapValues(_.size).toList
times(List('a','b','a')) // outputs List[(Char, Int)] = List((b,1), (a,2))
.groupBy(c => c) gives you Map[Char,List[Char]] = Map(b -> List(b), a -> List(a, a))
Then we use .mapValues(_.size) to map the values of the map to the size of the grouped sub-collections: Map[Char,Int] = Map(b -> 1, a -> 2).
Finally, you convert the map to a list of key-value tuples with .toList to get the final result.
Lastly, if you don't care about the order of the output list as you said, then leaving the output as a Map[Char,Int] conveys better this decision (instead of converting it to a list).
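On Scala 2.13 or later, where mapValues is deprecated on strict maps, the same count can be written in one step with groupMapReduce. This is a sketch of an alternative, not the original answer's code, and the name timesMap is made up for illustration.

```scala
// Group by the character itself, map every occurrence to 1, and add the ones up.
def timesMap(chars: List[Char]): Map[Char, Int] =
  chars.groupMapReduce(identity)(_ => 1)(_ + _)

val charCounts = timesMap(List('a', 'b', 'a'))
// Map(a -> 2, b -> 1)
```

Returning a Map here follows the suggestion above; call .toList on the result if a List[(Char, Int)] is required.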