fold left operation in Scala? - scala

I am having difficulty understanding how fold left works in Scala.
The following code computes for each unique character in the list chars the number of
times it occurs. For example, the invocation
times(List('a', 'b', 'a'))
should return the following (the order of the resulting list is not important):
List(('a', 2), ('b', 1))
def times(chars: List[Char]): List[(Char, Int)] = {
def incr(acc: Map[Char,Int], c: Char) = {
val count = (acc get c).getOrElse(0) + 1
acc + ((c, count));
}
val map = Map[Char, Int]()
(map /: chars)(incr).iterator.toList
}
I am just confused as to what the last line of this function is actually doing?
Any help wpuld be great.
Thanks.

foldLeft in scala works like this:
suppose you have a list of integers,
val nums = List(2, 3, 4, 5, 6, 7, 8, 9, 10)
val res= nums.foldLeft(0)((m: Int, n: Int) => m + n)
you will get res=55.
lets visualise it.
val res1 = nums.foldLeft(0) { (m: Int, n: Int) => println("m: " + m + " n: " + n);
m + n }
m: 0 n: 1
m: 1 n: 2
m: 3 n: 3
m: 6 n: 4
m: 10 n: 5
m: 15 n: 6
m: 21 n: 7
m: 28 n: 8
m: 36 n: 9
m: 45 n: 10
so, we can see that we need to pass initial accumulator value in foldLeft argument. And accumulated value is stored in 'm' and next value we get in 'n'.
And finally we get the accumulator as result.

Let's start from the "last line" which you are asking about: as the Map trait extends Iterable which in turn extends Traversable where the operator /: is explained, the code (map /: chars)(incr) does fold-left over chars, with the initial value of the accumulator being the empty mapping from characters to integers, applying incr to each intermediate value of acc and each element c of chars.
For example, when chars is List('a', 'b', 'a', 'c'), the fold-left expression (map /: chars)(incr) equals incr(incr(incr(incr(Map[Char, Int](), 'a'), 'b'), 'a'), 'c').
Now, as for what incr does: it takes an intermediate mapping acc from characters to integers, along with a character c, and increments by 1 the integer corresponding to c in the mapping. (Strictly speaking, the mapping is immutable and therefore never mutated: instead, a new, updated mapping is created and returned. Also, getOrElse(0) says that, if c does not exist in acc, the integer to be incremented is considered 0.)
As a whole, given List('a', 'b', 'a', 'c') as chars for example, the final mapping would be List(('a', 2), ('b', 1), ('c', 1)) when converted to a list by toList.

I rewrote your function in a more verbose way:
def times(chars: List[Char]): List[(Char, Int)] = {
chars
.foldLeft(Map[Char, Int]()){ (acc, c) =>
acc + ((c, acc.getOrElse(c, 0) + 1))
}
.toList
}
Let's see the first steps on times("aba".toList)
First invocation:
(Map(), 'a') => Map() ++ Map(`a` -> 1)
Second invocation:
(Map(`a` -> 1), `b`) => Map('a' -> 1) ++ Map('b' ->1)
Third invocation:
(Map('a' -> 1, 'b' ->1), 'a') =>
Map('a' -> 1, 'b' ->1) ++ Map('a' -> 2) =>
Map('a' -> 2, 'b' ->1)

The actual implementation in the scala codebase is very concise:
def foldLeft[B](z: B)(f: (B, A) => B): B = {
var acc = z
var these = this
while (!these.isEmpty) {
acc = f(acc, these.head)
these = these.tail
}
acc
}
Let me rename stuff for clarity:
def foldLeft[B](initialValue: B)(f: (B, A) => B): B = {
//Notice that both accumulator and collectionCopy are `var`s! They are reassigned each time in the loop.
var accumulator = initialValue
//create a copy of the collection
var collectionCopy = this //the function is inside a collection class, so **this** is the collection
while (!collectionCopy.isEmpty) {
accumulator = f(accumulator , collection.head)
collectionCopy = these.tail
}
accumulator
}
Edit after comment:
Let us revisit now the the OPs function and rewrite it in an imperative manner (i.e. non-functional, which apparently is the source of confusion):
(map /: chars)(incr) is be exactly equivalent to chars.foldLeft(map)(incr), which can be imperatively rewritten as:
def foldLeft(initialValue: Map[Char,Int])(incrFunction: (Map[Char,Int], Char) => Map[Char,Int]): Map[Char,Int] = {
//Notice that both accumulator and charList are `var`s! They are reassigned each time in the loop.
var accumulator = initialValue
//create a copy of the collection
var charList: List[Char] = this //the function is inside a collection class, so **this** is the collection
while (!charList.isEmpty) {
accumulator = incrFunction(accumulator , collection.head)
charList = these.tail
}
accumulator
}
I hope this makes the concept of foldLeft clearer.
So it is essentially an abstraction over an imperative while loop, that accumulates some value by traversing the collection and updating the accumulator. The accumulator is updated using a user-provided function that takes the previous value of the accumulator and the current item of the collection.
Its very description hints that it is a great tool to compute all sorts of aggregates on a collection, like sum, max etc. Yeah, scala collections actually provide all these functions, but they serve as a good example use case.
On the specifics of your question, let me point out that this can be easily done using groupBy:
def times(l: List[Char]) = l.groupBy(c => c).mapValues(_.size).toList
times(List('a','b','a')) // outputs List[(Char, Int)] = List((b,1), (a,2))
.groupBy(c => c) gives you Map[Char,List[Char]] = Map(b -> List(b), a -> List(a, a))
Then we use .mapValues(_.size) to map the values of the map to the size of the grouped sub-collections: Map[Char,Int] = Map(b -> 1, a -> 2).
Finally, you convert the map to a list of key-value tuples with .toList to get the final result.
Lastly, if you don't care about the order of the output list as you said, then leaving the output as a Map[Char,Int] conveys better this decision (instead of converting it to a list).

Related

SortedSet fold type mismatch

I have this code:
def distinct(seq: Seq[Int]): Seq[Int] =
seq.fold(SortedSet[Int]()) ((acc, i) => acc + i)
I want to iterate over seq, delete duplicates (keep the first number) and keep order of the numbers. My idea was to use a SortedSet as an acc.
But I am getting:
Type mismatch:
Required: String
Found: Any
How to solve this? (I also don't know how to convert SortedSet to Seq in the final iteration as I want distinct to return seq)
p.s. without using standard seq distinct method
Online code
You shouldn't use fold if you try to accumulate something with different type than container (SortedSet != Int) in your case. Look at signature fold:
def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
it takes accumulator with type A1 and combiner function (A1, A1) => A1 which combines two A1 elements.
In your case is better to use foldLeft which takes accumulator with different type than container:
def foldLeft[B](z: B)(op: (B, A) => B): B
it accumulates some B value using seed z and combiner from B and A to B.
In your case I would like to use LinkedHashSet it keeps the order of added elements and remove duplicates, look:
import scala.collection.mutable
def distinct(seq: Seq[Int]): Seq[Int] = {
seq.foldLeft(mutable.LinkedHashSet.empty[Int])(_ + _).toSeq
}
distinct(Seq(7, 2, 4, 2, 3, 0)) // ArrayBuffer(7, 2, 4, 3, 0)
distinct(Seq(0, 0, 0, 0)) // ArrayBuffer(0)
distinct(Seq(1, 5, 2, 7)) // ArrayBuffer(1, 5, 2, 7)
and after folding just use toSeq
be careful, lambda _ + _ is just syntactic sugar for combiner:
(linkedSet, nextElement) => linkedSet + nextElement
I would just call distinct on your Seq. You can see in the source-code of SeqLike, that distinct will just traverse the Seq und skip already seen data:
def distinct: Repr = {
val b = newBuilder
val seen = mutable.HashSet[A]()
for (x <- this) {
if (!seen(x)) {
b += x
seen += x
}
}
b.result
}

Using span function to bisect a Map in Scala

I have a Map of the form [Int, Option[/* of some type that I am using */], thus:
scala> val t1: Map[Int,Option[(String,List[Int])]] = Map(500->Some("A",List(1,2,3)))
t1: Map[Int,Option[(String, List[Int])]] = Map(500 -> Some((A,List(1, 2, 3))))
scala> t1 + (400 -> Some("B",List(9,8,7))) + (300 -> None) + (200 -> None)
res6: scala.collection.immutable.Map[Int,Option[(String, List[Int])]] = Map(500 -> Some((A,List(1, 2, 3))), 400 -> Some((B,List(9, 8, 7))), 300 -> None, 200 -> None)
Now, I am trying to cleave into two maps, one having all the empty Values -from eponymous Key/Value - and the other having none of them, thus:
res6.span(e => e._2.isEmpty)
res7: (scala.collection.immutable.Map[Int,Option[(String, List[Int])]], scala.collection.immutable.Map[Int,Option[(String, List[Int])]]) = (Map(),Map(500 -> Some((A,List(1, 2, 3))), 400 -> Some((B,List(9, 8, 7))), 300 -> None, 200 -> None))
I am failing to understand why I am getting an empty Map on the left, while the < K,None > pairs are sitting blissfully inside the Map on the right. They should have been in the Map on the left, or so I expect.
What is the obvious thing that I am missing?
You should use partition instead of span.
Note: c span p is equivalent to (but possibly more efficient than) (c takeWhile p, c dropWhile p), provided the evaluation of the predicate p does not cause any side-effects.
So, span would stop scanning if the condition is not met.
For example,
scala> val l = List(1, 9, 8, 0)
scala> l.span(e => e < 2)
res7: (List[Int], List[Int]) = (List(1),List(9, 8, 0))
scala> l.partition(e => e < 2)
res8: (List[Int], List[Int]) = (List(1, 0),List(9, 8))
Note that, actually for span, it might return different results for different runs, unless the underlying collection type is ordered.
In your case, the first element in map may not None. (Map is not ordered)
By definition when you use span we get a Tuple2 of sequences that are of the same type as the original collection, one contain true values and other false values.
def span(p: A => Boolean): (Repr, Repr) =
In your case
res6.span(e => e._2.isEmpty)
So in your case you have a empty and non-empty elements of Tuple2.
If you wish to get non-empty values you use simply use _2 as
val nonEmptyValue = res6.span(e => e._2.isEmpty)._2

Scala: How to "map" an Array[Int] to a Map[String, Int] using the "map" method?

I have the following Array[Int]: val array = Array(1, 2, 3), for which I have the following mapping relation between an Int and a String:
val a1 = array.map{
case 1 => "A"
case 2 => "B"
case 3 => "C"
}
To create a Map to contain the above mapping relation, I am aware that I can use a foldLeft method:
val a2 = array.foldLeft(Map[String, Int]()) { (m, e) =>
m + (e match {
case 1 => ("A", 1)
case 2 => "B" -> 2
case 3 => "C" -> 3
})
}
which outputs:
a2: scala.collection.immutable.Map[String,Int] = Map(A -> 1, B -> 2, C
-> 3)
This is the result I want. But can I achieve the same result via the map method?
The following codes do not work:
val a3 = array.map[(String, Int), Map[String, Int]] {
case 1 => ("A", 1)
case 2 => ("B", 2)
case 3 => ("C", 3)
}
The signature of map is
def map[B, That](f: A => B)
(implicit bf: CanBuildFrom[Repr, B, That]): That
What is this CanBuildFrom[Repr, B, That]? I tried to read Tribulations of CanBuildFrom but don't really understand it. That article mentioned Scala 2.12+ has provided two implementations for map. But how come I didn't find it when I use Scala 2.12.4?
I mostly use Scala 2.11.12.
Call toMap in the end of your expression:
val a3 = array.map {
case 1 => ("A", 1)
case 2 => ("B", 2)
case 3 => ("C", 3)
}.toMap
I'll first define your function here for the sake of brevity in later explanation:
// worth noting that this function is effectively partial
// i.e. will throw a `MatchError` if n is not in (1, 2, 3)
def toPairs(n: Int): (String, Int) =
n match {
case 1 => "a" -> 1
case 2 => "b" -> 2
case 3 => "c" -> 3
}
One possible way to go (as already highlighted in another answer) is to use toMap, which only works on collection of pairs:
val ns = Array(1, 2, 3)
ns.toMap // doesn't compile
ns.map(toPairs).toMap // does what you want
It is worth noting however that unless you are working with a lazy representation (like an Iterator or a Stream) this will result in two passes over the collection and the creation of unnecessary intermediate collections: the first time by mapping toPairs over the collection and then by turning the whole collection from a collection of pairs to a Map (with toMap).
You can see it clearly in the implementation of toMap.
As suggested in the read you already linked in the answer (and in particular here) You can avoid this double pass in two ways:
you can leverage scala.collection.breakOut, an implementation of CanBuildFrom that you can give map (among others) to change the target collection, provided that you explicitly provide a type hint for the compiler:
val resultMap: Map[String, Int] = ns.map(toPairs)(collection.breakOut)
val resultSet: Set[(String, Int)] = ns.map(toPairs)(collection.breakOut)
otherwise, you can create a view over your collection, which puts it in the lazy wrapper that you need for the operation to not result in a double pass
ns.view.map(toPairs).toMap
You can read more about implicit builder providers and views in this Q&A.
Basically toMap (credits to Sergey Lagutin) is the right answer.
You could actually make the code a bit more compact though:
val a1 = array.map { i => ((i + 64).toChar, i) }.toMap
If you run this code:
val array = Array(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 0)
val a1 = array.map { i => ((i + 64).toChar, i) }.toMap
println(a1)
You will see this on the console:
Map(E -> 5, J -> 10, F -> 6, A -> 1, # -> 0, G -> 7, L -> 12, B -> 2, C -> 3, H -> 8, K -> 11, D -> 4)

What's the idiomatic way to take top n values according to some criteria?

I have the following code:
Sighting.all
.iterator
.map(s => (s, haversineDistance(s, ourLocation)))
.toSeq
.sortBy(_._2)
.take(5)
As expected, it returns 5 sightings closests to ourLocation.
However, for a very large number of sightings, it does not scale well. We can instead just go through all sightings O(N) and find the 5 closest ones, instead of sorting them all and thus doing O(N*logN). How to do so idiomatically?
As with your previous questions, fold might be of use. In this case I'd be tempted to fold over a PriorityQueue initialized to values larger than the expected data set.
import scala.collection.mutable.PriorityQueue
...
.iterator
.foldLeft(PriorityQueue((999,"x"),(999,"x"),(999,"x"),(999,"x"),(999,"x")){
case (pq, s) => pq.+=((haversineDistance(s, ourLocation), s)).tail
}
The result is a PriorityQueue of 5 (distance, sighting) tuples, but only the 5 smallest distances.
You can avoid sorting the big list by iterating through each of the elements in the list just once while maintaining a 5-element list as follows:
Keep the 5-element list sorted by distance in descending order so that its head element has the longest distance (Note that since 5 is small the cost of sorting is negligible)
In each iteration, if the current element in the original list has its distance shorter than that of the head element in the 5-element list, replace the head element with the current element; otherwise keep the current 5-element list
Upon completing the iterations, the 5-element list will consist of elements with the shortest distances and a final sorting by distance in ascending order will give the top5 list:
val list = Sighting.all.
iterator.
map(s => (s, haversineDistance(s, ourLocation))).
toSeq
// For example ...
res1: list = List(
("a", 5), ("b", 2), ("c", 12), ("d", 9), ("e", 6), ("f", 15),
("g", 9), ("h", 7), ("i", 6), ("j", 3), ("k", 10), ("l", 5)
)
val top5 = list.drop(5).
foldLeft( list.take(5).sortWith(_._2 > _._2) )(
(l, e) => if (e._2 < l.head._2)
(e :: l.tail).sortWith(_._2 > _._2)
else
l
).
sortBy(_._2)
// top5: List[(String, Int)] = List((b,2), (f,3), (h,5), (a,5), (e,6))
[UPDATE]
Below is a verbose version of the above top5 value assignment which hopefully makes the foldLeft expression look less overwhelming.
val initialTop5Sorted = list.take(5).sortWith(_._2 > _._2)
val originalListTail = list.drop(5)
def updateTop5Sorted = ( list: List[(String, Int)], element: (String, Int) ) => {
if (element._2 < list.head._2)
(element :: list.tail).sortWith(_._2 > _._2)
else
list
}
val top5 = originalListTail.
foldLeft( initialTop5Sorted )( updateTop5Sorted ).
sortBy(_._2)
Here's signature of foldLeft for your reference:
def foldLeft[B](z: B)(op: (B, A) => B): B
Here's a slightly different approach:
def topNBy[A, B : Ordering](xs: Iterable[A], n: Int, f: A => B): List[A] = {
val q = new scala.collection.mutable.PriorityQueue[A]()(Ordering.by(f))
for (x <- xs) {
q += x
if (q.size > n) {
q.dequeue()
}
}
q.dequeueAll.toList.reverse
}
fold is useful, and worth getting comfortable with, but if you're not creating a new object to act on in each iteration, and just modifying an existing one, it's no better than a for-loop. And I'd prefer relying on PriorityQueue to do the sorting rather than rolling your own, especially given it's an efficient O(log n) implementation. Functional purists might balk at this for being more imperative, but to me it's worth it for readability and conciseness. The mutable state is limited to a single local data structure.
You could even put it in an implicit class:
implicit class IterableWithTopN[A](xs: Iterable[A]) {
def topNBy[B : Ordering](n: Int, f: A => B): List[A] = {
...
}
}
And then use it like:
Sighting.all.topNBy(5, s => haversineDistance(s, ourLocation))

Scala List Operation

Given a List of Int and variable X of Int type . What is the best in Scala functional way to retain only those values in the List (starting from beginning of list) such that sum of list values is less than equal to variable.
This is pretty close to a one-liner:
def takeWhileLessThan(x: Int)(l: List[Int]): List[Int] =
l.scan(0)(_ + _).tail.zip(l).takeWhile(_._1 <= x).map(_._2)
Let's break that into smaller pieces.
First you use scan to create a list of cumulative sums. Here's how it works on a small example:
scala> List(1, 2, 3, 4).scan(0)(_ + _)
res0: List[Int] = List(0, 1, 3, 6, 10)
Note that the result includes the initial value, which is why we take the tail in our implementation.
scala> List(1, 2, 3, 4).scan(0)(_ + _).tail
res1: List[Int] = List(1, 3, 6, 10)
Now we zip the entire thing against the original list. Taking our example again, this looks like the following:
scala> List(1, 2, 3, 4).scan(0)(_ + _).tail.zip(List(1, 2, 3, 4))
res2: List[(Int, Int)] = List((1,1), (3,2), (6,3), (10,4))
Now we can use takeWhile to take as many values as we can from this list before the cumulative sum is greater than our target. Let's say our target is 5 in our example:
scala> res2.takeWhile(_._1 <= 5)
res3: List[(Int, Int)] = List((1,1), (3,2))
This is almost what we want—we just need to get rid of the cumulative sums:
scala> res2.takeWhile(_._1 <= 5).map(_._2)
res4: List[Int] = List(1, 2)
And we're done. It's worth noting that this isn't very efficient, since it computes the cumulative sums for the entire list, etc. The implementation could be optimized in various ways, but as it stands it's probably the simplest purely functional way to do this in Scala (and in most cases the performance won't be a problem, anyway).
In addition to Travis' answer (and for the sake of completeness), you can always implement these type of operations as a foldLeft:
def takeWhileLessThanOrEqualTo(maxSum: Int)(list: Seq[Int]): Seq[Int] = {
// Tuple3: the sum of elements so far; the accumulated list; have we went over x, or in other words are we finished yet
val startingState = (0, Seq.empty[Int], false)
val (_, accumulatedNumbers, _) = list.foldLeft(startingState) {
case ((sum, accumulator, finished), nextNumber) =>
if(!finished) {
if (sum + nextNumber > maxSum) (sum, accumulator, true) // We are over the sum limit, finish
else (sum + nextNumber, accumulator :+ nextNumber, false) // We are still under the limit, add it to the list and sum
} else (sum, accumulator, finished) // We are in a finished state, just keep iterating over the list
}
accumulatedNumbers
}
This only iterates over the list once, so it should be more efficient, but is more complicated and requires a bit of reading code to understand.
I will go with something like this, which is more functional and should be efficient.
def takeSumLessThan(x:Int,l:List[Int]): List[Int] = (x,l) match {
case (_ , List()) => List()
case (x, _) if x<= 0 => List()
case (x, lh :: lt) => lh :: takeSumLessThan(x-lh,lt)
}
Edit 1 : Adding tail recursion and implicit for shorter call notation
import scala.annotation.tailrec
implicit class MyList(l:List[Int]) {
def takeSumLessThan(x:Int) = {
#tailrec
def f(x:Int,l:List[Int],acc:List[Int]) : List[Int] = (x,l) match {
case (_,List()) => acc
case (x, _ ) if x <= 0 => acc
case (x, lh :: lt ) => f(x-lh,lt,acc ++ List(lh))
}
f(x,l,Nil)
}
}
Now you can use this like
List(1,2,3,4,5,6,7,8).takeSumLessThan(10)