Functional Creating a list based on values in Scala - scala

I have a task to traverse a sequence of tuples and based on last value in the tuple make 1 or more copies of a case class Item. I can solve this task with foreach and Mutable List. As I'm learning FP and Scala collections could it be done more functional way with immutable collections and high order functions in Scala?
For example, input:
List[("A", 2), ("B", 3), ...]
Output:
List[Item("A"), Item("A"), Item("B"),Item("B"),Item("B"), ...]

For each tuple flatMap using List.fill[A](n: Int)(elem: ⇒ A) which produces a List of elem n times.
scala> val xs = List(("A", 2), ("B", 3), ("C", 4))
xs: List[(String, Int)] = List((A,2), (B,3), (C,4))
scala> case class Item(s: String)
defined class Item
scala> xs.flatMap(x => List.fill(x._2)(Item(x._1)))
res2: List[Item] = List(Item(A), Item(A), Item(B), Item(B), Item(B), Item(C), Item(C), Item(C), Item(C))

Using flatten for case class Item(v: String) as follows
myList.map{ case(s,n) => List.fill(n)(Item(s)) }.flatten
Also with a for comprehension like this,
for ( (s,n) <- myList ; l <- List.fill(n)(Item(s)) ) yield l
which is syntax sugar for a call to flatMap.
In addition to List.fill consider List.tabulate for initialising lists, for instance in this way,
for ( (s,n) <- myList ; l <- List.tabulate(n)(_ => Item(s)) ) yield l

Related

What's the idiomatic way to take top n values according to some criteria?

I have the following code:
Sighting.all
.iterator
.map(s => (s, haversineDistance(s, ourLocation)))
.toSeq
.sortBy(_._2)
.take(5)
As expected, it returns 5 sightings closests to ourLocation.
However, for a very large number of sightings, it does not scale well. We can instead just go through all sightings O(N) and find the 5 closest ones, instead of sorting them all and thus doing O(N*logN). How to do so idiomatically?
As with your previous questions, fold might be of use. In this case I'd be tempted to fold over a PriorityQueue initialized to values larger than the expected data set.
import scala.collection.mutable.PriorityQueue
...
.iterator
.foldLeft(PriorityQueue((999,"x"),(999,"x"),(999,"x"),(999,"x"),(999,"x")){
case (pq, s) => pq.+=((haversineDistance(s, ourLocation), s)).tail
}
The result is a PriorityQueue of 5 (distance, sighting) tuples, but only the 5 smallest distances.
You can avoid sorting the big list by iterating through each of the elements in the list just once while maintaining a 5-element list as follows:
Keep the 5-element list sorted by distance in descending order so that its head element has the longest distance (Note that since 5 is small the cost of sorting is negligible)
In each iteration, if the current element in the original list has its distance shorter than that of the head element in the 5-element list, replace the head element with the current element; otherwise keep the current 5-element list
Upon completing the iterations, the 5-element list will consist of elements with the shortest distances and a final sorting by distance in ascending order will give the top5 list:
val list = Sighting.all.
iterator.
map(s => (s, haversineDistance(s, ourLocation))).
toSeq
// For example ...
res1: list = List(
("a", 5), ("b", 2), ("c", 12), ("d", 9), ("e", 6), ("f", 15),
("g", 9), ("h", 7), ("i", 6), ("j", 3), ("k", 10), ("l", 5)
)
val top5 = list.drop(5).
foldLeft( list.take(5).sortWith(_._2 > _._2) )(
(l, e) => if (e._2 < l.head._2)
(e :: l.tail).sortWith(_._2 > _._2)
else
l
).
sortBy(_._2)
// top5: List[(String, Int)] = List((b,2), (f,3), (h,5), (a,5), (e,6))
[UPDATE]
Below is a verbose version of the above top5 value assignment which hopefully makes the foldLeft expression look less overwhelming.
val initialTop5Sorted = list.take(5).sortWith(_._2 > _._2)
val originalListTail = list.drop(5)
def updateTop5Sorted = ( list: List[(String, Int)], element: (String, Int) ) => {
if (element._2 < list.head._2)
(element :: list.tail).sortWith(_._2 > _._2)
else
list
}
val top5 = originalListTail.
foldLeft( initialTop5Sorted )( updateTop5Sorted ).
sortBy(_._2)
Here's signature of foldLeft for your reference:
def foldLeft[B](z: B)(op: (B, A) => B): B
Here's a slightly different approach:
def topNBy[A, B : Ordering](xs: Iterable[A], n: Int, f: A => B): List[A] = {
val q = new scala.collection.mutable.PriorityQueue[A]()(Ordering.by(f))
for (x <- xs) {
q += x
if (q.size > n) {
q.dequeue()
}
}
q.dequeueAll.toList.reverse
}
fold is useful, and worth getting comfortable with, but if you're not creating a new object to act on in each iteration, and just modifying an existing one, it's no better than a for-loop. And I'd prefer relying on PriorityQueue to do the sorting rather than rolling your own, especially given it's an efficient O(log n) implementation. Functional purists might balk at this for being more imperative, but to me it's worth it for readability and conciseness. The mutable state is limited to a single local data structure.
You could even put it in an implicit class:
implicit class IterableWithTopN[A](xs: Iterable[A]) {
def topNBy[B : Ordering](n: Int, f: A => B): List[A] = {
...
}
}
And then use it like:
Sighting.all.topNBy(5, s => haversineDistance(s, ourLocation))

Reduce/fold over scala sequence with grouping

In scala, given an Iterable of pairs, say Iterable[(String, Int]),
is there a way to accumulate or fold over the ._2s based on the ._1s? Like in the following, add up all the #s that come after A and separately the # after B
List(("A", 2), ("B", 1), ("A", 3))
I could do this in 2 steps with groupBy
val mapBy1 = list.groupBy( _._1 )
for ((key,sublist) <- mapBy1) yield (key, sublist.foldLeft(0) (_+_._2))
but then I would be allocating the sublists, which I would rather avoid.
You could build the Map as you go and convert it back to a List after the fact.
listOfPairs.foldLeft(Map[String,Int]().withDefaultValue(0)){
case (m,(k,v)) => m + (k -> (v + m(k)))
}.toList
You could do something like:
list.foldLeft(Map[String, Int]()) {
case (map, (k,v)) => map + (k -> (map.getOrElse(k, 0) + v))
}
You could also use groupBy with mapValues:
list.groupBy(_._1).mapValues(_.map(_._2).sum).toList
res1: List[(String, Int)] = List((A,5), (B,1))

scala map function of map vs. list

Snippet 1:
val l = List(1,2,43,4)
l.map(i => i *2)
Snippet 2:
val s = "dsadadaqer12"
val g = s.groupBy(c=>c)
g.map ( {case (c,s) => (c,s.length)})
In snippet #2, the syntax different than #1 , i.e. curly braces required -- why?
I thought the following would compile, but it does not:
g.map ( (c,s) => (c,s.length))
Can someone explain why?
Thanks
The difference between the two is - the latter uses Pattern Matching and the former doesn't.
The syntax g.map({case (c,s) => (c,s.length)}) is just syntax sugar for:
g.map(v => v match { case (c,s) => (c,s.length) })
Which means: we name the input argument of our anonymous function v, and then in the function body we match it to a tuple (c,s). Since this is so useful, Scala provides the shorthand version you used.
Of course - this doesn't really have anything to do with whether you use a Map or a List - consider all the following possibilities:
scala> val l = List(1,2,43,4)
l: List[Int] = List(1, 2, 43, 4)
scala> l.map({ case i => i*2 })
res0: List[Int] = List(2, 4, 86, 8)
scala> val l2 = List((1,2), (3,4))
l2: List[(Int, Int)] = List((1,2), (3,4))
scala> l2.map({ case (i, j) => i*j })
res1: List[Int] = List(2, 12)
scala> val g = Map(1 -> 2, 3 -> 4)
g: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 3 -> 4)
scala> g.map(t => t._1 * t._2)
res2: scala.collection.immutable.Iterable[Int] = List(2, 12)
Both Map and List can use both syntax options, depending mostly on what you actually want to do.
1- g.map{case (c,s) => (c,s.length)}
2- g.map((c,s) => (c,s.length))
The map method pulls a single argument, a 2-tuple, from the g collection. The 1st example compiles because the case statement uses pattern matching to extract the tuple's elements whereas the 2nd example doesn't and it won't compile. For that you'd have to do something like: g.map(t => (t._1, t._2.length))
As for the parenthesis vs. curly braces: braces have always been required for "partial functions," which is what that case statement is. You can use either braces or parens for anonymous functions (i.e. x => ...) although you are required to use braces if the function is more than a single line (i.e. has a carriage-return).
I read somewhere that this parens/braces distinction might be relaxed but I don't know if that's going to happen any time soon.

combine two lists with same keys

Here's a quite simple request to combine two lists as following:
scala> list1
res17: List[(Int, Double)] = List((1,0.1), (2,0.2), (3,0.3), (4,0.4))
scala> list2
res18: List[(Int, String)] = List((1,aaa), (2,bbb), (3,ccc), (4,ddd))
The desired output is as:
((aaa,0.1),(bbb,0.2),(ccc,0.3),(ddd,0.4))
I tried:
scala> (list1 ++ list2)
res23: List[(Int, Any)] = List((1,0.1), (2,0.2), (3,0.3), (4,0.4),
(1,aaa), (2,bbb), (3,ccc), (4,ddd))
But:
scala> (list1 ++ list2).groupByKey
<console>:10: error: value groupByKey is not a member of List[(Int,
Any)](list1 ++ list2).groupByKey
Any hints? Thanks!
The method you're looking for is groupBy:
(list1 ++ list2).groupBy(_._1)
If you know that for each key you have exactly two values, you can join them:
scala> val pairs = List((1, "a1"), (2, "b1"), (1, "a2"), (2, "b2"))
pairs: List[(Int, String)] = List((1,a1), (2,b1), (1,a2), (2,b2))
scala> pairs.groupBy(_._1).values.map {
| case List((_, v1), (_, v2)) => (v1, v2)
| }
res0: Iterable[(String, String)] = List((b1,b2), (a1,a2))
Another approach using zip is possible if the two lists contain the same keys in the same order:
scala> val l1 = List((1, "a1"), (2, "b1"))
l1: List[(Int, String)] = List((1,a1), (2,b1))
scala> val l2 = List((1, "a2"), (2, "b2"))
l2: List[(Int, String)] = List((1,a2), (2,b2))
scala> l1.zip(l2).map { case ((_, v1), (_, v2)) => (v1, v2) }
res1: List[(String, String)] = List((a1,a2), (b1,b2))
Here's a quick one-liner:
scala> list2.map(_._2) zip list1.map(_._2)
res0: List[(String, Double)] = List((aaa,0.1), (bbb,0.2), (ccc,0.3), (ddd,0.4))
If you are unsure why this works then read on! I'll expand it step by step:
list2.map(<function>)
The map method iterates over each value in list2 and applies your function to it. In this case each of the values in list2 is a Tuple2 (a tuple with two values). What you are wanting to do is access the second tuple value. To access the first tuple value use the ._1 method and to access the second tuple value use the ._2 method. Here is an example:
val myTuple = (1.0, "hello") // A Tuple2
println(myTuple._1) // prints "1.0"
println(myTuple._2) // prints "hello"
So what we want is a function literal that takes one parameter (the current value in the list) and returns the second tuple value (._2). We could have written the function literal like this:
list2.map(item => item._2)
We don't need to specify a type for item because the compiler is smart enough to infer it thanks to target typing. A really helpful shortcut is that we can just leave out item altogether and replace it with a single underscore _. So it gets simplified (or cryptified, depending on how you view it) to:
list2.map(_._2)
The other interesting part about this one liner is the zip method. All zip does is take two lists and combines them into one list just like the zipper on your favorite hoodie does!
val a = List("a", "b", "c")
val b = List(1, 2, 3)
a zip b // returns ((a,1), (b,2), (c,3))
Cheers!

How to sum adjacent elements in scala

I want to sum adjacent elements in scala and I'm not sure how to deal with the last element.
So I have a list:
val x = List(1,2,3,4)
And I want to sum adjacent elements using indices and map:
val size = x.indices.size
val y = x.indices.map(i =>
if (i < size - 1)
x(i) + x(i+1))
The problem is that this approach creates an AnyVal elemnt at the end:
res1: scala.collection.immutable.IndexedSeq[AnyVal] = Vector(3, 5, 7, ())
and if I try to sum the elements or another numeric method of the collection, it doesn't work:
error: could not find implicit value for parameter num: Numeric[AnyVal]
I tried to filter out the element using:
y diff List(Unit) or y diff List(AnyVal)
but it doesn't work.
Is there a better approach in scala to do this type of adjacent sum without using a foor loop?
For a more functional solution, you can use sliding to group the elements together in twos (or any number of them), then map to their sum.
scala> List(1, 2, 3, 4).sliding(2).map(_.sum).toList
res80: List[Int] = List(3, 5, 7)
What sliding(2) will do is create an intermediate iterator of lists like this:
Iterator(
List(1, 2),
List(2, 3),
List(3, 4)
)
So when we chain map(_.sum), we will map each inner List to it's own sum. toList will convert the Iterator back into a List.
You can try pattern matching and tail recursion also.
import scala.annotation.tailrec
#tailrec
def f(l:List[Int],r :List[Int]=Nil):List[Int] = {
l match {
case x :: xs :: xss =>
f(l.tail, r :+ (x + xs))
case _ => r
}
}
scala> f(List(1,2,3,4))
res4: List[Int] = List(3, 5, 7)
With a for comprehension by zipping two lists, the second with the first item dropped,
for ( (a,b) <- x zip x.drop(1) ) yield a+b
which results in
List(3, 5, 7)