In Scala 3, I'm able to write a polymorphic function literal like this:
val y = [C <: Int] => (x: C) => x * 2
When I try to generalise it with a second, nested type parameter clause:
val z = [C <: Int] => ([D <: Int] => (x: C, y: D) => x * y)
I got the following error:
DependentPoly.scala:19:37: Implementation restriction: polymorphic function literals must have a value parameter
So is this feature not implemented? Or am I not writing it properly?
Implementation restriction: polymorphic function literals must have a value parameter means that
val y = [C <: Int] => foo[C]
is illegal (for example for def foo[C <: Int]: C => Int = _ * 2) while
val y = [C <: Int] => (x: C) => x * 2
is legal.
Similarly,
val z = [C <: Int] => [D <: Int] => (x: C, y: D) => x * y
val z = [C <: Int] => [D <: Int] => (x: C) => (y: D) => x * y
are illegal while
val z = [C <: Int, D <: Int] => (x: C, y: D) => x * y
val z = [C <: Int, D <: Int] => (x: C) => (y: D) => x * y
val z = [C <: Int] => (x: C) => [D <: Int] => (y: D) => x * y
val z = [C <: Int] => (_: C) => [D <: Int] => (x: C, y: D) => x * y
val z = [C <: Int] => (_: C) => [D <: Int] => (x: C) => (y: D) => x * y
are legal.
This is because polymorphic function literals are implemented in terms of the PolyFunction trait, whose apply method takes a value parameter:
trait PolyFunction:
  def apply[A](x: A): B[A]
https://docs.scala-lang.org/scala3/reference/new-types/polymorphic-function-types.html
https://github.com/lampepfl/dotty/pull/4672
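Both the workaround for the illegal foo case and the legal single-clause form can be exercised directly; a minimal sketch, assuming Scala 3:

```scala
def foo[C <: Int]: C => Int = _ * 2

// `[C <: Int] => foo[C]` is rejected, but giving the literal an
// explicit value parameter and applying foo inside it is legal:
val y = [C <: Int] => (x: C) => foo[C](x)

// Legal: a single type parameter clause with two parameters,
// followed by a value parameter clause.
val z = [C <: Int, D <: Int] => (x: C, y: D) => x * y

println(y(21)) // prints 42
println(z(3, 4)) // prints 12
```

At the call site the type arguments are inferred from the value arguments, so both y and z are used like ordinary functions.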
I have been trying to learn pattern matching and pairs in Scala, and to use them to implement merge sort. But the pattern match that should extract the head and tail of each list doesn't work. What am I missing in the code below?
def merge(xs: List[Int], ys: List[Int]): List[Int] =
  (xs, ys) match {
    case (x: Int, y: Int) :: (xs1: List[Int], ys1: List[Int]) =>
      if (x < y) x :: merge(xs1, ys)
      else y :: merge(xs, ys1)
    case (x: List[Int], Nil) => x
    case (Nil, y: List[Int]) => y
  }
You have an error in your first case pattern. Change it to
case (x :: xs1, y :: ys1)
from
case (x: Int, y: Int) :: (xs1: List[Int], ys1: List[Int])
You are matching on a tuple of Lists, but your original pattern would only fit a List of tuples.
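With that pattern in place, the asker's merge becomes a working function; a small sketch of the corrected code:

```scala
def merge(xs: List[Int], ys: List[Int]): List[Int] =
  (xs, ys) match {
    // Bind head and tail of each list with :: inside the tuple pattern.
    case (x :: xs1, y :: ys1) =>
      if (x < y) x :: merge(xs1, ys)
      else y :: merge(xs, ys1)
    // When either side is exhausted, the other side is the result.
    case (x, Nil) => x
    case (Nil, y) => y
  }

println(merge(List(1, 3, 5), List(2, 4))) // prints List(1, 2, 3, 4, 5)
```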
I am able to calculate the mean word length per starting letter for this Spark RDD
val animals23 = sc.parallelize(List(("a","ant"), ("c","crocodile"), ("c","cheetah"), ("c","cat"), ("d","dolphin"), ("d","dog"), ("g","gnu"), ("l","leopard"), ("l","lion"), ("s","spider"), ("t","tiger"), ("w","whale")), 2)
either with
animals23.
  aggregateByKey((0,0))(
    (x, y) => (x._1 + y.length, x._2 + 1),
    (x, y) => (x._1 + y._1, x._2 + y._2)
  ).
  map(x => (x._1, x._2._1.toDouble / x._2._2.toDouble)).
  collect
or with
animals23.
  combineByKey(
    (x: String) => (x.length, 1),
    (x: (Int, Int), y: String) => (x._1 + y.length, x._2 + 1),
    (x: (Int, Int), y: (Int, Int)) => (x._1 + y._1, x._2 + y._2)
  ).
  map(x => (x._1, x._2._1.toDouble / x._2._2.toDouble)).
  collect
each resulting in
Array((a,3.0), (c,6.333333333333333), (d,5.0), (g,3.0), (l,5.5), (w,5.0), (s,6.0), (t,5.0))
What I do not understand: Why am I required to explicitly state the types in the functions in the second example while the first example's functions can do without?
I am talking about
(x, y) => (x._1 + y.length, x._2 + 1),
(x, y) => (x._1 + y._1, x._2 + y._2)
vs
(x:(Int, Int), y:String) => (x._1 + y.length, x._2 + 1),
(x:(Int, Int), y:(Int, Int)) => (x._1 + y._1, x._2 + y._2)
and it might be more a Scala than a Spark question.
Why am I required to explicitly state the types in the functions in
the second example while the first example's functions can do without?
Because in the first example, the compiler is able to infer the type of seqOp based on the first argument list supplied. aggregateByKey is using currying:
def aggregateByKey[U](zeroValue: U)
    (seqOp: (U, V) ⇒ U,
     combOp: (U, U) ⇒ U)
    (implicit arg0: ClassTag[U]): RDD[(K, U)]
The way type inference works in Scala is that the compiler infers the types in the second argument list based on the first. So in the first example, it knows that seqOp is a function ((Int, Int), String) => (Int, Int); the same goes for combOp.
By contrast, combineByKey has only a single argument list:
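The effect is easy to reproduce without Spark; a minimal sketch (aggregate and aggregateFlat are illustrative names, not Spark's API):

```scala
// Curried: U is fixed to (Int, Int) by the first argument list,
// so the lambda in the second list needs no annotations.
def aggregate[U](zero: U)(seqOp: (U, Int) => U): U =
  List(1, 2, 3).foldLeft(zero)(seqOp)

val curried = aggregate((0, 0))((acc, x) => (acc._1 + x, acc._2 + 1))

// Single argument list: the lambda is type-checked together with the
// zero value, so its parameters must be annotated (or U given explicitly).
def aggregateFlat[U](zero: U, seqOp: (U, Int) => U): U =
  List(1, 2, 3).foldLeft(zero)(seqOp)

val flat = aggregateFlat((0, 0), (acc: (Int, Int), x: Int) => (acc._1 + x, acc._2 + 1))
```

Removing the annotations from the aggregateFlat call makes it fail to compile, mirroring the combineByKey case.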
def combineByKey[C](createCombiner: (V) ⇒ C,
    mergeValue: (C, V) ⇒ C,
    mergeCombiners: (C, C) ⇒ C): RDD[(K, C)]
And without explicitly stating the types, the compiler doesn't know what to infer x and y to.
What you can do to help the compiler is to explicitly specify the type arguments:
animals23
  .combineByKey[(Int, Int)](
    x => (x.length, 1),
    (x, y) => (x._1 + y.length, x._2 + 1),
    (x, y) => (x._1 + y._1, x._2 + y._2))
  .map(x => (x._1, x._2._1.toDouble / x._2._2.toDouble))
  .collect
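As an aside, the same per-key mean can be sketched with plain Scala collections (no Spark), which shows what the combiner functions compute:

```scala
val animals = List(("a", "ant"), ("c", "crocodile"), ("c", "cheetah"), ("c", "cat"))

// Group by the starting letter, then divide the total word length
// by the word count for each key.
val means = animals
  .groupBy(_._1)
  .map { case (k, vs) => (k, vs.map(_._2.length).sum.toDouble / vs.size) }
// e.g. means("a") == 3.0
```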
Here is a function I wrote for combining the elements of two Lists using an accumulator and tail recursion:
val l1 = List(1, 2, 3) //> l1 : List[Int] = List(1, 2, 3)
val l2 = List(1, 2, 3) //> l2 : List[Int] = List(1, 2, 3)
def func(l1: List[Int], l2: List[Int], acc: List[Int]): List[Int] = {
  (l1, l2) match {
    case (Nil, Nil) => acc.reverse
    case (h1 :: t1, h2 :: t2) => {
      func(t1, t2, h1 :: h2 :: acc)
    }
  }
} //> func: (l1: List[Int], l2: List[Int], acc: List[Int])List[Int]
func(l1, l2, List()) //> res0: List[Int] = List(1, 1, 2, 2, 3, 3)
This is my understanding of how the accumulator grows across the calls:
func(..., 1 :: 1 :: Nil)
func(..., 2 :: 2 :: 1 :: 1 :: Nil)
func(..., 3 :: 3 :: 2 :: 2 :: 1 :: 1 :: Nil)
So the call order is why I must call reverse on acc in the base case, so that the result preserves the ordering of the initial lists' elements. To try to reduce the steps required to combine the lists, I tried adding the elements like this:
func(t1, t2, acc :: h1 :: h2)
instead of
func(t1, t2, h1 :: h2 :: acc)
but receive a compile-time error:
value :: is not a member of Int
So it seems I cannot add these elements to the List this way?
When you write x :: y, y must be a list and x the element you want to prepend.
You can use acc :+ h1 :+ h2 to append h1 and h2 to acc, but note that adding elements to the end of the list is a relatively expensive operation (linear with the length of the list).
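The appended variant the answer suggests, applied to the asker's function (note each :+ copies the whole accumulator, so building a list this way is quadratic overall):

```scala
def func(l1: List[Int], l2: List[Int], acc: List[Int]): List[Int] =
  (l1, l2) match {
    case (Nil, Nil) => acc // no reverse needed now
    // :+ appends each head to the end of the accumulator.
    case (h1 :: t1, h2 :: t2) => func(t1, t2, acc :+ h1 :+ h2)
  }

println(func(List(1, 2, 3), List(1, 2, 3), Nil)) // prints List(1, 1, 2, 2, 3, 3)
```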
Why do both of the following foldLeft calls result in the same output?
#1
scala> List(1,2,3).foldLeft(List[Int]())( (acc, el) => acc :+ el)
res114: List[Int] = List(1, 2, 3)
And, now using _ :+ _ as the (B, A) => B argument.
#2
scala> List(1,2,3).foldLeft(List[Int]())(_ :+ _)
res115: List[Int] = List(1, 2, 3)
In particular, the lack of explicitly appending to the accumulator in the second case confuses me.
_ :+ _ is simply shorthand for (x1, x2) => x1 :+ x2, just as list.map(_.toString) is shorthand for list.map(x => x.toString).
See more on the placeholder syntax here.
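The expansion can be checked directly; both forms denote the same function value (a small sketch):

```scala
// Placeholder form: each underscore stands for the next parameter, in order.
val f1: (List[Int], Int) => List[Int] = _ :+ _
// Explicit form:
val f2: (List[Int], Int) => List[Int] = (acc, el) => acc :+ el

val a = List(1, 2, 3).foldLeft(List[Int]())(f1)
val b = List(1, 2, 3).foldLeft(List[Int]())(f2)
// a == b == List(1, 2, 3)
```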