SortedSet fold type mismatch - scala

I have this code:
def distinct(seq: Seq[Int]): Seq[Int] =
seq.fold(SortedSet[Int]()) ((acc, i) => acc + i)
I want to iterate over seq, delete duplicates (keep the first number) and keep order of the numbers. My idea was to use a SortedSet as an acc.
But I am getting:
Type mismatch:
Required: String
Found: Any
How to solve this? (I also don't know how to convert SortedSet to Seq in the final iteration as I want distinct to return seq)
p.s. without using standard seq distinct method
Online code

You shouldn't use fold if you try to accumulate something with different type than container (SortedSet != Int) in your case. Look at signature fold:
def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
it takes accumulator with type A1 and combiner function (A1, A1) => A1 which combines two A1 elements.
In your case is better to use foldLeft which takes accumulator with different type than container:
def foldLeft[B](z: B)(op: (B, A) => B): B
it accumulates some B value using seed z and combiner from B and A to B.
In your case I would like to use LinkedHashSet it keeps the order of added elements and remove duplicates, look:
import scala.collection.mutable
def distinct(seq: Seq[Int]): Seq[Int] = {
seq.foldLeft(mutable.LinkedHashSet.empty[Int])(_ + _).toSeq
}
distinct(Seq(7, 2, 4, 2, 3, 0)) // ArrayBuffer(7, 2, 4, 3, 0)
distinct(Seq(0, 0, 0, 0)) // ArrayBuffer(0)
distinct(Seq(1, 5, 2, 7)) // ArrayBuffer(1, 5, 2, 7)
and after folding just use toSeq
be careful, lambda _ + _ is just syntactic sugar for combiner:
(linkedSet, nextElement) => linkedSet + nextElement

I would just call distinct on your Seq. You can see in the source-code of SeqLike, that distinct will just traverse the Seq und skip already seen data:
def distinct: Repr = {
val b = newBuilder
val seen = mutable.HashSet[A]()
for (x <- this) {
if (!seen(x)) {
b += x
seen += x
}
}
b.result
}

Related

fold left operation in Scala?

I am having difficulty understanding how fold left works in Scala.
The following code computes for each unique character in the list chars the number of
times it occurs. For example, the invocation
times(List('a', 'b', 'a'))
should return the following (the order of the resulting list is not important):
List(('a', 2), ('b', 1))
def times(chars: List[Char]): List[(Char, Int)] = {
def incr(acc: Map[Char,Int], c: Char) = {
val count = (acc get c).getOrElse(0) + 1
acc + ((c, count));
}
val map = Map[Char, Int]()
(map /: chars)(incr).iterator.toList
}
I am just confused as to what the last line of this function is actually doing?
Any help wpuld be great.
Thanks.
foldLeft in scala works like this:
suppose you have a list of integers,
val nums = List(2, 3, 4, 5, 6, 7, 8, 9, 10)
val res= nums.foldLeft(0)((m: Int, n: Int) => m + n)
you will get res=55.
lets visualise it.
val res1 = nums.foldLeft(0) { (m: Int, n: Int) => println("m: " + m + " n: " + n);
m + n }
m: 0 n: 1
m: 1 n: 2
m: 3 n: 3
m: 6 n: 4
m: 10 n: 5
m: 15 n: 6
m: 21 n: 7
m: 28 n: 8
m: 36 n: 9
m: 45 n: 10
so, we can see that we need to pass initial accumulator value in foldLeft argument. And accumulated value is stored in 'm' and next value we get in 'n'.
And finally we get the accumulator as result.
Let's start from the "last line" which you are asking about: as the Map trait extends Iterable which in turn extends Traversable where the operator /: is explained, the code (map /: chars)(incr) does fold-left over chars, with the initial value of the accumulator being the empty mapping from characters to integers, applying incr to each intermediate value of acc and each element c of chars.
For example, when chars is List('a', 'b', 'a', 'c'), the fold-left expression (map /: chars)(incr) equals incr(incr(incr(incr(Map[Char, Int](), 'a'), 'b'), 'a'), 'c').
Now, as for what incr does: it takes an intermediate mapping acc from characters to integers, along with a character c, and increments by 1 the integer corresponding to c in the mapping. (Strictly speaking, the mapping is immutable and therefore never mutated: instead, a new, updated mapping is created and returned. Also, getOrElse(0) says that, if c does not exist in acc, the integer to be incremented is considered 0.)
As a whole, given List('a', 'b', 'a', 'c') as chars for example, the final mapping would be List(('a', 2), ('b', 1), ('c', 1)) when converted to a list by toList.
I rewrote your function in a more verbose way:
def times(chars: List[Char]): List[(Char, Int)] = {
chars
.foldLeft(Map[Char, Int]()){ (acc, c) =>
acc + ((c, acc.getOrElse(c, 0) + 1))
}
.toList
}
Let's see the first steps on times("aba".toList)
First invocation:
(Map(), 'a') => Map() ++ Map(`a` -> 1)
Second invocation:
(Map(`a` -> 1), `b`) => Map('a' -> 1) ++ Map('b' ->1)
Third invocation:
(Map('a' -> 1, 'b' ->1), 'a') =>
Map('a' -> 1, 'b' ->1) ++ Map('a' -> 2) =>
Map('a' -> 2, 'b' ->1)
The actual implementation in the scala codebase is very concise:
def foldLeft[B](z: B)(f: (B, A) => B): B = {
var acc = z
var these = this
while (!these.isEmpty) {
acc = f(acc, these.head)
these = these.tail
}
acc
}
Let me rename stuff for clarity:
def foldLeft[B](initialValue: B)(f: (B, A) => B): B = {
//Notice that both accumulator and collectionCopy are `var`s! They are reassigned each time in the loop.
var accumulator = initialValue
//create a copy of the collection
var collectionCopy = this //the function is inside a collection class, so **this** is the collection
while (!collectionCopy.isEmpty) {
accumulator = f(accumulator , collection.head)
collectionCopy = these.tail
}
accumulator
}
Edit after comment:
Let us revisit now the the OPs function and rewrite it in an imperative manner (i.e. non-functional, which apparently is the source of confusion):
(map /: chars)(incr) is be exactly equivalent to chars.foldLeft(map)(incr), which can be imperatively rewritten as:
def foldLeft(initialValue: Map[Char,Int])(incrFunction: (Map[Char,Int], Char) => Map[Char,Int]): Map[Char,Int] = {
//Notice that both accumulator and charList are `var`s! They are reassigned each time in the loop.
var accumulator = initialValue
//create a copy of the collection
var charList: List[Char] = this //the function is inside a collection class, so **this** is the collection
while (!charList.isEmpty) {
accumulator = incrFunction(accumulator , collection.head)
charList = these.tail
}
accumulator
}
I hope this makes the concept of foldLeft clearer.
So it is essentially an abstraction over an imperative while loop, that accumulates some value by traversing the collection and updating the accumulator. The accumulator is updated using a user-provided function that takes the previous value of the accumulator and the current item of the collection.
Its very description hints that it is a great tool to compute all sorts of aggregates on a collection, like sum, max etc. Yeah, scala collections actually provide all these functions, but they serve as a good example use case.
On the specifics of your question, let me point out that this can be easily done using groupBy:
def times(l: List[Char]) = l.groupBy(c => c).mapValues(_.size).toList
times(List('a','b','a')) // outputs List[(Char, Int)] = List((b,1), (a,2))
.groupBy(c => c) gives you Map[Char,List[Char]] = Map(b -> List(b), a -> List(a, a))
Then we use .mapValues(_.size) to map the values of the map to the size of the grouped sub-collections: Map[Char,Int] = Map(b -> 1, a -> 2).
Finally, you convert the map to a list of key-value tuples with .toList to get the final result.
Lastly, if you don't care about the order of the output list as you said, then leaving the output as a Map[Char,Int] conveys better this decision (instead of converting it to a list).

How does map() on 'zipped' Lists work?

I am looking to calculate the scalar product of two lists. Let's say we have two Lists, l1 = List(1,2,3) and l2 = List(4,5,6), the result should be List(4,10,18)
The code below works:
def scalarProduct(l1 : List[Int], l2 : List[Int]):List[Int] = {
val l3 = l1 zip(l2); l3 map(xy => xy._1*xy._2)
}
However, the following fails to compile, and says Cannot resolve reference map with such signature :
def scalarProduct(l1 : List[Int], l2 : List[Int]):List[Int] = {
val l3 = l1 zip(l2); l3 map((x:Int,y:Int) => x*y)
}
This zip() would return a list of Int pairs, and the above map is also taking a function which takes an Int pair.
Could someone point out why does the second variant fail in this case?
Your second example fails because you provide a function with 2 parameters to the map, while map takes a function with 1 parameter.
Have a look, here's a (simplified) signature of the map function:
def map[B, That](f: A => B): That
The function f is the one that you have to pass to do the conversion. As you can see, it has type A => B, i.e. accept a single parameter.
Now take a look at the (simplified) zip function signature:
def zip [B](that : List[B]) : List[(A, B)]
It actually produces a list whose members are tuples. Tuple of 2 elements looks like this: (A, B). When you call map on the list of tuples, you have to provide the function f that takes a tuple of 2 elements as a parameter, exactly like you do in your first example.
Since it's inconvenient to work with tuples directly, you could extract values of tuple's members to a separate variables using pattern matching.
Here's an REPL session to illustrate this.
scala> List(1, 2, 3)
res0: List[Int] = List(1, 2, 3)
scala> List(2, 3, 4)
res1: List[Int] = List(2, 3, 4)
scala> res0 zip res1
res2: List[(Int, Int)] = List((1,2), (2,3), (3,4))
Here's how you do a standard tuple values extraction with pattern matching:
scala> res2.map(t => t match {
| case (x, y) => x * y
| })
res3: List[Int] = List(2, 6, 12)
It's important to note here that pattern matching expects a partial function as an argument. I.e. the following expression is actually a partial function:
{
case (x, y) => x * y
}
The partial function has its own type in Scala: trait PartialFunction[-A, +B] extends (A) => B, and you could read more about it, for example, here.
Partial function is a normal function, since it extends (A) => B, and that's why you can pass a partial function to the map call:
scala> res2.map { case (x, y) => x * y }
res4: List[Int] = List(2, 6, 12)
You actually use special Scala syntax here, that allows for functions invocations (map in our case) without parentheses around its parameters. You can alternatively write this with parentheses as follows:
scala> res2.map ({ case (x, y) => x * y })
res5: List[Int] = List(2, 6, 12)
There's no difference between the 2 last calls at all.
The fact that you don't have to declare a parameter of anonymous function you pass to the map before you do pattern matching on it, is actually Scala's syntactic sugar. When you call res2.map { case (x, y) => x * y }, what's really going on is pattern matching with partial function.
Hope this helps.
you need something like:
def scalarProduct(l1 : List[Int], l2 : List[Int]):List[Int] = {
val l3 = l1 zip(l2); l3 map{ case (x:Int,y:Int) => x*y}
}
You can have a look at this link to help you with this type of problems.

Parallel Aggregates in Scala using associative operators

I want to perform aggregate of a list of values in Scala. Here are few considerations:
aggregate function [1] is associative as well as commutative: examples are plus and multiply
this function is applied to the list in parallel so as to utilize all the cores of CPU
Here is an implementation:
package com.example.reactive
import scala.concurrent.Future
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
object AggregateParallel {
private def pm[T](l: List[Future[T]])(zero: T)(fn: (T, T) => T): Future[T] = {
val l1 = l.grouped(2)
val l2 = l1.map { sl =>
sl match {
case x :: Nil => x
case x :: y :: Nil =>
for (a <- x; b <- y) yield fn(a, b)
case _ => Future(zero)
}
}.toList
l2 match {
case x :: Nil => x
case x :: xs => pm(l2)(zero)(fn)
case Nil => Future(zero)
}
}
def parallelAggregate[T](l: List[T])(zero: T)(fn: (T, T) => T): T = {
val n = pm(l.map(Future(_)))(zero)(fn)
Await.result(n, 1000 millis)
n.value.get.get
}
def main(args: Array[String]) {
// multiply empty list: zero value is 1
println(parallelAggregate(List[Int]())(1)((x, y) => x * y))
// multiply a list: zero value is 1
println(parallelAggregate(List(1, 2, 3, 4, 5))(1)((x, y) => x * y))
// sum a list: zero value is 0
println(parallelAggregate(List(1, 2, 3, 4, 5))(0)((x, y) => x + y))
// sum a list: zero value is 0
val bigList1 = List(1, 2, 3, 4, 5).map(BigInt(_))
println(parallelAggregate(bigList1)(0)((x, y) => x + y))
// sum a list of BigInt: zero value is 0
val bigList2 = (1 to 100).map(BigInt(_)).toList
println(parallelAggregate(bigList2)(0)((x, y) => x + y))
// multiply a list of BigInt: zero value is 1
val bigList3 = (1 to 100).map(BigInt(_)).toList
println(parallelAggregate(bigList3)(1)((x, y) => x * y))
}
}
OUTPUT:
1
120
15
15
5050
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
How else can I achieve the same objective or improve this code in Scala?
EDIT1:
I have implemented bottom up aggregate. I think I am quite close to the aggregate method in Scala ( below). The difference being that I am only splitting into sub lists of two elements:
Scala implementation:
def aggregate[S](z: S)(seqop: (S, T) => S, combop: (S, S) => S): S = {
executeAndWaitResult(new Aggregate(z, seqop, combop, splitter))
}
With this implementation I assume that the aggregate happens in parallel like so:
List(1,2,3,4,5,6)
-> split parallel -> List(List(1,2), List(3,4), List(5,6) )
-> execute in parallel -> List( 3, 7, 11 )
-> split parallel -> List(List(3,7), List(11) )
-> execute in parallel -> List( 10, 11)
-> Result is 21
Is that correct to assume that Scala aggregate is also doing bottom-up aggregates in parallel?
[1] http://www.mathsisfun.com/associative-commutative-distributive.html
The parallel lists of scala already have an aggregate method that do you just what you're asking for:
http://markusjais.com/scalas-parallel-collections-and-the-aggregate-method/
It works like foldLeft but takes an extra parameter:
def foldLeft[B](z: B)(f: (B, A) ⇒ B): B
def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
When called on a parallel collection aggregate splits the collection in N parts, uses foldLeft parrallely on each parts, and uses combop to unit all the results.
But when called on a non parallel collection aggregate just works like foldLeft and ignores the combop.
To have consistent results you need associative and commutative operators because you don't control how the list will be split at all.
A short example:
List(1, 2, 3, 4, 5).par.aggregate(1)(_ * _, _ * _)
res0: Int = 120
Answer to Edit1 (improved according to comment):
I don't think it's the right approach, for a N items list you'll create N Futures. Which creates a big overhead in scheduling. Unless the seqop is really long, I'll avoid creating a Future each time you are calling it.

Best way to compute a function on each element with incremental shadowing of element in these collection

What is the best way to compute a function on each element in a collection with an incremental shadowing of each element, like this simple example :
val v = IndexedSeq(1,2,3,4)
v.shadowMap{ e => e + 1}
shadow 1: (3,4,5)
shadow 2: (2,4,5)
shadow 3: (2,3,5)
shadow 4: (2,3,4)
I think first to patch or slice to make this, but perhaps there is a more better way to do that in pure functional style ?
Thanks
Sr.
def shadowMap[A,B](xs: Seq[A])(f: A => B) = {
val ys = xs map f
for (i <- ys.indices; (as, bs) = ys splitAt i) yield as ++ bs.tail
}
You can define it like this:
class ShadowMapSeq[A, Repr <: Seq[A]](seq: SeqLike[A, Repr]) {
def shadowMap[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): Iterator[That] = {
seq.indices.iterator.map { i =>
val b = bf(seq.asInstanceOf[Repr])
b.sizeHint(seq.size - 1)
b ++= (seq.take(i) ++ seq.drop(i + 1)).map(f)
b.result
}
}
}
implicit def shadowMapSeq[A, Repr <: Seq[A]](seq: SeqLike[A, Repr]) = new ShadowMapSeq(seq)
And then use it like this:
scala> val v = IndexedSeq(1, 2, 3, 4)
scala> val results = v.shadowMap(_ + 1)
scala> results foreach println
Vector(3, 4, 5)
Vector(2, 4, 5)
Vector(2, 3, 5)
Vector(2, 3, 4)
If by "pure functional style" you mean something like "without making reference to indices" (since you say you want to avoid patch and slice), you can do this pretty elegantly with zippers. For example, here's how to write it with Scalaz's zipper implementation (this is just a demonstration—if you want to wrap it up more nicely you can use the approach dhg gives in his answer):
import scalaz._, Scalaz._
List(1, 2, 3, 4).map(_ + 1).toZipper.map(
_.positions.map(p => (p.lefts.reverse ++ p.rights).toList).toStream
).flatten
In general you're probably better off going with dhg's solution in real code, but the zipper is a handy data structure to know about, and it's a good fit for this problem.

What's the relation of fold on Option, Either etc and fold on Traversable?

Scalaz provides a method named fold for various ADTs such as Boolean, Option[_], Validation[_, _], Either[_, _] etc. This method basically takes functions corresponding to all possible cases for that given ADT. In other words, a pattern match shown below:
x match {
case Case1(a, b, c) => f(a, b, c)
case Case2(a, b) => g(a, b)
.
.
case CaseN => z
}
is equivalent to:
x.fold(f, g, ..., z)
Some examples:
scala> (9 == 8).fold("foo", "bar")
res0: java.lang.String = bar
scala> 5.some.fold(2 *, 2)
res1: Int = 10
scala> 5.left[String].fold(2 +, "[" +)
res2: Any = 7
scala> 5.fail[String].fold(2 +, "[" +)
res6: Any = 7
At the same time, there is an operation with the same name for the Traversable[_] types, which traverses over the collection performing certain operation on its elements, and accumulating the result value. For example,
scala> List(2, 90, 11).foldLeft("Contents: ")(_ + _.toString + " ")
res9: java.lang.String = "Contents: 2 90 11 "
scala> List(2, 90, 11).fold(0)(_ + _)
res10: Int = 103
scala> List(2, 90, 11).fold(1)(_ * _)
res11: Int = 1980
Why are these two operations identified with the same name - fold/catamorphism? I fail to see any similarities/relation between the two. What am I missing?
I think the problem you are having is that you see these things based on their implementation, not their types. Consider this simple representation of types:
List[A] = Nil
| Cons head: A tail: List[A]
Option[A] = None
| Some el: A
Now, let's consider Option's fold:
fold[B] = (noneCase: => B, someCase: A => B) => B
So, on Option, it reduces every possible case to some value in B, and return that. Now, let's see the same thing for List:
fold[B] = (nilCase: => B, consCase: (A, List[A]) => B) => B
Note, however, that we have a recursive call there, on List[A]. We have to fold that somehow, but we know fold[B] on a List[A] will always return B, so we can rewrite it like this:
fold[B] = (nilCase: => B, consCase: (A, B) => B) => B
In other words, we replaced List[A] by B, because folding it will always return a B, given the type signature of fold. Now, let's see Scala's (use case) type signature for foldRight:
foldRight[B](z: B)(f: (A, B) ⇒ B): B
Say, does that remind you of something?
If you think of "folding" as "condensing all the values in a container through an operation, with a seed value", and you think of an Option as a container that can can have at most one value, then this starts to make sense.
In fact, foldLeft has the same signature and gives you exactly the same results if you use it on an empty list vs None, and on a list with only one element vs Some:
scala> val opt : Option[Int] = Some(10)
opt: Option[Int] = Some(10)
scala> val lst : List[Int] = List(10)
lst: List[Int] = List(10)
scala> opt.foldLeft(1)((a, b) => a + b)
res11: Int = 11
scala> lst.foldLeft(1)((a, b) => a + b)
res12: Int = 11
fold is also defined on both List and Option in the Scala standard library, with the same signature (I believe they both inherit it from a trait, in fact). And again, you get the same results on a singleton list as on Some:
scala> opt.fold(1)((a, b) => a * b)
res25: Int = 10
scala> lst.fold(1)((a, b) => a * b)
res26: Int = 10
I'm not 100% sure about the fold from Scalaz on Option/Either/etc, you raise a good point there. It seems to have quite a different signature and operation from the "folding" I'm used to.