Scala default Set Implementation - scala

I can see from the Scala documentation that scala.collection.immutable.Set is only a trait. Which Set implementation is used by default: HashSet, TreeSet, or something else?
I would like to know (and plan for) the running time of certain functions.
Example:
scala> val s = Set(1,3,6,2,7,1)
s: scala.collection.immutable.Set[Int] = Set(1, 6, 2, 7, 3)
What would be the running time of s.find(5), O(1) or O(log(n))?
Since the same applies to Map, what is the best way to figure this out?

By looking at the source code, you can find that sets up to four elements have an optimized implementation provided by EmptySet, Set1, Set2, Set3 and Set4, which simply hold the single values.
For example here's Set2 declaration (as of scala 2.11.4):
class Set2[A] private[collection] (elem1: A, elem2: A) extends AbstractSet[A] with Set[A] with Serializable
And here's the contains implementation:
def contains(elem: A): Boolean =
  elem == elem1 || elem == elem2
or the find implementation
override def find(f: A => Boolean): Option[A] = {
  if (f(elem1)) Some(elem1)
  else if (f(elem2)) Some(elem2)
  else None
}
Very straightforward.
For sets with more than 4 elements, the underlying implementation is a HashSet. We can easily verify this in the REPL:
scala> Set(1, 2, 3, 4).getClass
res1: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.Set$Set4
scala> Set(1, 2, 3, 4, 5, 6).getClass
res0: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.HashSet$HashTrieSet
That being said, find takes an arbitrary predicate, so it cannot use hashing and may have to iterate over the whole HashSet: it is O(n) in the worst case.
Conversely, a lookup operation like contains will be O(1) (effectively constant time) instead.
Here's a more in-depth reference about performance of scala collections in general.
Speaking of Map, pretty much the same concepts apply. There are optimized Map implementations for up to 4 elements, and beyond that it's a HashMap.
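The claims above can be checked directly. The following is a minimal sketch; the exact class names are an implementation detail and differ between Scala versions (2.13, for example, reports HashSet rather than HashSet$HashTrieSet), so the check only looks at the prefix of the class name.

```scala
// Small sets use the specialised Set1..Set4 classes.
val small = Set(1, 2, 3, 4)
val smallClass = small.getClass.getName

// Five distinct elements (the duplicate 1 is dropped) fall back to a hash-based set.
val s = Set(1, 3, 6, 2, 7, 1)
val setIsHashBased = s.getClass.getName.startsWith("scala.collection.immutable.HashSet")

// contains is a hash lookup; find takes a predicate and scans the elements.
val has5 = s.contains(5)     // false
val found5 = s.find(_ == 5)  // None
val found6 = s.find(_ == 6)  // Some(6)

// The same pattern holds for Map once it grows past four entries.
val m = Map(1 -> "a", 2 -> "b", 3 -> "c", 4 -> "d", 5 -> "e")
val mapIsHashBased = m.getClass.getName.startsWith("scala.collection.immutable.HashMap")
```

Note that s.find(5) from the question does not compile, because find expects a function A => Boolean; s.contains(5) is the membership test.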

Related

Return a generic Traversable of a specified type

I'd like to be able to generically manipulate types like T[_] <: Traversable so that I can do things like map and filter, but I'd like to defer the decision about which Traversable I select for as long as possible.
I'd like to be able to write functions against a generic T[Int] that return a T[Int], not a Traversable[Int]. So for example, I'd like to apply a function to a Set[Int] or a Vector[Int] or anything that extends Traversable and get that type back.
I first attempted to do this in a simple manner like:
trait CollectionHolder[T[_] <: Traversable[_]] {
  def easyLessThanTen(xs: T[Int]): T[Int] = {
    xs.filter(_ < 10)
  }
}
but this won't compile: Missing parameter type for expanded function. It will compile, however, if the function takes a Traversable[Int] instead of a T[Int], so I thought I could work with Traversable and convert to a T. This led me to CanBuildFrom:
import scala.collection.generic.CanBuildFrom
object DoingThingsWithTypes {
  trait CollectionHolder[T[_] <: Traversable[_]] {
    def lessThanTen(xs: T[Int])(implicit cbf: CanBuildFrom[Traversable[Int], Int, T[Int]]): T[Int] = {
      val filteredTraversable = xs.asInstanceOf[Traversable[Int]].filter(_ < 10)
      (cbf() ++= filteredTraversable).result
    }
  }
}
which compiles. But then in my tests:
val xs = Set(1, 2, 3, 4, 1000)
object withSet extends CollectionHolder[Set]
withSet.lessThanTen(xs) shouldBe Set(1, 2, 3, 4)
I get the following compiler error:
Cannot construct a collection of type Set[Int] with elements of type
Int based on a collection of type Traversable[Int]. not enough
arguments for method lessThanTen: (implicit cbf:
scala.collection.generic.CanBuildFrom[Traversable[Int],Int,Set[Int]])Set[Int].
Unspecified value parameter cbf.
Where can I get a CanBuildFrom to make this conversion? Or better yet, how can I modify my simpler approach for the result I want? Or do I need to use a typeclass and write an implicit implementation for each Traversable I'm interested in using (one for Set, one for Vector etc)? I'd prefer to avoid the last approach if possible.
Using the (Scala 2.12.8) standard library instead of cats/scalaz/etc. you need to look at GenericTraversableTemplate. filter isn't defined there, but can easily be:
import scala.collection.GenTraversable
import scala.collection.generic.GenericTraversableTemplate
trait CollectionHolder[T[A] <: GenTraversable[A] with GenericTraversableTemplate[A, T]] {
  def lessThanTen(xs: T[Int]): T[Int] = {
    filter(xs)(_ < 10)
  }
  def filter[A](xs: T[A])(pred: A => Boolean) = {
    val builder = xs.genericBuilder[A]
    xs.foreach(x => if (pred(x)) { builder += x })
    builder.result()
  }
}
In the comment you mention nonEmpty and exists; they are available because of the GenTraversable type bound. Really filter is too, the problem is that it returns GenTraversable[A] instead of T[A].
Scala 2.13 reworks collections so the methods will probably be slightly different there, but I haven't looked enough at it yet.
Also: T[_] <: Traversable[_] is likely not what you want as opposed to T[A] <: Traversable[A]; e.g. the first constraint is not violated if you have T[Int] <: Traversable[String].
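For readers on Scala 2.13 or later, where CanBuildFrom and GenericTraversableTemplate no longer exist, the simple approach from the question can be made to work with a bound on IterableOps. This is only a sketch under that assumption and is not part of the 2.12 answer above; lessThanTen is the name from the question, and CC is just a conventional name for the collection type constructor.

```scala
import scala.collection.IterableOps

// Scala 2.13+ only. IterableOps[A, CC, C] declares filter as returning C,
// so binding C to CC[X] makes filter return the caller's collection type.
def lessThanTen[CC[X] <: IterableOps[X, CC, CC[X]]](xs: CC[Int]): CC[Int] =
  xs.filter(_ < 10)

val fromSet = lessThanTen(Set(1, 2, 3, 4, 1000))  // static type Set[Int]
val fromVector = lessThanTen(Vector(1, 20, 3))    // static type Vector[Int]
```

The second point of the answer still applies here: the bound is written with a named parameter X rather than an underscore, so the element type of the collection and of its operations stay tied together.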
Yes, I am saying that you should use typeclasses.
But you do not have to implement them yourself, nor provide instances for the types you need, since those are very common and can be found in libraries like cats or scalaz.
For example, using cats:
import cats.{Traverse, TraverseFilter}
import cats.syntax.all._ // Provides the nonEmpty, filter & map extension methods to C.
import scala.language.higherKinds
def algorithm[C[_]: TraverseFilter: Traverse](col: C[Int]): C[Int] =
  if (col.nonEmpty)
    col.filter(x => x < 10)
  else
    col.map(x => x * 2) // nonsense, but just to show that you can use map too.
Which you can use like this:
import cats.instances.list._
algorithm(List(1, 200, 3, 100))
// res: List[Int] = List(1, 3)
It may be worth adding that there are a lot of other methods, like exists, foldLeft, size, etc.
Take a look at the documentation. And if it is your first time using cats, scalaz, or these concepts in general, you may find scala-with-cats very instructive.

Merging two iterables in Scala

I'd like to write a merge method that takes two iterables and merges them together. (maybe merge is not the best word to describe what I want, but for the sake of this question it's irrelevant). I'd like this method be generic to work with different concrete iterables.
For example, merge(Set(1,2), Set(2,3)) should return Set(1,2,3) and
merge(List(1,2), List(2,3)) should return List(1, 2, 2, 3). I've done the following naive attempt, but the compiler is complaining about the type of res: It is Iterable[Any] instead of A.
def merge[A <: Iterable[_]](first: A, second: A): A = {
  val res = first ++ second
  res
}
How can I fix this compile error? (I'm more interested in understanding how to implement such a functionality, rather than a library that does it for me, so explanation of why my code does not work is very appreciated.)
Let's start off with why your code didn't work. First off you're accidentally using the abbreviated syntax for an existential type, rather than actually using a type bound on a higher kinded type.
// What you wrote is equivalent to this
def merge[A <: Iterable[T] forSome {type T}](first: A, second: A): A
Even fixing it though doesn't quite get you what you want.
def merge[A, S[T] <: Iterable[T]](first: S[A], second: S[A]): S[A] = {
  first ++ second // CanBuildFrom errors :(
}
This is because ++ doesn't use type bounds to achieve its polymorphism, it uses an implicit CanBuildFrom[From, Elem, To]. CanBuildFrom is responsible for giving an appropriate Builder[Elem, To], which is a mutable buffer which we use to build up the collection of our desired type.
So that means we're going to have to give it the CanBuildFrom it so desires and everything'll work right?
import collection.generic.CanBuildFrom
// Cannot construct a collection of type S[A] with elements of type A
// based on a collection of type Iterable[A]
def merge0[A, S[T] <: Iterable[T], That](x: S[A], y: S[A])
  (implicit bf: CanBuildFrom[S[A], A, S[A]]): S[A] = x.++[A, S[A]](y)
Nope :(.
I've added the extra type annotations to ++ to make the compiler error more relevant. What this is telling us is that because we haven't specifically overridden Iterable's ++ with our own for our arbitrary S, we're using Iterable's implementation of it, which just so happens to take an implicit CanBuildFrom that builds from Iterable's to our S.
This is incidentally the problem @ChrisMartin was running into (and this whole thing really is a long-winded comment to his answer).
Unfortunately Scala does not offer such a CanBuildFrom, so it looks like we're gonna have to use CanBuildFrom manually.
So down the rabbit hole we go...
Let's start off by noticing that ++ is in fact actually defined originally in TraversableLike and so we can make our custom merge a bit more general.
def merge[A, S[T] <: TraversableLike[T, S[T]], That](it: S[A], that: TraversableOnce[A])
(implicit bf: CanBuildFrom[S[A], A, That]): That = ???
Now let's actually implement that signature.
import collection.TraversableLike
import collection.mutable.Builder
def merge[A, S[T] <: TraversableLike[T, S[T]], That](it: S[A], that: TraversableOnce[A])
  (implicit bf: CanBuildFrom[S[A], A, That]): That = {
  // Getting our mutable buffer from CanBuildFrom
  val builder: Builder[A, That] = bf()
  builder ++= it
  builder ++= that
  builder.result()
}
Note that I've changed GenTraversableOnce[B]* to TraversableOnce[B]**. This is because the only way to make Builder's ++= work is to have sequential access***. And that's all there is to CanBuildFrom: it gives you a mutable buffer that you fill with all the values you want, and then you convert the buffer into your desired output collection with result.
scala> merge(List(1, 2, 3), List(2, 3, 4))
res0: List[Int] = List(1, 2, 3, 2, 3, 4)
scala> merge(Set(1, 2, 3), Set(2, 3, 4))
res1: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)
scala> merge(List(1, 2, 3), Set(1, 2, 3))
res2: List[Int] = List(1, 2, 3, 1, 2, 3)
scala> merge(Set(1, 2, 3), List(1, 2, 3)) // Not the same behavior :(
res3: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
In short, the CanBuildFrom machinery lets you build code that deals with the fact that we often wish to automatically convert between different branches of the inheritance graph of Scala's collections, but it comes at the cost of some complexity and occasionally unintuitive behavior. Weigh the tradeoffs accordingly.
Footnotes:
* "Generalized" collections for which we can "Traverse" at least "Once", but maybe not more, in some order which may or may not be sequential, e.g. perhaps parallel.
** Same thing as GenTraversableOnce except not "General" because it guarantees sequential access.
*** TraversableLike gets around this by forcibly calling seq on the GenTraversableOnce internally, but I feel like that's cheating people out of parallelism when they might have otherwise expected it. Force callers to decide whether they want to give up their parallelism; don't do it invisibly for them.
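The Builder step described above can be tried on its own, without any implicit machinery, by asking a collection companion for a builder directly. This is a minimal sketch; mergeInto is a hypothetical helper name, and the caller chooses the result type by choosing the builder, which is the decision CanBuildFrom otherwise makes implicitly.

```scala
import scala.collection.mutable.Builder

// Fill a mutable builder with both inputs, then freeze it into the target collection.
def mergeInto[A, To](builder: Builder[A, To])(first: Iterable[A], second: Iterable[A]): To = {
  builder ++= first
  builder ++= second
  builder.result()
}

val asList = mergeInto(List.newBuilder[Int])(List(1, 2), List(2, 3)) // duplicates kept
val asSet = mergeInto(Set.newBuilder[Int])(Set(1, 2), Set(2, 3))     // duplicates dropped
val mixed = mergeInto(Set.newBuilder[Int])(List(1, 2, 3), Set(1, 2, 3))
```

Because the builder alone determines the result, the same two inputs merge differently depending on which builder is passed, which mirrors the asymmetric behaviour seen in the REPL session above.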
Preliminarily, here are the imports needed for all of the code in this answer:
import collection.GenTraversableOnce
import collection.generic.CanBuildFrom
Start by looking at the API doc to see the method signature for Iterable.++ (Note that the API docs for most collections are wrong, and you need to click "Full Signature" to see the real type):
def ++[B >: A, That](that: GenTraversableOnce[B])
(implicit bf: CanBuildFrom[Iterable[A], B, That]): That
From there you can just do a straightforward translation from an instance method to a function:
def merge[A, B >: A, That](it: Iterable[A], that: GenTraversableOnce[B])
(implicit bf: CanBuildFrom[Iterable[A], B, That]): That = it ++ that
Breaking this down:
[A, B >: A, That] - Iterable has one type parameter A, and ++ has two type parameters B and That, so the resulting function has all three type parameters A, B, and That
it: Iterable[A] - The method belongs to Iterable[A], so we made that the first value parameter
that: GenTraversableOnce[B])(implicit bf: CanBuildFrom[Iterable[A], B, That]): That - the remaining parameter and type constraint copied directly from the signature of ++

In scala, is it possible to have a two-elements only Set?

In the end, I want to have a case class Swap so that Swap(a, b) == Swap(b, a).
I thought I could use a Set of two elements, and it quite does the job :
scala> case class Swap(val s:Set[Int])
defined class Swap
scala> Swap(Set(2, 1)) == Swap(Set(1, 2))
res0: Boolean = true
But this allows for any number of elements, and I would like to limit my elements to two. I found the class Set.Set2, which is the default implementation for an immutable Set with two elements, but it doesn't work the way I tried, or variations of it :
scala> val a = Set(2, 1)
a: scala.collection.immutable.Set[Int] = Set(2, 1)
scala> a.getClass
res3: Class[_ <: scala.collection.immutable.Set[Int]] = class scala.collection.immutable.Set$Set2
scala> case class Swap(val s:Set.Set2[Int])
defined class Swap
scala> val swp = Swap(a)
<console>:10: error: type mismatch;
found : scala.collection.immutable.Set[Int]
required: Set.Set2[Int]
val swp = Swap(a)
^
So my questions are :
is there a way to use Set2 as I try ?
is there a better way to implement my case class Swap ? I read that one shouldn't override equals in a case class, though it was my first idea.
This is a generic implementation -
import scala.collection.immutable.Set.Set2
def set2[T](a: T, b: T): Set2[T] = Set(a, b).asInstanceOf[Set2[T]]
case class Swap[T](s: Set2[T])
Swap(set2(1,2)) == Swap(set2(2,1)) //true
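The reason that your solution didn't work is the signature of Set.apply:
def apply[A](elems: A*): Set[A]
With two distinct elements the concrete runtime class will be Set2, but the compiler only knows the static type Set[A], so you have to cast it to Set2. Note that the cast throws a ClassCastException if a == b, because the result is then a Set1.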
You can always hide the implementation details of Swap, in this case you actually should.
You could implement it using Set or you could implement it as:
// invariant a <= b
class Swap private (val a: Int, val b: Int)
object Swap {
  def apply(a: Int, b: Int): Swap =
    if (a <= b) new Swap(a, b) else new Swap(b, a)
}
Unfortunately you have to use a class here and reimplement equals, hashCode, etc. yourself, as we cannot get rid of the scalac auto-generated apply: related SO Q/A
All functions on Swap must also maintain that invariant.
Then the equals comparison is essentially this.a == other.a && this.b == other.b; we don't need to care about swapping anymore.
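Filling in the parts that answer leaves out, a complete version might look like the sketch below. The equals, hashCode and toString bodies are one reasonable choice rather than the only one; they rely entirely on the a <= b invariant established in apply.

```scala
// Invariant: a <= b, enforced by the only public way to create a Swap.
final class Swap private (val a: Int, val b: Int) {
  override def equals(other: Any): Boolean = other match {
    case that: Swap => a == that.a && b == that.b
    case _          => false
  }
  // Equal instances must have equal hash codes; the normalised pair guarantees that.
  override def hashCode: Int = (a, b).hashCode
  override def toString: String = s"Swap($a, $b)"
}

object Swap {
  def apply(a: Int, b: Int): Swap =
    if (a <= b) new Swap(a, b) else new Swap(b, a)
}
```

With this in place, Swap(1, 2) == Swap(2, 1) holds, and both print as Swap(1, 2).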
The problem is that you don't know statically that a is a Set2 - as far as the compiler is concerned you called Set(as: A*) and got back some kind of Set.
You could use shapeless sized collections to enforce a statically known collection size.

What is the Scala syntax for summing a List of objects?

For example
case class Blah(security: String, price: Double)
val myList = List(Blah("a", 2.0), Blah("b", 4.0))
val sum = myList.sum(_.price) // does not work
What is the syntax for obtaining the sum?
Try this:
val sum = myList.map(_.price).sum
Or alternately:
val sum = myList.foldLeft(0.0)(_ + _.price)
You appear to be trying to use this method:
def sum [B >: A] (implicit num: Numeric[B]): B
and the compiler can't figure out how the function you're providing is an instance of Numeric, because it isn't.
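Put together, the standard-library options look like this. It is a small sketch using only the definitions from the question; the iterator variant is an extra option not mentioned above, added because it keeps the map-then-sum shape without building an intermediate list.

```scala
case class Blah(security: String, price: Double)
val myList = List(Blah("a", 2.0), Blah("b", 4.0))

// sum takes no function argument, so project to the numeric field first.
val viaMap = myList.map(_.price).sum

// foldLeft avoids building the intermediate List[Double].
val viaFold = myList.foldLeft(0.0)(_ + _.price)

// An iterator maps lazily, so no intermediate list is allocated here either.
val viaIterator = myList.iterator.map(_.price).sum
```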
Scalaz has this method under the name foldMap. The signature is:
def M[A].foldMap[B](f: A => B)(implicit f: Foldable[M], m: Monoid[B]): B
Usage:
scala> case class Blah(security: String, price: Double)
defined class Blah
scala> val myList = List(Blah("a", 2.0), Blah("b", 4.0))
myList: List[Blah] = List(Blah(a,2.0), Blah(b,4.0))
scala> myList.foldMap(_.price)
res11: Double = 6.0
B here doesn't have to be a numeric type. It can be any monoid. Example:
scala> myList.foldMap(_.security)
res12: String = ab
As an alternative to missingfaktor's Scalaz example, if you really want to sum a list of objects (as opposed to mapping each of them to a number and then summing those numbers), scalaz supports this as well.
This depends on the class in question having an instance of Monoid defined for it (which in practice means that it must have a Zero and a Semigroup defined). The monoid can be considered a weaker generalisation of core scala's Numeric trait specifically for summing; after all, if you can define a zero element and a way to add/combine two elements, then you have everything you need to get the sum of multiple objects.
Scalaz' logic is exactly the same as the way you'd sum integers manually - list.foldLeft(0) { _ + _ } - except that the Zero provides the initial zero element, and the Semigroup provides the implementation of + (called append).
It might look something like this:
import scalaz._
import Scalaz._
// Define Monoid for Blah
object Blah {
  implicit def zero4Blah: Zero[Blah] = zero(Blah("", 0))
  implicit def semigroup4Blah: Semigroup[Blah] = semigroup { (a, b) =>
    // Decide how to combine security names - just append them here
    Blah(a.security + b.security, a.price + b.price)
  }
}
// Now later in your class
val myList = List(Blah("a", 2.0), Blah("b", 4.0))
val mySum = myList.asMA.sum
In this case mySum will actually be an instance of Blah equal to Blah("ab", 6.0), rather than just being a Double.
OK, for this particular example you don't really gain that much because getting a "sum" of the security names isn't very useful. But for other classes (e.g. if you had a quantity as well as a price, or multiple relevant properties) this can be very useful. Fundamentally it's great that if you can define some way of adding two instances of your class together, you can tell scalaz about it (by defining a Semigroup); and if you can define a zero element too, you can use that definition to easily sum collections of your class.
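To see the idea without depending on any particular Scalaz version, here is a hand-rolled sketch using only the standard library. The Monoid trait and the foldMap function below are hypothetical stand-ins written for illustration; they are not Scalaz's actual API, which has changed between releases.

```scala
// A minimal, hypothetical Monoid type class: a zero element plus a way to combine two values.
trait Monoid[A] {
  def zero: A
  def append(x: A, y: A): A
}

implicit val doubleMonoid: Monoid[Double] = new Monoid[Double] {
  def zero: Double = 0.0
  def append(x: Double, y: Double): Double = x + y
}

implicit val stringMonoid: Monoid[String] = new Monoid[String] {
  def zero: String = ""
  def append(x: String, y: String): String = x + y
}

case class Blah(security: String, price: Double)

implicit val blahMonoid: Monoid[Blah] = new Monoid[Blah] {
  def zero: Blah = Blah("", 0.0)
  // Security names are simply concatenated, as in the answer above.
  def append(x: Blah, y: Blah): Blah = Blah(x.security + y.security, x.price + y.price)
}

// Map each element to a B, then fold with B's monoid, starting from its zero.
def foldMap[A, B](xs: List[A])(f: A => B)(implicit m: Monoid[B]): B =
  xs.foldLeft(m.zero)((acc, a) => m.append(acc, f(a)))

val myList = List(Blah("a", 2.0), Blah("b", 4.0))
val total = foldMap(myList)(_.price)     // uses Monoid[Double]
val names = foldMap(myList)(_.security)  // uses Monoid[String]
val combined = foldMap(myList)(b => b)   // uses Monoid[Blah]
```

This is the same foldLeft the answer describes, with the initial value and the combining function supplied by the implicit instance instead of being written at each call site.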

Convert List of Ints to a SortedSet in Scala

If I have a List of Ints like:
val myList = List(3,2,1,9)
what is the right/preferred way to create a SortedSet from a List or Seq of Ints, where the items are sorted from least to greatest?
If you held a gun to my head I would have said:
val itsSorted = collection.SortedSet(myList)
but I get an error saying that there is no implicit ordering defined for List[Int].
Use:
collection.SortedSet(myList: _*)
The way you used it, the compiler thinks you want to create a SortedSet[List[Int]] not a SortedSet[Int]. That's why it complains about no implicit Ordering for List[Int].
Notice the repeated parameter of type A* in the signature of the method:
def apply [A] (elems: A*)(implicit ord: Ordering[A]): SortedSet[A]
To treat myList as a sequence argument of A, use the _* type annotation.
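As a short sketch of both parameter lists in that signature: the first call expands the list into varargs and picks up the default Ordering[Int] implicitly, while the second passes an ordering explicitly. The descending case is an illustration added here and is not something the question asked for.

```scala
import scala.collection.immutable.SortedSet

val myList = List(3, 2, 1, 9)

// `: _*` passes the list's elements as individual varargs, so A is Int rather than List[Int].
val ascending = SortedSet(myList: _*)

// The implicit Ordering can also be supplied by hand, here to sort from greatest to least.
val descending = SortedSet(myList: _*)(Ordering[Int].reverse)
```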
You could also take advantage of the CanBuildFrom instance, and do this:
import scala.collection.immutable.SortedSet
val myList = List(3,2,1,9)
myList.to[SortedSet]
// scala.collection.immutable.SortedSet[Int] = TreeSet(1, 2, 3, 9)
There doesn't seem to be a constructor that directly accepts List (correct me if I'm wrong). But you can easily write
val myList = List(3,2,1,9)
val itsSorted = collection.SortedSet.empty[Int] ++ myList
to the same effect. (See http://www.scala-lang.org/docu/files/collections-api/collections_20.html.)
Using breakOut is especially useful if you have to map anyway, because the result is built directly as the target collection:
import scala.collection.breakOut
val s: collection.SortedSet[Int] = List(1,2,3,4).map(identity)(breakOut)
//--> s: scala.collection.SortedSet[Int] = TreeSet(1, 2, 3, 4)
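The to[SortedSet] and breakOut forms shown in these answers are specific to Scala 2.12 and earlier; both were removed when the collections were reworked in 2.13. A rough equivalent for 2.13 and later, offered as a sketch rather than a drop-in replacement, is:

```scala
import scala.collection.immutable.SortedSet

val myList = List(3, 2, 1, 9)

// Scala 2.13+: `to` takes the companion object as a value instead of a type argument.
val sorted = myList.to(SortedSet)

// A view defers the map, so the SortedSet is built without an intermediate List,
// which is the job breakOut used to do.
val mapped = myList.view.map(_ * 2).to(SortedSet)
```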