Performance and usage comparison of different collection types - scala

I have been programming in Scala for a few months now. I'm still confused by the many different collections there are.
Is there a page/article somewhere that shows what each type is best suitable for?
The problem with Scala is that it has too many different types, then you have something like Array which maps directly to a Java array, then you have something like "Set" which is actually a "trait" but you can use it like a normal class even though my understanding is that a trait is like an interface. The documentation says "to implement a concrete set, you need to define the following methods: ..." but actually I can use it just fine.
The whole thing is really confusing to me. Coming from C#/.NET, things there were quite clear and I didn't have the odd types like "LinkedHashMap" and "LinkedHashSet".

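Use the traits (interfaces) Seq (ordered sequence), Map (key/value), Set and IndexedSeq, plus Array (for Java primitives), and let the companion object choose the implementation. If you look at the source you will see a companion object for each trait; its factory (apply) method creates a default concrete implementation for you. That is also why you can write Set(1, 2, 3) even though Set is a trait: you are calling Set.apply on the companion object, not instantiating the trait.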
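To see the companion-object factories at work, you can ask the returned values for their runtime class. A small sketch, assuming Scala 2.13. The concrete classes named in the comments are the standard library's current defaults, not a guarantee, and may differ between versions:

```scala
// Each call goes through the companion object's apply method,
// which picks a concrete implementation of the trait you asked for.
val seq = Seq(1, 2, 3)            // Seq is a trait; the default is currently a List
val indexed = IndexedSeq(1, 2, 3) // IndexedSeq currently defaults to a Vector
val small = Set(1, 2, 3)          // small sets get a specialized class
val large = Set(1, 2, 3, 4, 5)    // larger sets become a hash-based set

println(seq.getClass.getName)
println(indexed.getClass.getName)
println(small.getClass.getName)
println(large.getClass.getName)
```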
This page helped me.
http://docs.scala-lang.org/overviews/collections/overview.html
The section on Concrete collections goes into the implementations.
Seq companion object:
object Seq extends SeqFactory[Seq] {
/** $genericCanBuildFromInfo */
implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Seq[A]] = ReusableCBF.asInstanceOf[GenericCanBuildFrom[A]]
def newBuilder[A]: Builder[A, Seq[A]] = immutable.Seq.newBuilder[A]
}
https://github.com/scala/scala/blob/v2.10.3/src/library/scala/collection/Seq.scala#L1
Look at the factory source to see how apply initializes the collection.
abstract class GenericCompanion[+CC[X] <: GenTraversable[X]] {
/** The underlying collection type with unknown element type */
type Coll = CC[_]
/** The default builder for `$Coll` objects.
* @tparam A the type of the ${coll}'s elements
*/
def newBuilder[A]: Builder[A, CC[A]]
/** An empty collection of type `$Coll[A]`
* @tparam A the type of the ${coll}'s elements
*/
def empty[A]: CC[A] = newBuilder[A].result
/** Creates a $coll with the specified elements.
* @tparam A the type of the ${coll}'s elements
* @param elems the elements of the created $coll
* @return a new $coll with elements `elems`
*/
def apply[A](elems: A*): CC[A] = {
if (elems.isEmpty) empty[A]
else {
val b = newBuilder[A]
b ++= elems
b.result
}
}
}
https://github.com/scala/scala/blob/v2.10.3/src/library/scala/collection/generic/GenericCompanion.scala#L1
Update:
I usually use Array for a mutable indexed collection type since it is easier to type, but Vector for an immutable one. Scala style encourages immutable collections, and making a modified "copy" of one is cheap because the new version shares most of its internal structure with the old one: Vector is implemented as a wide (32-way) trie, and the immutable HashMap and HashSet as hash array mapped tries. http://en.wikipedia.org/wiki/Hash_array_mapped_trie
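A minimal sketch (Scala 2.13 assumed) contrasting in-place updates on an Array with Vector's non-destructive updates. The structural sharing happens inside Vector and is not visible here; the sketch only shows that the original Vector is left unchanged:

```scala
// Mutable: updating an Array changes it in place.
val arr = Array(1, 2, 3)
arr(0) = 99

// Immutable: Vector#updated returns a new Vector; the original is untouched.
val v1 = Vector(1, 2, 3)
val v2 = v1.updated(0, 99)
val v3 = v2 :+ 4 // appending also returns a new Vector
```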

> I have been programming in Scala for a few months now. I'm still confused by the many different collections there are.
You probably just need some time to get used to things. Scala wouldn't be Scala if it looked exactly like another language, right? :) Every programming language has its strengths and weaknesses.
> Is there a page/article somewhere that shows what each type is best suitable for?
You actually need a general article about data structures. E.g., if you need to store data in a collection and access it quickly, without needing to add or remove elements, then an array is suitable. Arrays, lists, sets, maps, etc. are data structures that behave basically the same way in every language; the differences are mostly in the syntax.
> The problem with Scala is that it has too many different types, then you have something like Array which maps directly to a Java array, then you have something like "Set" which is actually a "trait" but you can use it like a normal class even though my understanding is that a trait is like an interface. The documentation says "to implement a concrete set, you need to define the following methods: ..." but actually I can use it just fine.
You should look at the link @sam already posted: http://docs.scala-lang.org/overviews/collections/overview.html Here's another: http://twitter.github.io/scala_school/collections.html


Quick Documentation For Scala Apply Constructor Pattern in IntelliJ IDE

I am wondering if there is a way to get IntelliJ's quick documentation to work for the following class-construction pattern, which many Scala developers use:
SomeClass(param1, param2)
instead of
new SomeClass(param1, param2)
The direct constructor call made with new obviously works, but many Scala developers use apply to construct objects. When that pattern is used, IntelliJ's documentation lookup fails to find any information on the class.
I don't know if there are documents in IntelliJ per se. However, the pattern is fairly easy to explain.
There's a pattern in Java code for having static factory methods (this is a specialization of the Gang of Four Factory Method Pattern), often along the lines of (translated to Scala-ish):
object Foo {
def barInstance(args...): Bar = ???
}
The main benefit of doing this is that the factory controls object instantiation, in particular:
- the particular runtime class to instantiate, possibly based on the arguments to the factory. For example, the generic immutable collections in Scala have factory methods that may create optimized small collections when they are created with sufficiently few elements. A sequence of length 1, for instance, can be implemented with almost no overhead: a single field referring to the element, and a lookup that checks whether the offset is 0 and either throws or returns that field.
- whether an instance is created at all. One can cache arguments to the factory and memoize or "hashcons" the created objects, or precreate the most common instances and hand them out repeatedly.
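Both kinds of control can be sketched in a few lines. TinySeq and Interned are hypothetical types made up for illustration (not the standard library's implementation), and the code assumes Scala 2.13. The wrapping object only keeps each class next to its companion so the snippet runs as a script:

```scala
import scala.collection.mutable

object factories {
  // Hypothetical immutable sequence: the factory picks the runtime class.
  sealed trait TinySeq[+A] {
    def length: Int
    def apply(i: Int): A
  }

  object TinySeq {
    private object Empty extends TinySeq[Nothing] {
      def length: Int = 0
      def apply(i: Int): Nothing = throw new IndexOutOfBoundsException(i.toString)
    }

    // Length-1 case: one field plus an offset check, no backing collection.
    private final class One[A](elem: A) extends TinySeq[A] {
      def length: Int = 1
      def apply(i: Int): A =
        if (i == 0) elem else throw new IndexOutOfBoundsException(i.toString)
    }

    // General case: backed by a Vector.
    private final class Many[A](elems: Vector[A]) extends TinySeq[A] {
      def length: Int = elems.length
      def apply(i: Int): A = elems(i)
    }

    def apply[A](elems: A*): TinySeq[A] = elems.length match {
      case 0 => Empty
      case 1 => new One(elems.head)
      case _ => new Many(elems.toVector)
    }
  }

  // Hypothetical interned value: the factory decides whether to create an instance.
  final class Interned private (val name: String)

  object Interned {
    private val cache = mutable.HashMap.empty[String, Interned] // not thread-safe

    def apply(name: String): Interned =
      cache.getOrElseUpdate(name, new Interned(name))
  }
}

import factories._
```

The cache in Interned is a plain mutable.HashMap, so this sketch is not thread-safe; shared use would need a concurrent map.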
A further benefit is that the factory is a method, which can be passed around as a function value, while new is an operator:
class Foo(val x: Int)
object Foo {
  // an ordinary factory method; it can be passed where a function is expected
  def instance(x: Int): Foo = new Foo(x)
}
Seq(1, 2, 3).map(Foo.instance) // a Seq of three Foo instances with x = 1, 2 and 3
In Scala, this is combined with two other features: any object that defines an apply method can be used syntactically as a function (even if it doesn't extend Function, which would additionally allow the object to be passed around as a function value), and the "companion object" of a class (which holds the things that in Java would be static in the class). Together they give something like:
class Foo(constructor_args...)
object Foo {
def apply(args...): Foo = ???
}
Which can be used like:
Foo(...)
For a case class, the Scala compiler automatically generates a companion object with certain behaviors, one of which is an apply with the same arguments as the constructor (other behaviors include contract-obeying hashCode and equals as well as an unapply method to allow for pattern matching).
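A short sketch of both variants, assuming Scala 2.13; Celsius and Point are hypothetical classes used only for illustration:

```scala
// Hand-written companion object: apply is an ordinary factory method.
class Celsius(val degrees: Double)

object Celsius {
  def apply(degrees: Double): Celsius = new Celsius(degrees)
}

// Case class: the compiler generates the companion's apply and unapply,
// plus structural equals and hashCode.
final case class Point(x: Int, y: Int)

val temp = Celsius(21.5)                      // same as Celsius.apply(21.5)
val p = Point(1, 2)                           // same as Point.apply(1, 2)
val Point(px, py) = p                         // destructuring uses Point.unapply
val readings = Seq(1.0, 2.0).map(Celsius(_))  // apply used as a function
```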

Scala type alias with companion object

I'm a relatively new Scala user and I wanted to get an opinion on the current design of my code.
I have a few classes that are all represented as fixed length Vector[Byte] (ultimately they are used in a learning algorithm that requires a byte string), say A, B and C.
I would like these classes to be referred to as A, B and C elsewhere in the package for readability's sake, and I don't need to add any extra methods to Vector for them. Hence, I don't think the extend-my-library pattern is useful here.
However, I would like to include all the useful functional methods that come with Vector without having to 'drill' into a wrapper object each time. As efficiency is important here, I also didn't want the added weight of a wrapper.
Therefore I decided to define type aliases in the package object:
package object abc {
type A = Vector[Byte]
type B = Vector[Byte]
type C = Vector[Byte]
}
However, each has its own fixed length, and I would like to include factory methods for their creation. It seems like this is what companion objects are for. This is how my final design looks:
package object abc {
type A = Vector[Byte]
object A {
val LENGTH: Int = ...
def apply(...): A = {
Vector.tabulate...
}
}
...
}
Everything compiles and it allows me to do stuff like this:
val a: A = A(...)
a map {...} mkString(...)
I can't find anything specifically warning against writing companion objects for type aliases, but it seems it goes against how type aliases should be used. It also means that all three of these classes are defined in the same file, when ideally they should be separated.
Are there any hidden problems with this approach?
Is there a better design for this problem?
Thanks.
I guess it is totally ok, because you are not really implementing a companion object.
If you were, you would have access to private fields of immutable.Vector from inside object A (e.g., private var dirty), which you do not have.
Thus, although it somewhat feels like A is a companion object, it really isn't.
If it were possible to create a companion object for any type just by declaring a type alias, member visibility constraints would be moot (except perhaps for private[this] and protected[this]).
Furthermore, naming the object like the type alias clarifies context and purpose of the object, which is a plus in my book.
Having them all in one file is pretty common in Scala as I know it (e.g., when using the type class pattern).
Thus:
No pitfalls that I know of.
And, imho, no need for a different approach.
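A runnable version of the design from the question, assuming Scala 2.13. The question elides the length and the factory's parameters, so LENGTH = 4 and an apply taking an index-to-byte function are hypothetical choices, and an ordinary object abc stands in for the package object so the snippet runs as a script:

```scala
object abc {
  type A = Vector[Byte]

  // Shares the name A with the alias, but is not a real companion of Vector.
  object A {
    val LENGTH: Int = 4 // hypothetical fixed length

    // Hypothetical factory: builds an A from a function of the index.
    def apply(f: Int => Byte): A = Vector.tabulate(LENGTH)(f)
  }
}

import abc._

val a: A = A(i => i.toByte)
val shown = a.map(b => (b * 2).toByte).mkString(",")
```

Because A is only an alias, every Vector method is available on a without any wrapper, which is the property the question was after.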

How can I mix higher-kinds with "regular" generics for Typeclasses in Scala

I'm trying to write my own Typeclass in scala, to provide a mechanism to convert classes into an arbitrary "DataObject" (for which I'm using a Map below, however I don't want that to be important). Up until now I have the following:
type DataObject = Map[String, Any]
trait DataSerializer[A] {
def toDataObject(instance: A): DataObject
def fromDataObject(dataObject: DataObject): A
}
This works well for 'simple' classes, for which I can create a concrete class implementing this trait to act as my serializer. However, I thought it would also be nice to allow Collections/Containers to be serialized, without having to create a different implementation for every type that could be contained. I ended up with this:
trait DataCollectionSerializer[Collection[_]] {
def toDataObject[A: DataSerializer](instance: Collection[A]): DataObject
def fromDataObject[A: DataSerializer](dataObject: DataObject): Collection[A]
}
i.e., a collection can be serialized if its contents can be serialized.
Again, this works well for most things, but what if I have a collection within a collection? For example, List[List[Int]] (assuming that there exists some implementation of DataCollectionSerializer[List] and DataSerializer[Int]) would require an implementation of DataSerializer[List[Int]]. I could simply continue writing a new trait for each level of containment, however that would eventually result in some upper limit for what my Typeclass could achieve.
Is there some way that I could combine these two traits, to allow DataCollectionSerializer to operate upon any collection, providing its contents have either a DataSerializer or DataCollectionSerializer?
You can change DataCollectionSerializer to
trait DataCollectionSerializer[Collection[_]] {
def serializer[A: DataSerializer]: DataSerializer[Collection[A]]
}
and to get DataSerializer for e.g. List[Int]: implicitly[DataCollectionSerializer[List]].serializer[Int]. Then all non-higher-kind types have a DataSerializer and you don't need to mix anything.
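To make nesting such as List[List[Int]] work automatically, one option is an implicit def that derives a DataSerializer[C[A]] from a DataCollectionSerializer[C] and a DataSerializer[A]. A sketch, assuming Scala 2.13; the Map encodings (a "value" key for Int, an "items" key for List) and the names intSerializer, listCollectionSerializer and collectionSerializer are hypothetical:

```scala
object serialization {
  type DataObject = Map[String, Any]

  trait DataSerializer[A] {
    def toDataObject(instance: A): DataObject
    def fromDataObject(dataObject: DataObject): A
  }

  trait DataCollectionSerializer[Collection[_]] {
    def serializer[A: DataSerializer]: DataSerializer[Collection[A]]
  }

  // Hypothetical encoding for Int: a single "value" entry.
  implicit val intSerializer: DataSerializer[Int] = new DataSerializer[Int] {
    def toDataObject(i: Int): DataObject = Map("value" -> i)
    def fromDataObject(d: DataObject): Int = d("value").asInstanceOf[Int]
  }

  // Hypothetical encoding for List: an "items" entry holding the serialized elements.
  implicit val listCollectionSerializer: DataCollectionSerializer[List] =
    new DataCollectionSerializer[List] {
      def serializer[A: DataSerializer]: DataSerializer[List[A]] =
        new DataSerializer[List[A]] {
          private val elem = implicitly[DataSerializer[A]]
          def toDataObject(xs: List[A]): DataObject =
            Map("items" -> xs.map(elem.toDataObject))
          def fromDataObject(d: DataObject): List[A] =
            d("items").asInstanceOf[List[DataObject]].map(elem.fromDataObject)
        }
    }

  // Bridge: C[A] is serializable if C has a collection serializer and A a serializer.
  // Applied recursively, this covers List[Int], List[List[Int]], and so on.
  implicit def collectionSerializer[C[_], A](
      implicit cs: DataCollectionSerializer[C], a: DataSerializer[A]): DataSerializer[C[A]] =
    cs.serializer[A]
}

import serialization._

val nested = List(List(1, 2), List(3))
val ser = implicitly[DataSerializer[List[List[Int]]]]
val data = ser.toDataObject(nested)
val back = ser.fromDataObject(data)
```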

Creating an arraylist in Scala

I am a teaching assistant for a class that teaches Scala. As an assignment, I want the students to implement an ArrayList class.
In Java I wrote it like:
public class ArrayList<T> implements List<T>{....}
Is there any equivalent List trait that I should use to implement the arraylist?
The name ArrayList suggests that you should mix in IndexedSeq. Actually, you probably want to get all the goodies that are provided by IndexedSeqLike, i.e.
class ArrayList[A] extends IndexedSeq[A] with IndexedSeqLike[A, ArrayList[A]]
This gets you concrete implementations of head, tail, take, drop, filter, etc. If you also want map, flatMap, etc. (all the methods that take a type parameter) to work properly (return an ArrayList[A]), you also have to provide a type class instance for CanBuildFrom in your companion object, e.g.
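import scala.collection.generic.CanBuildFrom // Scala 2.12 and earlier
// must be implicit so that map, flatMap, etc. can find it
implicit def cbf[A, B]: CanBuildFrom[ArrayList[A], B, ArrayList[B]] = new CanBuildFrom[ArrayList[A], B, ArrayList[B]] {
// TODO Implementation!
}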
The Scala collection library is very complex. For an overview of the inheritance hierarchy, take a look at these pictures:
scala.collection.immutable: http://www.scala-lang.org/docu/files/collections-api/collections.immutable.png
scala.collection.mutable: http://www.scala-lang.org/docu/files/collections-api/collections.mutable.png
Also the scaladoc gives a good overview about all the classes and traits of the collection library.
Be aware that in Scala a List is a linked list, i.e. a LinearSeq, whereas in Java a List is more like an IndexedSeq in Scala.
In Scala there are many interfaces. First, they are separated into mutable and immutable ones. In Java, ArrayList is based on an array, so it is an indexed sequence. In Scala the interface for this is IndexedSeq[A]. Because ArrayList is also mutable, you can choose scala.collection.mutable.IndexedSeq; otherwise, scala.collection.immutable.IndexedSeq. Instead of mutable.IndexedSeq you can also choose scala.collection.mutable.Buffer, which does not guarantee O(1) access time.
If you want a more functional approach, you can prefer Seq[A] as the interface, or Iterable[A] if you want to be able to implement more than sequences.
That would be Seq[T], or maybe IndexedSeq[T] - or even List[T].
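As a starting point for the assignment, here is a minimal sketch of a mutable ArrayList that mixes in scala.collection.mutable.IndexedSeq, assuming Scala 2.13 collections, where that trait only requires apply, length and update. The add method and the doubling growth strategy are illustrative choices. Note that map, filter and friends return a plain mutable.IndexedSeq here, not an ArrayList; getting an ArrayList back needs extra plumbing, such as a CanBuildFrom instance in Scala 2.12 and earlier.

```scala
import scala.collection.mutable

final class ArrayList[A] extends mutable.IndexedSeq[A] {
  private var elems: Array[Any] = new Array[Any](8) // backing store
  private var size0: Int = 0                        // number of used slots

  def length: Int = size0

  def apply(i: Int): A = {
    checkIndex(i)
    elems(i).asInstanceOf[A]
  }

  def update(i: Int, elem: A): Unit = {
    checkIndex(i)
    elems(i) = elem
  }

  // Appends in amortized O(1) by doubling the backing array when it is full.
  def add(elem: A): Unit = {
    if (size0 == elems.length) {
      val bigger = new Array[Any](elems.length * 2)
      Array.copy(elems, 0, bigger, 0, size0)
      elems = bigger
    }
    elems(size0) = elem
    size0 += 1
  }

  private def checkIndex(i: Int): Unit =
    if (i < 0 || i >= size0) throw new IndexOutOfBoundsException(i.toString)
}

val xs = new ArrayList[Int]
(1 to 20).foreach(xs.add) // forces two resizes: 8 -> 16 -> 32
xs(0) = 100               // calls update
```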

What are type classes in Scala useful for?

As I understand from this blog post, "type classes" in Scala are just a "pattern" implemented with traits and implicit adapters.
As the blog says, if I have a trait A and an adapter B -> A, then I can invoke a function that requires an argument of type A with an argument of type B, without invoking the adapter explicitly.
I found it nice but not particularly useful. Could you give a use case/example that shows what this feature is useful for?
One use case, as requested...
Imagine you have a list of things, could be integers, floating point numbers, matrices, strings, waveforms, etc. Given this list, you want to add the contents.
One way to do this would be to have some Addable trait that must be inherited by every single type that can be added together, or an implicit conversion to an Addable if dealing with objects from a third party library that you can't retrofit interfaces to.
This approach quickly becomes overwhelming when you also want to add other such operations that can be done to a list of objects. It also doesn't work well if you need alternatives (for example, does adding two waveforms concatenate them, or overlay them?). The solution is ad-hoc polymorphism, where you can pick and choose behaviour to be retrofitted to existing types.
For the original problem then, you could implement an Addable type class:
trait Addable[T] {
def zero: T
def append(a: T, b: T): T
}
//yup, it's our friend the monoid, with a different name!
You can then create implicit subclassed instances of this, corresponding to each type that you wish to make addable:
implicit object IntIsAddable extends Addable[Int] {
def zero = 0
def append(a: Int, b: Int) = a + b
}
implicit object StringIsAddable extends Addable[String] {
def zero = ""
def append(a: String, b: String) = a + b
}
//etc...
The method to sum a list then becomes trivial to write...
def sum[T](xs: List[T])(implicit addable: Addable[T]): T =
  xs.foldLeft(addable.zero)(addable.append)
//or the same thing, using context bounds:
def sum[T : Addable](xs: List[T]): T = {
  val addable = implicitly[Addable[T]]
  xs.foldLeft(addable.zero)(addable.append)
}
The beauty of this approach is that you can supply an alternative definition of some typeclass, either controlling the implicit you want in scope via imports, or by explicitly providing the otherwise implicit argument. So it becomes possible to provide different ways of adding waveforms, or to specify modulo arithmetic for integer addition. It's also fairly painless to add a type from some 3rd-party library to your typeclass.
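Putting the pieces together, a runnable sketch assuming Scala 2.13. Placing the instances in the Addable companion object puts them in implicit scope without any imports; IntModulo7 is a hypothetical alternative instance that is passed explicitly, as described above:

```scala
object addable {
  trait Addable[T] {
    def zero: T
    def append(a: T, b: T): T
  }

  object Addable {
    // Default instances live in the companion, so the compiler finds them automatically.
    implicit object IntIsAddable extends Addable[Int] {
      def zero = 0
      def append(a: Int, b: Int) = a + b
    }
    implicit object StringIsAddable extends Addable[String] {
      def zero = ""
      def append(a: String, b: String) = a + b
    }
  }

  def sum[T: Addable](xs: List[T]): T = {
    val addable = implicitly[Addable[T]]
    xs.foldLeft(addable.zero)(addable.append)
  }

  // Hypothetical alternative instance: integer addition modulo 7.
  object IntModulo7 extends Addable[Int] {
    def zero = 0
    def append(a: Int, b: Int) = (a + b) % 7
  }
}

import addable._

val ints = sum(List(1, 2, 3))          // uses IntIsAddable
val strings = sum(List("a", "b", "c")) // uses StringIsAddable
val mod7 = sum(List(5, 6))(IntModulo7) // alternative instance passed explicitly
```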
Incidentally, this is exactly the approach taken by the 2.8 collections API, though the sum method is defined on TraversableLike instead of on List, and the type class is Numeric (which contains a few more operations than just zero and append).
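The same sum can be written directly against the standard library's Numeric type class. A small sketch, assuming Scala 2.13; total is a hypothetical name:

```scala
// Generic sum over any type with a Numeric instance (Int, Double, BigInt, ...).
def total[T](xs: List[T])(implicit num: Numeric[T]): T =
  xs.foldLeft(num.zero)(num.plus)

val intTotal = total(List(1, 2, 3))
val doubleTotal = total(List(1.5, 2.5))
val bigTotal = total(List(BigInt(1), BigInt(2)))
```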
Reread the first comment there:
A crucial distinction between type classes and interfaces is that for class A to be a "member" of an interface it must declare so at the site of its own definition. By contrast, any type can be added to a type class at any time, provided you can provide the required definitions, and so the members of a type class at any given time are dependent on the current scope. Therefore we don't care if the creator of A anticipated the type class we want it to belong to; if not we can simply create our own definition showing that it does indeed belong, and then use it accordingly. So this not only provides a better solution than adapters, in some sense it obviates the whole problem adapters were meant to address.
I think this is the most important advantage of type classes.
Also, they properly handle cases where an operation doesn't take an argument of the type we are dispatching on, or takes more than one. E.g. consider this type class:
case class Default[T](val default: T)
object Default {
implicit def IntDefault: Default[Int] = Default(0)
implicit def OptionDefault[T]: Default[Option[T]] = Default(None)
...
}
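A runnable version of the Default type class, assuming Scala 2.13. The StringDefault instance and the defaultOf and orDefault helpers are hypothetical additions; note that defaultOf takes no argument of type T at all, so there is no receiver to dispatch on:

```scala
object defaults {
  case class Default[T](default: T)

  object Default {
    implicit def IntDefault: Default[Int] = Default(0)
    implicit def OptionDefault[T]: Default[Option[T]] = Default(None)
    implicit def StringDefault: Default[String] = Default("") // hypothetical extra instance
  }

  // T appears only in the result type; the instance is chosen by type alone.
  def defaultOf[T](implicit d: Default[T]): T = d.default

  // Hypothetical helper: fall back to the type's default value.
  def orDefault[T](opt: Option[T])(implicit d: Default[T]): T = opt.getOrElse(d.default)
}

import defaults._
```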
I think of type classes as the ability to add type safe metadata to a class.
So you first define a class to model the problem domain and then think of metadata to add to it. Things like Equals, Hashable, Viewable, etc. This creates a separation of the problem domain and the mechanics to use the class and opens up subclassing because the class is leaner.
Beyond that, you can add type classes anywhere in scope, not just where the class is defined, and you can change implementations. For example, if I calculate a hash code for a Point class by using Point#hashCode, then I'm limited to that specific implementation, which may not create a good distribution of values for the specific set of Points I have. But if I use Hashable[Point], then I may provide my own implementation.
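A sketch of that idea, assuming Scala 2.13. Hashable is a hypothetical type class (the standard library has no such trait), and gridHash is a hypothetical alternative implementation passed explicitly:

```scala
object hashing {
  trait Hashable[T] {
    def hash(t: T): Int
  }

  final case class Point(x: Int, y: Int)

  object Point {
    // Default instance: delegates to the case class's generated hashCode.
    implicit val defaultHash: Hashable[Point] = new Hashable[Point] {
      def hash(p: Point): Int = p.hashCode
    }
  }

  // Hypothetical alternative tuned for a particular set of points.
  val gridHash: Hashable[Point] = new Hashable[Point] {
    def hash(p: Point): Int = 31 * p.x + p.y
  }

  // Works with whichever Hashable instance is in scope or passed in.
  def bucketOf[T](t: T, buckets: Int)(implicit h: Hashable[T]): Int =
    Math.floorMod(h.hash(t), buckets)
}

import hashing._
```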
[Updated with example]
As an example, here's a use case I had last week. In our product there are several cases of Maps containing containers as values. E.g., Map[Int, List[String]] or Map[String, Set[Int]]. Adding to these collections can be verbose:
map += key -> (value :: map.getOrElse(key, List()))
So I wanted to have a function that wraps this so I could write
map +++= key -> value
The main issue is that the collections don't all have the same methods for adding elements: some have '+' while others have ':+'. I also wanted to retain the efficiency of adding elements to a list, so I didn't want to use fold/map, which create new collections.
The solution is to use type classes:
trait Addable[C, CC] {
def add(c: C, cc: CC) : CC
def empty: CC
}
object Addable {
implicit def listAddable[A] = new Addable[A, List[A]] {
def empty = Nil
def add(c: A, cc: List[A]) = c :: cc
}
implicit def addableAddable[A, Add](implicit cbf: CanBuildFrom[Add, A, Add]) = new Addable[A, Add] {
def empty = cbf().result
def add(c: A, cc: Add) = (cbf(cc) += c).result
}
}
Here I defined a type class Addable that can add an element C to a collection CC. I have 2 default implementations: For Lists using :: and for other collections, using the builder framework.
Then using this type class is:
class RichCollectionMap[A, C, B[_], M[X, Y] <: collection.Map[X, Y]](map: M[A, B[C]])(implicit adder: Addable[C, B[C]]) {
def updateSeq[That](a: A, c: C)(implicit cbf: CanBuildFrom[M[A, B[C]], (A, B[C]), That]): That = {
val pair = (a -> adder.add(c, map.getOrElse(a, adder.empty) ))
(map + pair).asInstanceOf[That]
}
def +++[That](t: (A, C))(implicit cbf: CanBuildFrom[M[A, B[C]], (A, B[C]), That]): That = updateSeq(t._1, t._2)(cbf)
}
implicit def toRichCollectionMap[A, C, B[_], M[X, Y] <: col
The special bit is using adder.add to add the elements and adder.empty to create new collections for new keys.
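CanBuildFrom no longer exists in Scala 2.13, so here is a simplified 2.13 sketch of the same idea that drops the builder-based instance but keeps the `map +++= key -> value` syntax. The setAddable instance and the MultiMapOps implicit class are hypothetical. The `+++=` form works because Scala rewrites `x op= y` to `x = x op y` when x is a var without an op= member:

```scala
object multimap {
  // How to add an element C to a collection CC.
  trait Addable[C, CC] {
    def add(c: C, cc: CC): CC
    def empty: CC
  }

  object Addable {
    implicit def listAddable[A]: Addable[A, List[A]] = new Addable[A, List[A]] {
      def empty: List[A] = Nil
      def add(c: A, cc: List[A]): List[A] = c :: cc // O(1) prepend
    }

    // Hypothetical second instance for immutable sets.
    implicit def setAddable[A]: Addable[A, Set[A]] = new Addable[A, Set[A]] {
      def empty: Set[A] = Set.empty
      def add(c: A, cc: Set[A]): Set[A] = cc + c
    }
  }

  // Adds +++ to any immutable Map whose values have an Addable instance.
  implicit class MultiMapOps[K, CC](map: Map[K, CC]) {
    def +++[C](kv: (K, C))(implicit adder: Addable[C, CC]): Map[K, CC] =
      map.updated(kv._1, adder.add(kv._2, map.getOrElse(kv._1, adder.empty)))
  }
}

import multimap._

var tags = Map.empty[String, List[Int]]
tags +++= "a" -> 1
tags +++= "a" -> 2
tags +++= "b" -> 3
```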
To compare, without type classes I would have had 3 options:
1. to write a method per collection type. E.g., addElementToSubList and addElementToSet etc. This creates a lot of boilerplate in the implementation and pollutes the namespace
2. to use reflection to determine if the sub collection is a List / Set. This is tricky as the map is empty to begin with (of course scala helps here also with Manifests)
3. to have poor-man's type class by requiring the user to supply the adder. So something like addToMap(map, key, value, adder), which is plain ugly
Another blog post I found helpful is "Monads Are Not Metaphors", in the part where it describes typeclasses.
Search the article for typeclass. It should be the first match. In this article, the author provides an example of a Monad typeclass.
The forum thread "What makes type classes better than traits?" makes some interesting points:
- Typeclasses can very easily represent notions that are quite difficult to represent in the presence of subtyping, such as equality and ordering.
  - Exercise: create a small class/trait hierarchy and try to implement .equals on each class/trait in such a way that the operation over arbitrary instances from the hierarchy is properly reflexive, symmetric, and transitive.
- Typeclasses allow you to provide evidence that a type outside of your "control" conforms with some behavior.
  - Someone else's type can be a member of your typeclass.
- You cannot express "this method takes/returns a value of the same type as the method receiver" in terms of subtyping, but this (very useful) constraint is straightforward using typeclasses. This is the F-bounded types problem (where an F-bounded type is parameterized over its own subtypes).
- All operations defined on a trait require an instance; there is always a this argument. So you cannot define, for example, a fromString(s: String): Foo method on trait Foo in such a way that you can call it without an instance of Foo.
  - In Scala this manifests as people desperately trying to abstract over companion objects.
  - But it is straightforward with a typeclass, as illustrated by the zero element in this monoid example.
- Typeclasses can be defined inductively; for example, if you have a JsonCodec[Woozle] you can get a JsonCodec[List[Woozle]] for free.
  - The example above illustrates this for "things you can add together".
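A small sketch of the last two points, assuming Scala 2.13. FromString is a hypothetical type class whose fromString needs no existing instance, and its List instance is derived inductively from the element instance (the comma-separated format is also an assumption):

```scala
object parsing {
  trait FromString[T] {
    def fromString(s: String): T
  }

  object FromString {
    implicit val intFromString: FromString[Int] = new FromString[Int] {
      def fromString(s: String): Int = s.trim.toInt
    }

    // Inductive instance: a List[T] parser for any T that has a parser.
    implicit def listFromString[T](implicit elem: FromString[T]): FromString[List[T]] =
      new FromString[List[T]] {
        def fromString(s: String): List[T] =
          if (s.trim.isEmpty) Nil else s.split(",").toList.map(elem.fromString)
      }
  }

  // No value of type T is needed to call this; the type alone selects the instance.
  def parse[T](s: String)(implicit p: FromString[T]): T = p.fromString(s)
}

import parsing._
```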
One way to look at type classes is that they enable retroactive extension or retroactive polymorphism. There are a couple of great posts by Casual Miracles and Daniel Westheide that show examples of using Type Classes in Scala to achieve this.
Here's a post on my blog that explores various methods in Scala of retroactive supertyping, a kind of retroactive extension, including a typeclass example.
I don't know of any use case other than ad-hoc polymorphism, which is explained here in the best way possible.
Both implicits and typeclasses are used for type conversion. The major use case for both is to provide ad-hoc polymorphism, i.e. inheritance-like polymorphism on classes that you can't modify. With implicits you can use either an implicit def or an implicit class (which is your wrapper class, hidden from the client). Typeclasses are more powerful, as they can add functionality to an already existing inheritance chain (e.g. Ordering[T] in the sorted method of Scala's collections).
For more detail you can see https://lakshmirajagopalan.github.io/diving-into-scala-typeclasses/
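As an illustration of Ordering as a type class, a sketch assuming Scala 2.13; Version is a hypothetical class. Its default ordering lives in the companion object, and a different ordering can be passed to sorted explicitly:

```scala
object versions {
  final case class Version(major: Int, minor: Int)

  object Version {
    // Default ordering: by major, then minor (uses the built-in tuple Ordering).
    implicit val byNumber: Ordering[Version] =
      Ordering.by[Version, (Int, Int)](v => (v.major, v.minor))
  }
}

import versions._

val vs = List(Version(1, 10), Version(1, 2), Version(0, 9))
val ascending = vs.sorted                           // picks up Version.byNumber
val descending = vs.sorted(Version.byNumber.reverse) // explicit alternative
```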
In Scala, type classes:
- enable ad-hoc polymorphism
- are statically typed (i.e. type-safe)
- are borrowed from Haskell
- solve the expression problem
- let behavior be extended
  - at compile time
  - after the fact
  - without changing/recompiling existing code
Scala implicits:
- The last parameter list of a method can be marked implicit
- Implicit parameters are filled in by the compiler
- In effect, you require evidence from the compiler
  - … such as the existence of a type class instance in scope
- You can also specify the parameters explicitly, if needed
The example below extends the String class with a new method, even though String is final :) Strictly speaking it uses an implicit conversion (the "enrich my library" pattern) rather than a type class.
/**
* Created by nihat.hosgur on 2/19/17.
*/
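import scala.language.implicitConversions // silences the feature warning for implicit conversions

case class PrintTwiceString(original: String) {
  def printTwice: String = original + original
}
object TypeClassString extends App {
  // implicit conversion: any String can now call printTwice
  implicit def stringToPrintTwiceString(s: String): PrintTwiceString = PrintTwiceString(s)
  val name: String = "Nihat"
  println(name.printTwice) // prints NihatNihat
}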
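Since Scala 2.10 the same extension can be written as an implicit class, which bundles the wrapper and the conversion and needs no language import. A sketch, assuming Scala 2.13; StringSyntax and PrintTwiceOps are hypothetical names:

```scala
object StringSyntax {
  // The compiler wraps a String in PrintTwiceOps when printTwice is called on it.
  implicit class PrintTwiceOps(val original: String) {
    def printTwice: String = original + original
  }
}

import StringSyntax._

val twice = "Nihat".printTwice
println(twice)
```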
This is an important difference (needed for functional programming):
consider inc :: Num a => a -> a
The a that is received is the same type as the one that is returned; this cannot be expressed with subtyping.
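The Scala counterpart of that Haskell signature uses the standard Numeric type class. A sketch, assuming Scala 2.13: the result type is exactly the argument type, whereas a subtyping-based version taking a common supertype could only promise to return that supertype:

```scala
// Same type in, same type out, for any A with a Numeric instance.
def inc[A](a: A)(implicit num: Numeric[A]): A = num.plus(a, num.one)

val i: Int = inc(41)
val d: Double = inc(1.5)
val b: BigInt = inc(BigInt(9))
```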
I like to use type classes as a lightweight Scala idiomatic form of Dependency Injection that still works with circular dependencies yet doesn't add a lot of code complexity. I recently rewrote a Scala project from using the Cake Pattern to type classes for DI and achieved a 59% reduction in code size.