Simple example of extending a Scala collection - scala

I'm looking for a very simple example of subclassing a Scala collection. I'm not so much interested in full explanations of how and why it all works; plenty of those are available here and elsewhere on the Internet. I'd like to know the simple way to do it.
The class below might be as simple an example as possible. The idea is, make a subclass of Set[Int] which has one additional method:
class SlightlyCustomizedSet extends Set[Int] {
def findOdd: Option[Int] = find(_ % 2 == 1)
}
Obviously this is wrong. One problem is that there's no constructor to put things into the Set. A CanBuildFrom object must be built, preferably by calling some already-existing library code that knows how to build it. I've seen examples that implement several additional methods in the companion object, but they're showing how it all works or how to do something more complicated. I'd like to see how to leverage what's already in the libraries to knock this out in a couple lines of code. What's the smallest, simplest way to implement this?

If you just want to add a single method to a class, then subclassing may not be the way to go. Scala's collections library is somewhat complicated, and leaf classes aren't always amenable to subclassing (one might start by subclassing HashSet, but this would start you on a journey down a deep rabbit hole).
Perhaps a simpler way to achieve your goal would be something like:
implicit class SetPimper(val s: Set[Int]) extends AnyVal {
def findOdd: Option[Int] = s.find(_ % 2 == 1)
}
This doesn't actually subclass Set, but creates an implicit conversion that allows you to do things like:
Set(1,2,3).findOdd // Some(1)
Down the Rabbit Hole
If you've come from a Java background, it might be surprising that it's so difficult to extend standard collections - after all the Java standard library's peppered with j.u.ArrayList subclasses, for pretty much anything that can contain other things. However, Scala has one key difference: its first-choice collections are all immutable.
This means that they don't have add methods that modify them in-place. Instead, they have + methods that construct a new instance, with all the original items, plus the new item. If they'd implemented this naïvely, it'd be very inefficient, so they use various class-specific tricks to allow the new instances to share data with the original one. The + method may even return an object of a different type to the original - some of the collections classes use a different representation for small or empty collections.
However, this also means that if you want to subclass one of the immutable collections, then you need to understand the guts of the class you're subclassing, to ensure that your instances of your subclass are constructed in the same way as the base class.
By the way, none of this applies to you if you want to subclass the mutable collections. They're seen as second class citizens in the scala world, but they do have add methods, and rarely need to construct new instances. The following code:
class ListOfUsers(users: Int*) extends scala.collection.mutable.HashSet[Int] {
this ++= users
def findOdd: Option[Int] = find(_ % 2 == 1)
}
Will probably do more-or-less what you expect in most cases (map and friends might not do quite what you expect, because of the the CanBuildFrom stuff that I'll get to in a minute, but bear with me).
The Nuclear Option
If inheritance fails us, we always have a nuclear option to fall back on: composition. We can create our own Set subclass that delegates its responsibilities to a delegate, as such:
import scala.collection.SetLike
import scala.collection.mutable.Builder
import scala.collection.generic.CanBuildFrom
class UserSet(delegate: Set[Int]) extends Set[Int] with SetLike[Int, UserSet] {
override def contains(key: Int) = delegate.contains(key)
override def iterator = delegate.iterator
override def +(elem: Int) = new UserSet(delegate + elem)
override def -(elem: Int) = new UserSet(delegate - elem)
override def empty = new UserSet(Set.empty)
override def newBuilder = UserSet.newBuilder
override def foreach[U](f: Int => U) = delegate.foreach(f) // Optional
override def size = delegate.size // Optional
}
object UserSet {
def apply(users: Int*) = (newBuilder ++= users).result()
def newBuilder = new Builder[Int, UserSet] {
private var delegateBuilder = Set.newBuilder[Int]
override def +=(elem: Int) = {
delegateBuilder += elem
this
}
override def clear() = delegateBuilder.clear()
override def result() = new UserSet(delegateBuilder.result())
}
implicit object UserSetCanBuildFrom extends CanBuildFrom[UserSet, Int, UserSet] {
override def apply() = newBuilder
override def apply(from: UserSet) = newBuilder
}
}
This is arguably both too complicated and too simple at the same time. It's far more lines of code than we meant to write, and yet, it's still pretty naïve.
It'll work without the companion class, but without CanBuildFrom, map will return a plain Set, which may not be what you expect. We've also overridden the optional methods that the documentation for Set recommends we implement.
If we were being thorough, we'd have created a CanBuildFrom, and implemented empty for our mutable class, as this ensures that the handful of methods that create new instances will work as we expect.
But that sounds like a lot of work...
If that sounds like too much work, consider something like the following:
case class UserSet(users: Set[Int])
Sure, you have to type a few more letters to get at the set of users, but I think it separates concerns better than subclassing.

Related

Does Scala's Vector add any new methods on top of those provided by Seq and other superclasses?

Are there any methods in Scala's Vector that are not declared by its superclasses like AbstractSeq?
I am working on providing language localization (translation) for a learning environment/IDE built on top of Scala called Kojo (see kojo.in). I have translated most commonly used methods of Seq. Vector inherits them automatically, so I don't need to duplicated the translation code (keeping DRY). E.g.,
implicit class TurkishTranslationsForSeqMethods[T](s: Seq[T]) {
def başı: T = s.head
def kuyruğu: Seq[T] = s.tail
def boyu: Int = s.length
def boşMu: Boolean = s.isEmpty
// ...
}
implicit class TranslationsForVectorMethods[T](v: Vector[T]) {
??? // what to translate here?
}
Hence the question. Maybe, more importantly, is there a way to find out such novel additions for any class without having to do a manual diff?
The scaladoc provides a way to filter methods to not see the ones inherited from Seq for instance: https://www.scala-lang.org/api/current/scala/collection/immutable/Vector.html a'd click on "Filter all members".
Or, probably easier, IDEs usually provide a "Hierarchy" view of a class and its methods that would give you the information quickly.

How to create a Scala function that can parametrically create instances of sub-types of some type

Sorry I'm not very familiar with Scala, but I'm curious if this is possible and haven't been able to figure out how.
Basically, I want to create some convenience initializers that can generate a random sample of data (in this case a grid). The grid will always be filled with instances of a particular type (in this case a Location). But in different cases I might want grids filled with different subtypes of Location, e.g. Farm or City.
In Python, this would be trivial:
def fillCollection(klass, size):
return [klass() for _ in range(size)]
class City: pass
cities = fillCollection(City, 10)
I tried to do something similar in Scala but it does not work:
def fillGrid[T <: Location](size): Vector[T] = {
Vector.fill[T](size, size) {
T()
}
}
The compiler just says "not found: value T"
So, it it possible to approximate the above Python code in Scala? If not, what's the recommended way to handle this kind of situation? I could write an initializer for each subtype, but in my real code there's a decent amount of boilerplate overlap between them so I'd like to share code if possible.
The best workaround I've come up with so far is to pass a closure into the initializer (which seems to be how the fill method on Vectors already works), e.g.:
def fillGrid[T <: Location](withElem: => T, size: Int = 100): Vector[T] = {
Vector.fill[T](n1 = size, n2 = size)(withElem)
}
That's not a huge inconvenience, but it makes me curious why Scala doesn't support the "simpler" Python-style construct (if it in fact doesn't). I sort of get why having a "fully generic" initializer could cause trouble, but in this case I can't see what the harm would be generically initializing instances that are all known to be subtypes of a given parent type.
You are correct, in that what you have is probably the simplest option. The reason Scala can't do things the pythonic way is because the type system is much stronger, and it has to contend with type erasure. Scala can not guarantee at compile time that any subclass of Location has a particular constructor, and it will only allow you to do things that it can guarantee will conform to the types (unless you do tricky things with reflection).
If you want to clean it up a little bit, you can make it work more like python by using implicits.
implicit def emptyFarm(): Farm = new Farm
implicit def emptyCity(): City = new City
def fillGrid[T <: Location](size: Int = 100)(implicit withElem: () => T): Vector[Vector[T]] = {
Vector.fill[T](n1 = size, n2 = size)(withElem())
}
fillGrid[farm](3)
To make this more usable in a library, it's common to put the implicits in a companion object of Location, so they can all be brought into scope where appropriate.
sealed trait Location
...
object Location
{
implicit def emptyFarm...
implicit def emptyCity...
}
...
import Location._
fillGrid[Farm](3)
You can use reflection to accomplish what you want...
This is a simple example that will only work if all your subclasses have a zero args constructor.
sealed trait Location
class Farm extends Location
class City extends Location
def fillGrid[T <: Location](size: Int)(implicit TTag: scala.reflect.ClassTag[T]): Vector[Vector[T]] = {
val TClass = TTag.runtimeClass
Vector.fill[T](size, size) { TClass.newInstance().asInstanceOf[T] }
}
However, I have never been a fan of runtime reflection, and I hope there could be another way.
Scala cannot do this kind of thing directly because it's not type safe. It will not work if you pass a class without a zero-argument constructor. The Python version throws an error at runtime if you try to do this.
The closure is probably the best way to go.

Return copy of case class from generic function without runtime cast

I want to get rid of a runtime cast to a generic (asInstanceOf[A]) without implicit conversions.
This happens when I have a fairly clean data model consisting of case classes with a common trait and want to implement a generic algorithm on it. As an example the resulting algorithm should take a class of type A that is a subclass of the trait T and is supposed to return a copy of the concrete class A with some updated field.
This is easy to achieve when I can simply add an abstract copy-method to the base trait and implement that in all sub-classes. However this potentially pollutes the model with methods only required by certain algorithms and is sometimes not possible because the model could be out of my control.
Here is a simplified example to demonstrate the problem and a solution using runtime casts.
Please don't get hung up on the details.
Suppose there is a trait and some case classes I can't change:
trait Share {
def absolute: Int
}
case class CommonShare(
issuedOn: String,
absolute: Int,
percentOfCompany: Float)
extends Share
case class PreferredShare(
issuedOn: String,
absolute: Int,
percentOfCompany: Float)
extends Share
And here is a simple method to recalculate the current percentOfCompany when the total number of shares have changed and update the field in the case class
def recalculateShare[A <: Share](share: A, currentTotalShares: Int): A = {
def copyOfShareWith(newPercentage: Float) = {
share match {
case common: CommonShare => common.copy(percentOfCompany = newPercentage)
case preferred: PreferredShare => preferred.copy(percentOfCompany = newPercentage)
}
}
copyOfShareWith(share.absolute / currentTotalShares.toFloat).asInstanceOf[A]
}
Some example invocations on the REPL:
scala> recalculateShare(CommonShare("2014-01-01", 100, 0.5f), 400)
res0: CommonShare = CommonShare(2014-01-01,100,0.25)
scala> recalculateShare(PreferredShare("2014-01-01", 50, 0.5f), 400)
res1: PreferredShare = PreferredShare(2014-01-01,50,0.125)
So it works and as far as I understand the .asInstanceOf[A] call will never fail but is required to make the code compile. Is there a way to avoid the runtime cast in a type-safe manner without implicit conversions?
You have a couple of choices I can think of, and it mostly comes down to a balance of how general of a solution you want and how much verbosity you can tolerate.
asInstanceOf
Your solution feels dirty, but I don't think it's all that bad, and the gnarliness is pretty well contained.
Typeclass
A great approach to providing behavior to data types while still maintaining separation of concerns in your code is the Enrich Your Library / typeclass pattern. I wish I had a perfect reference for this, but I don't. Look up those terms or "implicit class", and you should be able to find enough examples to get the drift.
You can create a trait Copyable[A] { def copy(?): A } typeclass (implicit class) and make instances of it for each of your types. The problem here is that it's kind of verbose, especially if you want that copy method to be fully generic. I left its parameter list as a question mark because you could just narrowly tailor it to what you actually need, or you could try to make it work for any case class, which would be quite difficult, as far as I know.
Optics
Lenses were made for solving this sort of awkwardness. You may want to check out Monocle, which is a nice generic approach to this issue. Although it still doesn't really solve the issue of verbosity, it might be the way to go if you have this issue recurring throughout your project, and especially if you find yourself trying to make changes deep within your object graph.
Here is a typeclass approach suggested by #acjay
trait Copyable[A <: Share] {
def copy(share: A, newPercentage: Float): A
}
object Copyable {
implicit val commonShareCopyable: Copyable[CommonShare] =
(share: CommonShare, newPercentage: Float) => share.copy(percentOfCompany = newPercentage)
implicit val preferredShareCopyable: Copyable[PreferredShare] =
(share: PreferredShare, newPercentage: Float) => share.copy(percentOfCompany = newPercentage)
}
implicit class WithRecalculateShare[A <: Share](share: A) {
def recalculateShare(currentTotalShares: Int)(implicit ev: Copyable[A]): A =
ev.copy(share, share.absolute / currentTotalShares.toFloat)
}
CommonShare("2014-01-01", 100, 0.5f).recalculateShare(400)
// res0: CommonShare = CommonShare(2014-01-01,100,0.25)
PreferredShare("2014-01-01", 50, 0.5f).recalculateShare(400)
// res1: PreferredShare = PreferredShare(2014-01-01,50,0.125)

When does it make sense to use implicit parameters in Scala, and what may be alternative scala idioms to consider?

Having used a Scala library that liberally exposes the reliance on implicits to the caller, I had experienced friction around this mechanism, as Scala makes it quite hard at times to debug implicit arguments, and because there's quite a bunch of places Scala would fill in values for implicit arguments from. (I could almost relate to it as "implicits hell" at one time).
At one time in my coding, Scala "complained" an implicit value could not be matched whereas in fact there was a "collision" of implicit values each coming from a different import.
Regardless of that perceived brittleness, it may at times feel borderline to an abuse of the context design pattern.
Why does it make sense to have implicit parameters in Scala?
In what scenarios would you use them and how would you avoid trouble?
As I'm not sure the experimentation-curve and potential for other team members getting totally confused are worth it, could you possibly suggest other scala idioms for sharing context between a multitude of Scala functions?
This questions is not for a specific implementation at hand, hopefully it's still a good fit for this site.
Generally, using a common type as an implicit parameter is a bad idea.
def badIdea(n: Int)(implicit s: String) = s * n
It doesn't take much to imagine why: you'll get conflicting implicits for the same thing if anyone else adopts this policy. Better to avoid it.
But who really wants to manually stuff in a scala.concurrent.ExecutionContext manually every time it's needed (which is practically everywhere)?
So the key is: when you have something with a specialized type, especially if it's bookkeeping that might need to be overridden manually but mostly should just do the right thing, then use implicit parameters. (This usually covers type classes as well.)
Then what do you do if you really need a string? Well, wrap it (at least formally--here it's a value class so in some contexts it will just pass the string around):
class MyWrappedString(val underlying: String) extends AnyVal {}
implicit val myString = new MyWrappedString("bird")
def decentIdea(n: Int)(implicit mws: MyWrappedString) = mws.underlying * n
scala> decentIdea(2) // In the bush?
res14: String = birdbird
Or if you think some additional logic is helpful, write a wrapper that takes an extra type parameter:
class ImplicitWithValue[K,V](val value: V) {
// Any extra generic logic goes here
}
object ImplicitWithValue {
class ValuePart[K] {
def apply[V](v: V) = new ImplicitWithValue[K,V](v)
}
private val genericValuePart = new ValuePart[Any]
private def typedValuePart[K] = genericValuePart.asInstanceOf[ValuePart[K]]
def apply[K] = typedValuePart[K]
}
Then you can
trait Marker1
implicit val implicit1 = ImplicitWithValue[Marker1]("fish")
def goodIdea(n: Int)(implicit ms: ImplicitWithValue[Marker1, String]) = ms.value * n
scala> goodIdea(3)
res17: String = fishfishfish

Extending Scala collections: One based Array index exercise

As an exercise, I'd like to extend the Scala Array collection to my own OneBasedArray (does what you'd expect, indexing starts from 1). Since this is an immutable collection, I'd like to have it return the correct type when calling filter/map etc.
I've read the resources here, here and here, but am struggling to understand how to translate this to Arrays (or collections other than the ones in the examples). Am I on the right track with this sort of structure?
class OneBasedArray[T]
extends Array[T]
with GenericTraversableTemplate[T, OneBasedArray]
with ArrayLike[T, OneBasedArray]
Are there any further resources that help explain extending collections?
For a in depth overview of new collections API: The Scala 2.8 Collections API
For a nice view of the relation between main classes and traits this
By the way I don't think Array is a collection in Scala.
Here is an example of pimping iterables with a method that always returns the expected runtime type of the iterable it operates on:
import scala.collection.generic.CanBuildFrom
trait MapOrElse[A] {
val underlying: Iterable[A]
def mapOrElse[B, To]
(m: A => Unit)
(pf: PartialFunction[A,B])
(implicit cbf: CanBuildFrom[Iterable[A], B, To])
: To = {
var builder = cbf(underlying.repr)
for (a <- underlying) if (pf.isDefinedAt(a)) builder += pf(a) else m(a)
builder.result
}
}
implicit def toMapOrElse[A](it: Iterable[A]): MapOrElse[A] =
new MapOrElse[A] {val underlying = it}
The new function mapOrElse is similar to the collect function but it allows you to pass a method m: A => Unit in addition to a partial function pf that is invoked whenever pf is undefined. m can for example be a logging method.
An Array is not a Traversable -- trying to work with that as a base class will cause all sorts of problems. Also, it is not immutable either, which makes it completely unsuited to what you want. Finally, Array is an implementation -- try to inherit from traits or abstract classes.
Array isn't a typical Scala collection... It's simply a Java array that's pimped to look like a collection by way of implicit conversions.
Given the messed-up variance of Java Arrays, you really don't want to be using them without an extremely compelling reason, as they're a source of lurking bugs.
(see here: http://www.infoq.com/presentations/Java-Puzzlers)
Creaking a 1-based collection like this isn't really a good idea either, as you have no way of knowing how many other collection methods rely on the assumption that sequences are 0-based. So to do it safely (if you really must) you'll want add a new method that leaves the default one unchanged:
class OneBasedLookup[T](seq:Seq) {
def atIdx(i:Int) = seq(i-1)
}
implicit def seqHasOneBasedLookup(seq:Seq) = new OneBasedLookup(seq)
// now use `atIdx` on any sequence.
Even safer still, you can create a Map[Int,T], with the indices being one-based
(Iterator.from(1) zip seq).toMap
This is arguably the most "correct" solution, although it will also carry the highest performance cost.
Not an array, but here's a one-based immutable IndexedSeq implementation that I recently put together. I followed the example given here where they implement an RNA class. Between that example, the ScalaDocs, and lots of "helpful" compiler errors, I managed to get it set up correctly. The fact that OneBasedSeq is genericized made it a little more complex than the RNA example. Also, in addition to the traits extended and methods overridden in the example, I had to extend IterableLike and override the iterator method, because various methods call that method behind the scenes, and the default iterator is zero-based.
Please pardon any stylistic or idiomadic oddities; I've been programming in Scala for less than 2 months.
import collection.{IndexedSeqLike, IterableLike}
import collection.generic.CanBuildFrom
import collection.mutable.{Builder, ArrayBuffer}
// OneBasedSeq class
final class OneBasedSeq[T] private (s: Seq[T]) extends IndexedSeq[T]
with IterableLike[T, OneBasedSeq[T]] with IndexedSeqLike[T, OneBasedSeq[T]]
{
private val innerSeq = s.toIndexedSeq
def apply(idx: Int): T = innerSeq(idx - 1)
def length: Int = innerSeq.length
override def iterator: Iterator[T] = new OneBasedSeqIterator(this)
override def newBuilder: Builder[T, OneBasedSeq[T]] = OneBasedSeq.newBuilder
override def toString = "OneBasedSeq" + super.toString
}
// OneBasedSeq companion object
object OneBasedSeq {
private def fromSeq[T](s: Seq[T]) = new OneBasedSeq(s)
def apply[T](vals: T*) = fromSeq(IndexedSeq(vals: _*))
def newBuilder[T]: Builder[T, OneBasedSeq[T]] =
new ArrayBuffer[T].mapResult(OneBasedSeq.fromSeq)
implicit def canBuildFrom[T, U]: CanBuildFrom[OneBasedSeq[T], U, OneBasedSeq[U]] =
new CanBuildFrom[OneBasedSeq[T], U, OneBasedSeq[U]] {
def apply() = newBuilder
def apply(from: OneBasedSeq[T]): Builder[U, OneBasedSeq[U]] = newBuilder[U]
}
}
// Iterator class for OneBasedSeq
class OneBasedSeqIterator[T](private val obs: OneBasedSeq[T]) extends Iterator[T]
{
private var index = 1
def hasNext: Boolean = index <= obs.length
def next: T = {
val ret = obs(index)
index += 1
ret
}
}