How do you extend a set of Integers in Scala? - scala

How do I write a custom set of integers in Scala? Specifically I want a class with the following properties:
It is immutable.
It extends the Set trait.
All collection operations return another object of this type as appropriate.
It takes a variable list of integer arguments in its constructor.
Its string representation is a comma-delimited list of elements surrounded by curly braces.
It defines a method mean that returns the mean value of the elements.
For example:
CustomIntSet(1,2,3) & CustomIntSet(2,3,4) // returns CustomIntSet(2, 3)
CustomIntSet(1,2,3).toString // returns {1, 2, 3}
CustomIntSet(2,3).mean // returns 2.5
(1) and (2) ensure that this object does things in the proper Scala way. (3) requires that the builder code be written correctly. (4) ensures that the constructor can be customized. (5) is an example of how to override an existing toString implementation. (6) is an example of how to add new functionality.
This should be done with a minimum of source code and boilerplate, utilizing functionality already present in the Scala language as much as possible.
I've asked a couple questions getting at aspects of the tasks, but I think this one covers the whole issue. The best response I've gotten so far is to use SetProxy, which is helpful but fails (3) above. I've studied the chapter "The Architecture of Scala Collections" in the second edition of Programming in Scala extensively and consulted various online examples, but remain flummoxed.
My goal in doing this is to write a blog post comparing the design tradeoffs in the way Scala and Java approach this problem, but before I do that I have to actually write the Scala code. I didn't think this would be that difficult, but it has been, and I'm admitting defeat.
After futzing around for a few days I came up with the following solution.
package example
import scala.collection.{SetLike, mutable}
import scala.collection.immutable.HashSet
import scala.collection.generic.CanBuildFrom
case class CustomSet(self: Set[Int] = new HashSet[Int].empty) extends Set[Int] with SetLike[Int, CustomSet] {
lazy val mean: Float = sum / size
override def toString() = mkString("{", ",", "}")
protected[this] override def newBuilder = CustomSet.newBuilder
override def empty = CustomSet.empty
def contains(elem: Int) = self.contains(elem)
def +(elem: Int) = CustomSet(self + elem)
def -(elem: Int) = CustomSet(self - elem)
def iterator = self.iterator
}
object CustomSet {
def apply(values: Int*): CustomSet = new CustomSet ++ values
def empty = new CustomSet
def newBuilder: mutable.Builder[Int, CustomSet] = new mutable.SetBuilder[Int, CustomSet](empty)
implicit def canBuildFrom: CanBuildFrom[CustomSet, Int, CustomSet] = new CanBuildFrom[CustomSet, Int, CustomSet] {
def apply(from: CustomSet) = newBuilder
def apply() = newBuilder
}
def main(args: Array[String]) {
val s = CustomSet(2, 3, 5, 7) & CustomSet(5, 7, 11, 13)
println(s + " has mean " + s.mean)
}
}
This appears to meet all the criteria above, but it's got an awful lot of boilerplate. I find the following Java version much easier to understand.
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
public class CustomSet extends HashSet<Integer> {
public CustomSet(Integer... elements) {
Collections.addAll(this, elements);
}
public float mean() {
int s = 0;
for (int i : this)
s += i;
return (float) s / size();
}
#Override
public String toString() {
StringBuilder sb = new StringBuilder();
for (Iterator<Integer> i = iterator(); i.hasNext(); ) {
sb.append(i.next());
if (i.hasNext())
sb.append(", ");
}
return "{" + sb + "}";
}
public static void main(String[] args) {
CustomSet s1 = new CustomSet(2, 3, 5, 7, 11);
CustomSet s2 = new CustomSet(5, 7, 11, 13, 17);
s1.retainAll(s2);
System.out.println("The intersection " + s1 + " has mean " + s1.mean());
}
}
This is bad since one of Scala's selling points is that it is more terse and clean than Java.
There's a lot of opaque code in the Scala version. SetLike, newBuilder, and canBuildFrom are all language boilerplate: they have nothing to do with writing sets with curly braces or taking a mean. I can almost accept them as the price you pay for Scala's immutable collection class library (for the moment accepting immutability as an unqualified good), but that leaves still leaves contains, +, -, and iterator which are just boilerplate passthrough code.
They are at least as ugly as getter and setter functions.
It seems like Scala should provide a way not to write Set interface boilerplate, but I can't figure it out. I tried both using SetProxy and extending the concrete HashSet class instead of the abstract Set but both of these gave perplexing compiler errors.
Is there a way to write this code without the contains, +, -, and iterator definitions, or is the above the best I can do?
Following the advice of axel22 below I wrote a simple implementation that utilizes the extremely useful if unfortunately-named pimp my library pattern.
package example
class CustomSet(s: Set[Int]) {
lazy val mean: Float = s.sum / s.size
}
object CustomSet {
implicit def setToCustomSet(s: Set[Int]) = new CustomSet(s)
}
With this you just instantiate Sets instead of CustomSets and do implicit conversion as needed to take the mean.
scala> (Set(1,2,3) & Set(2,3,5)).mean
res4: Float = 2.0
This satisfies most of my original wish list but still fails item (5).
Something axel22 said in the comments below gets at the heart of why I am asking this question.
As for inheritance, immutable (collection) classes are not easy to
inherit in general…
This squares with my experience, but from a language design perspective something seems wrong here. Scala is an object oriented language. (When I saw Martin Odersky give a talk last year, that was the selling point over Haskell he highlighted.) Immutability is the explicitly preferred mode of operation. Scala's collection classes are touted as the crown jewel of its object library. And yet when you want to extend an immutable collection class, you run into all this informal lore to the effect of "don't do that" or "don't try it unless you really know what you're doing." Usually the point of classes is to make them easily extendible. (After all, the collection classes are not marked final.) I'm trying to decide whether this is a design flaw in Scala or a design trade-off that I'm just not seeing.

Aside from the mean which you could add as an extension method using implicit classes and value classes, all the properties you list should be supported by the immutable.BitSet class in the standard library. Perhaps you could find some hints in that implementation, particularly for efficiency purposes.
You wrote a lot of code to achieve delegation above, but you could achieve a similar thing using class inheritance as in Java -- note that writing a delegating version of the custom set would require much more boilerplate in Java too.
Perhaps macros will in the future allow you to write code which generates the delegation boilerplate automatically -- until then, there is the old AutoProxy compiler plugin.

Related

Does Scala's Vector add any new methods on top of those provided by Seq and other superclasses?

Are there any methods in Scala's Vector that are not declared by its superclasses like AbstractSeq?
I am working on providing language localization (translation) for a learning environment/IDE built on top of Scala called Kojo (see kojo.in). I have translated most commonly used methods of Seq. Vector inherits them automatically, so I don't need to duplicated the translation code (keeping DRY). E.g.,
implicit class TurkishTranslationsForSeqMethods[T](s: Seq[T]) {
def başı: T = s.head
def kuyruğu: Seq[T] = s.tail
def boyu: Int = s.length
def boşMu: Boolean = s.isEmpty
// ...
}
implicit class TranslationsForVectorMethods[T](v: Vector[T]) {
??? // what to translate here?
}
Hence the question. Maybe, more importantly, is there a way to find out such novel additions for any class without having to do a manual diff?
The scaladoc provides a way to filter methods to not see the ones inherited from Seq for instance: https://www.scala-lang.org/api/current/scala/collection/immutable/Vector.html a'd click on "Filter all members".
Or, probably easier, IDEs usually provide a "Hierarchy" view of a class and its methods that would give you the information quickly.

Map an instance using function in Scala

Say I have a local method/function
def withExclamation(string: String) = string + "!"
Is there a way in Scala to transform an instance by supplying this method? Say I want to append an exclamation mark to a string. Something like:
val greeting = "Hello"
val loudGreeting = greeting.applyFunction(withExclamation) //result: "Hello!"
I would like to be able to invoke (local) functions when writing a chain transformation on an instance.
EDIT: Multiple answers show how to program this possibility, so it seems that this feature is not present on an arbitraty class. To me this feature seems incredibly powerful. Consider where in Java I want to execute a number of operations on a String:
appendExclamationMark(" Hello! ".trim().toUpperCase()); //"HELLO!"
The order of operations is not the same as how they read. The last operation, appendExclamationMark is the first word that appears. Currently in Java I would sometimes do:
Function.<String>identity()
.andThen(String::trim)
.andThen(String::toUpperCase)
.andThen(this::appendExclamationMark)
.apply(" Hello "); //"HELLO!"
Which reads better in terms of expressing a chain of operations on an instance, but also contains a lot of noise, and it is not intuitive to have the String instance at the last line. I would want to write:
" Hello "
.applyFunction(String::trim)
.applyFunction(String::toUpperCase)
.applyFunction(this::withExclamation); //"HELLO!"
Obviously the name of the applyFunction function can be anything (shorter please). I thought backwards compatibility was the sole reason Java's Object does not have this.
Is there any technical reason why this was not added on, say, the Any or AnyRef classes?
You can do this with an implicit class which provides a way to extend an existing type with your own methods:
object StringOps {
implicit class RichString(val s: String) extends AnyVal {
def withExclamation: String = s"$s!"
}
def main(args: Array[String]): Unit = {
val m = "hello"
println(m.withExclamation)
}
}
Yields:
hello!
If you want to apply any functions (anonymous, converted from methods, etc.) in this way, you can use a variation on Yuval Itzchakov's answer:
object Combinators {
implicit class Combinators[A](val x: A) {
def applyFunction[B](f: A => B) = f(x)
}
}
A while after asking this question, I noticed that Kotlin has this built in:
inline fun <T, R> T.let(block: (T) -> R): R
Calls the specified function block with this value as its argument and returns
its result.
A lot more, quite useful variations of the above function are provided on all types, like with, also, apply, etc.

Simple example of extending a Scala collection

I'm looking for a very simple example of subclassing a Scala collection. I'm not so much interested in full explanations of how and why it all works; plenty of those are available here and elsewhere on the Internet. I'd like to know the simple way to do it.
The class below might be as simple an example as possible. The idea is, make a subclass of Set[Int] which has one additional method:
class SlightlyCustomizedSet extends Set[Int] {
def findOdd: Option[Int] = find(_ % 2 == 1)
}
Obviously this is wrong. One problem is that there's no constructor to put things into the Set. A CanBuildFrom object must be built, preferably by calling some already-existing library code that knows how to build it. I've seen examples that implement several additional methods in the companion object, but they're showing how it all works or how to do something more complicated. I'd like to see how to leverage what's already in the libraries to knock this out in a couple lines of code. What's the smallest, simplest way to implement this?
If you just want to add a single method to a class, then subclassing may not be the way to go. Scala's collections library is somewhat complicated, and leaf classes aren't always amenable to subclassing (one might start by subclassing HashSet, but this would start you on a journey down a deep rabbit hole).
Perhaps a simpler way to achieve your goal would be something like:
implicit class SetPimper(val s: Set[Int]) extends AnyVal {
def findOdd: Option[Int] = s.find(_ % 2 == 1)
}
This doesn't actually subclass Set, but creates an implicit conversion that allows you to do things like:
Set(1,2,3).findOdd // Some(1)
Down the Rabbit Hole
If you've come from a Java background, it might be surprising that it's so difficult to extend standard collections - after all the Java standard library's peppered with j.u.ArrayList subclasses, for pretty much anything that can contain other things. However, Scala has one key difference: its first-choice collections are all immutable.
This means that they don't have add methods that modify them in-place. Instead, they have + methods that construct a new instance, with all the original items, plus the new item. If they'd implemented this naïvely, it'd be very inefficient, so they use various class-specific tricks to allow the new instances to share data with the original one. The + method may even return an object of a different type to the original - some of the collections classes use a different representation for small or empty collections.
However, this also means that if you want to subclass one of the immutable collections, then you need to understand the guts of the class you're subclassing, to ensure that your instances of your subclass are constructed in the same way as the base class.
By the way, none of this applies to you if you want to subclass the mutable collections. They're seen as second class citizens in the scala world, but they do have add methods, and rarely need to construct new instances. The following code:
class ListOfUsers(users: Int*) extends scala.collection.mutable.HashSet[Int] {
this ++= users
def findOdd: Option[Int] = find(_ % 2 == 1)
}
Will probably do more-or-less what you expect in most cases (map and friends might not do quite what you expect, because of the the CanBuildFrom stuff that I'll get to in a minute, but bear with me).
The Nuclear Option
If inheritance fails us, we always have a nuclear option to fall back on: composition. We can create our own Set subclass that delegates its responsibilities to a delegate, as such:
import scala.collection.SetLike
import scala.collection.mutable.Builder
import scala.collection.generic.CanBuildFrom
class UserSet(delegate: Set[Int]) extends Set[Int] with SetLike[Int, UserSet] {
override def contains(key: Int) = delegate.contains(key)
override def iterator = delegate.iterator
override def +(elem: Int) = new UserSet(delegate + elem)
override def -(elem: Int) = new UserSet(delegate - elem)
override def empty = new UserSet(Set.empty)
override def newBuilder = UserSet.newBuilder
override def foreach[U](f: Int => U) = delegate.foreach(f) // Optional
override def size = delegate.size // Optional
}
object UserSet {
def apply(users: Int*) = (newBuilder ++= users).result()
def newBuilder = new Builder[Int, UserSet] {
private var delegateBuilder = Set.newBuilder[Int]
override def +=(elem: Int) = {
delegateBuilder += elem
this
}
override def clear() = delegateBuilder.clear()
override def result() = new UserSet(delegateBuilder.result())
}
implicit object UserSetCanBuildFrom extends CanBuildFrom[UserSet, Int, UserSet] {
override def apply() = newBuilder
override def apply(from: UserSet) = newBuilder
}
}
This is arguably both too complicated and too simple at the same time. It's far more lines of code than we meant to write, and yet, it's still pretty naïve.
It'll work without the companion class, but without CanBuildFrom, map will return a plain Set, which may not be what you expect. We've also overridden the optional methods that the documentation for Set recommends we implement.
If we were being thorough, we'd have created a CanBuildFrom, and implemented empty for our mutable class, as this ensures that the handful of methods that create new instances will work as we expect.
But that sounds like a lot of work...
If that sounds like too much work, consider something like the following:
case class UserSet(users: Set[Int])
Sure, you have to type a few more letters to get at the set of users, but I think it separates concerns better than subclassing.

Scala singleton factories and class constants

OK, in the question about 'Class Variables as constants', I get the fact that the constants are not available until after the 'official' constructor has been run (i.e. until you have an instance). BUT, what if I need the companion singleton to make calls on the class:
object thing {
val someConst = 42
def apply(x: Int) = new thing(x)
}
class thing(x: Int) {
import thing.someConst
val field = x * someConst
override def toString = "val: " + field
}
If I create companion object first, the 'new thing(x)' (in the companion) causes an error. However, if I define the class first, the 'x * someConst' (in the class definition) causes an error.
I also tried placing the class definition inside the singleton.
object thing {
var someConst = 42
def apply(x: Int) = new thing(x)
class thing(x: Int) {
val field = x * someConst
override def toString = "val: " + field
}
}
However, doing this gives me a 'thing.thing' type object
val t = thing(2)
results in
t: thing.thing = val: 84
The only useful solution I've come up with is to create an abstract class, a companion and an inner class (which extends the abstract class):
abstract class thing
object thing {
val someConst = 42
def apply(x: Int) = new privThing(x)
class privThing(x: Int) extends thing {
val field = x * someConst
override def toString = "val: " + field
}
}
val t1 = thing(2)
val tArr: Array[thing] = Array(t1)
OK, 't1' still has type of 'thing.privThing', but it can now be treated as a 'thing'.
However, it's still not an elegant solution, can anyone tell me a better way to do this?
PS. I should mention, I'm using Scala 2.8.1 on Windows 7
First, the error you're seeing (you didn't tell me what it is) isn't a runtime error. The thing constructor isn't called when the thing singleton is initialized -- it's called later when you call thing.apply, so there's no circular reference at runtime.
Second, you do have a circular reference at compile time, but that doesn't cause a problem when you're compiling a scala file that you've saved on disk -- the compiler can even resolve circular references between different files. (I tested. I put your original code in a file and compiled it, and it worked fine.)
Your real problem comes from trying to run this code in the Scala REPL. Here's what the REPL does and why this is a problem in the REPL. You're entering object thing and as soon as you finish, the REPL tries to compile it, because it's reached the end of a coherent chunk of code. (Semicolon inference was able to infer a semicolon at the end of the object, and that meant the compiler could get to work on that chunk of code.) But since you haven't defined class thing it can't compile it. You have the same problem when you reverse the definitions of class thing and object thing.
The solution is to nest both class thing and object thing inside some outer object. This will defer compilation until that outer object is complete, at which point the compiler will see the definitions of class thing and object thing at the same time. You can run import thingwrapper._ right after that to make class thing and object thing available in global scope for the REPL. When you're ready to integrate your code into a file somewhere, just ditch the outer class thingwrapper.
object thingwrapper{
//you only need a wrapper object in the REPL
object thing {
val someConst = 42
def apply(x: Int) = new thing(x)
}
class thing(x: Int) {
import thing.someConst
val field = x * someConst
override def toString = "val: " + field
}
}
Scala 2.12 or more could benefit for sip 23 which just (August 2016) pass to the next iteration (considered a “good idea”, but is a work-in-process)
Literal-based singleton types
Singleton types bridge the gap between the value level and the type level and hence allow the exploration in Scala of techniques which would typically only be available in languages with support for full-spectrum dependent types.
Scala’s type system can model constants (e.g. 42, "foo", classOf[String]).
These are inferred in cases like object O { final val x = 42 }. They are used to denote and propagate compile time constants (See 6.24 Constant Expressions and discussion of “constant value definition” in 4.1 Value Declarations and Definitions).
However, there is no surface syntax to express such types. This makes people who need them, create macros that would provide workarounds to do just that (e.g. shapeless).
This can be changed in a relatively simple way, as the whole machinery to enable this is already present in the scala compiler.
type _42 = 42.type
type Unt = ().type
type _1 = 1 // .type is optional for literals
final val x = 1
type one = x.type // … but mandatory for identifiers

Pros and Cons of choosing def over val

I'm asking a slight different question than this one. Suppose I have a code snippet:
def foo(i : Int) : List[String] = {
val s = i.toString + "!" //using val
s :: Nil
}
This is functionally equivalent to the following:
def foo(i : Int) : List[String] = {
def s = i.toString + "!" //using def
s :: Nil
}
Why would I choose one over the other? Obviously I would assume the second has a slight disadvantages in:
creating more bytecode (the inner def is lifted to a method in the class)
a runtime performance overhead of invoking a method over accessing a value
non-strict evaluation means I could easily access s twice (i.e. unnecesasarily redo a calculation)
The only advantage I can think of is:
non-strict evaluation of s means it is only called if it is used (but then I could just use a lazy val)
What are peoples' thoughts here? Is there a significant dis-benefit to me making all inner vals defs?
1)
One answer I didn't see mentioned is that the stack frame for the method you're describing could actually be smaller. Each val you declare will occupy a slot on the JVM stack, however, the whenever you use a def obtained value it will get consumed in the first expression you use it in. Even if the def references something from the environment, the compiler will pass .
The HotSpot should optimize both these things, or so some people claim. See:
http://www.ibm.com/developerworks/library/j-jtp12214/
Since the inner method gets compiled into a regular private method behind the scene and it is usually very small, the JIT compiler might choose to inline it and then optimize it. This could save time allocating smaller stack frames (?), or, by having fewer elements on the stack, make local variables access quicker.
But, take this with a (big) grain of salt - I haven't actually made extensive benchmarks to backup this claim.
2)
In addition, to expand on Kevin's valid reply, the stable val provides also means that you can use it with path dependent types - something you can't do with a def, since the compiler doesn't check its purity.
3)
For another reason you might want to use a def, see a related question asked not so long ago:
Functional processing of Scala streams without OutOfMemory errors
Essentially, using defs to produce Streams ensures that there do not exist additional references to these objects, which is important for the GC. Since Streams are lazy anyway, the overhead of creating them is probably negligible even if you have multiple defs.
The val is strict, it's given a value as soon as you define the thing.
Internally, the compiler will mark it as STABLE, equivalent to final in Java. This should allow the JVM to make all sorts of optimisations - I just don't know what they are :)
I can see an advantage in the fact that you are less bound to a location when using a def than when using a val.
This is not a technical advantage but allows for better structuring in some cases.
So, stupid example (please edit this answer, if you’ve got a better one), this is not possible with val:
def foo(i : Int) : List[String] = {
def ret = s :: Nil
def s = i.toString + "!"
ret
}
There may be cases where this is important or just convenient.
(So, basically, you can achieve the same with lazy val but, if only called at most once, it will probably be faster than a lazy val.)
For a local declaration like this (with no arguments, evaluated precisely once and with no code evaluated between the point of declaration and the point of evaluation) there is no semantic difference. I wouldn't be surprised if the "val" version compiled to simpler and more efficient code than the "def" version, but you would have to examine the bytecode and possibly profile to be sure.
In your example I would use a val. I think the val/def choice is more meaningful when declaring class members:
class A { def a0 = "a"; def a1 = "a" }
class B extends A {
var c = 0
override def a0 = { c += 1; "a" + c }
override val a1 = "b"
}
In the base class using def allows the sub class to override with possibly a def that does not return a constant. Or it could override with a val. So that gives more flexibility than a val.
Edit: one more use case of using def over val is when an abstract class has a "val" for which the value should be provided by a subclass.
abstract class C { def f: SomeObject }
new C { val f = new SomeObject(...) }