Use of lazy val for caching string representation - scala

I encountered the following code in JAXMag's Scala special issue:
package com.weiglewilczek.gameoflife
case class Cell(x: Int, y: Int) {
override def toString = position
private lazy val position = "(%s, %s)".format(x, y)
}
Does the use of lazy val in the above code provide considerably more performance than the following code?
package com.weiglewilczek.gameoflife
case class Cell(x: Int, y: Int) {
override def toString = "(%s, %s)".format(x, y)
}
Or is it just a case of unnecessary optimization?

One thing to note about lazy vals is that, while they are only calculated once, every access to them is protected by a double-checked locking wrapper. This is necessary to prevent two different threads from attempting to initialize the value at the same time with hilarious results. Now double-checked locking is pretty efficient (now that it actually works in the JVM), and won't require lock acquisition in most cases, but there is more overhead than a simple value access.
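To make that overhead concrete, here is a rough hand-written sketch of the double-checked locking pattern. It is only an approximation: the code the compiler actually emits differs between Scala versions, and the names ManualLazyCell, initialized, cached and computeCount are made up for illustration.

```scala
// Hand-written approximation of what a lazy val involves; NOT the real compiler output.
final class ManualLazyCell(x: Int, y: Int) {
  @volatile private var initialized = false  // consulted on every access
  private var cached: String = null          // holds the value once computed
  var computeCount = 0                       // illustration only: counts initializations

  def position: String = {
    if (!initialized) {                      // first check, without taking the lock
      synchronized {
        if (!initialized) {                  // second check, under the lock
          computeCount += 1
          cached = "(%s, %s)".format(x, y)
          initialized = true
        }
      }
    }
    cached
  }

  override def toString: String = position
}
```

After the first call, each access still pays for the volatile read of the flag, which is the extra cost compared with a plain val.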
Additionally (and somewhat obviously), by caching the string representation of your object, you are explicitly trading off CPU cycles for possibly large increases in memory usage. The strings in the "def" version can be garbage-collected as soon as they are no longer referenced, while those in the "lazy val" version are retained for as long as the Cell itself is reachable.
Finally, as is always the case with performance questions, theory-based hypotheses mean nearly nothing without fact-based benchmarking. You'll never know for sure without profiling, so might as well try it and see.

toString can be directly overridden with a lazy val.
scala> case class Cell(x: Int, y: Int) {
| override lazy val toString = {println("here"); "(%s, %s)".format(x, y)}
| }
defined class Cell
scala> {val c = Cell(1, 2); (c.toString, c.toString)}
here
res0: (String, String) = ((1, 2),(1, 2))
Note that a def may not override a val; you can only make members more stable in the subclass.
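A minimal sketch of that rule, using hypothetical classes Shape, Circle and LazyCircle that are not part of the original code. The direction that the compiler rejects is left commented out so the block still compiles.

```scala
// A val (or lazy val) may override a def, but not the other way round.
class Shape {
  def name: String = "shape"
}

class Circle extends Shape {
  override val name: String = "circle"            // def -> val: allowed
}

class LazyCircle extends Shape {
  override lazy val name: String = "lazy circle"  // def -> lazy val: allowed
}

// The reverse direction does not compile, so it is commented out:
// class Base { val name: String = "base" }
// class Sub extends Base { override def name: String = "sub" }
```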

In the first snippet, position will be calculated just once, on demand, when (and if) the toString method is called. In the second snippet, the toString body will be re-evaluated every time the method is called. Given that x and y cannot be changed, recomputing it is pointless, and the toString value should be stored.

Case classes are immutable by convention (their parameters are vals unless declared otherwise), and this one has no mutable state. Any value returned by toString will itself be immutable, too. Thus it makes sense to essentially "cache" this value by utilizing a lazy val. On the other hand, the provided toString implementation does little more than the default toString provided by all case classes. I would not be surprised if a vanilla case class toString used a lazy val underneath.

Looks like a micro-optimization to me. The JVM is capable enough to take care of such cases.

Related

why use a method with no parameter lists over a val

I came across this function in Scala: def nullable: Boolean = true. I understand what this function does, but I want to know whether there is a specific name for this kind of function, and what the motivation is for not using a var.
Firstly, I would be very precise in Scala: use the word Function to only ever mean an instance of FunctionN and use the word Method when talking about a def (which may have zero or more parameter lists). Secondly, this most definitely does have a body (albeit not enclosed in braces). Its body is the expression true (i.e. a boolean literal).
I assume that you really mean to ask: "why use a method with no parameter lists over a val?"
When deciding how to represent some property of your class, you can choose between a method and a value (advice: avoid using var). Often, if the property involves no side effects, we can use a def with no parameter lists (the Scala idiom is that a def with a single, empty parameter list implies side effects).
Hence we may choose any of the following, all of which are semantically equivalent at the use-site (except for performance characteristics):
case class Foo(s: String) {
//Eager - we calculate and store the value regardless of whether
// it is ever used
val isEmpty = s.isEmpty
}
case class Foo(s: String) {
//Lazy - we calculate and store the value when
// it is first used
lazy val isEmpty = s.isEmpty
}
case class Foo(s: String) {
//Non-strict - we calculate the value each time
// it is used
def isEmpty = s.isEmpty
}
Hence we might take the following advice:
If the value is computationally expensive to calculate and we are sure we will use it multiple times, use val
If the value is computationally expensive and we may use it zero or many times, use lazy val
If the value is space-expensive and we think it will generally be used at most once, use def
However, there is an additional consideration: using a val (or lazy val) is likely to help when debugging in an IDE, which can generally show you the value of any in-scope vals in an inspection window.
The primary difference between def and var/val is when the right-hand side is evaluated.
A def defines a name for an expression; the right-hand side is evaluated each time it is called (much like call-by-name), meaning it is lazy.
A val or var defines a name for a value, and its right-hand side is evaluated eagerly, at the point of definition.
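The difference in evaluation time can be observed by counting how often each right-hand side runs. This is a small sketch with a hypothetical class Counted; the counter fields exist only to make the evaluations visible.

```scala
class Counted {
  // Counters are declared first so they are initialized before the members below.
  var valRuns = 0
  var lazyRuns = 0
  var defRuns = 0

  val eager: Int = { valRuns += 1; 42 }           // runs once, at construction
  lazy val deferred: Int = { lazyRuns += 1; 42 }  // runs once, on first access
  def perCall: Int = { defRuns += 1; 42 }         // runs on every access
}
```

Right after new Counted, valRuns is 1 while lazyRuns and defRuns are still 0. After reading deferred twice and perCall twice, lazyRuns is 1 and defRuns is 2.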

When does it make sense to use implicit parameters in Scala, and what may be alternative scala idioms to consider?

Having used a Scala library that liberally exposes the reliance on implicits to the caller, I had experienced friction around this mechanism, as Scala makes it quite hard at times to debug implicit arguments, and because there's quite a bunch of places Scala would fill in values for implicit arguments from. (I could almost relate to it as "implicits hell" at one time).
At one time in my coding, Scala "complained" an implicit value could not be matched whereas in fact there was a "collision" of implicit values each coming from a different import.
Regardless of that perceived brittleness, it can at times feel like it borders on abuse of the context design pattern.
Why does it make sense to have implicit parameters in Scala?
In what scenarios would you use them and how would you avoid trouble?
As I'm not sure the experimentation-curve and potential for other team members getting totally confused are worth it, could you possibly suggest other scala idioms for sharing context between a multitude of Scala functions?
This question is not about a specific implementation at hand; hopefully it's still a good fit for this site.
Generally, using a common type as an implicit parameter is a bad idea.
def badIdea(n: Int)(implicit s: String) = s * n
It doesn't take much to imagine why: you'll get conflicting implicits for the same thing if anyone else adopts this policy. Better to avoid it.
But who really wants to manually stuff in a scala.concurrent.ExecutionContext every time it's needed (which is practically everywhere)?
So the key is: when you have something with a specialized type, especially if it's bookkeeping that might need to be overridden manually but mostly should just do the right thing, then use implicit parameters. (This usually covers type classes as well.)
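As a sketch of that guideline, the ExecutionContext case might look like the following. The object ImplicitContextDemo and its methods are hypothetical names chosen for illustration; only ExecutionContext, Future and Await come from the standard library.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ImplicitContextDemo {
  // ExecutionContext is a specialized bookkeeping type, so it is a reasonable implicit.
  def doubled(n: Int)(implicit ec: ExecutionContext): Future[Int] =
    Future(n * 2)

  // Usually the implicit in scope is picked up automatically...
  def viaImplicit(): Int = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    Await.result(doubled(21), 5.seconds)
  }

  // ...but it can still be passed explicitly when it needs to be overridden.
  def viaExplicit(): Int =
    Await.result(doubled(21)(ExecutionContext.global), 5.seconds)
}
```

Because nothing else is likely to define an implicit of this type for an unrelated purpose, the risk of colliding implicits is much lower than with String or Int.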
Then what do you do if you really need a string? Well, wrap it (at least formally--here it's a value class so in some contexts it will just pass the string around):
class MyWrappedString(val underlying: String) extends AnyVal {}
implicit val myString = new MyWrappedString("bird")
def decentIdea(n: Int)(implicit mws: MyWrappedString) = mws.underlying * n
scala> decentIdea(2) // In the bush?
res14: String = birdbird
Or if you think some additional logic is helpful, write a wrapper that takes an extra type parameter:
class ImplicitWithValue[K,V](val value: V) {
// Any extra generic logic goes here
}
object ImplicitWithValue {
class ValuePart[K] {
def apply[V](v: V) = new ImplicitWithValue[K,V](v)
}
private val genericValuePart = new ValuePart[Any]
private def typedValuePart[K] = genericValuePart.asInstanceOf[ValuePart[K]]
def apply[K] = typedValuePart[K]
}
Then you can
trait Marker1
implicit val implicit1 = ImplicitWithValue[Marker1]("fish")
def goodIdea(n: Int)(implicit ms: ImplicitWithValue[Marker1, String]) = ms.value * n
scala> goodIdea(3)
res17: String = fishfishfish

Simple example of extending a Scala collection

I'm looking for a very simple example of subclassing a Scala collection. I'm not so much interested in full explanations of how and why it all works; plenty of those are available here and elsewhere on the Internet. I'd like to know the simple way to do it.
The class below might be as simple an example as possible. The idea is, make a subclass of Set[Int] which has one additional method:
class SlightlyCustomizedSet extends Set[Int] {
def findOdd: Option[Int] = find(_ % 2 == 1)
}
Obviously this is wrong. One problem is that there's no constructor to put things into the Set. A CanBuildFrom object must be built, preferably by calling some already-existing library code that knows how to build it. I've seen examples that implement several additional methods in the companion object, but they're showing how it all works or how to do something more complicated. I'd like to see how to leverage what's already in the libraries to knock this out in a couple lines of code. What's the smallest, simplest way to implement this?
If you just want to add a single method to a class, then subclassing may not be the way to go. Scala's collections library is somewhat complicated, and leaf classes aren't always amenable to subclassing (one might start by subclassing HashSet, but this would start you on a journey down a deep rabbit hole).
Perhaps a simpler way to achieve your goal would be something like:
implicit class SetPimper(val s: Set[Int]) extends AnyVal {
def findOdd: Option[Int] = s.find(_ % 2 == 1)
}
This doesn't actually subclass Set, but creates an implicit conversion that allows you to do things like:
Set(1,2,3).findOdd // Some(1)
Down the Rabbit Hole
If you've come from a Java background, it might be surprising that it's so difficult to extend standard collections - after all the Java standard library's peppered with j.u.ArrayList subclasses, for pretty much anything that can contain other things. However, Scala has one key difference: its first-choice collections are all immutable.
This means that they don't have add methods that modify them in-place. Instead, they have + methods that construct a new instance, with all the original items, plus the new item. If they'd implemented this naïvely, it'd be very inefficient, so they use various class-specific tricks to allow the new instances to share data with the original one. The + method may even return an object of a different type to the original - some of the collections classes use a different representation for small or empty collections.
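A small sketch of that behaviour, with a hypothetical object ImmutableSetDemo. The runtime classes involved are implementation details that vary by Scala version, so the sketch only compares them rather than naming them.

```scala
object ImmutableSetDemo {
  val original: Set[Int] = Set(1, 2, 3, 4)
  val extended: Set[Int] = original + 5  // builds a new set; original is untouched

  // Small sets typically use a dedicated representation, so this is usually false,
  // but that is an implementation detail and not something to rely on.
  val sameRuntimeClass: Boolean = original.getClass == extended.getClass
}
```

Here original still has four elements after the call to +, and only extended contains 5.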
However, this also means that if you want to subclass one of the immutable collections, then you need to understand the guts of the class you're subclassing, to ensure that your instances of your subclass are constructed in the same way as the base class.
By the way, none of this applies to you if you want to subclass the mutable collections. They're seen as second-class citizens in the Scala world, but they do have add methods, and rarely need to construct new instances. The following code:
class ListOfUsers(users: Int*) extends scala.collection.mutable.HashSet[Int] {
this ++= users
def findOdd: Option[Int] = find(_ % 2 == 1)
}
Will probably do more-or-less what you expect in most cases (map and friends might not do quite what you expect, because of the CanBuildFrom stuff that I'll get to in a minute, but bear with me).
The Nuclear Option
If inheritance fails us, we always have a nuclear option to fall back on: composition. We can create our own Set subclass that delegates its responsibilities to a delegate, as such:
import scala.collection.SetLike
import scala.collection.mutable.Builder
import scala.collection.generic.CanBuildFrom
class UserSet(delegate: Set[Int]) extends Set[Int] with SetLike[Int, UserSet] {
override def contains(key: Int) = delegate.contains(key)
override def iterator = delegate.iterator
override def +(elem: Int) = new UserSet(delegate + elem)
override def -(elem: Int) = new UserSet(delegate - elem)
override def empty = new UserSet(Set.empty)
override def newBuilder = UserSet.newBuilder
override def foreach[U](f: Int => U) = delegate.foreach(f) // Optional
override def size = delegate.size // Optional
}
object UserSet {
def apply(users: Int*) = (newBuilder ++= users).result()
def newBuilder = new Builder[Int, UserSet] {
private var delegateBuilder = Set.newBuilder[Int]
override def +=(elem: Int) = {
delegateBuilder += elem
this
}
override def clear() = delegateBuilder.clear()
override def result() = new UserSet(delegateBuilder.result())
}
implicit object UserSetCanBuildFrom extends CanBuildFrom[UserSet, Int, UserSet] {
override def apply() = newBuilder
override def apply(from: UserSet) = newBuilder
}
}
This is arguably both too complicated and too simple at the same time. It's far more lines of code than we meant to write, and yet, it's still pretty naïve.
It'll work without the companion object, but without CanBuildFrom, map will return a plain Set, which may not be what you expect. We've also overridden the optional methods that the documentation for Set recommends we implement.
If we were being thorough, we'd have created a CanBuildFrom, and implemented empty for our mutable class, as this ensures that the handful of methods that create new instances will work as we expect.
But that sounds like a lot of work...
If that sounds like too much work, consider something like the following:
case class UserSet(users: Set[Int])
Sure, you have to type a few more letters to get at the set of users, but I think it separates concerns better than subclassing.
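A minimal sketch of how the wrapper could carry the extra method from the question. Adding findOdd to the case class is an assumption about how you would want to use it; it relies only on Set.find.

```scala
// Composition instead of inheritance: wrap the Set and add the extra method.
case class UserSet(users: Set[Int]) {
  def findOdd: Option[Int] = users.find(_ % 2 == 1)
}
```

With this, UserSet(Set(2, 4, 7)).findOdd gives Some(7), and the ordinary collection operations remain available through users, for example UserSet(Set(2, 4)).users.map(_ + 1).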

Migrate from MurmurHash to MurmurHash3

In Scala 2.10, MurmurHash is for some reason deprecated, saying I should use MurmurHash3 now. But the API is different, and there are no useful scaladocs for MurmurHash3 -> fail.
For instance, current code:
trait Foo {
type Bar
def id: Int
def path: Bar
override def hashCode = {
import util.MurmurHash._
var h = startHash(2)
val c = startMagicA
val k = startMagicB
h = extendHash(h, id, c, k)
h = extendHash(h, path.##, nextMagicA(c), nextMagicB(k))
finalizeHash(h)
}
}
How would I do this using MurmurHash3 instead? This needs to be a fast operation, preferably without allocations, so I do not want to construct a Product, Seq, Array[Byte] or whatever MurmurHash3 seems to be offering me.
The MurmurHash3 algorithm was changed, confusingly, from an algorithm that mixed in its own salt, essentially (c and k), to one that just does more bit-mixing. The basic operation is now mix, which you should fold over all your values, after which you should finalizeHash (the Int argument for length is for convenience also, to help with distinguishing collections of different length). If you want to replace your last mix by mixLast, it's a little faster and removes redundancy with finalizeHash. If it takes you too long to detect what the last mix is, just mix.
Typically for a collection you'll want to mix in an extra value to indicate what type of collection it is.
So minimally you'd have
override def hashCode = finalizeHash(mixLast(id, path.##), 0)
and "typically" you'd
// Pick any string or number that suits you, put in companion object
val fooSeed = MurmurHash3.stringHash("classOf[Foo]")
// I guess "id" plus "path" is two things?
override def hashCode = finalizeHash(mixLast( mix(fooSeed,id), path.## ), 2)
Note that the length field is NOT there to give a high-quality hash that mixes in that number. All mixing of important hash values should be done with mix.
Looking at the source code of MurmurHash3 suggests something like this:
override def hashCode = {
import util.hashing.MurmurHash3._
val h = symmetricSeed // I'm not sure which seed to use here
val h1 = mix(h, id)
val h2 = mixLast(h1, path ##)
finalizeHash(h2, 2)
}
or, in (almost) one line:
import util.hashing.MurmurHash3._
override def hashCode = finalizeHash(mix(mix(symmetricSeed, id), path ##), 2)
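Putting the pieces together, a self-contained version might look like the sketch below. The class Node, the object NodeHashing and the choice of seed are hypothetical; path is a String here purely to keep the example small, and only mix, mixLast, finalizeHash and stringHash are taken from MurmurHash3.

```scala
import scala.util.hashing.MurmurHash3

object NodeHashing {
  // Arbitrary per-class seed; any fixed Int would do.
  val seed: Int = MurmurHash3.stringHash("Node")
}

final class Node(val id: Int, val path: String) {
  override def hashCode: Int = {
    val h1 = MurmurHash3.mix(NodeHashing.seed, id)  // fold in every value but the last
    val h2 = MurmurHash3.mixLast(h1, path.##)       // final mixing step for the last value
    MurmurHash3.finalizeHash(h2, 2)                 // 2 = number of values mixed in
  }

  override def equals(other: Any): Boolean = other match {
    case that: Node => id == that.id && path == that.path
    case _          => false
  }
}
```

All operations work on Int values, so nothing is allocated while hashing.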

Pros and Cons of choosing def over val

I'm asking a slightly different question than this one. Suppose I have a code snippet:
def foo(i : Int) : List[String] = {
val s = i.toString + "!" //using val
s :: Nil
}
This is functionally equivalent to the following:
def foo(i : Int) : List[String] = {
def s = i.toString + "!" //using def
s :: Nil
}
Why would I choose one over the other? Obviously I would assume the second has some slight disadvantages:
creating more bytecode (the inner def is lifted to a method in the class)
a runtime performance overhead of invoking a method over accessing a value
non-strict evaluation means I could easily access s twice (i.e. unnecessarily redo a calculation)
The only advantage I can think of is:
non-strict evaluation of s means it is only called if it is used (but then I could just use a lazy val)
What are peoples' thoughts here? Is there a significant dis-benefit to me making all inner vals defs?
1)
One answer I didn't see mentioned is that the stack frame for the method you're describing could actually be smaller. Each val you declare will occupy a slot on the JVM stack, whereas a value obtained from a def gets consumed in the first expression you use it in. Even if the def references something from the environment, the compiler will pass .
HotSpot should optimize both of these things, or so some people claim. See:
http://www.ibm.com/developerworks/library/j-jtp12214/
Since the inner method gets compiled into a regular private method behind the scenes and it is usually very small, the JIT compiler might choose to inline it and then optimize it. This could save time allocating smaller stack frames (?), or, by having fewer elements on the stack, make local variable access quicker.
But take this with a (big) grain of salt: I haven't actually run extensive benchmarks to back up this claim.
2)
In addition, to expand on Kevin's valid reply, the stability a val provides also means that you can use it with path-dependent types, which is something you can't do with a def, since the compiler doesn't check its purity.
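A short sketch of what this means in practice, using hypothetical names Graph, Vertex and PathDependentDemo. The line that the compiler rejects is left commented out.

```scala
class Graph {
  class Vertex(val label: String)
  def vertex(label: String): Vertex = new Vertex(label)
}

object PathDependentDemo {
  val stableGraph = new Graph                          // a val is a stable identifier
  val v: stableGraph.Vertex = stableGraph.vertex("a")  // compiles: the type path goes through a val

  def freshGraph: Graph = new Graph                    // a def may return a different Graph each time
  // val w: freshGraph.Vertex = freshGraph.vertex("b") // rejected: a def is not a stable path
  val w: Graph#Vertex = freshGraph.vertex("b")         // only the type projection is available
}
```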
3)
For another reason you might want to use a def, see a related question asked not so long ago:
Functional processing of Scala streams without OutOfMemory errors
Essentially, using defs to produce Streams ensures that there do not exist additional references to these objects, which is important for the GC. Since Streams are lazy anyway, the overhead of creating them is probably negligible even if you have multiple defs.
The val is strict, it's given a value as soon as you define the thing.
Internally, the compiler will mark it as STABLE, equivalent to final in Java. This should allow the JVM to make all sorts of optimisations - I just don't know what they are :)
I can see an advantage in the fact that you are less bound to a location when using a def than when using a val.
This is not a technical advantage but allows for better structuring in some cases.
So, stupid example (please edit this answer, if you’ve got a better one), this is not possible with val:
def foo(i : Int) : List[String] = {
def ret = s :: Nil
def s = i.toString + "!"
ret
}
There may be cases where this is important or just convenient.
(So, basically, you can achieve the same with lazy val but, if only called at most once, it will probably be faster than a lazy val.)
For a local declaration like this (with no arguments, evaluated precisely once and with no code evaluated between the point of declaration and the point of evaluation) there is no semantic difference. I wouldn't be surprised if the "val" version compiled to simpler and more efficient code than the "def" version, but you would have to examine the bytecode and possibly profile to be sure.
In your example I would use a val. I think the val/def choice is more meaningful when declaring class members:
class A { def a0 = "a"; def a1 = "a" }
class B extends A {
var c = 0
override def a0 = { c += 1; "a" + c }
override val a1 = "b"
}
In the base class using def allows the sub class to override with possibly a def that does not return a constant. Or it could override with a val. So that gives more flexibility than a val.
Edit: one more use case of using def over val is when an abstract class has a "val" for which the value should be provided by a subclass.
abstract class C { def f: SomeObject }
new C { val f = new SomeObject(...) }
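A runnable variant of the same idea, with hypothetical classes Service, FixedService and CountingService standing in for C and SomeObject. Declaring the member as a def in the base class leaves each subclass free to choose a val or a def.

```scala
abstract class Service {
  def name: String                          // abstract: subclasses decide how to provide it
  def greeting: String = "hello from " + name
}

class FixedService extends Service {
  val name: String = "fixed"                // implemented with a val: computed once
}

class CountingService extends Service {
  private var calls = 0
  def name: String = { calls += 1; "call-" + calls }  // implemented with a def: recomputed
}
```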