Scala - handling initialization of objects (_ vs Option[T]) - scala

I know there are multiple questions addressing related problems, but I'm not sure it does attack exactly what I'm looking for. I'm still new to Scala, after several years of Java development. I'm looking for the best way to test if an object has been initialized, and if not, initialize it then. For example, in Java:
private MyObject myObj = null;
and at some point in the future:
public void initMyObj(){
if (myObj == null){
myObj = new MyObj();
}
// do something with myObj
}
After this, I might reassign myObj to a different object, but it is unlikely. In Scala, I have this:
class Test {
var myObj: MyObj = _
}
I've read that I could use Option instead, something like:
var myObj = None : Option[MyObj]
and then my check:
myObj match {
case None => ...
case Some(value) => ...
}
but it feels ackward to use this pattern when I might not make this kind of check anywhere else at any other time - though being so new to Scala, I might be wrong. Is this the best way to achieve what I want or is there any other option not involving Option?

It is not generally ideal practice in Scala to leave partially-constructed objects lying around. You would normally rethink how your objects were getting instantiated to see if you can't use a different pattern that is less fragile. For example, instead of setting uninitialized variables in methods:
class Foo { var a: String = null; var b: String = null }
def initFooA(s: String, f: Foo) { if (f.a == null) f.a = s }
def initFooB(s: String, f: Foo) { if (f.b == null) f.b = s }
f
initFooA("salmon", f)
// Do stuff
initFooB("herring", f)
you would attempt to restructure your code to generate the values you need on demand, and delay the instantiation of foo until then:
case class Bar(a: String, b: String) {}
def initBarA(s: String) = s
def initBarB(s: String) = s
val iba = initBarA("halibut")
// Do stuff
val ibb = initBarB("cod")
Bar(iba, ibb)
Because Scala has easy access to tuples (and type inference), this can be a lot less painful than in Java.
Another thing you can do is defer the late initialization to someone else.
case class Baz(a: String)(bMaker: => String) {
lazy val b = bMaker
}
Now you pass in something that will make parameter b, and arrange for it to handle any late initialization stuff that needs to be handled. This doesn't always avoid needing to set vars, but it can help push it out of your class code into your initialization logic (which is usually a better place for it).
Doing this with vars is a little less straightforward. Realistically, you're probably best off just devoting a class to it e.g. by:
class LazyVar[A](initial: => A) {
private[this] var loaded = false
private[this] var variable: A = _
def apply() = { if (!loaded) { loaded = true; variable = initial }; variable }
def update(a: A) { loaded = true; variable = a }
}
where you then (sadly) have to use () on every read and write.
scala> val lv = new LazyVar({ println("Hi!"); 5 })
lv: LazyVar[Int] = LazyVar#2626ea08
scala> lv()
Hi!
res2: Int = 5
scala> lv() = 7
scala> lv()
res4: Int = 7
Then you use an instance of this class instead of the actual var and pass through the lazy initializer. (lazy val is very much like this under the hood; the compiler just protects you from noticing.)
Finally, if you want to have a fully-functional object that is occasionally missing a value, var x: Option[X] is the construct you want to use; if you can't find a way around the standard Java creation patterns (and you don't want to try something more exotic like objects that create each other with more and more information, either because performance is critical and you can't afford it, or you dislike writing that much boilerplate to allow type-checking to verify that your object is properly created) but you otherwise want to use it, var x: X = null is what I'd choose, not _. If X is a primitive, you probably need to choose the correct value wisely anyway (for example, Double.NaN instead of 0.0, -1 rather than 0 for Int) to indicate I-am-not-initialized. If it's generic code and you want Any instead of AnyRef, asInstanceOf-ing back and forth between Any and AnyRef is probably the best way out of poorly typechecked situation (assuming you really, really can't use Option, which at that point is much clearer).

Maybe a lazy variable is what you need.
lazy val myObj: MyObj = //here you put the object creation code
In this way the object creation is postponed to the first time the code tries to access it.

Related

Why do each new instance of case classes evaluate lazy vals again in Scala?

From what I have understood, scala treats val definitions as values.
So, any instance of a case class with same parameters should be equal.
But,
case class A(a: Int) {
lazy val k = {
println("k")
1
}
val a1 = A(5)
println(a1.k)
Output:
k
res1: Int = 1
println(a1.k)
Output:
res2: Int = 1
val a2 = A(5)
println(a1.k)
Output:
k
res3: Int = 1
I was expecting that for println(a2.k), it should not print k.
Since this is not the required behavior, how should I implement this so that for all instances of a case class with same parameters, it should only execute a lazy val definition only once. Do I need some memoization technique or Scala can handle this on its own?
I am very new to Scala and functional programming so please excuse me if you find the question trivial.
Assuming you're not overriding equals or doing something ill-advised like making the constructor args vars, it is the case that two case class instantiations with same constructor arguments will be equal. However, this does not mean that two case class instantiations with same constructor arguments will point to the same object in memory:
case class A(a: Int)
A(5) == A(5) // true, same as `A(5).equals(A(5))`
A(5) eq A(5) // false
If you want the constructor to always return the same object in memory, then you'll need to handle this yourself. Maybe use some sort of factory:
case class A private (a: Int) {
lazy val k = {
println("k")
1
}
}
object A {
private[this] val cache = collection.mutable.Map[Int, A]()
def build(a: Int) = {
cache.getOrElseUpdate(a, A(a))
}
}
val x = A.build(5)
x.k // prints k
val y = A.build(5)
y.k // doesn't print anything
x == y // true
x eq y // true
If, instead, you don't care about the constructor returning the same object, but you just care about the re-evaluation of k, you can just cache that part:
case class A(a: Int) {
lazy val k = A.kCache.getOrElseUpdate(a, {
println("k")
1
})
}
object A {
private[A] val kCache = collection.mutable.Map[Int, Int]()
}
A(5).k // prints k
A(5).k // doesn't print anything
The trivial answer is "this is what the language does according to the spec". That's the correct, but not very satisfying answer. It's more interesting why it does this.
It might be clearer that it has to do this with a different example:
case class A[B](b: B) {
lazy val k = {
println(b)
1
}
}
When you're constructing two A's, you can't know whether they are equal, because you haven't defined what it means for them to be equal (or what it means for B's to be equal). And you can't statically intitialize k either, as it depends on the passed in B.
If this has to print twice, it would be entirely intuitive if that would only be the case if k depends on b, but not if it doesn't depend on b.
When you ask
how should I implement this so that for all instances of a case class with same parameters, it should only execute a lazy val definition only once
that's a trickier question than it sounds. You make "the same parameters" sound like something that can be known at compile time without further information. It's not, you can only know it at runtime.
And if you only know that at runtime, that means you have to keep all past uses of the instance A[B] alive. This is a built in memory leak - no wonder Scala has no built-in way to do this.
If you really want this - and think long and hard about the memory leak - construct a Map[B, A[B]], and try to get a cached instance from that map, and if it doesn't exist, construct one and put it in the map.
I believe case classes only consider the arguments to their constructor (not any auxiliary constructor) to be part of their equality concept. Consider when you use a case class in a match statement, unapply only gives you access (by default) to the constructor parameters.
Consider anything in the body of case classes as "extra" or "side effect" stuffs. I consider it a good tactic to make case classes as near-empty as possible and put any custom logic in a companion object. Eg:
case class Foo(a:Int)
object Foo {
def apply(s: String) = Foo(s.toInt)
}
In addition to dhg answer, I should say, I'm not aware of functional language that does full constructor memoizing by default. You should understand that such memoizing means that all constructed instances should stick in memory, which is not always desirable.
Manual caching is not that hard, consider this simple code
import scala.collection.mutable
class Doubler private(a: Int) {
lazy val double = {
println("calculated")
a * 2
}
}
object Doubler{
val cache = mutable.WeakHashMap.empty[Int, Doubler]
def apply(a: Int): Doubler = cache.getOrElseUpdate(a, new Doubler(a))
}
Doubler(1).double //calculated
Doubler(5).double //calculated
Doubler(1).double //most probably not calculated

Declare a null var in Scala

I have read that null should not be used in scala.
How can I leave myVar uninitialized without the use of null?
class TestClass {
private var myVar: MyClass = null
}
I understand I can just make a dummy MyClass, that is never used in place of the null. But this can and does reduce the code's understandability.
As Rado has explained I can change null as shown below. I understand that I can now check to see if the variable is set during run-time, however, if I don't program that check then there is no benefit of using Option in this case.
Coming from Java, I feel there should be a way to simply leave the var uninitialized at compile-time and let it set during run-time without using the Option class, because as I mentioned above, if I don't code for the unset case then why use Option?
class TestClass {
private var myVar: Option[MyClass] = None
private def createVar() {
myVar = Some(new MyClass)
x: MyClass = myVar.get
}
}
I am thinking the only other way of doing what I am asking is:
class TestClass {
// Using dummy MyClass that will never be used.
private var myVar: MyClass = new MyClass
private def process(myVar: MyClass) {
this.myVar = myVar
myVar.useVarMethod()
}
}
The Scala way is to declare the variable as Option[MyClass]:
class TestClass {
private var myVar: Option[MyClass] = None
private def createVar() {
myVar = Some(new MyClass)
}
// Usage example:
def useMyVar(): Unit = {
myVar match {
case Some(myClass) => {
// Use myClass here ...
println(myClass.toString)
}
case None => // What to do if myVar is undefined?
}
}
}
That way you avoid NullPointerException. You make it explicit that the variable can be in undefined state. Everytime you use the myVar you have to define what to do if it is undefined.
http://www.scala-lang.org/api/current/index.html#scala.Option
I need myVar to be of type MyClass not Option[MyClass]. I see that I
could use Rado's updated answer and then use the get method, but is
there any other way?
When you use Option you can telling the compiler and everyone else who will read/use your code that it's okay not to define this value and the code will handle that condition and not fail at runtime.
The other way of dealing with is to do null checks every time before you access the variable because it could be null and therefore throw an exception at runtime.
When you use Option, the compiler will tell you if at compile time that you have not handled a condition where the value of a variable maybe undefined.
If you think about it, it's really a big deal. you have converted a runtime exception (which is deterministic) to a compile-time error.
If you want to extract the value out of something like an Option (which supports map and also flatMap), then you don't necessarily have to keep doing pattern matching on whether or not the Option contains a value (i.e. is a "Some") or not (i.e. is a "None").
Two methods are very useful - if you want just alter (or "map") the value within the Option then you can use the map method, which takes a function with a general type of:
f: A => B
so in your case at compile time would end up being:
f: MyClass => B
When you map an option, if the option is a "Some" then the contained value is passed through to the mapping function, and the function is applied (to change the MyClass to a B if you like...) and the result is passed back wrapped in an Option. If your Option is a None, then you just get a None back.
Here's a simple example:
scala> case class MyClass(value : String)
defined class MyClass
scala> val emptyOption : Option[MyClass] = None
emptyOption: Option[MyClass] = None
scala> val nonEmptyOption = Some(new MyClass("Some value"))
nonEmptyOption: Option[MyClass] = Some(MyClass(Some value)
Try and extract the value from both option instances using map:
scala> nonEmptyOption map { s => s.value + " mapped" }
res10: Option[String] = Some(Some value mapped)
scala> emptyOption map { s => s.value }
res11: Option[String] = None
So basically, if you map across an empty option, you always get a None. Map across a non-empty Option and you can extract the contained value and transform it and get the result wrapped back in an Option. Notice that you don't need to explicitly check any patterns...the map method takes care of it.
Flatmap is a bit more challenging to explain, but it basically isn't massively different except that it takes a function which has type:
g: A => M[B]
which basically allows you to take the value out of the Option (the 'A' in the type signature above), do something to it (i.e. a computation) and then return the result ('B') - wrapped in another container such as another Option, List...
The same notion (across Option anyway) that if the Option is a None then nothing really happens still applies. flatMap and map form the basis of Scala "for comprehensions" which you can read about (and are done far more justice than I could!!) in lots of other places.

Generic synchronisation design

We are building some sync functionality using two-way json requests and this algorithm. All good and we have it running in prototype mode. Now I am trying to genericise the code, as we will be synching for several tables in the app. It would be cool to be able to define a class as "extends Synchable" and get the additional attributes and sync processing methods with a few specialisations/overrides. I have got this far:
abstract class Synchable [T<:Synchable[T]] (val ruid: String, val lastSyncTime: String, val isDeleted:Int) {
def contentEquals(Target: T): Boolean
def updateWith(target: T)
def insert
def selectSince(clientLastSyncTime: String): List[T]
def findByRuid(ruid: String): Option[T]
implicit val validator: Reads[T]
def process(clientLastSyncTime: String, updateRowList: List[JsObject]) = {
for (syncRow <- updateRowList) {
val validatedSyncRow = syncRow.validate[Synchable]
validatedSyncRow.fold(
valid = { result => // valid row
findByRuid(result.ruid) match { //- do we know about it?
case Some(knownRow) => knownRow.updateWith(result)
case None => result.insert
}
}... invalid, etc
I am new to Scala and know I am probably missing things - WIP!
Any pointers or suggestions on this approach would be much appreciated.
Some quick ones:
Those _ parameters you pass in and then immediately assign to vals: why not do it in one hit? e.g.
abstract class Synchable( val ruid: String = "", val lastSyncTime: String = "", val isDeleted: Int = 0) {
which saves you a line and is clearer in intent as well I think.
I'm not sure about your defaulting of Strings to "" - unless there's a good reason (and there often is), I think using something like ruid:Option[String] = None is more explicit and lets you do all sorts of nice monad-y things like fold, map, flatMap etc.
Looking pretty cool otherwise - the only other thing you might want to do is strengthen the typing with a bit of this.type magic so you'll prevent incorrect usage at compile-time. With your current abstract class, nothing prevents me from doing:
class SynchableCat extends Synchable { ... }
class SynchableDog extends Synchable { ... }
val cat = new SynchableCat
val dog = new SynchableDog
cat.updateWith(dog) // This won't end well
But if you just change your abstract method signatures to things like this:
def updateWith(target: this.type)
Then the change ripples down through the subclasses, narrowing down the types, and the compiler will omit a (relatively clear) error if I try the above update operation.

Scala case class copy with dynamic named parameter

For scala case class with number of parameters (21!!)
e.g. case class Car(type: String, brand: String, door: Int ....)
where type = jeep, brand = toyota, door = 4 ....etc
And there is a copy method which allow override with named parameter: Car.copy(brand = Kia)
where would become type = jeep, brand = Kia, door = 2...etc
My question is, is there anyway I can provide the named parameter dynamically?
def copyCar(key: String, name: String) = {
Car.copy("key" = "name") // this is something I make up and want to see if would work
}
Is scala reflection library could provide a help here?
The reason I am using copy method is that I don't want to repeat the 21 parameters assignment every time when I create a case class which only have 1 or 2 parameter changed.
Many Thanks!
FWIW, I've just implemented a Java reflection version: CaseClassCopy.scala. I tried a TypeTag version but it wasn't that useful; TypeTag was too restrictive for this purpose.
def copy(o: AnyRef, vals: (String, Any)*) = {
val copier = new Copier(o.getClass)
copier(o, vals: _*)
}
/**
* Utility class for providing copying of a designated case class with minimal overhead.
*/
class Copier(cls: Class[_]) {
private val ctor = cls.getConstructors.apply(0)
private val getters = cls.getDeclaredFields
.filter {
f =>
val m = f.getModifiers
Modifier.isPrivate(m) && Modifier.isFinal(m) && !Modifier.isStatic(m)
}
.take(ctor.getParameterTypes.size)
.map(f => cls.getMethod(f.getName))
/**
* A reflective, non-generic version of case class copying.
*/
def apply[T](o: T, vals: (String, Any)*): T = {
val byIx = vals.map {
case (name, value) =>
val ix = getters.indexWhere(_.getName == name)
if (ix < 0) throw new IllegalArgumentException("Unknown field: " + name)
(ix, value.asInstanceOf[Object])
}.toMap
val args = (0 until getters.size).map {
i =>
byIx.get(i)
.getOrElse(getters(i).invoke(o))
}
ctor.newInstance(args: _*).asInstanceOf[T]
}
}
It is not possible using case classes.
Copy method generated at compile time and named parameters handled on compile time to. There is no possibility to do it ar runtime.
Dynamic may help to solve your issue: http://hacking-scala.tumblr.com/post/49051516694/introduction-to-type-dynamic
Yes, you would need to use reflection to do that.
It is a bit involved, because copy is a synthetic method and you'll have to invoke the getters for all fields except the one you want to replace.
To give you an idea, the copy method in this class does exactly that, except using an argument index instead of name. It calls the companion object's apply method, but the effect is the same.
I'm a bit confused - how is the following not what you need?
car: Car = ... // Retrieve an instance of Car somehow.
car.copy(type = "jeep") // Copied instance, only the type has been changed.
car.copy(door = 4) // Copied instance, only the number of doors has changed.
// ...
Is it because you have a lot of parameters for the initial instance creation? In that case, can you not use default values?
case class Car(type: String = "Jeep", door: Int = 4, ...)
You seem to know about both these features and feel that they don't fit your need - could you explain why?

Why are the Scala libraries implemented with mutable state?

Why are some methods in Scala's standard libraries implemented with mutable state?
For instance, the find method as part of scala.Iterator class is implemented as
def find(p: A => Boolean): Option[A] = {
var res: Option[A] = None
while (res.isEmpty && hasNext) {
val e = next()
if (p(e)) res = Some(e)
}
res
}
Which could have been implemented as a #tailrec'd method, perhaps something like
def findNew(p: A => Boolean): Option[A] = {
#tailrec
def findRec(e: A): Option[A] = {
if (p(e)) Some(e)
else {
if (hasNext) findRec(next())
else None
}
}
if (hasNext) findRec(next())
else None
}
Now I suppose one argument could be the use of mutable state and a while loop could be more efficient, which is understandably very important in library code, but is that really the case over a #tailrec'd method?
There is no harm in having a mutable state as long as he is not shared.
In your example there is no way the mutable var could be accessed from outside, so it's not possible that this mutable variable change due to a side effect.
It's always good to enforce immutability as much as possible, but when performance matter there is nothing wrong in having some mutability as long as it's constrained in a safe way.
NOTE: Iterator is a data-structure which is not side-effect free and this could lead to some weird behavior, but this is an other story and in no way the reason for designing a method in such way. You'll find method like that in immutable data-structure too.
In this case the tailrec quite possibly has the same performance as the while loop. I would say that in this case the while loop solution is shorter and more concise.
But, iterators are a mutable abstraction anyway, so the gain of having a tail recursive method to avoid that var, which is local to that short code snippet, is questionable.
Scala is not designed for functional purity but for broadly useful capability. Part of this includes trying to have the most efficient implementations of basic library routines (certainly not universally true, but it often is).
As such, if you have two possible interfaces:
trait Iterator[A] { def next: A }
trait FunctionalIterator[A] { def next: (A, FunctionalIterator[A]) }
and the second one is awkward and slower, it's quite sensible to choose the first.
When a functionally pure implementation is superior for the bulk of use cases, you'll typically find the functionally pure one.
And when it comes to simply using a while loop vs. recursion, either one is easy enough to maintain so it's really up to the preferences of the coder. Note that find would have to be marked final in the tailrec case, so while preserves more flexibility:
trait Foo {
def next: Int
def foo: Int = {
var a = next
while (a < 0) a = next
a
}
}
defined trait Foo
trait Bar {
def next: Int
#tailrec def bar: Int = {
val a = next
if (a < 0) bar else a
}
}
<console>:10: error: could not optimize #tailrec annotated method bar:
it is neither private nor final so can be overridden
#tailrec def bar: Int = {
^
There are ways to get around this (nested methods, final, redirect to private method, etc.), but it tends to adds boilerplate to the point where the while is syntactically more compact.