I am not sure I use @volatile correctly here. I have a buffer, like this:
final class BufD(val buf: Array[Double], @volatile var size: Int)
Which is sent between processes, whereby it might cross thread boundaries. The sender may update the size field just before sending it out. Therefore I want to make sure that the receiver under no circumstances can see a stale size value here. First question: Does @volatile ensure this or is it redundant?
Now I am introducing a trait:
trait BufLike {
  @volatile var size: Int
}
final class BufD(val buf: Array[Double], @volatile var size: Int) extends BufLike
This gives me a compiler warning:
Warning:(6, 4) no valid targets for annotation on method size - it is discarded unused. You may specify targets with meta-annotations, e.g. @(volatile @getter)
  @volatile var size: Int
   ^
Second question: Should I remove the @volatile here or change it in a different way?
I assume thread-A creates and updates object-X, then passes it to thread-B. If object-X and whatever it refers to, directly or transitively through its fields, is not updated further by thread-A, then volatile is redundant: the consistency of object-X's state on the receiving thread is guaranteed by the JVM.
In other words, if logical ownership of object-X is passed from thread-A to thread-B, then volatile doesn't make sense. Conversely, on modern multicore systems, the performance cost of volatile can exceed that of the thread-local garbage left behind by immutable case classes.
If object-X is supposed to be shared for writing, making a field volatile will help to share its value, but you will face another problem: non-atomic updates to object-X if the fields' values depend on each other.
As @alf pointed out, to benefit from the happens-before guarantees, the objects must be passed safely! This can be achieved using the java.util.concurrent.* classes. High-level constructs like Akka define their own mechanisms for passing objects safely.
References:
https://docs.oracle.com/javase/tutorial/essential/concurrency/immutable.html
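As a rough sketch of what "passing safely" with plain java.util.concurrent can look like (the Handoff object, the newSize parameter, and the queue choice are only illustrative, not from the question): a blocking queue's put/take pair already establishes the happens-before edge, so the size written before put is visible after take without any @volatile.

import java.util.concurrent.LinkedBlockingQueue

final class BufD(val buf: Array[Double], var size: Int)

object Handoff {
  private val queue = new LinkedBlockingQueue[BufD]()

  // Sender thread: the write to size happens-before the put.
  def send(b: BufD, newSize: Int): Unit = {
    b.size = newSize
    queue.put(b)
  }

  // Receiver thread: take() happens-before any read of b.size here,
  // so the receiver cannot observe a stale size.
  def receive(): BufD = queue.take()
}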
As @tair points out, the real solution to your problem is to use an immutable case class:
The sender may update the size field just before sending it out.
It seems that the receiver does not update the size; neither does the sender update the size after it has already sent the BufD out. So for all practical purposes, the recipient is much better off receiving an immutable object.
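A minimal sketch of that suggestion, switching the backing buffer to an immutable Vector purely for illustration:

// Both fields are now immutable, so there is no size write that could ever be stale.
final case class BufD(buf: Vector[Double], size: Int)

// The sender derives a new value instead of mutating one in place, e.g.:
// val toSend = current.copy(size = newSize)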
As for @volatile, it ensures visibility: writes indeed hit main memory instead of being cached in a thread-local cache, and reads include a memory barrier to ensure that the value read is not stale.
Without @volatile, the recipient thread is free to cache the value (it's not volatile, hence it should not be changed from the other thread, hence it's safe to cache) and re-use it instead of referring to main memory. (SLS 11.2.1, JLS §8.3.1.4)
@volatile Marks a field which can change its value outside the control of the program; this is equivalent to the volatile modifier in Java.
and
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
The problem here is that either you don't need all that, because the object is effectively immutable (and you're better off with a properly immutable one), or you want to see coordinated changes to buf and size on the recipient side. In the latter case, @volatile may be useful (while fragile): if the writer appends to buf (and never overwrites!) and then updates size, the write to buf happens-before the write to size, which in turn happens-before the reader can read the updated value from size (by volatility). Therefore, if the reader checks and re-checks size, and the writer only appends, you're probably fine. Having said that, I would not use this design.
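For what it's worth, the fragile pattern described above would look roughly like the sketch below (single writer, append-only, capacity checks omitted; the AppendOnly object and its methods are made up for illustration). As the answer says, it is not a recommended design.

final class BufD(val buf: Array[Double], @volatile var size: Int)

object AppendOnly {
  // Writer thread: fill the new slot first, then publish it by writing the
  // volatile size last (the plain array write happens-before the volatile write).
  def append(b: BufD, x: Double): Unit = {
    b.buf(b.size) = x
    b.size += 1
  }

  // Reader thread: read the volatile size once, then only touch slots below it.
  def sum(b: BufD): Double = {
    val n = b.size
    (0 until n).map(i => b.buf(i)).sum
  }
}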
As for the warning: it all compiles to JVM bytecode, and volatile is a JVM flag on fields. Traits cannot define a field; they only define methods, and it's up to the extending class to decide whether it'll be a proper variable or a pair of methods (SLS 4.2).
A variable declaration var x: T is equivalent to the declarations of both a getter function x and a setter function x_=:
def x: T
def x_= (y: T): Unit
A function cannot be @volatile, hence the warning.
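One way around the warning, sketched under the assumption that you control both the trait and the class, is to declare only the abstract accessors in the trait and keep @volatile on the concrete field:

trait BufLike {
  def size: Int                 // abstract getter
  def size_=(value: Int): Unit  // abstract setter
}

// The concrete var implements both abstract methods, and @volatile
// applies to the actual field backing it.
final class BufD(val buf: Array[Double], @volatile var size: Int) extends BufLike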
Related
I need to cache something in Scala in a multi-threaded environment.
Reading up on scalaz's Memo I found the following comment in the code for the immutable hash map memo:
As this memo uses a single var, it's thread-safe.
The code looks like this:
def immutableMapMemo[K, V](m: Map[K, V]): Memo[K, V] = {
  var a = m
  memo[K, V](f =>
    k => {
      a get k getOrElse {
        val v = f(k)
        a = a updated (k, v)
        v
      }
    })
}
Saying that this is thread-safe goes against what I have read and learned so far about thread safety on the JVM platform; reference updates may be atomic, but as I understand it, the compiler may perform certain optimisations that upset the happens-before relationship if you don't have a memory barrier. See for instance this post and this.
But I'm sure the scalaz folks are pretty smart. Maybe there's something special about the scope of a.
Is what the comment claims true, and if so, why?
First of all, since the var is not marked @volatile, you might see different versions of a in different threads. So you might do a calculation multiple times on different threads. This kind of defeats the purpose of memoization, but other than that it does not cause any harm, provided that the function being memoized is free of side effects.
Also, on the x86 architecture you will almost always see changes made on one thread from all other threads.
Regarding internal consistency of the map: as far as I know, in this case it is not possible to observe the map stored in a in an inconsistent state, because Map is not just observably immutable, but all implementations of Map (Map1, Map2, Map3, Map4, HashMap1, HashTrieMap, HashMapCollision1, EmptyMap) have only final fields and are therefore safe according to the Java memory model. However, relying on this is extremely fragile.
For example, if a contained a List or a Vector, you would be able to observe it in an inconsistent state when quickly updating it from different threads. The reason for this is that these data structures are observably immutable, but they do use mutable state internally for performance optimization.
So bottom line: don't rely on this for memoization in a multithreaded context.
See this thread on scala-user for a discussion of a very similar problem.
See this thread for why even basic observably immutable data structures such as List and Vector can be observed in an inconsistent state unless you use safe publication via @volatile or another safe mechanism such as actors.
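If you do need memoization that is safe to share between threads, one alternative sketch (not the scalaz code; the object and method names here are mine) is to back it with scala.collection.concurrent.TrieMap:

import scala.collection.concurrent.TrieMap

object SafeMemo {
  // TrieMap publishes its updates safely across threads, so readers never
  // observe an inconsistent map. Note that f may still be evaluated more
  // than once for the same key under contention.
  def concurrentMemo[K, V](f: K => V): K => V = {
    val cache = TrieMap.empty[K, V]
    k => cache.getOrElseUpdate(k, f(k))
  }
}

// usage: val square = SafeMemo.concurrentMemo[Int, Int](x => x * x)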
object Main extends App {
  var a = new AnyRef()
  println(a.hashCode)
}
I have this code in IntelliJ IDEA. I noticed that the hash code does not change between reruns. Even more, it doesn't change if I restart the IDE or make some light modifications to the code. I can rename the variable a or add a few more variables and I still get the same hash code.
Is it cached somewhere? Or is it just the OS allocating the same address to the variable? Are there any consequences of this?
I'd expect it to be new each time, as the OS should allocate a new address on each run.
The implementation for Object.hashCode() can vary between JVMs as long as it obeys the contract, which doesn't require the numbers to be different between runs. For HotSpot there is even an option (-XX:hashCode) to change the implementation.
HotSpot's default is to use a random number generator, so if you are using that (with no -XX:hashCode option) then it seems it uses the same seed on each run, resulting in the same sequence of hash codes. There's nothing wrong with that.
lmm's answer is not correct, unless perhaps you are using HotSpot with -XX:hashCode=4 or another JVM that uses this technique by default. But I'm not at all certain about that (you can try it yourself by using HotSpot with -XX:hashCode=4 and see if you get another value which also stays the same between runs).
Check out the code for the different options:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/runtime/synchronizer.cpp#l555
There is a comment in there about making the "else" branch the default; that branch is the Xorshift scheme, which is indeed a pseudo-random number generator that will always provide the same sequence.
The answer from "apangin" on this question says that this has indeed become the default since JDK 8, which explains the change from JDK 7 you described in your comment.
I can confirm that this is correct, look at the JDK8 source:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/globals.hpp#l1127
--> Default value is now 5, which corresponds to the "else" branch (Xorshift).
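If you want to see the difference yourself, a small experiment along these lines should work (a sketch only; -XX:hashCode is an unsupported HotSpot option, tested here against a JDK 8 HotSpot, and the HashCodes name is mine):

object HashCodes extends App {
  // Prints a few identity hash codes; compare the output across runs.
  (1 to 3).foreach(_ => println(new AnyRef().hashCode))
}

Compile it with scalac, then compare scala HashCodes (default strategy 5, Xorshift) with scala -J-XX:hashCode=4 HashCodes (the address-based strategy discussed above, which is the behaviour lmm's answer assumes).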
Some experiment:
scala> class A extends AnyRef
defined class A
scala> val a1= new A
a1: A = A@5f6b1f19
scala> val a2 = new A
a2: A = A@d60aa4
scala> a1.hashCode
res19: Int = 1600855833
scala> a2.hashCode
res20: Int = 14027428
scala> val a3 = new AnyRef
a3: Object = java.lang.Object@16c3388e
scala> a3.hashCode
res21: Int = 381892750
So, it seems obvious that the AnyRef hash code is equal to the address of the object. If we get equal hashes, it means the object's address is the same on every rerun. And that is true for me across two REPLs.
The API documentation says this about the AnyRef hashCode method:
The hashCode method for reference types. See hashCode in scala.Any.
And about Any method:
Calculate a hash code value for the object.
The default hashing algorithm is platform dependent.
I guess that the platform determines the location of the object and therefore the value of hashCode.
Any new process gets its own virtual address space from the OS. So while the process might exist at a different physical address each time the program runs, it will be mapped to the same virtual address each time. (ASLR exists, but I understand the JVM doesn't participate in it.) You can see this with, e.g., a small C program containing a string constant (you might have to deliberately disable ASLR for that program): if you take a pointer to the string constant and print that pointer as an integer, it will be the same value every time.
hashCode() is not a random number. It is a digested result from analyzing some part of an object. Objects with the same values will, more than likely, have the same hash code. This is true for your case, since the "value" of an AnyRef with no fields is essentially empty.
I have a DAO object which I defined as a case class.
import java.io.File

case class StudentDAO(id: Int) {
  def getGPA: Double = ???     // Expensive database lookup goes here
  def getRank: Int = ???       // Another expensive database operation and computation goes here
  def getScoreCard: File = ??? // Expensive file lookup goes here
}
I would naturally make getGPA, getRank, and getScoreCard defs and not vals, because I don't want them to be computed before they may be used.
What would be the performance impact if I marked these methods as lazy vals instead of defs? The reason I want to make them lazy vals is: I do not want to recompute the rank each time for a Student with id "i".
I am hoping that this will not be marked as a duplicate, because the questions below are mostly about the differences:
When to use val, def, and lazy val in Scala?
def or val or lazy val for grammar rules?
`def` vs `val` vs `lazy val` evaluation in Scala
Scala Lazy Val Question
This question is mainly aimed at the costs (the trade-off between CPU and memory) of making a method a lazy val for costly operations: which one would you suggest over the other, and why?
EDIT: Thank you for the comment, @om-nom-nom. I should have been clearer about what I was looking for.
I read here:
Use of lazy val for caching string representation
that the string representation of the object is cached (see @Dave Griffith's answer). More precisely, I am looking at the impact on garbage collection if I made these lazy vals instead of defs.
Seems pretty straightforward to me:
I don't want them to be computed before they may be
used.
[...]
I do not want to recompute the rank each time for a Student with id "i".
Then use lazy val and that's it.
def is used when the value may change on each call, typically because you pass parameters; a val won't change, but it will be computed right away.
A lazy val for an "ordinary" reference type (e.g., File) has the effect of creating a strong reference the first time it is evaluated. Thus, while it will avoid re-evaluations of an unchanging value, it has the obvious cost of keeping the computed value in memory.
For primitive values (or even lightweight objects, like File), this memory cost usually isn't much of an issue (unless you're holding lots of Student objects in memory). For a heavy reference, though (e.g., a large data structure), you might be better off using a weak reference, some other caching approach, or just computing the value on-demand.
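For concreteness, a lazy-val version of the DAO from the question might look like the sketch below; the private helper methods are placeholders I made up for the expensive operations.

import java.io.File

case class StudentDAO(id: Int) {
  // Each of these is computed on first access only, then cached for the
  // lifetime of this StudentDAO instance (and kept alive by it).
  lazy val gpa: Double = fetchGPA()
  lazy val rank: Int = computeRank()
  lazy val scoreCard: File = findScoreCard()

  private def fetchGPA(): Double = ???    // expensive database lookup
  private def computeRank(): Int = ???    // expensive database operation and computation
  private def findScoreCard(): File = ??? // expensive file lookup
}

The GC impact is then simply that whatever each lazy val produces lives as long as the StudentDAO instance does, which is usually negligible for a Double, an Int, and a File handle.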
This question is a bit more theoretical.
I have an object which holds a private mutable list or map whose growth is append-only. I believe I could argue that the object itself is functional, being referentially transparent and, to all appearances, immutable. So, for example, I have:
import scala.collection.mutable.Map

case class Foo(bar: String)

object FooContainer {
  private val foos = Map.empty[String, Foo]
  def getFoo(fooName: String) = foos.getOrElseUpdate(fooName, Foo(fooName))
}
I could do something similar with lists. Now, I imagine that to be truly functional I would need to ensure thread safety, which I suppose I could do simply using locks, synchronized blocks, or atomic references. My questions are: is this all necessary and sufficient, is it common practice, and are there established patterns for this behaviour?
I would say that "functional" is really the wrong term. The exact example shown above could be called "pure" in the sense that its output is only ever defined by its input, thus not having any (visible) side-effects. In reality it is impure though, since it has hidden internal state.
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices. Wikipedia
The apparent purity vanishes, though, if you make the Foo class mutable:
case class Foo(bar: String) {
  private var mutable: Int = 1
  def setFoo(x: Int) = mutable = x
  def foo = mutable
}
The mutability of the Foo class results in getFoo being impure:
scala> FooContainer.getFoo("test").foo
res5: Int = 1
scala> FooContainer.getFoo("bla").setFoo(5)
scala> FooContainer.getFoo("bla").foo
res7: Int = 5
In order for the function to be apparently pure, FooContainer.getFoo("bla").foo must always return the same value. Therefore, the "purity" of the described construct is fragile at best.
As a general rule, you're better off with a var holding an immutable collection type which is replaced when updated than you are with a val holding a mutable type. The former never deals in values that may change after being created (other than the class which holds the evolving value itself). Then you only need to be moderately careful about when updated values held by the var are published to it, but since updating a reference is generally atomic (apart from things like storing a long or a double to RAM on a 32-bit machine), there's little risk.
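As an illustration of that rule, here is a small sketch (the EventLog class and its method names are mine) of the "var-like" design made explicit with an AtomicReference, so that every new immutable snapshot is also published safely to other threads:

import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

final class EventLog {
  // One mutable reference, always pointing at an immutable Vector.
  private val events = new AtomicReference(Vector.empty[String])

  @tailrec
  def append(e: String): Unit = {
    val old = events.get()
    // Retry if another thread replaced the snapshot in the meantime.
    if (!events.compareAndSet(old, old :+ e)) append(e)
  }

  def snapshot: Vector[String] = events.get()
}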
Are there any guidelines in Scala on when to use val with a mutable collection versus using var with an immutable collection? Or should you really aim for val with an immutable collection?
The fact that there are both types of collection gives me a lot of choice, and often I don't know how to make that choice.
Pretty common question, this one. The hard thing is finding the duplicates.
You should strive for referential transparency. What that means is that, if I have an expression "e", I could make a val x = e and replace e with x. This is the property that mutability breaks. Whenever you need to make a design decision, maximize for referential transparency.
As a practical matter, a method-local var is the safest var that exists, since it doesn't escape the method. If the method is short, even better. If it isn't, try to reduce it by extracting other methods.
On the other hand, a mutable collection has the potential to escape, even if it doesn't. When changing code, you might then want to pass it to other methods, or return it. That's the kind of thing that breaks referential transparency.
On an object (a field), pretty much the same thing happens, but with more dire consequences. Either way the object will have state and, therefore, break referential transparency. But having a mutable collection means even the object itself might lose control of who's changing it.
If you work with immutable collections and you need to "modify" them, for example, add elements to them in a loop, then you have to use vars because you need to store the resulting collection somewhere. If you only read from immutable collections, then use vals.
In general, make sure that you don't confuse references and objects. vals are immutable references (constant pointers in C). That is, when you use val x = new MutableFoo(), you'll be able to change the object that x points to, but you won't be able to change to which object x points. The opposite holds if you use var x = new ImmutableFoo(). Picking up my initial advice: if you don't need to change to which object a reference points, use vals.
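A quick sketch of that reference-versus-object distinction (the identifiers are mine, not from the question):

import scala.collection.mutable

object RefVsObject extends App {
  val buf = mutable.ListBuffer(1, 2, 3)
  buf += 4                      // fine: the object is mutated, the reference stays put
  // buf = mutable.ListBuffer() // won't compile: a val cannot be reassigned

  var xs = List(1, 2, 3)
  xs = 0 :: xs                  // fine: the reference now points at a new immutable list
  // xs(0) = 9                  // won't compile: an immutable List has no update method

  println(buf)                  // ListBuffer(1, 2, 3, 4)
  println(xs)                   // List(0, 1, 2, 3)
}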
The best way to answer this is with an example. Suppose we have some process simply collecting numbers for some reason. We wish to log these numbers, and will send the collection to another process to do this.
Of course, we are still collecting numbers after we send the collection to the logger. And let's say there is some overhead in the logging process that delays the actual logging. Hopefully you can see where this is going.
If we store this collection in a mutable val (mutable because we are continuously adding to it), this means that the process doing the logging will be looking at the same object that's still being updated by our collection process. That collection may be updated at any time, so when it's time to log we may not actually be logging the collection we sent.
If we use an immutable var, we send an immutable data structure to the logger. When we add more numbers to our collection, we replace our var with a new immutable data structure. This doesn't mean the collection sent to the logger is replaced! It's still referencing the collection it was sent. So our logger will indeed log the collection it received.
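A rough sketch of the two designs described above (the object and method names are made up for illustration):

import scala.collection.mutable

object MutableValCollector {
  val numbers = mutable.ArrayBuffer(1, 2, 3)
  // The logger receives the live buffer, so later additions show up in the log.
  def handToLogger(): mutable.ArrayBuffer[Int] = numbers
}

object ImmutableVarCollector {
  var numbers = Vector(1, 2, 3)
  // The logger receives the current snapshot; later additions replace the
  // var with a new Vector and leave the handed-out snapshot untouched.
  def handToLogger(): Vector[Int] = numbers
  def collect(n: Int): Unit = numbers = numbers :+ n
}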
I think the examples in this blog post will shed more light, as the question of which combo to use becomes even more important in concurrency scenarios: importance of immutability for concurrency. And while we're at it, note the preferred use of synchronised vs @volatile vs something like AtomicReference: three tools
var immutable vs. val mutable
In addition to the many excellent answers to this question, here is a simple example that illustrates the potential dangers of val mutable:
Mutable objects can be modified inside methods that take them as parameters, while reassignment is not allowed.
import scala.collection.mutable.ArrayBuffer

object MyObject {
  def main(args: Array[String]): Unit = {
    val a = ArrayBuffer(1, 2, 3, 4)
    silly(a)
    println(a) // a has been modified here
  }

  def silly(a: ArrayBuffer[Int]): Unit = {
    a += 10
    println(s"length: ${a.length}")
  }
}
Result:
length: 5
ArrayBuffer(1, 2, 3, 4, 10)
Something like this cannot happen with var immutable, because reassignment is not allowed:
object MyObject {
  def main(args: Array[String]): Unit = {
    var v = Vector(1, 2, 3, 4)
    silly(v)
    println(v)
  }

  def silly(v: Vector[Int]): Unit = {
    v = v :+ 10 // This line is not valid
    println(s"length of v: ${v.length}")
  }
}
Results in:
error: reassignment to val
Since function parameters are treated as vals, this reassignment is not allowed.