How are values transferred across threads in Scala futures?

How is num accessed by the new thread? The Future will execute on a different thread, so how is the value of num, which sits in the stack frame of the main thread, accessible to the new thread? Where is it stored?
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object TestFutureActor extends App {
  var num = 10
  val addNum = Future {
    num = num + 2
    num
  }
}

This is in general a question about the JVM memory model. The value of your variable can really be read from different places: it might be in a processor cache or in main memory, for example. Parallel reads and writes are unsynchronized in general. For example, the behaviour below is absolutely legal for the JVM.
var a, b = 0

Thread 1          Thread 2
a = 1             val c = b // c can be 2
b = 2             val d = a // while d is still 0
If you need to synchronize actions on a variable between threads, you need a happens-before relationship between them.
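For example, here is a minimal sketch based on the code from the question (the object name and the timeout are mine): completing a Future happens-before observing its result through Await.result or a callback, which is one standard way to get that guarantee.

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TestFutureAwait extends App {
  var num = 10

  val addNum = Future {
    num = num + 2 // the write happens on a pool thread
    num
  }

  // The completion of the Future happens-before the observation of its result,
  // so the value read here is guaranteed to be the updated one.
  println(Await.result(addNum, 1.second)) // 12
}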
It's a very deep question; I think you should read some articles about the Java memory model for a deeper understanding of this:
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
https://shipilev.net/blog/2014/jmm-pragmatics/

Related

Dot product in Scala without heap allocation

I have a Scala project with some intensive arithmetic, and it sometimes allocates Floats faster than the GC can clean them up. (This is not about memory leaks caused by retained references, just fast memory consumption for temporary values.) I try to use Arrays of primitive types, and reuse them when I can, but some new allocations still sneak in.
One piece that puzzles me, for instance:
import org.specs2.mutable.Specification

class CalcTest extends Specification {

  def dot(a: Array[Float], b: Array[Float]): Float = {
    require(a.length == b.length, "array size mismatch")
    val n = a.length
    var sum: Float = 0f
    var i = 0
    while (i < n) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  val vector = Array.tabulate(1000)(_.toFloat)

  "calculation" should {
    "use memory sparingly" >> {
      val before = Runtime.getRuntime().freeMemory()
      for (i <- 0 to 1000000)
        dot(vector, vector)
      val after = Runtime.getRuntime().freeMemory()
      (before - after) must be_<(1000L) // actual result above 4M
    }
  }
}
I would expect it to compute the dot products using only stack memory, but apparently it allocates about 4 bytes per call on the heap. This may not sound like much, but it adds up quickly in my code.
I suspected the sum variable, but from the bytecode output it looks like it stays on the stack:
aload 1
arraylength
istore 3
fconst_0
fstore 4
iconst_0
istore 5
l2
iload 5
iload 3
if_icmpge l3
fload 4
aload 1
iload 5
faload
aload 2
iload 5
faload
fmul
fadd
fstore 4
iload 5
iconst_1
iadd
istore 5
_goto l2
l3
fload 4
freturn
Is it the return value that goes on the heap? Is there any way to avoid this overhead entirely? Is there a better way to investigate and solve such memory problems?
From the VisualVM output for my project, I only see that I have an awful lot of Floats allocated. It is hard to track small objects like that there when they are allocated rapidly; it is more useful for large objects and for memory snapshots taken at long intervals.
Update:
I was so focused on the function code, I missed the problem in the test. If I rewrite it with a while loop, it succeeds:
var i = 0
while (i < 1000000) {
  dot(vector, vector)
  i += 1
}
I would still appreciate more ideas for other ways to debug this sort of issue, in addition to tests like this and VisualVM memory snapshots.
The Range implementation behind

for (i <- 0 to 1000000)
  dot(vector, vector)

might use some memory itself, or simply be slow enough to let the JVM allocate something else in the background and break the fragile measurement method used in the test.
Try rewriting these lines as a while loop, for example.
(The original version of this post said that for() was equivalent to map(), which was wrong. It is equivalent to foreach() here because it does not have a yield clause.)
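To make the difference concrete, here is a sketch of roughly what the compiler turns the for comprehension into (the exact shape depends on the Scala version): a foreach call on a Range with a closure, which is one place extra work or allocation can come from, whereas the hand-written while loop from the update is just a plain counter.

(0 to 1000000).foreach { i =>
  dot(vector, vector) // the Float result is discarded inside the closure
}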

Different result returned using Scala Collection par in a series of runs

I have tasks that I want to execute concurrently, and each task takes a substantial amount of memory, so I have to execute them in batches of 2 to conserve memory.
def runme(n: Int = 120) = (1 to n).grouped(2).toList.flatMap { tuple =>
  tuple.par.map { x =>
    println(s"Running $x")
    val s = (1 to 100000).toList // intentionally to make the JVM allocate a sizeable chunk of memory
    s.sum.toLong
  }
}
val result = runme()
println(result.size + " => " + result.sum)
The result I expected was 120 => 84609924480, but the output was rather random. The returned collection size differed from execution to execution; most of the time some elements were missing even though, judging from the console, all the tasks were executed. I thought flatMap would wait for the parallel executions in map to complete before returning. What should I do to always get the right result using par? Thanks.
Just for the record: changing the underlying collection in this case shouldn't change the output of your program. The problem is related to this known bug. It is fixed as of 2.11.6, so if you use that Scala version (or higher) you should not see the strange behavior.
As for the overflow, I still think that your expected value is wrong. The sum overflows because the list contains Ints (which are 32-bit) while the true total exceeds the Int range. You can check it with the following snippet:
val n = 100000
val s = (1 to n).toList                  // your original code
val yourValue = s.sum.toLong             // your original code
val correctValue = 1L * n * (n + 1) / 2  // use the math formula
var bruteForceValue = 0L                 // in case you don't trust math :) It's a Long because of 0L
for (i ← 1 to n) bruteForceValue += i    // iterate through the range
println(s"yourValue = $yourValue")
println(s"correctvalue = $correctValue")
println(s"bruteForceValue = $bruteForceValue")
which produces the output
yourValue = 705082704
correctvalue = 5000050000
bruteForceValue = 5000050000
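If you also want the per-element sums themselves to be overflow-free, one option (a sketch of mine, not from the original code) is to accumulate in a Long instead of summing the Ints first:

val s = (1 to 100000).toList
val safeSum = s.foldLeft(0L)(_ + _) // accumulates in a Long, giving 5000050000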
Cheers!
Thanks @kaktusito.
It worked after I changed the grouped list to a Vector or Seq, i.e. from (1 to n).grouped(2).toList.flatMap{... to (1 to n).grouped(2).toVector.flatMap{...

What is the difference between Reactive programming and plain old closures?

Example from scala.rx:
import rx._
val a = Var(1); val b = Var(2)
val c = Rx{ a() + b() }
println(c()) // 3
a() = 4
println(c()) // 6
How is the above version better than:
var a = 1; var b = 2
def c = a + b
println(c) // 3
a = 4
println(c) // 6
The only thing I can think of is that the first example is efficient in the sense that c is not recalculated unless a or b changes, whereas in my version c is recomputed every time I invoke it. But that is just a special case of memoization with size = 1; for example, I could prevent the re-computation with a memoization macro:
var a = 1; var b = 2
@memoize(maxSize = 1) def c(x: Int = a, y: Int = b) = x + y
Is there anything that I am missing to grok about reactive programming that provides insight into why it might be a better paradigm (than memoized closures) in certain cases?
Problem: It's a bad example
The example on the web page doesn't illustrate the purpose of Scala.Rx very well. In that sense it is quite a bad example.
What is Scala.Rx for?
It's about notifications
The idea of Scala.Rx is that a piece of code can get notified when data changes. Usually this notification is used to (re-)calculate a result that depends on the changed data.
Scala.Rx automates the wiring
When the calculation goes through multiple stages, it becomes quite hard to track which intermediate result depends on which data and on which other intermediate results. Additionally, one must recalculate the intermediate results in the correct order.
You can think of this just like a big Excel sheet full of formulas that depend on each other. When you change one of the input values, Excel has to figure out which parts of the sheet must be recalculated, and in which order. Once Excel has recalculated all the changed cells, it can update the display.
Scala.Rx does a similar thing to Excel: it tracks how the formulas depend on each other and notifies the ones that need to update, in the correct order.
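To see what that wiring costs when you do it by hand, here is a small plain-Scala sketch (mine, not part of the original answer) of the listener bookkeeping you would otherwise have to maintain yourself for the a/b/c example:

// Manual version of "c depends on a and b": every write has to
// remember to re-run the dependents itself.
class Cell(initial: Int) {
  private var value = initial
  private var listeners = List.empty[() => Unit]
  def apply(): Int = value
  def update(v: Int): Unit = { value = v; listeners.foreach(_.apply()) }
  def onChange(f: () => Unit): Unit = listeners ::= f
}

val a = new Cell(1)
val b = new Cell(2)
var c = a() + b()
a.onChange(() => c = a() + b()) // wiring maintained by hand
b.onChange(() => c = a() + b())
a() = 4
println(c) // 6

Scala.Rx builds this dependency graph for you from the Rx{ ... } expressions, so the wiring cannot be forgotten or run in the wrong order.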
Purpose: MVC
Scala.Rx is a nice tool for implementing the MVC pattern, especially for business applications that you could also imagine modelling in Excel.
There is also a variant that works with Scala.js, i.e. that runs in the browser as part of an HTML page. This can be quite useful if you want to dynamically update parts of an HTML page according to changes on the server or edits by the user.
Limitations
Scala.Rx does not scale well when you have huge amounts of input data, e.g. operations on huge matrices.
A better example
import rx._
import rx.ops._

val a = Var(1); val b = Var(2)
val c: Rx[Int] = Rx { a() + b() }
val o = c.foreach { value =>
  println(s"c has a new value: ${value}")
}
a() = 4
b() = 12
a() = 35
Gives you the following output:
c has a new value: 3
c has a new value: 6
c has a new value: 16
c has a new value: 47
Now imagine that instead of printing the value you refresh controls in a UI, or parts of an HTML page.

Martin Odersky : Working hard to keep it simple

I was watching the talk given by Martin Odersky, which he himself recommended in the Coursera Scala course, and I am quite curious about one aspect of it:
var x = 0
async { x = x + 1 }
async { x = x * 2 }
So I get that it can give 2 if the first statement gets executed first and then the second one:
x = 0;
x = x + 1;
x = x * 2; // this will be x = 2
I get how it can give 1 :
x = 0;
x = x * 2;
x = x + 1 // this will be x = 1
However, how can it result in 0? Is it possible that the statements don't execute at all?
Sorry for such an easy question, but I'm really stuck on it.
You need to think about interleaved execution. Remember that the CPU needs to read the value of x before it can work on it. So imagine the following:
Thread 1 reads x (reading 0)
Thread 2 reads x (reading 0)
Thread 1 writes x + 1 (writing 1)
Thread 2 writes x * 2 (writing 0)
I know this has already been answered, but maybe this is still useful:
Think of it as a sequence of atomic operations. The processor is doing one atomic operation at a time.
Here we have the following:
Read x
Write x
Add 1
Multiply 2
The following two sequences are guaranteed to happen in this order "within themselves":
Read x, Add 1, Write x
Read x, Multiply 2, Write x
However, if you are executing them in parallel, the time of execution of each atomic operation relative to any other atomic operation in the other sequence is random, i.e. these two sequences interleave.
One of the possible orders of execution produces 0; it is the one given in the answer by Paul Butcher.
Here is an illustration I found on the internet: each blue/purple block is one atomic operation, and you can see how you can get different results based on the order of the blocks.
To solve this problem you can use the keyword "synchronized"
My understanding is that if you mark two blocks of code (e.g. two methods) as synchronized on the same object, then each block will hold that object's lock while it executes, so the other block cannot run until the first has finished. However, if the two synchronized blocks lock two different objects, they can execute in parallel.
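As a small sketch of both points (mine, not from the talk; async is replaced with plain Futures so the snippet is runnable as-is):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RaceExample extends App {
  // Unsynchronized: the read-modify-write steps of the two futures can
  // interleave, so x may end up as 0, 1 or 2 depending on the schedule.
  var x = 0
  val f1 = Future { x = x + 1 }
  val f2 = Future { x = x * 2 }
  Await.ready(Future.sequence(Seq(f1, f2)), 1.second)

  // Synchronized on the same lock: the two updates can no longer
  // interleave, so y ends up as 1 or 2 (depending on order), never 0.
  var y = 0
  val lock = new Object
  val g1 = Future { lock.synchronized { y = y + 1 } }
  val g2 = Future { lock.synchronized { y = y * 2 } }
  Await.ready(Future.sequence(Seq(g1, g2)), 1.second)

  println(s"x = $x, y = $y")
}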

How instruction interleaving is done?

The example of a race condition as given in Operating System Concepts is:
count++ could be implemented as
register1 = count
register1 = register1 + 1
count = register1
count-- could be implemented as
register2 = count
register2 = register2 - 1
count = register2
Consider this execution interleaving:
s0: producer executes register1 = count
s1: producer executes register1 = register1 + 1
s2: consumer executes register2 = count
s3: consumer executes register2 = register2 - 1
s4: producer executes count = register1
s5: consumer executes count = register2
How is the interleaving of instructions decided? Is it random, or is some algorithm used for it? And who decides it?
In this case it likely refers to the way the two scheduled entities are given control of the processor, so the scheduler decides.
You can think of it as being random. The example is an extremely simplified explanation used just to illustrate the concept; there is really much more going on than that.
Have a look at this answer: Usage of registers by the compiler in multithreaded program
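For illustration, here is a small sketch (mine, not from the book) that reproduces the producer/consumer interleaving in Scala; the sleeps are only there to widen the read-modify-write window so the bad schedule is easy to hit:

object InterleavingDemo extends App {
  var count = 0

  def thread(body: => Unit): Thread =
    new Thread(new Runnable { def run(): Unit = body })

  // "count++" as three steps: read, add, write back
  val producer = thread {
    val register1 = count
    Thread.sleep(1) // widen the window between read and write
    count = register1 + 1
  }

  // "count--" as three steps: read, subtract, write back
  val consumer = thread {
    val register2 = count
    Thread.sleep(1)
    count = register2 - 1
  }

  producer.start(); consumer.start()
  producer.join(); consumer.join()

  // With the sleeps both threads read 0, so the last write wins:
  // the result is +1 or -1 instead of the expected 0.
  println(s"count = $count")
}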