How does Scala manage to avoid racing in reactive programming? - scala

As I have realized up until right now the basic idea for functional reactive programming in Scala is to define a signal that extends the DynamicVariable class of Scala but I could not understand something written in explanation of that class it said:
DynamicVariables provide a binding mechanism where the current value is found through dynamic scope, but where access to the variable itself is resolved through static scope.
If I am not wrong, the dynamic scope is when a called function sees a variable from the scope of the caller program and the static scope is when it sees a variable from its own scope like the pseudo-code that follows:
let x = 1
func dynamic (y: Int) = y + x
func static (w: Int) = y + x
func main() = {
let x = 2
dynamic (3) //returns 5
static (3) //returns 4
}
So the question is how is the meaning of accessing the variable itself, and if it implies writing to it, how Scala prevent racing when some functions have each a copy and want to write to the variable?

In the following, dv1 is defined in the normal fashion, and is only accessible through the usual scope rules. The function checkDv1 sees dv1 because dv1 is defined in its scope.
val dv1: DynamicVariable[Int] = new DynamicVariable(42)
def checkDv1 = println( dv1.value )
However, when checkDv1 is invoked inside the withValue() dynamic scope, the value it gets back is different, as in it will be the newly bound value.
def testDv = dv1.withValue(41) {
checkDv1
}
checkDv1
testDv
checkDv1
So, the output from those three function invocations should be:
42
41
42
as the dynamic scope changes.
For your other question, a DynamicVariable has a binding within the context of a thread. When a new thread is created, the current binding is copied to the new thread, and they have no further interaction with each other. Hence, no race conditions.
DynamicVariable has very little to do with reactive programming, except that its behavior is well defined in a multi-threaded environment.

Related

Does method parameter trigger serialization in Spark?

I read the Spark programming guide about passing functions and wonder what happens when the function reference the outer method parameter/local variable.
For example, I have this object
object Main {
def main(args: Array[String]): Unit = {
val ds: Dataset[String] = ???
ds.map(_ + args(0))
}
}
Does Spark have to serialize Main? What if args is a local variable inside main?
No, in both these cases Spark won't serialize the Main object. Method arguments and local variables (which are pretty much the same thing from the semantics perspective) do not "belong" to the enclosing object or class, they are associated with a particular method invocation and can therefore be captured by the closure directly.
As a general rule, if you need to have a reference to some object in order to access some value, then this reference will be captured and, therefore, serialized:
class Application(n: Int) {
val x = "internal state " + n
def doSomething(ds: Dataset[String], param: String): Unit = {
ds.map(_ + x + param)
}
}
Note that here in order to access x, which is an instance member, you necessarily have to have the enclosing instance to be available, because it depends on the actual parameters that the instance was constructed with. Another way to see it is to remember that when you use x in the example above, it is actually a shortcut for this.x:
ds.map(_ + this.x + param)
Compared to this, the param value has no such dependency - it is passed as is to the method, and there is no need to access any other enclosing object to use it. Therefore, param will be captured and serialized directly.
That's why there is that advice to put instance members to local variables in order not to capture the entire object: when you put a value to a local variable, it no longer requires accessing the enclosing instance:
val localX = this.x
ds.map(_ + localX + param)
Of course, if you have references to the enclosing instance inside the object that you want to capture, like here:
class Inner(app: Application)
class Application {
val x = new Inner(this)
def doSomething(ds: Dataset[String]): Unit = {
val localX = x
ds.map(_ + localX.toString)
}
}
then storing it to a local variable would not help, because Spark would still need to serialize the app field of the Inner class, which points back to the Application instance. That's why you have to be careful if you have complex object graphs that you use in Spark methods which are going to be sent to executors.

Scala macros - storing global state?

Is it possible for a macro implementation to maintain some form of global state (during the entire compilation run)? Specifically, I want to create a separate instance of IMain, but I don't want to create it anew in every macro expansion, so I would like to have a form of lazy val, ThreadLocal or anything where I can cache that instance. For simplicity, just imagine I want to share an object during compilation between all expansions of the same macro:
object Foo {
def next: Int = macro ???
}
trait Test {
val a = Foo.next
val b = Foo.next
val c = Foo.next
assert(a == 1 && b == 2 && c == 3)
}
Since in the actual case, the state is quite complex and not serializable, reading and writing to disk is not an option.
I can't seem to see any way to achieve that through the only context provided, scala.reflect.macros.blackbox.Context. Does that mean I have to write a full-fledged compiler plugin? Can I trick sbt into giving me some object I can write to?
Use an object and a var. As long as the file it's in doesn't use your macros, it should work. I'm not sure if it's ever guaranteed anywhere that scalac will keep the state of macros between units, but this seems to work for you.
object Foo {
def next: Int = macro next_impl
var state = ???;
def next_impl(...): ...
}

Why is it allowed to put methods inside blocks, and statements inside objects in Scala?

I'm learning Scala and I don't really understand the following example :
object Test extends App {
def method1 = println("method 1")
val x = {
def method2 = "method 2" // method inside a block
"this is " + method2
}
method1 // statement inside an object
println(x) // same
}
I mean, it feels inconsistent to me because here I see two different concepts :
Objects/Classes/Traits, which contains members.
Blocks, which contains statements, the last statement being the value of the block.
But here we have a method part of a block, and statements part of an object. So, does it mean that blocks are objects too ? And how are handled the statements part of an object, are they members too ?
Thanks.
Does it mean that blocks are objects too?
No, blocks are not objects. Blocks are used for scoping the binding of variables. Scala enables not only defining expressions inside blocks but also to define methods. If we take your example and compile it, we can see what the compiler does:
object Test extends Object {
def method1(): Unit = scala.Predef.println("method 1");
private[this] val x: String = _;
<stable> <accessor> def x(): String = Test.this.x;
final <static> private[this] def method2$1(): String = "method 2";
def <init>(): tests.Test.type = {
Test.super.<init>();
Test.this.x = {
"this is ".+(Test.this.method2$1())
};
Test.this.method1();
scala.Predef.println(Test.this.x());
()
}
}
What the compiler did is extract method2 to an "unnamed" method on method2$1 and scoped it to private[this] which is scoped to the current instance of the type.
And how are handled the statements part of an object, are they members
too?
The compiler took method1 and println and calls them inside the constructor when the type is initialized. So you can see val x and the rest of the method calls are invoked at construction time.
method2 is actually not a method. It is a local function. Scala allows you to create named functions inside local scopes for organizing your code into functions without polluting the namespace.
It is most often used to define local tail-recursive helper functions. Often, when making a function tail-recursive, you need to add an additional parameter to carry the "state" on the call stack, but this additional parameter is a private internal implementation detail and shouldn't be exposed to clients. In languages without local functions, you would make this a private helper alongside the primary method, but then it would still be within the namespace of the class and callable by all other methods of the class, when it is really only useful for that particular method. So, in Scala, you can instead define it locally inside the method:
// non tail-recursive
def length[A](ls: List[A]) = ls match {
case Nil => 0
case x :: xs => length(xs) + 1
}
//transformation to tail-recursive, Java-style:
def length[A](ls: List[A]) = lengthRec(ls, 0)
private def lengthRec[A](ls: List[A], len: Int) = ls match {
case Nil => len
case x :: xs => lengthRec(xs, len + 1)
}
//tail-recursive, Scala-style:
def length[A](ls: List[A]) = {
//note: lengthRec is nested and thus can access `ls`, there is no need to pass it
def lengthRec(len: Int) = ls match {
case Nil => len
case x :: xs => lengthRec(xs, len + 1)
}
lengthRec(ls, 0)
}
Now you might say, well I see the value in defining local functions inside methods, but what's the value in being able to define local functions in blocks? Scala tries to as simple as possible and have as few corner cases as possible. If you can define local functions inside methods, and local functions inside local functions … then why not simplify that rule and just say that local functions behave just like local fields, you can simply define them in any block scope. Then you don't need different scope rules for local fields and local functions, and you have simplified the language.
The other thing you mentioned, being able to execute code in the bode of a template, that's actually the primary constructor (so to speak … it's technically more like an initializer). Remember: the primary constructor's signature is defined with parentheses after the class name … but where would you put the code for the constructor then? Well, you put it in the body of the class!

How to track nested functions in Scala

I'd like to have some basic knowledge of how deeply my function call is nested. Consider the following:
scala> def decorate(f: => Unit) : Unit = { println("I am decorated") ; f }
decorate: (f: => Unit)Unit
scala> decorate { println("foo") }
I am decorated
foo
scala> decorate { decorate { println("foo") } }
I am decorated
I am decorated
foo
For the last call, I'd like to be able to get the following:
I am decorated 2x
I am decorated 1x
foo
The idea is that the decorate function knows how deeply its nested. Ideas?
Update: As Nikita had thought, my example doesn't represent what I'm really after. The goal is not to produce the strings so much as to be able to pass some state through a series of calls to the same nested function. I think Régis Jean-Gilles is pointing me in the right direction.
You can use the dynamic scope pattern. More prosaically this means using a thread local variable (scala's DynamicVariable is done just for that) to store the current nesting level. See my answer to this other question for a partical example of this pattern: How to define a function that takes a function literal (with an implicit parameter) as an argument?
This is suitable only if you want to know the nesting level for a very specific method though. If you want a generic mecanism that works for any method then this won't work (as you'd need a distinct variable for each method). In this case the only alternative I can think of is to inspect the stack, but not only is it not very reliable, it is also extremely slow.
UPDATE: actually, there is a way to apply the dynamic scope pattern in a generic way (for any possible method). The important part is to be able to implicitly get a unique id for each method. from there, it is just a matter of using this id as a key to associate a DynamicVariable to the method:
import scala.util.DynamicVariable
object FunctionNestingHelper {
private type FunctionId = Class[_]
private def getFunctionId( f: Function1[_,_] ): FunctionId = {
f.getClass // That's it! Beware, implementation dependant.
}
private val currentNestings = new DynamicVariable( Map.empty[FunctionId, Int] )
def withFunctionNesting[T]( body: Int => T ): T = {
val id = getFunctionId( body )
val oldNestings = currentNestings.value
val oldNesting = oldNestings.getOrElse( id, 0 )
val newNesting = oldNesting + 1
currentNestings.withValue( oldNestings + ( id -> newNesting) ) {
body( newNesting )
}
}
}
Usage:
import FunctionNestingHelper._
def decorate(f: => Unit) = withFunctionNesting { nesting: Int =>
println("I am decorated " + nesting + "x") ; f
}
To get a unique id for the method, I actually get an id for a the closure passed to withFunctionNesting (which you must call in the method where you need to retrieve the current nesting). And that's where I err on the implementation dependant side: the id is just the class of the function instance. This does work as expected as of now (because every unary function literal is implemented as exactly one class implementing Function1 so the class acts as a unique id), but the reality is that it might well break (although unlikely) in a future version of scala. So use it at your own risk.
Finally, I suggest that you first evaluate seriously if Nikita Volkov's suggestion of going more functional would not be a better solution overall.
You could return a number from the function and count how many levels you are in on the way back up the stack. But there is no easy way to count on the way down like you have given example output for.
Since your question is tagged with "functional programming" following are functional solutions. Sure the program logic changes completely, but then your example code was imperative.
The basic principle of functional programming is that there is no state. What you're used to have as a shared state in imperative programming with all the headache involved (multithreading issues and etc.) - it is all achieved by passing immutable data as arguments in functional programming.
So, assuming the "state" data you wanted to pass was the current cycle number, here's how you'd implement a function using recursion:
def decorated ( a : String, cycle : Int ) : String
= if( cycle <= 0 ) a
else "I am decorated " + cycle + "x\n" + decorated(a, cycle - 1)
println(decorated("foo", 3))
Alternatively you could make your worker function non-recursive and "fold" it:
def decorated ( a : String, times : Int )
= "I am decorated " + times + "x\n" + a
println( (1 to 3).foldLeft("foo")(decorated) )
Both codes above will produce the following output:
I am decorated 3x
I am decorated 2x
I am decorated 1x
foo

What is the difference between a var and val definition in Scala?

What is the difference between a var and val definition in Scala and why does the language need both? Why would you choose a val over a var and vice versa?
As so many others have said, the object assigned to a val cannot be replaced, and the object assigned to a var can. However, said object can have its internal state modified. For example:
class A(n: Int) {
var value = n
}
class B(n: Int) {
val value = new A(n)
}
object Test {
def main(args: Array[String]) {
val x = new B(5)
x = new B(6) // Doesn't work, because I can't replace the object created on the line above with this new one.
x.value = new A(6) // Doesn't work, because I can't replace the object assigned to B.value for a new one.
x.value.value = 6 // Works, because A.value can receive a new object.
}
}
So, even though we can't change the object assigned to x, we could change the state of that object. At the root of it, however, there was a var.
Now, immutability is a good thing for many reasons. First, if an object doesn't change internal state, you don't have to worry if some other part of your code is changing it. For example:
x = new B(0)
f(x)
if (x.value.value == 0)
println("f didn't do anything to x")
else
println("f did something to x")
This becomes particularly important with multithreaded systems. In a multithreaded system, the following can happen:
x = new B(1)
f(x)
if (x.value.value == 1) {
print(x.value.value) // Can be different than 1!
}
If you use val exclusively, and only use immutable data structures (that is, avoid arrays, everything in scala.collection.mutable, etc.), you can rest assured this won't happen. That is, unless there's some code, perhaps even a framework, doing reflection tricks -- reflection can change "immutable" values, unfortunately.
That's one reason, but there is another reason for it. When you use var, you can be tempted into reusing the same var for multiple purposes. This has some problems:
It will be more difficult for people reading the code to know what is the value of a variable in a certain part of the code.
You may forget to re-initialize the variable in some code path, and end up passing wrong values downstream in the code.
Simply put, using val is safer and leads to more readable code.
We can, then, go the other direction. If val is that better, why have var at all? Well, some languages did take that route, but there are situations in which mutability improves performance, a lot.
For example, take an immutable Queue. When you either enqueue or dequeue things in it, you get a new Queue object. How then, would you go about processing all items in it?
I'll go through that with an example. Let's say you have a queue of digits, and you want to compose a number out of them. For example, if I have a queue with 2, 1, 3, in that order, I want to get back the number 213. Let's first solve it with a mutable.Queue:
def toNum(q: scala.collection.mutable.Queue[Int]) = {
var num = 0
while (!q.isEmpty) {
num *= 10
num += q.dequeue
}
num
}
This code is fast and easy to understand. Its main drawback is that the queue that is passed is modified by toNum, so you have to make a copy of it beforehand. That's the kind of object management that immutability makes you free from.
Now, let's covert it to an immutable.Queue:
def toNum(q: scala.collection.immutable.Queue[Int]) = {
def recurse(qr: scala.collection.immutable.Queue[Int], num: Int): Int = {
if (qr.isEmpty)
num
else {
val (digit, newQ) = qr.dequeue
recurse(newQ, num * 10 + digit)
}
}
recurse(q, 0)
}
Because I can't reuse some variable to keep track of my num, like in the previous example, I need to resort to recursion. In this case, it is a tail-recursion, which has pretty good performance. But that is not always the case: sometimes there is just no good (readable, simple) tail recursion solution.
Note, however, that I can rewrite that code to use an immutable.Queue and a var at the same time! For example:
def toNum(q: scala.collection.immutable.Queue[Int]) = {
var qr = q
var num = 0
while (!qr.isEmpty) {
val (digit, newQ) = qr.dequeue
num *= 10
num += digit
qr = newQ
}
num
}
This code is still efficient, does not require recursion, and you don't need to worry whether you have to make a copy of your queue or not before calling toNum. Naturally, I avoided reusing variables for other purposes, and no code outside this function sees them, so I don't need to worry about their values changing from one line to the next -- except when I explicitly do so.
Scala opted to let the programmer do that, if the programmer deemed it to be the best solution. Other languages have chosen to make such code difficult. The price Scala (and any language with widespread mutability) pays is that the compiler doesn't have as much leeway in optimizing the code as it could otherwise. Java's answer to that is optimizing the code based on the run-time profile. We could go on and on about pros and cons to each side.
Personally, I think Scala strikes the right balance, for now. It is not perfect, by far. I think both Clojure and Haskell have very interesting notions not adopted by Scala, but Scala has its own strengths as well. We'll see what comes up on the future.
val is final, that is, cannot be set. Think final in java.
In simple terms:
var = variable
val = variable + final
val means immutable and var means mutable.
Full discussion.
The difference is that a var can be re-assigned to whereas a val cannot. The mutability, or otherwise of whatever is actually assigned, is a side issue:
import collection.immutable
import collection.mutable
var m = immutable.Set("London", "Paris")
m = immutable.Set("New York") //Reassignment - I have change the "value" at m.
Whereas:
val n = immutable.Set("London", "Paris")
n = immutable.Set("New York") //Will not compile as n is a val.
And hence:
val n = mutable.Set("London", "Paris")
n = mutable.Set("New York") //Will not compile, even though the type of n is mutable.
If you are building a data structure and all of its fields are vals, then that data structure is therefore immutable, as its state cannot change.
Thinking in terms of C++,
val x: T
is analogous to constant pointer to non-constant data
T* const x;
while
var x: T
is analogous to non-constant pointer to non-constant data
T* x;
Favoring val over var increases immutability of the codebase which can facilitate its correctness, concurrency and understandability.
To understand the meaning of having a constant pointer to non-constant data consider the following Scala snippet:
val m = scala.collection.mutable.Map(1 -> "picard")
m // res0: scala.collection.mutable.Map[Int,String] = HashMap(1 -> picard)
Here the "pointer" val m is constant so we cannot re-assign it to point to something else like so
m = n // error: reassignment to val
however we can indeed change the non-constant data itself that m points to like so
m.put(2, "worf")
m // res1: scala.collection.mutable.Map[Int,String] = HashMap(1 -> picard, 2 -> worf)
"val means immutable and var means mutable."
To paraphrase, "val means value and var means variable".
A distinction that happens to be extremely important in computing (because those two concepts define the very essence of what programming is all about), and that OO has managed to blur almost completely, because in OO, the only axiom is that "everything is an object". And that as a consequence, lots of programmers these days tend not to understand/appreciate/recognize, because they have been brainwashed into "thinking the OO way" exclusively. Often leading to variable/mutable objects being used like everywhere, when value/immutable objects might/would often have been better.
val means immutable and var means mutable
you can think val as java programming language final key world or c++ language const key world。
Val means its final, cannot be reassigned
Whereas, Var can be reassigned later.
It's as simple as it name.
var means it can vary
val means invariable
Val - values are typed storage constants. Once created its value cant be re-assigned. a new value can be defined with keyword val.
eg. val x: Int = 5
Here type is optional as scala can infer it from the assigned value.
Var - variables are typed storage units which can be assigned values again as long as memory space is reserved.
eg. var x: Int = 5
Data stored in both the storage units are automatically de-allocated by JVM once these are no longer needed.
In scala values are preferred over variables due to stability these brings to the code particularly in concurrent and multithreaded code.
Though many have already answered the difference between Val and var.
But one point to notice is that val is not exactly like final keyword.
We can change the value of val using recursion but we can never change value of final. Final is more constant than Val.
def factorial(num: Int): Int = {
if(num == 0) 1
else factorial(num - 1) * num
}
Method parameters are by default val and at every call value is being changed.
In terms of javascript , it same as
val -> const
var -> var