Migrate from MurmurHash to MurmurHash3 - scala

In Scala 2.10, MurmurHash for some reason is deprecated, saying I should use MurmurHash3 now. But the API is different, and there is no useful scaladocs for MurmurHash3 -> fail.
For instance, current code:
trait Foo {
type Bar
def id: Int
def path: Bar
override def hashCode = {
import util.MurmurHash._
var h = startHash(2)
val c = startMagicA
val k = startMagicB
h = extendHash(h, id, c, k)
h = extendHash(h, path.##, nextMagicA(c), nextMagicB(k))
finalizeHash(h)
}
}
How would I do this using MurmurHash3 instead? This needs to be a fast operation, preferably without allocations, so I do not want to construct a Product, Seq, Array[Byte] or whathever MurmurHash3 seems to be offering me.

The MurmurHash3 algorithm was changed, confusingly, from an algorithm that mixed in its own salt, essentially (c and k), to one that just does more bit-mixing. The basic operation is now mix, which you should fold over all your values, after which you should finalizeHash (the Int argument for length is for convenience also, to help with distinguishing collections of different length). If you want to replace your last mix by mixLast, it's a little faster and removes redundancy with finalizeHash. If it takes you too long to detect what the last mix is, just mix.
Typically for a collection you'll want to mix in an extra value to indicate what type of collection it is.
So minimally you'd have
override def hashCode = finalizeHash(mixLast(id, path.##), 0)
and "typically" you'd
// Pick any string or number that suits you, put in companion object
val fooSeed = MurmurHash3.stringHash("classOf[Foo]")
// I guess "id" plus "path" is two things?
override def hashCode = finalizeHash(mixLast( mix(fooSeed,id), path.## ), 2)
Note that the length field is NOT there to give a high-quality hash that mixes in that number. All mixing of important hash values should be done with mix.

Looking at the source code of MurmurHash3 suggests something like this:
override def hashCode = {
import util.hashing.MurmurHash3._
val h = symmetricSeed // I'm not sure which seed to use here
val h1 = mix(h, id)
val h2 = mixLast(h1, path ##)
finalizeHash(h2, 2)
}
or, in (almost) one line:
import util.hashing.MurmurHash3._
override def hashCode = finalizeHash(mix(mix(symmetricSeed, id), path ##), 2)

Related

What is the impact on memory When we override dataframes and Rdds in apache spark? [duplicate]

What is the difference between a var and val definition in Scala and why does the language need both? Why would you choose a val over a var and vice versa?
As so many others have said, the object assigned to a val cannot be replaced, and the object assigned to a var can. However, said object can have its internal state modified. For example:
class A(n: Int) {
var value = n
}
class B(n: Int) {
val value = new A(n)
}
object Test {
def main(args: Array[String]) {
val x = new B(5)
x = new B(6) // Doesn't work, because I can't replace the object created on the line above with this new one.
x.value = new A(6) // Doesn't work, because I can't replace the object assigned to B.value for a new one.
x.value.value = 6 // Works, because A.value can receive a new object.
}
}
So, even though we can't change the object assigned to x, we could change the state of that object. At the root of it, however, there was a var.
Now, immutability is a good thing for many reasons. First, if an object doesn't change internal state, you don't have to worry if some other part of your code is changing it. For example:
x = new B(0)
f(x)
if (x.value.value == 0)
println("f didn't do anything to x")
else
println("f did something to x")
This becomes particularly important with multithreaded systems. In a multithreaded system, the following can happen:
x = new B(1)
f(x)
if (x.value.value == 1) {
print(x.value.value) // Can be different than 1!
}
If you use val exclusively, and only use immutable data structures (that is, avoid arrays, everything in scala.collection.mutable, etc.), you can rest assured this won't happen. That is, unless there's some code, perhaps even a framework, doing reflection tricks -- reflection can change "immutable" values, unfortunately.
That's one reason, but there is another reason for it. When you use var, you can be tempted into reusing the same var for multiple purposes. This has some problems:
It will be more difficult for people reading the code to know what is the value of a variable in a certain part of the code.
You may forget to re-initialize the variable in some code path, and end up passing wrong values downstream in the code.
Simply put, using val is safer and leads to more readable code.
We can, then, go the other direction. If val is that better, why have var at all? Well, some languages did take that route, but there are situations in which mutability improves performance, a lot.
For example, take an immutable Queue. When you either enqueue or dequeue things in it, you get a new Queue object. How then, would you go about processing all items in it?
I'll go through that with an example. Let's say you have a queue of digits, and you want to compose a number out of them. For example, if I have a queue with 2, 1, 3, in that order, I want to get back the number 213. Let's first solve it with a mutable.Queue:
def toNum(q: scala.collection.mutable.Queue[Int]) = {
var num = 0
while (!q.isEmpty) {
num *= 10
num += q.dequeue
}
num
}
This code is fast and easy to understand. Its main drawback is that the queue that is passed is modified by toNum, so you have to make a copy of it beforehand. That's the kind of object management that immutability makes you free from.
Now, let's covert it to an immutable.Queue:
def toNum(q: scala.collection.immutable.Queue[Int]) = {
def recurse(qr: scala.collection.immutable.Queue[Int], num: Int): Int = {
if (qr.isEmpty)
num
else {
val (digit, newQ) = qr.dequeue
recurse(newQ, num * 10 + digit)
}
}
recurse(q, 0)
}
Because I can't reuse some variable to keep track of my num, like in the previous example, I need to resort to recursion. In this case, it is a tail-recursion, which has pretty good performance. But that is not always the case: sometimes there is just no good (readable, simple) tail recursion solution.
Note, however, that I can rewrite that code to use an immutable.Queue and a var at the same time! For example:
def toNum(q: scala.collection.immutable.Queue[Int]) = {
var qr = q
var num = 0
while (!qr.isEmpty) {
val (digit, newQ) = qr.dequeue
num *= 10
num += digit
qr = newQ
}
num
}
This code is still efficient, does not require recursion, and you don't need to worry whether you have to make a copy of your queue or not before calling toNum. Naturally, I avoided reusing variables for other purposes, and no code outside this function sees them, so I don't need to worry about their values changing from one line to the next -- except when I explicitly do so.
Scala opted to let the programmer do that, if the programmer deemed it to be the best solution. Other languages have chosen to make such code difficult. The price Scala (and any language with widespread mutability) pays is that the compiler doesn't have as much leeway in optimizing the code as it could otherwise. Java's answer to that is optimizing the code based on the run-time profile. We could go on and on about pros and cons to each side.
Personally, I think Scala strikes the right balance, for now. It is not perfect, by far. I think both Clojure and Haskell have very interesting notions not adopted by Scala, but Scala has its own strengths as well. We'll see what comes up on the future.
val is final, that is, cannot be set. Think final in java.
In simple terms:
var = variable
val = variable + final
val means immutable and var means mutable.
Full discussion.
The difference is that a var can be re-assigned to whereas a val cannot. The mutability, or otherwise of whatever is actually assigned, is a side issue:
import collection.immutable
import collection.mutable
var m = immutable.Set("London", "Paris")
m = immutable.Set("New York") //Reassignment - I have change the "value" at m.
Whereas:
val n = immutable.Set("London", "Paris")
n = immutable.Set("New York") //Will not compile as n is a val.
And hence:
val n = mutable.Set("London", "Paris")
n = mutable.Set("New York") //Will not compile, even though the type of n is mutable.
If you are building a data structure and all of its fields are vals, then that data structure is therefore immutable, as its state cannot change.
Thinking in terms of C++,
val x: T
is analogous to constant pointer to non-constant data
T* const x;
while
var x: T
is analogous to non-constant pointer to non-constant data
T* x;
Favoring val over var increases immutability of the codebase which can facilitate its correctness, concurrency and understandability.
To understand the meaning of having a constant pointer to non-constant data consider the following Scala snippet:
val m = scala.collection.mutable.Map(1 -> "picard")
m // res0: scala.collection.mutable.Map[Int,String] = HashMap(1 -> picard)
Here the "pointer" val m is constant so we cannot re-assign it to point to something else like so
m = n // error: reassignment to val
however we can indeed change the non-constant data itself that m points to like so
m.put(2, "worf")
m // res1: scala.collection.mutable.Map[Int,String] = HashMap(1 -> picard, 2 -> worf)
"val means immutable and var means mutable."
To paraphrase, "val means value and var means variable".
A distinction that happens to be extremely important in computing (because those two concepts define the very essence of what programming is all about), and that OO has managed to blur almost completely, because in OO, the only axiom is that "everything is an object". And that as a consequence, lots of programmers these days tend not to understand/appreciate/recognize, because they have been brainwashed into "thinking the OO way" exclusively. Often leading to variable/mutable objects being used like everywhere, when value/immutable objects might/would often have been better.
val means immutable and var means mutable
you can think val as java programming language final key world or c++ language const key world。
Val means its final, cannot be reassigned
Whereas, Var can be reassigned later.
It's as simple as it name.
var means it can vary
val means invariable
Val - values are typed storage constants. Once created its value cant be re-assigned. a new value can be defined with keyword val.
eg. val x: Int = 5
Here type is optional as scala can infer it from the assigned value.
Var - variables are typed storage units which can be assigned values again as long as memory space is reserved.
eg. var x: Int = 5
Data stored in both the storage units are automatically de-allocated by JVM once these are no longer needed.
In scala values are preferred over variables due to stability these brings to the code particularly in concurrent and multithreaded code.
Though many have already answered the difference between Val and var.
But one point to notice is that val is not exactly like final keyword.
We can change the value of val using recursion but we can never change value of final. Final is more constant than Val.
def factorial(num: Int): Int = {
if(num == 0) 1
else factorial(num - 1) * num
}
Method parameters are by default val and at every call value is being changed.
In terms of javascript , it same as
val -> const
var -> var

Pattern for generating negative Scalacheck scenarios: Using property based testing to test validation logic in Scala

We are looking for a viable design pattern for building Scalacheck Gen (generators) that can produce both positive and negative test scenarios. This will allow us to run forAll tests to validate functionality (positive cases), and also verify that our case class validation works correctly by failing on all invalid combinations of data.
Making a simple, parameterized Gen that does this on a one-off basis is pretty easy. For example:
def idGen(valid: Boolean = true): Gen[String] = Gen.oneOf(ID.values.toList).map(s => if (valid) s else Gen.oneOf(simpleRandomCode(4), "").sample.get)
With the above, I can get a valid or invalid ID for testing purposes. The valid one, I use to make sure business logic succeeds. The invalid one, I use to make sure our validation logic rejects the case class.
Ok, so -- problem is, on a large scale, this becomes very unwieldly. Let's say I have a data container with, oh, 100 different elements. Generating a "good" one is easy. But now, I want to generate a "bad" one, and furthermore:
I want to generate a bad one for each data element, where a single data element is bad (so at minimum, at least 100 bad instances, testing that each invalid parameter is caught by validation logic).
I want to be able to override specific elements, for instance feeding in a bad ID or a bad "foobar." Whatever that is.
One pattern we can look to for inspiration is apply and copy, which allows us to easily compose new objects while specifying overridden values. For example:
val f = Foo("a", "b") // f: Foo = Foo(a,b)
val t = Foo.unapply(f) // t: Option[(String, String)] = Some((a,b))
Foo(t.get._1, "c") // res0: Foo = Foo(a,c)
Above we see the basic idea of creating a mutating object from the template of another object. This is more easily expressed in Scala as:
val f = someFoo copy(b = "c")
Using this as inspiration we can think about our objectives. A few things to think about:
First, we could define a map or a container of key/values for the data element and generated value. This could be used in place of a tuple to support named value mutation.
Given a container of key/value pairs, we could easily select one (or more) pairs at random and change a value. This supports the objective of generating a data set where one value is altered to create failure.
Given such a container, we can easily create a new object from the invalid collection of values (using either apply() or some other technique).
Alternatively, perhaps we can develop a pattern that uses a tuple and then just apply() it, kind of like the copy method, as long as we can still randomly alter one or more values.
We can probably explore developing a reusable pattern that does something like this:
def thingGen(invalidValueCount: Int): Gen[Thing] = ???
def someTest = forAll(thingGen) { v => invalidV = v.invalidate(1); validate(invalidV) must beFalse }
In the above code, we have a generator thingGen that returns (valid) Things. Then for all instances returned, we invoke a generic method invalidate(count: Int) which will randomly invalidate count values, returning an invalid object. We can then use that to ascertain whether our validation logic works correctly.
This would require defining an invalidate() function that, given a parameter (either by name, or by position) can then replace the identified parameter with a value that is known to be bad. This implies have an "anti-generator" for specific values, for instance, if an ID must be 3 characters, then it knows to create a string that is anything but 3 characters long.
Of course to invalidate a known, single parameter (to inject bad data into a test condition) we can simply use the copy method:
def thingGen(invalidValueCount: Int): Gen[Thing] = ???
def someTest = forAll(thingGen) { v => v2 = v copy(id = "xxx"); validate(v2) must beFalse }
That is the sum of my thinking to date. Am I barking up the wrong tree? Are there good patterns out there that handle this kind of testing? Any commentary or suggestions on how best to approach this problem of testing our validation logic?
We can combine a valid instance and an set of invalid fields (so that every field, if copied, would cause validation failure) to get an invalid object using shapeless library.
Shapeless allows you to represent your class as a list of key-value pairs that are still strongly typed and support some high-level operations, and converting back from this representation to your original class.
In example below I'll be providing an invalid instance for each single field provided
import shapeless._, record._
import shapeless.labelled.FieldType
import shapeless.ops.record.Updater
A detailed intro
Let's pretend we have a data class, and a valid instance of it (we only need one, so it can be hardcoded)
case class User(id: String, name: String, about: String, age: Int) {
def isValid = id.length == 3 && name.nonEmpty && age >= 0
}
val someValidUser = User("oo7", "Frank", "A good guy", 42)
assert(someValidUser.isValid)
We can then define a class to be used for invalid values:
case class BogusUserFields(name: String, id: String, age: Int)
val bogusData = BogusUserFields("", "1234", -5)
Instances of such classes can be provided using ScalaCheck. It's much easier to write a generator where all fields would cause failure. Order of fields doesn't matter, but their names and types do. Here we excluded about from User set of fields so we can do what you asked for (feeding only a subset of fields you want to test)
We then use LabelledGeneric[T] to convert User and BogusUserFields to their corresponding record value (and later we will convert User back)
val userLG = LabelledGeneric[User]
val bogusLG = LabelledGeneric[BogusUserFields]
val validUserRecord = userLG.to(someValidUser)
val bogusRecord = bogusLG.to(bogusData)
Records are lists of key-value pairs, so we can use head to get a single mapping, and the + operator supports adding / replacing field to another record. Let's pick every invalid field into our user one at a time. Also, here's the conversion back in action:
val invalidUser1 = userLG.from(validUserRecord + bogusRecord.head)// invalid name
val invalidUser2 = userLG.from(validUserRecord + bogusRecord.tail.head)// invalid ID
val invalidUser3 = userLG.from(validUserRecord + bogusRecord.tail.tail.head) // invalid age
assert(List(invalidUser1, invalidUser2, invalidUser3).forall(!_.isValid))
Since we basically are applying the same function (validUserRecord + _) to every key-value pair in our bogusRecord, we can also use map operator, except we use it with an unusual - polymorphic - function. We can also easily convert it to List, because every element will be of a same type now.
object polymerge extends Poly1 {
implicit def caseField[K, V](implicit upd: Updater[userLG.Repr, FieldType[K, V]]) =
at[FieldType[K, V]](upd(validUserRecord, _))
}
val allInvalidUsers = bogusRecord.map(polymerge).toList.map(userLG.from)
assert(allInvalidUsers == List(invalidUser1, invalidUser2, invalidUser3))
Generalizing and removing all the boilerplate
Now the whole point of this was that we can generalize it to work for any two arbitrary classes. The encoding of all relationships and operations is a bit cumbersome and it took me a while to get it right with all the implicit not found errors, so I'll skip the details.
class Picks[A, AR <: HList](defaults: A)(implicit lgA: LabelledGeneric.Aux[A, AR]) {
private val defaultsRec = lgA.to(defaults)
object mergeIntoTemplate extends Poly1 {
implicit def caseField[K, V](implicit upd: Updater[AR, FieldType[K, V]]) =
at[FieldType[K, V]](upd(defaultsRec, _))
}
def from[B, BR <: HList, MR <: HList, F <: Poly](options: B)
(implicit
optionsLG: LabelledGeneric.Aux[B, BR],
mapper: ops.hlist.Mapper.Aux[mergeIntoTemplate.type, BR, MR],
toList: ops.hlist.ToTraversable.Aux[MR, List, AR]
) = {
optionsLG.to(options).map(mergeIntoTemplate).toList.map(lgA.from)
}
}
So, here it is in action:
val cp = new Picks(someValidUser)
assert(cp.from(bogusData) == allInvalidUsers)
Unfortunately, you cannot write new Picks(someValidUser).from(bogusData) because implicit for mapper requires a stable identifier. On the other hand, cp instance can be reused with other types:
case class BogusName(name: String)
assert(cp.from(BogusName("")).head == someValidUser.copy(name = ""))
And now it works for all types! And bogus data is required to be any subset of class fields, so it will work even for class itself
case class Address(country: String, city: String, line_1: String, line_2: String) {
def isValid = Seq(country, city, line_1, line_2).forall(_.nonEmpty)
}
val acp = new Picks(Address("Test country", "Test city", "Test line 1", "Test line 2"))
val invalidAddresses = acp.from(Address("", "", "", ""))
assert(invalidAddresses.forall(!_.isValid))
You can see the code running at ScalaFiddle

What are good examples of: "operation of a program should map input values to output values rather than change data in place"

I came across this sentence in Scala in explaining its functional behavior.
operation of a program should map input of values to output values rather than change data in place
Could somebody explain it with a good example?
Edit: Please explain or give example for the above sentence in its context, please do not make it complicate to get more confusion
The most obvious pattern that this is referring to is the difference between how you would write code which uses collections in Java when compared with Scala. If you were writing scala but in the idiom of Java, then you would be working with collections by mutating data in place. The idiomatic scala code to do the same would favour the mapping of input values to output values.
Let's have a look at a few things you might want to do to a collection:
Filtering
In Java, if I have a List<Trade> and I am only interested in those trades executed with Deutsche Bank, I might do something like:
for (Iterator<Trade> it = trades.iterator(); it.hasNext();) {
Trade t = it.next();
if (t.getCounterparty() != DEUTSCHE_BANK) it.remove(); // MUTATION
}
Following this loop, my trades collection only contains the relevant trades. But, I have achieved this using mutation - a careless programmer could easily have missed that trades was an input parameter, an instance variable, or is used elsewhere in the method. As such, it is quite possible their code is now broken. Furthermore, such code is extremely brittle for refactoring for this same reason; a programmer wishing to refactor a piece of code must be very careful to not let mutated collections escape the scope in which they are intended to be used and, vice-versa, that they don't accidentally use an un-mutated collection where they should have used a mutated one.
Compare with Scala:
val db = trades filter (_.counterparty == DeutscheBank) //MAPPING INPUT TO OUTPUT
This creates a new collection! It doesn't affect anyone who is looking at trades and is inherently safer.
Mapping
Suppose I have a List<Trade> and I want to get a Set<Stock> for the unique stocks which I have been trading. Again, the idiom in Java is to create a collection and mutate it.
Set<Stock> stocks = new HashSet<Stock>();
for (Trade t : trades) stocks.add(t.getStock()); //MUTATION
Using scala the correct thing to do is to map the input collection and then convert to a set:
val stocks = (trades map (_.stock)).toSet //MAPPING INPUT TO OUTPUT
Or, if we are concerned about performance:
(trades.view map (_.stock)).toSet
(trades.iterator map (_.stock)).toSet
What are the advantages here? Well:
My code can never observe a partially-constructed result
The application of a function A => B to a Coll[A] to get a Coll[B] is clearer.
Accumulating
Again, in Java the idiom has to be mutation. Suppose we are trying to sum the decimal quantities of the trades we have done:
BigDecimal sum = BigDecimal.ZERO
for (Trade t : trades) {
sum.add(t.getQuantity()); //MUTATION
}
Again, we must be very careful not to accidentally observe a partially-constructed result! In scala, we can do this in a single expression:
val sum = (0 /: trades)(_ + _.quantity) //MAPPING INTO TO OUTPUT
Or the various other forms:
(trades.foldLeft(0)(_ + _.quantity)
(trades.iterator map (_.quantity)).sum
(trades.view map (_.quantity)).sum
Oh, by the way, there is a bug in the Java implementation! Did you spot it?
I'd say it's the difference between:
var counter = 0
def updateCounter(toAdd: Int): Unit = {
counter += toAdd
}
updateCounter(8)
println(counter)
and:
val originalValue = 0
def addToValue(value: Int, toAdd: Int): Int = value + toAdd
val firstNewResult = addToValue(originalValue, 8)
println(firstNewResult)
This is a gross over simplification but fuller examples are things like using a foldLeft to build up a result rather than doing the hard work yourself: foldLeft example
What it means is that if you write pure functions like this you always get the same output from the same input, and there are no side effects, which makes it easier to reason about your programs and ensure that they are correct.
so for example the function:
def times2(x:Int) = x*2
is pure, while
def add5ToList(xs: MutableList[Int]) {
xs += 5
}
is impure because it edits data in place as a side effect. This is a problem because that same list could be in use elsewhere in the the program and now we can't guarantee the behaviour because it has changed.
A pure version would use immutable lists and return a new list
def add5ToList(xs: List[Int]) = {
5::xs
}
There are plenty examples with collections, which are easy to come by but might give the wrong impression. This concept works at all levels of the language (it doesn't at the VM level, however). One example is the case classes. Consider these two alternatives:
// Java-style
class Person(initialName: String, initialAge: Int) {
def this(initialName: String) = this(initialName, 0)
private var name = initialName
private var age = initialAge
def getName = name
def getAge = age
def setName(newName: String) { name = newName }
def setAge(newAge: Int) { age = newAge }
}
val employee = new Person("John")
employee.setAge(40) // we changed the object
// Scala-style
case class Person(name: String, age: Int) {
def this(name: String) = this(name, 0)
}
val employee = new Person("John")
val employeeWithAge = employee.copy(age = 40) // employee still exists!
This concept is applied on the construction of the immutable collection themselves: a List never changes. Instead, new List objects are created when necessary. Use of persistent data structures reduce the copying that would happen on a mutable data structure.

Pros and Cons of choosing def over val

I'm asking a slight different question than this one. Suppose I have a code snippet:
def foo(i : Int) : List[String] = {
val s = i.toString + "!" //using val
s :: Nil
}
This is functionally equivalent to the following:
def foo(i : Int) : List[String] = {
def s = i.toString + "!" //using def
s :: Nil
}
Why would I choose one over the other? Obviously I would assume the second has a slight disadvantages in:
creating more bytecode (the inner def is lifted to a method in the class)
a runtime performance overhead of invoking a method over accessing a value
non-strict evaluation means I could easily access s twice (i.e. unnecesasarily redo a calculation)
The only advantage I can think of is:
non-strict evaluation of s means it is only called if it is used (but then I could just use a lazy val)
What are peoples' thoughts here? Is there a significant dis-benefit to me making all inner vals defs?
1)
One answer I didn't see mentioned is that the stack frame for the method you're describing could actually be smaller. Each val you declare will occupy a slot on the JVM stack, however, the whenever you use a def obtained value it will get consumed in the first expression you use it in. Even if the def references something from the environment, the compiler will pass .
The HotSpot should optimize both these things, or so some people claim. See:
http://www.ibm.com/developerworks/library/j-jtp12214/
Since the inner method gets compiled into a regular private method behind the scene and it is usually very small, the JIT compiler might choose to inline it and then optimize it. This could save time allocating smaller stack frames (?), or, by having fewer elements on the stack, make local variables access quicker.
But, take this with a (big) grain of salt - I haven't actually made extensive benchmarks to backup this claim.
2)
In addition, to expand on Kevin's valid reply, the stable val provides also means that you can use it with path dependent types - something you can't do with a def, since the compiler doesn't check its purity.
3)
For another reason you might want to use a def, see a related question asked not so long ago:
Functional processing of Scala streams without OutOfMemory errors
Essentially, using defs to produce Streams ensures that there do not exist additional references to these objects, which is important for the GC. Since Streams are lazy anyway, the overhead of creating them is probably negligible even if you have multiple defs.
The val is strict, it's given a value as soon as you define the thing.
Internally, the compiler will mark it as STABLE, equivalent to final in Java. This should allow the JVM to make all sorts of optimisations - I just don't know what they are :)
I can see an advantage in the fact that you are less bound to a location when using a def than when using a val.
This is not a technical advantage but allows for better structuring in some cases.
So, stupid example (please edit this answer, if you’ve got a better one), this is not possible with val:
def foo(i : Int) : List[String] = {
def ret = s :: Nil
def s = i.toString + "!"
ret
}
There may be cases where this is important or just convenient.
(So, basically, you can achieve the same with lazy val but, if only called at most once, it will probably be faster than a lazy val.)
For a local declaration like this (with no arguments, evaluated precisely once and with no code evaluated between the point of declaration and the point of evaluation) there is no semantic difference. I wouldn't be surprised if the "val" version compiled to simpler and more efficient code than the "def" version, but you would have to examine the bytecode and possibly profile to be sure.
In your example I would use a val. I think the val/def choice is more meaningful when declaring class members:
class A { def a0 = "a"; def a1 = "a" }
class B extends A {
var c = 0
override def a0 = { c += 1; "a" + c }
override val a1 = "b"
}
In the base class using def allows the sub class to override with possibly a def that does not return a constant. Or it could override with a val. So that gives more flexibility than a val.
Edit: one more use case of using def over val is when an abstract class has a "val" for which the value should be provided by a subclass.
abstract class C { def f: SomeObject }
new C { val f = new SomeObject(...) }

What is the difference between a var and val definition in Scala?

What is the difference between a var and val definition in Scala and why does the language need both? Why would you choose a val over a var and vice versa?
As so many others have said, the object assigned to a val cannot be replaced, and the object assigned to a var can. However, said object can have its internal state modified. For example:
class A(n: Int) {
var value = n
}
class B(n: Int) {
val value = new A(n)
}
object Test {
def main(args: Array[String]) {
val x = new B(5)
x = new B(6) // Doesn't work, because I can't replace the object created on the line above with this new one.
x.value = new A(6) // Doesn't work, because I can't replace the object assigned to B.value for a new one.
x.value.value = 6 // Works, because A.value can receive a new object.
}
}
So, even though we can't change the object assigned to x, we could change the state of that object. At the root of it, however, there was a var.
Now, immutability is a good thing for many reasons. First, if an object doesn't change internal state, you don't have to worry if some other part of your code is changing it. For example:
x = new B(0)
f(x)
if (x.value.value == 0)
println("f didn't do anything to x")
else
println("f did something to x")
This becomes particularly important with multithreaded systems. In a multithreaded system, the following can happen:
x = new B(1)
f(x)
if (x.value.value == 1) {
print(x.value.value) // Can be different than 1!
}
If you use val exclusively, and only use immutable data structures (that is, avoid arrays, everything in scala.collection.mutable, etc.), you can rest assured this won't happen. That is, unless there's some code, perhaps even a framework, doing reflection tricks -- reflection can change "immutable" values, unfortunately.
That's one reason, but there is another reason for it. When you use var, you can be tempted into reusing the same var for multiple purposes. This has some problems:
It will be more difficult for people reading the code to know what is the value of a variable in a certain part of the code.
You may forget to re-initialize the variable in some code path, and end up passing wrong values downstream in the code.
Simply put, using val is safer and leads to more readable code.
We can, then, go the other direction. If val is that better, why have var at all? Well, some languages did take that route, but there are situations in which mutability improves performance, a lot.
For example, take an immutable Queue. When you either enqueue or dequeue things in it, you get a new Queue object. How then, would you go about processing all items in it?
I'll go through that with an example. Let's say you have a queue of digits, and you want to compose a number out of them. For example, if I have a queue with 2, 1, 3, in that order, I want to get back the number 213. Let's first solve it with a mutable.Queue:
def toNum(q: scala.collection.mutable.Queue[Int]) = {
var num = 0
while (!q.isEmpty) {
num *= 10
num += q.dequeue
}
num
}
This code is fast and easy to understand. Its main drawback is that the queue that is passed is modified by toNum, so you have to make a copy of it beforehand. That's the kind of object management that immutability makes you free from.
Now, let's covert it to an immutable.Queue:
def toNum(q: scala.collection.immutable.Queue[Int]) = {
def recurse(qr: scala.collection.immutable.Queue[Int], num: Int): Int = {
if (qr.isEmpty)
num
else {
val (digit, newQ) = qr.dequeue
recurse(newQ, num * 10 + digit)
}
}
recurse(q, 0)
}
Because I can't reuse some variable to keep track of my num, like in the previous example, I need to resort to recursion. In this case, it is a tail-recursion, which has pretty good performance. But that is not always the case: sometimes there is just no good (readable, simple) tail recursion solution.
Note, however, that I can rewrite that code to use an immutable.Queue and a var at the same time! For example:
def toNum(q: scala.collection.immutable.Queue[Int]) = {
var qr = q
var num = 0
while (!qr.isEmpty) {
val (digit, newQ) = qr.dequeue
num *= 10
num += digit
qr = newQ
}
num
}
This code is still efficient, does not require recursion, and you don't need to worry whether you have to make a copy of your queue or not before calling toNum. Naturally, I avoided reusing variables for other purposes, and no code outside this function sees them, so I don't need to worry about their values changing from one line to the next -- except when I explicitly do so.
Scala opted to let the programmer do that, if the programmer deemed it to be the best solution. Other languages have chosen to make such code difficult. The price Scala (and any language with widespread mutability) pays is that the compiler doesn't have as much leeway in optimizing the code as it could otherwise. Java's answer to that is optimizing the code based on the run-time profile. We could go on and on about pros and cons to each side.
Personally, I think Scala strikes the right balance, for now. It is not perfect, by far. I think both Clojure and Haskell have very interesting notions not adopted by Scala, but Scala has its own strengths as well. We'll see what comes up on the future.
val is final, that is, cannot be set. Think final in java.
In simple terms:
var = variable
val = variable + final
val means immutable and var means mutable.
Full discussion.
The difference is that a var can be re-assigned to whereas a val cannot. The mutability, or otherwise of whatever is actually assigned, is a side issue:
import collection.immutable
import collection.mutable
var m = immutable.Set("London", "Paris")
m = immutable.Set("New York") //Reassignment - I have change the "value" at m.
Whereas:
val n = immutable.Set("London", "Paris")
n = immutable.Set("New York") //Will not compile as n is a val.
And hence:
val n = mutable.Set("London", "Paris")
n = mutable.Set("New York") //Will not compile, even though the type of n is mutable.
If you are building a data structure and all of its fields are vals, then that data structure is therefore immutable, as its state cannot change.
Thinking in terms of C++,
val x: T
is analogous to constant pointer to non-constant data
T* const x;
while
var x: T
is analogous to non-constant pointer to non-constant data
T* x;
Favoring val over var increases immutability of the codebase which can facilitate its correctness, concurrency and understandability.
To understand the meaning of having a constant pointer to non-constant data consider the following Scala snippet:
val m = scala.collection.mutable.Map(1 -> "picard")
m // res0: scala.collection.mutable.Map[Int,String] = HashMap(1 -> picard)
Here the "pointer" val m is constant so we cannot re-assign it to point to something else like so
m = n // error: reassignment to val
however we can indeed change the non-constant data itself that m points to like so
m.put(2, "worf")
m // res1: scala.collection.mutable.Map[Int,String] = HashMap(1 -> picard, 2 -> worf)
"val means immutable and var means mutable."
To paraphrase, "val means value and var means variable".
A distinction that happens to be extremely important in computing (because those two concepts define the very essence of what programming is all about), and that OO has managed to blur almost completely, because in OO, the only axiom is that "everything is an object". And that as a consequence, lots of programmers these days tend not to understand/appreciate/recognize, because they have been brainwashed into "thinking the OO way" exclusively. Often leading to variable/mutable objects being used like everywhere, when value/immutable objects might/would often have been better.
val means immutable and var means mutable
you can think val as java programming language final key world or c++ language const key world。
Val means its final, cannot be reassigned
Whereas, Var can be reassigned later.
It's as simple as it name.
var means it can vary
val means invariable
Val - values are typed storage constants. Once created its value cant be re-assigned. a new value can be defined with keyword val.
eg. val x: Int = 5
Here type is optional as scala can infer it from the assigned value.
Var - variables are typed storage units which can be assigned values again as long as memory space is reserved.
eg. var x: Int = 5
Data stored in both the storage units are automatically de-allocated by JVM once these are no longer needed.
In scala values are preferred over variables due to stability these brings to the code particularly in concurrent and multithreaded code.
Though many have already answered the difference between Val and var.
But one point to notice is that val is not exactly like final keyword.
We can change the value of val using recursion but we can never change value of final. Final is more constant than Val.
def factorial(num: Int): Int = {
if(num == 0) 1
else factorial(num - 1) * num
}
Method parameters are by default val and at every call value is being changed.
In terms of javascript , it same as
val -> const
var -> var