Performance difference between def and val - scala

Consider the code below, where I am passing a method and a function as a parameter to map():
val list1: List[Int] = List(10, 20, 30)
def func1(x: Int): Int = {
  x + 10
}
list1.map(func1)
list1.map(_ + 10)
I have a few questions about eta expansion:
Is there a performance difference in using a method in place of a function, especially since the method is internally converted into a function?
Is there a performance difference between def x: Int = 10 and val x: Int = 10?
I have read that a call-by-name parameter is actually a method that does not accept any parameters. Now, if methods are not objects, how are we using a method as a parameter value?

There is no significant difference between the expressions you're asking about.
val x incurs a private field.
Note that vs.map(_+10) inlines the function, as compared to vs.map(x => f(x)). But you have to create a function object in any case.
A call-by-name argument => X is a () => X under the hood.
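For instance, here is a minimal sketch of that desugaring (the method names are made up for illustration; this is not compiler output):
def logIfEnabled(enabled: Boolean, msg: => String): Unit =
  if (enabled) println(msg)   // msg is evaluated only here, and on every use

// Conceptually, the compiler passes something like a () => String:
def logIfEnabledDesugared(enabled: Boolean, msg: () => String): Unit =
  if (enabled) println(msg())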
From the REPL, use javap to show code. -c for code, -v for verbose.
scala> vs.map(f)
res0: List[Int] = List(2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
scala> :javap -pv -
[snip]

One of the differences is that vals are evaluated when the class is initialized, whereas a def is evaluated each time it is called.
A simple example: say you have 100K vals in a class (for argument's sake); the system might take a long time to initialize it. But if those 100K vals are declared inside a def A, the cost is only paid when A is called.
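A hedged sketch of that difference (the object and member names are made up):
object Timing {
  val eager: Int = { println("val: evaluated once, at initialization"); 10 }
  def onDemand: Int = { println("def: evaluated on every call"); 10 }
}

Timing.eager      // first access initializes the object, printing the val message once
Timing.eager      // no message: the val is not re-evaluated
Timing.onDemand   // prints the def message
Timing.onDemand   // prints it again: the body runs on each call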

Related

The differences between using `val` and `def` for function definition in Scala REPL?

I defined two functions (methods) in the Scala REPL:
scala> val b=(x:Int)=>x+1
b: Int => Int = <function1>
scala> def c(x:Int)=x+1
c: (x: Int)Int
And the usage:
scala> b(1)
res4: Int = 2
scala> c(1)
res5: Int = 2
While both definitions work, it seems that b and c have different types, and I was wondering whether there are some differences between them. Why doesn't Scala use the same type for b and c? Does anyone have ideas about this?
Not a duplicate:
This question is not a duplicate of the linked question. Even though it asks about the difference between using def and val to define a function, the code example makes it clear that the asker is confused about the difference between methods and functions in Scala. The example doesn't use a def to define a function at all. – Aaron Novstrup
The use of def creates a method (in the case of the REPL, it will create a method on some invisible global object), whereas val creates an anonymous function and assigns it to the symbol you specified.
When you invoke them they are pretty much the same thing; when you pass them around there is a difference, but Scala hides it from you by performing the eta expansion transparently. As an example, if you define this:
def isEven(i: Int): Boolean = i % 2 == 0
And then call
list.filter(isEven)
Scala transforms that for you into something similar to the val version; take this as pseudo-code, since I don't know the Scala internals that well, but at a high level this is what happens:
list.filter((i: Int) => isEven(i))
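You can also request the expansion explicitly with a trailing underscore (Scala 2 syntax), which makes the created function object visible:
val isEvenFn: Int => Boolean = isEven _   // eta-expands the method into a Function1 value
list.filter(isEvenFn)                     // same result as list.filter(isEven)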

What does Predef.identity do in Scala?

Here is the documentation about Predef, but there is no word about identity. What is this function used for? And what does it do?
It's just an instance of the identity function, predefined for convenience, and perhaps to prevent people from redefining it on their own a whole bunch of times. identity simply returns its argument. It can be handy sometimes to pass to higher-order functions. You could do something like:
scala> def squareIf(test: Boolean) = List(1, 2, 3, 4, 5).map(if (test) x => x * x else identity)
squareIf: (test: Boolean)List[Int]
scala> squareIf(true)
res4: List[Int] = List(1, 4, 9, 16, 25)
scala> squareIf(false)
res5: List[Int] = List(1, 2, 3, 4, 5)
I've also seen it used as a default argument value at times. Obviously, you could just say x => x any place you might use identity, and you'd even save a couple characters, so it doesn't buy you much, but it can be self-documenting.
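As a small sketch of the default-argument use just mentioned (render is a hypothetical helper):
def render(items: List[String], transform: String => String = identity): String =
  items.map(transform).mkString(", ")

render(List("a", "b"))                  // "a, b"
render(List("a", "b"), _.toUpperCase)   // "A, B"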
Besides what acjay has already mentioned, the identity function is extremely useful in conjunction with implicit parameters.
Suppose you have some function like this:
implicit def foo[B](b: B)(implicit converter: B => A) = ...
In this case, the identity function will be used as the implicit converter when some instance of B <: A is passed as the function's first argument.
If you are not familiar with implicit conversions and how to use implicit parameters to chain them, read this: http://docs.scala-lang.org/tutorials/FAQ/chaining-implicits.html
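A hedged sketch of that pattern (Fruit, Apple, and describe are made-up names):
trait Fruit { def name: String }
case class Apple(name: String) extends Fruit

// Accepts anything that can be implicitly converted to a Fruit.
def describe[B](b: B)(implicit conv: B => Fruit): String = conv(b).name

describe(Apple("gala"))             // compiles: Apple is already a Fruit, so an identity-like
                                    // conversion from Predef satisfies B => Fruit
describe(Apple("gala"))(identity)   // or pass the conversion explicitly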

The easiest way to write {1, 2, 4, 8, 16} in Scala

I was advertising Scala to a friend (who uses Java most of the time) and he asked me a challenge: what's the way to write an array {1, 2, 4, 8, 16} in Scala.
I don't know functional programming that well, but I really like Scala. However, this is an iterative array formed by (n*(n-1)); how do I keep track of the previous step? Is there a way to do it easily in Scala, or do I have to write more than one line of code to achieve this?
Array.iterate(1, 5)(2 * _)
or
Array.iterate(1, 5)(n => 2 * n)
Elaborating on this as asked for in a comment. I don't know exactly what you want me to elaborate on, but I hope you will find what you need.
This is the function iterate(start, len)(f) on the Array companion object (scaladoc). That would be a static method in Java.
The point is to fill an array of len elements, starting from the first value start and always computing the next element by passing the previous one to the function f.
A basic implementation would be
import scala.reflect.ClassTag

def iterate[A: ClassTag](start: A, len: Int)(f: A => A): Array[A] = {
  val result = new Array[A](len)
  if (len > 0) {
    var current = start
    result(0) = current
    for (i <- 1 until len) {
      current = f(current)
      result(i) = current
    }
  }
  result
}
(The actual implementation, which is not much different, can be found here. It differs a little, mostly because the same code is used for different data structures, e.g. List.iterate.)
Besides that, the implementation is very straightforward. The syntax may need some explanation:
def iterate[A](...): Array[A] makes it a generic method, usable for any type A. That would be public <A> A[] iterate(...) in Java.
ClassTag is just a technicality: in Scala as in Java, you normally cannot create an array of a generic type (Java's new E[] is illegal), and the : ClassTag asks the compiler to add some magic that is very similar to declaring, and passing at the call site, a Class<A> clazz parameter in Java, which can then be used to create the array by reflection. If you use e.g. List.iterate rather than Array.iterate, it is not needed.
Maybe more surprising are the two parameter lists: one with start and len, and then, in separate parentheses, the one with f. Scala allows a method to have several parameter lists. The reason here is the peculiar way Scala does type inference: looking at the first parameter list, it determines what A is, based on the type of start. Only afterwards does it look at the second list, by which point it knows what type A is. Otherwise, it would need to be told, so if there had been only one parameter list, def iterate[A: ClassTag](start: A, len: Int, f: A => A),
then the call should be either
Array.iterate(1, 5, (n: Int) => 2 * n)
Array.iterate[Int](1, 5, n => 2 * n)
Array.iterate(1, 5, 2 * (_: Int))
Array.iterate[Int](1, 5, 2 * _)
making Int explicit one way or another. So it is common in Scala to put function arguments in a separate parameter list. The type might be much longer to write than just Int.
A => A is just syntactic sugar for the type Function1[A, A]. Obviously a functional language has functions as (first-class) values, and a typed functional language has types for functions.
In the call iterate(1, 5)(n => 2 * n), n => 2 * n is the function value. A more complete declaration would be {n: Int => 2 * n}, but one may dispense with Int for the reason stated above. Scala syntax is rather flexible, and one may also dispense with either the parentheses or the braces, so it could be iterate(1, 5){n => 2 * n}. The braces allow a full block with several instructions, which is not needed here.
As for immutability, Array is fundamentally mutable; there is no way to put a value into an array except by changing the array at some point. My implementation (and the one in the library) also uses a mutable var (current) and a side-effecting for loop, which is not strictly necessary; a (tail-)recursive implementation, as sketched below, would be only a little longer to write and just as efficient. But a mutable local does not hurt much, and we are already dealing with a mutable array anyway.
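Here is a hedged sketch of that tail-recursive variant (not the library code, just an illustration under the same signature; iterateRec is a made-up name):
import scala.annotation.tailrec
import scala.reflect.ClassTag

def iterateRec[A: ClassTag](start: A, len: Int)(f: A => A): Array[A] = {
  val result = new Array[A](len)
  @tailrec
  def loop(i: Int, current: A): Unit =
    if (i < len) {
      result(i) = current         // store the current element
      loop(i + 1, f(current))     // tail position, so the compiler turns this into a loop
    }
  loop(0, start)
  result
}

iterateRec(1, 5)(2 * _)           // Array(1, 2, 4, 8, 16)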
There's always more than one way to do it in Scala:
scala> (0 until 5).map(1<<_).toArray
res48: Array[Int] = Array(1, 2, 4, 8, 16)
or
scala> (for (i <- 0 to 4) yield 1<<i).toArray
res49: Array[Int] = Array(1, 2, 4, 8, 16)
or even
scala> List.fill(4)(1).scanLeft(1)(2*_+0*_).toArray
res61: Array[Int] = Array(1, 2, 4, 8, 16)
The other answers are fine if you happen to know in advance how many entries will be in the resulting list. But if you want to take all of the entries up to some limit, you should create an Iterator, use takeWhile to get the prefix you want, and create an array from that, like so:
scala> Iterator.iterate(1)(2*_).takeWhile(_<=16).toArray
res21: Array[Int] = Array(1, 2, 4, 8, 16)
It all boils down to whether what you really want is more correctly stated as
the first 5 powers of 2 starting at 1, or
the powers of 2 from 1 to 16
For non-trivial functions you almost always want to specify the end condition and let the program figure out how many entries there are. Of course your example was simple, and in fact the real easiest way to create that simple array is just to write it out literally:
scala> Array(1,2,4,8,16)
res22: Array[Int] = Array(1, 2, 4, 8, 16)
But presumably you were asking for a general technique you could use for arbitrarily complex problems. For that, Iterator and takeWhile are generally the tools you need.
You don't have to keep track of the previous step. Also, each element is not formed by n * (n - 1). You probably meant f(n) = f(n - 1) * 2.
Anyway, to answer your question, here's how you do it:
(0 until 5).map(math.pow(2, _).toInt).toArray

Scala newbie: recursion and stackoverflow error

As a Scala newbie, I'm reading books and docs and trying to solve the problems found on http://aperiodic.net/phil/scala/s-99/ . It seems that correct Scala code is based on immutable values (val) and on recursion rather than on loops and variables, in order to make parallelism safer and to avoid the need for locks.
For example, a possible solution for exercise P22 ( http://aperiodic.net/phil/scala/s-99/p22.scala ) is :
// Recursive.
def rangeRecursive(start: Int, end: Int): List[Int] =
  if (end < start) Nil
  else start :: rangeRecursive(start + 1, end)
Of course this code is compact and looks smart, but if the recursion depth is high, you'll face a StackOverflowError (rangeRecursive(1, 10000), for example, with no JVM tuning). If you look at the source of the built-in List.range (https://github.com/scala/scala/blob/v2.9.2/src/library/scala/collection/immutable/List.scala#L1), you'll see that loops and vars are used.
My question is: how do I reconcile the Scala learning material, which promotes vals and recursion, with the fact that such code can break when the recursion depth is high?
The nice thing about Scala is that you can ease your way into it. Starting out, you can write loops, and do more with recursion as you grow more comfortable with the language. You can't do this with more 'pure' functional languages such as Clojure or Haskell. In other words, you can get comfortable with immutability and val, and move on to recursion later.
When you do start with recursion, you should look up tail recursion. If the recursive call is the last call in the function, the Scala compiler will optimize this into a loop in the bytecode. That way, you won't get StackOverflowErrors. Also, if you add the @tailrec annotation to your recursive function, the compiler will report an error if your function is not tail recursive.
For example, the function in your question is not tail recursive. It looks like the call to rangeRecursive is the last one in the function, but when this call returns, it still has to prepend start to the result of the call. Therefore, it cannot be tail recursive: it still has to do work when the call returns.
Here's an example of making that method tail recursive. The @tailrec annotation isn't necessary, the compiler will optimize without it. But having it makes the compiler flag an error when it can't do the optimization.
scala> def rangeRecursive(start: Int, end: Int): List[Int] = {
|   @scala.annotation.tailrec
|   def inner(accum: List[Int], start: Int): List[Int] = {
|     if (end < start) accum.reverse
|     else inner(start :: accum, start + 1)
|   }
|
|   inner(Nil, start)
| }
rangeRecursive: (start: Int,end: Int)List[Int]
scala> rangeRecursive(1,10000)
res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,...
It uses a common technique called "accumulator passing style" where intermediate results are accumulated and passed to the next step in the recursion. The bottom most step is responsible for returning the accumulated result. In this case the accumulator happens to build its result backwards so the base case has to reverse it.
If you rewrite the above so that it is tail recursive, the compiler will optimize the code into a while loop. Additionally, you can use the @tailrec annotation to get an error when the method it annotates is not tail recursive, thus letting you know when you got it right.
Here is an alternative to James Iry's answer, with the same behaviour:
def rangeRecursive(start: Int, end: Int): List[Int] = {
  def inner(start: Int): Stream[Int] = {
    if (end < start) Stream.empty
    else start #:: inner(start + 1)
  }
  inner(start).toList
}
scala> rangeRecursive(1,10000)
res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,...
This does not throw a StackOverflowError because the Stream cons operator (#::) evaluates its tail lazily (by name). In other words, the Stream elements are not computed until toList is invoked.
In my opinion, this is more readable than the accumulator pattern because it most closely resembles the naive initial algorithm (just replace :: by #:: and Nil by Stream.empty). Also, there is no need for accum.reverse, which could easily be forgotten.

Should Scala's map() behave differently when mapping to the same type?

In the Scala Collections framework, I think there are some behaviors that are counterintuitive when using map().
We can distinguish two kinds of transformations on (immutable) collections: those whose implementation calls newBuilder to recreate the resulting collection, and those that go through an implicit CanBuildFrom to obtain the builder.
The first category contains all transformations where the type of the contained elements does not change. They are, for example, filter, partition, drop, take, span, etc. These transformations are free to call newBuilder and to recreate the same collection type as the one they are called on, no matter how specific: filtering a List[Int] can always return a List[Int]; filtering a BitSet (or the RNA example structure described in this article on the architecture of the collections framework) can always return another BitSet (or RNA). Let's call them the filtering transformations.
The second category of transformations needs CanBuildFroms to be more flexible, as the type of the contained elements may change, and, as a result, the collection type itself may not be reusable: a BitSet cannot contain Strings; an RNA contains only Bases. Examples of such transformations are map, flatMap, collect, scanLeft, ++, etc. Let's call them the mapping transformations.
Now here's the main issue to discuss. No matter what the static type of the collection is, all filtering transformations will return the same collection type, while the collection type returned by a mapping operation can vary depending on the static type.
scala> import collection.immutable.TreeSet
import collection.immutable.TreeSet
scala> val treeset = TreeSet(1,2,3,4,5) // static type == dynamic type
treeset: scala.collection.immutable.TreeSet[Int] = TreeSet(1, 2, 3, 4, 5)
scala> val set: Set[Int] = TreeSet(1,2,3,4,5) // static type != dynamic type
set: Set[Int] = TreeSet(1, 2, 3, 4, 5)
scala> treeset.filter(_ % 2 == 0)
res0: scala.collection.immutable.TreeSet[Int] = TreeSet(2, 4) // fine, a TreeSet again
scala> set.filter(_ % 2 == 0)
res1: scala.collection.immutable.Set[Int] = TreeSet(2, 4) // fine
scala> treeset.map(_ + 1)
res2: scala.collection.immutable.SortedSet[Int] = TreeSet(2, 3, 4, 5, 6) // still fine
scala> set.map(_ + 1)
res3: scala.collection.immutable.Set[Int] = Set(4, 5, 6, 2, 3) // uh?!
Now, I understand why this works like this. It is explained there and there. In short: the implicit CanBuildFrom is inserted based on the static type, and, depending on the implementation of its def apply(from: Coll) method, may or may not be able to recreate the same collection type.
Now my only point is, when we know that we are using a mapping operation yielding a collection with the same element type (which the compiler can statically determine), we could mimic the way the filtering transformations work and use the collection's native builder. We can reuse BitSet when mapping to Ints, create a new TreeSet with the same ordering, etc.
Then we would avoid cases where
for (i <- set) {
  val x = i + 1
  println(x)
}
does not print the incremented elements of the TreeSet in the same order as
for (i <- set; x = i + 1)
  println(x)
So:
Do you think this would be a good idea to change the behavior of the mapping transformations as described?
What are the inevitable caveats I have grossly overlooked?
How could it be implemented?
I was thinking about something like an implicit sameTypeEvidence: A =:= B parameter, maybe with a default value of null (or rather an implicit canReuseCalleeBuilderEvidence: B <:< A = null), which could be used at runtime to give more information to the CanBuildFrom, which in turn could be used to determine the type of builder to return.
I looked again at it, and I think your problem doesn't arise from a particular deficiency of Scala collections, but rather a missing builder for TreeSet. Because the following does work as intended:
val list = List(1,2,3,4,5)
val seq1: Seq[Int] = list
seq1.map( _ + 1 ) // yields List
val vector = Vector(1,2,3,4,5)
val seq2: Seq[Int] = vector
seq2.map( _ + 1 ) // yields Vector
So the reason is that TreeSet is missing a specialised companion object/builder:
seq1.companion.newBuilder[Int] // ListBuffer
seq2.companion.newBuilder[Int] // VectorBuilder
treeset.companion.newBuilder[Int] // Set (oops!)
So my guess is that if you make proper provision for such a companion for your RNA class, you may find that both map and filter work as you wish...?
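To make the contrast concrete, here is a hedged sketch of the kind of same-element-type map the question proposes, written against the pre-2.13 collections API and going straight to TreeSet's own builder instead of the implicitly selected one (mapSameType is a made-up helper, not library code):
import scala.collection.immutable.TreeSet

def mapSameType[A](set: TreeSet[A])(f: A => A): TreeSet[A] = {
  val builder = TreeSet.newBuilder[A](set.ordering)   // reuse the original ordering
  set.foreach(a => builder += f(a))
  builder.result()
}

mapSameType(TreeSet(1, 2, 3, 4, 5))(_ + 1)   // TreeSet(2, 3, 4, 5, 6), still sorted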