lazy val function vs def method - scala

When calling a function from an external class, in the case of many calls, which will give better performance: a lazy val function or a def method?
So far, what I understood is:
def method -
Defined and tied to a class; it needs to be declared inside an object in order to be called Java-static style.
Call-by-name: evaluated only when accessed, and on every access.
lazy val lambda expression -
Tied to an instance of Function1/Function2/.../Function22.
Call-by-value: evaluated the first time it is accessed, and only that one time.
Is actually a def apply method tied to a class.
So it may seem that using a lazy val would avoid re-evaluating the function every time; should it be preferred?
I ran into this while writing UDFs for Spark code, and I'm trying to understand which approach is better.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object sql {
  def emptyStringToNull(str: String): Option[String] = {
    Option(str).getOrElse("").trim match {
      case "" => None
      case "[]" => None
      case "null" => None
      case _ => Some(str.trim)
    }
  }

  def udfEmptyStringToNull: UserDefinedFunction = udf(emptyStringToNull _)

  def repairColumn_method(dataFrame: DataFrame, colName: String): DataFrame = {
    dataFrame.withColumn(colName, udfEmptyStringToNull(col(colName)))
  }

  lazy val repairColumn_fun: (DataFrame, String) => DataFrame = { (df, colName) =>
    df.withColumn(colName, udfEmptyStringToNull(col(colName)))
  }
}

There's no need for you to use a lazy val in this specific case. When you assign a function to a lazy val, its results are not memoized, as you seem to think they are. Since the function itself is a plain function literal and not the result of an expensive computation (regardless of what goes on inside it), making it lazy is not useful. All it does is add overhead when accessing and calling it. A simple val would be better, but making it a proper method would be best.
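To make the non-memoization point concrete, here is a minimal sketch (the names are my own): a function held in a lazy val still runs its body on every call; only the function object itself is created lazily, once.
object LazyValVsDef {
  // lazy val: the Function1 instance is built on first access, but the body
  // below still executes on every invocation.
  lazy val square: Int => Int = { n =>
    println(s"computing $n * $n")
    n * n
  }

  // def: behaves the same way at the call site, with one less field and
  // no Function1 allocation.
  def squareDef(n: Int): Int = {
    println(s"computing $n * $n")
    n * n
  }

  def main(args: Array[String]): Unit = {
    square(3)    // prints, returns 9
    square(3)    // prints again: results are not memoized
    squareDef(3) // same observable behaviour
  }
}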
If you want memoization, see Is there a generic way to memoize in Scala? instead.
Ignoring your specific example, if the def in question didn't take any arguments and both it and the lazy val were simple values that were expensive to compute, I would go with the lazy val if you're going to call it many times to avoid computing it over and over again.
If they were values that were very cheap to compute and you're not going to call them many times, or if they're expensive to compute but you're only going to call them once, I would go with a def instead. There wouldn't be much difference if you used a lazy val, but the def avoids creating a couple of extra fields.
If they're somewhat cheap to compute but they're being called many times, it may be better to use a lazy val simply because they'll be cached. However, you might want to look at your overall design before looking at such micro-optimizations.

Related

why use a method with no parameter lists over a val

I came across this function in Scala: def nullable: Boolean = true. I understand what this function does, but I want to know whether there is a specific name for this kind of function, and what the motivation is for not using a var.
Firstly, I would be very precise in Scala: use the word Function only to mean an instance of FunctionN, and use the word Method when talking about a def (which may have zero or more parameter lists). Secondly, this most definitely does have a body (albeit not enclosed in braces): its body is the expression true (i.e. a boolean literal).
I assume that you really mean to ask: "why use a method with no parameter lists over a val?"
When deciding how to represent some property of your class, you can choose between a method and a value (advice: avoid using var). Often, if the property involves no side effects, we can use a def with no parameter lists (the Scala idiom is that a def with a single, empty parameter list implies side effects).
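A tiny illustration of that idiom (the trait and member names here are hypothetical):
trait Resource {
  def size: Int     // pure property: no parameter list, callers write resource.size
  def close(): Unit // side-effecting action: empty parameter list, callers write resource.close()
}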
Hence we may choose any of the following, all of which are semantically equivalent at the use-site (except for performance characteristics):
case class Foo(s: String) {
  // Eager - we calculate and store the value regardless of whether
  // it is ever used
  val isEmpty = s.isEmpty
}
case class Foo(s: String) {
  // Lazy - we calculate and store the value when it
  // is first used
  lazy val isEmpty = s.isEmpty
}
case class Foo(s: String) {
  // Non-strict - we calculate the value each time
  // it is used
  def isEmpty = s.isEmpty
}
Hence we might take the following advice:
If the value is computationally expensive to calculate and we are sure we will use it multiple times, use val.
If the value is computationally expensive and we may use it zero or many times, use lazy val.
If the value is space-expensive and we think it will generally be used at most once, use def.
However, there is an additional consideration: using a val (or lazy val) is likely to be of benefit when debugging in an IDE, which can generally show you the value of any in-scope vals in an inspection window.
The primary difference between def and val/var is when the value is evaluated.
A def defines a name for an expression; the right-hand side is evaluated each time the name is called (call-by-name), so in that sense it is lazy.
A val (or var) defines a name for a value; the right-hand side is evaluated eagerly, once, at the point of definition.
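A quick sketch (my own names) showing that difference in evaluation timing:
object EvaluationTiming {
  def main(args: Array[String]): Unit = {
    val eager    = { println("val: evaluated once, at definition"); 1 }
    def onDemand = { println("def: evaluated on every access"); 2 }

    val a = eager + eager       // nothing extra printed: the val was computed above
    val b = onDemand + onDemand // prints the def line twice
    println(a + b)              // 6
  }
}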

'tee' operation on Scala's option type?

Is there some sort of 'tee' operation on Option available in Scala's standard library? The best I could find is foreach, but its return type is Unit, so it cannot be chained.
This is what I am looking for: given an Option instance, perform some operation with side effects on its value if the option is not empty (Some[A]), otherwise do nothing; return the option in any case.
I have a custom implementation using an implicit class, but I am wondering whether there is a more common way to do this without implicit conversion:
object OptionExtensions {
  implicit class TeeableOption[A](value: Option[A]) {
    def tee(action: A => Unit): Option[A] = {
      value foreach action
      value
    }
  }
}
Example code:
import OptionExtensions._
val option: Option[Int] = Some(42)
option.tee(println).foreach(println) // will print 42 twice
val another: Option[Int] = None
another.tee(println).foreach(println) // does nothing
Any suggestions?
In order to avoid an implicit conversion, instead of method chaining you can use function composition with a K combinator.
The K combinator gives you an idiomatic way to communicate the fact that you are going to perform a side effect.
Here is a short example:
object KCombinator {
  def tap[A](a: A)(action: A => Any): A = {
    action(a)
    a
  }
}
import KCombinator._
val func = ((_: Option[Int]).getOrElse(0))
  .andThen(tap(_)(println))
  .andThen(_ + 3)
  .andThen(tap(_)(println))
If we call our func with an argument of Option(3), the result will be an Int with the value 6,
and this is how the console output will look:
3
6
There is not an existing way to accomplish this in the standard library, because side effects are minimized and isolated in functional programming. Depending on what your actual goals are, there are a couple different ways to idiomatically accomplish your task.
In the case of doing a lot of println commands, instead of sprinkling them throughout your algorithm, you would typically gather them in a collection, then do one foreach println at the end. This minimizes the side effects to the smallest possible impact. That goes with any other side effect. Try to find a way to squeeze it into the smallest possible space.
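For example, a toy sketch of that idea: do the pure computation first and isolate the printing in one place at the end.
val results = (1 to 5).map(n => s"item $n squared is ${n * n}") // pure, no side effects
results.foreach(println)                                        // one isolated side-effecting step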
If you are trying to chain a series of "actions," you should look into futures. Futures basically treat an action as a value, and provide a lot of useful functions to work with them.
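A hedged sketch of that idea with Future, where each step is a value and the side effect is attached explicitly (the numbers are arbitrary):
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Success

// Each step is a value; the side effect is attached with andThen and the
// original result flows through unchanged.
val step: Future[Int] =
  Future(40)
    .map(_ + 2)
    .andThen { case Success(v) => println(s"computed $v") }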
Simply use map, and make your side effecting functions conform to action: A => A rather than action: A => Unit.
def tprintln[A](a: A): A = {
  println(a)
  a
}
another.map(tprintln).foreach(println)

Yield mutable.seq from mutable.traversable type in Scala

I have a variable underlying of type Option[mutable.Traversable[Field]]
All I wanted to do in my class was provide a method to return this as a Seq, in the following way:
def toSeq: scala.collection.mutable.Seq[Field] = {
  for {
    f <- underlying.get
  } yield f
}
This fails, complaining that mutable.Traversable does not conform to mutable.Seq. All it is doing is yielding something of type Field; in my mind this should work?
A possible solution to this is:
def toSeq: Seq[Field] = {
  underlying match {
    case Some(x) => x.toSeq
    case None =>
  }
}
Although I have no idea what is actually happening when x.toSeq is called, and I imagine more memory is being used here than is actually required to accomplish this.
An explanation or suggestion would be much appreciated.
I am confused why you say "I imagine more memory is being used here than is actually required". Scala will not copy your Field values when doing x.toSeq; it is simply going to create a new Seq that points to the same Field values that underlying points to. Since this new structure is exactly what you want, there is no avoiding the additional memory associated with the extra pointers (but the amount of additional memory should be small). For a more in-depth discussion see the wiki on persistent data structures.
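A small sketch of that sharing, assuming a 2.12-era collections library (where mutable.Traversable still exists) and a plain Field class of my own:
import scala.collection.mutable

class Field(val name: String)

val underlying: mutable.Traversable[Field] = mutable.HashSet(new Field("a"), new Field("b"))

val asSeq: Seq[Field] = underlying.toSeq
// The Field objects are not copied: every element in the new Seq is
// reference-equal to an element of the original Traversable.
println(asSeq.forall(f => underlying.exists(_ eq f))) // true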
Regarding your possible solution, it could be slightly modified to get the result you're expecting:
def toSeq: Seq[Field] =
  underlying
    .map(_.toSeq)
    .getOrElse(Seq.empty[Field])
This solution will return an empty Seq if underlying is a None, which is safer than your original attempt, which uses get. I say it's "safer" because get throws a NoSuchElementException if the Option is a None, whereas my toSeq can never fail to return a valid value.
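For illustration, a standalone sketch of the same toSeq with underlying passed as a parameter and a hypothetical Field case class; the None case now degrades gracefully:
import scala.collection.mutable

case class Field(name: String)

def toSeq(underlying: Option[mutable.Traversable[Field]]): Seq[Field] =
  underlying.map(_.toSeq).getOrElse(Seq.empty[Field])

toSeq(None)                                  // Seq() -- no exception thrown
toSeq(Some(mutable.ArrayBuffer(Field("a")))) // Seq(Field(a))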
Functional Approach
As a side note: when I first started programming in Scala I would write many functions of the form:
def formatSeq(seq: Seq[String]): Seq[String] =
  seq map (_.toUpperCase)
This is less functional because you are expecting a particular collection type, e.g. formatSeq won't work on a Future.
I have found that a better approach is to write:
def formatStr(str : String) = str.toUpperCase
Or my preferred coding style:
val formatStr = (_ : String).toUpperCase
Then the user of your function can apply formatStr in any fashion they want and you don't have to worry about all of the collection casting:
val fut : Future[String] = ???
val formatFut = fut map formatStr
val opt : Option[String] = ???
val formatOpt = opt map formatStr

Using getOrElseUpdate of TrieMap in Scala

I am using the getOrElseUpdate method of scala.collection.concurrent.TrieMap (from 2.11.6)
// simplified for clarity
val trie = new TrieMap[Int, Future[String]]
def foo(): String = ... // a very long process
val fut: Future[String] = trie.getOrElseUpdate(id, Future(foo()))
As I understand it, if I invoke getOrElseUpdate from multiple threads without any synchronization, foo is invoked just once.
Is that correct?
The current implementation is that it will be invoked zero or one times. It may be invoked without the result being inserted, however. (This is standard behavior for CAS-based maps as opposed to ones that use synchronized.)
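A hedged sketch of what that means in practice (foo and the key here are placeholders): several tasks may each build a Future, but only one ends up in the map; a losing task's Future is simply discarded rather than inserted.
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val trie = new TrieMap[Int, Future[String]]

def foo(): String = "result" // stand-in for the very long process

// Several tasks race on the same key; only one Future stays in the map,
// though a losing task may still have evaluated its Future(foo()) argument.
val races = Future.traverse((1 to 8).toList) { _ =>
  Future(trie.getOrElseUpdate(1, Future(foo())))
}
Await.result(races, 10.seconds)
println(trie.size) // 1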

Scala: Thread safe mutable lazy Iterator with append

For an immutable flavour, Iterator does the job.
val x = Iterator.fill(100000)(someFn)
Now I want to implement a mutable version of Iterator, with three guarantees:
thread-safe on all transformations (fold, foldLeft, ...) and append
lazy evaluated
traversable only once! Once used, an object from this Iterator should be destroyed.
Is there an existing implementation to give me these guarantees? Any library or framework example would be great.
Update
To illustrate the desired behaviour.
class SomeThing {}

class Test(val list: Iterator[SomeThing]) {
  def add(thing: SomeThing): Test = {
    new Test(list ++ Iterator(thing))
  }
}

new Test(Iterator.empty).add(new SomeThing).add(new SomeThing)
In this example, SomeThing is an expensive construct, so it needs to be lazy.
Re-iterating over list is never required, so Iterator is a good fit.
This is supposed to asynchronously and lazily sequence 10 million SomeThing instances without depleting the executor (a cached thread pool executor) or running out of memory.
You don't need a mutable Iterator for this, just daisy-chain the immutable form:
class SomeThing {}

case class Test(list: Iterator[SomeThing]) {
  def add(thing: => SomeThing) = Test(list ++ Iterator(thing))
}

Test(Iterator.empty).add(new SomeThing).add(new SomeThing)
Although you don't really need the extra boilerplate of Test here:
Iterator(new SomeThing) ++ Iterator(new SomeThing)
Note that Iterator.++ takes a by-name param, so the ++ operation is already lazy.
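A quick check of that laziness (my own example, assuming the 2.12-style Iterator.++ whose right-hand operand is by-name):
def expensive(): Int = { println("building the second part"); 42 }

val it = Iterator(1, 2) ++ Iterator(expensive()) // nothing printed yet
println(it.next()) // 1 -- the second iterator still hasn't been built
println(it.next()) // 2
println(it.next()) // only now is "building the second part" printed, then 42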
You might also want to try this, to avoid building intermediate Iterators:
Iterator.continually(new SomeThing) take 2
UPDATE
If you don't know the size in advance, then I'll often use a tactic like this:
def mkSomething = if(cond) Some(new Something) else None
Iterator.continually(mkSomething) takeWhile (_.isDefined) map { _.get }
The trick is to have your generator function wrap its output in an Option, which then gives you a way to flag that the iteration is finished by returning None.
Of course... If you're really pushing out the boat, you can even use the dreaded null:
def mkSomething = if(cond) { new Something } else null
Iterator.continually(mkSomething) takeWhile (_ != null)
Seems like you need to hide the fact that the iterator is mutable but at the same time allow it to grow mutably. What I'm going to propose is the same sort of trick I've used to speed up ::: in the past:
abstract class AppendableIterator[A] extends Iterator[A] {
  protected var inner: Iterator[A]

  def hasNext = inner.hasNext
  def next() = inner.next()

  def append(that: Iterator[A]) = synchronized {
    inner = new JoinedIterator(inner, that)
  }
}

// You might need to add some more things; this is a skeleton
class JoinedIterator[A](first: Iterator[A], second: Iterator[A]) extends Iterator[A] {
  def hasNext = first.hasNext || second.hasNext
  def next() = if (first.hasNext) first.next() else if (second.hasNext) second.next() else Iterator.empty.next()
}
So what you're really doing is leaving the Iterator at whatever place in its iteration you might have it while still preserving the thread safety of the append by "joining" another Iterator in non-destructively. You avoid the need to recompute the two together because you never actually force them through a CanBuildFrom.
This is also a generalization of just adding one item. You can always wrap some A in an Iterator[A] of one element if you so choose.
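A hedged usage sketch of that skeleton; the concrete subclass name is my own, and all it does is supply the initial inner iterator:
// Minimal concrete subclass: just provide the starting inner iterator.
class GrowableIterator[A](start: Iterator[A]) extends AppendableIterator[A] {
  protected var inner: Iterator[A] = start
}

val it = new GrowableIterator(Iterator(1, 2, 3))
it.append(Iterator(4, 5)) // thread-safe, non-destructive join
println(it.toList)        // List(1, 2, 3, 4, 5)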
Have you looked at the mutable.ParIterable in the collection.parallel package?
To access an iterator over elements you can do something like
val x = ParIterable.fill(100000)(someFn).iterator
From the docs:
Parallel operations are implemented with divide and conquer style algorithms that parallelize well. The basic idea is to split the collection into smaller parts until they are small enough to be operated on sequentially.
...
The higher-order functions passed to certain operations may contain side-effects. Since implementations of bulk operations may not be sequential, this means that side-effects may not be predictable and may produce data-races, deadlocks or invalidation of state if care is not taken. It is up to the programmer to either avoid using side-effects or to use some form of synchronization when accessing mutable data.