Describe a computation - scala

I have the following code, that I would like to know, why the variable number gets evaluated twice:
import cats.effect.IO
import scala.util.Random
object Main {
private val number: IO[Int] =
IO(Random.between(3, 300))
def main(args: Array[String]): Unit = {
number
.flatMap { e =>
println(e);
number
}.flatMap { e =>
println(e);
IO.unit
}.unsafeRunSync()
}
}
the program prints two different numbers, although the number is an assignment. I know here, I describe a computation not a execution, and at the end of the universe, I run the program.
The question is, why it prints out two different numbers?

There is a difference between
private val number: IO[Int] = IO(Random.nextInt())
and
private val number2: Int = Random.nextInt()
number is a value that when evaluated computes a random number. When evaluated multiple times this value of type IO (aka this computation) is run multiple times resulting in multiple different random numbers.
Whereas number2 when evaluated is just a single number.
It is very similar to the distinction between a lambda (val lambda = () => Random.nextInt()) and a value (val value = Random.nextInt()).

IO is a bit similar to the following scenario
final case class SuspendedComputation[T](f: () => T) {
def run: T = f()
}
val v = SuspendedComputation(Random.nextInt)
v.run
v.run
which outputs something like
v: SuspendedComputation[Int] = SuspendedComputation(<function>
res2: Int = -1062309211
res3: Int = 765640585
Note how SuspendedComputation internally stores computation as () => Random.nextInt and then uses run method to actually evaluate computation f.
Similarly, IO.apply accepts an argument by-name : => A and eventually constructs Delay object that stores the un-evaluated computation in a field as () => A and then uses unsafeRunSync to actually evaluate computation.

Related

Confused about cats-effect Async.memoize

I'm fairly new to cats-effect, but I think I am getting a handle on it. But I have come to a situation where I want to memoize the result of an IO, and it's not doing what I expect.
The function I want to memoize transforms String => String, but the transformation requires a network call, so it is implemented as a function String => IO[String]. In a non-IO world, I'd simply save the result of the call, but the defining function doesn't actually have access to it, as it doesn't execute until later. And if I save the constructed IO[String], it won't actually help, as that IO would repeat the network call every time it's used. So instead, I try to use Async.memoize, which has the following documentation:
Lazily memoizes f. For every time the returned F[F[A]] is bound, the
effect f will be performed at most once (when the inner F[A] is bound
the first time).
What I expect from memoize is a function that only ever executes once for a given input, AND where the contents of the returned IO are only ever evaluated once; in other words, I expect the resulting IO to act as if it were IO.pure(result), except the first time. But that's not what seems to be happening. Instead, I find that while the called function itself only executes once, the contents of the IO are still evaluated every time - exactly as would occur if I tried to naively save and reuse the IO.
I constructed an example to show the problem:
def plus1(num: Int): IO[Int] = {
println("foo")
IO(println("bar")) *> IO(num + 1)
}
var fooMap = Map[Int, IO[IO[Int]]]()
def mplus1(num: Int): IO[Int] = {
val check = fooMap.get(num)
val res = check.getOrElse {
val plus = Async.memoize(plus1(num))
fooMap = fooMap + ((num, plus))
plus
}
res.flatten
}
println("start")
val call1 = mplus1(2)
val call2 = mplus1(2)
val result = (call1 *> call2).unsafeRunSync()
println(result)
println(fooMap.toString)
println("finish")
The output of this program is:
start
foo
bar
bar
3
Map(2 -> <function1>)
finish
Although the plus1 function itself only executes once (one "foo" printed), the output "bar" contained within the IO is printed twice, when I expect it to also print only once. (I have also tried flattening the IO returned by Async.memoize before storing it in the map, but that doesn't do much).
Consider following examples
Given the following helper methods
def plus1(num: Int): IO[IO[Int]] = {
IO(IO(println("plus1")) *> IO(num + 1))
}
def mPlus1(num: Int): IO[IO[Int]] = {
Async.memoize(plus1(num).flatten)
}
Let's build a program that evaluates plus1(1) twice.
val program1 = for {
io <- plus1(1)
_ <- io
_ <- io
} yield {}
program1.unsafeRunSync()
This produces the expected output of printing plus1 twice.
If you do the same but instead using the mPlus1 method
val program2 = for {
io <- mPlus1(1)
_ <- io
_ <- io
} yield {}
program2.unsafeRunSync()
It will print plus1 just once confirming that memoization is working.
The trick with the memoization is that it should be evaluated only once to have the desired effect. Consider now the following program that highlights it.
val memIo = mPlus1(1)
val program3 = for {
io1 <- memIo
io2 <- memIo
_ <- io1
_ <- io2
} yield {}
program3.unsafeRunSync()
And it outputs plus1 twice as io1 and io2 are memoized separately.
As for your example, the foo is printed once because you're using a map and update the value when it's not found and this happens only once. The bar is printed every time when IO is evaluated as you lose the memoization effect by calling res.flatten.

Cats Writer Vector is empty

I wrote this simple program in my attempt to learn how Cats Writer works
import cats.data.Writer
import cats.syntax.applicative._
import cats.syntax.writer._
import cats.instances.vector._
object WriterTest extends App {
type Logged2[A] = Writer[Vector[String], A]
Vector("started the program").tell
val output1 = calculate1(10)
val foo = new Foo()
val output2 = foo.calculate2(20)
val (log, sum) = (output1 + output2).pure[Logged2].run
println(log)
println(sum)
def calculate1(x : Int) : Int = {
Vector("came inside calculate1").tell
val output = 10 + x
Vector(s"Calculated value ${output}").tell
output
}
}
class Foo {
def calculate2(x: Int) : Int = {
Vector("came inside calculate 2").tell
val output = 10 + x
Vector(s"calculated ${output}").tell
output
}
}
The program works and the output is
> run-main WriterTest
[info] Compiling 1 Scala source to /Users/Cats/target/scala-2.11/classes...
[info] Running WriterTest
Vector()
50
[success] Total time: 1 s, completed Jan 21, 2017 8:14:19 AM
But why is the vector empty? Shouldn't it contain all the strings on which I used the "tell" method?
When you call tell on your Vectors, each time you create a Writer[Vector[String], Unit]. However, you never actually do anything with your Writers, you just discard them. Further, you call pure to create your final Writer, which simply creates a Writer with an empty Vector. You have to combine the writers together in a chain that carries your value and message around.
type Logged[A] = Writer[Vector[String], A]
val (log, sum) = (for {
_ <- Vector("started the program").tell
output1 <- calculate1(10)
foo = new Foo()
output2 <- foo.calculate2(20)
} yield output1 + output2).run
def calculate1(x: Int): Logged[Int] = for {
_ <- Vector("came inside calculate1").tell
output = 10 + x
_ <- Vector(s"Calculated value ${output}").tell
} yield output
class Foo {
def calculate2(x: Int): Logged[Int] = for {
_ <- Vector("came inside calculate2").tell
output = 10 + x
_ <- Vector(s"calculated ${output}").tell
} yield output
}
Note the use of for notation. The definition of calculate1 is really
def calculate1(x: Int): Logged[Int] = Vector("came inside calculate1").tell.flatMap { _ =>
val output = 10 + x
Vector(s"calculated ${output}").tell.map { _ => output }
}
flatMap is the monadic bind operation, which means it understands how to take two monadic values (in this case Writer) and join them together to get a new one. In this case, it makes a Writer containing the concatenation of the logs and the value of the one on the right.
Note how there are no side effects. There is no global state by which Writer can remember all your calls to tell. You instead make many Writers and join them together with flatMap to get one big one at the end.
The problem with your example code is that you're not using the result of the tell method.
If you take a look at its signature, you'll see this:
final class WriterIdSyntax[A](val a: A) extends AnyVal {
def tell: Writer[A, Unit] = Writer(a, ())
}
it is clear that tell returns a Writer[A, Unit] result which is immediately discarded because you didn't assign it to a value.
The proper way to use a Writer (and any monad in Scala) is through its flatMap method. It would look similar to this:
println(
Vector("started the program").tell.flatMap { _ =>
15.pure[Logged2].flatMap { i =>
Writer(Vector("ended program"), i)
}
}
)
The code above, when executed will give you this:
WriterT((Vector(started the program, ended program),15))
As you can see, both messages and the int are stored in the result.
Now this is a bit ugly, and Scala actually provides a better way to do this: for-comprehensions. For-comprehension are a bit of syntactic sugar that allows us to write the same code in this way:
println(
for {
_ <- Vector("started the program").tell
i <- 15.pure[Logged2]
_ <- Vector("ended program").tell
} yield i
)
Now going back to your example, what I would recommend is for you to change the return type of compute1 and compute2 to be Writer[Vector[String], Int] and then try to make your application compile using what I wrote above.

what is the difference between passing parameters () => T and just T

specifically, what is the difference for following two definitions:
def func(f: () => String) = f()
def func1(s: String) = s
I wrote some codes to test them, it seems they produce the same result; are those two definitions just the same in this scenario; or they do have some difference?
var x = 1
def f() = {
x = x + 1
s"$x"
}
println(func1(f))
println(func1(f))
println(func1(f))
println(func(f))
println(func(f))
println(func(f))
They are perhaps the same in THIS scenario, but there are lots of other scenarios when a () => A and a A are much different. () => A is referred to as a thunk and is used to pass a piece of delayed computation to a function. The body of the "thunk" doesn't get evaluated until the function called decides to evaluate it. Otherwitse, the value of the argument being passed in is evaluated by the caller.
Consider this example, where there IS a difference between the version that takes a thunk and the version that just takes a value:
object Thunk {
def withThunk(f: () ⇒ String): Unit = {
println("withThunk before")
println("the thunk's value is: " + f())
println("now the thunk's value is: " + f())
}
def withoutThunk(f: String): Unit = {
println("withoutThunk before")
println("now the value's value is: " + f)
}
def main(argv: Array[String]): Unit = {
withThunk { () ⇒ println("i'm inside a thunk"); "thunk value" }
println("------------")
withoutThunk { println("i'm not inside a thunk"); "just a value" }
}
}
This program will demonstrate some differences. In the thunk version, you see "withThunk before" get printed before the first time "i'm inside a thunk" gets printed, which gets printed twice, since f() is evaluated twice. In the non-thunk version, the "I'm not inside a thunk" gets printed before "withoutThunk before", since this is evaluated before being sent to the function as an argument.
def func(f: () => String) = f()
This one accepts as parameter a function that returns a string.
def func1(s: String) = s
While this one simple requires a String as parameter
Aside from the minor technical difference above, in this scenario they seem to function the same. However, the function parameter is potentially more powerful as it is a function that can derive its return value from several other operations. I think however, the main difference that the function parameter allows you to decide when the value is derived.
def func(f: () => String) = f()
def func1(s: String) = s
println(func1(f)) // in this case f is evaluated first, its value is used in func1.
println(func(f)) // in this case f is NOT evaluated, but passed as it is to func.
// It is upto func to call f whenever needed, or even not call it.
It is this "lazy" evaluation of f, that makes func more useful, for instance f can be passed to some other higher order functions, or it can be called asynchronously.
I want to add an example for the use of () => String.
def printFuncResult(f: () => String) = println(f() + " " + f())
def ran = () => Math.random.toString
printFuncResult(ran)
printFuncResult(Math.random.toString)
When you pass a function like ran then you will most likely have two different values printed (randomness is involved here). When you pass a fixed Random number then it will be printed twice.
As you can see: when you have a function as a parameter it might result in a different value each time it is used in printFuncResult. This is not possible when you just put a String parameter.

What does "code: => Unit" mean in scala?

Does anyone know the type of => Unit in scala? I don't know the meaning of => Unit and how to use it. I defined a function like below:
def test(code: => Unit){
print("start ...")
code
print("end ....")
}
test(print(1))
Does it means a function with any arguments returning Unit?
Thanks
This kind of parameter are called by-name parameter
=> B represents a block a of code which return a B value, their purpose is that they are evaluated only when you call the parameter.
def foo(code: => Int) {
println("Not yet evaluated")
val result = code
println("parameter evaluated %s, is it an int ? %s " format (
result, result.isInstanceOf[Int]) )
}
And you can call foo in the following way :
foo(1)
or
val a = 3
val b = 5
foo {
val c = a * a
c * b
}
The other style of passing parameter is by-value : parameters are evaluated before they are sent to the method
def foo(code : Int) {
println("Parameter already evaluated")
val result = code
println("parameter evaluated : " + result)
}
References
Extract from the Book Functionnal Programming in Scala
More differences between by-name parameter and by-value parameter illustrated
In x: => Type the x is a call by name parameter. That's different than taking an argument which is a function taking no arguments: x: () => Type
This is called a by name parameter, as related to call-by-name parameter evaluation strategy. Please see the linked wikipedia article for similar, but not identical, ways of passing parameters.
To explain it better, let's consider first the two most common parameter evaluation strategies: call by value and call by reference.
Call by value is by far the most common evaluation strategy. It is the sole strategy in Java, for instance, and the default strategy in C. Consider, for instance, this simple Java program:
public class ByValue {
static public void inc(int y) {
y++;
}
static public void main(String... args) {
int x = 0;
inc(x);
System.out.println(x);
}
}
It will print 0, because x's value is copied to y, so that when y is incremented it doesn't change the original value in x. Contrast this with this C++ program with call-by-reference:
#include <stdio.h>
void inc(int &y) {
y++;
}
int main() {
int x = 0;
inc(x);
printf("%d\n", x);
}
This will print 1, because the reference to x is passed to inc, instead of x's value.
Note that Java passes objects references by value, which leads some to claim it does call by reference. This is not true, to the extent that if you were to assign a new object to a parameter of a function, it would not be reflected in the function's caller.
So, what does a call by name looks like? In a call by name, neither value nor reference is passed. Instead, the whole code is passed, and everywhere the parameter is used, the code is executed and its result used. For example:
object ByName {
def incIfZero(y: => Int): Int = if (y == 0) y + 1 else y
def main(args: Array[String]) {
var x = 0
x = incIfZero( { val tmp = x; x += 1; tmp } )
println(x)
}
}
This example prints 2 instead of 1, because the block of code passed as parameter is evaluted twice. When executing, it's as if the second line in the main was written like this:
x = if ({ val tmp = x; x += 1; tmp }) { val tmp = x; x += 1; tmp } + 1 else { val tmp = x; x += 1; tmp }
Now, by name parameters have at least three interesting uses:
It can be used to delay the execution of something until the proper time.
It can be used to avoid the execution in some situations.
It can be used to execute some block of code multiple times.
The first and last cases are pretty obvious, I think. Here's an example of the second case:
implicit def fromBoolean(b: Boolean) = new {
def and(that: => Boolean): Boolean = if (b) that else b }
val x = 0
(x > 0) and (10 / x > 0)
If that was not a by name parameter, there would be an exception thrown at the last line. As it is, it will just return false.
It means call-by-name, which basically means that the value is calculated when used in your function. As opposed to call-by-value which is default in Java (and Scala with regular type syntax), where the value is calculated before the method is called. A Unit type would not make much sense in call-by-value I guess..
This is typically used to create functions that behave as custom control structures, or things like timing or logging as in your example. Stuff you often use AOP to do in Java.

How do I create a partial function with generics in scala?

I'm trying to write a performance measurements library for Scala. My idea is to transparently 'mark' sections so that the execution time can be collected. Unfortunately I wasn't able to bend the compiler to my will.
An admittedly contrived example of what I have in mind:
// generate a timing function
val myTimer = mkTimer('myTimer)
// see how the timing function returns the right type depending on the
// type of the function it is passed to it
val act = actor {
loop {
receive {
case 'Int =>
val calc = myTimer { (1 to 100000).sum }
val result = calc + 10 // calc must be Int
self reply (result)
case 'String =>
val calc = myTimer { (1 to 100000).mkString }
val result = calc + " String" // calc must be String
self reply (result)
}
Now, this is the farthest I got:
trait Timing {
def time[T <: Any](name: Symbol)(op: => T) :T = {
val start = System.nanoTime
val result = op
val elapsed = System.nanoTime - start
println(name + ": " + elapsed)
result
}
def mkTimer[T <: Any](name: Symbol) : (() => T) => () => T = {
type c = () => T
time(name)(_ : c)
}
}
Using the time function directly works and the compiler correctly uses the return type of the anonymous function to type the 'time' function:
val bigString = time('timerBigString) {
(1 to 100000).mkString("-")
}
println (bigString)
Great as it seems, this pattern has a number of shortcomings:
forces the user to reuse the same symbol at each invocation
makes it more difficult to do more advanced stuff like predefined project-level timers
does not allow the library to initialize once a data structure for 'timerBigString
So here it comes mkTimer, that would allow me to partially apply the time function and reuse it. I use mkTimer like this:
val myTimer = mkTimer('aTimer)
val myString= myTimer {
(1 to 100000).mkString("-")
}
println (myString)
But I get a compiler error:
error: type mismatch;
found : String
required: () => Nothing
(1 to 100000).mkString("-")
I get the same error if I inline the currying:
val timerBigString = time('timerBigString) _
val bigString = timerBigString {
(1 to 100000).mkString("-")
}
println (bigString)
This works if I do val timerBigString = time('timerBigString) (_: String), but this is not what I want. I'd like to defer typing of the partially applied function until application.
I conclude that the compiler is deciding the return type of the partial function when I first create it, chosing "Nothing" because it can't make a better informed choice.
So I guess what I'm looking for is a sort of late-binding of the partially applied function. Is there any way to do this? Or maybe is there a completely different path I could follow?
Well, thanks for reading this far
-teo
The usual pattern when you want "lazy" generics is to use a class with an apply method
class Timer(name: Symbol) {
def apply[T](op: => T) = time(name)(op)
}
def mkTimer(name: Symbol) = new Timer(name)