What is the difference between while(true) and loop? - scala

I don't get the actual (semantic) difference between the two "expressions".
It is said that "loop" fits with "react" and "while(true)" with "receive", because "react" does not return and "loop" is a function which calls the body over and over again (at least this is what I deduce from the sources - I am not really familiar with the "andThen" that is used). "Receive" blocks one Thread from the pool, "react" does not. However, for "react" a Thread is looked up to which the function can be attached.
So the question is: why can't I use "loop" with "receive"? It also seems to behave differently (and better!) than the "while(true)" variant, at least this is what I observe in a profiler.
Even more strange is that calling a ping-pong with "-Dactors.maxPoolSize=1 -Dactors.corePoolSize=1" with "while(true)" and "receive" blocks immediately (that's what I would expect) - however, with "loop" and "receive", it works without problems - in one Thread - how's this?
Thanks!

The critical difference between while and loop is that while restricts the loop iterations to occur in the same thread. The loop construct (as described by Daniel) enables the actor sub-system to invoke the reactions on any thread it chooses.
Hence using a combination of receive within while(true) ties an actor to a single thread. Using loop and react allows you to run many actors on a single thread.
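As a rough sketch with the old scala.actors API the question is about (the message handling here is purely illustrative):

import scala.actors.Actor._

// Thread-bound style: receive blocks while waiting for a message,
// so this actor occupies one pool thread for its whole lifetime.
val blocking = actor {
  while (true) {
    receive {
      case msg => println("got " + msg)
    }
  }
}

// Event-based style: react suspends without holding a thread, and loop
// re-registers the body, so many such actors can share a single thread.
val eventBased = actor {
  loop {
    react {
      case msg => println("got " + msg)
    }
  }
}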

The method loop is defined in the object Actor:
private[actors] trait Body[a] {
  def andThen[b](other: => b): Unit
}

implicit def mkBody[a](body: => a) = new Body[a] {
  def andThen[b](other: => b): Unit = self.seq(body, other)
}

/**
 * Causes <code>self</code> to repeatedly execute
 * <code>body</code>.
 *
 * @param body the code block to be executed
 */
def loop(body: => Unit): Unit = body andThen loop(body)
This is confusing, but what happens is that the block that comes after loop (the thing between { and }) is passed to the method seq as first argument, and a new loop with that block is passed as the second argument.
As for the method seq, in the trait Actor, we find:
private def seq[a, b](first: => a, next: => b): Unit = {
  val s = Actor.self
  val killNext = s.kill
  s.kill = () => {
    s.kill = killNext

    // to avoid stack overflow:
    // instead of directly executing `next`,
    // schedule as continuation
    scheduleActor({ case _ => next }, 1)
    throw new SuspendActorException
  }
  first
  throw new KillActorException
}
So, the new loop is scheduled for the next action after a kill, then the block gets executed, and then an exception of type KillActorException is thrown, which will cause the loop to be executed again.
So, a while loop performs much faster than a loop, as it throws no exceptions, does no scheduling, etc. On the other hand, the scheduler gets the opportunity to schedule something else between two executions of a loop.

Related

Mocked methods created within `withObjectMocked` not invoked when called from an `Actor`

I have published a minimal project showcasing my problem at https://github.com/Zwackelmann/mockito-actor-test
In my project I refactored a couple of components from classes to objects in all cases where the class did not really have a meaningful state. Since some of these objects establish connections to external services that need to be mocked, I was happy to see that mockito-scala introduced the withObjectMocked context function, which allows mocking objects within the scope of the function.
This feature worked perfectly for me until I introduced Actors in the mix, which would ignore the mocked functions despite being in the withObjectMocked context.
For an extended explanation of what I did, check out my GitHub example project linked above, which is ready to be executed via sbt run.
My goal is to mock the doit function below. It should not be called during tests, so for this demonstration it simply throws a RuntimeException.
object FooService {
  def doit(): String = {
    // I don't want this to be executed in my tests
    throw new RuntimeException(f"executed real impl!!!")
  }
}
The FooService.doit function is only called from the FooActor.handleDoit function. This function is called by the FooActor after receiving the Doit message or when invoked directly.
object FooActor {
  val outcome: Promise[Try[String]] = Promise[Try[String]]()

  case object Doit

  def apply(): Behavior[Doit.type] = Behaviors.receiveMessage { _ =>
    handleDoit()
    Behaviors.same
  }

  // moved out actual doit behavior so I can compare calling it directly with calling it from the actor
  def handleDoit(): Unit = {
    try {
      // invoke `FooService.doit()`: if the mock works correctly it should return the "mock result",
      // otherwise the `RuntimeException` from the real implementation will be thrown
      val res = FooService.doit()
      outcome.success(Success(res))
    } catch {
      case ex: RuntimeException =>
        outcome.success(Failure(ex))
    }
  }
}
To mock FooService.doit I used withObjectMocked as follows. All of the following code is within this block. To ensure that the block is not left due to asynchronous execution, I Await the result of the FooActor.outcome Promise.
withObjectMocked[FooService.type] {
  // mock `FooService.doit()`: the real method throws a `RuntimeException` and should never be called during tests
  FooService.doit() returns {
    "mock result"
  }

  // [...]
}
I now have two test setups: The first simply calls FooActor.handleDoit directly
def simpleSetup(): Try[String] = {
  FooActor.handleDoit()
  val result: Try[String] = Await.result(FooActor.outcome.future, 1.seconds)
  result
}
The second setup triggers FooActor.handleDoit via the Actor
def actorSetup(): Try[String] = {
  val system: ActorSystem[FooActor.Doit.type] = ActorSystem(FooActor(), "FooSystem")

  // trigger actor to call `handleDoit`
  system ! FooActor.Doit

  // wait for `outcome` future. The 'real' `FooService.doit` impl results in a `Failure`
  val result: Try[String] = Await.result(FooActor.outcome.future, 1.seconds)
  system.terminate()
  result
}
Both setups wait for the outcome promise to finish before exiting the block.
By switching between simpleSetup and actorSetup I can test both behaviors. Since both are executed within the withObjectMocked context, I would expect that both trigger the mocked function. However actorSetup ignores the mocked function and calls the real method.
val result: Try[String] = simpleSetup()
// val result: Try[String] = actorSetup()

result match {
  case Success(res) => println(f"finished with result: $res")
  case Failure(ex) => println(f"failed with exception: ${ex.getMessage}")
}

// simpleSetup prints: finished with result: mock result
// actorSetup prints:  failed with exception: executed real impl!!!
Any suggestions?
withObjectMocked relies on the code exercising the mock executing in the same thread as withObjectMocked (see Mockito's implementation and ThreadAwareMockHandler's check of the current thread).
Since actors execute on the threads of the ActorSystem's dispatcher (never in the calling thread), they cannot see such a mock.
You may want to investigate testing your actor using the BehaviorTestKit, which itself effectively uses a mock/stub implementation of the ActorContext and ActorSystem. Rather than spawning an actor, an instance of the BehaviorTestKit encapsulates a behavior and passes it messages which are processed synchronously in the testing thread (via the run and runOne methods). Note that the BehaviorTestKit has some limitations: certain categories of behaviors aren't really testable via the BehaviorTestKit.
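For example, a minimal sketch of what that synchronous-test route might look like (assuming mockito-scala's idiomatic stubbing syntax from the question and Akka's typed BehaviorTestKit; untested):

import akka.actor.testkit.typed.scaladsl.BehaviorTestKit
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.Success

withObjectMocked[FooService.type] {
  FooService.doit() returns "mock result"

  // BehaviorTestKit processes messages synchronously on the calling thread,
  // so the thread check behind the object mock is satisfied.
  val testKit = BehaviorTestKit(FooActor())
  testKit.run(FooActor.Doit)

  // outcome has already been completed by the time run returns
  assert(Await.result(FooActor.outcome.future, 1.second) == Success("mock result"))
}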
More broadly, I'd tend to suggest that mocking in Akka is not worth the effort: if you need pervasive mocks, that's a sign of a poor implementation. ActorRef (especially of the typed variety) is IMO the ultimate mock: encapsulate exactly what needs to be mocked into its own actor with its own protocol and inject that ActorRef into the behavior under test. Then you validate that the behavior under test holds up its end of the protocol correctly. If you also want to validate the encapsulating actor (which should be simple enough to be obviously correct, but if you want or need to spend the effort to get those coverage numbers up...), you can use the BehaviorTestKit trick as above (and since the only thing that behavior does is exercise the mocked functionality, it almost certainly won't fall into the category of behaviors which aren't testable with the BehaviorTestKit).

Can a synchronized block with a future cause a deadlock

Say I do the following:
def foo: Future[Int] = ...

var cache: Option[Int] = None

def getValue: Future[Int] = synchronized {
  cache match {
    case Some(value) => Future(value)
    case None =>
      foo.map { value =>
        cache = Some(value)
        value
      }
  }
}
Is there a risk of deadlock with the above code? Or can I assume that the synchronized block applies even within the Future's map block?
For a deadlock to exist, at least two different lock operations have to be involved (acquired in a possibly inconsistent order).
From what you show here (though we do not see what the foo implementation is), this is not the case. Only one lock exists and it is reentrant (if you try to enter the same synchronized block twice from the same thread, you won't lock yourself out).
Therefore, no deadlock is possible from the code you've shown.
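A small illustration of the reentrancy mentioned above (hypothetical helper methods, just to show the point):

val lock = new Object

def outer(): Unit = lock.synchronized { inner() }

// same thread, same lock: this re-entry succeeds rather than deadlocking
def inner(): Unit = lock.synchronized { println("still running, no deadlock") }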
Still, I question this design. Maybe it is a simplification of your actual code, but from what I understand, you have:
A function that can generate an Int
You want to call this function only once and cache its result
I'd simplify your implementation greatly if that's the case:
def expensiveComputation(): Int = ???

val foo = Future { expensiveComputation() }

def getValue: Future[Int] = foo
You'd have a single call to expensiveComputation (per instance of your enclosing object) and a thread-safe cache of its return value, because Future is in and of itself a concurrency-safe construct.
Note that Future itself functions as a cache (see GPI's answer). However, GPI's answer isn't quite equivalent to your code: your code will only cache a successful value and will retry, while if the initial call to expensiveComputation in GPI's answer fails, getValue will always fail.
This however, gives us retry until successful:
def foo: Future[Int] = ???
private def retryFoo(): Future[Int] = foo.recoverWith{ case _ => retryFoo() }
lazy val getValue: Future[Int] = retryFoo()
In general, anything related to Futures which is asynchronous will not respect the synchronized block, unless you happen to Await on the asynchronous part within the synchronized block (which kind of defeats the point). In your case, it's absolutely possible for the following sequence (among many others) to occur:
Initial state: cache = None
Thread A calls getValue, obtains lock
Thread A pattern matches to None, calls foo to get a Future[Int] (fA0), schedules a callback to run in some thread B on fA0's successful completion (fA1)
Thread A releases lock
Thread A returns fA1
Thread C calls getValue, obtains lock
Thread C pattern matches to None, calls foo to get a Future[Int] (fC0), schedules a callback to run in some thread D on fC0's successful completion (fC1)
fA0 completes successfully with value 42
Thread B runs callback on fA0, sets cache = Some(42), completes fA1 successfully with value 42
Thread C releases lock
Thread C returns fC1
fC0 completes successfully with value 7
Thread D runs callback on fC0, sets cache = Some(7), completes fC1 successfully with value 7
The code above can't deadlock, but there's no guarantee that foo will successfully complete exactly once (it could successfully complete arbitrarily many times), nor is there any guarantee as to which particular value of foo will be returned by a given call to getValue.
EDIT to add: You could also replace
cache = Some(value)
value
with
cache.synchronized { cache = cache.orElse(Some(value)) }
cache.get
Which would prevent cache from being assigned to multiple times (i.e. it would always contain the value returned by the first map callback to execute on a future returned by foo). It probably still wouldn't deadlock (I find that if I have to reason about a deadlock, my time is probably better spent reasoning about a better abstraction), but is this elaborate/verbose machinery better than just using a retry-on-failure Future as a cache?
No, but synchronized isn't actually doing much here. getValue returns almost immediately with a Future (which may or may not be completed yet), so the lock on getValue is extremely short-lived. It does not wait for foo.map to evaluate before releasing the lock, because that is executed only after foo is completed, which will almost certainly happen after getValue returns.

Atomic compareAndSet parameters are evaluated even if it's not used

I have the following code that sets an Atomic variable (both java.util.concurrent.atomic and monix.execution.atomic behave the same):
class Foo {
  val s = AtomicAny(null: String)

  def foo() = {
    println("called")
    /* Side Effects */
    "foo"
  }

  def get(): String = {
    s.compareAndSet(null, foo())
    s.get
  }
}

val f = new Foo
f.get // Foo.s set from null to foo, print called
f.get // Foo.s not updated, but still print called
The second time compareAndSet is called, it does not update the value, but foo is still called. This is causing a problem because foo has side effects (in my real code, it creates an Akka actor and gives me an error because it tries to create duplicate actors).
How can I make sure the second parameter is not evaluated unless it is actually used? (Preferably not using synchronized)
I need to pass an implicit parameter to foo, so a lazy val would not work. E.g.
lazy val s = get() // Error: cannot provide implicit parameter

def foo()(implicit context: Context) = {
  println("called")
  /* Side Effects */
  "foo"
}

def get()(implicit context: Context): String = {
  s.compareAndSet(null, foo())
  s.get
}
Updated answer
The quick answer is to put this code inside an actor and then you don't have to worry about synchronisation.
If you are using Akka Actors you should never need to do your own thread synchronisation using low-level primitives. The whole point of the actor model is to limit the interaction between threads to just passing asynchronous messages. This provides all the thread synchronisation that you need and guarantees that an actor processes a single message at a time in a single-threaded manner.
You should definitely not have a function that is accessed simultaneously by multiple threads that creates a singleton actor. Just create the actor when you have the information you need and pass the ActorRef to any other actors that need it using dependency injection or a message. Or create the actor at the start and initialise it when the first message arrives (using context.become to manage the actor state).
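As a rough sketch of the "initialise on the first message" idea with classic Akka actors (all names here are hypothetical, not from the question):

import akka.actor.Actor

object Holder {
  case class Init(info: String)
  case object GetFoo
}

class Holder extends Actor {
  import Holder._

  // start uninitialised; switch behaviour once the needed information arrives
  def receive: Receive = {
    case Init(info) =>
      val foo = createFoo(info) // the side-effecting creation happens exactly once
      context.become(initialised(foo))
  }

  private def initialised(foo: String): Receive = {
    case GetFoo => sender() ! foo
  }

  private def createFoo(info: String): String = s"foo($info)"
}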
Original answer
The simplest solution is just to use a lazy val to hold your instance of foo:
class Foo {
  lazy val foo = {
    println("called")
    /* Side Effects */
    "foo"
  }
}
This will create foo the first time it is used and after that will just return the same value.
If this is not possible for some reason, use an AtomicInteger initialised to 0 and then call incrementAndGet. If this returns 1 then it is the first pass through this code and you can call foo.
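A minimal sketch of that guard, keeping the names from the question (Context is a stand-in trait here, and the window in which a losing caller may still see null is only noted, not handled, so treat this as an outline rather than a drop-in replacement):

import java.util.concurrent.atomic.AtomicInteger
import monix.execution.atomic.AtomicAny

trait Context // stand-in for the implicit parameter from the question

class Foo {
  private val firstCall = new AtomicInteger(0)
  private val s = AtomicAny(null: String)

  def foo()(implicit context: Context): String = {
    println("called") // side effects live here and must run only once
    "foo"
  }

  def get()(implicit context: Context): String = {
    // incrementAndGet returns 1 for exactly one caller, so foo() is evaluated at most once
    if (firstCall.incrementAndGet() == 1) {
      s.set(foo())
    }
    // a caller that loses the race may still see null here until the winner finishes;
    // that window has to be handled (or avoided) in real code
    s.get
  }
}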
Explanation:
Atomic operations such as compareAndSet require support from the CPU instruction set, and modern processors have single atomic instructions for such operations. In some cases (e.g. cache line is held exclusively by this processor) the operation can be very fast. Other cases (e.g. cache line also in cache of another processor) the operation can be significantly slower and can impact other threads.
The result is that the CPU must be holding the new value before the atomic instruction is executed. So the value must be computed before it is known whether it is needed or not.

Why is it not recommended to retrieve value from Scala's Future?

I started working on Scala very recently and came across its feature called Future. I had posted a question asking for help with my code and got some help from it.
In that conversation, I was told that it is not recommended to retrieve the value from a Future.
I understand that it is a parallel process when executed, but if retrieving the value of a Future is not recommended, how/when do I access its result? If the purpose of a Future is to run a thread/process independent of the main thread, why is it not recommended to access it? Will the Future automatically assign its output to its caller? If so, how would we know when to access it?
I wrote the below code to return a Future with a Map[String, String].
def getBounds(incLogIdMap: scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
  var boundsMap = scala.collection.mutable.Map[String, String]()
  incLogIdMap.keys.foreach(table =>
    if (!incLogIdMap(table).contains("INVALID")) {
      val minMax = s"select max(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) maxTms, min(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) minTms from queue.${table} where key_ids in (${incLogIdMap(table)})"
      val boundsDF = spark.read.format("jdbc").option("url", commonParams.getGpConUrl()).option("dbtable", s"(${minMax}) as ctids")
        .option("user", commonParams.getGpUserName()).option("password", commonParams.getGpPwd()).load()
      val maxTms = boundsDF.select("minTms").head.getLong(0).toString + "," + boundsDF.select("maxTms").head.getLong(0).toString
      boundsMap += (table -> maxTms)
    }
  )
  boundsMap
}
If I have to use the value returned from the method getBounds, can I access it in the way below?
val tmsobj = new MinMaxVals(spark, commonParams)

tmsobj.getBounds(incLogIds) onComplete ({
  case Success(Map) => val boundsMap = tmsobj.getBounds(incLogIds)
  case Failure(value) => println("Future failed..")
})
Could anyone care to clear my doubts ?
As the others have pointed out, waiting to retrieve a value from a Future defeats the whole point of launching the Future in the first place.
But onComplete() doesn't cause the rest of your code to wait, it just attaches extra instructions to be carried out as part of the Future thread while the rest of your code goes on its merry way.
So what's wrong with your proposed code to access the result of getBounds()? Let's walk through it.
tmsobj.getBounds(incLogIds) onComplete { // launch Future, when it completes ...
  case Success(m) => // if Success then store the result Map in local variable "m"
    val boundsMap = tmsobj.getBounds(incLogIds) // launch a new and different Future
    // boundsMap is a local variable, it disappears after this code block
  case Failure(value) => // if Failure then store error in local variable "value"
    println("Future failed..") // send some info to STDOUT
} // end of code block
You'll note that I changed Success(Map) to Success(m) because Map is a type (it's a companion object) and can't be used to match the result of your Future.
In conclusion: onComplete() doesn't cause your code to wait on the Future, which is good, but it is somewhat limited because it returns Unit, i.e. it has no return value with which it can communicate the result of the Future.
TLDR; Futures are not meant to manage shared state but they are good for composing asynchronous pieces of code. You can use map, flatMap and many other operations to combine Futures.
The computation that the Future represents will be executed using the given ExecutionContext (usually given implicitly), which will usually be on a thread-pool, so you are right to assume that the Future computation happens in parallel. Because of this concurrency, it is generally not advised to mutate state that is shared from inside the body of the Future, for example:
var i: Int = 0

val f: Future[Unit] = Future {
  // Some computation
  i = 42
}
Because you then run the risk of also accessing/modifying i in another thread (maybe the "main" one). In this kind of concurrent access situation, Futures would probably not be the right concurrency model, and you could imagine using monitors or message-passing instead.
Another possibility that is tempting but also discouraged is to block the main thread until the result becomes available:
val f: Future[Int] = Future { 42 }
val i: Int = Await.result(f, Duration.Inf)
The reason this is bad is that you completely block the main thread, negating the benefits of having concurrent execution in the first place. If you do this too much, you might also run into trouble because of a large number of threads that are blocked and hogging resources.
How do you then know when to access the result? You don't and it's actually the reason why you should try to compose Futures as much as possible, and only subscribe to their onComplete method at the very edge of your application. It's typical for most of your methods to take and return Futures, and only subscribe to them in very specific places.
It is not recommended to wait for a Future using Await.result because this blocks the execution of the current thread until some unknown point in the future, possibly forever.
It is perfectly OK to process the value of a Future by passing a processing function to a call such as map on the Future. This will call your function when the future is complete. The result of map is another Future, which can, in turn, be processed using map, onComplete or other methods.
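For example, a rough sketch of what that composition could look like with the getBounds method from the question (names like tmsobj and incLogIds are taken from the question; the surrounding setup is assumed):

import scala.collection.mutable
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

val bounds: Future[mutable.Map[String, String]] = tmsobj.getBounds(incLogIds)

// keep working inside the Future instead of extracting the value
val summary: Future[String] = bounds.map(m => s"loaded bounds for ${m.size} tables")

// subscribe only at the very edge of the application
summary.onComplete {
  case Success(msg) => println(msg)
  case Failure(ex)  => println(s"getBounds failed: ${ex.getMessage}")
}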

Scalaz Task not starting

I'm trying to make an asynchronous call using a scalaz Task.
For some strange reason, whenever I attempt to call methodA, although I get a scalaz.concurrent.Task returned, I never see the 2nd print statement, which leads me to believe that the asynchronous call never completes. Is there something that I'm missing that is required in order for me to start the task?
for {
  foo <- println("we are in this for loop!").asAsyncSuccess()
  authenticatedUser <- methodA(parameterA)
  foo2 <- println("we are out of this this call!").asAsyncSuccess()
} yield {
  authenticatedUser
}

def methodA(parameterA: SomeType): Task[User]
First off, Scalaz Task is lazy: it won't start when you create it or get it from some method. This is by design and allows Tasks to combine well. A Task is not intended to start until you have fully assembled your entire program into a single main Task; when you're done, you need to manually start that aggregated Task using one of the perform methods, like unsafePerformSync, which will wait for the result of the aggregate async computation on the current thread:
(for {
  foo <- println("we are in this for loop!").asAsyncSuccess()
  authenticatedUser <- methodA(parameterA)
  foo2 <- println("we are out of this this call!").asAsyncSuccess()
} yield {
  authenticatedUser
}).unsafePerformSync
As you can see from your original code, you never started any Task. However, you've mentioned that the first println printed its message to the console. This may be related to the asAsyncSuccess method: I've not found it in Scalaz, so I assume it's in an implicit class somewhere in your code:
implicit class TaskOps[A](val a: A) extends AnyVal {
  def asAsyncSuccess(): Task[A] = Task(a)
}
If you write a helper implicit class like this, it won't convert an effectful expression into a lazy Task action, despite the fact that Task.apply is call-by-name. It will only convert its result, which happens to be just Unit, because its own parameter a is not call-by-name. To work as intended, it needs to be like the following:
implicit class TaskOps[A](a: => A) {
  def asAsyncSuccess(): Task[A] = Task(a)
}
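With the by-name version, the side effect is deferred until the task is actually run; a quick sketch using the corrected TaskOps above (assuming scalaz 7.2's unsafePerformSync):

import scalaz.concurrent.Task

val t: Task[Unit] = println("deferred!").asAsyncSuccess()
// nothing has been printed yet; the Task only describes the action

t.unsafePerformSync // "deferred!" is printed here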
You may also ask why the first println prints something while the second doesn't. This is related to for-comprehensions desugaring into method calls. Specifically, the compiler turns your code into this:
println("we are in this for loop!").asAsyncSuccess().flatMap(foo =>
methodA(parameterA).flatMap(authenticatedUser =>
println("we are out of this this call!").asAsyncSuccess().map(foo2 =>
authenticatedUser)))
If asAsyncSuccess doesn't respect intended Task semantics, first println gets executed immediately. But its result is then wrapped into Task that is, essentially, just a wrapper over start function, and flatMap implementation on it just combines such functions together without running them. So, methodA or second println won't be executed until you call unsafePerformSync or other perform helper method. Note, however, that second println will still be executed earlier than it should according to Task semantics, just not that earlier.