why does this scala by-name parameter behave weirdly - scala

OK the question might not say much, but here's the deal:
I'm learning scala and decided to make an utility class "FuncThread" with a method which receives a by-name parameter function (I guess its called that because it's a function but without a parameter list) and then starts a thread with a runable which in turn executes the passed function, I wrote such a class as follows:
class FuncThread
{
def runInThread( func: => Unit)
{
val thread = new Thread(new Runnable()
{
def run()
{
func
}
}
thread.start()
}
}
Then I wrote a junit test as follows:
#Test
def weirdBehaivorTest()
{
var executed = false
val util = new FuncThread()
util.runInThread
{
executed = true
}
//the next line makes the test pass....
//val nonSense : () => Unit = () => { Console println "???" }
assertTrue(executed)
}
If I uncomment the second commented line, the test passes but if it remains commented the test fails, is this the correct behaviour? how and when do by-name parameter functions get executed?
I know Scala has the actors library but I wanted to try this since I've always wanted to do this in Java

Is this just a race condition? runInThread starts the thread but your assertion tests 'executed' before the other thread sets it to true. Adding your extra line means more code (and so time) is executed before the test, making it more likely that 'executed' has been set to true

It's also worth noting that (as of Scala 2.8), the construct you were trying to write is available in the standard library
import scala.actors.Futures._
future{
executed = true
}
This construct is actually more powerful than what you're describing, the thread calculation can return a value, and which can be waited for.
import scala.actors.Futures._
//forks off expensive calculation
val expensiveToCalculateNumber:Future[Int] = future{
bigExpensiveCalculation()
}
// do a lot of other stuff
//print out the result of the expensive calculation if it's ready, otherwise wait until it is
println( expensiveToCalculateNumber());

Related

Mocked methods created within `withObjectMocked` not invoked when called from an `Actor`

I have published a minimal project showcasing my problem at https://github.com/Zwackelmann/mockito-actor-test
In my project I refactored a couple of components from classes to objects in all cases where the class did not really have a meaningful state. Since some of these objects establish connections to external services that need to be mocked, I was happy to see that mockito-scala introduced the withObjectMocked context function, which allows mocking objects within the scope of the function.
This feature worked perfectly for me until I introduced Actors in the mix, which would ignore the mocked functions despite being in the withObjectMocked context.
For an extended explanation what I did check out my github example project from above which is ready to be executed via sbt run.
My goal is to mock the doit function below. It should not be called during tests, so for this demonstration it simply throws a RuntimeException.
object FooService {
def doit(): String = {
// I don't want this to be executed in my tests
throw new RuntimeException(f"executed real impl!!!")
}
}
The FooService.doit function is only called from the FooActor.handleDoit function. This function is called by the FooActor after receiving the Doit message or when invoked directly.
object FooActor {
val outcome: Promise[Try[String]] = Promise[Try[String]]()
case object Doit
def apply(): Behavior[Doit.type] = Behaviors.receiveMessage { _ =>
handleDoit()
Behaviors.same
}
// moved out actual doit behavior so I can compare calling it directly with calling it from the actor
def handleDoit(): Unit = {
try {
// invoke `FooService.doit()` if mock works correctly it should return the "mock result"
// otherwise the `RuntimeException` from the real implementation will be thrown
val res = FooService.doit()
outcome.success(Success(res))
} catch {
case ex: RuntimeException =>
outcome.success(Failure(ex))
}
}
}
To mock Foo.doit I used withObjectMocked as follows. All following code is within this block. To ensure that the block is not left due to asynchronous execution, I Await the result of the FooActor.outcome Promise.
withObjectMocked[FooService.type] {
// mock `FooService.doit()`: The real method throws a `RuntimeException` and should never be called during tests
FooService.doit() returns {
"mock result"
}
// [...]
}
I now have two test setups: The first simply calls FooActor.handleDoit directly
def simpleSetup(): Try[String] = {
FooActor.handleDoit()
val result: Try[String] = Await.result(FooActor.outcome.future, 1.seconds)
result
}
The second setup triggers FooActor.handleDoit via the Actor
def actorSetup(): Try[String] = {
val system: ActorSystem[FooActor.Doit.type] = ActorSystem(FooActor(), "FooSystem")
// trigger actor to call `handleDoit`
system ! FooActor.Doit
// wait for `outcome` future. The 'real' `FooService.doit` impl results in a `Failure`
val result: Try[String] = Await.result(FooActor.outcome.future, 1.seconds)
system.terminate()
result
}
Both setups wait for the outcome promise to finish before exiting the block.
By switching between simpleSetup and actorSetup I can test both behaviors. Since both are executed within the withObjectMocked context, I would expect that both trigger the mocked function. However actorSetup ignores the mocked function and calls the real method.
val result: Try[String] = simpleSetup()
// val result: Try[String] = actorSetup()
result match {
case Success(res) => println(f"finished with result: $res")
case Failure(ex) => println(f"failed with exception: ${ex.getMessage}")
}
// simpleSetup prints: finished with result: mock result
// actorSetup prints: failed with exception: executed real impl!!!
Any suggestions?
withObjectMock relies on the code exercising the mock executing in the same thread as withObjectMock (see Mockito's implementation and see ThreadAwareMockHandler's check of the current thread).
Since actors execute on the threads of the ActorSystem's dispatcher (never in the calling thread), they cannot see such a mock.
You may want to investigate testing your actor using the BehaviorTestKit, which itself effectively uses a mock/stub implementation of the ActorContext and ActorSystem. Rather than spawning an actor, an instance of the BehaviorTestKit encapsulates a behavior and passes it messages which are processed synchronously in the testing thread (via the run and runOne methods). Note that the BehaviorTestKit has some limitations: certain categories of behaviors aren't really testable via the BehaviorTestKit.
More broadly, I'd tend to suggest that mocking in Akka is not worth the effort: if you need pervasive mocks, that's a sign of a poor implementation. ActorRef (especially of the typed variety) is IMO the ultimate mock: encapsulate exactly what needs to be mocked into its own actor with its own protocol and inject that ActorRef into the behavior under test. Then you validate that the behavior under test holds up its end of the protocol correctly. If you want to validate the encapsulation (which should be as simple as possible to the extent that it's obviously correct, but if you want/need to spend effort on getting those coverage numbers up...) you can do the BehaviorTestKit trick as above (and since the only thing the behavior is doing is exercising the mocked functionality, it almost certainly won't be in the category of behaviors which aren't testable with the BehaviorTestKit).

Scala methods with no argument evaluating on definition in zeppelin

I'm writing a method in zeppelin that will update several DataFrames, to be called as part of initializing my code.
The pattern we're following is to define all initialization methods in their own paragraphs, and then call them as part of a block.
def init(nc: NotebookContext) = {
method1()
method2()
}
However, for most definition signatures of methods without parameters, it appears that zeppelin is actually calling and evaluating the last method in a paragraph. This is a problem, because when the method is called later, it means the transformations have been applied to the DataFrame twice, which is not desired.
Is this a function of scala, or a quirk of zeppelin, or both? Why do some of these declarations evaluate immediately, while others wait to be called?
Assume the below methods are each defined in their own zeppelin paragraph
def runsAutomatically(): Unit = { println("test") }
//runsAutomatically: ()Unit
//test
def runsAutomatically2 = { println("test2") }
//runsAutomatically2: Unit
//test2
def waitsForDefinition= () => { println("test") }
//waitsForDefinition: () => Unit
I understand that there is a difference in scala between functions/methods with no parameter lists, and a single parameter list with no parameters, but I don't know why these different version would change when things get executed.
Finally if done in a single paragraph:
def runsAutomatically(): Unit = { println("test") }
def runsAutomatically2 = { println("test") }
//runsAutomatically: ()Unit
//runsAutomatically2: Unit
//test2
Is this just a quirk of zeppelins, or something about Scala I'm missing?
Because in scala, a def without an empty parameter list is a strict value; In the end it's actually just a val.
Scala is a strict language and not making the function a thunk by not adding an empty parameter list will actually be evaluated immediately.
You are right, none of these methods should get evaluated automatically. At least, not in pure Scala.
def runsAutomatically(): Unit = { println("test") }
def runsAutomatically2 = { println("test2") }
def waitsForDefinition= () => { println("test") }
You should blame Zeppelin for that then. What version of Zeppelin are you using? I don't see this problem in Zeppelin 0.9.0 - maybe an upgrade would be the option for you?

Scalaz Task not starting

I'm trying to make an asyncronous call using scalaz Task.
For some strange reason whenever I attempt to call methodA, although I get a scalaz.concurrent.Task returned I never see the 2nd print statement which leads me to believe that the asynchronous call is never returned. Is there something that I'm missing that is required in order for me to start the task?
for {
foo <- println("we are in this for loop!").asAsyncSuccess()
authenticatedUser <- methodA(parameterA)
foo2 <- println("we are out of this this call!").asAsyncSuccess()
} yield {
authenticatedUser
}
def methodA(parameterA: SomeType): Task[User]
First off, Scalaz Task is lazy: it won't start when you create it or get from some method. It's by design and allows Tasks to combine well. It's not intended to start until you fully assemble your entire program into single main Task, so when you're done, you need to manually start an aggregated Task using one of the perform methods, like unsafePerformSync that will wait for the result of aggregate async computation on a current thread:
(for {
foo <- println("we are in this for loop!").asAsyncSuccess()
authenticatedUser <- methodA(parameterA)
foo2 <- println("we are out of this this call!").asAsyncSuccess()
} yield {
authenticatedUser
}).unsafePerformSync
As you can see from your original code, you've never started any Task. However, you've mentioned that first println printed its message to console. This may be related to asAsyncSuccess method: I've not found it in Scalaz, so I assume it's in implicit class somewhere in your code:
implicit class TaskOps[A](val a: A) extends AnyVal {
def asAsyncSuccess(): Task[A] = Task(a)
}
If you write a helper implicit class like this, it won't convert an effectful expression into lazy Task action, despite the fact that Task.apply
is call-by-name. It will only convert its result, which happens to be just Unit, because its own parameter a is not call-by-name. To work as intended, it needs to be like the following:
implicit class TaskOps[A](a: => A) {
def asAsyncSuccess(): Task[A] = Task(a)
}
You may also ask why first println prints something while second isn't. This is related to for-comprehensions desugaring into method calls. Specifically, compiler turns your code into this:
println("we are in this for loop!").asAsyncSuccess().flatMap(foo =>
methodA(parameterA).flatMap(authenticatedUser =>
println("we are out of this this call!").asAsyncSuccess().map(foo2 =>
authenticatedUser)))
If asAsyncSuccess doesn't respect intended Task semantics, first println gets executed immediately. But its result is then wrapped into Task that is, essentially, just a wrapper over start function, and flatMap implementation on it just combines such functions together without running them. So, methodA or second println won't be executed until you call unsafePerformSync or other perform helper method. Note, however, that second println will still be executed earlier than it should according to Task semantics, just not that earlier.

Scala parallel calculation and interrupt when one method returns a result

Given the several methods
def methodA(args:Int):Int={
//too long calculation
result
}
def methodB(args:Int):Int={
//too long calculation
result
}
def methodC(args:Int):Int={
//too long calculation
result
}
They have the same set of arguments and return a result of the same type.
It is necessary to calculate the methods in parallel and when one method returns a result I need to interrupt others.
This was an interesting question. I'm not too experienced, so there is probably a more Scala-ish way to do this, for starters with some of the cancellable future implementations. However, I traded brevity for readability.
import java.util.concurrent.{Callable, FutureTask}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
def longRunningTask(foo: Int => String)(arg: Int): FutureTask[String] = {
new FutureTask[String](new Callable[String]() {
def call(): String = {
foo(arg)
}
})
}
def killTasks(tasks: Seq[FutureTask[_]]): Unit = {
tasks.foreach(_.cancel(false))
}
val task1 = longRunningTask(methodA)(3)
val task2 = longRunningTask(methodB)(4)
val task1Future = Future {
task1.run()
task1.get()
}
val task2Future = Future {
task2.run()
task2.get()
}
val firstFuture = Future.firstCompletedOf(List(task1Future, task2Future))
firstFuture.onSuccess({
case result => {
println(result)
killTasks(List(task1, task2))
}
})
So, what I did here? I crated a helper method which creates a FutureTask (from the Java API) which executes an Int => String operation on some Int argument. I also created a helper method which kills all FutureTasks in a collection.
After that I created two long running tasks with two different methods and inputs (that's your part).
The next part is a bit ugly, basically I run the FutureTasks with Future monad and call get to fetch the result.
After that I basically only use the firstCompletedOf method from the Future companion to handle the first finishing task, and when that is processed kill the rest of the tasks.
I'll probably make an effort to make this code a bit better, but start with this.
You should also have a go in checking this with different 'ExecutionContext's, and parallelism levels.

Reducing code repetition with recurring code in if-else

I'm wondering if the following short snippet, which does show repetition can be made more DRY. I seem to be hitting these kind of constructions quite often.
Say I want some computation to be done either synchronous or asynchronous, which is chosen at runtime.
for(i <- 1 to reps) {
Thread.sleep(expDistribution.sample().toInt)
if (async) {
Future {
sqlContext.sql(query).collect()
}
} else {
sqlContext.sql(query).collect()
}
}
It feels clumsy repeating the call to the sqlContext. Is there an idiom for this trivial recurring construct?
You can "store" your computation in a local def and then evaluate it either synchronously or asynchronously
def go = sqlContext.sql(query).collect()
if(async) Future(go) else Future.successful(go)
You can execture Future in your current thread using MoreExecutors.directExecutor() which is implemented in guava library.
(If you don't wan't to use guava library, see this question)
Using this method, you can switch the execution context according to async flag.
Here's the sample code.
You can see that setting async flag to false makes each Future executed in order.
import com.google.common.util.concurrent.MoreExecutors
import scala.concurrent.{Future,ExecutionContext}
import java.lang.Thread
object Main {
def main(args:Array[String]){
val async = false // change this to switch sync/async
implicit val ec = if(async){
ExecutionContext.Implicits.global // use thread pool
}else{
ExecutionContext.fromExecutor(MoreExecutors.directExecutor) // directy execute in current thread
}
println(Thread.currentThread.getId)
Future{
Thread.sleep(1000)
println(Thread.currentThread.getId)
}
Future{
Thread.sleep(2000)
println(Thread.currentThread.getId)
}
Thread.sleep(4000) // if you are doing asynchronously, you need this.
}
}