Debugging Scala futures - how to determine the future's execution context

Related to another question I posted (Scala futures - keeping track of request context when threadId is irrelevant).
When debugging a future, the call stack isn't very informative, as the calling context is usually in another thread and another time.
This is especially problematic when different paths can lead to the same future code (for instance, a DAO called from many places in the code, etc.).
Do you know of an elegant solution for this?
I was thinking of passing a token/request ID (for flows started by a web-server request), but this would require passing it around, and it also wouldn't include any of the state you can see in a stack trace.
Perhaps passing a stack around? :)

Suppose you make a class
case class Context(requestId: Int /* , other things you need to pass around */)
There are two basic ways to send it around implicitly:
1) Add an implicit Context parameter to any function that requires it:
def processInAnotherThread(/* explicit arguments */)(
    implicit executionContext: scala.concurrent.ExecutionContext,
    context: Context): Future[Result] = ???

def processRequest = {
  /* ... */
  implicit val context: Context = Context(getRequestId /* , ... */)
  processInAnotherThread(/* explicit parameters */)
}
The drawback is that every function that needs to access Context must have this parameter and it litters the function signatures quite a bit.
2) Put it into a DynamicVariable:
// Context companion object
object Context {
  val context: DynamicVariable[Context] =
    new DynamicVariable[Context](Context(0 /* , ... */))
}
def processInAnotherThread(/* explicit arguments */)(
    implicit executionContext: scala.concurrent.ExecutionContext
): Future[Result] = {
  // get requestId from the context
  Context.context.value.requestId
  /* ... */
}
def processRequest = {
  /* ... */
  Context.context.withValue(Context(getRequestId /* , ... */)) {
    processInAnotherThread(/* explicit parameters */)
  }
}
The drawbacks are that:
- It's not immediately clear deep inside the processing that some context is available, or what its contents are, and referential transparency is broken. I believe it's better to strictly limit the number of available DynamicVariables - preferably no more than one, or at most two - and to document their use.
- Context must either have default values or nulls for all its contents, or it must itself be null by default (new DynamicVariable[Context](null)). Forgetting to initialize Context or its contents before processing may lead to nasty errors (see the sketch below).
DynamicVariable is still much better than some global variable and doesn't influence the signatures of the functions that don't use it directly in any way.
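One way to harden the null-initialisation pitfall above (my own sketch, not part of the original answer) is to default the DynamicVariable to an Option and fail fast on access:
import scala.util.DynamicVariable

object Context {
  val context: DynamicVariable[Option[Context]] =
    new DynamicVariable[Option[Context]](None)

  // Fail fast with a clear message instead of a NullPointerException later.
  def current: Context =
    context.value.getOrElse(sys.error("Context not initialised for this flow"))
}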
In both cases you may update the contents of an existing Context with the copy method of the case class. For example:
def deepInProcessing(/* ... */): Future[Result] =
  Context.context.withValue(
    Context.context.value.copy(someParameter = newParameterValue)
  ) {
    processFurther(/* ... */)
  }

Related

Atomic compareAndSet parameters are evaluated even when not used

I have the following code that sets an atomic variable (both java.util.concurrent.atomic and monix.execution.atomic behave the same):
class Foo {
  val s = AtomicAny(null: String)

  def foo() = {
    println("called")
    /* Side Effects */
    "foo"
  }

  def get(): String = {
    s.compareAndSet(null, foo())
    s.get
  }
}
val f = new Foo
f.get // Foo.s set from null to "foo", prints "called"
f.get // Foo.s not updated, but still prints "called"
The second time compareAndSet is called it does not update the value, but foo is still called. This causes a problem because foo has side effects (in my real code it creates an Akka actor, and I get an error because it tries to create a duplicate actor).
How can I make sure the second parameter is not evaluated unless it is actually used? (Preferably not using synchronized)
I need to pass an implicit parameter to foo, so a lazy val would not work. E.g.:
lazy val s = get() // Error: cannot provide implicit parameter

def foo()(implicit context: Context) = {
  println("called")
  /* Side Effects */
  "foo"
}

def get()(implicit context: Context): String = {
  s.compareAndSet(null, foo())
  s.get
}
Updated answer
The quick answer is to put this code inside an actor and then you don't have to worry about synchronisation.
If you are using Akka Actors you should never need to do your own thread synchronisation using low-level primitives. The whole point of the actor model is to limit the interaction between threads to just passing asynchronous messages. This provides all the thread synchronisation that you need and guarantees that an actor processes a single message at a time in a single-threaded manner.
You should definitely not have a function that is accessed simultaneously by multiple threads that creates a singleton actor. Just create the actor when you have the information you need and pass the ActorRef to any other actors that need it using dependency injection or a message. Or create the actor at the start and initialise it when the first message arrives (using context.become to manage the actor state).
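As a rough illustration of that last suggestion, an actor can start uninitialised and switch behaviour once the first message arrives. This is only a sketch: the Context and ChildActor names are assumptions, not code from the question.
import akka.actor.{Actor, ActorRef, Props}

// Hypothetical wrapper actor: creates its child on the first message,
// then forwards everything else to it.
class Worker extends Actor {
  def receive: Receive = uninitialised

  def uninitialised: Receive = {
    case ctx: Context => // the first message carries the data foo() needed
      val child: ActorRef = context.actorOf(Props(new ChildActor(ctx)))
      context.become(initialised(child))
  }

  def initialised(child: ActorRef): Receive = {
    case msg => child.forward(msg)
  }
}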
Original answer
The simplest solution is just to use a lazy val to hold your instance of foo:
class Foo {
lazy val foo = {
println("called")
/* Side Effects */
"foo"
}
}
This will create foo the first time it is used and after that will just return the same value.
If this is not possible for some reason, use an AtomicInteger initialised to 0 and then call incrementAndGet. If this returns 1 then it is the first pass through this code and you can call foo.
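A minimal sketch of that suggestion (my interpretation, keeping the implicit parameter from the question):
import java.util.concurrent.atomic.AtomicInteger

class Foo {
  private val passes = new AtomicInteger(0)
  @volatile private var s: String = _

  def foo()(implicit context: Context): String = {
    println("called")
    /* Side Effects */
    "foo"
  }

  def get()(implicit context: Context): String = {
    // incrementAndGet returns 1 for exactly one caller, so foo() runs at most once.
    if (passes.incrementAndGet() == 1) s = foo()
    s // NB: a caller racing with the initialiser may still briefly see null here
  }
}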
Explanation:
Atomic operations such as compareAndSet require support from the CPU instruction set, and modern processors have single atomic instructions for such operations. In some cases (e.g. the cache line is held exclusively by this processor) the operation can be very fast. In other cases (e.g. the cache line is also in the cache of another processor) the operation can be significantly slower and can impact other threads.
The result is that the CPU must be holding the new value before the atomic instruction is executed. So the value must be computed before it is known whether it is needed or not.

Accessing variable from enclosing scope at runtime

I am working to implement a DSL that will - among other things - allow me to wrap a block of code in a context. For simplicity, the context in this example is just a string.
The code block wrapped by that context could contain anything, but will also contain calls to func1. Now, within func1 I would like to have access to the ID of the enclosing context.
def func1() = {
  // Would like to access the context ID here
  // i.e. "MY_CONTEXT"
}

def func2()(implicit x: String) = {
  // Can access x here, but that would be "OTHER".
}

def context(id: String)(body: => Unit) = {
  // How can body make use of id?
  body
}

implicit val c = "OTHER"

context("MY_CONTEXT") {
  func1()
  func2()
}
I have looked into using an implicit parameter. This can be seen with func2. However, it seems that this would need to be declared just above the call to context. This defeats the purpose, as I only want to specify it with the context call.
I hope the example code is enough to illustrate the problem. Any suggestion on how to keep the context(id)(body) construct as shown and be able to access the context ID within func1? Your help is much appreciated.
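One approach in the spirit of the DynamicVariable technique discussed earlier (a sketch, not a canonical answer to this question):
import scala.util.DynamicVariable

val currentContext = new DynamicVariable[Option[String]](None)

// The wrapped block runs with the ID bound; it is restored afterwards.
def context(id: String)(body: => Unit): Unit =
  currentContext.withValue(Some(id))(body)

def func1(): Unit =
  // Prints "MY_CONTEXT" when called inside context("MY_CONTEXT") { ... }
  println(currentContext.value.getOrElse("no context"))

context("MY_CONTEXT") {
  func1()
}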

Scala Fork-Join-All With Multiple Generic Types and 1 Generic Unit of Work

I'm attempting to write a method which accepts multiple generic types and takes as an argument a unit of work to execute.
The idea is that the unit of work is a common function that itself is generic. For the sake of example, let's say it's something like the following:
def loadModelRdd[T: TypeTag](sc: SparkContext): RDD[T] = {
  ...
}
loadModelRdd() will construct an RDD of the given type after some internal processing like loading the Model information, etc.
A prototype method I've been hacking on looks something like the following (non-working):
def forkAll[A: Manifest, B: Manifest](work: => RDD[_]): (RDD[A], RDD[B]) = {
  def aFuture = Future { work } // How can I notify that this work call returns type A?
  def bFuture = Future { work } // How can I notify that this work call returns type B?
  val res = for {
    a <- aFuture
    b <- bFuture
  } yield (a.asInstanceOf[RDD[A]], b.asInstanceOf[RDD[B]])
  Await.result(res, 10.seconds)
}
This is a shortened version of the code I'm working on as I'm actually looking at accepting as many as 10 different types.
As you can see, the overall goal of the forkAll method is to wrap the unit of work in a Future, fork-join the execution of the unit of work for each type, then return the results as a Tuple'd result. An example consumer statement would be:
val (a, b) = forkAll[ClassA, ClassB](loadModelRdd)
i.e. I want to fork-join at this point and wait for the results, but I want the executions to run in parallel and then be collected back to the Driver (the Spark Driver, to be specific).
The problem is I'm not sure how to coerce the type returned by the unit of work within forkAll when constructing the Future {} blocks. Without the forkAll, the implementation looked like the following:
val resA = loadModelRdd[ClassA](sc)
val resB = loadModelRdd[ClassB](sc)
...
I am looking at doing this for two reasons:
1) To abstract the details of fork-join for any unit of work which matches this model.
2) A version of this code, which explicitly states what the unit of work is, is working in production and was responsible for cutting the execution time of a long-running block by close to half. I have a couple of execution steps where this pattern could be applied.
Is this something that is possible in Scala's type system, or should I look at this problem from a different perspective? I've tried a couple of implementations (including one described here), but I haven't quite found one that fits my current view of the problem.
Please let me know if there is any additional information needed.
Thanks!
Short answer: Scala does not allow functions with type parameters, so what you want is not exactly possible.
You are attempting to pass a method with a type parameter. Although methods are allowed to have type parameters, functions are not. When you try to pass a method, it acts like an anonymous function, so you must specify a type.
However, since methods do allow type parameters, you can take advantage of this by creating an abstract class that will do your fork/join
abstract class ForkJoin {
  protected def work[T]: RDD[T]

  def apply[A, B]: (RDD[A], RDD[B]) = {
    // Write implementation of fork/join here
    (work[A], work[B])
  }
}
then overriding the generic work method so that it does what you want, such as calling some other pre-defined method:
val forkJoin = new ForkJoin {
  override protected def work[T]: RDD[T] =
    loadModelRdd[T](sc)
}
val (intRdd, stringRdd) = forkJoin[Int, String]
Check out this for a prototype implementation that compiles and runs without issues.
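For illustration, here is one way the apply body could actually fork the work with Futures. This is my sketch, not the linked prototype; the 10-minute timeout is an arbitrary choice.
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import org.apache.spark.rdd.RDD

abstract class ForkJoin {
  protected def work[T]: RDD[T]

  // Fork both work calls onto the implicit ExecutionContext, then join once.
  def apply[A, B](implicit ec: ExecutionContext): (RDD[A], RDD[B]) = {
    val aF = Future(work[A])
    val bF = Future(work[B])
    Await.result(aF.zip(bF), 10.minutes)
  }
}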

What does the word "Action" do in a Scala function definition using the Play framework?

I am developing a Play application and I've just started with Scala. I see that there is this word Action after the equals sign in the function below, before the curly brace.
def index = Action {
  Ok(views.html.index("Hi there"))
}
What does this code do? I've seen def index = { ... } before, but not with a word before the curly brace. I would assume that the name of the function is index, but I do not know what the word Action does in this situation.
This word is part of the Play Framework: Action is an object with an apply(block: ⇒ Result) method, so your code is actually:
def index: Action[AnyContent] = Action.apply({
  Ok.apply(views.html.index("Hi there"))
})
Your index method returns an instance of the class Action[AnyContent].
By the way, you're passing a block of code, {Ok(...)}, to the apply method. That block acts as an anonymous function here, because the required type of apply's input is not just Result but ⇒ Result: a by-name parameter, i.e. a function with no input parameters that returns a Result. So your Ok block will be executed only when the container, having received your Action instance (from the index method), decides to execute it. This simply means that you're describing an action here, not executing it: it will actually be executed when Play receives a request and finds the binding to your action in the routing file.
Also, you don't have to use def here, as you always return the same action - a val or lazy val is usually enough. You need a def only if you actually want to pass some parameter from the routing table, for instance:
GET /clients/:id controllers.SomeController.index(id: Long)

def index(id: Long) = Action { ... } // a new action is generated for every request here
Another possible approach is to choose Action, based on parameter:
def index(id: Long) = {
  if (id == 0) Action { ... } else Action { ... }
}
But usually you can use the routing table itself for that, which is better for decoupling. This example just shows that an Action is nothing more than a return value.
Update for #Kazuya
val method1 = Action { ... } // could be a def too, no big difference here

// This (the code inside the Action) is going to be called separately, after "index"
// (if method2 is requested, of course). Notice that it needs the whole request,
// so the request should be completely parsed by that time.
val method2 = Action { req => // you can extract additional params from the request
  val param1 = req.headers("header1")
  ...
}

// This is going to be called first. Notice that Play doesn't need the whole
// request body here, so it might not even be parsed at this stage.
def index(methodName: String) = methodName match {
  case "method1" => method1
  case "method2" => method2
}
GWT and Scala.js use a similar approach for client-server interaction. This is just one possible solution, to explain the importance of the methodName parameter passed from the routing table. So an action can be thought of as a wrapper over a function that in turn represents a reference to an OOP method, which makes it useful for both REST and RPC purposes.
The other answers deal with your specific case. You asked about the general case, however, so I'll attempt to answer from that perspective.
First off, def is used to define a method, not a function (better to learn that difference now). But, you're right, index is the name of that method.
Now, unlike other languages you might be familiar with (e.g., C, Java), Scala lets you define methods with an expression (as suggested by the use of the assignment operator syntax, =). That is, everything after the = is an expression that will be evaluated to a value each time the method is invoked.
So, whereas in Java you have to say:
public int three() { return 3; }
In Scala, you can just say:
def three = 3
Of course, the expression is usually more complicated (as in your case). It could be a block of code, like you're more used to seeing, in which case the value is that of the last expression in the block:
def three = {
  val a = 1
  val b = 2
  a + b
}
Or it might involve a method invocation on some other object:
def three = Numbers.add(1, 2)
The latter is, in fact, exactly what's going on in your specific example, although it requires a bit more explanation to understand why. There are two bits of magic involved:
1) If an object has an apply method, then you can treat the object as if it were a function. You can say, for example, Add(1, 2) when you really mean Add.apply(1, 2) (assuming there's an Add object with an apply method, of course). And just to be clear, it doesn't have to be an object defined with the object keyword; any object with a suitable apply method will do.
2) If a method has a single by-name parameter (e.g., def ifWaterBoiling(fn: => Tea)), then you can invoke the method like ifWaterBoiling { makeTea }. The code in that block is evaluated lazily (and may not be evaluated at all). This would be equivalent to writing ifWaterBoiling({ makeTea }). The { makeTea } part just defines an expression that gets passed in, unevaluated, for the fn parameter. (Both mechanisms are sketched together below.)
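A minimal, self-contained sketch of those two mechanisms combined (the names here are made up for illustration):
object IfWaterBoiling {
  // A single by-name parameter: the block arrives unevaluated.
  def apply(fn: => String): String = {
    println("deciding whether to evaluate the block")
    fn // the block is evaluated here (or possibly never)
  }
}

// Sugar for IfWaterBoiling.apply({ println("making tea"); "tea" })
val drink = IfWaterBoiling {
  println("making tea")
  "tea"
}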
It's the Action object being called with an expression block as its argument (the apply method is used under the hood).
Action.apply({
  Ok("Hello world")
})
A simple example (from here) is as follows (look at comments in code):
case class Logging[A](action: Action[A]) extends Action[A] {
  // apply is called with the request when the action is executed
  def apply(request: Request[A]): Result = {
    Logger.info("Calling action")
    action(request) // the wrapped action is then called with the same request
  }

  lazy val parser = action.parser
}
Now you can use it to wrap any other action value:
def index = Logging { // expression argument starts here
  Action { // the Action argument (receives the request)
    Ok("Hello World")
  }
}
Also, the def index = { case you mentioned actually returns Unit, i.e. def index: Unit = {.

Scala dependency injection: alternatives to implicit parameters

Please pardon the length of this question.
I often need to create some contextual information at one layer of my code, and consume that information elsewhere. I generally find myself using implicit parameters:
def foo(params)(implicit cx: MyContextType) = ...
implicit val context = makeContext()
foo(params)
This works, but requires the implicit parameter to be passed around a lot, polluting the method signatures of layer after layer of intervening functions, even if they don't care about it themselves.
def foo(params)(implicit cx: MyContextType) = ... bar() ...
def bar(params)(implicit cx: MyContextType) = ... qux() ...
def qux(params)(implicit cx: MyContextType) = ... ged() ...
def ged(params)(implicit cx: MyContextType) = ... mog() ...
def mog(params)(implicit cx: MyContextType) = cx.doStuff(params)
implicit val context = makeContext()
foo(params)
I find this approach ugly, but it does have one advantage though: it's type safe. I know with certainty that mog will receive a context object of the right type, or it wouldn't compile.
It would alleviate the mess if I could use some form of "dependency injection" to locate the relevant context. The quotes are there to indicate that this is different from the usual dependency injection patterns found in Scala.
The start point foo and the end point mog may exist at very different levels of the system. For example, foo might be a user login controller, and mog might be doing SQL access. There may be many users logged in at once, but there's only one instance of the SQL layer. Each time mog is called by a different user, a different context is needed. So the context can't be baked into the receiving object, nor do you want to merge the two layers in any way (like the Cake Pattern). I'd also rather not rely on a DI/IoC library like Guice or Spring. I've found them very heavy and not very well suited to Scala.
So what I think I need is something that lets mog retrieve the correct context object for it at runtime, a bit like a ThreadLocal with a stack in it:
def foo(params) = ...bar()...
def bar(params) = ...qux()...
def qux(params) = ...ged()...
def ged(params) = ...mog()...
def mog(params) = { val cx = retrieveContext(); cx.doStuff(params) }
val context = makeContext()
usingContext(context) { foo(params) }
But that would fail as soon as an asynchronous actor was involved anywhere in the chain. It doesn't matter which actor library you use: if the code runs on a different thread, it loses the ThreadLocal.
So... is there a trick I'm missing? A way of passing information contextually in Scala that doesn't pollute the intervening method signatures, doesn't bake the context into the receiver statically, and is still type-safe?
The Scala standard library includes something like your hypothetical usingContext, called DynamicVariable. This question has some information about it: "When should we use scala.util.DynamicVariable?". DynamicVariable uses a ThreadLocal under the hood, so many of your issues with ThreadLocal will remain.
The Reader monad is a functional alternative to explicitly passing an environment (http://debasishg.blogspot.com/2010/12/case-study-of-cleaner-composition-of.html). A Reader monad can be found in Scalaz (http://code.google.com/p/scalaz/). However, the Reader monad does "pollute" your signatures in that their types must change, and in general monadic programming can cause a lot of restructuring to your code; plus, the extra object allocations for all the closures may not sit well if performance or memory is a concern.
Neither of these techniques will automatically share a context over an actor message send.
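To make the Reader idea concrete, here is a minimal hand-rolled version (a sketch of the shape of the technique, not the Scalaz API; MyContextType's field is invented for the example):
// A tiny Reader: a computation that needs an environment E to produce an A.
case class Reader[E, A](run: E => A) {
  def map[B](f: A => B): Reader[E, B] = Reader(e => f(run(e)))
  def flatMap[B](f: A => Reader[E, B]): Reader[E, B] = Reader(e => f(run(e)).run(e))
}

case class MyContextType(requestId: Int)

// The environment shows up in the return type, not in the argument lists.
def mog(params: String): Reader[MyContextType, String] =
  Reader(cx => s"request ${cx.requestId}: $params")

def foo(params: String): Reader[MyContextType, String] =
  for { result <- mog(params) } yield result

// The context is supplied once, at the outermost layer.
foo("hello").run(MyContextType(42))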
A little late to the party, but have you considered using implicit parameters in your classes' constructors?
class Foo(implicit biz: Biz) {
  def f() = biz.doStuff
}

class Biz {
  def doStuff = println("do stuff called")
}
If you wanted to have a new biz for each call to f() you could let the implicit parameter be a function returning a new biz:
class Foo(implicit biz: () => Biz) {
  def f() = biz().doStuff
}
Now you simply need to provide the context when constructing Foo. Which you can do like this:
trait Context {
  private implicit def biz = () => new Biz
  implicit def foo = new Foo // the implicit parameter biz will be resolved to the biz method above
}

class UI extends Context {
  def render = foo.f()
}
Note that the implicit biz method will not be visible in UI. So we basically hide away those details :)
I wrote a blog post about using implicit parameters for dependency injection which can be found here (shameless self promotion ;) )
I think that the dependency injection from Lift does what you want. See the wiki for details on using the doWith() method.
Note that you can use it as a separate library, even if you are not running Lift.
You asked this just about a year ago, but here's another possibility. If you only ever need to call one method:
def fooWithContext(cx: MyContextType)(params) = {
  def bar(params) = ... qux() ...
  def qux(params) = ... ged() ...
  def ged(params) = ... mog() ...
  def mog(params) = cx.doStuff(params)
  ... bar() ...
}

fooWithContext(makeContext())(params)
If you need all the methods to be externally visible:
case class Contextual(cx: MyContextType) {
  def foo(params) = ... bar() ...
  def bar(params) = ... qux() ...
  def qux(params) = ... ged() ...
  def ged(params) = ... mog() ...
  def mog(params) = cx.doStuff(params)
}

Contextual(makeContext()).foo(params)
This is basically the cake pattern, except that if all your stuff fits into a single file, you don't need all the messy trait stuff to combine it into one object: you can just nest them. Doing it this way also makes cx properly lexically scoped, so you don't end up with funny behavior when you use futures and actors and such. I suspect that if you use the new AnyVal, you could even do away with the overhead of allocating the Contextual object.
If you want to split your stuff into multiple files using traits, you only really need a single trait per file to hold everything and put the MyContextType properly in scope, if you don't need the fancy replaceable-components-via-inheritance thing most cake pattern examples have.
// file1.scala
case class Contextual(cx: MyContextType) extends Trait1 with Trait2 {
  def foo(params) = ... bar() ...
  def bar(params) = ... qux() ...
}

// file2.scala
trait Trait1 { self: Contextual =>
  def qux(params) = ... ged() ...
  def ged(params) = ... mog() ...
}

// file3.scala
trait Trait2 { self: Contextual =>
  def mog(params) = cx.doStuff(params)
}

// file4.scala
Contextual(makeContext()).foo(params)
It looks kinda messy in a small example, but remember, you only need to split it off into a new trait if the code is getting too big to sit comfortable in one file. By that point your files are reasonably big, so an extra 2 lines of boilerplate on a 200-500 line file is not so bad really.
EDIT:
This works with asynchronous stuff too
case class Contextual(cx: MyContextType) {
  def foo(params) = ... bar() ...
  def bar(params) = ... qux() ...
  def qux(params) = ... ged() ...
  def ged(params) = ... mog() ...

  def mog(params) = Future { cx.doStuff(params) }
  def mog2(params) = (0 to 100).par.map(x => x * cx.getSomeValue)
  def mog3(params) = Props(new MyActor(cx.getSomeValue))
}

Contextual(makeContext()).foo(params)
It Just Works using nesting. I'd be impressed if you could get similar functionality working with DynamicVariable.
You'd need a special subclass of Future that stores the current DynamicVariable.value when created, and hook into the ExecutionContext's prepare() or execute() method to extract the value and properly set up the DynamicVariable before executing the Future.
Then you'd need a special scala.collection.parallel.TaskSupport to do something similar in order to get parallel collections working. And a special akka.actor.Props in order to do something similar for that.
Every time there's a new mechanism of creating asynchronous tasks, DynamicVariable based implementations will break and you'll have weird bugs where you end up pulling up the wrong Context. Every time you add a new DynamicVariable to keep track of, you'll need to patch all your special executors to properly set/unset this new DynamicVariable. Using nesting you can just let lexical closure take care of all of this for you.
(I think Futures, collections.parallel and Props count as "layers in between that aren't my code")
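For what it's worth, the execute() hook described above could look roughly like this. This is a sketch of the general idea only: every Future would have to be created with this wrapper, which is exactly the fragility being described.
import scala.concurrent.ExecutionContext
import scala.util.DynamicVariable

val requestContext = new DynamicVariable[Option[String]](None)

// Wraps an ExecutionContext so tasks run with the DynamicVariable value
// that was current on the submitting thread.
class PropagatingEc(underlying: ExecutionContext) extends ExecutionContext {
  def execute(task: Runnable): Unit = {
    val captured = requestContext.value // captured on the calling thread
    underlying.execute(new Runnable {
      def run(): Unit = requestContext.withValue(captured)(task.run())
    })
  }
  def reportFailure(cause: Throwable): Unit = underlying.reportFailure(cause)
}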
Similar to the implicit approach, with Scala Macros you can do auto-wiring of objects using constructors - see my MacWire project (and excuse the self-promotion).
MacWire also has scopes (quite customisable; a ThreadLocal implementation is provided). However, I don't think you can propagate context across actor calls with a library - you need to carry some identifier around. This can be done e.g. through a wrapper for sending actor messages, or more directly with the message.
Then, as long as the identifier is unique per request/session/whatever your scope is, it's just a matter of looking things up in a map via a proxy (as the MacWire scopes do; the "identifier" isn't needed there, as it is stored in the ThreadLocal).