I have an object with heavy initialization cost and memory footprint. Initialization time is human-noticeable but creation frequency is low.
class HeavyClass {
  heavyInit()
}
My solution is to create a Provider actor that would have a single object created ahead of time and provide it instantly on request. The provider would then go on with creating the next object.
class HeavyClassProvider extends Actor {
  var hc: Option[HeavyClass] = Some(new HeavyClass())
  override def receive = {
    case "REQUEST" =>
      sender ! { hc getOrElse new HeavyClass() }
      self ! "RESPAWN"
      hc = None
    case "RESPAWN" if (hc == None) => hc = Some(new HeavyClass())
  }
}
And a consumer:
abstract class HeavyClassConsumer extends Actor {
  import context.dispatcher
  import akka.pattern.ask
  import scala.concurrent.duration._
  import akka.util.Timeout
  implicit val timeout = Timeout(5, SECONDS)
  var provider: ActorRef
  var hc: Option[HeavyClass] = None
  override def receive = {
    case "START" =>
      ((provider ask "REQUEST").mapTo[HeavyClass]
        onSuccess { case h: HeavyClass => hc = Some(h) })
  }
}
Is this a common pattern? The code feels wacky; is there an obvious cleaner way of doing this?
The problem with your solution is that when you call new HeavyClass(), your actor blocks until that computation finishes and cannot process any other message in the meantime. Doing it in a Future or in another Actor avoids that. Here is one way to do it:
import akka.actor.{Actor, Status}
import akka.pattern.pipe
import scala.concurrent.Future
...
class HeavyClassProvider extends Actor {
  import context.dispatcher // execution context for Future and pipeTo
  // start off async computation during init:
  var hc: Future[HeavyClass] = Future(new HeavyClass)
  override def receive = {
    case "REQUEST" =>
      // send result to requester when it's complete or
      // immediately if it's already complete:
      hc pipeTo sender
      // start a new computation and send to self:
      Future(new HeavyClass) pipeTo self
    case result: HeavyClass => // new result is ready
      hc = Future.successful(result) // update with newly computed result
    case Status.Failure(f) => // computation failed
      hc = Future.failed[HeavyClass](f)
      // maybe request a recomputation again
  }
}
(I didn't compile it)
One particularity of my first solution is that it does not restrict how many Futures are computed at the same time. If you receive multiple requests, it will compute multiple Futures, which might not be desirable, although there is no race condition in this Actor. To restrict that, simply introduce a Boolean flag in the Actor that tells you whether a computation is already in progress. Also, all these vars can be replaced with become/unbecome behaviors.
Example of a single concurrent Future computation given multiple requests:
import akka.actor.{Actor, Status}
import akka.pattern.pipe
import scala.concurrent.Future
...
class HeavyClassProvider extends Actor {
  import context.dispatcher // execution context for Future and pipeTo
  // start off async computation during init:
  var hc: Future[HeavyClass] = Future(new HeavyClass) pipeTo self
  var computing: Boolean = true
  override def receive = {
    case "REQUEST" =>
      // send result to requester when it's complete or
      // immediately if it's already complete:
      hc pipeTo sender
      // start a new computation and send to self, unless one is already running:
      if (!computing) {
        computing = true
        Future(new HeavyClass) pipeTo self
      }
    case result: HeavyClass => // new result is ready
      hc = Future.successful(result) // update with newly computed result
      computing = false
    case Status.Failure(f) => // computation failed
      hc = Future.failed[HeavyClass](f)
      computing = false
      // maybe request a recomputation again
  }
}
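The remark that the vars can be replaced with become/unbecome could look roughly like the sketch below. It is not the answer's code: the name `BecomeBasedProvider`, the empty `HeavyClass` stand-in, and the choice to queue requesters so that each one receives its own instance are assumptions made for illustration, using the classic Akka actor API.

```scala
import akka.actor.{Actor, ActorRef, Status}
import akka.pattern.pipe
import scala.concurrent.Future

// Hypothetical stand-in for the expensive object from the question.
class HeavyClass

class BecomeBasedProvider extends Actor {
  import context.dispatcher // execution context for Future and pipeTo

  // Start the first build as soon as the actor starts.
  override def preStart(): Unit = build()

  // Run the expensive constructor off the actor's thread and send the result to self.
  private def build(): Unit = {
    Future(new HeavyClass) pipeTo self
    ()
  }

  def receive: Receive = building(Vector.empty)

  // A build is in flight; `waiting` holds requesters that have not been served yet.
  private def building(waiting: Vector[ActorRef]): Receive = {
    case "REQUEST" =>
      context.become(building(waiting :+ sender()))
    case hc: HeavyClass if waiting.isEmpty =>
      context.become(ready(hc))
    case hc: HeavyClass =>
      waiting.head ! hc // serve the oldest requester
      build()           // and immediately start the next instance
      context.become(building(waiting.tail))
    case Status.Failure(e) =>
      throw e // let the supervisor decide; waiting requesters will time out
  }

  // A spare instance is available and no build is running.
  private def ready(spare: HeavyClass): Receive = {
    case "REQUEST" =>
      sender() ! spare
      build()
      context.become(building(Vector.empty))
  }
}
```

All state lives in the parameters of the current behavior, so there is no var to get out of sync, and at most one build runs at a time.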
EDIT:
After discussing the requirements further in the comments, here is yet another implementation that sends a new object to the sender/client on each request in a non-blocking manner:
import akka.pattern.pipe
...
class HeavyClassProvider extends Actor {
  override def receive = {
    case "REQUEST" =>
      Future(new HeavyClass) pipeTo sender
  }
}
And then it can be simplified to:
object SomeFactoryObject {
  def computeLongOp: Future[HeavyClass] = Future(new HeavyClass)
}
In this case no actors are needed. An Actor is worth using here, as a synchronization mechanism around non-blocking computation, only when it caches results or provides async computation with more complex logic than a plain Future offers; otherwise a Future is sufficient.
I suspect it's more often done with a synchronized factory of some sort, but an actor seems as good a synchronization mechanism as any, especially if the calling code is already built on async patterns.
One potential problem with your current implementation is that it can't parallelize creation of multiple HeavyClass objects that are requested "all at once". It might be the case that this is a feature and that parallel-creation of several would bog down the system. If, on the other hand, it's "just slow", you might want to spin off the creation of the "on-demand" instances into its own thread/actor.
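For comparison, a synchronized factory that keeps one instance pre-built might look like the following minimal sketch. It uses only the Scala standard library; `PrebuiltFactory`, `take`, and the `HeavyClass(id)` stand-in are hypothetical names introduced for illustration.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-in for the expensive object from the question.
final class HeavyClass(val id: Int)

// Keeps one instance building (or built) in the background. take() hands it
// out and immediately starts building the replacement.
final class PrebuiltFactory[A](build: () => A)(implicit ec: ExecutionContext) {
  private var spare: Future[A] = Future(build())

  def take(): Future[A] = synchronized {
    val current = spare
    spare = Future(build()) // start the next build right away
    current
  }
}
```

Each call to take() returns a distinct instance. A burst of calls starts several builds at once, so how many run in parallel is limited only by the thread pool behind the execution context that is passed in.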
I am using a third party library to provide parsing services (user agent parsing in my case) which is not a thread safe library and has to operate on a single threaded basis. I would like to write a thread safe API that can be called by multiple threads to interact with it via Futures API as the library might introduce some potential blocking (IO). I would also like to provide back pressure when necessary and return a failed future when the parser doesn't catch up with the producers.
It could actually be a generic requirement/question: how to interact, with back pressure, with any client/library that is not thread safe (user agent/geolocation parsers, DB clients like Redis, log collectors like fluentd) in a concurrent environment.
I came up with the following formula:
1. Encapsulate the parser within a dedicated Actor.
2. Create an Akka Streams source queue that receives a ParseRequest containing the user agent and a Promise to complete, and use the ask pattern via mapAsync to interact with the parser actor.
3. Create another actor to encapsulate the source queue.
Is this the way to go? Is there any other, maybe simpler, way to achieve this? Maybe using a graph stage? Can it be done without the ask pattern and with less code?
The actor mentioned in number 3 is there because I'm not sure whether the source queue is thread safe. I wish it were simply stated in the docs, but it isn't. There are multiple versions on the web, some stating it is and some stating it isn't.
Is the source queue, once materialized, thread safe for pushing elements from different threads?
(the code may not compile and is prone to potential failures, and is only intended for this question in place)
class UserAgentRepo(dbFilePath: String)(implicit actorRefFactory: ActorRefFactory) {
  import akka.pattern.ask
  import akka.util.Timeout
  import scala.concurrent.duration._
  implicit val askTimeout = Timeout(5 seconds)
  // API to parser - delegates the request to the back pressure actor
  def parse(userAgent: String): Future[Option[UserAgentData]] = {
    val p = Promise[Option[UserAgentData]]
    parserBackPressureProvider ! UserAgentParseRequest(userAgent, p)
    p.future
  }
  // Actor to provide back pressure that delegates requests to parser actor
  private class ParserBackPressureProvider extends Actor {
    private val parser = context.actorOf(Props[UserAgentParserActor])
    val queue = Source.queue[UserAgentParseRequest](100, OverflowStrategy.dropNew)
      .mapAsync(1)(request => (parser ? request.userAgent).mapTo[Option[UserAgentData]].map(_ -> request.p))
      .to(Sink.foreach({
        case (result, promise) => promise.success(result)
      }))
      .run()
    override def receive: Receive = {
      case request: UserAgentParseRequest => queue.offer(request).map {
        case QueueOfferResult.Enqueued =>
        case _ => request.p.failure(new RuntimeException("parser busy"))
      }
    }
  }
  // Actor parser
  private class UserAgentParserActor extends Actor {
    private val up = new UserAgentParser(dbFilePath, true, 50000)
    override def receive: Receive = {
      case userAgent: String =>
        sender ! Try {
          up.parseUa(userAgent)
        }.toOption.map(UserAgentData(userAgent, _))
    }
  }
  private case class UserAgentParseRequest(userAgent: String, p: Promise[Option[UserAgentData]])
  private val parserBackPressureProvider = actorRefFactory.actorOf(Props[ParserBackPressureProvider])
}
Do you have to use actors for this?
It does not seem like you need all this complexity; Scala/Java has all the tools you need "out of the box":
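import java.util.concurrent.{LinkedBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}
class ParserFacade(parser: UserAgentParser, val capacity: Int = 100) {
  // one worker thread, so the parser is never used concurrently;
  // the bounded queue rejects new tasks once `capacity` of them are waiting
  private implicit val ec = ExecutionContext
    .fromExecutor(
      new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable](capacity)
      )
    )
  def parse(ua: String): Future[Option[UserAgentData]] = try {
    Future(Some(UserAgentData(ua, parser.parseUa(ua))))
      .recover { case _ => None }
  } catch {
    case _: RejectedExecutionException =>
      Future.failed(new RuntimeException("parser is busy"))
  }
}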
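One caveat with the facade above: as far as I can tell, whether a rejected task surfaces as a synchronous exception from Future(...) depends on the Scala version, so the catch branch may never run on some of them. A variant that submits to the executor directly and completes a Promise avoids relying on that. This is a sketch under that assumption; `SingleThreadFacade` and `submit` are hypothetical names, and the task passed in stands for the parser call.

```scala
import java.util.concurrent.{LinkedBlockingQueue, RejectedExecutionException, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{Future, Promise}
import scala.util.Try

// Runs every task on one dedicated thread and fails fast when
// `capacity` tasks are already waiting in the queue.
final class SingleThreadFacade(capacity: Int) {
  private val executor = new ThreadPoolExecutor(
    1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable](capacity))

  def submit[A](task: => A): Future[A] = {
    val p = Promise[A]()
    try {
      // The task runs later on the worker thread; its result or exception completes the promise.
      executor.execute(new Runnable { def run(): Unit = p.complete(Try(task)) })
    } catch {
      case _: RejectedExecutionException =>
        p.failure(new RuntimeException("parser is busy"))
    }
    p.future
  }

  def shutdown(): Unit = executor.shutdown()
}
```

A caller would wrap the non-thread-safe call, for example `facade.submit(parser.parseUa(ua))`, and treat a failed Future as the back-pressure signal.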
I'm trying to make two external calls (to a Redis database) inside an Actor's receive method. Both calls return a Future and I need the result of the first Future inside the second.
I'm wrapping both calls inside a Redis transaction to prevent anyone else from modifying the value in the database while I'm reading it.
The internal state of the actor is updated based on the value of the second Future.
Here is what my current code looks like. I think it is incorrect because I'm updating the internal state of the actor inside a Future.onComplete callback.
I cannot use the pipeTo pattern because both Futures have to be in the same transaction.
If I use Await for the first Future, then my receive method will block.
Any idea how to fix this?
My second question is related to how I'm using Futures. Is this usage of Futures below correct? Is there a better way of dealing with multiple Futures in general? Imagine if there were 3 or 4 Future each depending on the previous one.
import akka.actor.{Props, ActorLogging, Actor}
import akka.util.ByteString
import redis.RedisClient
import scala.concurrent.Future
import scala.util.{Failure, Success}
object GetSubscriptionsDemo extends App {
  val akkaSystem = akka.actor.ActorSystem("redis-demo")
  val actor = akkaSystem.actorOf(Props(new SimpleRedisActor("localhost", "dummyzset")), name = "simpleactor")
  actor ! UpdateState
}
case object UpdateState
class SimpleRedisActor(ip: String, key: String) extends Actor with ActorLogging {
  //mutable state that is updated on a periodic basis
  var mutableState: Set[String] = Set.empty
  //required by Future
  implicit val ctx = context.dispatcher
  var rClient = RedisClient(ip)(context.system)
  def receive = {
    case UpdateState => {
      log.info("Start of UpdateState ...")
      val tran = rClient.transaction()
      val zf: Future[Long] = tran.zcard(key) //FIRST Future
      zf.onComplete {
        case Success(z) => {
          //SECOND Future, depends on result of FIRST Future
          val rf: Future[Seq[ByteString]] = tran.zrange(key, z - 1, z)
          rf.onComplete {
            case Success(x) => {
              //convert ByteString to UTF8 String
              val v = x.map(_.utf8String)
              log.info(s"Updating state with $v ")
              //update actor's internal state inside callback for a Future
              //IS THIS CORRECT ?
              mutableState ++= v
            }
            case Failure(e) => {
              log.warning("ZRANGE future failed ...", e)
            }
          }
        }
        case Failure(f) => log.warning("ZCARD future failed ...", f)
      }
      tran.exec()
    }
  }
}
The code compiles, but when I run it, it gets stuck.
2014-08-07 INFO [redis-demo-akka.actor.default-dispatcher-3] a.e.s.Slf4jLogger - Slf4jLogger started
2014-08-07 04:38:35.106UTC INFO [redis-demo-akka.actor.default-dispatcher-3] e.c.s.e.a.g.SimpleRedisActor - Start of UpdateState ...
2014-08-07 04:38:35.134UTC INFO [redis-demo-akka.actor.default-dispatcher-8] r.a.RedisClientActor - Connect to localhost/127.0.0.1:6379
2014-08-07 04:38:35.172UTC INFO [redis-demo-akka.actor.default-dispatcher-4] r.a.RedisClientActor - Connected to localhost/127.0.0.1:6379
UPDATE 1
In order to use pipeTo pattern I'll need access to the tran and the FIRST Future (zf) in the actor where I'm piping the Future to because the SECOND Future depends on the value (z) of FIRST.
//SECOND Future, depends on result of FIRST Future
val rf: Future[Seq[ByteString]] = tran.zrange(key, z - 1, z)
Without knowing too much about the redis client you are using, I can offer an alternate solution that should be cleaner and won't have issues with closing over mutable state. The idea is to use a master/worker kind of situation, where the master (the SimpleRedisActor) receives the request to do the work and then delegates off to a worker that performs the work and responds with the state to update. That solution would look something like this:
import akka.actor.{Actor, ActorLogging, ActorRef, Props, Status}
import akka.util.ByteString
import redis.RedisClient
object SimpleRedisActor {
  case object UpdateState
  def props(ip: String, key: String) = Props(classOf[SimpleRedisActor], ip, key)
}
class SimpleRedisActor(ip: String, key: String) extends Actor with ActorLogging {
  import SimpleRedisActor._
  import SimpleRedisWorker._
  //mutable state that is updated on a periodic basis
  var mutableState: Set[String] = Set.empty
  val rClient = RedisClient(ip)(context.system)
  def receive = {
    case UpdateState =>
      log.info("Start of UpdateState ...")
      val worker = context.actorOf(SimpleRedisWorker.props)
      worker ! DoWork(rClient, key)
    case WorkResult(result) =>
      // only the master touches the state, and only while handling a message
      mutableState ++= result
    case FailedWorkResult(ex) =>
      log.error(ex, "Worker got failed work result")
  }
}
object SimpleRedisWorker {
  case class DoWork(client: RedisClient, key: String)
  case class WorkResult(result: Seq[String])
  case class FailedWorkResult(ex: Throwable)
  def props = Props[SimpleRedisWorker]
}
class SimpleRedisWorker extends Actor {
  import SimpleRedisWorker._
  import akka.pattern.pipe
  import context._
  def receive = {
    case DoWork(client, key) =>
      val trans = client.transaction()
      trans.zcard(key) pipeTo self
      become(waitingForZCard(sender, trans, key) orElse failureHandler(sender, trans))
  }
  def waitingForZCard(orig: ActorRef, trans: RedisTransaction, key: String): Receive = {
    case l: Long =>
      trans.zrange(key, l - 1, l) pipeTo self
      become(waitingForZRange(orig, trans) orElse failureHandler(orig, trans))
  }
  def waitingForZRange(orig: ActorRef, trans: RedisTransaction): Receive = {
    case s: Seq[ByteString] =>
      orig ! WorkResult(s.map(_.utf8String))
      finishAndStop(trans)
  }
  def failureHandler(orig: ActorRef, trans: RedisTransaction): Receive = {
    case Status.Failure(ex) =>
      orig ! FailedWorkResult(ex)
      finishAndStop(trans)
  }
  def finishAndStop(trans: RedisTransaction): Unit = {
    trans.exec()
    context stop self
  }
}
The worker starts the transaction, makes its calls into Redis, and ultimately completes the transaction before stopping itself. When it calls Redis, it gets the future and pipes it back to itself to continue processing, switching its receive behavior to mark its progress through its states. In a model like this (which I suppose is somewhat similar to the error kernel pattern), the master owns and protects the state, delegating the "risky" work to a child that figures out what the change to the state should be, while the change itself is still applied by the master.
Now again, I have no idea about the capabilities of the redis client you are using and if it is safe enough to even do this kind of stuff, but that's not really the point. The point was to show a safer structure for doing something like this that involves futures and state that needs to be changed safely.
Using the callback to mutate internal state is not a good idea. An excerpt from the Akka docs:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback.
Why do you worry about pipeTo and transactions?
Not sure how redis transactions work, but I would guess that the transaction does not encompass the onComplete callback on the second future anyways.
I would put the state into a separate actor which you pipe the future to. This way you have a separate mailbox, and the ordering there will be the same as the ordering of the messages that came in to modify the state. Also, if any read requests come in, they will be put in the correct order as well.
Edit to respond to the edited question: OK, so you don't want to pipe the first future. That makes sense and should be no problem, as the first callback is harmless. The callback of the second future is the problem, as it manipulates the state. But this future can be piped without the need for access to the transaction.
So basically my suggestion is:
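val firstFuture = tran.zcard(key)
firstFuture.onSuccess { case z =>
  // this callback touches no actor state; it only issues the second call
  val secondFuture = tran.zrange(key, z - 1, z)
  secondFuture pipeTo stateActor
}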
With stateActor containing the mutable state.
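On the question's second point, chaining three or four Futures that each depend on the previous one, a for-comprehension keeps the chain flat instead of nesting callbacks. The sketch below uses only the Scala standard library; `count` and `range` are hypothetical stand-ins returning canned data, not the Redis client's API.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ChainedFutures {
  // Hypothetical stand-in for the first call: reports how many items exist.
  def count(key: String)(implicit ec: ExecutionContext): Future[Long] =
    Future(3L)

  // Hypothetical stand-in for the second call: returns items from..to (inclusive).
  def range(key: String, from: Long, to: Long)(implicit ec: ExecutionContext): Future[Seq[String]] =
    Future(Seq("a", "b", "c").slice(from.toInt, to.toInt + 1))

  // The second call starts only after the first has completed, and the
  // whole chain is itself a single Future.
  def lastTwo(key: String)(implicit ec: ExecutionContext): Future[Seq[String]] =
    for {
      n     <- count(key)
      items <- range(key, n - 2, n - 1)
    } yield items
}
```

Inside an actor, the single resulting Future can then be handed to pipeTo, so the state is still changed only while handling a message.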
Let's say we have an Akka actor, which maintains an internal state in terms of a var.
class FooActor extends Actor {
  private var state: Int = 0
  def receive = { ... }
}
Let's say the receive handler invokes an operation that returns a future, we map it using the dispatcher as the execution context, and finally we set an onSuccess callback that alters the actor state.
import context.dispatcher
def receive = {
  case "Hello" => requestSomething() // assume Future[String]
    .map(_.size)
    .onSuccess { case i => state = i }
}
Is it thread-safe to alter the state of the actor from the onSuccess callback, even using the actor dispatcher as execution context?
No, it's not (see the Akka 2.3.4 documentation).
What you have to do in this case is send a message to self to alter the state. If you need ordering, you can use stash and become. Something like this:
import akka.actor.{Actor, Stash, Status}
import akka.pattern.pipe
case class StateUpdate(i: Int)
class FooActor extends Actor with Stash {
  import context.dispatcher // execution context for map and pipeTo
  private var state: Int = 0
  def receive = ready
  def ready: Receive = {
    case "Hello" =>
      requestSomething() // assume Future[String]
        .map(s => StateUpdate(s.size)) pipeTo self
      context.become(busy)
  }
  def busy: Receive = {
    case StateUpdate(i) =>
      state = i
      unstashAll()
      context.become(ready)
    case Status.Failure(t) => // the future failed
      // keep the old state, but go back to ready so the actor doesn't stay stuck
      unstashAll()
      context.become(ready)
    case evt =>
      stash()
  }
}
Of course, this is a simplistic implementation; you will probably want to handle timeouts and the like to avoid having your actor get stuck.
If you don't need ordering guarantees on your state:
case class StateUpdate(i: Int)
class FooActor extends Actor {
  import context.dispatcher // execution context for map and pipeTo
  private var state: Int = 0
  def receive = {
    case "Hello" =>
      requestSomething() // assume Future[String]
        .map(s => StateUpdate(s.size)) pipeTo self
    case StateUpdate(i) => state = i
  }
}
But then the actor state may not be the length of the last string received.
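To see why ordering is lost, note that Futures may complete in a different order than they were requested. The sketch below is a minimal standard-library illustration, not actor code: two Promises stand in for the responses to two "Hello" messages, `applied` stands in for the updates reaching the actor, and a same-thread execution context (an assumption made only to keep the demo deterministic) runs callbacks at the moment of completion.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{ExecutionContext, Promise}

object OutOfOrder {
  // Runs callbacks immediately on the completing thread, to keep the demo deterministic.
  private implicit val sameThread: ExecutionContext = new ExecutionContext {
    def execute(r: Runnable): Unit = r.run()
    def reportFailure(t: Throwable): Unit = throw t
  }

  // Returns the updates in arrival order, plus the final "state".
  def run(): (Seq[Int], Int) = {
    val applied = ArrayBuffer.empty[Int]
    val first   = Promise[String]() // response to the first "Hello"
    val second  = Promise[String]() // response to the second "Hello"
    first.future.map(_.length).foreach(applied += _)
    second.future.map(_.length).foreach(applied += _)

    second.success("hi")     // the later request finishes first
    first.success("hello!")  // the earlier request finishes last
    (applied.toList, applied.last)
  }
}
```

Here the final value comes from the first request, because its response arrived last. The stash/become version avoids this by not starting a second request until the first one has been applied.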
Just to support Jean's answer, here's the example from the docs:
class MyActor extends Actor {
  var state = ...
  def receive = {
    case _ =>
      //Wrongs

      // Very bad, shared mutable state,
      // will break your application in weird ways
      Future {state = NewState}
      anotherActor ? message onSuccess {
        r => state = r
      }
      // Very bad, "sender" changes for every message,
      // shared mutable state bug
      Future {expensiveCalculation(sender())}

      //Rights

      // Completely safe, "self" is OK to close over
      // and it's an ActorRef, which is thread-safe
      Future {expensiveCalculation()} onComplete {
        f => self ! f.value.get
      }
      // Completely safe, we close over a fixed value
      // and it's an ActorRef, which is thread-safe
      val currentSender = sender()
      Future {expensiveCalculation(currentSender)}
  }
}
I have an Actor, and on some message I'm running a method which returns a Future.
def receive: Receive = {
  case SimpleMessage() =>
    val futData: Future[Int] = ...
    futData.map { data =>
      ...
    }
}
Is it possible to pass the actual context to wait for this data? Or is Await the best I can do if I need this data in SimpleMessage?
If you really need to wait for the future to complete before processing the next message, you can try something like this:
import akka.actor.{Actor, ActorRef, Stash, Status}
import scala.concurrent.Future
object SimpleMessageHandler {
  case class SimpleMessage()
  case class FinishSimpleMessage(i: Int)
}
class SimpleMessageHandler extends Actor with Stash {
  import SimpleMessageHandler._
  import context._
  import akka.pattern.pipe
  def receive = waitingForMessage
  def waitingForMessage: Receive = {
    case SimpleMessage() =>
      val futData: Future[Int] = ...
      futData.map(FinishSimpleMessage(_)) pipeTo self
      context.become(waitingToFinish(sender))
  }
  def waitingToFinish(originalSender: ActorRef): Receive = {
    case SimpleMessage() => stash()
    case FinishSimpleMessage(i) =>
      //Do whatever you need to do to finish here
      ...
      unstashAll()
      context.become(waitingForMessage)
    case Status.Failure(ex) =>
      //log error here
      unstashAll()
      context.become(waitingForMessage)
  }
}
In this approach, we process a SimpleMessage and then switch handling logic to stash all subsequent SimpleMessages received until we get a result from the future. When we get a result, failure or not, we unstash all of the other SimpleMessages we have received while waiting for the future and go on our merry way.
This actor just toggles back and forth between two states and that allows you to only fully process one SimpleMessage at a time without needing to block on the Future.
I have an actor which creates another one:
class MyActor1 extends Actor {
  val a2 = context.actorOf(Props(new MyActor2(123)))
}
The second actor must initialize (bootstrap) itself once it is created, and only after that may it do other work.
class MyActor2(a: Int) extends Actor {
  //initialized (bootstrapped) itself, potentially a long operation
  //how?
  val initValue = // get from a server
  //handle incoming messages
  def receive = {
    case "job1" => // do some job but after it's initialized (bootstrapped) itself
  }
}
So the very first thing MyActor2 must do is initialize itself. It might take some time because it's a request to a server. Only after it finishes successfully may it handle incoming messages through receive. Before that, it must not.
Of course, a request to a server must be asynchronous (preferably, using Future, not async, await or other high level stuff like AsyncHttpClient). I know how to use Future, it's not a problem, though.
How do I ensure that?
p.s. My guess is that it must send a message to itself first.
You could use the become method to change the actor's behavior after initialization:
class MyActor2(a: Int) extends Actor {
  server ! GetInitializationData
  def initialize(d: InitializationData) = ???
  //handle incoming messages
  val initialized: Receive = {
    case "job1" => // do some job but after it's initialized (bootstrapped) itself
  }
  def receive = {
    case d: InitializationData =>
      initialize(d)
      context become initialized
  }
}
Note that such an actor will drop all messages received before initialization. You'll have to preserve these messages manually, for instance using Stash:
class MyActor2(a: Int) extends Actor with Stash {
  ...
  def receive = {
    case d: InitializationData =>
      initialize(d)
      unstashAll()
      context become initialized
    case _ => stash()
  }
}
If you don't want to use var for initialization you could create initialized behavior using InitializationData like this:
class MyActor2(a: Int) extends Actor {
  server ! GetInitializationData
  //handle incoming messages
  def initialized(intValue: Int, strValue: String): Receive = {
    case "job1" => // use `intValue` and `strValue` here
  }
  def receive = {
    case InitializationData(intValue, strValue) =>
      context become initialized(intValue, strValue)
  }
}
I don't know whether the proposed solution is a good idea. It seems awkward to me to send an initialization message. Actors have a lifecycle and offer some hooks. When you have a look at the API, you will discover the preStart hook.
Therefore I propose the following:
When the actor is created, its preStart hook is run, where you do your server request which returns a future.
While the future is not completed all incoming messages are stashed.
When the future completes it uses context.become to use your real/normal receive method.
After the become you unstash everything.
Here is a rough sketch of the code (bad solution, see real solution below):
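class MyActor2(a: Int) extends Actor with Stash {
  import context.dispatcher // execution context for the callback
  override def preStart(): Unit = {
    val future = // do your necessary server request (should return a future)
    future onSuccess {
      // not safe: this touches actor internals from a callback thread
      case _ =>
        context.become(normalReceive)
        unstashAll()
    }
  }
  def receive = initialReceive
  def initialReceive: Receive = {
    case _ => stash()
  }
  def normalReceive: Receive = {
    // your normal Receive Logic
  }
}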
UPDATE: Improved solution according to Senia's feedback
class MyActor2(a: Int) extends Actor with Stash {
  import context.dispatcher // execution context for the callback
  override def preStart(): Unit = {
    val future = // do your necessary server request (should return a future)
    future onSuccess {
      // the callback only sends a message, which is safe from any thread
      case _ => self ! InitializationDone
    }
  }
  def receive = initialReceive
  def initialReceive: Receive = {
    case InitializationDone =>
      context.become(normalReceive)
      unstashAll()
    case _ => stash()
  }
  def normalReceive: Receive = {
    // your normal Receive Logic
  }
  case object InitializationDone
}