Is Akka Ask Blocking on the Current Thread - scala

I have a scenario where I have to fetch the details of a user by their id. An HTTP request comes in, and in my HTTP handler layer I take the id from the request and send a message to an actor, which then talks to the database service to fetch the user.
Since this is an HTTP request, I need to satisfy it by sending a response back. So I thought of using the Akka ask pattern, but I have the following questions:
Is this going to block on my current thread?
Is using the ask pattern to fetch a user a scalable solution in my case? I could have anywhere from a few hundred to a million users calling this endpoint at any given point in time. Is it a good idea to use the ask pattern here?
In code, it looks like this in my HTTP controller:
val result: Future[Any] = userActor ? FetchUser(id)
In my actor, I would do the following:
case fetchUser: FetchUser => sender ! myService.getUser(fetchUser.id)

Answering your questions in the same order you posed them:
No, using the ? does not block the current thread. It returns a Future immediately. However, the result within the Future may not be available immediately.
If you need the solution to be "scalable", and your service is capable of multiple concurrent queries, then you may need to use a pool of Actors so you can serve multiple ? at once, or see below for a Futures only, scalable, solution.
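A minimal sketch of the non-blocking flow described above, assuming Akka Classic is on the classpath. `User`, `UserService` and `AskDemo` are hypothetical names that are not part of the question, and the service is assumed to return a Future. If `getUser` were a blocking call, it would tie up the actor's thread for the duration of the query.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

final case class FetchUser(id: Long)
final case class User(id: Long, name: String) // hypothetical result type

// Hypothetical stand-in for the database service; assumed to be asynchronous.
object UserService {
  def getUser(id: Long)(implicit ec: ExecutionContext): Future[User] =
    Future(User(id, s"user-$id"))
}

class UserActor extends Actor {
  import context.dispatcher // ExecutionContext for the Future and pipeTo

  def receive: Receive = {
    case FetchUser(id) =>
      // pipeTo sends the eventual result to the asker without blocking the actor.
      UserService.getUser(id).pipeTo(sender())
  }
}

object AskDemo {
  def run(): User = {
    val system = ActorSystem("ask-demo")
    implicit val timeout: Timeout = Timeout(3.seconds) // required by `?`
    try {
      val userActor = system.actorOf(Props(new UserActor), "user")
      // `?` returns immediately; mapTo narrows Future[Any] to Future[User].
      val result: Future[User] = (userActor ? FetchUser(42L)).mapTo[User]
      // Await is used only so the demo can return a value; an HTTP handler
      // would map or return the Future instead of waiting on it.
      Await.result(result, 5.seconds)
    } finally system.terminate()
  }
}
```

In a real handler the `Future[User]` would be handed to the HTTP layer as-is, so no request thread waits on the actor.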
Futures Exclusively
If your Actors are not caching any intermediate values then you can just use Futures directly and avoid the rigmarole of Actors (e.g. Props, actorOf, receive, ?, ...):
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
object ServicePool {
  private val myService = ???
  val maxQueries = 11 // should come from a configuration file instead
  private val queryExecutionPool =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(maxQueries))
  type ID = ???
  /** Will only hit the DB with maxQueries at once. */
  def queryService(id: ID) =
    Future { myService getUser id }(queryExecutionPool)
} // end object ServicePool
You can now call ServicePool.queryService as often as you want but the service will not be hit with more than maxQueries at a single time, and no Actors:
val alotOfIDs : Seq[ID] = (1 to 1000000) map { i => ID(i)}
val results = alotOfIDs map ServicePool.queryService

Related

Akka Classic Ask pattern. How does it match asks with responses?

I'm a newbie with Akka Actors, and I am learning about the Ask pattern. I am looking at the following example from alvin alexander:
class TestActor extends Actor {
  def receive = {
    case AskNameMessage => // respond to the "ask" request
      sender ! "Fred"
    case _ => println("that was unexpected")
  }
}
...
val myActor = system.actorOf(Props[TestActor], name = "myActor")
// (1) this is one way to "ask" another actor
implicit val timeout = Timeout(5 seconds)
val future = myActor ? AskNameMessage
val result = Await.result(future, timeout.duration).asInstanceOf[String]
println(result)
(Yes, I know that Await.result isn't generally the best practice, but this is just a simple example.)
So from what I can tell, the only thing you need to do to implement the "askee" actor to service an Ask operation is to send a message back to the "asker" via the Tell operator, and that will be turned into a future on the "asker" side as a response to the Ask. Seems simple enough.
My question is this:
When the response comes back, how does Akka know that this particular message is the response to a certain Ask message?
In the example above, the "Fred" message doesn't contain any specific routing information that specifies that it's the response to a particular Ask operation. Does it just assume that the next message that the asker receives from that askee is the answer to the Ask? If that's the case, then what if one actor sends multiple Ask operations to the same askee? Wouldn't the responses likely get jumbled, causing random responses to be mapped to the wrong Asks?
Or, what if the asker is also receiving other types of messages from the same askee actor that are unrelated to these Ask messages? Couldn't the Asks receive response messages of the wrong type?
Just to be clear, I'm asking about Akka Classic, not Typed.
For every Ask message sent to an actor, Akka creates a proxy ActorRef whose sole responsibility is to process one single message. This temporary "actor" is initialized with a promise, which it completes when it processes that message.
Its source is the PromiseActorRef class in the akka.pattern package,
and the main details are:
private[akka] final class PromiseActorRef private (
val provider: ActorRefProvider,
val result: Promise[Any],
....
val alreadyCompleted = !result.tryComplete(promiseResult)
It should now be clear that the ask pattern is backed by an independent, unique asker actor for every message sent to the receiver (the askee).
The askee knows the actor reference of the sender (the asker) of every message it receives, via context.sender(). Thus, it just needs to use this ActorRef to send a response back to the asker.
Finally, this avoids race conditions because an actor processes only one message at a time, which excludes any possibility of retrieving a "wrong" asker via context.sender().
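The mechanism can be modelled with the standard library alone. The sketch below is a simplified illustration, not Akka's implementation: `AskModel`, `OneShotReply`, `askee` and `ask` are hypothetical names. Each ask gets its own promise and its own reply reference, so a reply can only ever complete the ask it belongs to.

```scala
import scala.concurrent.{Future, Promise}
import scala.util.Success

object AskModel {
  // Models the temporary reply reference: it completes exactly one promise.
  final class OneShotReply(promise: Promise[String]) {
    // Returns false if a reply was already delivered; later replies are dropped.
    def !(msg: String): Boolean = promise.trySuccess(msg)
  }

  // The "askee" only sees the reply reference that arrived with this message.
  def askee(question: String, replyTo: OneShotReply): Unit = {
    replyTo ! s"answer to $question"
    ()
  }

  // Each ask creates a fresh promise and a fresh reply reference.
  def ask(question: String): Future[String] = {
    val promise = Promise[String]()
    askee(question, new OneShotReply(promise))
    promise.future
  }
}
```

Because the routing information lives in the reply reference rather than in the message payload, "Fred" needs no correlation id, and several outstanding asks to the same askee cannot be mixed up.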

Alternative to using Future.sequence inside Akka Actors

We have a fairly complex system developed using Akka HTTP and Actors model. Until now, we extensively used ask pattern and mixed Futures and Actors.
For example, an actor gets a message, needs to execute 3 operations in parallel, combine a result out of that data, and return it to the sender. What we did was:
- declare a new variable in the actor's receive callback to store the sender (since we use Future.map, sender could be a different actor by the time the callback runs)
- execute all 3 futures in parallel using Future.sequence (sometimes it's a call to a function that returns a future, and sometimes it's an ask to another actor)
- combine the results of all 3 futures using map or flatMap on the result of Future.sequence
- pipe the final result to the sender using pipeTo
Here is the simplified code:
case RetrieveData(userId, `type`, id, lang, paging, timeRange, platform) => {
  val sen = sender
  val result: Future[Seq[Map[String, Any]]] =
    if (paging.getOrElse(Paging(0, 0)) == Paging(0, 0)) Future.successful(Seq.empty)
    else {
      val start = System.currentTimeMillis()
      val profileF = profileActor ? Get(userId)
      Future.sequence(Seq(profileF, getSymbols(`type`, id), getData(paging, timeRange, platform))).map { result =>
        logger.info(s"Got ${result.size} news in ${System.currentTimeMillis() - start} ms")
        result
      }.recover { case ex: Throwable =>
        logger.error(s"Failure on getting data: ${ex.getMessage}", ex)
        Seq.empty
      }
    }
  result.pipeTo(sen)
}
Function getAndProcessData contains Future.sequence with executing 3 futures in parallel.
Now, as I'm reading more and more on Akka, I see that using ask creates another actor as a listener. My questions are:
As we use ask extensively, can it lead to too many threads being used in the system, and perhaps occasional thread starvation?
Using Future.map a lot also often means a different thread. I read about the single-threaded actor illusion, which can easily be broken by mixing in Futures.
Also, can this hurt performance?
Do we need to store sender in the temp variable sen, since we're using pipeTo? Could we just call pipeTo(sender)? Also, does declaring sen in almost every receive callback waste too many resources? I would expect its reference to be released once the operation is complete.
Is there a way to design such a system better, meaning that we don't use map or ask so much? I looked at examples where you just pass a replyTo reference to some actor and then use tell instead of ask. Also, sending a message to self and then replying to the original sender can replace working with Future.map in some scenarios. But how can it be designed, bearing in mind that we want to perform 3 async operations in parallel and return formatted data to the sender? We need all 3 operations to complete before we can format the data.
I tried not to include too many examples. I hope you understand our concerns and problems. These are many questions, but I would really love to understand how it works, simply and clearly.
Thanks in advance
If you want to do 3 things in parallel you are going to need to create 3 Future values which will potentially use 3 threads, and that can't be avoided.
I'm not sure what the issue with map is, but there is only one call in this code and that is not necessary.
Here is one way to clean up the code to avoid creating unnecessary Future values (untested!):
// Requires scala.util.{Failure, Success} and an implicit ExecutionContext in scope.
case RetrieveData(userId, `type`, id, lang, paging, timeRange, platform) =>
  if (paging.forall(_ == Paging(0, 0))) {
    sender ! Seq.empty
  } else {
    val sen = sender
    val start = System.currentTimeMillis()
    val resF = Seq(
      profileActor ? Get(userId),
      getSymbols(`type`, id),
      getData(paging, timeRange, platform),
    )
    Future.sequence(resF).onComplete {
      case Success(result) =>
        val dur = System.currentTimeMillis() - start
        logger.info(s"Got ${result.size} news in $dur ms")
        sen ! result
      case Failure(ex) =>
        logger.error(s"Failure on getting data: ${ex.getMessage}", ex)
        sen ! Seq.empty
    }
  }
You can avoid ask by creating your own worker thread that collects the different results and then sends the result to the sender, but that is probably more complicated than is needed here.
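The point that three parallel operations mean three Future values can be sketched with the standard library alone. `ParallelFutures` and the three lookup functions are hypothetical stand-ins for the profile, symbols and data calls. What matters is that all three futures are created before they are combined, and that the combined result keeps its types instead of collapsing into a Seq[Any].

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelFutures {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-ins for the profile, symbols and data lookups.
  def profile(userId: Int): Future[String] = Future(s"profile-$userId")
  def symbols(id: Int): Future[List[String]] = Future(List(s"sym-$id"))
  def data(page: Int): Future[Map[String, Int]] = Future(Map("page" -> page))

  def retrieve(userId: Int, id: Int, page: Int)
      : Future[(String, List[String], Map[String, Int])] = {
    // Start all three first, so they can run concurrently...
    val profileF = profile(userId)
    val symbolsF = symbols(id)
    val dataF    = data(page)
    // ...then combine. Creating them inside the for-comprehension instead
    // would start each one only after the previous one had completed.
    for {
      p <- profileF
      s <- symbolsF
      d <- dataF
    } yield (p, s, d)
  }
}
```

Inside an actor, the resulting future can be piped to the captured sender exactly as in the code above.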
An actor only consumes a thread in the dispatcher when it is processing a message. Since the number of messages the actor spawned to manage the ask will process is one, it's very unlikely that the ask pattern by itself will cause thread starvation. If you're already very close to thread starvation, an ask might be the straw that breaks the camel's back.
Mixing Futures and actors can break the single-thread illusion, if and only if the code executing in the Future accesses actor state (meaning, basically, vars or mutable objects defined outside of a receive handler).
Request-response and at-least-once (between them, they cover at least most of the motivations for the ask pattern) will in general limit throughput compared to at-most-once tells. Implementing request-response or at-least-once without the ask pattern might in some situations (e.g. using a replyTo ActorRef for the ultimate recipient) be less overhead than piping asks, but probably not significantly. Asks as the main entry-point to the actor system (e.g. in the streams handling HTTP requests or processing messages from some message bus) are generally OK, but asks from one actor to another are a good opportunity to streamline.
Note that, especially if your actor imports context.dispatcher as its implicit ExecutionContext, transformations on Futures are basically identical to single-use actors.
Situations where you want multiple things to happen (especially when you need to manage partial failure (Future.sequence.recover is a possible sign of this situation, especially if the recover gets nontrivial)) are potential candidates for a saga actor to organize one particular request/response.
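A per-request actor of this kind could look roughly like the following. This is a sketch assuming Akka Classic, not a definitive design: `Worker`, `Aggregator`, `Frontend`, the message types, and the policy of replying with a partial result after an idle timeout are all assumptions made for illustration.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props, ReceiveTimeout}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical protocol for this sketch.
case object Retrieve
case object Fetch
final case class Part(name: String, value: String)
final case class Combined(parts: Map[String, String])

// Stand-in for the profile, symbols and data sources.
class Worker(name: String) extends Actor {
  def receive: Receive = {
    case Fetch => sender() ! Part(name, s"$name-data")
  }
}

object Aggregator {
  def props(workers: Seq[ActorRef], replyTo: ActorRef, idle: FiniteDuration): Props =
    Props(new Aggregator(workers, replyTo, idle))
}

// One short-lived actor per request: it tells every worker, collects the
// replies as ordinary messages, answers the requester, then stops itself.
class Aggregator(workers: Seq[ActorRef], replyTo: ActorRef, idle: FiniteDuration)
    extends Actor {
  private var parts = Map.empty[String, String] // only touched inside receive

  override def preStart(): Unit = {
    context.setReceiveTimeout(idle) // fires if no message arrives for `idle`
    workers.foreach(_ ! Fetch)      // plain tells; replies are sent to self
  }

  def receive: Receive = {
    case Part(name, value) =>
      parts += (name -> value)
      if (parts.size == workers.size) finish()
    case ReceiveTimeout =>
      finish() // reply with whatever arrived; a policy assumed for this sketch
  }

  private def finish(): Unit = {
    replyTo ! Combined(parts)
    context.stop(self)
  }
}

class Frontend(workers: Seq[ActorRef]) extends Actor {
  def receive: Receive = {
    case Retrieve =>
      // sender() is read here, while the message is being handled.
      context.actorOf(Aggregator.props(workers, sender(), 2.seconds))
  }
}

object AggregatorDemo {
  def run(): Combined = {
    val system = ActorSystem("aggregator-demo")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      val workers = Seq("profile", "symbols", "data")
        .map(n => system.actorOf(Props(new Worker(n)), n))
      val frontend = system.actorOf(Props(new Frontend(workers)), "frontend")
      // The only ask is at the edge of the actor system.
      Await.result((frontend ? Retrieve).mapTo[Combined], 5.seconds)
    } finally system.terminate()
  }
}
```

All state lives inside the aggregator's receive, so no Future callback touches actor state, and the actor-to-actor traffic consists of plain tells.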
I would suggest that, instead of Future.sequence, you use Source from Akka Streams, which resolves the futures through mapAsync and lets you set the parallelism explicitly.
Here is the sample code (it needs an implicit Materializer in scope):
Source.fromIterator(() => Seq(profileF, getSymbols(`type`, id), getData(paging, timeRange, platform)).iterator)
  .mapAsync(parallelism = 3)(future => future) // each element is already a Future
  .runWith(Sink.seq)
This returns a Future holding the results as a Seq, in the original order (a Future[Seq[Map[String, Any]]] if each future yields a Map[String, Any]).

Locking read/write operations on a data structure in Scala/akka

I have multiple actors (in the form of Futures) firing other futures off based on what they read from a single object's cache. I want to make sure that no work overlaps, and thus want to put a lock on all read/modify/write operations. How do I do this in Scala?
I tried this, but I don't want every method/function that accesses the cache to have to be synchronized, but rather have anything that tries to access the cache understand that it needs to wait until it's time for it to access.
//The cache
object certCache {
  var cache = new HashMap[Char, Future[Boolean]]
}
def someMethod = synchronized {
  if(certCache ... )
    certCache.do(...)
}
Any tips?
Agents
The akka library has a solution that fits your question well: Agents (the akka-agent module; note that it was deprecated in Akka 2.5 and removed in Akka 2.6). From the documentation:
import scala.concurrent.ExecutionContext.Implicits.global
import akka.agent.Agent
val agent = Agent(42)
To read from an Agent you can dereference them or call the get method, both of which are immediately returning synchronous calls:
val agentResult = agent()
val agentResult2 = agent.get
Updates are asynchronous:
agent send (_ + 10) //updates value to 52, eventually
Similarly, you can get a Future of the Agent's value which completes after the currently queued updates have completed:
val futureValue = agent.future
Actors
Of course, you can always go with a "home grown" solution and write an Actor that caches your values and responds to queries, but this is a much more manual and less efficient solution than Agents.
Actors should only be considered as a last resort when other akka/scala solutions do not apply. This is because Actors are very low-level and the receive method is not composable.
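For reference, the actor-based variant could be sketched as follows, assuming Akka Classic. `CertCacheActor`, `Claim` and the two result objects are hypothetical names. Because an actor handles one message at a time, the check and the update happen as a single step without any explicit lock.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical protocol: read-and-claim a key in one message.
final case class Claim(key: Char)
sealed trait ClaimResult
case object Claimed extends ClaimResult        // caller should do the work
case object AlreadyClaimed extends ClaimResult // someone else already has it

class CertCacheActor extends Actor {
  private var claimed = Set.empty[Char] // only touched inside receive

  def receive: Receive = {
    case Claim(key) =>
      if (claimed(key)) sender() ! AlreadyClaimed
      else {
        claimed += key
        sender() ! Claimed
      }
  }
}

object CacheDemo {
  def run(): (ClaimResult, ClaimResult) = {
    val system = ActorSystem("cache-demo")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      val cache = system.actorOf(Props(new CertCacheActor), "cert-cache")
      val first  = Await.result((cache ? Claim('a')).mapTo[ClaimResult], 5.seconds)
      val second = Await.result((cache ? Claim('a')).mapTo[ClaimResult], 5.seconds)
      (first, second)
    } finally system.terminate()
  }
}
```

Callers that receive `Claimed` start the work; everyone else knows it is already in progress, so no work overlaps.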

How can I gather state information from a set of actors using only the actorSystem?

I'm creating an actor system, which has a list of actors representing some kind of session state.
These sessions are created by a factory actor (which might, in the future, get replaced by a router if performance requires it; this should be transparent to the rest of the system, however).
Now I want to implement an operation where I get some state information from each of my currently existing session actors.
I have no explicit session list, as I want to rely on the actor system "owning" the sessions. I tried to use the actor system to look up the current session actors. The problem is that I did not find a "get all actor refs with this naming pattern" method. I tried to use the "/" operator on the system, followed by resolveOne - but got lost in a maze of future types.
The basic idea I had was:
- Send a message to all current session actors (as given to me by my ActorSystem).
- Wait for a response from them (preferably by using just the "ask" pattern; the method calling this broadcast request/response is just a monitoring or debugging method, so blocking is no problem here).
- And then collect the responses into a result.
After a death match against Scala's type system I had to give up for now.
Is there really no way of doing something like this?
If I understand the question correctly, then I can offer up a couple of ways you can accomplish this (though there are certainly others).
Option 1
In this approach, there will be an actor that is responsible for waking up periodically and sending a request to all session actors to get their current stats. That actor will use ActorSelection with a wildcard to accomplish that goal. A rough outline of the code for this approach is as follows:
import akka.actor._
import scala.concurrent.duration._
case class SessionStats(foo: Int, bar: Int)
case object GetSessionStats
class SessionActor extends Actor {
  def receive = {
    case GetSessionStats =>
      println(s"${self.path} received a request to get stats")
      sender ! SessionStats(1, 2)
  }
}
case object GatherStats
class SessionStatsGatherer extends Actor {
  context.system.scheduler.schedule(5.seconds, 5.seconds, self, GatherStats)(context.dispatcher)
  def receive = {
    case GatherStats =>
      println("Waking up to gather stats")
      val sel = context.system.actorSelection("/user/session*")
      sel ! GetSessionStats
    case SessionStats(f, b) =>
      println(s"got session stats from ${sender.path}, values are $f and $b")
  }
}
Then you could test this code with the following:
val system = ActorSystem("test")
system.actorOf(Props[SessionActor], "session-1")
system.actorOf(Props[SessionActor], "session-2")
system.actorOf(Props[SessionStatsGatherer])
Thread.sleep(10000)
system.actorOf(Props[SessionActor], "session-3")
So with this approach, as long as we use a naming convention, we can use an actor selection with a wildcard to always find all of the session actors even though they are constantly coming (starting) and going (stopping).
Option 2
A somewhat similar approach, but in this one, we use a centralized actor to spawn the session actors and act as a supervisor to them. This central actor also contains the logic to periodically poll for stats, but since it's the parent, it does not need an ActorSelection and can instead just use its children list. That would look like this:
case object SpawnSession
class SessionsManager extends Actor {
  context.system.scheduler.schedule(5.seconds, 5.seconds, self, GatherStats)(context.dispatcher)
  var sessionCount = 1
  def receive = {
    case SpawnSession =>
      val session = context.actorOf(Props[SessionActor], s"session-$sessionCount")
      println(s"Spawned session: ${session.path}")
      sessionCount += 1
      sender ! session
    case GatherStats =>
      println("Waking up to get session stats")
      context.children foreach (_ ! GetSessionStats)
    case SessionStats(f, b) =>
      println(s"got session stats from ${sender.path}, values are $f and $b")
  }
}
And could be tested as follows:
val system = ActorSystem("test")
val manager = system.actorOf(Props[SessionsManager], "manager")
manager ! SpawnSession
manager ! SpawnSession
Thread.sleep(10000)
manager ! SpawnSession
Now, these examples are extremely trivialized, but hopefully they paint a picture for how you could go about solving this issue with either ActorSelection or a management/supervision dynamic. And a bonus is that ask is not needed in either and also no blocking.
There have been many additional changes in this project, so my answer/comments have been delayed quite a bit :-/
First, the session stats gathering should not be periodical, but on request. My original idea was to "mis-use" the actor system as my map of all existing session actors, so that I would not need a supervisor actor knowing all sessions.
This goal has shown to be elusive - session actors depend on shared state, so the session creator must watch sessions anyways.
This makes Option 2 the obvious answer here - the session creator has to watch all children anyways.
The most vexing hurdle with option 1 was "how to determine when all (current) answers are there" - I wanted the statistics request to take a snapshot of all currently existing actor names, query them, ignore failures (if a session dies before it can be queried, it can be ignored here) - the statistics request is only a debugging tool, i.e. something like a "best effort".
The actor selection api tangled me up in a thicket of futures (I am a Scala/Akka newbie), so I gave up on this route.
Option 2 is therefore better suited to my needs.
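An on-request, best-effort variant of Option 2 could be sketched as below, assuming Akka Classic. `AllStats`, the one-second per-child timeout and the choice to skip failed sessions are assumptions made for illustration. The manager snapshots its current children, asks each one, drops any that fail or time out, and pipes the combined result back to the requester.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

final case class SessionStats(foo: Int, bar: Int)
case object GetSessionStats
case object SpawnSession
case object GatherStats
final case class AllStats(stats: Seq[SessionStats]) // hypothetical reply type

class SessionActor extends Actor {
  def receive: Receive = {
    case GetSessionStats => sender() ! SessionStats(1, 2)
  }
}

class SessionsManager extends Actor {
  import context.dispatcher
  private implicit val timeout: Timeout = Timeout(1.second) // per-child limit

  def receive: Receive = {
    case SpawnSession =>
      context.actorOf(Props[SessionActor]()) // the reply to the caller is omitted here
    case GatherStats =>
      // Snapshot of the children that exist right now.
      val replies: List[Future[Option[SessionStats]]] =
        context.children.toList.map { child =>
          (child ? GetSessionStats)
            .mapTo[SessionStats]
            .map(s => Option(s))
            .recover { case _ => None } // a dead or slow session is skipped
        }
      Future.sequence(replies).map(rs => AllStats(rs.flatten)).pipeTo(sender())
  }
}

object StatsDemo {
  def run(): AllStats = {
    val system = ActorSystem("stats-demo")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      val manager = system.actorOf(Props[SessionsManager](), "manager")
      manager ! SpawnSession
      manager ! SpawnSession
      Await.result((manager ? GatherStats).mapTo[AllStats], 5.seconds)
    } finally system.terminate()
  }
}
```

The result is complete as soon as every child has either answered or hit the timeout, which settles the "when are all answers there" question for a debugging tool.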

Akka for REST polling

I'm trying to interface a large Scala + Akka + PlayMini application with an external REST API. The idea is to periodically poll (basically every 1 to 10 minutes) a root URL and then crawl through sub-level URLs to extract data which is then sent to a message queue.
I have come up with two ways to do this:
1st way
Create a hierarchy of actors to match the resource path structure of the API. In the Google Latitude case, that would mean, e.g.
Actor 'latitude/v1/currentLocation' polls https://www.googleapis.com/latitude/v1/currentLocation
Actor 'latitude/v1/location' polls https://www.googleapis.com/latitude/v1/location
Actor 'latitude/v1/location/1' polls https://www.googleapis.com/latitude/v1/location/1
Actor 'latitude/v1/location/2' polls https://www.googleapis.com/latitude/v1/location/2
Actor 'latitude/v1/location/3' polls https://www.googleapis.com/latitude/v1/location/3
etc.
In this case, each actor is responsible for polling its associated resource periodically, as well as creating / deleting child actors for next-level path resources (i.e. actor 'latitude/v1/location' creates actors 1, 2, 3, etc. for all locations it learns about through polling of https://www.googleapis.com/latitude/v1/location).
2nd way
Create a pool of identical polling actors which receive polling requests (containing the resource path) load-balanced by a router, poll the URL once, do some processing, and schedule polling requests (both for next-level resources and for the polled URL). In Google Latitude, that would mean for instance:
1 router, n poller actors. Initial polling request for https://www.googleapis.com/latitude/v1/location leads to several new (immediate) polling requests for https://www.googleapis.com/latitude/v1/location/1, https://www.googleapis.com/latitude/v1/location/2, etc. and one (delayed) polling request for the same resource, i.e. https://www.googleapis.com/latitude/v1/location.
I have implemented both solutions and can't immediately observe any relevant difference of performance, at least not for the API and polling frequencies I am interested in. I find the first approach to be somewhat easier to reason about and perhaps easier to use with system.scheduler.schedule(...) than the second approach (where I need to scheduleOnce(...)). Also, assuming resources are nested through several levels and somewhat short-lived (e.g. several resources may be added/removed between each polling), akka's lifecycle management makes it easy to kill off a whole branch in the 1st case. The second approach should (theoretically) be faster and the code is somewhat easier to write.
My questions are:
What approach seems to be the best (in terms of performance, extensibility, code complexity, etc.)?
Do you see anything wrong with the design of either approach (esp. the 1st one)?
Has anyone tried to implement anything similar? How was it done?
Thanks!
Why not create a master poller, which then kicks off async resource requests on the schedule?
I'm no expert using Akka, but I gave this a shot:
The poller object that iterates through the list of resources to fetch:
import akka.util.duration._
import akka.actor._
import play.api.Play.current
import play.api.libs.concurrent.Akka
object Poller {
  val poller = Akka.system.actorOf(Props(new Actor {
    def receive = {
      case x: String => Akka.system.actorOf(Props[ActingSpider], name=x.filter(_.isLetterOrDigit)) ! x
    }
  }))
  def start(l: List[String]): List[Cancellable] =
    l.map(Akka.system.scheduler.schedule(3 seconds, 3 seconds, poller, _))
  def stop(c: Cancellable) {c.cancel()}
}
The actor that reads the resource asynchronously and triggers more async reads. You could put the message dispatch on a schedule rather than sending immediately, if that would be kinder to the polled resource:
import akka.actor.{Props, Actor}
import java.io.File
class ActingSpider extends Actor {
  import context._
  def receive = {
    case name: String => {
      println("reading " + name)
      new File(name) match {
        case f if f.exists() => spider(f)
        case _ => println("File not found")
      }
      context.stop(self)
    }
  }
  def spider(file: File) {
    io.Source.fromFile(file).getLines().foreach(l => {
      val k = actorOf(Props[ActingSpider], name=l.filter(_.isLetterOrDigit))
      k ! l
    })
  }
}