I have a restful API that receives an array of JSON messages that will be converted to individual Avro messages and then sent to Kafka. Inside the route, I call 3 different actors: 1) one actor goes out and retrieves the Avro schema from disk 2) then loop through the array of JSON messages and compare it to the Avro schema in a second actor. If any of the messages don't validate, then I need to return a response back to the caller of the API and stop processing. 3) Loop through the array and pass into a 3rd actor that takes the JSON object, converts it to an Avro message and sends to a Kafka topic.
Where I'm having problem getting my head wrapped around is how to stop processing in the route if something fails in one of the actors. I'm passing in the request context to each actor and calling it's complete method but it doesn't seem to immediately stop, the next actor still processes even when it shouldn't. Here is a code snippet of what I'm doing in the route:
post {
entity(as[JsObject]) { membersObj =>
requestContext =>
val membersJson = membersObj.fields("messages").convertTo[JsArray].elements
val messageService = actorRefFactory.actorOf(Props(new MessageProcessingServicev2()))
val avroService = actorRefFactory.actorOf(Props(new AvroSchemaService()))
val validationService = actorRefFactory.actorOf(Props(new JSONMessageValidationService()))
implicit val timeout = Timeout(5 seconds)
val future = avroService ? AvroSchema.MemberSchema(requestContext)
val memberSchema:Schema = Await.result(future, timeout.duration).asInstanceOf[Schema]
for (member <- membersJson) validationService ! ValidationService.MemberValidation(member.asJsObject, memberSchema, requestContext)
for (member <- membersJson) (messageService ! MessageProcessingv2.ProcessMember(member.asJsObject, topicName, memberSchema, requestContext))
I've looked through a lot of blogs/books/slides around this topic and not sure what the best approach is. I've been using Scala/Akka for about 2 months and basically self taught on just the pieces that I've been needing. So any insight that the more seasoned Scala/Akka/Spray developers have in this, it's much appreciated. The one thought I had was to wrapper the 3 actors in a 'master' actor and make each a child of that actor and attempt to approach it like that.
As you are using async processing (!) you cannot control the processing after your messages have been sent. You would need to use ask (?) that will return a future you can work with.
But I have a better idea. You could send a message from the first actor to the second. And instead of returning a result to the first actor, you could send the message to the third one to keep on the computation.
Related
I'm a newbie with Akka Actors, and I am learning about the Ask pattern. I am looking at the following example from alvin alexander:
class TestActor extends Actor {
def receive = {
case AskNameMessage => // respond to the "ask" request
sender ! "Fred"
case _ => println("that was unexpected")
}
}
...
val myActor = system.actorOf(Props[TestActor], name = "myActor")
// (1) this is one way to "ask" another actor
implicit val timeout = Timeout(5 seconds)
val future = myActor ? AskNameMessage
val result = Await.result(future, timeout.duration).asInstanceOf[String]
println(result)
(Yes, I know that Await.result isn't generally the best practice, but this is just a simple example.)
So from what I can tell, the only thing you need to do to implement the "askee" actor to service an Ask operation is to send a message back to the "asker" via the Tell operator, and that will be turned into a future on the "asker" side as a response to the Ask. Seems simple enough.
My question is this:
When the response comes back, how does Akka know that this particular message is the response to a certain Ask message?
In the example above, the "Fred" message doesn't contain any specific routing information that specifies that it's the response to a particular Ask operation. Does it just assume that the next message that the asker receives from that askee is the answer to the Ask? If that's the case, then what if one actor sends multiple Ask operations to the same askee? Wouldn't the responses likely get jumbled, causing random responses to be mapped to the wrong Asks?
Or, what if the asker is also receiving other types of messages from the same askee actor that are unrelated to these Ask messages? Couldn't the Asks receive response messages of the wrong type?
Just to be clear, I'm asking about Akka Classic, not Typed.
For every Ask message sent to an actor, akka creates a proxy ActorRef whose sole responsibility is to process one single message. This temp "actor" is initialized with a promise, which it needs to complete on message processing.
The source code of it is found here
but the main details are
private[akka] final class PromiseActorRef private (
val provider: ActorRefProvider,
val result: Promise[Any],
....
val alreadyCompleted = !result.tryComplete(promiseResult)
Now, it should be clear that Ask pattern is backed by independent unique actor asker for every message sent to the receiver askee.
The askee does know actor reference of the sender, or asker, of every message received via method context.sender(). Thus, it just needs to use this ActorRef to send a response back to the asker.
Finally, this all avoids any race conditions given that an actor only processes a message at a time. Thus it excludes any possibility of retrieving a "wrong" asker via method context.sender().
I want to stream a file from s3 to actor to be parsed and enriched and to write the output to other file.
The number of parserActors should be limited e.g
application.conf
akka{
actor{
deployment {
HereClient/router1 {
router = round-robin-pool
nr-of-instances = 28
}
}
}
}
code
val writerActor = actorSystem.actorOf(WriterActor.props())
val parser = actorSystem.actorOf(FromConfig.props(ParsingActor.props(writerActor)), "router1")
however the actor that is writing to a file should be limited to 1 (singleton)
I tried doing something like
val reader: ParquetReader[GenericRecord] = AvroParquetReader.builder[GenericRecord](file).withConf(conf).build()
val source: Source[GenericRecord, NotUsed] = AvroParquetSource(reader)
source.map (record => record ! parser)
but I am not sure that the backpressure is handled correctly. any advice ?
Indeed your solution is disregarding backpressure.
The correct way to have a stream interact with an actor while maintaining backpressure is to use the ask pattern support of akka-stream (reference).
From my understanding of your example you have 2 separate actor interaction points:
send records to the parsing actors (via a router)
send parsed records to the singleton write actor
What I would do is something similar to the following:
val writerActor = actorSystem.actorOf(WriterActor.props())
val parserActor = actorSystem.actorOf(FromConfig.props(ParsingActor.props(writerActor)), "router1")
val reader: ParquetReader[GenericRecord] = AvroParquetReader.builder[GenericRecord](file).withConf(conf).build()
val source: Source[GenericRecord, NotUsed] = AvroParquetSource(reader)
source.ask[ParsedRecord](28)(parserActor)
.ask[WriteAck](writerActor)
.runWith(Sink.ignore)
The idea is that you send all the GenericRecord elements to the parserActor which will reply with a ParsedRecord. Here as an example we specify a parallelism of 28 since that's the number of instances you have configured, however as long as you use a value higher than the actual number of actor instances no actor should suffer from work starvation.
Once the parseActor replies with the parsing result (here represented by the ParsedRecord) we apply the same pattern to interact with the singleton writer actor. Note that here we don't specify the parallelism as we have a single instance so it doesn't make sense the send more than 1 message at a time (in reality this happens anyway due to buffering at async boundaries, but this is just a built-in optimization). In this case we expect that the writer actor replies with a WriteAck to inform us that the writing has been successful and we can send the next element.
Using this method you are maintaining backpressure throughout your whole stream.
I think you should be using one of the "async" operations
Perhaps this other q/a gives you some insperation Processing an akka stream asynchronously and writing to a file sink
We have a fairly complex system developed using Akka HTTP and Actors model. Until now, we extensively used ask pattern and mixed Futures and Actors.
For example, an actor gets message, it needs to execute 3 operations in parallel, combine a result out of that data and returns it to sender. What we used is
declare a new variable in actor receive message callback to store a sender (since we use Future.map it can be another sender).
executed all those 3 futures in parallel using Future.sequence (sometimes its call of function that returns a future and sometimes it is ask to another actor to get something from it)
combine the result of all 3 futures using map or flatMap function of Future.sequence result
pipe a final result to a sender using pipeTo
Here is a code simplified:
case RetrieveData(userId, `type`, id, lang, paging, timeRange, platform) => {
val sen = sender
val result: Future[Seq[Map[String, Any]]] = if (paging.getOrElse(Paging(0, 0)) == Paging(0, 0)) Future.successful(Seq.empty)
else {
val start = System.currentTimeMillis()
val profileF = profileActor ? Get(userId)
Future.sequence(Seq(profileF, getSymbols(`type`, id), getData(paging, timeRange, platform)).map { result =>
logger.info(s"Got ${result.size} news in ${System.currentTimeMillis() - start} ms")
result
}.recover { case ex: Throwable =>
logger.error(s"Failure on getting data: ${ex.getMessage}", ex)
Seq.empty
}
}
result.pipeTo(sen)
}
Function getAndProcessData contains Future.sequence with executing 3 futures in parallel.
Now, as I'm reading more and more on Akka, I see that using ask is creating another actor listener. Questions are:
As we extensively use ask, can it lead to a to many threads used in a system and perhaps a thread starvation sometimes?
Using Future.map much also means different thread often. I read about one thread actor illusion which can be easily broken with mixing Futures.
Also, can this affect performances in a bad way?
Do we need to store sender in temp variable send, since we're using pipeTo? Could we do only pipeTo(sender). Also, does declaring sen in almost each receive callback waste to much resources? I would expect its reference will be removed once operation in complete.
Is there a chance to design such a system in a better way, meadning that we don't use map or ask so much? I looked at examples when you just pass a replyTo reference to some actor and the use tell instead of ask. Also, sending message to self and than replying to original sender can replace working with Future.map in some scenarios. But how it can be designed having in mind we want to perform 3 async operations in parallel and returns a formatted data to a sender? We need to have all those 3 operations completed to be able to format data.
I tried not to include to many examples, I hope you understand our concerns and problems. Many questions, but I would really love to understand how it works, simple and clear
Thanks in advance
If you want to do 3 things in parallel you are going to need to create 3 Future values which will potentially use 3 threads, and that can't be avoided.
I'm not sure what the issue with map is, but there is only one call in this code and that is not necessary.
Here is one way to clean up the code to avoid creating unnecessary Future values (untested!):
case RetrieveData(userId, `type`, id, lang, paging, timeRange, platform) =>
if (paging.forall(_ == Paging(0, 0))) {
sender ! Seq.empty
} else {
val sen = sender
val start = System.currentTimeMillis()
val resF = Seq(
profileActor ? Get(userId),
getSymbols(`type`, id),
getData(paging, timeRange, platform),
)
Future.sequence(resF).onComplete {
case Success(result) =>
val dur = System.currentTimeMillis() - start
logger.info(s"Got ${result.size} news in $dur ms")
sen ! result
case Failure(ex)
logger.error(s"Failure on getting data: ${ex.getMessage}", ex)
sen ! Seq.empty
}
}
You can avoid ask by creating your own worker thread that collects the different results and then sends the result to the sender, but that is probably more complicated than is needed here.
An actor only consumes a thread in the dispatcher when it is processing a message. Since the number of messages the actor spawned to manage the ask will process is one, it's very unlikely that the ask pattern by itself will cause thread starvation. If you're already very close to thread starvation, an ask might be the straw that breaks the camel's back.
Mixing Futures and actors can break the single-thread illusion, if and only if the code executing in the Future accesses actor state (meaning, basically, vars or mutable objects defined outside of a receive handler).
Request-response and at-least-once (between them, they cover at least most of the motivations for the ask pattern) will in general limit throughput compared to at-most-once tells. Implementing request-response or at-least-once without the ask pattern might in some situations (e.g. using a replyTo ActorRef for the ultimate recipient) be less overhead than piping asks, but probably not significantly. Asks as the main entry-point to the actor system (e.g. in the streams handling HTTP requests or processing messages from some message bus) are generally OK, but asks from one actor to another are a good opportunity to streamline.
Note that, especially if your actor imports context.dispatcher as its implicit ExecutionContext, transformations on Futures are basically identical to single-use actors.
Situations where you want multiple things to happen (especially when you need to manage partial failure (Future.sequence.recover is a possible sign of this situation, especially if the recover gets nontrivial)) are potential candidates for a saga actor to organize one particular request/response.
I would suggest instead of using Future.sequence, Souce from Akka can be used which will run all the futures in parallel, in which you can provide the parallelism also.
Here is the sample code:
Source.fromIterator( () => Seq(profileF, getSymbols(`type`, id), getData(paging, timeRange, platform)).iterator )
.mapAsync( parallelism = 1 ) { case (seqIdValue, row) =>
row.map( seqIdValue -> _ )
}.runWith( Sink.seq ).map(_.map(idWithDTO => idWithDTO))
This will return Future[Seq[Map[String, Any]]]
I have a scenario where I have to fetch the details of a user by his id. It is a HTTP request that comes in and in my HTTP handler layer, I make use of the id that I get from the request, send a message to the actor which then talks to the database service to fetch the user.
Now since this is a HTTP request, I need to satisfy the request by sending a response back. So I thought of using the Akka ask pattern, but I have the following questions in mind:
Is this going to block on my current thread?
Is using ask pattern here to fetch a user in my case a scalable solution? I mean, I could have a few hundreds to a million users calling this end point at any given point in time. Is this a good idea to use the ask pattern to fetch a user?
In code, it looks like this in my HTTP controller
val result: Future[Any] = userActor ? FetchUser(id)
In my actor, I would do the following:
case fetchUser: FetchUser => sender ! myService.getUser(fetchUser.id)
Answering your questions in the same order you posed them:
No, using the ? does not block the current thread. It returns a Future immediately. However, the result within the Future may not be available immediately.
If you need the solution to be "scalable", and your service is capable of multiple concurrent queries, then you may need to use a pool of Actors so you can serve multiple ? at once, or see below for a Futures only, scalable, solution.
Futures Exclusively
If your Actors are not caching any intermediate values then you can just use Futures directly and avoid the rigmarole of Actors (e.g. Props, actorOf, receive, ?, ...):
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext,Future}
object ServicePool {
private val myService = ???
val maxQueries = 11 //should come from a configuration file instead
private val queryExecutionPool =
ExecutionContext.fromExecutor(Executors.newFixedThreadPool(maxQueries))
type ID = ???
/**Will only hit the DB with maxQueries at once.*/
def queryService(id : ID) =
Future { myService getUser id }(queryExecutionPool)
}//end object ServiceQuery
You can now call ServicePool.queryService as often as you want but the service will not be hit with more than maxQueries at a single time, and no Actors:
val alotOfIDs : Seq[ID] = (1 to 1000000) map { i => ID(i)}
val results = alotOfIDs map ServicePool.queryService
I'm trying to figure out if my usage of passing Akka ActorRef around to other actors is not an anti-pattern.
I've a few actors in my system. Some are long lived (restClientRouter,publisher) and some die after that they have done the work (geoActor). The short-lived actors need to send messages to the long-lived actors and therefore need their ActorRefs.
//router for a bunch of other actors
val restClientRouter = createRouter(context.system)
//publishers messages to an output message queue
val publisher: ActorRef = context.actorOf(Props(new PublisherActor(host, channel)), name = "pub-actor")
//this actor send a message to the restClientRouter and then sends the response
//to the publisher
val geoActor = context.actorOf(Props(new GeoInferenceActor(restClientRouter, publisher)), name = "geo-inference-actor")
As you can see I'm passing the ActorRefs (restClientRouter and publisher) to the constructor of GeoInferenceActor. Is this okay or not? Is there a better way of doing this ?
There are a couple of good ways to "introduce" actor refs to actor instances that need them.
1) Create the actor with the refs it needs as constructor args (which is what you are doing)
2) Pass in the needed refs with a message after the instance is created
Your solution is perfectly acceptable, and even suggested by Roland Kuhn, the Tech Lead for Akka, in this post:
Akka actor lookup or dependency injection
It is perfectly valid, as stated in the Akka API documentation:
ActorRefs can be freely shared among actors by message passing.
In your case, you are passing them to the constructors, which is absolutely fine and the way it is supposed to be.