New to Akka and Actors - I need to start a few actors that will basically spend their lives reading from Kafka topics and writing to an Ignite cache. I configured my dispatcher like this:
kafka-dispatcher {
  executor = "thread-pool-executor"
  type = PinnedDispatcher
}
My actors are created with .withDispatcher("kafka-dispatcher"), and my assumption is that each actor will be assigned an individual thread.
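For concreteness, creation looks roughly like this (the KafkaPollingActor class name is a placeholder, not from my actual code):
val worker = system.actorOf(
  Props(new KafkaPollingActor("my-topic")) // illustrative actor class
    .withDispatcher("kafka-dispatcher"),   // pins this actor to its own thread
  name = "kafka-worker-1")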
These actors basically spend their lives like this:
override def receive: Receive = LoggingReceive {
  case InitWorker => {
    initialize()
    pollTopic() // This never returns
  }
}
In other words, they receive an initialization message and then call the pollTopic() method, which never returns - it runs a loop reading (which will block until there is data) and then writing the data.
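Roughly, pollTopic() does something like this (a sketch; the consumer and cache handles are assumed fields, and the poll(long) overload is from the older Kafka consumer API):
import scala.collection.JavaConverters._
def pollTopic(): Unit = {
  while (true) {
    val records = consumer.poll(1000) // blocks for up to 1s waiting for data
    records.asScala.foreach(r => igniteCache.put(r.key, r.value))
  }
}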
My questions:
1. Is this kosher?
2. Is there a better, i.e. more idiomatic, way to do this? Note that the read call inside pollTopic() blocks.
Answering your point 2, and from the description of what you're trying to do: you may want to consider using Akka Streams with the reactive-kafka library. Akka Streams uses actors under the hood but manages all of this for you, so you can focus on implementing small, reusable components that do exactly one thing.
You will then be able to write data processing pipelines, using Kafka as a Source for your data flow. I don't know much about Ignite cache, but chances are you will either write a Sink for it or - if it's a blocking API - mapAsync will be your friend.
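A rough sketch of such a pipeline with the akka-stream-kafka (formerly reactive-kafka) API; the igniteCache handle, topic, and server address are placeholders:
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("kafka-to-ignite")
import system.dispatcher

val settings =
  ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("ignite-writer")

Consumer
  .plainSource(settings, Subscriptions.topics("my-topic")) // Kafka as a Source
  .mapAsync(parallelism = 4) { record =>
    Future(igniteCache.put(record.key, record.value)) // blocking write wrapped in a Future
  }
  .runWith(Sink.ignore)
Backpressure, threading, and consumer lifecycle are all handled by the stream, so there is no hand-rolled poll loop to manage.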
Related
I'm currently porting an Akka Classic app to Akka Typed. I have the following components:
HttpService - Not an Actor
JobDispatcher - An Actor
JobWorker - Child actor of JobDispatcher
The JobDispatcher is a singleton actor that orchestrates jobs. Each JobWorker is responsible for one "job" and knows the job's status.
The HTTP service will make an Ask to JobDispatcher, called GetJobStatuses.
The JobDispatcher will then Ask each of the JobWorkers what their status is, aggregate the results into a list, and reply to HttpService.
The way I did this in Akka Classic was to have JobDispatcher do all the Asks, put the Futures into a list of Futures, and then transform that into a Future of Lists, and when that aggregate Future completed, I would send the results to HttpService. It looked something like this:
val statusFutures: Seq[Future[JobStatus]] =
  jobWorkers map (jobWorker => (jobWorker ? GetJobStatus).mapTo[JobStatus])
val aggregateFuture: Future[Seq[JobStatus]] = Future.sequence(statusFutures)
val theSender = context.sender()
aggregateFuture onComplete {
  case Success(jobStatuses: Seq[JobStatus]) =>
    theSender ! jobStatuses
  case Failure(exception) =>
    theSender ! exception
}
So, now that we're moving to Akka Typed, we're not supposed to use Futures / onComplete, but instead turn the Ask response into a message back to ourselves (JobDispatcher in this case). This is fairly straightforward for simple situations where I'm Asking one other actor for one reply. But in this case, I have a whole list of child actors from which I need to compile the responses.
The only thing I can think of is to make JobDispatcher hold a "state" of the list of JobWorker responses I'm waiting for, keep track of which ones have been received, and when I've received them all, send a response message back to the HTTP service. This is further complicated by the fact that I may have multiple simultaneous Asks coming in from the HTTP service, so I'd have to track multiple copies of this "state" and somehow identify which HTTP request each one is for.
This is all WAY more complicated than the aggregate Future solution above.
What is the simple/correct way to handle situations like this in Akka Typed?
The docs suggest using a per-session child actor for this situation. The child actor, being associated with only a single HTTP request, implicitly tracks exactly one copy of that state, and it can also manage the state of the scatter/gather process itself (e.g. around timeouts and retries).
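A sketch of that per-session pattern, assuming JobWorker understands a GetJobStatus(replyTo) message (JobStatus/GetJobStatus mirror the question; everything else is illustrative):
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import scala.concurrent.duration._

final case class JobStatus(id: String, state: String)       // illustrative
final case class GetJobStatus(replyTo: ActorRef[JobStatus]) // illustrative

object StatusAggregator {
  sealed trait Command
  private final case class WrappedStatus(s: JobStatus) extends Command
  private case object Timeout extends Command

  def apply(workers: Seq[ActorRef[GetJobStatus]],
            replyTo: ActorRef[Seq[JobStatus]]): Behavior[Command] =
    Behaviors.setup { context =>
      Behaviors.withTimers { timers =>
        timers.startSingleTimer(Timeout, 5.seconds)
        // adapt each worker's JobStatus reply into this actor's protocol
        val adapter = context.messageAdapter[JobStatus](WrappedStatus)
        workers.foreach(_ ! GetJobStatus(adapter))
        collecting(workers.size, Vector.empty, replyTo)
      }
    }

  private def collecting(expected: Int, got: Vector[JobStatus],
                         replyTo: ActorRef[Seq[JobStatus]]): Behavior[Command] =
    Behaviors.receiveMessage {
      case WrappedStatus(s) =>
        val all = got :+ s
        if (all.size == expected) { replyTo ! all; Behaviors.stopped }
        else collecting(expected, all, replyTo)
      case Timeout =>
        replyTo ! got // reply with whatever arrived in time
        Behaviors.stopped
    }
}
JobDispatcher then spawns one of these per incoming GetJobStatuses, e.g. context.spawnAnonymous(StatusAggregator(workers, request.replyTo)), so simultaneous HTTP requests each get their own session actor and there is no shared aggregation state to track.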
It's also worth noting that the classic pattern invites a massive bug: never call sender inside code that runs in a future callback (the example code avoids this by capturing it into theSender first, which is easy to forget). Mixing futures and actors is superficially easy but also easy to turn into something that only works by coincidence (with tests often exhibiting that coincidental behavior).
I have a Scala class ComponentBuilder which deals with creating actors and initializing them. It has a field system corresponding to the ActorSystem (among other things).
Now, I'd like to test it using TestKit - intercept logs which actors produce and check them. If I try to use regular EventFilter I can see the log which my actors produce in stdout, but the EventFilter doesn't catch them, I guess because they are in a different ActorSystem.
One solution I thought of was to make ComponentBuilder a subclass of ActorSystem and delegate all the ActorSystem calls to its field. I didn't manage to do it, due to protected methods that return InternalActorRef, and in any case I'm not sure it would work, since there would still be two actor systems.
I also tried sending some messages from the testing class to the actor, then waiting and checking the results. This results in the replies going to deadLetters, probably for the same reason - the inner ActorSystem doesn't know about the outer one.
I will appreciate any solution you may have.
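For reference, my understanding of how EventFilter is supposed to be wired up (a sketch; the ComponentBuilder constructor taking a system is mine): it only intercepts events published on the ActorSystem it is given implicitly, and that system must use the TestEventListener logger.
import akka.actor.ActorSystem
import akka.testkit.EventFilter
import com.typesafe.config.ConfigFactory

implicit val system: ActorSystem = ActorSystem("test", ConfigFactory.parseString(
  """akka.loggers = ["akka.testkit.TestEventListener"]"""))

EventFilter.info(message = "worker started", occurrences = 1) intercept {
  // actors must be created on this same system for the filter to see their logs
  new ComponentBuilder(system).createActors() // hypothetical constructor/method
}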
I am a little confused by how I am supposed to get a reference to my Actor once it has been created in the Akka system. Say I have the following:
class MyActor(val x: Int) extends Actor {
  def receive = {
    case Msg => doSth()
  }
}
And at some stage I would create the actor as follows:
val actorRef = system.actorOf(Props(new MyActor(2)), name = "someActor")
But when I need to refer to the MyActor object, I cannot see any way to get it from the ActorRef?
Thanks
Des
I'm adding my comment as an answer since it seems to be viewed as something worth putting in as an answer.
Not being able to directly access the actor is the whole idea behind actor systems. You are not supposed to be able to get at the underlying actor class instance. The actor ref is a lightweight proxy to your actor class instance. Allowing people to directly access the actor instance could lead down the path of mutable data issues/concurrent state update issues and that's a bad path to go down. By forcing you to go through the ref (and thus the mailbox), state and data will always be safe as only one message is processed at a time.
I think cmbaxter has a good answer, but I want to make it just a bit more clear. The ActorRef is your friend for the following reasons:
You cannot ever access the underlying actor. Actors receive their thread of execution from the Dispatcher given to them when they are created. They operate on one mailbox message at a time, so you never have to worry about concurrency inside of the actor unless YOU introduce it (by handling a message asynchronously with a Future or another Actor instance via delegation, for example). If someone had access to the underlying class behind the ActorRef, they could easily call into the actor via a method using another thread, thus negating the point of using the Actor to avoid concurrency in the first place.
ActorRefs provide Location Transparency. By this, I mean that the Actor instance could exist locally on the same JVM and machine as the actor from which you would like to send it a message, or it could exist on a different JVM, on a different box, in a different data center. You don't know, you don't care, and your code is not littered with the details of HOW the message will be sent to that actor, thus making it more declarative (focused on what you want to do business-wise, not how it will be done). When you start using Remoting or Clusters, this will prove very valuable.
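A small sketch of what that buys you (Worker, DoWork, and the remote address are illustrative):
val local  = system.actorOf(Props[Worker], "worker")
val remote = system.actorSelection("akka.tcp://other@10.0.0.2:2552/user/worker")
local  ! DoWork("job-1") // same JVM
remote ! DoWork("job-1") // different box, identical call site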
ActorRefs also mask the instance of the actor behind them when failure occurs. With the old Scala Actors, you had no such proxy: when an actor "died" (threw an Exception), a new instance of the Actor type was created in its place, and you then had to propagate that new instance reference to anyone who needed it. With ActorRef, you don't care that the Actor has been reincarnated - it's all transparent.
There is one way to get access to the underlying actor when you want to do testing, using TestActorRef.underlyingActor. This way, you can write unit tests against functionality written in methods inside the actor. Useful for checking basic business logic without worrying about Akka dynamics.
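A minimal sketch of that testing escape hatch, reusing the MyActor class from the question:
import akka.actor.{ActorSystem, Props}
import akka.testkit.TestActorRef

implicit val system: ActorSystem = ActorSystem("test")

// TestActorRef runs the actor on the calling thread, so it's test-only
val ref = TestActorRef[MyActor](Props(new MyActor(2)))
val underlying: MyActor = ref.underlyingActor
assert(underlying.x == 2) // direct, synchronous access to the instance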
Hope that helps.
So this question is related to an old one of mine: Do I need to re-use the same Akka ActorSystem or can I just create one every time I need one?
I asked a question about the lifecycle of actors, and I knew something was wrong in my mind, but couldn't phrase it correctly. Hopefully I can now :-).
Here's the situation. I want to test actors that have dependencies on other components and actors, so I went about composing my actors at bootstrap time (I'm using Scalatra, but this applies however you bootstrap your app). I therefore have something like this:
trait DependencyComponent {
  val dependency: Dependency
}

trait ActorComponentA extends Actor with DependencyComponent {
  val actorB: ActorRef
}

trait ActorComponentB extends Actor with DependencyComponent
Ok, so now I can test my actors by extending the traits and providing mock dependencies, all good. And I can bootstrap my app like so:
Bootstrap
val system = ActorSystem()
val actorA = system.actorOf(Props[DefaultActorA])

class DefaultActorB extends ActorComponentB {
  val dependency = new RealDependency()
}

class DefaultActorA extends ActorComponentA {
  val dependency = new RealDependency()
  val actorB = context.actorOf(
    Props[DefaultActorB].withRouter(RoundRobinRouter(nrOfInstances = 100)))
}
Cool, I'm happy :-). Now I can use the ActorSystem and actorA within my app, and it has 100 actorBs routed to pass work to. So when actorA decides that the work is done, my understanding is that it should broadcast to the routed actors to shut down. At this point, when another request comes in, actorA can no longer send messages to the router because all its actors are dead.
If I wasn't setting this up at boot time, then actorA and its dependencies could be created when needed in my app. But that is very much like "newing up an object" in the DI world. In order to test, I would end up overriding the places where the actors were created.
The Scalatra docs suggest creating my actors at boot time, so I feel that I am missing something here. Any help appreciated.
Cheers, Chris.
EDIT
I've +1'd both @futurechimp and @cmbaxter, as both answers seem valid but slightly conflicting. So this is an open comment to both of you.
So @cmbaxter, am I right in thinking that you're suggesting never calling 'stop' on the routed actors and just maintaining a pool of them for use by ALL requests? And @futurechimp, you're suggesting having the servlet instantiate the actors per request and kill them at the end of their lifecycle. Right?
It seems like per-request will spawn more actors (but dispose of them), whereas the pool will have only a limited set for all requests - in which case, is there a potential bottleneck to that approach?
I guess basically, I'm asking if my assumptions are correct, and if so, what are the advantages and disadvantages of both approaches?
Instantiating an ActorSystem is expensive - however instantiating an Actor isn't. If you only want to instantiate your ActorSystem in ScalatraBootstrap, and your Actors elsewhere, that should work fine if that's what you need to do. I'll talk to some other people to confirm this, and then change the docs in Scalatra's Akka Guide to avoid confusion in future.
One of the questions you have to ask yourself here is: are my actors going to be stateful or stateless?
If stateless (and I would prefer this approach personally when possible), then they can be "long-lived": start them when the server boots up and leave them running for the duration of the server's life. When you need to talk to them from elsewhere in the code, use system.actorFor(String) or system.actorSelection(String) (depending on which version of Akka you are using) to look up the actor and send it a message.
If the actors are going to be stateful, then they probably should be "short-lived" and started up in response to an individual request. In this case, you will not start them when the server boots up; you will only start the ActorSystem itself. Then, when the request comes in, you will instantiate via system.actorOf instead, and make sure that when the work is done you stop ActorA; since it's the supervisor of all the ActorBs, stopping A will stop all of the Bs started by A.
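For the stateless, long-lived case, the wiring could look roughly like this (the actor name and the ProcessRequest message are illustrative):
val system = ActorSystem("app")
system.actorOf(Props[DefaultActorA], name = "actorA") // once, at boot

// elsewhere in the app, per request:
system.actorSelection("/user/actorA") ! ProcessRequest(payload)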
As has been said several times here before, you can spawn a new thread inside an actor (by putting a long-running computation into an actor {} block), and the spawned computation will safely run on the same thread pool (the one used by the actor scheduler).
actor {
  var i = 0
  loop {
    react {
      case msg =>
        actor {
          // computation
          i = i + 1 // is `i` still thread safe?
          // looks like it can be accessed simultaneously from two threads now
          // should I make it @volatile?
        }
        reply(i)
    }
  }
}
However, will it be thread-safe, and does it follow, in general, the original design, which states that only one thread at a time can work with a given actor?
Spawning an actor from another is completely safe.
However, sharing mutable state between actors isn't, regardless of how & where they were spawned.
The whole point of actors is that they should communicate with messages, via their mailboxes. Abuse this model and actors offer no more protection from concurrency issues than you get with raw threads.
In your example, i can indeed be accessed from multiple threads: the one doing the computation and the thread calling reply. Perhaps you should reply with a Future?
reply( future { /* perform computation */ } )
The caller would then be able to react to the future's input channel (non-blocking), or simply apply() it (blocking).
Actors can only process one message at a time, so anything you do within your actor is thread-safe by design; you never have to think about thread safety.
This is the whole idea behind actors and their concurrency design.