Do Akka Streams leverage Akka Actors? - scala

I started learning Akka Streams, which is a library for processing data with back-pressure support. The library is part of Akka, which describes itself as:
Akka is a toolkit and runtime for building highly concurrent,
distributed, and resilient message-driven applications on the JVM.
These capabilities come from the nature of Akka actors. However, from my perspective, stream processing and actors seem to be unrelated concepts.
Question:
Do Akka Streams take advantage of these features of Akka actors? If yes, would you explain how actors help streams?

Akka Streams is a higher-level abstraction than actors. It's an implementation of Reactive Streams built on top of the actor model, and it takes advantage of all the actor features because it uses actors underneath.
You can even go back to using actors directly in any part of the stream. Look at ActorPublisher and ActorSubscriber.

A good starting point is the Akka Streams quickstart.
Yes, an Actor is used to "materialize" each {Source, Flow, Sink} of a Stream. This means that when you create a Stream nothing actually happens until the stream is materialized, typically via the .run() method call.
As an example, here is a Stream being defined:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Source, Flow, Sink}
val stream = Source.single[String]("test")
  .via(Flow[String].filter(_.size > 0))
  .to(Sink.foreach(println))
Even though the stream is now a val no computation has actually happened. The Stream is just a recipe for computation. To actually kick-off the work the Stream needs to be materialized. Here is an example that does not use implicits to clearly show how materialization occurs:
val actorSystem = ActorSystem()
val materializer = ActorMaterializer()(actorSystem)
stream.run()(materializer) //work begins
Now at least 3 Actors have been created: one for the Source.single, one for the Flow.filter, and one for the Sink.foreach. Note: you can use the same materializer to initiate other streams:
val doesNothingStream = Source.empty[String]
  .to(Sink.ignore)
  .run()(materializer)

Related

akka-streams with akka-cluster

My akka-streams learn-o-thon continues. I'd like to integrate my akka-streams application with akka-cluster and DistributedPubSubMediator.
Adding support for Publish is fairly straightforward, but I'm having trouble with the Subscribe part.
For reference, a subscriber is given as follows in the Typesafe sample:
import akka.actor.Actor
import akka.cluster.pubsub.DistributedPubSub
import akka.cluster.pubsub.DistributedPubSubMediator.Subscribe

class ChatClient(name: String) extends Actor {
  val mediator = DistributedPubSub(context.system).mediator
  mediator ! Subscribe("some topic", self)

  def receive = {
    case ChatClient.Message(from, text) =>
      // ...process message...
  }
}
My question is, how should I integrate this actor with my flow, and how should I ensure I'm getting publish messages in the absence of stream backpressure?
I'm trying to accomplish a pubsub model where one stream may publish a message and another stream would consume it (if subscribed).
You probably want to make your Actor extend ActorPublisher. Then you can create a Source from it and integrate that into your stream.
See the docs on ActorPublisher here: http://doc.akka.io/docs/akka-stream-and-http-experimental/2.0.3/scala/stream-integrations.html
The other answers are outdated: they suggest using ActorPublisher, which has been deprecated since version 2.5.0.
For those interested in a current approach, Colin Breck wrote an excellent series in his blog about integrating Akka Streams and Akka actors. Over the course of the series, Breck fleshes out a system that begins with Akka Streams and plain actors, then incorporates Akka Cluster and Akka Persistence. The first post in the series is here (the distributed stream processing piece is in part 3).
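For a sense of what a non-deprecated bridge looks like, here is a minimal sketch using Source.queue, one of the integration points that replaced ActorPublisher. The buffer size, overflow strategy, and the idea of offering pub-sub messages into the queue are illustrative assumptions, not a quote from Breck's series:

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, OverflowStrategy}
import akka.stream.scaladsl.{Sink, Source}

implicit val system = ActorSystem("pubsub-bridge")
implicit val materializer = ActorMaterializer()

// Materializing the stream yields a queue handle that respects
// downstream demand; offer() returns a Future[QueueOfferResult].
val queue = Source
  .queue[String](bufferSize = 64, OverflowStrategy.dropHead)
  .to(Sink.foreach(println))
  .run()

// Inside a subscribing actor's receive block one could then call
// queue.offer(text) and inspect the result to detect drops when
// the buffer is full.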

How to use an Akka actor from the Spark nodes of a cluster

Having a Spark cluster running a certain application, I would like to use an Akka actor to stream data from within each of the Spark nodes in the cluster. That is: the nodes are processing data in some way, and in parallel the actor is sending some other data within the node to an external process.
Now, these are the possible options:
Just create the ActorRef through a regular ActorSystem: not possible, as the ActorSystem instance is not Serializable and this will fail at runtime
Use the Spark internal ActorSystem to create the actor: not a good option since Spark 1.4, as SparkEnv.get.actorSystem is deprecated
So what is the best way for the Spark nodes to instantiate a given actor if the options above are not valid? Is it possible at all?
This question is somewhat related to this one, although formulated with a wider scope
Note: I know I could somehow use Spark streaming for this scenario, but at the moment I would like to explore the feasibility of a pure Akka option

Akka: "Trying to deserialize a serialized ActorRef without an ActorSystem in scope" error

I am integrating the use of Akka actors and Spark in the following way: when a task is distributed among the Spark nodes, while processing those tasks, each node also periodically sends metrics data to a different collector process that sits somewhere else on the network, through the use of an Akka actor (connecting to the remote process via akka-remote).
The actor-based metrics sending/receiving functionality works just fine when used in standalone mode, but when integrated in a Spark task the following error is thrown:
java.lang.IllegalStateException: Trying to deserialize a serialized ActorRef without an ActorSystem in scope. Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'
at akka.actor.SerializedActorRef.readResolve(ActorRef.scala:407) ~[akka-actor_2.10-2.3.11.jar:na]
If I understood it correctly, the source of the problem is that the Spark node is unable to deserialize the ActorRef because it does not have the full information required to do it. I understand that putting an ActorSystem in scope would fix it, but I am not sure how to use the suggested akka.serialization.Serialization.currentSystem.withValue(system) { ... }
The official Akka docs are very good on pretty much all the topics they cover. Unfortunately, the chapter devoted to serialization could be improved, IMHO.
Note: there is a similar SO question here but the accepted solution is too specific and thus not really useful in the general case
An ActorSystem is responsible for all of the functionality involved with ActorRef objects.
When you program something like
actorRef ! message
you're actually invoking a bunch of work within the ActorSystem, not the ActorRef: putting the message in the right mailbox, teeing up the Actor to run its receive method within the thread pool, and so on. From the documentation:
An actor system manages the resources it is configured to use in order
to run the actors which it contains. There may be millions of actors
within one such system, after all the mantra is to view them as
abundant and they weigh in at an overhead of only roughly 300 bytes
per instance. Naturally, the exact order in which messages are
processed in large systems is not controllable by the application
author
That is why your code works fine "standalone", but not in Spark. Each of your Spark nodes is missing the ActorSystem machinery, therefore even if you could de-serialize the ActorRef in a node there would be no ActorSystem to process the ! in your node function.
You can establish an ActorSystem within each node and use (i) remoting to send messages to your ActorRef in the "master" ActorSystem via actorSelection or (ii) the serialization method you mentioned where each node's ActorSystem would be the system in the example you quoted.
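As a rough illustration of option (i), here is a sketch with hypothetical system names, host, port, and actor path. The key point is that each executor JVM builds its own ActorSystem and resolves the remote collector by path via actorSelection, so no ActorRef ever has to be deserialized on a node:

import akka.actor.{ActorSelection, ActorSystem}
import org.apache.spark.rdd.RDD

// One ActorSystem per executor JVM, created lazily on the node itself
// (never serialized and shipped from the driver).
object NodeActorSystem {
  lazy val system: ActorSystem = ActorSystem("MetricsSystem")
}

def processWithMetrics(rdd: RDD[String]): Unit =
  rdd.foreachPartition { partition =>
    // Resolve the remote collector by path; the ActorSelection is built
    // locally, so nothing actor-related crosses the wire with the closure.
    val collector: ActorSelection = NodeActorSystem.system.actorSelection(
      "akka.tcp://CollectorSystem@collector-host:2552/user/collector")

    partition.foreach { record =>
      // ... process the record as before ...
      collector ! s"metric for $record" // fire-and-forget metrics message
    }
  }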

How to organize concurrent request processing in Scala?

Suppose I have a server in Scala, which processes incoming client requests. I have a function def process(req: Request): Response to process requests. Now I would like to process only K requests concurrently and keep M requests in queue.
In Java I would probably create a ThreadPoolExecutor with K threads and a queue of size M. Now I wonder how to do that in Scala with Actors/Futures etc.
If you must have def process(req: Request): Response, then I think your Scala solution may turn out to be similar to Java's. If you can have def process(req: Request): Future[Response], then it opens other possibilities.
When using futures, you provide (implicitly or explicitly) an execution context that can be constructed from a Java executor. So you would be able to choose your thread pool size and queue size that way. The benefit of using futures is that you can compose them with map and flatMap and a few other combinators. See http://docs.scala-lang.org/overviews/core/futures.html for more information.
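For instance, here is a sketch of the futures approach, reusing the Request, Response, and process from the question, with K threads and a bounded queue of M:

import java.util.concurrent.{ArrayBlockingQueue, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}

val K = 8    // concurrent workers
val M = 100  // queued requests

// A bounded pool: K threads and at most M waiting tasks; submissions
// beyond that are rejected, which makes the capacity limit explicit.
val executor = new ThreadPoolExecutor(
  K, K, 0L, TimeUnit.MILLISECONDS,
  new ArrayBlockingQueue[Runnable](M))

implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

// Request, Response, and process are assumed from the question.
def processAsync(req: Request): Future[Response] = Future(process(req))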
With actors, you have another model of concurrency where you can create K actors. You can have a router that dispatches each request to the actors. Each actor independently processes its request and sends the Response to whatever needs it when processing is complete. The nice part about actors is that, being independent, each actor typically does not share anything with other actors, so you don't have to synchronize the code. See http://doc.akka.io/docs/akka/2.1.4/general/terminology.html for more information.
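And a sketch of the actor variant, again assuming the question's Request, Response, and process, with K = 8 workers behind a round-robin router:

import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

// Each worker processes requests independently; the reply goes back
// to whoever sent the request.
class Worker extends Actor {
  def receive = {
    case req: Request => sender() ! process(req)
  }
}

val system = ActorSystem("server")
val router = system.actorOf(RoundRobinPool(8).props(Props[Worker]), "workers")
// router ! someRequest  -- the router dispatches it to one of the K workers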
Overall I think Scala can use anything that Java has, and provides more mechanisms to do more complex things.

How are the multiple Actors implementation in Scala different?

With the release of Scala 2.9.0, the Typesafe Stack was also announced, which combines the Scala language with the Akka framework. Now, though Scala has actors in its standard library, Akka uses its own implementation. And, if we look for other implementations, we'll also find that Lift and Scalaz have implementations too!
So, what is the difference between these implementations?
This answer isn't really mine. It was produced by Viktor Klang (of Akka fame) with the help of David Pollak (of Lift fame), Jason Zaugg (of Scalaz fame), and Philipp Haller (of Scala Actors fame).
All I'm doing here is formatting it (which would be easier if Stack Overflow supported tables).
There are a few places I'll fill later when I have more time.
Design Philosophy
Scalaz Actors
Minimal complexity. Maximal generality, modularity and extensibility.
Lift Actors
Minimal complexity, Garbage Collection by JVM rather than worrying about an explicit lifecycle, error handling behavior consistent with other Scala & Java programs, lightweight/small memory footprint, mailbox, statically similar to Scala Actors and Erlang actors, high performance.
Scala Actors
Provide the full Erlang actor model in Scala, lightweight/small memory footprint.
Akka Actors
Simple and transparently distributable, high performance, lightweight and highly adaptable.
Versioning
                     Scalaz Actors   Lift Actors   Scala Actors   Akka Actors
Current stable ver.  5               2.1           2.9.0          0.10
Minimum Scala ver.   2.8             2.7.7         –              2.8
Minimum Java ver.    1.5             1.5           –              1.6
Actor Model Support
                                  Scalaz Actors   Lift Actors   Scala Actors    Akka Actors
Spawn new actors inside an actor  Yes             Yes           Yes             Yes
Send messages to a known actor    Yes             Yes           Yes             Yes
Change behavior for next message  Actors are      Yes           Yes: nested     Yes: become/
                                  immutable                     react/receive   unbecome
Supervision (link/trapExit)       Not provided    No            Actor: Yes,     Yes
                                                                Reactor: No
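To make the become/unbecome entry concrete, here is a minimal Akka sketch (the Toggle actor and its string messages are invented for illustration):

import akka.actor.Actor

class Toggle extends Actor {
  def receive: Receive = {
    case "on" => context.become(enabled, discardOld = false) // push new behavior
  }
  def enabled: Receive = {
    case "off" => context.unbecome() // pop back to the previous behavior
  }
}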
Level of state isolation
If a user defines public methods on their Actors, are they callable from the outside?
Scalaz Actors: n/a. Actor is a sealed trait.
Lift Actors: Yes
Scala Actors: Yes
Akka Actors: No, actor instance is shielded behind an ActorRef.
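A quick illustration of the Akka case (the Counter actor is an invented example): the instance and its state live behind an ActorRef, so the outside world can only send messages, never call methods or read fields:

import akka.actor.{Actor, ActorSystem, Props}

class Counter extends Actor {
  private var count = 0 // internal state, unreachable from outside
  def receive = { case "inc" => count += 1 }
}

val system = ActorSystem("demo")
val ref = system.actorOf(Props[Counter], "counter")
ref ! "inc" // messages only; something like ref.count does not compile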
Actor type
Scalaz Actors: Actor[A] extends A => ()
Lift Actors: LiftActor, SpecializeLiftActor[T]
Scala Actors: Reactor[T], Actor extends Reactor[Any]
Akka Actors: Actor[Any]
Actor lifecycle management

Manual start:
  Scalaz Actors: No
  Lift Actors: No
  Scala Actors: Yes
  Akka Actors: Yes

Manual stop:
  Scalaz Actors: No
  Lift Actors: No
  Scala Actors: No
  Akka Actors: Yes

Restart-on-failure:
  Scalaz Actors: n/a
  Lift Actors: Yes
  Scala Actors: Yes
  Akka Actors: Configurable per actor instance

Restart semantics:
  Scalaz Actors: n/a
  Lift Actors: Rerun actor behavior
  Akka Actors: Restore actor to stable state by re-allocating it and throwing away the old instance

Restart configurability:
  Scalaz Actors: n/a
  Lift Actors: n/a
  Akka Actors: X times, X times within Y time

Lifecycle hooks provided:
  Scalaz Actors: No lifecycle
  Scala Actors: act
  Akka Actors: preStart, postStop, preRestart, postRestart
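As a concrete look at the Akka hooks in the table above, a sketch (the println bodies are invented):

import akka.actor.Actor

class Supervised extends Actor {
  override def preStart(): Unit = println("about to start")
  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    println(s"restarting, failed on $message: ${reason.getMessage}")
    super.preRestart(reason, message) // default: stops children, calls postStop
  }
  override def postRestart(reason: Throwable): Unit = {
    super.postRestart(reason) // default: calls preStart on the fresh instance
    println("restarted with a fresh instance")
  }
  override def postStop(): Unit = println("stopped")

  def receive = { case msg => println(msg) }
}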
Message send modes

Fire-forget:
  Scalaz Actors: a ! message, or a(message)
  Lift Actors: actor ! msg
  Scala Actors: actor ! msg
  Akka Actors: actorRef ! msg

Send-receive-reply:
  Scalaz Actors: (see 1)
  Lift Actors: actor !? msg, actor !! msg
  Scala Actors: actor !? msg
  Akka Actors: actorRef !! msg

Send-receive-future:
  Scalaz Actors: (see 2)
  Scala Actors: actor !! msg
  Akka Actors: actorRef !!! msg

Send-result-of-future:
  Scalaz Actors: promise(message).to(actor)
  Akka Actors: future.onComplete(f => to ! f.result)

Compose actor with function (see 3):
  Scalaz Actors: actor comap f
  Lift Actors: No
  Scala Actors: No
  Akka Actors: No
(1) Any function f becomes such an actor:
val a: Msg => Promise[Rep] = f.promise
val reply: Rep = a(msg).get
(2) Any function f becomes such an actor:
val a = f.promise
val replyFuture = a(message)
(3) Contravariant functor: actor comap f. Also Kleisli composition in Promise.
Message reply modes
TBD
Rows: reply-to-sender-in-message, reply-to-message
Columns: Scalaz Actors, Lift Actors, Scala Actors, Akka Actors
Message processing
Supports nested receives?
Scalaz Actors: --
Lift Actors: Yes (with a little hand coding).
Scala Actors: Yes, both thread-based receive and event-based react.
Akka Actors: No, nesting receives can lead to memory leaks and degraded performance over time.
Message Execution Mechanism
TBD
Columns: Scalaz Actors, Lift Actors, Scala Actors, Akka Actors
Rows:
  Name for execution mechanism
  Execution mechanism is configurable
  Execution mechanism can be specified on a per-actor basis
  Lifecycle of execution mechanism must be explicitly managed
  Thread-per-actor execution mechanism
  Event-driven execution mechanism
  Mailbox type
  Supports transient mailboxes
  Supports persistent mailboxes
Distribution/Remote Actors

Transparent remote actors:
  Scalaz Actors: n/a
  Lift Actors: No
  Scala Actors: Yes
  Akka Actors: Yes

Transport protocol:
  Scalaz Actors: n/a
  Lift Actors: n/a
  Scala Actors: Java serialization on top of TCP
  Akka Actors: Akka Remote Protocol (Protobuf on top of TCP)

Dynamic clustering:
  Scalaz Actors: n/a
  Lift Actors: n/a
  Scala Actors: n/a
  Akka Actors: In commercial offering
Howtos
TBD
Columns: Scalaz Actors, Lift Actors, Scala Actors, Akka Actors
Rows:
  Define an actor
  Create an actor instance
  Start an actor instance
  Stop an actor instance
scala.actors was the first serious attempt to implement Erlang-style concurrency in Scala, and it has inspired other library designers to make better (in some cases) and more performant implementations. The biggest problem (at least for me) is that, unlike Erlang processes complemented with OTP (which allows for building fault-tolerant systems), scala.actors offer only a good foundation: a set of stable primitives for building more high-level frameworks. At the end of the day, you have to write your own supervisors, catalogs of actors, finite state machines, etc. on top of the actors.
And here Akka comes to the rescue, offering a full-featured stack for actor-based development: more idiomatic actors, a set of high-level abstractions for coordination (load balancers, actor pools, etc.) and for building fault-tolerant systems (supervisors ported from OTP, etc.), easily configurable schedulers (dispatchers), and so on. Sorry if I sound rude, but I think there will be no merge in 2.9.0+; I'd rather expect Akka actors to gradually replace the stdlib implementation.
Scalaz: normally I have this library on the dependency list of all my projects, and when, for some reason, I can't use Akka, non-blocking Scalaz Promises (with all the goodness, like sequence) combined with the standard actors save the day. I have never used Scalaz actors as a replacement for scala.actors or Akka, however.
Actors: Scala 2.10 vs Akka 2.3 vs Lift 2.6 vs Scalaz 7.1
Test code & results for average latency and throughput on JVM 1.8.0_x.