What is the essential difference between Akka and a ThreadPool+BlockingQueue in ONE process? - scala

We know Akka is one implementation of the actor pattern. Without Akka, I usually implement a simple actor pattern using a ThreadPool+BlockingQueue: messages are offered into the queue, and the workers (actors) take messages from the queue and do what they should do. Of course, this kind of implementation can only live in ONE process.
So, restricted to one process, what's the essential difference between these two (Akka vs. ThreadPool+BlockingQueue)?
Moreover, what's the difference between the actor pattern and the producer-consumer model?

The actor model is indeed quite similar to the producer-consumer (P-C) model.
However, if you use a blocking queue with P-C, your application won't be completely non-blocking and asynchronous. The promise of the actor model and Akka is that all messages are sent asynchronously and don't block the sender.
Another aspect is that managing these queues gets quite cumbersome once you have many consumers and producers. With actors you simply send a message and don't have to think about these low-level details. Under the hood, Akka will keep a message queue (a.k.a. mailbox) per actor, with a dispatcher assigning actors to the thread pool to process those messages.
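As a rough illustration (a minimal sketch assuming the classic Akka actor API is on the classpath; Greeter and the queue-based worker are made-up names), compare hand-managing a queue with simply sending a message to an actor:

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue}
import akka.actor.{Actor, ActorSystem, Props}

// Hand-rolled version: a thread is parked on take() even when the queue is empty.
object QueueVersion extends App {
  val queue = new LinkedBlockingQueue[String]()
  val pool  = Executors.newSingleThreadExecutor()
  pool.submit(new Runnable {
    def run(): Unit = while (true) println(s"worker got: ${queue.take()}")
  })
  queue.put("hello")
}

// Akka version: the mailbox and the thread assignment are handled by the dispatcher.
class Greeter extends Actor {
  def receive: Receive = { case msg: String => println(s"actor got: $msg") }
}

object ActorVersion extends App {
  val system  = ActorSystem("demo")
  val greeter = system.actorOf(Props[Greeter](), "greeter")
  greeter ! "hello" // fire-and-forget: the sender never blocks
}
```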
It's much easier to use Akka to achieve a highly performant and resilient application than to code it yourself. You get fault tolerance, resource management, location transparency, routing, distribution, async processing and hierarchical supervision out of the box. Not to mention other frameworks and libraries leveraging these features to give you even more (Reactive Streams, Akka HTTP, etc). There are lots of patterns developed for you already, so why bother with your own?

Related

What happens to messages that come to a server implementing stream processing after the source has reached its bound?

I'm learning Akka Streams, but obviously this is relevant to any streaming framework :)
Quoting the Akka documentation:
Reactive Streams is just to define a common mechanism of how to move data across an asynchronous boundary without losses, buffering or resource exhaustion
Now, from what I understand: before streams (let's take an HTTP server as an example), requests would come in, and when the receiver wasn't finished with one request, the new requests coming in would be collected in a buffer holding the waiting requests. The problem is that this buffer has an unknown size, and at some point, if the server is overloaded, we can lose requests that were waiting.
So then stream processing came into play, and this buffer was bounded so it is controllable... we can predefine the number of messages (requests, in my example) we want to have in line and take care of each one at a time.
My question: if we implement a source in our server that can hold 3 messages at most, what happens if the 4th one comes in?
I mean, when another server calls us and we are already taking care of 3 requests... what will happen to its request?
What you're describing is not actually the main problem that Reactive Streams implementations solve.
Backpressure in terms of the number of requests is solved with regular networking tools. For example, in Java you can configure a thread pool of a networking library (for example Netty) to some parallelism level, and the library will take care of accepting as many requests as possible. Or, if you use the synchronous sockets API, it is even simpler - you can postpone calling accept() on the server socket until all of the currently connected clients are served. In either case, there is no "buffer" on either side; it's just that until the server accepts a connection, the client will be blocked (either inside a system call for blocking APIs, or in an event loop for async APIs).
What Reactive Streams implementations solve is how to handle backpressure inside a higher-level data pipeline. Reactive streams implementations (e.g. akka-streams) provide a way to construct a pipeline of data in which, when the consumer of the data is slow, the producer will slow down automatically as well, and this would work across any kind of underlying transport, be it HTTP, WebSockets, raw TCP connections or even in-process messaging.
For example, consider a simple WebSocket connection, where the client sends a continuous stream of information (e.g. data from some sensor), and the server writes this data to some database. Now suppose that the database on the server side becomes slow for some reason (networking problems, disk overload, whatever). The server now can't keep up with the data the client sends, that is, it cannot save it to the database in time before the new piece of data arrives. If you're using a reactive streams implementation throughout this pipeline, the server will signal to the client automatically that it cannot process more data, and the client will automatically tweak its rate of producing in order not to overload the server.
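The same mechanism can be sketched in-process with akka-streams (this is an illustrative example rather than the WebSocket setup above; it assumes Akka 2.6, where a materializer is derived from the implicit ActorSystem, and uses a throttle stage as a stand-in for the slow database write):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._

object BackpressureDemo extends App {
  implicit val system: ActorSystem = ActorSystem("demo")

  Source(1 to 1000)                                   // fast producer (the "client")
    .map { n => println(s"produced $n"); n }
    .throttle(1, 100.millis)                          // slow consumer (the "database")
    .runWith(Sink.foreach(n => println(s"stored $n")))
  // Apart from small internal buffers, the fast source is slowed to the
  // throttled rate: demand propagates upstream automatically.
}
```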
Naturally, this can be done without any Reactive Streams implementation, e.g. by manually controlling acknowledgements. However, like with many other libraries, Reactive Streams implementations solve this problem for you. They also provide an easy way to define such pipelines, and usually they have interfaces for various external systems like databases. In particular, such libraries may implement backpressure on the lowest level, down to the TCP connection, which may be hard to do manually.
As for Reactive Streams itself, it is just a description of an API which can be implemented by a library; it defines common terms and behavior and allows such libraries to be interchangeable or to interact easily, e.g. you can connect an akka-streams pipeline to a Monix pipeline using the interfaces from the specification, and the combined pipeline will work seamlessly and support all of the backpressure features of Reactive Streams.

Akka, advice on implementation

I would like to create an app that integrates our (enterprise-)specific tools with Slack. For that I can foresee some bots that send events to Slack and a few commands that trigger actions.
I would like to use Akka because it is a buzzword and seems nice, but I don't have any arguments in favor of it. (That is not a problem, since I will develop this app alone in my free time.)
But I don't have any experience in how to create an "actor-based application". I already have 3 actors: two that collect events and one that publishes those events to Slack. Each collector is triggered by a timer; they hold a reference to the publisher and send messages to it... that works.
For the commands part I don't have anything yet, but I can imagine a listener (a Play HTTP controller) that converts Slack requests into messages and sends them to one actor. Ideally, I would like to decouple this listener from the actor that will handle the command.
I would like some advice on how to develop this kind of application, where some actors collect information on a time basis and others react to messages.
Thanks.
For time-based activity, i.e. "ticks" or "heartbeats", the documentation demonstrates a very good pattern for sending messages to actors periodically.
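A rough sketch of that pattern, assuming the classic Timers trait from Akka 2.6 (Tick and Collector are illustrative names):

```scala
import akka.actor.{Actor, Timers}
import scala.concurrent.duration._

case object Tick

class Collector extends Actor with Timers {
  // Re-send Tick to ourselves every 30 seconds; the timer is cancelled
  // automatically when the actor stops or restarts.
  timers.startTimerWithFixedDelay("collect", Tick, 30.seconds)

  def receive: Receive = {
    case Tick =>
      // poll the enterprise tool here and forward the collected events
      // to the publisher actor
      println("collecting...")
  }
}
```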
For your other need, Actors responding to messages as they come in, this is exactly the purpose of the Actor's receive method. From the documentation:
Actors are objects which encapsulate state and behavior, they communicate exclusively by exchanging messages which are placed into the recipient’s mailbox. In a sense, actors are the most stringent form of object-oriented programming, but it serves better to view them as persons: while modeling a solution with actors, envision a group of people and assign sub-tasks to them, arrange their functions into an organizational structure and think about how to escalate failure (all with the benefit of not actually dealing with people, which means that we need not concern ourselves with their emotional state or moral issues). The result can then serve as a mental scaffolding for building the software implementation.

Importance of Akka Routers

I have this lingering doubt in my mind about the importance of Akka Routers. I have used Akka Routers in the current project I am working on. However, I am a little confused about their importance. Of the two approaches below, which is more beneficial?
1. Having routers and routees.
2. Creating as many actors as needed.
I understood that a router will distribute the incoming messages among its routees based on the routing strategy. Also, we can have a supervision strategy defined on the router.
I have also understood that actors are lightweight and it is not much overhead to create as many actors as needed. So, we can create an actor for each incoming message and kill it, if necessary, after the processing is completed.
So I want to understand which of the above designs is better, or in other words, in which cases (1) has an advantage over (2), or vice versa.
Good question. I had similar doubts before I read the Akka documentation. Here are the reasons:
Efficiency. From the docs:
On the surface routers look like normal actors, but they are actually implemented differently. Routers are designed to be extremely efficient at receiving messages and passing them quickly on to routees.
A normal actor can be used for routing messages, but an actor's single-threaded processing can become a bottleneck. Routers can achieve much higher throughput with an optimization to the usual message-processing pipeline that allows concurrent routing. This is achieved by embedding routers' routing logic directly in their ActorRef rather than in the router actor. Messages sent to a router's ActorRef can be immediately routed to the routee, bypassing the single-threaded router actor entirely.
The cost to this is, of course, that the internals of routing code are more complicated than if routers were implemented with normal actors. Fortunately all of this complexity is invisible to consumers of the routing API. However, it is something to be aware of when implementing your own routers.
Default implementation of multiple routing strategies. You can always write your own, but it might get tricky. You have to take into account supervision, recovery, load balancing, remote deployment, etc.
Akka Router patterns will be familiar to Akka users. If you roll out your own custom routing, then everyone will have to spend time understanding all the corner cases and implications (+ testing? :)).
TL;DR If you don't care about efficiency too much and if it's easier for you to spawn new actors then go for it. Otherwise use Routers.
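To make the two options concrete, here is a rough sketch using the classic Akka routing API (Worker and Job are illustrative names):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

final case class Job(id: Int)

class Worker extends Actor {
  def receive: Receive = {
    case Job(id) => println(s"${self.path.name} handled job $id")
  }
}

object RouterVsSpawn extends App {
  val system = ActorSystem("demo")

  // (1) A round-robin pool router in front of a fixed set of routees.
  val pool = system.actorOf(RoundRobinPool(5).props(Props[Worker]()), "workers")
  (1 to 10).foreach(i => pool ! Job(i))

  // (2) One short-lived actor per message; each would typically stop itself
  // (context.stop(self)) once its job is done.
  (1 to 10).foreach(i => system.actorOf(Props[Worker]()) ! Job(i))
}
```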

Suspend Akka Actors

I am trying to use Akka to implement the following (I think I'm trying to use Akka the proper way):
I have a system where I have n resource listeners. Essentially a resource listener is an entity that will listen on an input resource and publish what it sees (i.e. polling a database, tailing a log file, etc.).
So I want to use Akka actors to do these little units of work (listening on a resource). I've noticed that Akka gives me a thread pool of t threads, which may be fewer than the number of listeners. Unfortunately for me, getting a message from these resource listeners might be blocking, so it could take seconds or minutes before the next message pops up.
Is there any way to suspend a resource listener so it yields its thread to another actor, and we come back to it a little later?
Executive Summary
What you want is for your producer API (the resources) to be asynchronous, or at least support non-blocking operations (so that you can do polling). If the API does not support that, then there is no way to retrofit this property, not even using the almighty actors ;-)
Strategies for Different Situations
Only Blocking API
If the resources only support the blocking getWhatever() method of retrieving things, then you must allocate one thread per resource. An Actor with a PinnedDispatcher could be a way to do this. But be aware that the actor will not be responsive while waiting for events from the resource.
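A rough sketch of that approach (the dispatcher id "pinned-dispatcher", BlockingListener and the resource function are illustrative; the dispatcher would be defined in application.conf as in the comment):

```scala
// application.conf:
//   pinned-dispatcher {
//     type = PinnedDispatcher
//     executor = "thread-pool-executor"
//   }
import akka.actor.{Actor, ActorSystem, Props}

class BlockingListener(nextEvent: () => String) extends Actor {
  def receive: Receive = {
    case "listen" =>
      val event = nextEvent()  // may block for minutes; only this pinned thread waits
      println(s"got: $event")
      self ! "listen"          // go back to waiting for the next event
  }
}

object Wiring extends App {
  val system = ActorSystem("demo")
  val listener = system.actorOf(
    Props(new BlockingListener(() => "event")).withDispatcher("pinned-dispatcher"),
    "listener")
  listener ! "listen"
}
```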
Non-Blocking but Synchronous API
If there is a peek() or poll() method on the resource API, you can use one actor per resource, have them share a thread (or pool) and schedule the polling as required (i.e. every 100ms or whatever you need). This has the huge advantage that nobody is actually blocked and the whole system remains responsive. But latency for event reception will be of the order of your schedule interval.
Proper Asynchronous API
If you have enough good karma to encounter a nice asynchronous API, then simply register a callback which will send a message to the actor whenever an event occurs. Sadly, this is not the norm.
PS:
The JVM does not support wrapping up the current call stack, doing something else and returning to that same processing state later. A method can only be popped off the stack when it is actually finished.
In general, you should try to avoid blocking operations in actors. For file IO there are asynchronous libraries, and for some databases, too. If that is not an option for you, you can change the default dispatcher so that the underlying thread pool expands as needed.
One option is to call your blocking APIs inside Futures. The Futures should use an ExecutionContext (thread pool) that is separate from the Actors' ExecutionContext.
See this blog post for an example (specifically CacheActor.findValueForSender).
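A minimal sketch of that idea, with the blocking call piped back to the actor as an ordinary message (the dispatcher id "blocking-io-dispatcher" and fetchBlocking are illustrative; the dispatcher has to be configured separately from the default one):

```scala
import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.{ExecutionContext, Future}

class ResourceListener(fetchBlocking: () => String) extends Actor {
  // A dedicated pool for blocking calls, defined in application.conf,
  // so the actors' default dispatcher stays responsive.
  private val blockingEc: ExecutionContext =
    context.system.dispatchers.lookup("blocking-io-dispatcher")
  import context.dispatcher // used by pipe to deliver the result

  def receive: Receive = {
    case "poll" =>
      // The blocking call runs off the actor's thread; the result comes back
      // to this actor as a plain message.
      Future(fetchBlocking())(blockingEc).pipeTo(self)
    case result: String =>
      println(s"got: $result")
  }
}
```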

The Scala way to use one actor per socket connection

I am wondering how it is possible to avoid one socket connection per thread in Scala. I have thought a lot about it, but I always end up with some code which is listening for incoming data for each client connection.
The problem is that I want to develop an application which should simultaneously handle perhaps a couple of thousand connections. However, I of course do not want to create a thread for each connection, because of the lack of scalability and the cost of context switching.
What would be the "right" way to do this? In my world it should be possible to have one actor for each connection without blocking one thread per actor.
In the book "Programming Scala" the authors used a library called naggati which provides a framework that combines NIO and actors, http://programming-scala.labs.oreilly.com/ch09.html.
I have an application that mixes actors with non-blocking sockets (i.e. NIO). The way I have done this is to have a dedicated IO thread, which sends messages to actors (in much the same way it would delegate work to a thread pool in a Java system) using the reactor pattern.
Obviously, using the old blocking sockets you are restricted to one thread per connection. An actor could handle this, but of course this places a restriction on the number of connections which can be handled simultaneously.
In the case of a single IO thread, this is a bottleneck in theory but not much in practice (in our observations) as the IO thread is doing computationally non-intensive work. There are plenty of good discussions to be found on the NIO reactor pattern.
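A rough sketch of that arrangement (port 9000, ConnectionHandler and all error handling are placeholders): a single selector thread reads from non-blocking sockets and hands each chunk of bytes to a per-connection actor, so no thread is ever parked on a socket.

```scala
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel, SocketChannel}
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

class ConnectionHandler extends Actor {
  def receive: Receive = {
    case data: Array[Byte] => println(s"${self.path.name}: got ${data.length} bytes")
  }
}

object Reactor extends App {
  val system   = ActorSystem("demo")
  val selector = Selector.open()
  val server   = ServerSocketChannel.open()
  server.configureBlocking(false)
  server.bind(new InetSocketAddress(9000))
  server.register(selector, SelectionKey.OP_ACCEPT)

  val buffer = ByteBuffer.allocate(4096)
  while (true) {                              // the single, dedicated IO thread
    selector.select()
    val keys = selector.selectedKeys().iterator()
    while (keys.hasNext) {
      val key = keys.next(); keys.remove()
      if (key.isAcceptable) {
        val client = server.accept()
        client.configureBlocking(false)
        // one actor per connection, remembered as the key's attachment
        val handler = system.actorOf(Props[ConnectionHandler]())
        client.register(selector, SelectionKey.OP_READ, handler)
      } else if (key.isReadable) {
        val channel = key.channel().asInstanceOf[SocketChannel]
        buffer.clear()
        val n = channel.read(buffer)
        if (n > 0) {
          val data = new Array[Byte](n)
          buffer.flip()
          buffer.get(data)
          key.attachment().asInstanceOf[ActorRef] ! data  // non-blocking hand-off
        }
      }
    }
  }
}
```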