How is ReactiveMongo implemented so that it is considered non-blocking? - scala

Reading the documentation about the Play Framework and ReactiveMongo leads me to believe that ReactiveMongo works in such a way that it uses few threads and never blocks.
However, it seems that the communication from the Play application to the Mongo server would have to happen on some thread somewhere. How is this implemented? Links to the source code for Play, ReactiveMongo, Akka, etc. would also be very appreciated.
The Play Framework includes some documentation about this on this page about thread pools. It starts off:
Play framework is, from the bottom up, an asynchronous web framework. Streams are handled asynchronously using iteratees. Thread pools in Play are tuned to use fewer threads than in traditional web frameworks, since IO in play-core never blocks.
It then talks a little bit about ReactiveMongo:
The most common place that a typical Play application will block is when it’s talking to a database. Unfortunately, none of the major databases provide asynchronous database drivers for the JVM, so for most databases, your only option is to using blocking IO. A notable exception to this is ReactiveMongo, a driver for MongoDB that uses Play’s Iteratee library to talk to MongoDB.
Following is a note about using Futures:
Note that you may be tempted to therefore wrap your blocking code in Futures. This does not make it non blocking, it just means the blocking will happen in a different thread. You still need to make sure that the thread pool that you are using there has enough threads to handle the blocking.
There is a similar note in the Play documentation on the page Handling Asynchronous Results:
You can’t magically turn synchronous IO into asynchronous by wrapping it in a Future. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a Future, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency.
The documentation seems to be saying that ReactiveMongo is non-blocking, so you don't have to worry about it eating up a lot of the threads in your thread pool. But ReactiveMongo has to communicate with the Mongo server somewhere.
How is this communication implemented so that Mongo doesn't use up threads from Play's default thread pool?
Once again, links to the specific files in Play, ReactiveMongo, Akka, etc, would be very appreciated.

Yes, indeed, you still need to use threads to perform any kind of work, including communication with the database. What's important is how exactly this communication happens.
ReactiveMongo "does not use threads" in a sense that it does not use blocking I/O. Usual Java I/O facilities like java.io.InputStream are blocking; this means that reading from such an InputStream or writing to OutputStream blocks the thread until the "other side" provides the required data or is ready to accept it. For network communication this means that threads will be blocked.
However, Java provides NIO API which supports non-blocking and asynchronous I/O. I don't want to get into its details right now, but the basic idea, naturally, is that non-blocking I/O allow not to block threads which need to exchange some data with the outside world: for example, these threads can poll the data source to check if there is some data available, and if there is none, they return to the thread pool and can be used for other tasks. Of course, down there these facilities are provided by the underlying OS.
Exact implementation details of non-blocking I/O is usually hidden inside high-level libraries like Netty because it is not at all nice to use. Netty (which is exactly the library ReactiveMongo uses), for example, provides nice asynchronous callback-like API which is really easy to use but is also powerful and expressive enough to allow building complex I/O-heavy applications with high throughput.
So, ReactiveMongo uses Netty to talk with Mongo database server, and because Netty is an implementation of asynchronous network I/O, ReactiveMongo really does not need to block threads for a long time.

Related

How useful will JDBC be in an event-driven program?

I'm writing an event-driven architecture in Scala and I need to manage a database using it.
I was wondering if using JDBC, which only supports synchronous calls, would be a good solution to my problem?
I'd thought of writing an asynchronous wrapper for the calls to JDBC, but will it really tackle my concerns of the thread being blocked because of the database call?
This is really good question, and actually there's no single good answer to it.
It really depends on your database, its protocol and driver implementation. First of all, some databases, e.g. Cassandra, have asynchronous capabilities built into the protocol level. Should make it easier to work in the event-driven model, right? Not exactly - if you get gigabytes of data over slow connection you may block on the network level.
Other databases have only synchronous protocol and thus can block your resources, right? Not exactly - there are connection pools that prevent some of the issues with blocking.
So, depending on your application architecture and data you're accessing, you may need to isolate a layer of data access, that will wrap the JDBC connections and provide asynchronous capabilities. This layer would scale up and down depending on availability of open connections (e.g. actors holding the connection, and supervisor that will spawn new DB connection actors if there are no free databases, and create new threads using PinnedDispatcher).
In other cases, with specific drivers, you may stick with just JDBC wrapped into Future an hope that driver does it magic for you.
If you build large-scale application, you may want to even separate persistence access logic completely into, say, RabbitMQ, and use RPC for accessing the database.
As far as I know JDBC drivers are synchronous. Maybe you can design your system so your "main" actors asynchronously dispatch requests to background "JDBC" actors that deal with the JDBC driver?
I also try to implement fully asnyc architecture and DB calls are my bottleneck. I found that good idea is to use connection polled JDBC C3P0 driver. Here is usage example in Scala Slick framework link. Connection pooling is one of the solutions in which you have ready to use connections to your DB. It's better than spawning new connection for each request and after complition removing that connection. Here is full website od C3P0 project link

Server-side Websocket implementations in non-event driven HTTP Server Environments

I am trying to understand implementations/options for server-side Websocket endpoints - particularly in Perl using PSGI/Plack and I have a question: Why are all server-side websocket implementations based around event-driven PSGI servers (Twiggy, Tatsumaki, etc.)?
I get that websocket communication is asynchronous, but a non-event driven PSGI server (say Starman) could spawn an asynchronous listener to handle the websocket side of things. I have seen (but not understood) PHP implementations of Websocket servers, so why cant the same be done with PSGI without having to change the server to an event driven one?
Underlying network logic to deal with sockets depends on platform, OS and particular software implementations.
Most common three methods are:
pulling - there is blocking constant "asking" if socket has some data. This method is well bad, as it will block execution of main thread for as long as it waits for some data.
thread per socket - each new connection involves creating new thread and asking each socket in blocking manner happens within that thread. So it wont block main thread with logic. This method is bad as creating thread for each connection is too expensive for memory, and can be around 1Mb or RAM based on OS and other criteria.
async - uses system features to "notify" your process when there is something. So you can react once your app is ready (in case of single threaded app) or even react in separate thread straight away. This method is well efficient as it saves RAM, and allows your app to work without need of waiting or asking for data. It utilises existing functionalities that most OS and platforms provide.
Taking this in account, you indeed can create single process functional way to deal with sockets traffic. But that is not efficient at all as been proven previously. That is why fully async models are major today, as most languages and platforms do support such paradigm.

Play & Akka and blocking threads for database access

I want to make a call to a database which has lots of data and it might take a while to return.
I plan to do that work inside a call to Akka.future(f) and use an Async{} to render the response when the work is done.
Does it make sense to do that, or should I just do the long database call in the controller, without sending the work to Akka?
Or is there a way to do non blocking database access?
If you're forced to use a blocking driver for your database (if for some reason the async driver for MySQL doesn't work out) consider setting up an Actor pool (using routing) with a PinnedDispatcher.
The PinnedDispatcher provides a thread per actor and, by setting up the router, will give you the ability to adjust the number of threads strictly responsible for handling the database calls. Easy scaling. Also, by using Actors you can structure the messages between actors (e.g. a message having the results of the database call) a little easier.
You can use Akka.future(f) and provide your own Akka configuration file to get more threads to process your database accesses. Look at this config file for example.
But you pointed it out: the real problem is in using a database driver that blocks. I don't know which DB you are using, but it's worth to take a look to MongoDB with ReactiveMongo for example. With ReactiveMongo all MongoDB operations are perfectly non-blocking and asynchronous. There is a good introduction here. Moreover, it deals very well with Play Framework (check the ReactiveMongo Play Plugin).
EDIT: You can also check "Configuring Playframework's internal Akka system" to tune the worker threads number.
If the response is blocked on completion of the database call, then it's only useful to make it asynchronous if you can get other work done towards assembling the response while the call runs.
Non blocking database access could mean a couple different things: A client library that gives you a callback based API, which would be pretty similar to the future solution, or one that uses non-blocking sockets to save on thread usage. I'm assuming you mean the former, in which case I think it'd be functionally equivalent to using a future.

Is it good to put jdbc operations in actors?

I am building a traditional webapp that do database CRUD operations through JDBC. And I am wondering if it is good to put jdbc operations into actors, out of current request processing thread. I did some search but found no tutorials or sample applications that demo this.
So What are the cons and pros? Will this asynchonization improve the capacity of the appserver(i.e. the concurrent request processed) like nio?
Whether putting JDBC access in actors is 'good' or not greatly depends upon the rest of your application.
Most web applications today are synchronous, thanks to the Servlet API that underlies most Java (and Scala) web frameworks. While we're now seeing support for asynchronous servlets, that support hasn't worked its way up all frameworks. Unless you start with a framework that supports asynchronous processing, your request processing will be synchronous.
As for JDBC, JDBC is synchronous. Realistically there's never going to be anything done about that, given the burden that would place on modifying the gazillion JDBC driver implementations that are out in the world. We can hope, but don't hold your breath.
And the JDBC implementations themselves don't have to be thread safe, so invoking an operation on a JDBC connection prior to the completion of some other operation on that same connection will result in undefined behavior. And undefined behavior != good.
So my guess is that you won't see quite the same capacity improvements that you see with NIO.
Edit: Just discovered adbcj; an asynchronous database driver API. It's an experimental project written for a master's thesis, very early, experimental. It's a worthy experiment, and I hope it succeeds. Check it out!
But, if you are building an asynchronous, actor-based system, I really like the idea of having data access or repository actors, much in the same way your would have data acccess or repository objects in a layered OO architecture.
Actors guarantee that messages are processed one at a time, which is ideal for accessing a single JDBC connection. (One word of caution: most connection pools default to handing out connection-per-thread, which does not play well with actors. Instead you'll need to make sure that you are using a connection-per-actor. The same is true for transaction management.)
This allows you to treat the database like the asynchronous remote system we ought to have been treating it as all along. This also means that results from your data access/repository actors are futures, which are composable. This makes it easier to coordinate data access with other asynchronous activities.
So, is it good? Probably, if it fits within the architecture of the rest of your system. Will it improve capacity? That will depend on your overall system, but it sounds like a very worthy experiment.

The Scala way to use one actor per socket connection

I am wondering how it is possible to avoid one socket connection pr. thread in Scala. I have thought a lot about it, but I always end up with some code which is listening for incoming data for each client connection.
The problem is that I want to develop an application which should simultanously handle perhaps a couple of thousand connections. However I will of course not want to create a thread for each connection because of the lack of scalability and context switching.
What would be the "right" way to do this. In my world it should be possible to have one actor for each connection without the need to block one thread per actor.
In the book "Programming Scala" the authors used a library called naggati which provides a framework that combines NIO and actors, http://programming-scala.labs.oreilly.com/ch09.html.
I have an application that mixes actors with non-blocking sockets (i.e. NIO). The way I have done this is to have a dedicated IO thread, which sends messages to actors (in much the same way it would delegate work to a thread pool in a Java system) using the reactor pattern.
Obviously using the old blocking sockets, you are restricted to one thread per connection. And actor could handle this but of course this places a restriction on the number of connections which can be handled simultaneously.
In the case of a single IO thread, this is a bottleneck in theory but not much in practice (in our observations) as the IO thread is doing computationally non-intensive work. There are plenty of good discussions to be found on the NIO reactor pattern.