How useful will JDBC be in an event-driven program? - scala

I'm writing an event-driven architecture in Scala and I need to manage a database using it.
I was wondering if using JDBC, which only supports synchronous calls, would be a good solution to my problem?
I'd thought of writing an asynchronous wrapper for the calls to JDBC, but will it really address my concern about the thread being blocked by the database call?

This is a really good question, and actually there's no single good answer to it.
It really depends on your database, its protocol and the driver implementation. First of all, some databases, e.g. Cassandra, have asynchronous capabilities built in at the protocol level. That should make it easier to work in the event-driven model, right? Not exactly - if you get gigabytes of data over a slow connection, you may still block at the network level.
Other databases have only a synchronous protocol and can thus block your resources, right? Not exactly - there are connection pools that prevent some of the issues with blocking.
So, depending on your application architecture and the data you're accessing, you may need to isolate a data-access layer that wraps the JDBC connections and provides asynchronous capabilities. This layer would scale up and down depending on the availability of open connections (e.g. actors holding the connections, and a supervisor that spawns new DB connection actors when there are no free connections, creating new threads via a PinnedDispatcher).
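As a rough sketch of what such a layer might look like with Akka classic actors (the message protocol, the pool limit, and the dispatcher name "db-pinned-dispatcher" are assumptions for illustration, not part of the original answer):

```scala
import java.sql.{Connection, DriverManager}
import akka.actor.{Actor, ActorRef, Props}

// Hypothetical message protocol for the data-access layer.
case class RunQuery(sql: String, replyTo: ActorRef)
case class QueryResult(rows: Vector[String])

// One actor per JDBC connection: the blocking executeQuery call only ever
// occupies this actor's (pinned) thread, not the event-driven core of the app.
class ConnectionActor(url: String) extends Actor {
  private var conn: Connection = _

  override def preStart(): Unit = conn = DriverManager.getConnection(url)
  override def postStop(): Unit = if (conn != null) conn.close()

  def receive: Receive = {
    case RunQuery(sql, replyTo) =>
      val rs  = conn.createStatement().executeQuery(sql) // blocking JDBC call
      val buf = Vector.newBuilder[String]
      while (rs.next()) buf += rs.getString(1)
      replyTo ! QueryResult(buf.result())
  }
}

// Supervisor that lazily spawns connection actors up to a limit and hands
// queries out round-robin.
class DbSupervisor(url: String, maxConnections: Int) extends Actor {
  private var workers = Vector.empty[ActorRef]
  private var next    = 0

  def receive: Receive = {
    case q: RunQuery =>
      if (workers.size < maxConnections)
        // "db-pinned-dispatcher" must be declared in the config with
        // type = PinnedDispatcher so each worker gets its own thread.
        workers :+= context.actorOf(
          Props(new ConnectionActor(url)).withDispatcher("db-pinned-dispatcher"))
      workers(next % workers.size) ! q
      next += 1
  }
}
```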
In other cases, with specific drivers, you may stick with just JDBC wrapped into a Future and hope that the driver does its magic for you.
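For illustration, a minimal sketch of the Future-wrapping approach with a dedicated execution context sized for the blocking JDBC work; the pool size, table, and query are assumptions:

```scala
import java.sql.DriverManager
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object BlockingDb {
  // Dedicated pool for blocking JDBC work; size it to the number of
  // connections the database can realistically serve (assumed 16 here).
  private val jdbcEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(16))

  // Hypothetical helper: the JDBC call still blocks, but only a thread
  // from the dedicated pool, not the application's main execution context.
  def findName(url: String, id: Long): Future[Option[String]] = Future {
    val conn = DriverManager.getConnection(url)
    try {
      val ps = conn.prepareStatement("SELECT name FROM users WHERE id = ?")
      ps.setLong(1, id)
      val rs = ps.executeQuery()
      if (rs.next()) Some(rs.getString("name")) else None
    } finally conn.close()
  }(jdbcEc)
}
```

The key point is that the Future does not remove the blocking; it only confines it to a pool you can size and monitor separately.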
If you are building a large-scale application, you may even want to separate the persistence logic completely behind, say, RabbitMQ, and use RPC for accessing the database.

As far as I know, JDBC drivers are synchronous. Maybe you can design your system so that your "main" actors asynchronously dispatch requests to background "JDBC" actors that deal with the JDBC driver?

I am also trying to implement a fully async architecture, and DB calls are my bottleneck. I found that a good idea is to use a JDBC connection pool such as C3P0. Here is a usage example with the Scala Slick framework: link. Connection pooling is one of the solutions in which you have ready-to-use connections to your DB. It's better than spawning a new connection for each request and removing it after completion. Here is the full website of the C3P0 project: link
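A minimal sketch of wiring up a C3P0 pool from Scala; the driver class, credentials, and pool sizes are placeholders, and the Slick call shown in the comment assumes Slick 3.x:

```scala
import com.mchange.v2.c3p0.ComboPooledDataSource

object PooledDb {
  // C3P0 keeps a pool of ready-to-use connections instead of opening a
  // fresh one per request. URL and credentials below are placeholders.
  val ds = new ComboPooledDataSource()
  ds.setDriverClass("org.postgresql.Driver") // assumed driver
  ds.setJdbcUrl("jdbc:postgresql://localhost/app")
  ds.setUser("app")
  ds.setPassword("secret")
  ds.setMinPoolSize(5)
  ds.setMaxPoolSize(20)

  // Borrow a connection from the pool; close() returns it to the pool
  // rather than tearing down the physical connection.
  def withConnection[A](f: java.sql.Connection => A): A = {
    val conn = ds.getConnection()
    try f(conn) finally conn.close()
  }

  // With Slick 3.x the same DataSource can back a Database instance
  // (signature assumed for Slick 3.2+):
  // val db = slick.jdbc.JdbcBackend.Database.forDataSource(ds, Some(20))
}
```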

Related

Starting with reactive DB access in a blocking monolith

In a DB-heavy monolith based on WildFly, does it make sense to transform the DB access to a reactive one for starters? Should I see performance benefits?
Also, the DB is Sybase, and the only 'generic' JDBC driver I know of is from Vert.x, but this implies that I will have to put Vert.x inside my WildFly. I understand that they are sort of alternatives, but I can't find any other options.
I would love to hear your thoughts about the two points I am raising. In general, I can't commit to a full transition from WildFly to Quarkus/Vert.x from the get-go, as it will take lots of resources, so I thought I could start smaller...
Vert.x is a toolkit, which means, for example, you do not need to use the web server it provides, nor any other module. It's also very lightweight, so you will only add a few more dependencies to your application. So, yes it can make sense to integrate Vert.x.
vertx-jdbc-client, however, cannot magically transform blocking calls into non-blocking calls. Instead, it will off-load the blocking calls onto Vert.x's worker thread pool. That will lead to another effect: the DB call you used to wait for will return immediately, leaving you with nothing but a Future. That Future will eventually hold the expected result.
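For illustration, a sketch of what such a call might look like, assuming the Vert.x 4 flavor of the JDBC client (JDBCPool); the Sybase URL format, credentials, and pool size are assumptions:

```scala
import io.vertx.core.Vertx
import io.vertx.jdbcclient.{JDBCConnectOptions, JDBCPool}
import io.vertx.sqlclient.PoolOptions

object VertxJdbcSketch {
  def main(args: Array[String]): Unit = {
    val vertx = Vertx.vertx()

    // The pool off-loads the blocking JDBC work onto Vert.x worker threads.
    // URL and credentials are placeholders.
    val pool = JDBCPool.pool(
      vertx,
      new JDBCConnectOptions()
        .setJdbcUrl("jdbc:sybase:Tds:localhost:5000/app") // assumed Sybase URL format
        .setUser("app")
        .setPassword("secret"),
      new PoolOptions().setMaxSize(16))

    // The call returns immediately with a Future; the result arrives later.
    pool.query("SELECT name FROM users")
      .execute()
      .onSuccess(rows => rows.forEach(row => println(row.getString("name"))))
      .onFailure(err => err.printStackTrace())
  }
}
```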
Going further upstream in your code (the direction where your user's request came from), this means that you will have to
(1) either defer processing of the result via Future.map() or Future.compose(), or
(2) block the thread to get the result immediately.
You will win nothing by (2), so rule that out.
When you go for (1), you must defer all further processing, up to the point where the incoming request is originally handled. If that is, for example, a Servlet, you have to use Asynchronous Processing to make sure that Wildfly does not commit the response after the doGet, doPost etc. method exits.
The result of all this will be that Wildfly now handles your request asynchronously, with Vert.x managing the DB interaction. You can do that. But it would be more idiomatic to your current setup to just use Asynchronous Processing (or Spring's @Async feature) and wrap all of your code in a Runnable. Neither approach will speed up request processing itself, because the processing depends on the slower DB. However, Wildfly will be able to process more requests because the threads it assigns to requests will no longer be blocked.
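A minimal sketch of the Servlet Asynchronous Processing approach, written here in Scala against the Servlet 3.x API; the data-access helper is a hypothetical stand-in for the real blocking DAO:

```scala
import javax.servlet.annotation.WebServlet
import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

@WebServlet(urlPatterns = Array("/users"), asyncSupported = true)
class UsersServlet extends HttpServlet {

  override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
    // Switch the request to asynchronous mode so the container does not
    // commit the response when doGet returns.
    val ctx = req.startAsync()

    // The blocking work is wrapped in a Runnable and handed to the
    // container-managed async thread pool; the request thread is freed.
    ctx.start(new Runnable {
      def run(): Unit = {
        try {
          val names = loadUserNames() // blocking JDBC call happens here
          ctx.getResponse.getWriter.println(names.mkString(","))
        } finally {
          ctx.complete()              // now the response is committed
        }
      }
    })
  }

  // Hypothetical blocking data-access helper standing in for the real DAO.
  private def loadUserNames(): Seq[String] = Seq("alice", "bob")
}
```

The request thread returns to Wildfly's pool as soon as doGet exits, which is where the extra capacity comes from.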
Having all that said, if you want to migrate to Quarkus in small steps, you should do that service by service. Identify the Servlets (or Controllers) which do the work, and port them one by one to Quarkus. If sessions are your problem, then you could possibly share them between Wildfly and Quarkus, using Infinispan.

Pooling in Phoenix for OrientDB database

I want to use Phoenix/Elixir with OrientDB. I decided to build a little demo app to get a good understanding of it.
As database driver I will use MarcoPolo and not use Ecto at all. MarcoPolo is very low level (binary driver) and doesn't support pooling.
Do I have to use pooling? Does Phoenix have a way to deal with this? Or do I have to implement it myself using something like Poolboy? Or something else?
I want to share the demo app to make life easier for others, so I want to go about it the right way. But maybe my approach is overkill.
MarcoPolo is a non-blocking client, which means that when a process asks the MarcoPolo connection to send a command to OrientDB, MarcoPolo sends the command right away but doesn't wait for the response (which it later receives as an Erlang message, because it uses the :active option on :gen_tcp). What this means in practice is that a single MarcoPolo connection should be capable of handling several client processes, eliminating the need for pooling if your application doesn't have to handle lots of requests to OrientDB.
In case you want to use pooling, the simplest solution is probably poolboy, as you already figured out. I have no OrientDB-specific setup, but you can find some information on how to set up a pool of connections to a DB in the documentation for Redix (a Redis client for Elixir). The principles are the same. This is the section in the documentation for Redix that covers pooling.

How is ReactiveMongo implemented so that it is considered non-blocking?

Reading the documentation about the Play Framework and ReactiveMongo leads me to believe that ReactiveMongo works in such a way that it uses few threads and never blocks.
However, it seems that the communication from the Play application to the Mongo server would have to happen on some thread somewhere. How is this implemented? Links to the source code for Play, ReactiveMongo, Akka, etc. would also be very appreciated.
The Play Framework includes some documentation about this on this page about thread pools. It starts off:
Play framework is, from the bottom up, an asynchronous web framework. Streams are handled asynchronously using iteratees. Thread pools in Play are tuned to use fewer threads than in traditional web frameworks, since IO in play-core never blocks.
It then talks a little bit about ReactiveMongo:
The most common place that a typical Play application will block is when it’s talking to a database. Unfortunately, none of the major databases provide asynchronous database drivers for the JVM, so for most databases, your only option is to use blocking IO. A notable exception to this is ReactiveMongo, a driver for MongoDB that uses Play’s Iteratee library to talk to MongoDB.
Following is a note about using Futures:
Note that you may be tempted to therefore wrap your blocking code in Futures. This does not make it non blocking, it just means the blocking will happen in a different thread. You still need to make sure that the thread pool that you are using there has enough threads to handle the blocking.
There is a similar note in the Play documentation on the page Handling Asynchronous Results:
You can’t magically turn synchronous IO into asynchronous by wrapping it in a Future. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a Future, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency.
The documentation seems to be saying that ReactiveMongo is non-blocking, so you don't have to worry about it eating up a lot of the threads in your thread pool. But ReactiveMongo has to communicate with the Mongo server somewhere.
How is this communication implemented so that Mongo doesn't use up threads from Play's default thread pool?
Once again, links to the specific files in Play, ReactiveMongo, Akka, etc, would be very appreciated.
Yes, indeed, you still need to use threads to perform any kind of work, including communication with the database. What's important is how exactly this communication happens.
ReactiveMongo "does not use threads" in a sense that it does not use blocking I/O. Usual Java I/O facilities like java.io.InputStream are blocking; this means that reading from such an InputStream or writing to OutputStream blocks the thread until the "other side" provides the required data or is ready to accept it. For network communication this means that threads will be blocked.
However, Java provides the NIO API, which supports non-blocking and asynchronous I/O. I don't want to get into its details right now, but the basic idea, naturally, is that non-blocking I/O allows threads that need to exchange data with the outside world to avoid blocking: for example, such a thread can poll the data source to check whether data is available, and if there is none, it returns to the thread pool and can be used for other tasks. Of course, underneath, these facilities are provided by the underlying OS.
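To make the idea concrete, here is a bare-bones sketch of the non-blocking model using the raw Java NIO API; it is only schematic and not how ReactiveMongo or Netty actually structure their code (the host and port are placeholders):

```scala
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.{SelectionKey, Selector, SocketChannel}

// One thread polls a Selector for readiness events and is never parked
// waiting on a single connection.
object NioSketch {
  def main(args: Array[String]): Unit = {
    val selector = Selector.open()

    val channel = SocketChannel.open()
    channel.configureBlocking(false)                      // non-blocking mode
    channel.connect(new InetSocketAddress("localhost", 27017)) // placeholder endpoint
    channel.register(selector, SelectionKey.OP_CONNECT | SelectionKey.OP_READ)

    val buffer = ByteBuffer.allocate(4096)
    while (true) {
      selector.select()                                   // wait for any readiness event
      val keys = selector.selectedKeys().iterator()
      while (keys.hasNext) {
        val key = keys.next(); keys.remove()
        if (key.isConnectable) channel.finishConnect()
        if (key.isReadable) {                             // data arrived: read without blocking
          buffer.clear()
          val n = channel.read(buffer)
          println(s"read $n bytes")
        }
      }
      // Between events this thread could do other work (or, in a real
      // framework like Netty, serve thousands of other channels).
    }
  }
}
```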
The exact implementation details of non-blocking I/O are usually hidden inside high-level libraries like Netty, because the raw API is not at all nice to use. Netty (which is exactly the library ReactiveMongo uses), for example, provides a nice asynchronous, callback-like API which is really easy to use but is also powerful and expressive enough to allow building complex I/O-heavy applications with high throughput.
So, ReactiveMongo uses Netty to talk with Mongo database server, and because Netty is an implementation of asynchronous network I/O, ReactiveMongo really does not need to block threads for a long time.

Play & Akka and blocking threads for database access

I want to make a call to a database which has lots of data and it might take a while to return.
I plan to do that work inside a call to Akka.future(f) and use an Async{} to render the response when the work is done.
Does it make sense to do that, or should I just do the long database call in the controller, without sending the work to Akka?
Or is there a way to do non blocking database access?
If you're forced to use a blocking driver for your database (if for some reason the async driver for MySQL doesn't work out) consider setting up an Actor pool (using routing) with a PinnedDispatcher.
The PinnedDispatcher provides a thread per actor and, by setting up the router, gives you the ability to adjust the number of threads strictly responsible for handling the database calls. Easy scaling. Also, by using actors you can structure the messages between actors (e.g. a message carrying the results of the database call) a little more easily.
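A sketch of that router setup (a variant of the connection-actor idea shown earlier); the dispatcher name and its configuration are assumptions:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

// Hypothetical actor that owns one JDBC connection and services queries.
class DbWorker(url: String) extends Actor {
  private lazy val conn = java.sql.DriverManager.getConnection(url)
  def receive: Receive = {
    case sql: String =>
      val rs  = conn.createStatement().executeQuery(sql) // blocking, but only on this pinned thread
      val buf = Vector.newBuilder[String]
      while (rs.next()) buf += rs.getString(1)
      sender() ! buf.result()
  }
}

object DbRouter {
  // application.conf (assumed dispatcher name):
  //   db-pinned-dispatcher {
  //     type = PinnedDispatcher
  //     executor = "thread-pool-executor"
  //   }
  def apply(system: ActorSystem, url: String, poolSize: Int) =
    system.actorOf(
      RoundRobinPool(poolSize).props(
        Props(new DbWorker(url)).withDispatcher("db-pinned-dispatcher")),
      "db-router")
}
```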
You can use Akka.future(f) and provide your own Akka configuration file to get more threads to process your database accesses. Look at this config file for example.
But you pointed it out: the real problem is in using a database driver that blocks. I don't know which DB you are using, but it's worth taking a look at MongoDB with ReactiveMongo, for example. With ReactiveMongo all MongoDB operations are perfectly non-blocking and asynchronous. There is a good introduction here. Moreover, it works very well with the Play Framework (check the ReactiveMongo Play plugin).
EDIT: You can also check "Configuring Playframework's internal Akka system" to tune the number of worker threads.
If the response is blocked on completion of the database call, then it's only useful to make it asynchronous if you can get other work done towards assembling the response while the call runs.
Non-blocking database access could mean a couple of different things: a client library that gives you a callback-based API, which would be pretty similar to the Future solution, or one that uses non-blocking sockets to save on thread usage. I'm assuming you mean the former, in which case I think it'd be functionally equivalent to using a Future.

Is it good to put jdbc operations in actors?

I am building a traditional webapp that does database CRUD operations through JDBC, and I am wondering if it is good to put the JDBC operations into actors, off the current request-processing thread. I did some searching but found no tutorials or sample applications that demo this.
So what are the pros and cons? Will this asynchronization improve the capacity of the app server (i.e. the number of concurrent requests processed), like NIO does?
Whether putting JDBC access in actors is 'good' or not greatly depends upon the rest of your application.
Most web applications today are synchronous, thanks to the Servlet API that underlies most Java (and Scala) web frameworks. While we're now seeing support for asynchronous servlets, that support hasn't worked its way into all frameworks. Unless you start with a framework that supports asynchronous processing, your request processing will be synchronous.
As for JDBC, JDBC is synchronous. Realistically there's never going to be anything done about that, given the burden that would place on modifying the gazillion JDBC driver implementations that are out in the world. We can hope, but don't hold your breath.
And the JDBC implementations themselves don't have to be thread safe, so invoking an operation on a JDBC connection prior to the completion of some other operation on that same connection will result in undefined behavior. And undefined behavior != good.
So my guess is that you won't see quite the same capacity improvements that you see with NIO.
Edit: I just discovered adbcj, an asynchronous database driver API. It's a very early, experimental project written for a master's thesis. It's a worthy experiment, and I hope it succeeds. Check it out!
But, if you are building an asynchronous, actor-based system, I really like the idea of having data access or repository actors, much in the same way you would have data access or repository objects in a layered OO architecture.
Actors guarantee that messages are processed one at a time, which is ideal for accessing a single JDBC connection. (One word of caution: most connection pools default to handing out connection-per-thread, which does not play well with actors. Instead you'll need to make sure that you are using a connection-per-actor. The same is true for transaction management.)
This allows you to treat the database like the asynchronous remote system we ought to have been treating it as all along. This also means that results from your data access/repository actors are futures, which are composable. This makes it easier to coordinate data access with other asynchronous activities.
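A small illustration of such a repository actor whose results are composable Futures, using Akka's ask pattern; the message protocol, table, and query are assumptions:

```scala
import akka.actor.{Actor, ActorRef}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical message protocol for a user repository actor.
case class FindUser(id: Long)
case class User(id: Long, name: String)

// The repository actor owns its own JDBC connection (connection-per-actor,
// not connection-per-thread) and processes one message at a time.
class UserRepository(url: String) extends Actor {
  private lazy val conn = java.sql.DriverManager.getConnection(url)
  def receive: Receive = {
    case FindUser(id) =>
      val ps = conn.prepareStatement("SELECT name FROM users WHERE id = ?")
      ps.setLong(1, id)
      val rs = ps.executeQuery()
      sender() ! (if (rs.next()) Some(User(id, rs.getString("name"))) else None)
  }
}

object RepositoryExample {
  implicit val timeout: Timeout = 5.seconds

  // Results come back as composable Futures, so data access can be
  // coordinated with other asynchronous work.
  def greet(repo: ActorRef, id: Long)(implicit ec: ExecutionContext): Future[String] =
    (repo ? FindUser(id)).mapTo[Option[User]].map {
      case Some(u) => s"Hello, ${u.name}"
      case None    => "Unknown user"
    }
}
```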
So, is it good? Probably, if it fits within the architecture of the rest of your system. Will it improve capacity? That will depend on your overall system, but it sounds like a very worthy experiment.