Pooling in Phoenix for OrientDB database

I want to use Phoenix/Elixir with OrientDB, so I decided to build a little demo app to get a good understanding of it.
As the database driver I will use MarcoPolo, without Ecto. MarcoPolo is very low-level (a binary-protocol driver) and doesn't support pooling.
Do I have to use pooling? Does Phoenix have a way to deal with this? Or do I have to implement it myself using something like Poolboy, or something else?
I want to share the demo app to make life easier for others, so I want to go about it the right way. But maybe my approach is overkill.

MarcoPolo is a non-blocking client. When a process asks the MarcoPolo connection to send a command to OrientDB, MarcoPolo sends the command right away but doesn't wait for the response. The response arrives later as an Erlang message, because MarcoPolo uses :gen_tcp in :active mode. In practice, this means a single MarcoPolo connection should be able to serve several client processes. You therefore don't need pooling unless your application has to handle lots of requests to OrientDB.
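The following is a minimal Python asyncio sketch of the multiplexing idea. It is not MarcoPolo's code, which relies on Erlang processes and :gen_tcp messages. The sketch assumes each request carries an id that comes back with its reply. `MultiplexedConnection` and `fake_orientdb` are hypothetical names, and the fake server only sleeps to simulate a slow query and a fast one.

```python
import asyncio
import itertools


class MultiplexedConnection:
    """One connection shared by many callers; replies are matched to callers by request id."""

    def __init__(self, send):
        # `send` is a hypothetical stand-in for writing to the socket: an async
        # callable that later delivers the reply through `self.on_reply`.
        self._send = send
        self._ids = itertools.count(1)
        self._pending = {}       # request id -> future waiting for that reply
        self._in_flight = set()  # keep references to background send tasks
        self.arrivals = []       # order in which replies came back

    async def command(self, payload):
        request_id = next(self._ids)
        reply = asyncio.get_running_loop().create_future()
        self._pending[request_id] = reply
        # Send the request immediately without waiting for the answer...
        task = asyncio.create_task(self._send(request_id, payload, self.on_reply))
        self._in_flight.add(task)
        task.add_done_callback(self._in_flight.discard)
        # ...only this caller waits; other callers can use the connection meanwhile.
        return await reply

    def on_reply(self, request_id, result):
        # Plays the role of the incoming :active-mode message: route it to whoever asked.
        self.arrivals.append(request_id)
        self._pending.pop(request_id).set_result(result)


async def fake_orientdb(request_id, payload, on_reply):
    # Hypothetical server: slow queries answer later, so replies arrive out of order.
    await asyncio.sleep(payload["delay"])
    on_reply(request_id, f"result of {payload['query']}")


async def main():
    conn = MultiplexedConnection(fake_orientdb)
    results = await asyncio.gather(
        conn.command({"query": "SELECT slow", "delay": 0.05}),
        conn.command({"query": "SELECT fast", "delay": 0.01}),
    )
    return results, conn.arrivals


results, arrivals = asyncio.run(main())
print(results, arrivals)
```

The slow query is sent first, but its reply arrives second. Because replies are routed by id, neither caller blocks the other. Whether a real driver tags requests this way depends on its wire protocol.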
If you do want pooling, the simplest solution is probably poolboy, as you already figured out. I don't have an OrientDB-specific setup. However, the documentation for Redix (a Redis client for Elixir) has a section on how to set up a pool of connections to a database, and the principles are the same.
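If you go the pooling route, the core idea behind poolboy-style pools is a fixed set of connections that callers check out and check back in. Here is a minimal Python sketch of that pattern. This is not poolboy's API, and `open_connection` is a hypothetical factory standing in for a real driver connection.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: callers check a connection out, use it, and check it back in."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def checkout(self, timeout=5.0):
        # Waits up to `timeout` seconds when every connection is busy
        # (raises queue.Empty if none frees up in time).
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it, even if the caller raised

    def available(self):
        return self._idle.qsize()


opened = []


def open_connection():
    # Hypothetical factory; a real one would open a driver connection.
    conn = {"id": len(opened) + 1}
    opened.append(conn)
    return conn


pool = ConnectionPool(open_connection, size=2)
with pool.checkout() as conn:
    inside = pool.available()  # one connection is checked out
after = pool.available()       # and back in the pool afterwards
print(len(opened), inside, after)
```

The pool opens its connections once, up front. Each request then reuses an idle connection instead of paying the connection setup cost again.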

Related

How useful will JDBC be in an event-driven program?

I'm writing an event-driven architecture in Scala and I need to manage a database using it.
I'm wondering whether JDBC, which only supports synchronous calls, would be a good solution to my problem.
I'd thought of writing an asynchronous wrapper around the JDBC calls. But would that really address my concern about threads being blocked by database calls?
This is a really good question, and there's no single right answer to it.
It really depends on your database, its protocol and the driver implementation. First of all, some databases, e.g. Cassandra, have asynchronous capabilities built in at the protocol level. That should make it easier to work in an event-driven model, right? Not exactly: if you fetch gigabytes of data over a slow connection, you may still block at the network level.
Other databases have only a synchronous protocol and thus can block your resources, right? Not exactly: connection pools prevent some of the issues with blocking.
So, depending on your application architecture and the data you're accessing, you may need to isolate a data access layer. This layer would wrap the JDBC connections and provide asynchronous capabilities. It would scale up and down depending on the availability of open connections. For example, actors could hold the connections, with a supervisor that spawns new DB connection actors when there are no free connections and creates new threads for them using PinnedDispatcher.
In other cases, with specific drivers, you may get away with plain JDBC wrapped in a Future and hope that the driver does its magic for you.
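Wrapping a blocking call in a Future mostly moves the blocking onto another thread. That is why the size of that thread pool matters. The following minimal Python sketch uses asyncio's `run_in_executor` with a dedicated, bounded `ThreadPoolExecutor`, as a stand-in for a Scala Future on its own execution context. `blocking_query` is a hypothetical placeholder for a JDBC-style call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# A dedicated, bounded pool for blocking DB calls; its size caps how many
# blocking queries can be in flight at once.
db_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="db")


def blocking_query(sql):
    # Hypothetical stand-in for a synchronous JDBC-style call.
    time.sleep(0.1)
    return f"rows for {sql}"


async def query(sql):
    loop = asyncio.get_running_loop()
    # The event-loop thread hands the call off and stays free meanwhile;
    # a thread from db_executor is the one that actually blocks.
    return await loop.run_in_executor(db_executor, blocking_query, sql)


async def main():
    start = time.perf_counter()
    rows = await asyncio.gather(*(query(f"q{i}") for i in range(4)))
    return rows, time.perf_counter() - start


rows, elapsed = asyncio.run(main())
print(rows, f"{elapsed:.2f}s")
```

The four simulated queries overlap, so the batch takes roughly one query's time instead of four. A fifth concurrent query would wait for a free thread. That is the kind of back-pressure you want when the database only accepts so many connections.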
If you build a large-scale application, you may even want to move the persistence access logic out completely, behind, say, RabbitMQ, and use RPC to access the database.
As far as I know JDBC drivers are synchronous. Maybe you can design your system so your "main" actors asynchronously dispatch requests to background "JDBC" actors that deal with the JDBC driver?
I'm also trying to implement a fully async architecture, and DB calls are my bottleneck. I found that a good idea is to use the C3P0 JDBC connection pool. There is a usage example with the Scala Slick framework, and the C3P0 project website has full details. Connection pooling gives you ready-to-use connections to your DB. That is better than opening a new connection for each request and closing it once the request completes.

Play & Akka and blocking threads for database access

I want to make a call to a database which has lots of data and it might take a while to return.
I plan to do that work inside a call to Akka.future(f) and use an Async{} to render the response when the work is done.
Does it make sense to do that, or should I just do the long database call in the controller, without sending the work to Akka?
Or is there a way to do non blocking database access?
If you're forced to use a blocking driver for your database (if for some reason the async driver for MySQL doesn't work out) consider setting up an Actor pool (using routing) with a PinnedDispatcher.
The PinnedDispatcher gives each actor its own thread, and the router lets you adjust the number of threads dedicated strictly to handling database calls. Easy scaling. Also, using actors makes it a little easier to structure the messages between them (e.g. a message carrying the results of a database call).
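The actor pool plus PinnedDispatcher setup can be approximated outside Akka. Each worker gets its own dedicated thread, its own connection and its own mailbox, with a round-robin router in front. This Python sketch is only an analogue, not Akka's API. The connection handles and query results are hypothetical placeholders.

```python
import itertools
import queue
import threading


class DbWorker(threading.Thread):
    """One thread, one connection, one mailbox: a rough analogue of an actor
    running on a pinned (thread-per-actor) dispatcher."""

    def __init__(self, worker_id):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()
        # Hypothetical connection handle; only this thread ever uses it.
        self.connection = f"conn-{worker_id}"

    def run(self):
        while True:
            sql, reply_to = self.mailbox.get()
            if sql is None:  # poison pill: shut down
                return
            # A blocking driver call would go here; it ties up only this thread.
            reply_to.put((self.connection, f"rows for {sql}"))


class RoundRobinRouter:
    """Routes each request to the next worker; `size` is the number of DB threads."""

    def __init__(self, size):
        self.workers = [DbWorker(i) for i in range(size)]
        for worker in self.workers:
            worker.start()
        self._next_worker = itertools.cycle(self.workers)

    def ask(self, sql):
        reply_to = queue.Queue(maxsize=1)
        next(self._next_worker).mailbox.put((sql, reply_to))
        return reply_to  # the caller does reply_to.get() when it needs the result

    def stop(self):
        for worker in self.workers:
            worker.mailbox.put((None, None))
        for worker in self.workers:
            worker.join()


router = RoundRobinRouter(size=3)
pending = [router.ask(f"q{i}") for i in range(6)]
results = [reply_to.get(timeout=5) for reply_to in pending]
router.stop()
print(results)
```

Changing `size` is the knob the answer describes. It sets exactly how many threads are allowed to block on the database.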
You can use Akka.future(f) and provide your own Akka configuration file to get more threads to process your database accesses.
But you pointed it out: the real problem is using a database driver that blocks. I don't know which DB you are using, but it's worth taking a look at MongoDB with ReactiveMongo, for example. With ReactiveMongo, all MongoDB operations are fully non-blocking and asynchronous. Moreover, it integrates very well with the Play Framework (check the ReactiveMongo Play Plugin).
EDIT: You can also check "Configuring Playframework's internal Akka system" to tune the number of worker threads.
If the response is blocked on completion of the database call, then it's only useful to make it asynchronous if you can get other work done towards assembling the response while the call runs.
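Here is an illustration of getting other work done while the call runs. In this Python asyncio sketch, `fetch_orders` stands in for the slow database call, and `fetch_recommendations` stands in for other independent work the same response needs. Both are hypothetical and simulated with sleeps. Awaiting them one after the other takes the sum of their times. Overlapping them takes roughly the longer of the two.

```python
import asyncio
import time


async def fetch_orders():
    # Hypothetical slow database call (simulated with a sleep).
    await asyncio.sleep(0.2)
    return ["order-1", "order-2"]


async def fetch_recommendations():
    # Other, independent work needed for the same response.
    await asyncio.sleep(0.2)
    return ["item-9"]


async def sequential():
    start = time.perf_counter()
    orders = await fetch_orders()
    recs = await fetch_recommendations()
    return orders, recs, time.perf_counter() - start


async def overlapped():
    start = time.perf_counter()
    # Both run while we wait; total time is roughly the slower of the two.
    orders, recs = await asyncio.gather(fetch_orders(), fetch_recommendations())
    return orders, recs, time.perf_counter() - start


orders_seq, recs_seq, t_seq = asyncio.run(sequential())
orders_par, recs_par, t_par = asyncio.run(overlapped())
print(f"sequential {t_seq:.2f}s, overlapped {t_par:.2f}s")
```

If the response needs nothing but the database result, there is nothing to overlap.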
Non-blocking database access could mean a couple of different things. One is a client library that gives you a callback-based API, which would be pretty similar to the future solution. The other is one that uses non-blocking sockets to save on thread usage. I'm assuming you mean the former, in which case I think it would be functionally equivalent to using a future.
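A callback-based client is functionally equivalent to a future because any callback API can be adapted into a future: you complete the future from inside the callback. Here is a minimal Python sketch. `query_with_callback` is a hypothetical callback-style client that fires its callback on another thread.

```python
import asyncio
import threading


def query_with_callback(sql, on_done):
    # Hypothetical callback-style client: runs the query elsewhere and
    # invokes on_done(result) from another thread when it finishes.
    threading.Timer(0.05, on_done, args=(f"rows for {sql}",)).start()


def query_as_future(sql):
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # The callback fires on another thread, so hop back onto the event loop
    # before completing the future.
    query_with_callback(sql, lambda result: loop.call_soon_threadsafe(fut.set_result, result))
    return fut


async def main():
    return await query_as_future("SELECT 1")


result = asyncio.run(main())
print(result)
```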

Is it good to put jdbc operations in actors?

I am building a traditional web app that does database CRUD operations through JDBC. I am wondering whether it is good to put the JDBC operations into actors, off the current request-processing thread. I did some searching but found no tutorials or sample applications that demonstrate this.
So what are the pros and cons? Will this asynchronous approach improve the capacity of the app server (i.e. the number of concurrent requests processed), like NIO does?
Whether putting JDBC access in actors is 'good' or not greatly depends upon the rest of your application.
Most web applications today are synchronous, thanks to the Servlet API that underlies most Java (and Scala) web frameworks. While we're now seeing support for asynchronous servlets, that support hasn't worked its way into all frameworks yet. Unless you start with a framework that supports asynchronous processing, your request processing will be synchronous.
As for JDBC, JDBC is synchronous. Realistically there's never going to be anything done about that, given the burden that would place on modifying the gazillion JDBC driver implementations that are out in the world. We can hope, but don't hold your breath.
And the JDBC implementations themselves don't have to be thread safe, so invoking an operation on a JDBC connection prior to the completion of some other operation on that same connection will result in undefined behavior. And undefined behavior != good.
So my guess is that you won't see quite the same capacity improvements that you see with NIO.
Edit: Just discovered adbcj, an asynchronous database driver API. It's a very early, experimental project written for a master's thesis. It's a worthy experiment, and I hope it succeeds. Check it out!
But if you are building an asynchronous, actor-based system, I really like the idea of having data access or repository actors. It is much the same way you would have data access or repository objects in a layered OO architecture.
Actors guarantee that messages are processed one at a time, which is ideal for accessing a single JDBC connection. (One word of caution: most connection pools default to handing out connection-per-thread, which does not play well with actors. Instead you'll need to make sure that you are using a connection-per-actor. The same is true for transaction management.)
This allows you to treat the database like the asynchronous remote system we ought to have been treating it as all along. This also means that results from your data access/repository actors are futures, which are composable. This makes it easier to coordinate data access with other asynchronous activities.
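Here is a minimal sketch of a repository actor in Python asyncio, standing in for Scala actors. It owns a single hypothetical connection and pulls messages from its mailbox one at a time, so operations on that connection never interleave. It answers each request with a future that callers can compose with other asynchronous work. The `save`/`load` operations and the sleep that stands in for a database round trip are placeholders.

```python
import asyncio


class RepositoryActor:
    """Owns one hypothetical connection and processes one message at a time,
    so operations on that connection never interleave."""

    def __init__(self):
        # Must be constructed inside a running event loop (create_task needs one).
        self._mailbox = asyncio.Queue()
        self.operations = []  # order in which the connection saw requests
        self._task = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            op, arg, reply = await self._mailbox.get()
            if op == "stop":
                reply.set_result(None)
                return
            self.operations.append((op, arg))
            await asyncio.sleep(0.01)  # stand-in for a database round trip
            reply.set_result(f"{op}:{arg}")

    def ask(self, op, arg=None):
        # Every request gets its own future; callers never touch the connection.
        reply = asyncio.get_running_loop().create_future()
        self._mailbox.put_nowait((op, arg, reply))
        return reply


async def main():
    users = RepositoryActor()
    # Replies are futures, so they compose with other asynchronous work.
    saved, loaded = await asyncio.gather(users.ask("save", "alice"), users.ask("load", "alice"))
    await users.ask("stop")
    return saved, loaded, users.operations


saved, loaded, operations = asyncio.run(main())
print(saved, loaded, operations)
```

Only the actor's loop touches the connection. This is the connection-per-actor arrangement the answer recommends, rather than connection-per-thread.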
So, is it good? Probably, if it fits within the architecture of the rest of your system. Will it improve capacity? That will depend on your overall system, but it sounds like a very worthy experiment.

Should I connect directly to CouchDB's socket and pass HTTP requests or use node.js as a proxy?

First, here's my original question that spawned all of this.
I'm using Appcelerator Titanium to develop an iPhone app (eventually Android too). I'm connecting to CouchDB's port directly by using Titanium's Titanium.Network.TCPSocket object. I believe it utilizes the Apple SDK's CFSocket/NSStream class.
Once connected, I simply write:
'GET /mydb/_changes?filter=app/myfilter&feed=continuous&gameid=4&heartbeat=30000 HTTP/1.1\r\n\r\n'
directly to the socket. It keeps it open "forever" and returns JSON data whenever the db is updated and matches the filter and change request. Cool.
I'm wondering, is it ok to connect directly to CouchDB's socket like this, or would I be better off opening the socket to node.js instead, and maybe using this CouchDB node.js module to handle the CouchDB proxy through node.js?
My main concern is performance. I just don't have enough experience with CouchDB to know if hitting its socket and passing faux HTTP requests directly is good practice or not. Looking for experience and opinions on any ramifications or alternate suggestions.
It's me again. :-)
CouchDB inherits strong concurrency handling from Erlang, the language it is written in. Erlang uses lightweight processes and message passing between those processes to achieve excellent performance under high concurrent load. It will take advantage of all CPU cores, too.
Node.js runs a single process and basically only does one thing at a time within that process. Its event-based, non-blocking I/O approach does allow it to multitask while it waits for chunks of I/O, but it still only does one thing at a time.
Both should easily handle tens of thousands of connections, but I would expect CouchDB to handle concurrency better (and with less effort on your part) than Node. And keep in mind that Node adds some latency if you put it in front of CouchDB. That may only be noticeable if you have them on different machines, though.
Writing directly to Couch via TCPSocket is perfectly OK as long as you write a well-formed HTTP request that follows the spec. (You're not passing a faux request; that's a real HTTP request you're sending, just like any other.)
Note: HTTP 1.1 does require you to include a Host header in the request, so you'll need to correct your code to reflect that OR just use HTTP 1.0 which doesn't require it to keep things simple. (I'm curious why you're not using Titanium.Network.HTTPClient. Does it only give you the request body after the request finishes or something?)
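Below is a corrected version of the request from the question, with the Host header that HTTP/1.1 requires. It is sketched in Python rather than with Titanium's API. The host, port and database name are assumptions for illustration; the query parameters are the ones from the question.

```python
import socket
from urllib.parse import urlencode


def changes_request(host, db, params):
    """Build a well-formed HTTP/1.1 GET for CouchDB's _changes feed."""
    return (
        f"GET /{db}/_changes?{urlencode(params)} HTTP/1.1\r\n"
        f"Host: {host}\r\n"  # mandatory in HTTP/1.1
        "Accept: application/json\r\n"
        "\r\n"  # blank line ends the headers
    ).encode("ascii")


def open_changes_feed(host, port, db, params):
    # Connects and sends the request; the caller then reads the feed from the socket.
    sock = socket.create_connection((host, port))
    sock.sendall(changes_request(f"{host}:{port}", db, params))
    return sock


request = changes_request(
    "localhost:5984",
    "mydb",
    {"filter": "app/myfilter", "feed": "continuous", "gameid": 4, "heartbeat": 30000},
)
print(request.decode())
```

With HTTP/1.1, the server may also send the continuous feed using chunked transfer encoding. Whatever reads the socket has to handle that, whereas an HTTP client library would do it for you.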
Anyway, CouchDB can totally handle direct connections. Unless you put a lot of effort into your Node proxy, going direct is probably going to give users a better experience when you have 100k of them playing the game at once.
EDIT: If you do use Node, write an actual HTTP proxy. That will run a lot faster than using the module you mentioned and be simpler to implement. (Rather than defining your own API that then makes requests to Couch, you can just pass certain requests on to CouchDB and block others, say, for security reasons.)
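Here is a minimal sketch of such a pass-through proxy. It is written in Python with the standard library rather than in Node. It forwards only GET requests under one database, which includes the _changes feed, and rejects everything else. The CouchDB address, the `mydb` database name and the allow-list policy are assumptions for illustration, not a recommended security model.

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

COUCH_HOST, COUCH_PORT = "127.0.0.1", 5984  # assumed CouchDB location
ALLOWED_DB = "mydb"                          # hypothetical database the app may read


def is_allowed(method, path):
    """Proxy policy: only GETs under the one database (including _changes) pass."""
    base = path.split("?", 1)[0]
    if method != "GET" or ".." in base:
        return False
    return base == f"/{ALLOWED_DB}" or base.startswith(f"/{ALLOWED_DB}/")


class CouchProxy(BaseHTTPRequestHandler):
    # Only do_GET is defined, so other methods get a 501 from BaseHTTPRequestHandler.
    def do_GET(self):
        if not is_allowed(self.command, self.path):
            self.send_error(403, "Blocked by proxy policy")
            return
        upstream = http.client.HTTPConnection(COUCH_HOST, COUCH_PORT)
        upstream.request("GET", self.path)
        resp = upstream.getresponse()
        self.send_response(resp.status)
        for name, value in resp.getheaders():
            # http.client has already de-chunked the body, and send_response
            # adds its own Server/Date headers, so skip those.
            if name.lower() not in ("transfer-encoding", "connection", "server", "date"):
                self.send_header(name, value)
        self.end_headers()
        # Relay data as it arrives so a continuous _changes feed keeps flowing;
        # the handler speaks HTTP/1.0 by default, so closing ends the body.
        while chunk := resp.read1(8192):
            self.wfile.write(chunk)
            self.wfile.flush()
        upstream.close()


def serve(port=8080):
    # Blocks forever; the app would then connect to this port instead of CouchDB's.
    ThreadingHTTPServer(("", port), CouchProxy).serve_forever()
```

Keeping the policy in one small function (`is_allowed`) makes it easy to review what the proxy lets through.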
Also take a look at how "multinode" works:
http://www.sitepen.com/blog/2010/07/14/multi-node-concurrent-nodejs-http-server/

How should I implement bi-directional networking between an iPhone application and an Objective-C server-side application?

I'm looking for advice on the best way to implement some kind of bi-directional communication between a "server-side" application, written in Objective-C and running on a mac, and a client application running on an iPhone.
To cut a long story short, I'm adapting an existing library for use in a client-server environment. The library (which runs on the server) is basically a search engine which provides periodic results, and additionally can provide updates for any of those results at a later date. In an ideal world therefore I would be able to achieve the following with my hypothetical networking solution:
Start queries on the server.
Have the server "push" results to the client as they arrive.
Have the server "push" updates to individual results to the client as they arrive.
If I was writing this client to run on another Mac, I might well look at using Distributed Objects to mask the fact that the server was actually running remotely, but DO is not available on an iPhone.
If I was writing a more generic client-server application, I would probably look at using HTTP to provide some kind of RESTful interface to searches. However, this solution does not lend itself well to asynchronous updates, and what I am proposing does not fit well with the "stateless" tenet of REST. I would have to model my protocol so that I "created" a search resource whose state I could subsequently query, and I would have to poll for updates to it.
One suggestion someone made was to make use of something like BLIP to provide me with a two-way pipe between the client and the server and implement my own "proxy" type objects for the server-side resources that knew how to fetch data from the server and additionally were addressable so that the server could push updates to them. Whilst BLIP provides the low-level messaging framework needed to communicate bi-directionally it still leaves me with a few questions:
How will I manage the lifetime of the objects on the server? I can have a message type that "creates" a search object, but when should that object be destroyed?
How well will this perform on an iPhone? If I have a persistent connection to the server, will it drain the battery too fast? This question is also pertinent in the HTTP world: most async updates are done using a COMET-type hack, which again requires a persistent connection.
So right now I'm still completely unsure what the best way to go is: I've done a lot of searching and reading but have not settled on any solution. I'm asking here on SO because I'm sure that there are many of you out there who have already solved this problem.
How have you gone about achieving real-time bidirectional networking between the iPhone and an Objective-C server-side app?