How can I safely return an object to a pool after asynchronous usage? - scala

Given I have a pool of objects, how can I safely return a pooled object to the pool after use if the use is asynchronous (using Future and Promise in this case)?
Here's an example:
pool
  .take
  .flatMap { connection =>
    connection
      .sendQuery("SELECT 0")
      .map { query =>
        pool.giveBack(connection)
        query.rows.get(0, 0)
      }
  }
The problem here is that if the sendQuery call fails, the object will never be returned to the pool. Is there some kind of pipeline sequence for futures that would allow me to safely return this object to the pool even if the code itself fails to do so, or should I just ignore this?
The pool implementation is this one and the pooled object is this one.
My main objective here is to make the pool usage be as little error prone as possible and as it stands currently it's clearly not that, since the programmer could forget to return the object and the pool would quickly exhaust itself.

You're looking for the "andThen" method on Future.
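A minimal sketch of how that could look, assuming the standard scala.concurrent.Future. The Pool and Connection classes below are hypothetical stand-ins for the ones in the question, and withConnection is an illustrative helper name, not part of any library. The callback passed to andThen runs whether the future succeeded or failed, and the original result is passed through unchanged.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Hypothetical stand-in for the pooled object in the question.
final class Connection {
  def sendQuery(sql: String): Future[Int] =
    if (sql.isEmpty) Future.failed(new IllegalArgumentException("empty query"))
    else Future.successful(0)
}

// Hypothetical stand-in for the pool in the question.
final class Pool(size: Int) {
  private val idle = new java.util.concurrent.LinkedBlockingQueue[Connection]()
  (1 to size).foreach(_ => idle.put(new Connection))

  def take: Future[Connection] = Future(idle.take())
  def giveBack(c: Connection): Unit = idle.put(c)
  def available: Int = idle.size
}

object PoolOps {
  // Borrow a connection, run `f`, and give the connection back whether the
  // future produced by `f` succeeds or fails.
  def withConnection[A](pool: Pool)(f: Connection => Future[A]): Future[A] =
    pool.take.flatMap { connection =>
      // Turn a synchronous throw inside `f` into a failed future so the
      // cleanup below still runs.
      val result =
        try f(connection)
        catch { case NonFatal(e) => Future.failed(e) }
      result.andThen { case _ => pool.giveBack(connection) }
    }
}
```

Callers then never touch giveBack directly, e.g. `PoolOps.withConnection(pool)(_.sendQuery("SELECT 0"))`, which removes the "programmer forgot to return the object" failure mode for code that goes through the helper.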

Related

How can I convert my mgo sessions to mongo-go-driver clients using connection pooling?

Long, long ago, when we were using mgo.v2, we created some wrapper functions that copied the session, set the read pref and returned that for consumption by other libraries, e.g.
func NewMonotonicConnection() (conn *Connection, success bool) {
	// conn is a named return value, so assign with = rather than :=
	conn = &Connection{
		session: baseSession.Copy(),
	}
	conn.session.SetMode(mongo.Monotonic, true)
	return conn, true
}
We now just pass the default client (initialized using mongo.Connect and passed into a connection singleton) in an init function and then consumed like this:
func NewMonotonicConnection() (conn *Connection, success bool) {
	conn = defaultConnection
	return conn, true
}
My understanding is that to leverage connection pooling, you need to use the same client (which is contained in defaultConnection), and the session is now handled implicitly inside the .All()/cursor teardown. Please correct me if I'm wrong here.
It would be nice if we could still set the readpref on these connections (e.g. set NearestMode on this connection before returning), but what's the community/standard way of doing that?
I know I could call mongo.Connect over and over again, but is that expensive?
I could create different clients - each client with a different readpref - but I was thinking that if a write occurred on that connection, it wouldn't ever go back to reading from a slave.
It looks like I can create sessions explicitly, but I'm not certain I should or if there are any implications around managing those explicitly in the new driver.
There are a couple things I learned on this quest through the mongo-go-driver codebase that I thought I should share with the world before closing this question. If I'm wrong here - please correct me.
You should not call Connect() over and over if you want to leverage connection pooling. It looked like each call to Connect() created a new socket, so there's a risk of socket exhaustion over time unless you manually defer a Disconnect() on the client each time.
In mongo-go-driver, sessions are now handled automatically under the covers when you execute a query (e.g. All()). You can explicitly create and tear down a session, but you can't consume it using the singleton approach I proposed above without changing all the caller functions.
This is because you can no longer call query operations on the session; you instead have to consume it using a WithSession function at the DB operation itself.
I realized that writeconcern, readpref and readconcern can all be set at the:
client level (these are the defaults that everything will use if not overridden)
session level
database level
query level
So what I did is create Database options and overloaded *mongo.Database e.g.:
// Database is a meta-helper that allows us to wrap and overload
// the standard *mongo.Database type
type Database struct {
	*mongo.Database
}

// NewEventualConnection returns a new instantiated Connection
// to the DB using the 'Nearest' read preference.
// Per https://github.com/go-mgo/mgo/blob/v2/session.go#L61
// Eventual is the same as Nearest, but may change servers between reads.
// Nearest: The driver reads from a member whose network latency falls within
// the acceptable latency window. Reads in the nearest mode do not consider
// whether a member is a primary or secondary when routing read operations;
// primaries and secondaries are treated equivalently.
func NewEventualConnection() (conn *Connection, success bool) {
	conn = &Connection{
		client: baseConnection.client,
		dbOptions: options.Database().
			SetReadConcern(readconcern.Local()).
			SetReadPreference(readpref.Nearest()).
			SetWriteConcern(writeconcern.New(
				writeconcern.W(1))),
	}
	return conn, true
}

// GetDB returns an overloaded Database object
func (conn Connection) GetDB(dbname string) *Database {
	// The original snippet built the value but never returned it.
	return &Database{conn.client.Database(dbname, conn.dbOptions)}
}
This allows me to leverage connection pooling and maintain backwards compatibility with our codebase. Hopefully this helps someone else.

Spark Object (singleton) serialization on executors

I am not sure that what I want to achieve is possible. What I do know is that I am accessing a singleton object from an executor to ensure its constructor has been called only once on each executor. This pattern is already proven and works as expected for similar use cases in my code base.
However, what I would like to know is whether I can ship the object after it has been initialized on the driver. In this scenario, when accessing ExecutorAccessedObject.y, ideally it would not call the println but just return the value. This is a highly simplified version; in reality, I would like to make a call to some external system on the driver, so that when accessed on the executor, it will not call that external system again. I am OK with @transient lazy val x being reinitialized once on the executors, as that will hold a connection pool which cannot be serialized.
object ExecutorAccessedObject extends Serializable {
  @transient lazy val x: Int = {
    println("Ok with initializing this on the executor. I.E. database connection pool")
    1
  }
  val y: Int = {
    // call some external system to return a value.
    // I do not want to call the external system from the executor
    println(
      """
        |Ideally, this would not be printed on the executor.
        |return value 1 without re initializing
      """)
    1
  }
  println("The constructor will be initialized Once on each executor")
}

someRdd.mapPartitions { part =>
  ExecutorAccessedObject
  ExecutorAccessedObject.x // first time accessed should re-evaluate
  ExecutorAccessedObject.y // ideally, never re-evaluate and return 1
  part
}
I attempted to solve this with broadcast variables as well, but I am unsure how to access the broadcast variable within the singleton object.
What I would like to know is if I can ship the object after it has been initialized on the driver.
You cannot. Objects, as singletons, are never shipped to executors. They are initialized locally, whenever the object is accessed for the first time.
If the result of the call is serializable, just pass it alone, either as an argument to ExecutorAccessedObject (implicitly or explicitly) or by making ExecutorAccessedObject mutable (and adding the required synchronization).
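A rough sketch of the "mutable with synchronization" variant, in plain Scala so it runs without Spark. The names initY and yValue are made up for illustration, and the Spark usage in the trailing comment assumes a someRdd and a callExternalSystem function that are not shown here.

```scala
object ExecutorAccessedObject {
  // Rebuilt lazily in each JVM, e.g. a connection pool that cannot be serialized.
  @transient lazy val x: Int = 1

  // Supplied from outside instead of being computed in the constructor.
  private var yValue: Option[Int] = None

  // The first caller wins; later calls are ignored.
  def initY(value: Int): Unit = synchronized {
    if (yValue.isEmpty) yValue = Some(value)
  }

  def y: Int = synchronized {
    yValue.getOrElse(throw new IllegalStateException("initY has not been called"))
  }
}

// Hypothetical usage with Spark (someRdd and callExternalSystem are assumed):
//   val yFromDriver = callExternalSystem()         // runs once, on the driver
//   someRdd.mapPartitions { part =>
//     ExecutorAccessedObject.initY(yFromDriver)    // the value travels in the closure
//     ExecutorAccessedObject.y
//     part
//   }
```

The external call happens only on the driver; each executor just receives the already-computed, serializable value through the closure and stores it in its local copy of the object.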

How to call a method in a catch clause on an object defined in a try clause?

I am creating a redis pubsub client in a try-catch block. In the try block, the client is initialised with a callback to forward messages to a client. If there's a problem sending the message to the client, an exception will be thrown, in which case I need to stop the redis client. Here's the code:
try {
  val redisClient = RedisPubSub(
    channels = Seq(currentUserId.toString),
    patterns = Seq(),
    onMessage = (pubSubMessage: PubSubMessage) => {
      responseObserver.onValue(pubSubMessage.data)
    }
  )
}
catch {
  case e: RuntimeException =>
    // redisClient isn't defined here...
    redisClient.unsubscribe(currentUserId.toString)
    redisClient.stop()
    messageStreamResult.complete(Try(true))
    responseObserver.onCompleted()
}
The problem is that the redis client val isn't defined in the catch block because there may have been an exception creating it. I also can't move the try-catch block into the callback because there's no way (that I can find) of referring to the redisClient object from within the callback (this doesn't resolve).
To solve this I'm instantiating redisClient as a var outside the try-catch block. Then inside the try block I stop the client and assign a new redisPubSub (created as above) to the redisClient var. That's an ugly hack which is also error prone (e.g. if there genuinely is a problem creating the second client, the catch block will try to call methods on an erroneous object).
Is there a better way of writing this code so that I can correctly call stop() on the redisClient if an exception is raised when trying to send the message to the responseObserver?
Update
I've just solved this using promises. Is there a simpler way though?
That exception handler is not going to be invoked if there is a problem sending the message. It is for problems in setting up the client. This SO answer talks about handling errors when sending messages.
As for the callback referring to the client, I think you want to register the callback after creating the client rather than trying to pass the callback in when you create it. Here is some sample code from Debasish Ghosh that does this.
Presumably that callback is going to run in another thread, so if it uses redisClient you'll have to be careful about concurrency. Ideally the callback could get to the client object through some argument. If not, then perhaps using volatile would be the easiest way to deal with that, although I suspect you'd eventually get into trouble if multiple callbacks can fail at once. Perhaps use an actor to manage the client connection, as Debasish has done?

akka: sharing mutable state

I need to have one global variable (singleton) that will change very infrequently. Actually, it only changes when the actor restarts and reinitializes the variable. Since I cannot do this with a singleton val in a companion object, I have to declare it as a var (mutable).
object UserDatabase {
  var dbConnection = "" // initializing db connection
}
Many guidelines that I have read advise against sharing mutable state, so I moved the variable into a class and use message passing to retrieve it.
class UserDatabase extends Actor {
  val dbConnection = "" // initializing db connection locally
  def receive = { case GetConnection => self.reply(dbConnection) }
}
The problem is that dbConnection is accessed very frequently by a great many actors, and continuously sending messages will reduce performance (since Akka processes a mailbox one message at a time).
I don't see how I can do this without sacrificing performance. Any idea?
Perhaps use an Agent instead? http://akka.io/docs/akka/1.2-RC6/scala/agents.html
First of all, have you actually measured or noticed a performance reduction? Since messaging is lightweight, perhaps it's fast enough for your application.
Then, a possible solution: if the "global" state is written rarely but read very often, you can choose a push strategy. Every time it changes, the UserDatabase actor sends the updated value to the interested actors. You can then use a publish/subscribe approach, rely on the actor registry, use a pool of actors, etc.
class UserDatabase extends Actor {
  var dbConnection = "" // initializing db connection locally
  def receive = {
    case SetConnection( newConnection ) if dbConnection != newConnection => {
      dbConnection = newConnection
      sendUpdatedConnection(); // sends the change to every relevant actor
    }
  }
}
If you don't need to use the variable very often in any case, it might be simpler and more efficient to make it a java.util.concurrent.atomic.AtomicReference or wrap every access to it in a synchronized block (on the variable). Actors don't always make things easier and safer, just usually.
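As an illustration of the AtomicReference option, here is a small sketch. The setConnection method name is invented for the example, and the connection is kept as a String only because the question's code does so.

```scala
import java.util.concurrent.atomic.AtomicReference

object UserDatabase {
  // Holds the current connection; reads do not take a lock.
  private val connection = new AtomicReference[String]("")

  // Cheap enough to call from many actors at once.
  def dbConnection: String = connection.get()

  // Intended to be called by the owning actor whenever it (re)starts.
  def setConnection(newConnection: String): Unit = connection.set(newConnection)
}
```

Readers always see either the old or the new value, never a partially written one, and no message has to go through a mailbox for a read.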
Create many actors as routees of a RoundRobinRouter.
Make each actor own a connection and handle the actual DB logic.

Class Design: Demeter vs. Connection Lifetimes

Okay, so here's a problem I'm running into.
I have some classes in my application that have methods that require a database connection. I am torn between two different ways to design the classes, both of which are centered around dependency injection:
Provide a property for the connection that is set by the caller prior to method invocation. This has a few drawbacks.
Every method relying on the connection property has to validate that property to ensure that it isn't null, it's open and not involved in a transaction if that's going to muck up the operation.
If the connection property is unexpectedly closed, all the methods have to either (1.) throw an exception or (2.) coerce it open. Depending on the level of robustness you want, either case is appropriate. (Note that this is different from a connection that is passed to a method in that the reference to the connection exists for the lifetime of the object, not simply for the lifetime of the method invocation. Consequently, the volatility of the connection just seems higher to me.)
Providing a Connection property seems (to me, anyway) to scream out for a corresponding Transaction property. This creates additional overhead in the documentation, since you'd have to make it fairly obvious when the transaction was being used, and when it wasn't.
On the other hand, Microsoft seems to favor the whole set-and-invoke paradigm.
Require the connection to be passed as an argument to the method. This has a few advantages and disadvantages:
The parameter list is naturally larger. This is irksome to me, primarily at the point of call.
While a connection (and a transaction) must still be validated prior to use, the reference to it exists only for the duration of the method call.
The point of call is, however, quite clear. It's very obvious that you must provide the connection, and that the method won't be creating one behind your back automagically.
If a method doesn't require a transaction (say a method that only retrieves data from the database), no transaction is required. There's no lack of clarity due to the method signature.
If a method requires a transaction, it's very clear due to the method signature. Again, there's no lack of clarity.
Because the class does not expose a Connection or a Transaction property, there's no chance of callers trying to drill down through them to their properties and methods, thus enforcing the Law of Demeter.
I know, it's a lot. But on the one hand, there's the Microsoft Way: Provide properties, let the caller set the properties, and then invoke methods. That way, you don't have to create complex constructors or factory methods and the like. Also, avoid methods with lots of arguments.
Then, there's the simple fact that if I expose these two properties on my objects, they'll tend to encourage consumers to use them in nefarious ways. (Not that I'm responsible for that, but still.) But I just don't really want to write crappy code.
If you were in my shoes, what would you do?
Here is a third pattern to consider:
Create a class called ConnectionScope, which provides access to a connection
Any class at any time, can create a ConnectionScope
ConnectionScope has a property called Connection, which always returns a valid connection
Any (and every) ConnectionScope gives access to the same underlying connection object (within some scope, maybe within the same thread, or process)
You then are free to implement that Connection property however you want, and your classes don't have a property that needs to be set, nor is the connection a parameter, nor do they need to worry about opening or closing connections.
More details:
In C#, I'd recommend that ConnectionScope implement IDisposable; that way your classes can write code like "using ( var scope = new ConnectionScope() )" and ConnectionScope can free the connection (if appropriate) when it is disposed.
If you can limit yourself to one connection per thread (or process), then you can easily set the connection string in a [ThreadStatic] static variable in ConnectionScope.
You can then use reference counting to ensure that your single connection is re-used when it's already open and that connections are released when no one is using them.
Updated: Here is some simplified sample code:
public class ConnectionScope : IDisposable
{
    private static Connection m_Connection;
    private static int m_ReferenceCount;

    public Connection Connection
    {
        get
        {
            return m_Connection;
        }
    }

    public ConnectionScope()
    {
        if ( m_Connection == null )
        {
            m_Connection = OpenConnection();
        }
        m_ReferenceCount++;
    }

    public void Dispose()
    {
        m_ReferenceCount--;
        if ( m_ReferenceCount == 0 )
        {
            m_Connection.Dispose();
            m_Connection = null;
        }
    }
}
Example code of how one (any) of your classes would use it:
using ( var scope = new ConnectionScope() )
{
    scope.Connection.ExecuteCommand( ... );
}
I would prefer the latter method. It sounds like your classes use the database connection as a conduit to the persistence layer. Making the caller pass in the database connection makes it clear that this is the case. If the connection/transaction were represented as a property of the object, then things are not so clear and all of the ownership and lifetime issues come out. Better to avoid them from the start.