I've been using doobie (with cats) to connect to a PostgreSQL database from a Scalatra application. Recently I noticed that the app was creating a new connection pool for every transaction. I eventually worked around it (see below), but this approach is quite different from the one taken in the 'Managing Connections' section of the book of doobie, so I was hoping someone could confirm whether it is sensible, or whether there is a better way of setting up the connection pool.
Here's what I had initially - this works but creates a new connection pool on every connection:
import com.zaxxer.hikari.HikariDataSource
import doobie.hikari.hikaritransactor.HikariTransactor
import doobie.imports._
val pgTransactor = HikariTransactor[IOLite](
"org.postgresql.Driver",
s"jdbc:postgresql://${postgresDBHost}:${postgresDBPort}/${postgresDBName}",
postgresDBUser,
postgresDBPassword
)
// every query goes via this function
def doTransaction[A](update: ConnectionIO[A]): Option[A] = {
val io = for {
xa <- pgTransactor
res <- update.transact(xa) ensuring xa.shutdown
} yield res
io.unsafePerformIO
}
My initial assumption was that the problem was having ensuring xa.shutdown on every request, but removing it results in connections quickly being used up until there are none left.
This was an attempt to fix the problem; it enabled me to remove ensuring xa.shutdown, but still resulted in the connection pool being repeatedly opened and closed:
val pgTransactor: HikariTransactor[IOLite] = HikariTransactor[IOLite](
"org.postgresql.Driver",
s"jdbc:postgresql://${postgresDBHost}:${postgresDBPort}/${postgresDBName}",
postgresDBUser,
postgresDBPassword
).unsafePerformIO
def doTransaction[A](update: ConnectionIO[A]): Option[A] = {
val io = update.transact(pgTransactor)
io.unsafePerformIO
}
Finally, I got the desired behaviour by creating a HikariDataSource object and then passing it into the HikariTransactor constructor:
val dataSource = new HikariDataSource()
dataSource.setJdbcUrl(s"jdbc:postgresql://${postgresDBHost}:${postgresDBPort}/${postgresDBName}")
dataSource.setUsername(postgresDBUser)
dataSource.setPassword(postgresDBPassword)
val pgTransactor: HikariTransactor[IOLite] = HikariTransactor[IOLite](dataSource)
def doTransaction[A](update: ConnectionIO[A], operationDescription: String): Option[A] = {
val io = update.transact(pgTransactor)
io.unsafePerformIO
}
You can do something like this:
val xa = HikariTransactor[IOLite](dataSource).unsafePerformIO
and pass it to your repositories.
.transact applies the transaction boundaries, like Slick's .transactionally.
E.g.:
def interactWithDb = {
val q: ConnectionIO[Int] = sql"""..."""
q.transact(xa).unsafePerformIO
}
Yes, the response from Radu gets at the problem. The HikariTransactor (really the underlying HikariDataSource) has internal state, so constructing it is a side effect; you want to do it once when your program starts and pass it around as needed. So your solution works; just note the side effect.
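To make "do it once at startup and pass it around" concrete, here is a minimal sketch using the same doobie 0.4-era API as the question (IOLite, HikariTransactor). The Database object, UserRepository, and the connection details are made up for the example:

```scala
import doobie.imports._
import doobie.hikari.hikaritransactor.HikariTransactor

object Database {
  // Constructed exactly once, when this object is first referenced.
  // unsafePerformIO runs the side effect of building the pool here, at startup.
  val xa: HikariTransactor[IOLite] = HikariTransactor[IOLite](
    "org.postgresql.Driver",
    "jdbc:postgresql://localhost:5432/mydb", // hypothetical connection details
    "myUser",
    "myPassword"
  ).unsafePerformIO
}

// Repositories just receive the transactor; they never build one themselves.
class UserRepository(xa: Transactor[IOLite]) {
  def count: Int =
    sql"select count(*) from users".query[Int].unique.transact(xa).unsafePerformIO
}

val users = new UserRepository(Database.xa)
```

The pool lives for the lifetime of the application; only call shutdown when the app itself is stopping.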
Also, as noted, I don't monitor SO … try the Gitter channel or open an issue if you have questions. :-)
Related
I have made a factory method which should either start a database (Cassandra) and connect to it, or return an existing session. The connection to the database is a static field.
class EmbeddedCassandraManager {
def getCassandra() = {
if(EmbeddedCassandraManager.cassandraConnection.isDefined) //return existing instance
{
(EmbeddedCassandraManager.testCassandra,EmbeddedCassandraManager.cassandraConnection)
}
else {
EmbeddedCassandraManager.startCassandra()
}
}
def closeCassandra() = {
EmbeddedCassandraManager.closeCassandra()
}
}
object EmbeddedCassandraManager {
val factory = new EmbeddedCassandraFactory
//can I do the logic without using var?
var (testCassandra,cassandraConnection) = startCassandra()
def closeCassandra() = {
cassandraConnection.get.close()
cassandraConnection = None
testCassandra.stop()
}
def startCassandra():(Cassandra,Option[CassandraConnection]) = {
val testCassandra = factory.create()
testCassandra.start()
val cassandraConnectionFactory:DefaultCassandraConnectionFactory = new DefaultCassandraConnectionFactory();
val localCassandraConnection:Option[CassandraConnection] = try{
val connection = cassandraConnectionFactory.create(testCassandra)
Some(connection)
}catch{
case exception:Throwable => {
throw exception
}
}
this.cassandraConnection = localCassandraConnection
(testCassandra,this.cassandraConnection)
}
}
The only way I am able to create the logic is by using a var for the cassandraConnection. Is there a pattern I can use to avoid using var?
In one of the tests, I have to stop Cassandra to test that the connection doesn't get established if the database isn't running. This makes the existing connection stale. Without a var, I am not able to set the value to None to invalidate the connection, and then set it to a new value once the database connection is established again.
What is the functional way to create such logic? I need a static value for the connection so that only one connection is created, and I want a way to check that the value is not stale.
Mutability is often unavoidable, because it is an inherent property of the systems we build. However, that doesn't mean that we have to use mutable variables in our code.
There are usually two main ways that you can deal with situations that involve mutable state:
Push the mutable state to a repository outside of your program.
Typical examples of this are a "standard" database (if state needs to be persisted) and in-memory storage (if state exists only for the duration of your program's lifecycle). Whenever you fetch a value from such storage, you treat it as an immutable value. Mutability still exists, but not inside your program, which makes it easier to reason about.
Some people criticize this line of thinking by saying "you are not solving anything, you're just making it someone else's problem", and that's actually true. We are letting the database handle the mutability for us. Why not? It's what a database is designed to do. Besides, the main problem with mutability is reasoning about it, and we are not going to reason about the internal implementation of the database. So pushing the mutability from one of our services to another is indeed like throwing the hot potato around, but pushing it to an external system that's designed for it is completely fine.
However, all that being said, it doesn't help your case, because it's not really elegant to store database connection objects in an external storage. Which takes me to point number two.
Use the State monad.
If the word "monad" raises some flags for you, pretend I said "use State" (it's quite a simple concept actually, no big words needed). I will be using the implementation of State available in the Cats library, but it exists in other FP libraries as well.
State is a function from some existing state to a new state and some produced value:
S => (S, V)
By going from an existing state to a new state, we achieve the "mutation of state".
Example 1:
Here's some code that uses an integer state which gets incremented by one and produces a string value every time the state changes:
import cats.data.State
val s: State[Int, String] = State((i: Int) => (i + 1, s"Value: $i"))
val program = for {
produced1 <- s
_ = println(produced1) // Value: 42
produced2 <- s
_ = println(produced2) // Value: 43
produced3 <- s
_ = println(produced3) // Value: 44
} yield "Done."
program.run(42).value
That's the gist of it.
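If the cats machinery seems opaque, note there is no magic underneath: State is just a wrapped function S => (S, V), and flatMap threads the state from one step to the next. A minimal hand-rolled sketch (not the cats implementation, just an illustration; MiniState is a made-up name):

```scala
// Minimal State: wraps a function from a state S to (new state, produced value V)
final case class MiniState[S, V](run: S => (S, V)) {
  def flatMap[W](f: V => MiniState[S, W]): MiniState[S, W] =
    MiniState { s0 =>
      val (s1, v) = run(s0) // run this step, producing a new state
      f(v).run(s1)          // feed the new state into the next step
    }
  def map[W](f: V => W): MiniState[S, W] =
    flatMap(v => MiniState(s => (s, f(v))))
}

val s = MiniState((i: Int) => (i + 1, s"Value: $i"))

val program = for {
  a <- s
  b <- s
  c <- s
} yield List(a, b, c)

// program.run(42) == (45, List("Value: 42", "Value: 43", "Value: 44"))
```

Each `<-` is a flatMap, so the incremented state flows invisibly through the for-comprehension while the code stays referentially transparent.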
Example 2:
For completeness, here's a bigger example which demonstrates a use case similar to yours.
First, let's introduce a simplified model of CassandraConnection (this is just for the sake of the example; the real object would come from the Cassandra library, so no mutability would exist in our own code).
class CassandraConnection() {
var isOpen: Boolean = false
def connect(): Unit = isOpen = true
def close(): Unit = isOpen = false
}
How should we define the state? The mutable object is obviously the CassandraConnection, and the result value which will be used in the for-comprehension could be a simple String.
import cats.data.State
type DbState = State[CassandraConnection, String]
Now let's define some functions for manipulating the state using an existing CassandraConnection object.
val openConnection: DbState = State(connection => {
if (connection.isOpen) {
(connection, "Already connected.")
} else {
val newConnection = new CassandraConnection()
newConnection.connect()
(newConnection, "Connected!")
}
})
val closeConnection: DbState = State(connection => {
connection.close()
(connection, "Closed!")
})
val checkConnection: DbState =
State(connection => {
if (connection.isOpen) (connection, "Connection is open.")
else (connection, "Connection is closed.")
})
And finally, let's play with these functions in the main program:
val program: DbState =
for {
log1 <- checkConnection
_ = println(log1) // Connection is closed.
log2 <- openConnection
_ = println(log2) // Connected!
log3 <- checkConnection
_ = println(log3) // Connection is open.
log4 <- openConnection
_ = println(log4) // Already connected.
log5 <- closeConnection
_ = println(log5) // Closed!
log6 <- checkConnection
_ = println(log6) // Connection is closed.
} yield "Done."
program.run(new CassandraConnection()).value
I know this is not exact code that you could copy/paste into your project and have it work nicely, but I wanted to give a slightly more general answer that might be a bit easier to understand for other readers. With some playing around, I'm sure you can shape it into your own solution. As long as your main program is a for-comprehension at the State level, you can easily open and close your connections and (re)use the same connection objects.
What did we really achieve with this solution? Why is this better than just having a mutable CassandraConnection value?
One big thing is that we achieve referential transparency, which is why this pattern fits nicely into the functional programming paradigm, and standard mutability doesn't. Since this answer is already getting a bit long, I will point you towards the Cats documentation, which explains the whole thing in more detail and demonstrates the benefit of using State very nicely.
I have created a simple test (using ScalaTest). The find method searches the database using Solr.
implicit override val patienceConfig =
PatienceConfig(timeout = scaled(Span(30, Seconds)), interval = scaled(Span(20, Seconds)))
for {
...
saved <- service.save(...)
result <- eventually {
service.find(...)
}
} yield result
But in 9/10 cases it ends with a timeout. When I rewrite this code into:
implicit override val patienceConfig =
PatienceConfig(timeout = scaled(Span(30, Seconds)))
for {
...
saved <- service.save(...)
result <- eventually {
Thread.sleep(20000)
service.find(...)
}
} yield result
It works much better, but I still get timeouts sometimes (fewer than in the first example). Do you know where the problem could be? I suspect that eventually works differently than I thought, but I do not know how.
Also: is there any way to make this code always run without timeouts? I would prefer to use eventually, but would like to reduce the number of timeouts.
I understand that searching for something in the database can time out, but I would like the test to be as stable as possible.
I am trying to access Ignite cache values from a Spark map operation, and I get this error:
Ignite grid name thread local must be set or this method should be accessed under org.apache.ignite.thread.IgniteThread
I have the exact same problem, and tried a method suggested by the person who asked the same question:
val cache = ignite.getOrCreateCache[String,String]("newCache")
val cache_value = cache.get("key")
val myTransformedRdd = myRdd.map { x => println(cache_value) }.take(2)
This is my sample code. I understood that when we initiate Ignite (Ignition.start()), it may only be initiated on the Spark driver, while Spark executes the map on the executors. So on some executors Ignite may not be initiated.
So I tried this also,
val myTransformedRdd = myRdd.map { x =>
if(Ignition.state.toString=="STOPPED")
{
Ignition.start("/etc/ignite/examples/config/example-ignite1.xml")
}
println(cache_value)
}
From this I got the same error.
It seems that ignite in your sample is taken from the outer scope, outside the mapper function. Make sure that you don't try to send this object over the network.
In your example you use cache_value taken from the driver's context. Your mapper function should look something like
val myTransformedRdd = rdd.map { _ =>
val igniteCfg = Ignition.loadSpringBean("/etc/ignite/examples/config/example-ignite1.xml", "ignite.cfg")
val ignite = Ignition.getOrStart(igniteCfg)
val cache = ignite.getOrCreateCache[String,String]("newCache")
val cacheValue = cache.get("key")
println(cacheValue)
}
Note that the example-ignite1.xml file should have a definition of an ignite.cfg bean of type IgniteConfiguration.
In my application, I have to interact (read-only) with multiple MySQL DBs one by one. For each DB, I need a certain number of connections. Interactions with a DB do not occur in a single stretch: I query the DB, take some time processing the results, query the DB again, process the results again, and so on.
Each of these interactions requires multiple connections (I fire multiple queries concurrently), hence I need a ConnectionPool that is created when I start interacting with the DB and lives until I'm done with all queries to that DB (including the interim intervals when I'm not querying, only processing the results).
I'm able to successfully create a ConnectionPool with the desired number of connections and obtain the implicit session as shown below:
def createConnectionPool(poolSize: Int): DBSession = {
implicit val session: AutoSession.type = AutoSession
ConnectionPool.singleton(
url = "myUrl",
user = "myUser",
password = "***",
settings = ConnectionPoolSettings(initialSize = poolSize)
)
session
}
I then pass this implicit session through all the methods where I need to interact with the DB. That way, I'm able to fire poolSize queries concurrently using this session. Fair enough.
def methodThatCallsAnotherMethod(implicit session: DBSession): Unit = {
...
methodThatInteractsWithDb
...
}
def methodThatInteractsWithDb(implicit session: DBSession): Unit = {
...
getResultsParallely(poolSize = 32, fetchSize = 2000000)
...
}
def getResultsParallely(poolSize: Int, fetchSize: Int)(implicit session: DBSession): Seq[ResultClass] = {
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(poolSize))
val resultsSequenceFuture: Seq[Future[ResultClass]] = {
(0 until poolSize).map { i =>
val limit: Long = fetchSize
val offset: Long = i * fetchSize
Future(methodThatMakesSingleQuery(limit, offset))
}
}
val resultsFutureSequence: Future[Seq[ResultClass]] = Future.sequence(resultsSequenceFuture)
Await.result(resultsFutureSequence, 2.minutes)
}
There are 2 problems with this technique:
My application is quite big and has many nested method calls, so passing an implicit session through all the methods like this isn't feasible.
In addition to the said interactions with different DBs one by one, I also need a single connection to another (fixed) DB throughout the lifetime of my entire application. This connection would be used to make a small write operation (logging the progress of my interactions with the other DBs) every few minutes. Therefore, I need multiple ConnectionPools, one for each DB.
From what I could make out of ScalikeJdbc's docs, I came up with the following way of doing it, which doesn't require me to pass the implicit session everywhere.
def createConnectionPool(poolName: String, poolSize: Int): Unit = {
ConnectionPool.add(
name = poolName,
url = "myUrl",
user = "myUser",
password = "***",
settings = ConnectionPoolSettings(initialSize = poolSize)
)
}
def methodThatInteractsWithDb(poolName: String): Unit = {
...
(DB(ConnectionPool.get(poolName).borrow())).readOnly { implicit session: DBSession =>
// interact with DB
...
}
...
}
Although this works, I'm no longer able to parallelize the DB interaction. This behaviour is expected, since I'm using the borrow() method, which takes a single connection from the pool. That, in turn, makes me wonder why the AutoSession approach worked earlier: why was I able to fire multiple queries simultaneously using a single implicit session? And if that worked, why doesn't this? I can find no examples of how to obtain a DBSession from a ConnectionPool that supports multiple connections.
To sum up, I have 2 problems and 2 solutions, one for each problem. But I need a single (common) solution that solves both problems.
ScalikeJdbc's limited docs aren't offering much help, and blogs / articles on ScalikeJdbc are practically non-existent.
Please suggest the correct way / some work-around.
Framework versions
Scala 2.11.11
"org.scalikejdbc" %% "scalikejdbc" % "3.2.0"
Thanks to @Dennis Hunziker, I was able to figure out the correct way to release connections borrowed from ScalikeJdbc's ConnectionPool. It can be done as follows:
import scalikejdbc.{ConnectionPool, using}
import java.sql.Connection
using(ConnectionPool.get("poolName").borrow()) { (connection: Connection) =>
// use connection (only once) here
}
// connection automatically returned to pool
With this, now I'm able to parallelize interaction with the pool.
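As a sketch of what that parallel usage can look like, each task below borrows its own connection and returns it via using, so the queries really run concurrently. fetchChunk stands in for the question's methodThatMakesSingleQuery, and the pool name and sizes are assumptions:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scalikejdbc.{ConnectionPool, DB, DBSession, using}

implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(32))

// One borrowed connection per task; using() returns it to the pool afterwards.
val chunks: Seq[Future[Seq[ResultClass]]] = (0 until 32).map { i =>
  Future {
    using(DB(ConnectionPool.get("poolName").borrow())) { db =>
      db.readOnly { implicit session: DBSession =>
        fetchChunk(limit = 2000000L, offset = i * 2000000L) // hypothetical query method
      }
    }
  }
}

val results: Seq[ResultClass] =
  Await.result(Future.sequence(chunks), 2.minutes).flatten
```

Contrast this with the earlier version, where a single borrowed connection was shared and every query was serialized through it.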
To solve my problem of managing several ConnectionPools and using connections across several classes, I ended up writing a ConnectionPoolManager, complete code for which can be found here. By offloading the tasks of
creating pools
borrowing connections from pools
removing pools
to a singleton object that I could use anywhere across my project, I was able to clear a lot of clutter and eliminated the need to pass an implicit session across chains of methods.
EDIT-1
While I've already linked the complete code for ConnectionPoolManager, here's a quick hint of how you can go about it
Following method of ConnectionPoolManager lets you borrow connections from ConnectionPools
def getDB(dbName: String, poolNameOpt: Option[String] = None): DB = {
// create a pool for db (only) if it doesn't exist
addPool(dbName, poolNameOpt)
val poolName: String = poolNameOpt.getOrElse(dbName)
DB(ConnectionPool.get(poolName).borrow())
}
Thereafter, throughout your code, you can use the above method to borrow connections from pools and make your queries
def makeQuery(dbName: String, poolNameOpt: Option[String]) = {
ConnectionPoolManager.getDB(dbName, poolNameOpt).localTx { implicit session: DBSession =>
// perform ScalikeJdbc SQL query here
}
}
I'm using Scrooge + Thrift to generate my server and client code. Everything is working just fine so far.
Here's a simplified example of how I use my client:
private lazy val client =
Thrift.newIface[MyPingService[Future]](s"$host:$port")
def main(args: Array[String]): Unit = {
logger.info("ping!")
client.ping().foreach { _ =>
logger.info("pong!")
// TODO: close client
sys.exit(0)
}
}
Everything is working just fine, but when the program exits the server complains about unclosed connections. I've looked all over, but I can't seem to figure out how to close the client instance.
So my question is, how do you close a Finagle thrift client? I feel like I'm missing something obvious.
As far as I know, when you use the automagic Thrift.newIface[Iface] method to create your service, you can't close it, because the only thing that your code knows about the resulting value is that it conforms to Iface. If you need to close it, you can instantiate your client in two steps, creating the Thrift service in one and adapting it to your interface in the other.
Here's how it looks if you're using Scrooge to generate your Thrift interface:
val serviceFactory: ServiceFactory[ThriftClientRequest,Array[Byte]] =
Thrift.newClient(s"$host:$port")
val client: MyPingService[Future] =
new MyPingService.FinagledClient(serviceFactory.toService)
doStuff(client).ensure(serviceFactory.close())
I tried this in the repl, and it worked for me. Here's a lightly-edited transcript:
scala> val serviceFactory = Thrift.newClient(...)
serviceFactory: ServiceFactory[ThriftClientRequest,Array[Byte]] = <function1>
scala> val tweetService = new TweetService.FinagledClient(serviceFactory.toService)
tweetService: TweetService.FinagledClient = TweetService$FinagledClient@20ef6b76
scala> Await.result(tweetService.getTweets(GetTweetsRequest(Seq(20))))
res7: Seq[GetTweetResult] = ... "just setting up my twttr" ...
scala> serviceFactory.close
res8: Future[Unit] = ConstFuture(Return(()))
scala> Await.result(tweetService.getTweets(GetTweetsRequest(Seq(20))))
com.twitter.finagle.ServiceClosedException
This is not too bad, but I hope there's a better way that I don't know yet.
I haven't used Finagle, but according to the Finagle documentation:
val product = client().flatMap { service =>
// `service` is checked out from the pool.
service(QueryRequest("SELECT 5*5 AS `product`")) map {
case rs: ResultSet => rs.rows.map(processRow)
case _ => Seq.empty
} ensure {
// put `service` back into the pool.
service.close()
}
}
couldn't you adopt a similar strategy?
client.ping().foreach { service =>
logger.info("pong!")
// TODO: close client
service.close()
sys.exit(0)
}