Slick Database Instantiation And Connection Pool Logic - scala

I am instantiating a Slick database with code similar to:

import slick.jdbc.JdbcBackend.Database

val db: Database = Database.forConfig("configPath")
The query is constructed from a function that takes a user id and returns the user's name from the database table:

def queryName(userId: String) =
  for {
    row <- TableQuery[Tables.MyTable] if row.userid === userId
  } yield row.username
Then the query is run to produce distinct Publisher values:

val p1: Publisher[String] = db.stream(queryName("foo").result)
val p2: Publisher[String] = db.stream(queryName("bar").result)
Finally, my question is: Do multiple calls to db.stream utilize the same connection in the connection pool?
In other words, once I've instantiated the database is that the same as locking in on a single connection?
The implication would be that true utilization of all connections in the pool would require a function to create Database values before querying:
// Is this necessary?
val db = () => Database.forConfig("configPath")

val p1 = db().stream(queryName("foo").result)
Thank you in advance for your consideration and response

According to the Slick documentation about the database thread pool:
When using Database.forConfig, the thread pool is configured directly in the external configuration file together with the connection parameters.
My hypothesis is that you are using a connection pool (which is always recommended in production environments) and that you have configured it properly in the external configuration file (the one referred to by configPath).
You don't have to worry about database connections since your Database object (your db) is managing that for you.
Each call to db.stream() actually withdraws a connection from the pool (possibly opening a new one, depending on the pool size and configuration) and releases it back into the pool afterwards.
Further details on how a connection pool works, and how to configure it (e.g. its size) in Slick, can be found at connection-pools.
An additional note from adding-slick-to-your-project:
If you want to use Slick’s connection pool support, you need to add HikariCP as a dependency.
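For reference, a minimal sketch of what such an external configuration might look like for Database.forConfig with a HikariCP-backed pool — the URL, credentials, and sizes below are placeholders, not values from the question:

configPath = {
  connectionPool = "HikariCP"              // the default when HikariCP is on the classpath
  url = "jdbc:postgresql://localhost/mydb"
  driver = "org.postgresql.Driver"
  user = "user"
  password = "password"
  numThreads = 10                          // Slick's thread pool for database I/O
  maxConnections = 10                      // upper bound of the HikariCP pool
}

With a config like this, p1 and p2 from the question can be consumed concurrently, each stream drawing its own connection from the same shared pool.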

Related

How to perform JDBC authentication with Scala's Slick library?

Intro
I have a microservice written in Akka HTTP which communicates with the frontend via a REST API.
I use Scala's Slick library to communicate with the PostgreSQL DBMS.
I want to secure the API with HTTP's basic authentication mechanism (the frontend sends a username and password to the microservice in each request).
I want to pass the credentials extracted from each HTTP API call into Slick's actions, in order to ensure via "JDBC authentication" that the user is authorized to manipulate the given set of data.
Problem
While it's obvious to me how to accomplish this with plain Java's DriverManager (creating each new connection with a set of credentials and using this connection to perform further operations), the Slick library offers something like this:
val db = Database.forURL(jdbcUrl, superuser, password)
but the documentation states that only one such instance should be created for the whole life of an application:
Note: A Database object usually manages a thread pool and a connection pool. You should always shut it down properly when it is no longer needed (unless the JVM process terminates anyway). Do not create a new Database for every database operation. A single instance is meant to be kept alive for the entire lifetime of your application.
So I am stuck with the one db singleton, which always runs all actions with the superuser credentials.
I am new to Slick, and I don't know how to add an additional username and password to be sent together with an action (SQL, DDL statement) to the DBMS, so that it can verify whether the user is permitted to perform this action. Here is an example of what I want to achieve:
override def removeDatabase(name: String): Future[Unit] =
  db.run(sqlu"""DROP DATABASE IF EXISTS #${name.toLowerCase};""").map(_ => ())
To this drop action I would like to add the credentials extracted from an HTTP request, in order to get a JDBC authentication check ensuring that the current user performing this action is authorized to delete this database.
What have I done so far?
According to the Slick documentation:
In order to drop down to the JDBC level for functionality that is not available in Slick, you can use a SimpleDBIO action which is run on a database thread and gets access to the JDBC Connection:
val getAutoCommit = SimpleDBIO[Boolean](_.connection.getAutoCommit)
I tried this:
override def removeDatabase(name: String): Future[Unit] = {
  val dropStatement = SimpleDBIO[Unit] { session =>
    val currentConnection = session.connection
    currentConnection.setClientInfo("username", "testUsername")
    currentConnection.setClientInfo("password", "invalidPwd")
    println(">>> username: " + currentConnection.getClientInfo("username"))
    println(">>> password: " + currentConnection.getClientInfo("password"))
    sqlu"""DROP DATABASE IF EXISTS #${name.toLowerCase};"""
  }
  db.run(dropStatement).map(_ => ())
}
But unfortunately it has no effect; the credentials passed to the client info are ignored:
>>> username: null
>>> password: null
You can create a custom Database like this:
import slick.jdbc.PostgresProfile.api._
val db: Database = Database.forConfig("myDatabase")
This uses the appropriate config from application.conf:
myDatabase = {
  dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
  properties = {
    databaseName = "mydata"
    user = "postgres"
    password = "postgres"
  }
  numThreads = 4
}
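With that configuration in place, every action runs with the credentials configured above ("postgres" here). A minimal usage sketch of the asker's drop action against this db, assuming the standard Slick imports (db is passed as a parameter for self-containment):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import slick.jdbc.PostgresProfile.api._

// JDBC credentials are fixed when a connection is created, so actions on
// this db always authenticate as the configured user; per-request users
// would each need their own Database (and thus their own pool).
def removeDatabase(db: Database, name: String): Future[Unit] =
  db.run(sqlu"""DROP DATABASE IF EXISTS #${name.toLowerCase};""").map(_ => ())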

Play Framework 2.5 Requests timing out randomly

Symptom
After some time of running just fine, our backend will stop responding for most of its endpoints. It will just start behaving like a black hole for those. Once in this state, it will stay there if we don't take any action.
Update
We can reproduce this behaviour with a db dump we made when the backend was in the non-responding state.
Infrastructure Setup
We are running Play 2.5 in AWS on an EC2 instance behind a load balancer, with a PostgreSQL database on RDS. We are using slick-pg as our database connector.
What we know
Here a few things we figured out so far.
About the HTTP requests
Our logs and debugging show us that the requests are passing the filters. Also, we see that for the authentication (we are using Silhouette for that) the application is able to perform database queries to receive the identity for that request. The controller's action will just never be called, though.
The backend is responding to HEAD requests. Further logging showed us that controllers using injected services (we are using Google's Guice for that) are the ones whose methods are not being called anymore. Controllers without injected services seem to work fine.
About the EC2 instance
Unfortunately, we are not able to get much information from that one. We are using Boxfuse, which provides us with an immutable infrastructure that we cannot SSH into. We are about to change this to a Docker-based deployment and might be able to provide more information soon. Nevertheless, we have New Relic set up to monitor our servers. We cannot find anything suspicious there. The memory and CPU usages look fine.
Still, this setup gives us a new EC2 instance on every deployment anyway, and even after a redeployment the issue persists most of the time. Occasionally a redeployment does resolve it, though.
Even weirder is the fact that we can run the backend locally, connected to the database on AWS, and everything just works fine there.
So it is hard for us to say where the problem lies. It seems the db does not work with any EC2 instance (until it eventually works with a new one), yet it works with our local machines.
About the Database
The db is the only stateful entity in this setup, so we think the issue somehow should be related to it.
As we have a production and a staging environment, we are able to dump the production db into staging when the latter is not working anymore. We found that this indeed resolves the issue immediately. Unfortunately, we were not able to take a snapshot of a somehow corrupt database and dump it into the staging environment to see whether it would break it immediately. We do have a snapshot of the db from when the backend was not responding anymore; when we dump this into our staging environment, the backend stops responding immediately.
The number of connections to the DB is around 20 according to the AWS console which is normal.
TL;DR
Our backend eventually starts behaving like a black hole for some of its endpoints
The requests are not reaching the controller actions
A new instance in EC2 might resolve this, but not necessarily
locally with the very same db everything is working fine
dumping a working db into it resolves the issue
CPU and memory usages of the EC2 instances as well as the number of connections to the DB look totally fine
We can reproduce the behaviour with a db dump we made when the backend was not responding anymore (see Update 2)
with new Slick thread pool settings, we got ThreadPoolExecutor exceptions from Slick after a reboot of the db followed by a reboot of our EC2 instance (see Update 3)
Update 1
Responding to marcospereira:
Take for instance this ApplicationController.scala:
package controllers

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

import akka.actor.ActorRef
import com.google.inject.Inject
import com.google.inject.name.Named
import com.mohiva.play.silhouette.api.Silhouette
import play.api.i18n.{ I18nSupport, MessagesApi }
import play.api.mvc.Action
import play.api.mvc.Controller

import jobs.jobproviders.BatchJobChecker.UpdateBasedOnResourceAvailability
import utils.auth.JobProviderEnv

/**
 * The basic application controller.
 *
 * @param messagesApi The Play messages API.
 * @param webJarAssets The webjar assets implementation.
 */
class ApplicationController @Inject() (
  val messagesApi: MessagesApi,
  silhouette: Silhouette[JobProviderEnv],
  implicit val webJarAssets: WebJarAssets,
  @Named("batch-job-checker") batchJobChecker: ActorRef
) extends Controller with I18nSupport {

  def index = Action.async { implicit request =>
    Future.successful(Ok)
  }

  def admin = Action.async { implicit request =>
    Future.successful(Ok(views.html.admin.index.render))
  }

  def taskChecker = silhouette.SecuredAction.async {
    batchJobChecker ! UpdateBasedOnResourceAvailability
    Future.successful(Ok)
  }
}
The index and admin actions are working just fine. The taskChecker action shows the weird behaviour, though.
Update 2
We are able to reproduce this issue now! We found that we made a db dump the last time our backend was not responding anymore. When we dump this into our staging database, the backend will stop responding immediately.
We started logging the number of threads now in one of our filters using Thread.getAllStackTraces.keySet.size and found that there are between 50 and 60 threads running.
Update 3
As @AxelFontaine suggested, we enabled Multi-AZ deployment failover for the database. We rebooted the database with failover. Before, during, and after the reboot the backend was not responding.
After the reboot we noticed that the number of connections to the db stayed at 0. Also, we did not get any logs for authentication anymore (before, the authentication step could still make db requests and would get responses).
After rebooting the EC2 instance, we are now getting
play.api.UnexpectedException: Unexpected exception[RejectedExecutionException: Task slick.backend.DatabaseComponent$DatabaseDef$$anon$2#76d6ac53 rejected from java.util.concurrent.ThreadPoolExecutor#6ea1d0ce[Running, pool size = 4, active threads = 4, queued tasks = 5, completed tasks = 157]]
(we did not get those before)
for our requests as well as for our background jobs that need to access the db. Our Slick settings now include
numThreads = 4
queueSize = 5
maxConnections = 10
connectionTimeout = 5000
validationTimeout = 5000
as suggested here
Update 4
After we got the exceptions as described in Update 3, the backend is now running fine again. We didn't do anything for that. This was the first time the backend would recover from this state without us being involved.
It sounds like a thread management issue at first glance. Slick will provide its own execution context for database operations if you are using Slick 3.1, but you do want to manage the queue size so that it maps to roughly the same capacity as the database side:
myapp = {
  database = {
    driver = org.h2.Driver
    url = "jdbc:h2:./test"
    user = "sa"
    password = ""

    // The number of threads determines how many things you can *run* in parallel;
    // the number of connections determines how many things you can *keep in memory*
    // at the same time on the database server.
    // numThreads = (core_count (hyperthreading included))
    numThreads = 4

    // queueSize = ((core_count * 2) + effective_spindle_count)
    // on a MBP 13, this is 2 cores * 2 (hyperthreading not included) + 1 hard disk
    queueSize = 5

    // https://groups.google.com/forum/#!topic/scalaquery/Ob0R28o45eM
    // make larger than numThreads + queueSize
    maxConnections = 10

    connectionTimeout = 5000
    validationTimeout = 5000
  }
}
Also, you may want to use a custom ActionBuilder: inject a Futures component and add

import play.api.libs.concurrent.Futures._

Once you do that, you can call future.withTimeout(500 milliseconds) to time out the future, so that an error response will come back. There's an example of a custom ActionBuilder in the Play example:
https://github.com/playframework/play-scala-rest-api-example/blob/2.5.x/app/v1/post/PostAction.scala
class PostAction @Inject()(messagesApi: MessagesApi)(
    implicit ec: ExecutionContext)
    extends ActionBuilder[PostRequest]
    with HttpVerbs {

  type PostRequestBlock[A] = PostRequest[A] => Future[Result]

  private val logger = org.slf4j.LoggerFactory.getLogger(this.getClass)

  override def invokeBlock[A](request: Request[A],
                              block: PostRequestBlock[A]): Future[Result] = {
    if (logger.isTraceEnabled()) {
      logger.trace(s"invokeBlock: request = $request")
    }
    val messages = messagesApi.preferred(request)
    val future = block(new PostRequest(request, messages))
    future.map { result =>
      request.method match {
        case GET | HEAD =>
          result.withHeaders("Cache-Control" -> "max-age=100")
        case other =>
          result
      }
    }
  }
}
so you'd add the timeout, metrics (or circuit breaker if the database is down) here.
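A minimal sketch of how that timeout could be wired into the ActionBuilder above. Note that the injectable Futures helper only shipped with Play 2.6, so for the Play 2.5 setup in the question treat this as illustrating the idea rather than a drop-in fix; PostRequest is the request class from the linked example:

import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future, TimeoutException}
import com.google.inject.Inject
import play.api.i18n.MessagesApi
import play.api.libs.concurrent.Futures
import play.api.libs.concurrent.Futures._
import play.api.mvc._

class TimeoutPostAction @Inject()(messagesApi: MessagesApi)(
    implicit ec: ExecutionContext, futures: Futures)
    extends ActionBuilder[PostRequest]
    with HttpVerbs {

  type PostRequestBlock[A] = PostRequest[A] => Future[Result]

  override def invokeBlock[A](request: Request[A],
                              block: PostRequestBlock[A]): Future[Result] = {
    val messages = messagesApi.preferred(request)
    // Abort after 500ms instead of hanging forever when the db stops answering.
    block(new PostRequest(request, messages))
      .withTimeout(500.milliseconds)
      .recover { case _: TimeoutException =>
        Results.InternalServerError("Request timed out")
      }
  }
}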
After some more investigations, we found that one of our jobs was generating deadlocks in our database. The issue we were running into is a known bug in the Slick version we were using, reported on GitHub.
So the problem was that we were running DB transactions with .transactionally within a .map of a DBIOAction on too many threads at the same time.
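For illustration, a reconstructed sketch of the problematic shape (all table and method names here are hypothetical): the .map body runs on Slick's own database execution context, and every nested db.run(...transactionally) it fires competes for the same small set of threads and pooled connections, so with enough of these in flight nothing can complete:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import slick.jdbc.PostgresProfile.api._

def markChecked(id: Long): DBIO[Int] =
  sqlu"UPDATE jobs SET checked = true WHERE id = $id"

// Anti-pattern: nested db.run inside the .map of another action.
def checkAll(db: Database): Future[Unit] =
  db.run(
    sql"SELECT id FROM jobs".as[Long].map { ids =>
      ids.foreach(id => db.run(markChecked(id).transactionally))
    }
  )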

How does Database.forDataSource work in Slick?

I am currently setting up a project that uses Play Slick with Scala. From the docs I find that to get a session we should do
val db = Database.forDataSource(dataSource: javax.sql.DataSource)
So I followed the pattern and used this in every repository layer (a layer on top of the model, similar to a DAO).
I have a couple of repositories, and I have duplicated this line in each of them.
My question is: does this connect to the database every time, or is there a common pool from which we get connections?
From the Slick documentation:
Using a DataSource
You can provide a DataSource object to forDataSource. If you got it from the connection pool of your application framework, this plugs the pool into Slick.
val db = Database.forDataSource(dataSource: javax.sql.DataSource)
When you later create a Session, a connection is acquired from the pool and when the Session is closed it is returned to the pool.
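So the pattern to aim for is one Database (and therefore one pool) shared by all repositories, rather than a forDataSource call in each of them. A minimal sketch with hypothetical repository names, using Slick 3's import layout (note that Slick 3.2+ also asks forDataSource for a maxConnections hint):

import javax.sql.DataSource
import slick.jdbc.JdbcBackend.Database

class UserRepository(val db: Database)  { /* queries go here */ }
class OrderRepository(val db: Database) { /* queries go here */ }

def wire(dataSource: DataSource): (UserRepository, OrderRepository) = {
  val db = Database.forDataSource(dataSource) // plugs the framework's pool into Slick once
  (new UserRepository(db), new OrderRepository(db))
}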

How to use Casbah MongoDB connections?

Note: I realise there is a similar question on SO but it talks about an old version of Casbah, plus, the behaviour explained in the answer is not what I see!
I was under the impression that Casbah's MongoClient handled connection pooling. However, doing lsof on my process I see a big and growing number of mongodb connections, which makes me doubt this pooling actually exists.
Basically, this is what I'm doing:
class MongodbDataStore {
  val mongoClient = MongoClient("host", 27017)("database")

  def getObject1(): Object1 = {
    val collection = mongoClient("object1Collection")
    ...
  }

  def getObject2(): Object2 = {
    val collection = mongoClient("object2Collection")
    ...
  }
}
So, I never close MongoClient.
Should I be closing it after every query? Implement my own pooling? What then?
Thank you
Casbah is a wrapper around the MongoDB Java client, so the connection is actually managed by it.
According to the Java driver documentation (http://docs.mongodb.org/ecosystem/drivers/java-concurrency/) :
If you are using in a web serving environment, for example, you should
create a single MongoClient instance, and you can use it in every
request. The MongoClient object maintains an internal pool of
connections to the database (default maximum pool size of 100). For
every request to the DB (find, insert, etc) the Java thread will
obtain a connection from the pool, execute the operation, and release
the connection. This means the connection (socket) used may be
different each time.
By the way, that's what I've experienced in production. I did not see any problem with this.
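If you do want to tune the pool rather than rely on the defaults, Casbah exposes the driver's options. A minimal sketch, assuming Casbah's MongoClientOptions factory (host, database name, and pool size are placeholders):

import com.mongodb.casbah.Imports._

// One client per application; the underlying Java driver maintains the
// socket pool and hands a connection to each request, as quoted above.
object Mongo {
  private val options = MongoClientOptions(connectionsPerHost = 100) // driver default is 100
  val client: MongoClient = MongoClient(new ServerAddress("host", 27017), options)
  val db = client("database")
}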

Re-using sessions in ScalaQuery?

I need to do small (but frequent) operations on my database, from one of my API methods. When I try wrapping them in "withSession" each time, I get terrible performance.
db withSession {
  SomeTable.insert(a, b)
}
Running the above example 100 times takes 22 seconds. Running them all in a single session is instantaneous.
Is there a way to re-use the session in subsequent function invocations?
Do you have some type of connection pooling in place (see JDBC Connection Pooling: Connection Reuse?)? If not, you'll be using a new connection for every withSession(...), and that is a very slow approach. See http://groups.google.com/group/scalaquery/browse_thread/thread/9c32a2211aa8cea9 for a description of how to use c3p0 with ScalaQuery.
If you use a managed resource from an application server you'll usually get this for "free", but in stand-alone servers (for example Jetty) you'll have to configure this yourself.
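A minimal sketch of that wiring, assuming c3p0's ComboPooledDataSource and ScalaQuery's Database.forDataSource (driver, URL, credentials, and pool size are placeholders):

import com.mchange.v2.c3p0.ComboPooledDataSource
import org.scalaquery.session.Database

// Build the pool once at startup; withSession then borrows a pooled
// connection and returns it, instead of opening a fresh one each time.
val pool = new ComboPooledDataSource
pool.setDriverClass("org.h2.Driver")
pool.setJdbcUrl("jdbc:h2:./test")
pool.setUser("sa")
pool.setPassword("")
pool.setMaxPoolSize(10)

val db = Database.forDataSource(pool)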
I'm probably stating the obvious, but you could just put more calls inside the withSession block, like:

db withSession {
  SomeTable.insert(a, b)
  SomeOtherTable.insert(a, b)
}
Alternately, you can create an implicit session, do your business, then close it when you're done (wrapped in try/finally so the session is returned even if an insert fails):

implicit val session = db.createSession
try {
  SomeTable.insert(a, b)
  SomeOtherTable.insert(a, b)
} finally {
  session.close()
}