Scala Play 2.5 with Slick 3 and Specs2 - postgresql

I have a Play application using Slick that I want to test using Specs2, but I keep getting the error org.postgresql.util.PSQLException: FATAL: sorry, too many clients already. I have tried to shut down the database connection by using
val mockApp = new GuiceApplicationBuilder()
val db = mockApp.injector.instanceOf[DBApi].database("default")
...
override def afterAll = {
  db.getConnection().close()
  db.shutdown()
}
But the error persists. The Slick configuration is:
slick.dbs.default.driver="slick.driver.PostgresDriver$"
slick.dbs.default.db.driver="org.postgresql.Driver"
slick.dbs.default.db.url="jdbc:postgresql://db:5432/hygge_db"
slick.dbs.default.db.user="*****"
slick.dbs.default.db.password="*****"

getConnection on DBApi either takes a connection from the underlying data source's pool (a JdbcDataSource, I presume) or creates a new one. I see no pool specified in your configuration, so I think it always creates a new one for you. So if you didn't close the connection inside the test, calling getConnection in afterAll won't help: it will just create a new connection (or take a random one from the pool, if pooling is enabled) and close that instead.
So the solution is to either configure connection pooling:
When using a connection pool (which is always recommended in production environments) the minimum size of the connection pool should also be set to at least the same size. The maximum size of the connection pool can be set much higher than in a blocking application. Any connections beyond the size of the thread pool will only be used when other connections are required to keep a database session open (e.g. while waiting for the result from an asynchronous computation in the middle of a transaction) but are not actively doing any work on the database.
so you can just set the maximum number of available connections in your config:

slick.dbs.default.db.maxConnections = 5
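For context, a minimal sketch of a pooled test configuration, assuming Play-Slick's default HikariCP pool (the key names are taken from the Slick 3 documentation):

  slick.dbs.default.db.connectionPool = "HikariCP"
  slick.dbs.default.db.numThreads = 5
  slick.dbs.default.db.maxConnections = 5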
Or you can share the same connection for all tests (you'll probably have to ensure the tests run sequentially then):

object SharedConnectionForAllTests {
  val connection = db.getConnection()
  def close() = connection.close()
}
It's better to inject it with Spring/Guice of course, so you can conveniently manage the connection's lifecycle.
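To tie this back to the question, here is a minimal sketch of a Specs2 specification that tears everything down in afterAll, assuming Play 2.5 with the default pool (UserDaoSpec and its example are hypothetical names). Shutting down the database and stopping the application releases the whole pool rather than a single connection:

import org.specs2.mutable.Specification
import org.specs2.specification.AfterAll
import play.api.db.DBApi
import play.api.inject.guice.GuiceApplicationBuilder
import scala.concurrent.Await
import scala.concurrent.duration._

class UserDaoSpec extends Specification with AfterAll {
  // Build the application once for the whole specification.
  val app = new GuiceApplicationBuilder().build()
  val db  = app.injector.instanceOf[DBApi].database("default")

  override def afterAll(): Unit = {
    db.shutdown()                         // closes the underlying connection pool
    Await.result(app.stop(), 10.seconds)  // stops the application and its resources
  }

  "UserDao" should {
    "reach the database" in {
      db.withConnection(_.isValid(1)) must beTrue
    }
  }
}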

Related

Does setMinimumIdle() respect setIdleTimeout() in a HikariCP DataSource?

I am trying to create a connection pool for Postgres using HikariCP. The pool should always have one active session/connection, and the remaining connections should be created on demand, which Hikari will take care of. To do that, I configured the DataSource like below:
private static HikariDataSource dataSource = null;

static {
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:postgresql://hostname:port/dbname");
    config.setUsername("USERNAME");
    config.setPassword("PASSWORD");
    config.setMinimumIdle(1);
    config.setMaximumPoolSize(5);
    config.setIdleTimeout(10000);
    dataSource = new HikariDataSource(config);
}
setMinimumIdle(1) always keeps one open connection/session to the DB (unless the program or server is shut down), and the remaining four connections will be created on demand if they do not already exist in the pool.
setIdleTimeout(10000) removes any connection that has been idle in the pool for more than 10 seconds, and I observed that it does not apply to the connections covered by setMinimumIdle (here, 1).
Is my understanding correct, and does it serve my requirement? I appreciate your suggestions.
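One way to check this behaviour empirically is HikariCP's pool MXBean, which exposes live pool statistics. Below is a minimal sketch in Scala (matching the rest of this page) that reuses the configuration from the question; watch the idle count shrink back toward minimumIdle once idleTimeout (10 s here) has elapsed:

import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

// Same settings as in the question.
val config = new HikariConfig()
config.setJdbcUrl("jdbc:postgresql://hostname:port/dbname")
config.setUsername("USERNAME")
config.setPassword("PASSWORD")
config.setMinimumIdle(1)
config.setMaximumPoolSize(5)
config.setIdleTimeout(10000)
val dataSource = new HikariDataSource(config)

// Poll the MXBean before and after the idle timeout elapses.
val pool = dataSource.getHikariPoolMXBean
println(s"total=${pool.getTotalConnections} idle=${pool.getIdleConnections} active=${pool.getActiveConnections}")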

Difference between io.r2dbc.pool.PoolingConnectionFactoryProvider.MAX_SIZE vs ConnectionPoolConfiguration max size in r2dbc

I am trying to understand the difference between two configurable parameters when creating a connection pool using r2dbc-pool.
I was able to configure the connection pool with the help of the post below:
Connection pool size with postgres r2dbc-pool
But I wanted to understand the difference between configuring max size and initial size when creating the pool via ConnectionFactoryOptions:
ConnectionFactory connectionFactory = ConnectionFactories.get(ConnectionFactoryOptions.builder()
    .option(DRIVER, "pool")
    .option(PROTOCOL, "postgresql")
    .option(HOST, host)
    .option(USER, user)
    .option(PASSWORD, password)
    .option(MAX_SIZE, 30)
    .option(INITIAL_SIZE, 10)
    .option(DATABASE, database)
    .build());
versus via ConnectionPoolConfiguration:

ConnectionPoolConfiguration configuration = ConnectionPoolConfiguration.builder(connectionFactory)
    .maxIdleTime(Duration.ofMinutes(30))
    .initialSize(initialSize)
    .maxSize(maxSize)
    .maxCreateConnectionTime(Duration.ofSeconds(1))
    .build();
There isn't really a difference; these are just two different ways to configure the same connection pool.
The initial size is the number of connections opened to the database once the application starts.
You can check that with the query:
select * from pg_stat_activity;
You'll see exactly the number you specified as the initial size.
Max size is the maximum number of connections that can be open at the same time. You do not control that directly; the underlying pool implementation scales the number of open connections with higher load.
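If other applications share the same server, a useful refinement is to count only this pool's sessions by filtering on the database name; 'mydatabase' below is a placeholder for whatever you passed as the DATABASE option:

select count(*) from pg_stat_activity where datname = 'mydatabase';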

Dealing with NoHostAvailableException with phantom DSL

When trying to insert several thousand records at once into a remote Cassandra DB, I reproducibly run into timeouts (with 5 to 6 thousand elements on a slow connection).
the error:
com.datastax.driver.core.exceptions.NoHostAvailableException:
All host(s) tried for query failed (tried: /...:9042
(com.datastax.driver.core.exceptions.OperationTimedOutException: [/...]
Timed out waiting for server response))
the model:

class RecordModel extends CassandraTable[ConcreteRecordModel, Record] {
  object id extends StringColumn(this) with PartitionKey[String]
  ...
}

abstract class ConcreteRecordModel extends RecordModel
  with RootConnector with ResultSetFutureHelper {

  def store(rec: Record): Future[ResultSet] =
    insert.value(_.id, rec.id).value(...).future()

  def store(recs: List[Record]): Future[List[ResultSet]] = Future.traverse(recs)(store)
}
the connector:

val connector = ContactPoints(hosts).withClusterBuilder(
  _.withCredentials(
    config.getString("username"),
    config.getString("password")
  ).withPoolingOptions(
    new PoolingOptions().setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
      .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
      .setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
      .setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
      .setMaxRequestsPerConnection(HostDistance.LOCAL, 32768)
      .setMaxRequestsPerConnection(HostDistance.REMOTE, 2000)
      .setPoolTimeoutMillis(10000)
  )
).keySpace(keyspace)
I have tried tweaking the pooling options, separately and together, but even doubling all of the REMOTE settings did not noticeably change the timeouts.
My current workaround, which I would like to avoid, is splitting the list into batches and waiting for the completion of each:

def store(recs: List[Record]): Future[List[ResultSet]] = {
  val rs: Iterator[List[ResultSet]] = recs.grouped(1000) map { slice =>
    Await.result(Future.traverse(slice)(store), 100.seconds)
  }
  Future.successful(rs.to[List].flatten)
}
What would be a good way to handle this issue?
Thank you
EDIT
The errors do suggest a failing/overloaded cluster, but I suspect the network plays a major role here. The numbers provided above are from a remote machine; they are MUCH higher when the same C* is fed from a machine in the same datacenter. Another suspicious detail is that feeding the same C* instance with Quill does not run into any timeout issues, remote or not.
What I really dislike about throttling is that the batch sizes are arbitrary and static, while they should be adaptable.
Sounds like you're hitting the limits of your cluster. If you want to avoid the timeouts, you will need to add more capacity to handle the load. If you just want to do burst writes, you should throttle them (as you are doing), since sending too many queries to too few nodes will inhibit performance. You can also increase the timeouts on the server side (read_request_timeout_in_ms, write_request_timeout_in_ms, request_timeout_in_ms) if you want to wait until the writes go through, but this is not advisable, as you will not give Cassandra any time to recover and will likely cause large amounts of ParNew GC.
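On the throttling itself: a hedged alternative to the Await-based workaround is to chain the batches with flatMap, so each slice is sent only after the previous one completes, without blocking a thread. This is a minimal sketch building on the question's store and its fixed batch size of 1000; it is still static, not adaptive:

import scala.concurrent.{ExecutionContext, Future}

// Runs the batches strictly one after another, never blocking a thread.
def storeThrottled(recs: List[Record])(implicit ec: ExecutionContext): Future[List[ResultSet]] =
  recs.grouped(1000).foldLeft(Future.successful(List.empty[ResultSet])) { (acc, slice) =>
    acc.flatMap(done => Future.traverse(slice)(store).map(done ++ _))
  }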

Slick: confused about numThreads and best practice for good performance

I am using the Play Framework with Slick in a system that is heavy on database I/O. In my application.conf file I have this setting:
play {
  akka {
    loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-factor = 20.0
        }
      }
    }
  }
}
This obviously gives me 20 threads per core for the Play application, and as I understand it Slick creates its own thread pool. Does the numThreads setting in Slick mean the total number of threads, or is it (numThreads x CPUs)? And is there any best practice for good performance? I currently have my settings configured as:
database {
  dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
  properties = {
    databaseName = "dbname"
    user = "postgres"
    password = "password"
  }
  numThreads = 10
}
numThreads is simply the total number of threads in the thread pool; it is not multiplied by the number of CPUs. Slick uses this thread pool to execute queries.
The following config keys are supported for all connection pools, both built-in and third-party:

numThreads (Int, optional, default: 20): The number of concurrent threads in the thread pool for asynchronous execution of database actions. See the HikariCP wiki for more information about sizing the thread pool correctly. Note that for asynchronous execution in Slick you should tune the thread pool size (this parameter) accordingly instead of the maximum connection pool size.

queueSize (Int, optional, default: 1000): The size of the queue for database actions which cannot be executed immediately when all threads are busy. Beyond this limit new actions fail immediately. Set to 0 for no queue (direct hand-off) or to -1 for an unlimited queue size (not recommended).

The pool is tuned for asynchronous execution by default. Apart from the connection parameters you should only have to set numThreads and queueSize in most cases. In this scenario there is contention over the thread pool (via its queue), not over the connections, so you can have a rather large limit on the maximum number of connections (based on what the database server can still handle, not what is most efficient). Slick will use more connections than there are threads in the pool when sequencing non-database actions inside a transaction. (A config sketch putting these parameters together follows after this list.)

The following config keys are supported for HikariCP:

url (String, required): JDBC URL

driver or driverClassName (String, optional): JDBC driver class to load

user (String, optional): User name

password (String, optional): Password

isolation (String, optional): Transaction isolation level for new connections. Allowed values are: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, SERIALIZABLE.

catalog (String, optional): Default catalog for new connections.

readOnly (Boolean, optional): Read Only flag for new connections.

properties (Map, optional): Properties to pass to the driver or DataSource.

dataSourceClass (String, optional): The name of the DataSource class provided by the JDBC driver. This is preferred over using driver. Note that url is ignored when this key is set (you have to use properties to configure the database connection instead).

maxConnections (Int, optional, default: numThreads * 5): The maximum number of connections in the pool.

minConnections (Int, optional, default: same as numThreads): The minimum number of connections to keep in the pool.

connectionTimeout (Duration, optional, default: 1s): The maximum time to wait before a call to getConnection is timed out. If this time is exceeded without a connection becoming available, a SQLException will be thrown. 1000ms is the minimum value.

validationTimeout (Duration, optional, default: 1s): The maximum amount of time that a connection will be tested for aliveness. 1000ms is the minimum value.

idleTimeout (Duration, optional, default: 10min): The maximum amount of time that a connection is allowed to sit idle in the pool. A value of 0 means that idle connections are never removed from the pool.

maxLifetime (Duration, optional, default: 30min): The maximum lifetime of a connection in the pool. When an idle connection reaches this timeout, even if recently used, it will be retired from the pool. A value of 0 indicates no maximum lifetime.

connectionInitSql (String, optional): A SQL statement that will be executed after every new connection creation before adding it to the pool. If this SQL is not valid or throws an exception, it will be treated as a connection failure and the standard retry logic will be followed.

initializationFailFast (Boolean, optional, default: false): Controls whether the pool will "fail fast" if the pool cannot be seeded with initial connections successfully. If connections cannot be created at pool startup time, a RuntimeException will be thrown. This property has no effect if minConnections is 0.

leakDetectionThreshold (Duration, optional, default: 0): The amount of time that a connection can be out of the pool before a message is logged indicating a possible connection leak. A value of 0 means leak detection is disabled. The lowest acceptable value for enabling leak detection is 10s.

connectionTestQuery (String, optional): A statement that will be executed just before a connection is obtained from the pool to validate that the connection to the database is still alive. It is database dependent and should be a query that takes very little processing by the database (e.g. "VALUES 1"). When not set, the JDBC4 Connection.isValid() method is used instead (which is usually preferable).

registerMbeans (Boolean, optional, default: false): Whether or not JMX Management Beans ("MBeans") are registered.
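Putting the key parameters together, a minimal sketch of a tuned configuration based on the documented defaults above (with numThreads = 10, the pool defaults to maxConnections = 50 and minConnections = 10 unless you override them):

database {
  dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
  properties = {
    databaseName = "dbname"
    user = "postgres"
    password = "password"
  }
  numThreads = 10      # total worker threads, not per CPU
  queueSize = 1000     # actions queued when all threads are busy
  maxConnections = 50  # documented default: numThreads * 5
  minConnections = 10  # documented default: same as numThreads
}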
Slick has very transparent configuration settings. As for best practice for good performance, there is no rule of thumb: it depends on your database (how many parallel connections it provides) and on your application. It is all about tuning between the database and the application.

Akka and ReactiveMongo

I am trying to find the best approach to sharing the same pool of connections between worker actors within the cluster. I have the following structure:
Master Actor -> Worker Actors (can be up to 100 or more) -> MongoDB
Between the workers and MongoDB I want to put ReactiveMongo; however, I am not sure how exactly to provide connection pool sharing between all actors.
According to reactivemongo documentation:
A MongoDriver instance manages an actor system; a connection manages a pool of connections. In general, a MongoDriver or a MongoConnection is never instantiated more than once. You can provide a list of one or more servers; the driver will guess if it's a standalone server or a replica set configuration. Even with one replica node, the driver will probe for other nodes and add them automatically.
Should I just create it in the Master actor and then bundle it with each message?
So, this would be in the Master actor:

val driver = new MongoDriver
val connection = driver.connection(List("localhost"))

And then I pass the connection to the actors in a message. Or should I acquire a connection in each Worker actor and pass just the driver in a message?
Any help is very appreciated.
Thanks.
I would create the driver and connection in the master actor. I would then set up the worker actors to take an instance of MongoConnection as a constructor argument, so that each worker has a reference to the connection (which is really a proxy to a pool of connections). Then, in something like preStart, have the master actor create the workers (which I am assuming are routed) and supply the connection as an argument. A very simplified example could look like this:
class MongoMaster extends Actor {
  val driver = new MongoDriver
  val connection = driver.connection(List("localhost"))

  override def preStart = {
    context.actorOf(Props(classOf[MongoWorker], connection).withRouter(FromConfig()))
  }

  def receive = {
    // do whatever you need here
    ...
  }
}

class MongoWorker(conn: MongoConnection) extends Actor {
  def receive = {
    ...
  }
}
This code is not exact, but at least it shows the high-level concepts I described.
The answer by cmbaxter works as long as you don't need to instantiate the worker actors remotely. MongoConnection is not serializable.
I found this article https://github.com/eigengo/akka-patterns/wiki/Configuration very helpful. The basic idea is to implement a trait called Configured, which is populated by the main application. The actors can then use that trait to gain access to local, non-serializable objects such as MongoConnection.
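For illustration, a minimal sketch of that idea with hypothetical names (LocalConfiguration, Configured, configured are not from the article): the main application registers local objects once, and actors look them up by type instead of receiving them in messages:

import akka.actor.Actor
import reactivemongo.api.MongoConnection
import scala.reflect.ClassTag

// Hypothetical registry populated by the main application at startup.
// Not thread-safe; register everything before the actors start.
object LocalConfiguration {
  private var entries = Map.empty[Class[_], Any]
  def register[A: ClassTag](value: A): Unit =
    entries += (implicitly[ClassTag[A]].runtimeClass -> value)
  def lookup[A: ClassTag]: A =
    entries(implicitly[ClassTag[A]].runtimeClass).asInstanceOf[A]
}

trait Configured {
  def configured[A: ClassTag]: A = LocalConfiguration.lookup[A]
}

// A worker created on any node resolves the connection locally, so nothing
// non-serializable has to travel in Props or messages.
class MongoWorker extends Actor with Configured {
  lazy val conn = configured[MongoConnection]
  def receive = {
    case _ => () // use conn here to talk to MongoDB
  }
}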