Slick confused about numThreads and best practice for good performance - scala

I am using the Play Framework with Slick in a system that is heavy on database I/O. In my application.conf file I have this setting:
play {
  akka {
    akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-factor = 20.0
        }
      }
    }
  }
}
This obviously gives me 20 threads per core for the Play application, and as I understand it Slick creates its own thread pool. Does the numThreads setting in Slick mean the total number of threads, or is it (numThreads x number of CPUs)? And is there any best practice for good performance? I currently have my settings configured as:
database {
  dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
  properties = {
    databaseName = "dbname"
    user = "postgres"
    password = "password"
  }
  numThreads = 10
}

numThreads is simply the total number of threads in the thread pool; it is not multiplied by the number of CPUs. Slick uses this thread pool for executing queries.
The following config keys are supported for all connection pools, both built-in and third-party:
numThreads (Int, optional, default: 20): The number of concurrent threads in the thread pool for asynchronous execution of
database actions. See the HikariCP wiki for more information about
sizing the thread pool correctly. Note that for asynchronous
execution in Slick you should tune the thread pool size (this
parameter) accordingly instead of the maximum connection pool size.
queueSize (Int, optional, default: 1000): The size of the queue for database actions which cannot be executed immediately when all
threads are busy. Beyond this limit new actions fail immediately.
Set to 0 for no queue (direct hand-off) or to -1 for an unlimited
queue size (not recommended).
The pool is tuned for asynchronous execution by default. Apart from the connection parameters you should only have to set numThreads and queueSize in most cases. In this scenario there is contention over the thread pool (via its queue), not over the connections, so you can have a rather large limit on the maximum number of connections (based on what the database server can still handle, not what is most efficient). Slick will use more connections than there are threads in the pool when sequencing non-database actions inside a transaction.
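To make the difference between the two settings concrete, here is a minimal sketch of the arithmetic. It assumes the fork-join executor sizes its pool as ceil(cores x parallelism-factor) clamped between parallelism-min and parallelism-max, and it uses the minConnections and maxConnections defaults quoted in this answer. The function names are made up for illustration, and the clamp values 8 and 64 in the usage note are assumptions, so check the reference.conf of your Akka version.

```python
import math


def fork_join_parallelism(cores, factor, parallelism_min, parallelism_max):
    """Fork-join pool size: ceil(cores * factor), clamped to [min, max]."""
    scaled = math.ceil(cores * factor)
    return min(max(scaled, parallelism_min), parallelism_max)


def slick_pool_sizes(num_threads):
    """Sizes derived from numThreads, using the defaults quoted in the docs."""
    return {
        "threads": num_threads,             # a fixed total, independent of cores
        "minConnections": num_threads,      # default: same as numThreads
        "maxConnections": num_threads * 5,  # default: numThreads * 5
    }


if __name__ == "__main__":
    # 8 and 64 are assumed clamp values, not taken from the question.
    print(fork_join_parallelism(cores=4, factor=20.0,
                                parallelism_min=8, parallelism_max=64))
    print(slick_pool_sizes(10))
```

Under those assumptions a parallelism-factor of 20.0 does not necessarily yield 20 threads per core, because the result is capped by parallelism-max, whereas numThreads = 10 always means 10 Slick threads in total.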
The following config keys are supported for HikariCP:
url (String, required): JDBC URL
driver or driverClassName (String, optional): JDBC driver class to load
user (String, optional): User name
password (String, optional): Password
isolation (String, optional): Transaction isolation level for new connections. Allowed values are: NONE, READ_COMMITTED,
READ_UNCOMMITTED, REPEATABLE_READ, SERIALIZABLE.
catalog (String, optional): Default catalog for new connections.
readOnly (Boolean, optional): Read Only flag for new connections.
properties (Map, optional): Properties to pass to the driver or DataSource.
dataSourceClass (String, optional): The name of the DataSource class provided by the JDBC driver. This is preferred over using
driver. Note that url is ignored when this key is set (You have to
use properties to configure the database connection instead).
maxConnections (Int, optional, default: numThreads * 5): The maximum number of connections in the pool.
minConnections (Int, optional, default: same as numThreads): The minimum number of connections to keep in the pool.
connectionTimeout (Duration, optional, default: 1s): The maximum time to wait before a call to getConnection is timed out. If this
time is exceeded without a connection becoming available, a
SQLException will be thrown. 1000ms is the minimum value.
validationTimeout (Duration, optional, default: 1s): The maximum amount of time that a connection will be tested for aliveness. 1000ms
is the minimum value.
idleTimeout (Duration, optional, default: 10min): The maximum amount of time that a connection is allowed to sit idle in the pool.
A value of 0 means that idle connections are never removed from the
pool.
maxLifetime (Duration, optional, default: 30min): The maximum
lifetime of a connection in the pool. When an idle connection reaches
this timeout, even if recently used, it will be retired from the
pool. A value of 0 indicates no maximum lifetime.
connectionInitSql (String, optional): A SQL statement that will be executed after every new connection creation before adding it to the
pool. If this SQL is not valid or throws an exception, it will be
treated as a connection failure and the standard retry logic will be
followed.
initializationFailFast (Boolean, optional, default: false):
Controls whether the pool will "fail fast" if the pool cannot be
seeded with initial connections successfully. If connections cannot
be created at pool startup time, a RuntimeException will be thrown.
This property has no effect if minConnections is 0.
leakDetectionThreshold (Duration, optional, default: 0): The amount of time that a connection can be out of the pool before a message is
logged indicating a possible connection leak. A value of 0 means leak
detection is disabled. Lowest acceptable value for enabling leak
detection is 10s.
connectionTestQuery (String, optional): A statement
that will be executed just before a connection is obtained from the
pool to validate that the connection to the database is still alive.
It is database dependent and should be a query that takes very little
processing by the database (e.g. "VALUES 1"). When not set, the JDBC4
Connection.isValid() method is used instead (which is usually
preferable).
registerMbeans (Boolean, optional, default: false): Whether or not JMX Management Beans ("MBeans") are registered.
Slick's configuration settings are fairly transparent. As for best practice for good performance, there is no rule of thumb: it depends on your database (how many parallel connections it can handle) and on your application. It is all about tuning between the database and the application.

Related

Does setMinimumIdle() respect setIdleTimeout() in a HikariCP DataSource?

I am trying to create a connection pool using HikariCP for Postgres. The pool should always have one active session/connection, and the remaining connections should be created on demand, which Hikari will take care of. To do so, I configured the DataSource as below:
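import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
private static HikariDataSource dataSource = null;
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://hostname:port/dbname");
config.setUsername("USERNAME");
config.setPassword("PASSWORD");
config.setMinimumIdle(1);      // keep at least one idle connection in the pool
config.setMaximumPoolSize(5);  // allow at most 5 connections in total
config.setIdleTimeout(10000);  // idle timeout in milliseconds (10 s)
dataSource = new HikariDataSource(config);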
MinimumIdle(1) always keeps one open connection to the DB until the program or server is shut down, and the remaining 4 connections are created on demand if they do not already exist in the pool.
IdleTimeout(10000) removes any connection that has been idle in the pool for more than 10 seconds; I observed that it does not apply to the connections covered by MinimumIdle (here, 1).
Is my understanding correct, and does this serve my requirement? I would appreciate your suggestions.
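A simplified model of the behaviour described above may help in checking that understanding. This is only a sketch, not HikariCP's housekeeping code. It assumes, as the HikariCP README states, that idle connections are not retired once the pool is down to minimumIdle connections, and that an idleTimeout of 0 disables idle removal. The function name and the list-of-idle-times representation are hypothetical.

```python
def connections_to_retire(idle_ms, idle_timeout_ms, minimum_idle):
    """Return indices of idle connections this simplified pool would retire.

    idle_ms holds, per idle connection, how long it has been idle (ms).
    """
    if idle_timeout_ms == 0:
        return []  # 0 means idle connections are never removed
    # Never shrink the idle set below minimum_idle.
    removable = max(len(idle_ms) - minimum_idle, 0)
    # Only connections idle for longer than the timeout qualify;
    # consider the longest-idle ones first.
    expired = sorted(
        (i for i, age in enumerate(idle_ms) if age > idle_timeout_ms),
        key=lambda i: idle_ms[i],
        reverse=True,
    )
    return expired[:removable]


if __name__ == "__main__":
    # Three idle connections, 10 s timeout, keep at least one idle.
    print(connections_to_retire([15000, 12000, 3000], 10000, 1))
    # A single idle connection is kept even though it exceeded the timeout.
    print(connections_to_retire([15000], 10000, 1))
```

In this model a connection counted towards minimumIdle is never retired for idleness, which matches the observation in the question. Note that the real pool checks idleness periodically, so actual retirement can lag behind the configured timeout.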

How to abort mongo operation after given time limit using the pymongo MongoClient constructor?

According to this response you can set a time limit for a query operation via find() parameter or a collection method:
cursor = db.collection.find(max_time_ms=1)
or
cursor = db.collection.find().max_time_ms(1)
The doc says:
max_time_ms (optional): Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. Pass this as an alternative to calling max_time_ms() on the cursor.
We're currently experiencing a problem where a query runs for ~30 minutes before it eats all the RAM and the server dies. I hope this parameter gives a hard limit on the query, so that after the given time the server gives up.
Since our app is full of finds and cursors: is there a way to set this parameter directly in the MongoClient constructor?
The doc says:
socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. Defaults to None (no timeout).
connectTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait during server monitoring when connecting a new socket to a server before concluding the server is unavailable. Defaults to 20000 (20 seconds).
serverSelectionTimeoutMS: (integer) Controls how long (in milliseconds) the driver will wait to find an available, appropriate server to carry out a database operation; while it is waiting, multiple server monitoring operations may be carried out, each controlled by connectTimeoutMS. Defaults to 30000 (30 seconds).
...couldn't find another timeout and none of these seem to be the equivalent of max_time_ms. Am I missing something?

Scala Play 2.5 with Slick 3 and Specs2

I have a Play application using Slick that I want to test using Specs2, but I keep getting the error org.postgresql.util.PSQLException: FATAL: sorry, too many clients already. I have tried to shut down the database connection by using
val mockApp = new GuiceApplicationBuilder()
val db = mockApp.injector.instanceOf[DBApi].database("default")
...
override def afterAll = {
  db.getConnection().close()
  db.shutdown()
}
But the error persists. The Slick configuration is
slick.dbs.default.driver="slick.driver.PostgresDriver$"
slick.dbs.default.db.driver="org.postgresql.Driver"
slick.dbs.default.db.url="jdbc:postgresql://db:5432/hygge_db"
slick.dbs.default.db.user="*****"
slick.dbs.default.db.password="*****"
getConnection on the DBApi database either takes a connection from the pool of the underlying data source (JdbcDataSource, I presume) or creates a new one. I see no pool specified in your configuration, so I think it always creates a new one for you. So if you didn't close a connection inside the test, calling getConnection in afterAll won't help: it will just create a new connection or take an arbitrary one from the pool (if pooling is enabled).
So the solution is to either configure connection pooling:
When using a connection pool (which is always recommended in
production environments) the minimum size of the connection pool
should also be set to at least the same size. The maximum size of the
connection pool can be set much higher than in a blocking application.
Any connections beyond the size of the thread pool will only be used
when other connections are required to keep a database session open
(e.g. while waiting for the result from an asynchronous computation in
the middle of a transaction) but are not actively doing any work on
the database.
so you can just set the maximum number of available connections in your config:
slick.dbs.default.db.maxConnections = 5
Or you can share the same connection (you'll probably have to make sure the tests run sequentially then):
object SharedConnectionForAllTests {
  // db is the Database instance obtained from DBApi in the question
  val connection = db.getConnection()
  def close() = connection.close()
}
It's better to inject it with Spring/Guice of course, so you can conveniently manage the connection's lifecycle.

How to free Redis Scala client allocated by RedisClientPool?

I am using debasishg/scala-redis as my Redis Client.
I want it to support multi threaded executions. Following their documentation: https://github.com/debasishg/scala-redis I defined
val clients = new RedisClientPool("localhost", 6379)
and then use it on each access to Redis:
clients.withClient {
  client => {
    ...
  }
}
My question is, do I need to free each allocated client? And if so, what is a correct way to do it?
If you look at the constructor for RedisClientPool, there is a default value for maxIdle ("the maximum number of objects that can sit idle in the pool", as per this), and a default value for poolWaitTimeout. You can change those values, but basically if you wait for poolWaitTimeout you are guaranteed to have your resources cleaned up, except for the maxIdle clients on stand-by.
Also, if you can't stand the idea of idle clients, you can shut down the whole pool with mypool.close, and create it again when needed, but depending on your use case that might defeat the purpose of using a pool (if it's a cron job I guess that's fine).

Behaviour of Hikari setConnectionTimeout

Just looking for an explanation of the rationale for this bit of code (PoolUtilities:293 in version 2.2.4):
dataSource.setLoginTimeout((int) TimeUnit.MILLISECONDS.toSeconds(Math.min(1000L, connectionTimeout)));
This code and the setConnectionTimeout method means that I get this behaviour:
connectionTimeout == 0, then loginTimeout = Integer.MAX_VALUE
connectionTimeout > 0 && < 100, then HikariConfig throws IllegalArgumentException
connectionTimeout >= 100 && <= 1000, then loginTimeout = connectionTimeout
connectionTimeout > 1000, then loginTimeout = 1000
That looks really weird to me!
It's almost like the Math.min should be Math.max ???
In my current project I'd like to fail connections after 30s, which is impossible in the current setup.
I'm using the 4.1 postgres jdbc driver, but I think this is not relevant to the issue above.
Many thanks - and cool pooling library!!!
Ok, there are a couple of moving parts here. First, Math.min() is a bug, it should be Math.max(). In light of that (it will be fixed) consider the following:
It is important to note that connections are created asynchronously in the pool. The setConnectionTimeout() sets the maximum time (in milliseconds) that a call to getConnection() will wait for a connection before timing out.
The DataSource loginTimeout is the maximum time that physical connection initiation to the database can take before timing out. Because HikariCP obtains connections asynchronously, if the connection attempt fails, HikariCP will continue to retry, but your calls to getConnection() will time out appropriately. We are using the connectionTimeout in a kind of double duty for loginTimeout.
For example, let's say the pool is completely empty, and you have configured a connectionTimeout of 30 seconds. When you call getConnection(), HikariCP, realizing that there are no idle connections available, starts trying to obtain a new one. There is little point in having a loginTimeout exceeding 30 seconds in this case.
The intent of the Math.max() call is to ensure that we never set loginTimeout to 0 if the user has configured connectionTimeout to 250ms. TimeUnit.MILLISECONDS.toSeconds() would return 0 without the Math.max(). If the user has configured a connectionTimeout of 0, meaning they never want to time out, the time conversion of Integer.MAX_VALUE results in several thousand years as a timeout (virtually never).
Having said that, and in light of how HikariCP connections to the database are obtained asynchronously, even without the Math.max() fix, you should be able to achieve application-level connection timeouts of 30s. Unless physical connections to your database exceed 1000ms you would be unaffected by the Math.min().
We are putting out a 2.2.5-rc3 release candidate in the next few hours. I will slot this fix in.
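The effect of Math.min() versus Math.max() in the quoted line can be checked with a few lines of arithmetic. This is only a sketch of that one expression, not HikariCP code: the function name is made up, toSeconds() is modelled as integer division (valid for non-negative values), and the special handling of connectionTimeout == 0 in setConnectionTimeout() is not modelled.

```python
def login_timeout_seconds(connection_timeout_ms, combine=min):
    """Model (int) TimeUnit.MILLISECONDS.toSeconds(combine(1000L, connectionTimeout)).

    combine=min reproduces the 2.2.4 line; combine=max models the described fix.
    """
    # toSeconds() truncates, which matches floor division for values >= 0.
    return combine(1000, connection_timeout_ms) // 1000


if __name__ == "__main__":
    for timeout_ms in (250, 1000, 30000):
        print(timeout_ms,
              login_timeout_seconds(timeout_ms, min),
              login_timeout_seconds(timeout_ms, max))
```

With min, a 30000 ms connectionTimeout gives a loginTimeout of 1 second and a 250 ms one gives 0; with max, the same inputs give 30 and 1 respectively, which is the intent described in the answer.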