Airflow sql alchemy pool size ignored - postgresql

I am running airlfow local executor with a postgres database and I am getting a: (psycopg2.OperationalError) FATAL: remaining connection slots are reserved
My configs:
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 1800
sql_alchemy_reconnect_timeout = 300
Still I can see that airflow always keeps more connections open than the pool size.
How can we limit Airflow to actually use the pool limit?
airflow_version = 1.10.2; postgres_version = 9.6

You have forks of the main process as workers, each of which manage their own thread pool.
Check the implementation of LocalExecutor; as it is using multiprocessing under the hood. SqlAlchemy will close any open connections upon forking the LocalWorker; but the pool size will be equivalent to the parent, so at max, theoretically you'd have k * (n + 1) connections, where n is your parallelism constant and k is your sql_alchemy_pool_size.

People having the same problem with celery executor can try reducing scheduler.max_threads (AIRFLOW__SCHEDULER__MAX_THREADS) .
Theoretically Max no connections from scheduler = (sql_alchemy_pool_size + sql_alchemy_max_overflow) x max_threads + 1[Dag processor Manager] + 1[Main scheduler process]
Check out https://stackoverflow.com/a/57411598/5860608 for further info.

Related

Spring Boot 1.5.3 Creating more connection than specified in application.properties

I am working on a project where i have dual datasource configured. On testing i have limit the no of max-active connections to five but when i checked on database, i found that application create around 25+ connections.
Code Sample
# Number of ms to wait before throwing an exception if no connection is available.
spring.datasource.tomcat.max-wait=1000
# Maximum number of active connections that can be allocated from this pool at the same time.
spring.datasource.tomcat.max-active=1
spring.datasource.tomcat.max-idle=1
spring.datasource.tomcat.min-idle=1
spring.datasource.tomcat.initial-size=1
# Validate the connection before borrowing it from the pool.
spring.datasource.tomcat.test-on-borrow=true
spring.datasource.tomcat.test-while-idle = true
spring.datasource.tomcat.validation-query = true
spring.datasource.tomcat.time-between-eviction-runs-millis = 360000
spring.rdatasource.tomcat.max-wait=1000
# Maximum number of active connections that can be allocated from this pool at the same time.
spring.rdatasource.tomcat.max-active=1
spring.rdatasource.tomcat.max-idle=1
spring.rdatasource.tomcat.min-idle=1
spring.rdatasource.tomcat.initial-size=1
# Validate the connection before borrowing it from the pool.
spring.rdatasource.tomcat.test-on-borrow=true
spring.rdatasource.tomcat.test-while-idle= true
spring.rdatasource.tomcat.validation-query = true
spring.rdatasource.tomcat.time-between-eviction-runs-millis = 360000
above connection is working fine, but exceeding no of connection to database. User which i am using is limited to 10 connection.
when i hit request to application than i am getting
query wait timeout error with unable to create initial pool size.
I am using tomcat connection pooling
Please provide me the solution so application will run with 10 connections limit which is set at database.

How to fill data upto a size in multiple disk?

I am creating 4 mountpoint disk in Windows OS. I need to copy files up to a threshold value (say 50 GB).
I tried with vdbench. It works fine, but it throws an exception at last.
compratio=4
dedupratio=1
dedupunit=256k
* Host Definition section
hd=default,user=Administator,shell=vdbench,jvms=1
hd=localhost,system=localhost
********************************************************************************
* Storage Definition section
fsd=fsd1,anchor=C:\UnMapTest-Volume1\disk1\,depth=1,width=1,files=1,size=5g
fsd=fsd2,anchor=C:\UnMapTest-Volume2\disk2\,depth=1,width=1,files=1,size=5g
fwd=fwd1,fsd=fsd*,operation=write,xfersize=1m,fileio=sequential,fileselect=random,threads=10
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=1h,interval=1
Below is the exception from vdbench. Due to this my calling script would fail.
05:29:14.287 Message from slave localhost-0:
05:29:14.289 file=C:\UnMapTest-Volume1\disk1\\vdb.1_1.dir\vdb_f0001.file,busy=true
05:29:14.290 Thread: FwgThread write C:\UnMapTest-Volume1\disk1\ rd=rd1 For loops: None
05:29:14.291
05:29:14.292 last_ok_request: Thu Dec 28 05:28:57 PST 2017
05:29:14.292 Duration: 16.92 seconds
05:29:14.293 consecutive_blocks: 10001
05:29:14.294 last_block: FILE_BUSY File busy
05:29:14.294 operation: write
05:29:14.295
05:29:14.296 Do you maybe have more threads running than that you have
05:29:14.296 files and therefore some threads ultimately give up after 10000 tries?
05:29:14.300 *
05:29:14.301 ******************************************************
05:29:14.302 * Slave localhost-0 aborting: Too many thread blocks *
05:29:14.302 ******************************************************
05:29:14.303 *
05:29:21.235
05:29:21.235 Slave localhost-0 prematurely terminated.
05:29:21.235
05:29:21.235 Slave aborted. Abort message received:
05:29:21.235 Too many thread blocks
05:29:21.235
05:29:21.235 Look at file localhost-0.stdout.html for more information.
05:29:21.735
05:29:21.735 Slave localhost-0 prematurely terminated.
05:29:21.735
java.lang.RuntimeException: Slave localhost-0 prematurely terminated.
at Vdb.common.failure(common.java:335)
at Vdb.SlaveStarter.startSlave(SlaveStarter.java:198)
at Vdb.SlaveStarter.run(SlaveStarter.java:47)
I am using PowerShell in a Windows machine. Even if some other tools like Diskspd is having way to fill data up to some threshold then please provide me.
I found the answer by myself.
I have done this using Diskspd.exe as below
The following command fill 50 GB data in the mentioned disk folder
.\diskspd.exe -c50G -b4K -t2 C:\UnMapTest-Volume1\disk1\testfile1.dat
It is very simple than Vdbench for my requirement.
Caution : But it is not having real data so array side disk size is
not shown up with the size

orientdb 2.2.26 cluster setup - queries hang

Orientdb version - 2.2.26
CLuster - 3 node setup, readQuorum = 2, writeQuorum = "majority", ridBag.embeddedToSbtreeBonsaiThreshold = 2147483647
Nodes - CentOS 7.0, 24 cores and 96 GB RAM
Gremlin-scala/tinkerpop APIs are used for querying and inserting.
This code works fine on single node setup.
Code checks for existing vertex in graph. If vertex does not exist, then the insert operation are batched and send to the db within a transaction.
I see following warnings in orientdb log on all three nodes -
2017-09-15 16:37:31:025 WARNI [dev2] Timeout (852567ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.354 task=record_read(#65:22)) [ODistributedDatabaseImpl]
2017-09-15 16:52:18:239 WARNI [dev2] Timeout (1049042ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.568 task=record_read(#63:24)) [ODistributedDatabaseImpl]
2017-09-15 17:25:22:477 WARNI [dev2] Timeout (1984236ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.889 task=record_read(#63:24)) [ODistributedDatabaseImpl]
There is no problem on network. Firewall is disabled on all three nodes.
Are these logs related to the problem ?
What else I should check to fix the problem ?

Akka http server dispatcher number constantly increasing

I'm testing an akka http service on AWS ECS. Each instance is added to a load balancer which regularly makes requests to a health check route. Since this is a test environment I can control for no other traffic going to the server. I notice the debug log indicating that the "default dispatcher" number is consistently increasing:
[DEBUG] [01/03/2017 22:33:03.007] [default-akka.actor.default-dispatcher-41200] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:29.142] [default-akka.actor.default-dispatcher-41196] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:33.035] [default-akka.actor.default-dispatcher-41204] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:33:59.174] [default-akka.actor.default-dispatcher-41187] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:03.066] [default-akka.actor.default-dispatcher-41186] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:29.204] [default-akka.actor.default-dispatcher-41179] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
[DEBUG] [01/03/2017 22:34:33.097] [default-akka.actor.default-dispatcher-41210] [akka://default/system/IO-TCP/selectors/$a/0] New connection accepted
This trend is never reversed and will get up into the tens of thousands pretty soon. Is this normal behavior or indicative of an issue?
Edit: I've updated the log snippet to show that the dispatcher thread number goes way beyond what I would expect.
Edit #2: Here is the health check route code:
class HealthCheckRoutes()(implicit executionContext: ExecutionContext)
extends LogHelper {
val routes = pathPrefix("health-check") {
pathEndOrSingleSlash {
complete(OK -> "Ok")
}
}
}
Probably, yes. I think that's the thread name.
If you do a thread dump on the server, does it have a great many open threads?
It looks like your server is leaking a thread per connection.
(It will probably be much easier to debug and diagnose this on your development machine, rather than on the EC2 VM. Try to reproduce it locally.)
For you Question, check this comment:
Akka http server dispatcher number constantly increasing
About dispatcher:
It is no problem to use default dispatcher for operations like health check.
Threads are controlled by the dispatcher you specified, or default-dispatcher if not specified.
default-dispatcher is setting as following, which means the thread pool size is between 8 to 64 or equal to (number of processors * 3).
default-dispatcher {
type = "Dispatcher"
executor = "default-executor"
default-executor {
fallback = "fork-join-executor"
}
fork-join-executor {
# Min number of threads to cap factor-based parallelism number to
parallelism-min = 8
# The parallelism factor is used to determine thread pool size using the
# following formula: ceil(available processors * factor). Resulting size
# is then bounded by the parallelism-min and parallelism-max values.
parallelism-factor = 3.0
# Max number of threads to cap factor-based parallelism number to
parallelism-max = 64
# Setting to "FIFO" to use queue like peeking mode which "poll" or "LIFO" to use stack
# like peeking mode which "pop".
task-peeking-mode = "FIFO"
}
Dispathcer Document:
http://doc.akka.io/docs/akka/2.4.16/scala/dispatchers.html
Configuration reference:
http://doc.akka.io/docs/akka/2.4.16/general/configuration.html#akka-actor
BTW for operations take a long time and blocks other operations, here is how to specify a custom dispatcher in Akka HTTP for them:
http://doc.akka.io/docs/akka-http/current/scala/http/handling-blocking-operations-in-akka-http-routes.html
According to this akka-http github issue there doesn't seem to be a problem: https://github.com/akka/akka-http/issues/722

Can't obtain database connection within 5 seconds - Sidekiq workers, Unicorn, Redis ToGo on Heroku

I've a bunch background tasks (Sidekiq workers) that update database, and I keep getting this failing thread exception.
Heroku Log: WARN: could not obtain a database connection within 5.000 seconds (waited 5.001 seconds)
Heroku
Heroku Postgres :: Olive - 20 connections limit.
Redis ToGo - 10 connections limit.
Sidekiq - 2 connections.
Each client request create ~50 threads - finally ~20 threads trying to update db.
Now I know this is too much threads trying to make connection (updating Active:Record..).
I don't mind them to wait in try again until success.
-config/unicorn.rb
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 30
preload_app true
before_fork do |server, worker|
Signal.trap 'TERM' do
puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
Process.kill 'QUIT', Process.pid
end
if defined?(ActiveRecord::Base)
ActiveRecord::Base.connection.disconnect!
end
end
after_fork do |server, worker|
Signal.trap 'TERM' do
puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
end
if defined?(ActiveRecord::Base)
config = ActiveRecord::Base.configurations[Rails.env] ||
Rails.application.config.database_configuration[Rails.env]
config['pool'] = ENV['DB_POOL'] || 2
config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
ActiveRecord::Base.establish_connection(config)
end
end
-config/initializers/sidekiq.rb
require 'sidekiq'
Sidekiq.configure_server do |config|
if(database_url = ENV['DATABASE_URL'])
p pool_size = Sidekiq.options[:concurrency] + 2
p ENV['DATABASE_URL'] = "#{database_url}?pool=#{pool_size}"
ActiveRecord::Base.establish_connection
end
end
--condig/sidekiq.yml
:concurrency: 2
Thanks a lot for all the help,
Eldar