I am running a couple of Spring Boot apps in Kubernetes. Each app uses Spring JPA to connect to a PostgreSQL 10.6 database. What I have noticed is that when the pods are killed unexpectedly, the connections to the database are not released.
Running SELECT sum(numbackends) FROM pg_stat_database; on the database returns, let's say, 50. After killing a couple of pods running the Spring app and rerunning the query, this number jumps to 60. This eventually causes the number of connections to PostgreSQL to exceed the maximum and prevents the applications in restarted pods from connecting to the database.
I have experimented with the PostgreSQL option idle_in_transaction_session_timeout, setting it to 15s, but this does not drop the old connections and the number keeps increasing.
I am using the com.zaxxer.hikari.HikariDataSource data source for my Spring apps and was wondering whether there is a way to prevent this from happening, either on the PostgreSQL side or on Spring Boot's side.
Any advice or suggestions are welcome.
Spring Boot version: 2.0.3.RELEASE
Java version: 1.8
PostgreSQL version: 10.6
This issue can arise not only with Kubernetes pods but also with a plain application running on a server that is killed forcibly (like kill -9 pid on Linux) and is not given the opportunity to clean up via Spring Boot's shutdown hooks. I think in this case nothing on the application side can help. You can, however, try cleaning up inactive connections on the database side using several methods, as mentioned here.
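As an illustration of one such database-side method (a sketch only: the database name 'mydb' and the 15-minute threshold are placeholders to adapt), you can periodically terminate sessions that have been idle for too long. Note that idle_in_transaction_session_timeout only affects sessions that are idle inside an open transaction, so plainly idle sessions left behind by killed pods are not covered by it:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb'
  AND state = 'idle'
  AND state_change < now() - interval '15 minutes';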
We have multiple microservices (with Liquibase) running on an Azure AKS cluster as pods.
We have frequently noticed DB locks, and the pods crash because they fail their health checks.
Is there a way to overcome this scenario, as it is having a big impact? We have to manually unlock the DB table so that the pod will start.
In one of the logs, I noticed the error below.
I believe it needs to be handled from the application (Spring Boot).
You can write a piece of code that executes at application startup and releases the Liquibase lock if one is found. Then the database connection won't fail.
We are currently using the same approach in our environment.
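A minimal sketch of that idea, assuming the default Liquibase lock table DATABASECHANGELOGLOCK and a plain-JDBC cleanup (the changelog path is a placeholder, and overriding the SpringLiquibase bean is just one way to make the cleanup run before the migration):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;
import liquibase.integration.spring.SpringLiquibase;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LiquibaseLockConfig {

    @Bean
    public SpringLiquibase liquibase(DataSource dataSource) {
        // Clear a stale lock left behind by a pod that was killed mid-migration,
        // then let Liquibase run as usual.
        releaseStaleLock(dataSource);
        SpringLiquibase liquibase = new SpringLiquibase();
        liquibase.setDataSource(dataSource);
        liquibase.setChangeLog("classpath:db/changelog/db.changelog-master.xml"); // placeholder path
        return liquibase;
    }

    private void releaseStaleLock(DataSource dataSource) {
        try (Connection conn = dataSource.getConnection();
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(
                "UPDATE DATABASECHANGELOGLOCK SET LOCKED = FALSE, LOCKGRANTED = NULL, LOCKEDBY = NULL WHERE ID = 1");
        } catch (SQLException e) {
            // On a fresh database the lock table may not exist yet; ignore and let Liquibase create it.
        }
    }
}

Note that blindly clearing the lock is only safe if you are sure no other replica is genuinely in the middle of a migration at that moment.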
Our application stopped connecting to the MongoDB secondary in the replica set. We have the read preference set to secondary.
mongodb://db0.example.com,db1.example.com,db2.example.com/?replicaSet=myRepl&readPreference=secondary&maxStalenessSeconds=120
Connections always go to the primary, overloading the primary node. This issue started after patching and restarting the servers.
I tried mongo shell connectivity using the connection string above, and the command was abruptly terminated. I can still see the process for that connection on the server in ps -ef | grep mongo.
Has anyone faced this issue? Any troubleshooting tips are appreciated. The logs aren't showing anything related to the terminated/stopped connection process.
We were able to fix the issue. It was an issue on the Spring Boot side. Once the right bean was injected (we have two beans: one for primary and one for secondary connections), the connection was established to the secondary node for heavy reading and reporting purposes.
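For context, a minimal sketch of such a two-bean setup, assuming Spring Data MongoDB with the sync driver (the URIs, the database name "mydb", and the bean names are placeholders, not our exact configuration):

import com.mongodb.client.MongoClients;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.data.mongodb.core.MongoTemplate;

@Configuration
public class MongoConfig {

    // Placeholder URIs; in practice these come from configuration properties.
    private static final String PRIMARY_URI =
            "mongodb://db0.example.com,db1.example.com,db2.example.com/?replicaSet=myRepl";
    private static final String SECONDARY_URI =
            "mongodb://db0.example.com,db1.example.com,db2.example.com/?replicaSet=myRepl"
            + "&readPreference=secondary&maxStalenessSeconds=120";

    @Bean
    @Primary
    public MongoTemplate primaryMongoTemplate() {
        // Default template: reads and writes go to the primary.
        return new MongoTemplate(MongoClients.create(PRIMARY_URI), "mydb");
    }

    @Bean("secondaryMongoTemplate")
    public MongoTemplate secondaryMongoTemplate() {
        // Template for heavy reads/reporting: the read preference comes from the URI.
        return new MongoTemplate(MongoClients.create(SECONDARY_URI), "mydb");
    }
}

The reporting code then has to inject the secondary template explicitly, e.g. with @Qualifier("secondaryMongoTemplate"); if the unqualified (primary) bean is injected instead, every read silently goes to the primary, which matches the symptom described above.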
I tried to find information about the number of connections that Airflow establishes with the metadata database instance (Postgres in my case).
By running select * from pg_stat_activity I realized it creates at least 7 connections whose states alternate between idle and idle in transaction. The queries are registered as COMMIT or SELECT 1 (mostly). This was using the LocalExecutor on Airflow 2.1, but I tested with an installation of Airflow 1.10 with the same results.
Is anyone aware of where these connections come from? And, is there a way (and a reason) to change this?
Yes. Airflow will open a big number of connections: basically every process it creates will almost certainly open at least one connection. This is a "known" characteristic of Apache Airflow.
If you are using MySQL, this is not a big issue, as MySQL is good at handling multiple connections (it multiplexes incoming connections via threads). Postgres uses a process-per-connection approach, which is much more resource-hungry.
The recommended way to handle that (Postgres is the most stable backend for Airflow) is to use PGBouncer to proxy such connections to Postgres.
In our official Helm chart, PGBouncer is used by default when Postgres is used: https://airflow.apache.org/docs/helm-chart/stable/index.html
I highly recommend this approach.
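If you deploy with that chart and want to control or tune this explicitly, the relevant switch lives in the chart's values; a minimal override might look like this (key name taken from the chart's values file, so verify it against your chart version):

pgbouncer:
  enabled: true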
I set up a development environment in a Docker Swarm cluster, which consists of 2 nodes, a few networks, and a few microservices. The following gives an example of how it looks in the cluster.
Service             | Network           | Node   | Image Version
nginx reverse proxy | backend, frontend | node 1 | latest stable-alpine
Service A           | backend, database | node 2 | 8-jre-alpine
Service B           | backend, database | node 2 | 8-jre-alpine
Service C           | backend, database | node 1 | 8-jre-alpine
Database postgresql | database          | node 1 | latest alpine
The services are Spring Boot 2.1.7 applications with boot-data-jpa. All the services above hold a database connection to the PostgreSQL instance. For the database I configured only the following properties in application.properties:
spring.datasource.url
spring.datasource.username
spring.datasource.password
spring.datasource.driver-class-name
spring.jpa.hibernate.ddl-auto=
spring.jpa.properties.hibernate.jdbc.lob.non_contextual_creation=true
After some time I see that the connection limit in PostgreSQL is exceeded, which makes it impossible to create a new connection:
2019-09-21 13:01:07.031 1 --- [onnection adder] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Cannot acquire connection from data source org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
A similar error is also shown when I try to connect to the database over ssh.
psql: FATAL: sorry, too many clients already
What I have tried so far:
spring.datasource.hikari.leak-detection-threshold=20000
which didn't help.
I found several answers to this problem, such as:
increase the connection limit in PostgreSQL
No, I don't want to do this. It is just a temporary solution; it would only fill up the connections again, just a bit later.
add an idle timeout in the HikariCP configuration
The default HikariCP configuration already has a value of 10 minutes, which doesn't help.
add a max lifetime to the HikariCP configuration
The default HikariCP configuration already has a value of 30 minutes, which doesn't help.
reduce the number of idle connections in the HikariCP configuration
The default HikariCP configuration already has a value of 10, which doesn't help.
set min idle in the HikariCP configuration
The default is 10 and I am fine with it.
I am expecting around 30 connections for the services, but I find nearly 100. Restarting or stopping the services does not close the idle connections either. What are your suggestions? Is it a Docker-specific problem? Has anyone experienced the same problem?
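For reference, the HikariCP settings mentioned above correspond to the following standard spring.datasource.hikari.* properties (the values shown simply spell out the defaults I described, so the effective configuration is explicit):

spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000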
I have a WildFly instance which hosts an app that runs some long-running SQL queries (say queries or SP calls which take 10-20 minutes or more).
Previously this WildFly was pointing to SQL Server 2008, now to Postgres 11.
Previously when I killed/rebooted WildFly, I had noticed that pretty quickly (if not instantaneously) the long-running SP/query calls that were triggered from the Java code (running in WildFly) were being killed too.
Now... with Postgres I notice that these long-running queries remain and keep running even after the app server has been shut down.
What is causing this? Who was killing them in SQL Server (the server itself, the JDBC driver, or something else)? Is there some setting/parameter in Postgres which controls this behavior, i.e. what the DB server does with a query once the client that triggered it has shut down?
EDIT: We do a graceful WildFly shutdown by sending a command to WF to shut itself down. Still, the behavior seems different between SQL Server and Postgres.
If shutting down the application server issues JDBC calls that terminate the database sessions, this should not happen. If it doesn't close the JDBC connections properly, I'd call that a bug in the application server. If it does, but the queries on the backend are not canceled, I'd call that a bug in the JDBC driver.
Anyway, a workaround is to set tcp_keepalives_idle to a low value so that the server detects dead TCP connections quickly and terminates the query.
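For example (values are illustrative and should be tuned for your environment), the keepalive parameters can be set in postgresql.conf or via ALTER SYSTEM:

tcp_keepalives_idle = 60        # seconds of idleness before the first keepalive probe
tcp_keepalives_interval = 10    # seconds between probes
tcp_keepalives_count = 3        # failed probes before the connection is considered dead

With settings like these the server should notice a vanished client within a couple of minutes instead of waiting for the operating system defaults.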