Background
I have a PgPool-II cluster (version 4.1.4) running on 3 CentOS 8 virtual machines: SQL1, SQL2 and SQL3 (each on different hardware). PostgreSQL 12 runs on SQL1 and SQL2 (currently SQL1 is the master and SQL2 is a streaming replication standby).
In the DB cluster there are currently 4 databases for 2 different software environments: one customer (cust) environment with quite a lot of traffic and one educational (edu) environment with basically no user activity at all.
Both environments have one DB each and also share two databases (read-only).
The application is written in .NET Core 3.1 at the moment and uses Npgsql and Entity Framework Core to connect to the pgpool cluster.
Besides the "normal" application SQL requests through Entity Framework, there are also periodic psql calls that run "show pool_nodes;" against pgpool. This cannot be done with Entity Framework, hence psql.
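For reference, that periodic call is essentially a one-liner like the following (host, user and database are placeholders for this setup; 9999 is pgpool's default listen port):
# run the status command through psql; pgpool itself answers "show pool_nodes;"
psql -h sql1 -p 9999 -U monitoring_user -d postgres -c "show pool_nodes;"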
Each environment also has one "main api", which handles internet traffic, and one "service api", which runs background tasks. Both use Entity Framework to call the database, and psql is also invoked from the "service api" as described above.
On top of this, all applications also exist as an A and a B system.
So to sum up:
2 environments (cust, edu), each with an A and a B system, each of which has a "main api" and a "service api" => 8 applications (12 if counting that every "service api" also invokes psql every 5 minutes).
The applications run on 2 different machines (A and B); the "main api" and "service api" for a given environment and system run on the same machine.
Each Entity Framework application can also make multiple simultaneous requests, depending on the user activity against the API.
My question
Every now and then there is an error from the pgpool cluster: Sorry, too many clients already.
Usually when connecting with psql, but sometimes this also happens from Entity Framework.
My initial thought was that the databases had too many clients connected, but running pg_stat_activity just seconds after the error shows far fewer connections (around 50) than the 150 max_connections in the PostgreSQL config. I could not find any errors at all in the PostgreSQL logs in the "log" folder in the PostgreSQL data directory.
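For reference, a quick count to compare against max_connections looks something like this (a sketch):
-- counts sessions per database and state instead of listing every row
SELECT datname, state, count(*) FROM pg_stat_activity GROUP BY datname, state ORDER BY count(*) DESC;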
But in the pgpool.log file there is an error entry:
Oct 30 16:34:19 sql1 pgpool[4062984]: [109497-1] 2020-10-30 16:34:19: pid 4062984: ERROR: Sorry, too many clients already
pgpool has num_init_children = 32 and max_pool = 4, so I do not really see where the problem might be coming from.
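For context, the relevant part of pgpool.conf (the comments reflect my reading of the pgpool documentation: num_init_children is the ceiling on concurrent client sessions, while max_pool only controls cached backend connections per child, so the client-side limit here is 32 rather than 32 * 4):
num_init_children = 32    # pgpool preforks 32 children; at most 32 clients can be connected at once
max_pool = 4              # backend connection pairs cached per child, not extra client slots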
Some files that might be needed for more info:
pg_stat_activity (Taken 11s after the error)
pgpool.log
pgpool.conf
postgresql.conf
This problem might stem from a bug that, according to one of the pgpool developers, will have its fix included in the 4.1.5 update on November 19, 2020.
The bug causes the counter for existing connections not to count down when a query is cancelled.
Simply upgrade pgpool from 4.1.2 to 4.1.5 (newer versions of pgpool are available as well).
I had the same problem with pgpool 4.1.2: Sorry, too many clients already was occurring almost every hour.
The problem got fixed after I upgraded pgpool from 4.1.2 to 4.1.5. We never experienced the error afterward.
Related
I tried to find information about the number of connections that Airflow establishes with the metadata database instance (Postgres in my case).
By running select * from pg_stat_activity I realized it creates at least 7 connections whose states change between idle and idle in transaction. The queries are registered as COMMIT or SELECT 1 (mostly). This was using the LocalExecutor on Airflow 2.1, but I tested with an installation of Airflow 1.10 with the same results.
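A grouped variant of that query (a sketch) makes it easier to see which component each session belongs to:
-- counts sessions per client application and state
SELECT application_name, state, count(*) FROM pg_stat_activity GROUP BY application_name, state ORDER BY count(*) DESC;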
Is anyone aware of where these connections come from? And, is there a way (and a reason) to change this?
Yes. Airflow will open a big number of connections - basically every process it creates will almost certainly open at least one connection. This is a "known" characteristic of Apache Airflow.
If you are using MySQL, this is not a big issue as MySQL is good at handling multiple connections (it multiplexes incoming connections via threads). Postgres uses a process-per-connection approach, which is much more resource-hungry.
The recommended way to handle that (Postgres is the most stable backend for Airflow) is to use PGBouncer to proxy such connections to Postgres.
In our Official Helm Chart, PGBouncer is used by default when Postgres is used: https://airflow.apache.org/docs/helm-chart/stable/index.html
I highly recommend this approach.
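If you are not using the Helm chart, a minimal standalone PgBouncer setup is roughly the sketch below (database name, host, credentials file and pool sizes are placeholders to adapt); Airflow then points its sql_alchemy_conn at port 6432 instead of 5432.
; pgbouncer.ini sketch
[databases]
airflow = host=127.0.0.1 port=5432 dbname=airflow

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; multiplexes many client connections onto few server ones
max_client_conn = 500        ; what Airflow is allowed to open
default_pool_size = 20       ; what actually reaches Postgres per database/user pair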
I am facing this issue regularly; it happens 1-3 times a month, mostly on weekends.
To explain: CPU utilization has been exceeding 100% for the last 32 hours.
EC2 Instance is t3.medium
Postgres version is 10.6
OS : Amazon Linux 2
I have tried to gather all the information I could using the commands provided in this reference: https://severalnines.com/blog/why-postgresql-running-slow-tips-tricks-get-source
But I didn't find any inconsistency or leak in my database. However, while checking for the process consuming all the CPU resources, I found that the following command is the culprit, running for more than 32 hours:
/var/lib/postgresql/10/main/postgresql -u pg_linux_copy -B
This command is running as 3 separate processes at the moment, which have been running for 32 hours, 16 hours and 16 hours respectively.
Searching for this didn't return a single result on Google, which is heartbreaking.
If I kill the processes, everything goes back to normal.
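For anyone hitting the same thing, this is roughly how I located and stopped it (PIDs are just examples; a sketch):
# list the suspicious processes and how long they have been running
ps -eo pid,etime,user,cmd | grep pg_linux_copy | grep -v grep
# kill them by PID; CPU usage dropped back to normal immediately
sudo kill -9 <pid>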
What is the issue, and what can I do to prevent this from happening again in the future?
I was recently contacted by the AWS EC2 Abuse team regarding my instance being involved in an intrusion attack on another server.
To my surprise, I found out that, because I had used the very weak password root for the default postgres account and also had the Postgres port open to the public, the attacker silently gained access to the instance and used it to try to gain access to another instance.
I am still not sure how the attacker was able to run ssh commands after gaining access to the master database account.
To summarise, one reason for unusual database spikes on a server could be someone attacking your system.
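For reference, a minimal sketch of the obvious mitigations (the subnet is a placeholder for your own network; adapt to your setup):
# replace the weak password on the default account
psql -U postgres -c "ALTER USER postgres WITH PASSWORD 'a-long-random-password';"
# in pg_hba.conf, replace any open "0.0.0.0/0" rule with your own subnet, for example:
#   host    all    all    10.0.0.0/16    md5
# and/or close port 5432 to the public in the EC2 security group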
I'm trying to determine if database replication is supported by Kentico 11. I see documentation from version 7.0 regarding database replication, but nothing for version 8 onward. If it is supported, does it still only support merge replication? We want to have a warm site that runs off a replicated database, which we can fail over to in case our primary site is down.
This was removed from the documentation as it is not related to Kentico. DB replication (or DB mirroring) is a SQL Server-side configuration. If it is configured correctly, Kentico will not even notice that the failover DB is being used. This is all handled on the SQL Server side.
I'm testing some work on a Google Cloud SQL instance of Postgres 9.6 and want to see how enabling parallel queries will improve (or not) the performance. I have followed the process here:
https://blog.2ndquadrant.com/postgresql96-parallel-sequential-scan/
But the explain plan doesn't indicate that it is using parallel workers. To validate that I did it correctly, I installed Postgres on my local machine, made the same changes, and it all worked: the explain plan showed workers being used.
Does anyone know of specific reasons or extra steps needed in Google Cloud SQL to get this working?
Thanks
Matt
[EDIT]
The steps I took to change the setting on GCP were:
From the DB Overview page (the one with the CPU Utilisation graph):
Click on "Connect using Cloud Shell" (underneath the "Connect to this instance" box)
Execute this command: sudo nano /etc/postgresql/9.6/main/postgresql.conf
Remove the # from max_parallel_workers_per_gather and max_worker_processes
Change max_parallel_workers_per_gather to 8 (only 4 cores, so I believe 3 is the most that will show benefits, but I don't believe any harm is done using 8; see the excerpt after these steps)
Ctrl-X, then Y
Click the restart button on the DB Overview page
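After those steps, the relevant lines in postgresql.conf looked like this (max_worker_processes left at its shipped default of 8):
max_worker_processes = 8                  # uncommented, default value kept
max_parallel_workers_per_gather = 8       # uncommented and raised from the 9.6 default of 0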
Having repeated those steps this morning, I see the changes have been undone in the config file. I'm guessing this is due to the method I'm using to connect to the DB, i.e. using Cloud Shell...
And I just noticed that using SET max_parallel_workers_per_gather = 8; works and I get workers being used in the explain plan. So my question then is: how do I change it in the config for ALL sessions, as opposed to on a per-session basis?
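The edits made from Cloud Shell probably don't stick because that shell is a separate VM rather than the managed database host; on Cloud SQL, server-wide settings are normally set as database flags instead. A sketch, assuming these flags are in Cloud SQL's supported list (the instance name is a placeholder):
# applies the settings to all sessions; patching may restart the instance
gcloud sql instances patch my-pg96-instance \
    --database-flags=max_parallel_workers_per_gather=4,max_worker_processes=8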
Scenario
Multiple application servers host web services written in Java, running in SpringSource dm Server. To implement a new requirement, they will need to query a read-only PostgreSQL database.
Issue
To support redundancy, at least two PostgreSQL instances will be running. Access to PostgreSQL must be load balanced and must auto-fail over to currently running instances if an instance should go down. Auto-discovery of newly running instances is desirable but not required.
Research
I have reviewed the official PostgreSQL documentation on this issue. However, that focuses on the more general case of read/write access to the database. Top Google results tend to lead to older newsgroup messages or dead projects such as Sequoia or DB Balancer, as well as one active project, PG Pool II.
Question
What are your real-world experiences with PG Pool II? What other simple and reliable alternatives are available?
PostgreSQL's wiki also lists clustering solutions, and the page on Replication, Clustering, and Connection Pooling has a table showing which solutions are suitable for load balancing.
I'm looking forward to PostgreSQL 9.0's combination of Hot Standby and Streaming Replication.
Have you looked at SQL Relay?
The standard solution for something like this is to look at Slony, Londiste or Bucardo. They all provide async replication to many slaves, where the slaves are read-only.
You then implement the load balancing independently of this, on the TCP layer with something like HAProxy. Such a solution will be able to do failover of the read connections (though you'll still lose transaction visibility on a failover and have to start a new transaction on the new slave, but that's fine for most people).
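As an illustration, a minimal HAProxy TCP front end for two read-only slaves could look like the sketch below (addresses, ports and health checking are placeholders to adapt):
# haproxy.cfg sketch: round-robin TCP balancing across read-only replicas
listen postgres_readonly
    bind *:5433
    mode tcp
    balance roundrobin
    option tcp-check
    server replica1 192.168.0.11:5432 check
    server replica2 192.168.0.12:5432 check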
Then all you have left is failover of the master role. There are supported ways of doing it in all these systems. None of them are automatic by default (because automatic failover of a database master role is really dangerous: consider the situation you are in once you have a split brain), but they can be automated easily if the requirement calls for this for the master as well.