I'm testing some work on a Google Cloud SQL instance of Postgres 9.6 and want to see how much (if at all) enabling parallel queries improves performance. I have followed the process here:
https://blog.2ndquadrant.com/postgresql96-parallel-sequential-scan/
But the explain plan doesn't indicate that it's using parallel workers. To validate that I did it correctly, I installed Postgres on my local machine, made the same changes, and it all worked: the explain plan showed workers being used.
Does anyone know of specific reasons, or extra steps needed in Google Cloud SQL, to get this working?
Thanks
Matt
[EDIT]
The steps I took to change the setting on GCP were:
from the DB Overview page (the one with the CPU Utilisation graph)
Click on Connect using Cloud Shell (underneath the connect to this instance box)
execute this command: sudo nano /etc/postgresql/9.6/main/postgresql.conf
remove the # from max_parallel_workers_per_gather and max_worker_processes
change max_parallel_workers_per_gather to 8 (only 4 cores so I believe that 3 is the max to show benefits, but I don't believe any harm is done using 8)
ctrl-x then y
click the restart button on the DB overview page
Having repeated those steps this morning, I see the changes have been undone in the config file. I'm guessing this is due to the method I'm using to connect to the DB, i.e. using Cloud Shell...
And I just noticed that using SET max_parallel_workers_per_gather = 8; works and I get the workers being used in the explain plan. So my question then is: how do I change it in the config for ALL sessions, as opposed to on a per-session basis?
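For anyone hitting the same thing: Cloud SQL manages postgresql.conf itself, so hand edits don't persist (and the Cloud Shell VM is not the database host, which would also explain why the edits had no effect). The persistent, all-sessions equivalent is to set the value as a database flag, e.g. via gcloud. A sketch, assuming the flags are on Cloud SQL's supported-flags list; the instance name is a placeholder, patching flags replaces any flags set earlier, and it may restart the instance:

    gcloud sql instances patch my-instance \
      --database-flags=max_parallel_workers_per_gather=4,max_worker_processes=8

    # verify from any session afterwards
    psql -c "SHOW max_parallel_workers_per_gather;"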
Background
I have a PgPool-II cluster (ver 4.1.4) running on 3 CentOS 8 machines (virtual): SQL1, SQL2 and SQL3 (each on different hardware). PostgreSQL 12 runs on SQL1 and SQL2 (currently SQL1 is the master and SQL2 is a streaming-replication standby).
In the db cluster there are currently 4 databases for 2 different software environments: one customer (cust) environment with quite a lot of traffic and one educational (edu) environment with basically no user activity at all.
Each environment has one database of its own, and the two environments also share two databases (read-only).
The application is written in .NET Core 3.1 at the moment and uses Npgsql and Entity Framework Core to connect to the pgpool cluster.
Besides "normal" application SQL requests to the databases through Entity Framework, there are also periodic calls with psql in order to run "show pool_nodes;" against pgpool. This cannot be done with Entity Framework, hence psql instead.
Each environment also has one "main api", which handles internet traffic, and one "service api", which runs background tasks. Both use Entity Framework to call the database, and psql is also sometimes invoked from the "service api" as described above.
On top of this, every application also exists as an A and a B system.
So to sum up:
2 environments (cust, edu), each with an A and a B system, each of which has its own "main api" and "service api" => 8 applications (12 if you count that every "service api" also invokes psql every 5 minutes).
The applications run on 2 different machines (A and B); the "main api" and "service api" for a given environment and system run on the same machine.
Each Entity Framework application can also make multiple simultaneous requests to the database, depending on the user activity against the API.
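For reference, the periodic "show pool_nodes;" call mentioned above is essentially of this shape (host, user and database are placeholders; 9999 is pgpool's default listen port):

    psql -h sql1 -p 9999 -U postgres -d postgres -c 'show pool_nodes;'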
My question
Every now and then there is an error from the pgpool cluster: Sorry, too many clients already.
It usually happens when connecting with psql, but sometimes it also happens from Entity Framework.
My initial thought was that the databases had too many clients connected, but running pg_stat_activity just seconds after the error shows far fewer connections (around 50) than the 150 max_connections in the PostgreSQL config. I could not find any errors at all in the PostgreSQL logs in the "log" folder of the data directory.
But in the pgpool.log file there is an error entry:
Oct 30 16:34:19 sql1 pgpool[4062984]: [109497-1] 2020-10-30 16:34:19: pid 4062984: ERROR: Sorry, too many clients already
pgpool has num_init_children = 32 and max_pool = 4 so I do not really see where the problem might be coming from.
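For reference, this is how those two settings interact (a rough sketch of the arithmetic; the exact rejection behaviour also depends on reserved_connections):

    # pgpool.conf (excerpt)
    num_init_children = 32   # pgpool child processes; each serves one client at a time,
                             # so at most 32 concurrent client connections are accepted
    max_pool = 4             # cached backend connection pools per child
    # worst-case backend connections: 32 * 4 = 128, which stays below max_connections = 150
    # clients beyond num_init_children normally queue; with reserved_connections > 0 they are
    # rejected with "Sorry, too many clients already"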
Some files that might be needed for more info:
pg_stat_activity (Taken 11s after the error)
pgpool.log
pgpool.conf
postgresql.conf
This problem might stem from a bug which, according to one of the pgpool developers, will have its fix included in the 4.1.5 update on November 19, 2020.
The bug makes the counter for existing connections fail to count down if a query is cancelled.
Simply upgrade pgpool from 4.1.2 to 4.1.5 (newer versions of pgpool are available as well).
I had the same problem with pgpool 4.1.2: "Sorry, too many clients already" was occurring almost every hour.
The problem was fixed after I upgraded pgpool from 4.1.2 to 4.1.5. We never experienced the error afterwards.
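For anyone checking their own setup, a quick way to confirm the running version before and after the upgrade (the package and service names are assumptions based on the community yum repository; adjust for how pgpool was installed):

    pgpool --version

    # on CentOS 8, roughly:
    sudo dnf upgrade pgpool-II-pg12
    sudo systemctl restart pgpool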
I am facing this issue regularly; it happens 1-3 times a month, mostly on weekends.
To explain: CPU utilization has been exceeding 100% for the last 32 hours.
EC2 Instance is t3.medium
Postgres version is 10.6
OS : Amazon Linux 2
I have tried to gather all the information I could using the commands provided in this reference: https://severalnines.com/blog/why-postgresql-running-slow-tips-tricks-get-source
But I didn't find any inconsistency or leak in my database, although while checking which process was consuming all the CPU I found that the following command is the culprit, running for more than 32 hours:
/var/lib/postgresql/10/main/postgresql -u pg_linux_copy -B
At the moment this command is running as 3 separate processes, which have been running for 32 hours, 16 hours and 16 hours respectively.
Searching for this didn't return a single result on Google, which is heartbreaking.
If I kill the process, everything turns back to normal.
What is the issue, and what can I do to prevent this from happening again in the future?
I was recently contacted by the AWS EC2 Abuse team regarding my instance being involved in an intrusion attack on another server.
To my surprise, I found out that because I had used a very weak password ("root") for the default postgres account and also had the Postgres port open to the public, the attacker silently gained access to the instance and used it to try to gain access to another instance.
I am still not sure how they were able to run SSH commands after gaining access to the master database account.
To summarise, one reason for unusual database spikes on a server could be someone attacking your system.
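A rough remediation sketch for this situation, assuming the rogue processes were started through the compromised postgres account (the PID is a placeholder, and the role/password commands are only illustrative):

    # find the processes hogging the CPU and kill them
    ps aux --sort=-%cpu | head -n 10
    sudo kill -9 <pid>

    # look for roles or activity the attacker may have left behind, and rotate the password
    sudo -u postgres psql -c "SELECT rolname FROM pg_roles;"
    sudo -u postgres psql -c "ALTER ROLE postgres WITH PASSWORD 'use-a-long-random-password';"

    # finally, close port 5432 to the internet in the EC2 security group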
I am looking for a free solution to monitor some key stats for my MongoDB cluster:
1) Number of reads/writes happening
2) Replication lag
3) Memory status
4) Time taken by reads/writes
etc.
After spending a couple of days on this, I know the commands I can use to get these stats.
Also there are a lot of hosted options which directly solve the above problem.
e.g. the new free monitoring: https://docs.mongodb.com/manual/administration/free-monitoring/
Since I am not using the Enterprise edition, I am unable to find any good locally hosted monitoring tool. I can't share the data over the network, so I need suggestions on how to set up local monitoring for the same.
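For what it's worth, the stock tools already cover most of that list locally, without sending data anywhere. A sketch (host/port are placeholders, and the exact shell helpers vary a little by MongoDB version):

    # reads/writes per second, memory and connections, sampled every 5 seconds
    mongostat --host localhost:27017 5

    # time spent in reads/writes per collection
    mongotop --host localhost:27017 5

    # replication state (compare member optimes for lag) and server counters
    mongo --host localhost:27017 --eval 'printjson(rs.status())'
    mongo --host localhost:27017 --eval 'printjson(db.serverStatus().opcounters)'
    mongo --host localhost:27017 --eval 'printjson(db.serverStatus().mem)'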
We are running load test against an application that hits a Postgres database.
During the test, we suddenly get an increase in error rate.
After analysing the platform and application behaviour, we notice that:
CPU of Postgres RDS is 100%
Freeable memory drops on this same server
And in the postgres logs, we see:
2018-08-21 08:19:48 UTC::#:[XXXXX]:LOG: server process (PID XXXX) was terminated by signal 9: Killed
After investigating and reading the documentation, it appears one possibility is that the Linux OOM killer killed the process.
But since we're on RDS, we cannot access the system logs (/var/log/messages) to confirm.
So can somebody:
confirm that the OOM killer really runs on AWS RDS for Postgres
give us a way to check this?
give us a way to compute the max memory used by Postgres based on the number of connections?
I didn't find the answer here:
http://postgresql.freeideas.cz/server-process-was-terminated-by-signal-9-killed/
https://www.postgresql.org/message-id/CAOR%3Dd%3D25iOzXpZFY%3DSjL%3DWD0noBL2Fio9LwpvO2%3DSTnjTW%3DMqQ%40mail.gmail.com
https://www.postgresql.org/message-id/04e301d1fee9%24537ab200%24fa701600%24%40JetBrains.com
AWS maintains a page with best practices for their RDS service: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html
In terms of memory allocation, that's the recommendation:
An Amazon RDS performance best practice is to allocate enough RAM so
that your working set resides almost completely in memory. To tell if
your working set is almost all in memory, check the ReadIOPS metric
(using Amazon CloudWatch) while the DB instance is under load. The
value of ReadIOPS should be small and stable. If scaling up the DB
instance class—to a class with more RAM—results in a dramatic drop in
ReadIOPS, your working set was not almost completely in memory.
Continue to scale up until ReadIOPS no longer drops dramatically after
a scaling operation, or ReadIOPS is reduced to a very small amount.
For information on monitoring a DB instance's metrics, see Viewing DB Instance Metrics.
Also, that's their recommendation to troubleshoot possible OS issues:
Amazon RDS provides metrics in real time for the operating system (OS)
that your DB instance runs on. You can view the metrics for your DB
instance using the console, or consume the Enhanced Monitoring JSON
output from Amazon CloudWatch Logs in a monitoring system of your
choice. For more information about Enhanced Monitoring, see Enhanced
Monitoring
There's a lot of good recommendations there, including query tuning.
Note that, as a last resort, you could switch to Aurora, which is compatible with PostgreSQL:
Aurora features a distributed, fault-tolerant, self-healing storage
system that auto-scales up to 64TB per database instance. Aurora
delivers high performance and availability with up to 15 low-latency
read replicas, point-in-time recovery, continuous backup to Amazon S3,
and replication across three Availability Zones.
EDIT: talking specifically about your issue w/ PostgreSQL, check this Stack Exchange thread -- they had a long connection with auto commit set to false.
We had a long connection with auto commit set to false:
connection.setAutoCommit(false)
During that time we were doing a lot
of small queries and a few queries with a cursor:
statement.setFetchSize(SOME_FETCH_SIZE)
In JDBC you create a connection object, and from that connection you
create statements. When you execute the statements you get a result
set.
Now, every one of these objects needs to be closed, but if you close
the statement, the result set is closed, and if you close the connection
all the statements are closed and their result sets.
We were used to short-lived queries with connections of their own, so
we never closed statements, assuming the connection would take care of
things once it was closed.
The problem was now with this long transaction (~24 hours) which never
closed the connection. The statements were never closed. Apparently,
the statement object holds resources both on the server that runs the
code and on the PostgreSQL database.
My best guess to what resources are left in the DB is the things
related to the cursor. The statements that used the cursor were never
closed, so the result set they returned never closed as well. This
meant the database didn't free the relevant cursor resources in the
DB, and since it was over a huge table it took a lot of RAM.
Hope it helps!
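To make the quoted advice concrete, here is a minimal sketch of the pattern that avoids the leak, using try-with-resources so the statement and result set are always closed. The table name, fetch size and DataSource are hypothetical, not from the quoted code:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    class CursorExample {
        // Streams rows with a cursor-style fetch and guarantees everything is closed.
        static void streamIds(DataSource dataSource) throws SQLException {
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement stmt = conn.prepareStatement("SELECT id FROM big_table")) {
                conn.setAutoCommit(false);   // needed for fetchSize/cursors with the PostgreSQL JDBC driver
                stmt.setFetchSize(1000);     // fetch in batches instead of loading everything
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("id"));  // placeholder for application logic
                    }
                }
                conn.commit();
            }   // connection, statement and result set are all closed here,
                // so the database can free the cursor's resources
        }
    }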
TL;DR: If you need PostgreSQL on AWS and you need rock-solid stability, run PostgreSQL on EC2 (for now) and do some kernel tuning for overcommitting.
I'll try to be concise, but you're not the only one who has seen this and it is a known (internal to Amazon) issue with RDS and Aurora PostgreSQL.
OOM Killer on RDS/Aurora
The OOM killer does run on RDS and Aurora instances because they are backed by Linux VMs and the OOM killer is an integral part of the kernel.
Root Cause
The root cause is that the default Linux kernel configuration assumes that you have virtual memory (swap file or partition), but EC2 instances (and the VMs that back RDS and Aurora) do not have virtual memory by default. There is a single partition and no swap file is defined. When linux thinks it has virtual memory, it uses a strategy called "overcommitting" which means that it allows processes to request and be granted a larger amount of memory than the amount of ram the system actually has. Two tunable parameters govern this behavior:
vm.overcommit_memory - governs whether the kernel allows overcommitting (0=yes=default)
vm.overcommit_ratio - what percentage of physical RAM (on top of all swap) the kernel will commit to processes. If you have 8GB of RAM and 8GB of swap, and your vm.overcommit_ratio = 75, the kernel will grant up to 14GB of memory (all 8GB of swap plus 75% of RAM) to processes.
We set up an EC2 instance (where we could tune these parameters) and the following settings completely stopped PostgreSQL backends from getting killed:
vm.overcommit_memory = 2
vm.overcommit_ratio = 75
vm.overcommit_memory = 2 tells Linux not to overcommit (to work within the constraints of actual system memory), and vm.overcommit_ratio = 75 tells Linux not to grant requests for more than 75% of physical RAM (since there is no swap, user processes can only get up to 75% of memory).
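For reference, a sketch of applying those two settings on an EC2 host you control (the sysctl.d file name is arbitrary):

    sudo sysctl -w vm.overcommit_memory=2
    sudo sysctl -w vm.overcommit_ratio=75

    # persist across reboots
    echo 'vm.overcommit_memory = 2' | sudo tee -a /etc/sysctl.d/99-pg-overcommit.conf
    echo 'vm.overcommit_ratio = 75' | sudo tee -a /etc/sysctl.d/99-pg-overcommit.conf
    sudo sysctl --system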
We have an open case with AWS and they have committed to coming up with a long-term fix (using kernel tuning params or cgroups, etc) but we don't have an ETA yet. If you are having this problem, I encourage you to open a case with AWS and reference case #5881116231 so they are aware that you are impacted by this issue, too.
In short, if you need stability in the near term, use PostgreSQL on EC2. If you must use RDS or Aurora PostgreSQL, you will need to oversize your instance (at additional cost to you) and hope for the best as oversizing doesn't guarantee you won't still have the problem.
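If you do go the oversizing route, a rough way to estimate how much memory the backends can demand is to look at the relevant settings; the formula in the comment is a back-of-envelope heuristic, not an exact bound:

    psql -c "SELECT name, setting, unit FROM pg_settings
             WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                            'max_connections', 'autovacuum_max_workers');"

    # rough worst case ~ shared_buffers
    #                  + max_connections * work_mem * (sort/hash nodes per query)
    #                  + autovacuum_max_workers * maintenance_work_mem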
I need a tool to manage a cluster of MongoDB servers. With an increasing number of machines, it is hard to maintain each one without a tool.
More details:
The database grows around 50 MB per day, so approximately 1.5 GB per month. MongoDB is great for this because just adding a machine to the cluster solves the size problem. The problem is that this change requires logging into each host and making the configuration changes manually. I'd like to save the team's time with a tool that allows remote execution, for example to run and save scripts or similar.
You can use Juju to create a MongoDB cluster:
https://github.com/charms/mongodb
http://www.youtube.com/playlist?list=PLyZVZdGTRf8g-5E7ppGGpxrraStyi493V
http://www.jorgecastro.org/2014/03/17/introducing-juju-bundles/
You need a provisioning tool like Vagrant, or an SSH wrapper like Fabric.
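Even without a dedicated tool, a plain SSH loop already covers the "run the same script on every node" part (hostnames and the command are placeholders):

    for host in mongo1 mongo2 mongo3; do
        ssh "$host" 'hostname; df -h /; pgrep -l mongod'
    done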
MongoDB Cloud Manager has a Public API that you can use for cluster deployment automation.
Here's the link to the (very well-written) official docs.