Google Cloud SQL PG11 : could not resize shared memory segment - postgresql

I recently upgraded a Postgres 9.6 instance to 11.1 on Google Cloud SQL. Since then I've begun to notice a large number of occurrences of the following error across multiple queries:
org.postgresql.util.PSQLException: ERROR: could not resize shared
memory segment "/PostgreSQL.78044234" to 2097152 bytes: No space left
on device
From what I've read, this is probably due to changes that came in PG10, and the typical solution involves increasing the instance's shared memory. To my knowledge this isn't possible on Google Cloud SQL though. I've also tried adjusting work_mem with no positive effect.
This may not matter, but for completeness, the instance is configured with 30 gigs of RAM, 120 gigs of SSD disk space, and 8 CPUs. I'd assume that Google would provide an appropriate shared memory setting for those specs, but perhaps not? Any ideas?
UPDATE
Setting the database flag random_page_cost to 1 appears to have reduced the impact of the issue. This isn't a full solution, though, so I would still love to get a proper fix if one is out there.
Credit goes to this blog post for the idea.
UPDATE 2
The original issue report was closed and a new internal issue, which isn't viewable by the public, was created. According to a GCP Account Manager's email reply, however, a fix was rolled out by Google on 8/11/2019.

This worked for me. I think Google needs to change a flag on how they're starting the Postgres container on their end, which we can't influence from inside Postgres.
https://www.postgresql.org/message-id/CAEepm%3D2wXSfmS601nUVCftJKRPF%3DPRX%2BDYZxMeT8M2WwLSanVQ%40mail.gmail.com
Bingo. Somehow your container tech is limiting shared memory. That
error is working as designed. You could figure out how to fix the
mount options, or you could disable parallelism with
max_parallel_workers_per_gather = 0.
show max_parallel_workers_per_gather;
-- 2
-- Run your query
-- Query fails
alter user ${MY_PROD_USER} set max_parallel_workers_per_gather=0;
-- Run query again -- query should work
alter user ${MY_PROD_USER} set max_parallel_workers_per_gather=2;
-- Run query again -- fails

You may consider increasing the tier of the instance, which affects the memory, vCPU cores, and other resources available to your Cloud SQL instance. Check the available machine types.
In Google Cloud SQL for PostgreSQL it is also possible to change database flags that influence memory consumption:
max_connections: some memory resources can be allocated per-client, so the maximum number of clients suggests the maximum possible memory use
shared_buffers: determines how much memory is dedicated to PostgreSQL to use for caching data
autovacuum - should be on.
I recommend lowering these limits to reduce memory consumption; a quick way to inspect the current values is sketched below.
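A minimal sketch of how to check the current values before touching any flags. These are standard PostgreSQL settings queried through pg_settings; nothing here is Cloud SQL specific:
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('max_connections', 'shared_buffers', 'work_mem',
               'autovacuum', 'max_parallel_workers_per_gather');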

Related

How to determine how much "slack" is in a Postgres database?

I've got a postgres database which I recently vacuumed. I understand that the process marks space as available for future use, but for the most part does not return it to the OS.
I need to track how close I am to using up that available "slack space" so I can ensure the entire database does not start to grow again.
Is there a way to see how much empty space the database has inside it?
I'd prefer to just do a VACUUM FULL and monitor disk consumption, but I can't lock the table for a prolonged period, nor do I have the disk space.
Running version 13 on headless Ubuntu if that's important.
Just like internal free space is not given back to the OS, it also isn't shared between tables or other relations (like indexes). So having free space in one table isn't going to help if a different table is the one growing. You can use pg_freespacemap to get a fast approximate answer for each table, or pgstattuple for more detailed data.
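A minimal sketch of both approaches, assuming the contrib extensions can be installed and using a hypothetical table named my_table:
-- Both extensions ship with PostgreSQL's contrib package
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Fast, approximate: free space recorded in the table's free space map
SELECT count(*) AS pages, pg_size_pretty(sum(avail)) AS approx_free
FROM pg_freespace('my_table');

-- Slower, detailed: scans the table and reports dead tuples and free space
SELECT * FROM pgstattuple('my_table');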

AWS RDS I/O usage very different from Postgres blocks read

I created a test Postgres database in AWS RDS. Created a 100 million row, 2 column table. Ran select * on that table. Postgres reports "Buffers: shared hit=24722 read=521226" but AWS reports IOPS in the hundreds. Why this huge discrepancy? Broadly, I'm trying to figure out how to estimate the number of AWS I/O operations a query might cost.
PostgreSQL does not have insight into what the kernel/FS get up to. If PostgreSQL issues a system call to read the data, then it reports that buffer as "read". If it was actually served out of the kernel's filesystem cache, rather than truly from disk, PostgreSQL has no way of knowing that (although you can make some reasonable statistical guesses if track_io_timing is on), while AWS's IO monitoring tool would know.
If you set shared_buffers to a large fraction of memory, then there would be little room left for a filesystem cache, so most buffers reported as read should truly have been read from disk. This might not be a good way to run the system, but it might provide some clarity to your EXPLAIN plans. I've also heard rumors that Amazon Aurora reimplemented the storage system so that it uses direct I/O, or something similar, and so doesn't use the filesystem cache at all.
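A rough sketch of how to see this distinction in a plan, assuming a hypothetical table named big_table and that you can enable track_io_timing for your session:
-- With track_io_timing on, EXPLAIN reports how long the "read" blocks took;
-- near-zero I/O time suggests they came from the OS filesystem cache, not disk
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table;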

Is there a limit for AWS RDS max_connection settings for my instance?

I have a free-tier RDS PostgreSQL database, db.t2.micro (1 core, 1 GiB). My architecture caused the RDS instance to collapse because it reached the maximum number of concurrent connections.
When I queried select * from pg_settings where name='max_connections', the result was 87.
I found this formula for the max_connections capacity of each instance, based on its memory:
LEAST({DBInstanceClassMemory/9531392},5000)
For my instance the number was 104, so I modified the parameter group to this value, and my RDS instance still collapsed.
I made a last attempt, updating max_connections to 500, believing that it would not work because the limit is 104. But to my surprise, the database worked and could handle all the concurrent connections (above 104):
Obviously, I'm missing something.
Is there really a limit on max_connections for my instance?
If I change the max_connections setting, does the pricing for my instance change?
Also, I'm curious what the horizontal line in the graph represents, because it is at the level of my initial max_connections setting, and it was not present before the change.
Thanks!
Based on the information at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Limits.html#RDS_Limits.MaxConnections, you should've hit the limit there.
That said, I guess one of two possible things is happening here:
AWS do not enforce the limits. I guess they could be suggesting the limits based on instance size here.
There is some kind of "burst" allowed. Similar to IOPS there could be a small "burst" balance that you can scale to temporarily without affecting service.
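To see how close you actually are to the configured limit at runtime, here is a simple sketch using standard PostgreSQL views (nothing RDS-specific):
-- Configured limit vs. connections currently open
SELECT current_setting('max_connections')::int AS max_connections,
       count(*) AS current_connections
FROM pg_stat_activity;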

Google cloud sql incorrect innodb_buffer_pool_size?

I upgraded my Cloud SQL machine from a 'db-f1-micro' 0.6GB RAM machine to a 'db-n1-standard-1' 3.75GB RAM machine last week. Running:
SELECT @@innodb_buffer_pool_size;
The output is:
1375731712
which I believe is 1.38GB. Here's the memory utilization for the primary and replica:
This seems oddly low for this machine type, but from researching (How to set innodb_buffer_pool_size in mysql in google cloud sql?) it doesn't appear that I can alter innodb_buffer_pool_size. Is this somehow set dynamically and slowly increasing over time? It doesn't appear to be near the 75-80% range Google seems to aim for on these.
What is the value of innodb_buffer_pool_chunk_size and innodb_buffer_pool_instances?
innodb_buffer_pool_size must always be a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances, and will be automatically adjusted to be so. The chunk size can only be modified at startup, as explained in the docs page for InnoDB buffer pool size configuration.
For Google CloudSQL in particular, not only the absolute, but also the relative size of the innodb_buffer_pool_size depends on instance type. I work for GCP support, and after some research in our documentation, I can tell that pool size is automatically configured based on an internal formula, which is subject to change. Improvements are being made to make instances more resilient against OOMs, and the buffer pool size has an important role in this.
So it is expected behaviour that with your new instance type, and possibly different innodb_buffer_pool_chunk_size and innodb_buffer_pool_instances, you might see quite different memory usage. Currently, the user does not have control over innodb_buffer_pool_size.
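A quick sketch of checking the relationship described above, using the standard InnoDB system variables (assuming your Cloud SQL user is allowed to read them):
-- The buffer pool size is rounded to a multiple of chunk_size * instances
SELECT @@innodb_buffer_pool_chunk_size AS chunk_size,
       @@innodb_buffer_pool_instances  AS instances,
       @@innodb_buffer_pool_size       AS pool_size;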

Postgres why is swap-usage growing? How to reduce it? - AWS RDS

I have a Postgres DB on AWS RDS and the swap usage is constantly rising.
Why is it rising? I tried rebooting, but it does not go down. AWS writes that high swap usage is "indicative of performance issues".
I am writing data to this DB. CPU and Memory do look healthy:
To be precise, I have a
db.t2.micro instance with ~30/100 GB of data in 5 tables at the moment, on General Purpose SSD, with the default postgresql.conf.
The swap-graph looks as follows:
Swap Usage warning:
Well, it seems that your queries are using more memory than you have available, so you should look at their execution plans and find the largest loads. Those queries exceed the memory available to PostgreSQL. Excessive joining (i.e. a bad database structure that might be better off denormalized where applicable), lots of nested queries, or queries with IN clauses are the typical suspects. I guess Amazon tuned postgresql.conf as well as possible, and those default values are quite good for this tiny machine.
But once again, as long as your swap usage does not exceed your available memory and you are on an SSD, there is not that much harm in it.
Check
select * from pg_stat_activity;
and see which processes are taking long and how many are sleeping, then try to change your RDS DB parameters according to your needs.
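A minimal sketch of what to look for in pg_stat_activity; the 5-minute threshold is an arbitrary example, not something from the original answer:
-- Long-running active queries
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;

-- How many backends are idle ("sleeping")
SELECT count(*) FROM pg_stat_activity WHERE state = 'idle';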
Obviously you ran out of memory. db.t2.micro has only 1 GB of RAM. You should look at htop output to see which processes take the most memory and try to optimize memory usage. There is also a nice utility called pgtop (http://ptop.projects.pgfoundry.org/) which shows current queries, the number of rows read, etc. You can use it to view your Postgres state in real time. By the way, if you cannot install pgtop you can get the same information from Postgres's internal tools - check out the documentation of the Postgres stats collector: https://www.postgresql.org/docs/9.6/static/monitoring-stats.html
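If pgtop isn't an option, here is a rough sketch of pulling similar information from the statistics collector views linked above:
-- Per-database activity from the statistics collector
SELECT datname, numbackends, xact_commit, blks_read, blks_hit,
       tup_returned, tup_fetched
FROM pg_stat_database
ORDER BY blks_read DESC;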
Actually it is difficult to say exactly what the problem is, but db.t2.micro is a very limited instance. You should consider a bigger instance, especially if you are using Postgres in production.