How to reliably memory constrain a postgres database - postgresql

I run Postgres 10.4 on a very small machine with strict memory constraints (e.g. 200MB) on Debian. System swap space must be disabled in my case, but SSD disk space is plentiful (e.g. > 500GB). I am using a waterfall approach to distribute the available memory among the different memory consumers in Postgres, following this logic:
The available memory is 200MB
---
max_connections = 10
max_worker_processes = 2
shared_buffers = 50MB
work_mem = (200MB - shared_buffers) * 0.8 / max_connections
maintenance_work_mem = (200MB - shared_buffers) * 0.1 / max_worker_processes
temp_buffers = (200MB - shared_buffers) * 0.05
wal_buffers = (200MB - shared_buffers) * 0.05
temp_file_limit = -1 (i.e. unlimited)
effective_cache_size = 200MB / 2
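For reference, plugging those numbers in gives roughly the following derived values; a minimal sketch using ALTER SYSTEM (assuming superuser access), with the fractional results rounded down:
ALTER SYSTEM SET work_mem = '12MB';              -- (200MB - 50MB) * 0.8 / 10
ALTER SYSTEM SET maintenance_work_mem = '7MB';   -- (200MB - 50MB) * 0.1 / 2 = 7.5MB, rounded down
ALTER SYSTEM SET temp_buffers = '7MB';           -- (200MB - 50MB) * 0.05 = 7.5MB, rounded down
ALTER SYSTEM SET wal_buffers = '7MB';            -- (200MB - 50MB) * 0.05 = 7.5MB, rounded down
ALTER SYSTEM SET effective_cache_size = '100MB'; -- 200MB / 2
SELECT pg_reload_conf();  -- picks up most of these; wal_buffers only changes after a restart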
It is crucial for me that sessions, or even the postmaster, are never canceled due to memory restrictions, to ensure stable operation of Postgres. In low-memory situations Postgres should work with temp files instead of memory.
I still get out-of-memory errors in some situations (e.g. when I run a large insert into a table).
How do I need to set all parameters to guarantee that Postgres will not try to get more memory than there is available?

You can refer to the PostgreSQL wiki for an in-depth study of the memory configuration of a PostgreSQL server: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
It covers the limits and suggested values for each memory parameter, which can be derived from server attributes such as the number of CPUs, RAM capacity, etc.
Alternatively, use this online tool to try out different configurations and make sure that the server doesn't require more memory than what is available to it: https://pgtune.leopard.in.ua/#/
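Before plugging numbers into either of those, it can help to dump the current memory-related settings for comparison; one way is a simple query against pg_settings:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'temp_buffers', 'wal_buffers', 'effective_cache_size',
               'max_connections', 'temp_file_limit');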

Related

postgres not releasing memory after big query

When I run a query in Postgres, it loads into memory all the data it needs for query execution, but when the query finishes, the memory consumed by Postgres is not released.
I run Postgres with the TimescaleDB extension installed, which already stores 93 million records. The query that fetches all 93 million records takes 13GB of RAM and 21 minutes. After the query, RAM usage drops to 9GB and stays there.
How can I release all this memory?
Here's my postgresql.conf memory section.
shared_buffers = 8GB
huge_pages = try
huge_page_size = 0
temp_buffers = 8MB
max_prepared_transactions = 0
work_mem = 5118kB
hash_mem_multiplier = 1.0
maintenance_work_mem = 2047MB
autovacuum_work_mem = -1
logical_decoding_work_mem = 64MB
max_stack_depth = 2MB
shared_memory_type = mmap
effective_cache_size = 4GB
UPD 15.12.2022
@jjanes Yes, you were right. As a colleague and I found out, memory usage is not exactly what we thought it was from the very beginning.
As we also saw from free -m -h, when a query uses more memory than configured, the remaining data gets loaded into the OS cache/buffers.
When we clear the cache/buffers after a big query, memory usage for this service drops to 8.53GB. This roughly coincides with the value of shared_buffers = 8GB and practically does not decrease. For large queries, Postgres still continues to request large chunks of memory from the cache/buffers.
So it turns out that we need to somehow clear Postgres's own cache, since it does not release it by itself?
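If the goal is to see how much of that resident memory is actually held by shared_buffers, as opposed to reclaimable OS page cache, one option is the pg_buffercache extension; a rough sketch, assuming the extension can be installed:
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
-- count shared buffers that currently hold a page; assumes the default 8 kB block size
SELECT pg_size_pretty(count(*) * 8192) AS shared_buffers_in_use
FROM pg_buffercache
WHERE relfilenode IS NOT NULL;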

how does memory allocation for postgres work?

shared_buffers - In a regular PostgreSQL installation, say I allocate 25% of my memory to shared_buffers; that leaves 75% for the rest, such as the OS, page cache, and work_mem. Is my understanding correct?
If so, since AWS Aurora for PostgreSQL uses 75% of memory for shared_buffers, would that leave just 25% for other things?
Does the memory specified by work_mem get fully allocated to every session, irrespective of whether it does any sorting or hashing operations?
Your first statement is necessarily true:
If 75% of the RAM is used for shared buffers, then only 25% is available for other things like process private memory.
work_mem is the upper limit of memory that one operation (“node”) in an execution plan is ready to use for operations like building a hash or a bitmap, or sorting. That does not mean that every operation allocates that much memory; it is just a limit.
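A quick way to see that spill behaviour (a sketch; the tiny work_mem is only for demonstration):
SET work_mem = '64kB';  -- the minimum value, to force the sort to spill to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM generate_series(1, 100000) AS g ORDER BY g;
-- the plan shows "Sort Method: external merge  Disk: ..." instead of an in-memory quicksort
RESET work_mem;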
Without any claim for absolute reliability, here is my personal rule of thumb:
shared_buffers + max_connections * work_mem should be less than or equal to the available RAM. Then you are unlikely to run out of memory.
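That rule of thumb can be checked directly from pg_settings; a minimal sketch (shared_buffers is reported in 8 kB pages, work_mem in kB):
SELECT pg_size_pretty(
         (SELECT setting::bigint * 8192 FROM pg_settings WHERE name = 'shared_buffers')
       + (SELECT setting::bigint        FROM pg_settings WHERE name = 'max_connections')
       * (SELECT setting::bigint * 1024 FROM pg_settings WHERE name = 'work_mem')
       ) AS worst_case_memory;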

Postgres memory calculation effective_cache_size

I am verifying some of the configuration in my production Postgres instance. Our DB server has 32 GB RAM. From pg_settings, I see that effective_cache_size is set to:
postgres=> select name, setting, unit from pg_settings where name like 'effective_cache_size';
name | setting | unit
----------------------+---------+------
effective_cache_size | 7851762 | 8kB
(1 row)
As per my understanding, this value amounts to 7851762 x 8 kB = 62.8 GB. If my calculation is right, we are basically telling the optimizer that we have 62 GB for this parameter, whereas we only have 32 GB of physical RAM.
Please correct me if I am calculating this parameter wrong; I always get confused when calculating allocations for parameters measured in units of 8 kB.
7851762 times 8 kB is approximately 60 GB.
I would configure the setting to 30 GB if the machine is dedicated to the PostgreSQL database.
This parameter tells PostgreSQL how much memory is available for caching its files. If the value is high, PostgreSQL will estimate nested loop joins with an index scan on the inner side as cheaper, because it assumes that the index will probably be cached.
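To let the server do the unit conversion itself, something like this query (a sketch, assuming the default 8 kB block size) shows the setting in bytes and in human-readable form:
SELECT name,
       setting::bigint * 8192 AS bytes,
       pg_size_pretty(setting::bigint * 8192) AS size
FROM pg_settings
WHERE name = 'effective_cache_size';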

How to increase the connection limit for the Google Cloud SQL Postgres database?

The connection limit for Google Cloud SQL PostgreSQL databases is relatively low. Depending on the plan, it is somewhere between 25 and 500, while the limit for MySQL in Google Cloud SQL is between 250 and 4000, reaching 4000 very quickly.
We currently have a number of trial instances for different customers running on Kubernetes and backed by the same Google Cloud SQL Postgres server. Each instance uses a separate set of databases, roles, and connections (one per service). We've already reached the connection limit for our plan (50), and we're not even close to the memory or CPU limits. Connection pooling doesn't seem to be an option, because the connections use different users.
I'm wondering now why the limit is so low and if there's a way to increase the limit without having to upgrade to a more expensive plan.
It looks like Google just released this as a beta feature.
When creating or editing a database instance, you can add a flag called max_connections and enter a new limit between 14 and 262143 connections.
There is a feature request in the Public Issue Tracker to expose, and hence control, max_connections in PostgreSQL. This comment (reproduced here) explains the reasoning behind the current connection limits:
Per-tier max_connections is now fully rolled out. As shown on
https://cloud.google.com/sql/faq#sizeqps, the limits are now:
Memory size, in GiB | Maximum concurrent connections
--------------------+-------------------------------
0.6 (db-f1-micro) | 25
1.7 (db-g1-small) | 50
3.75 up to 6 | 100
6 up to 7.5 | 150
7.5 up to 15 | 200
15 up to 30 | 250
30 up to 60 | 300
60 up to 120 | 400
120 and above | 500
I understand your frustration about the micro/small instances having fewer than 100
concurrent connections and the lack of control of this flag. We arrived at these values by
taking the available RAM, reducing it by overhead, shared buffers, autovacuum memory and
then dividing the remaining ram by typical per-connection memory and rounding off. This
gives us the number of connections that can be used without the risk of hitting an out-of-memory condition.
The basic premise of a fully managed service with an attached SLA is that we provide safe
hosting. This is what motivates us using a max_connections that is safe against OOM.
Since you have ruled out connection pooling, your remaining option is to use an instance with more memory.
UPDATE:
As mentioned in a comment on that thread, the maximum connection settings have since changed.
Furthermore, the defaults can now be overridden with flags, up to 260K connections.
For the Terraform gang, you can update the parameter using database_flags:
resource "google_sql_database_instance" "main" {
  name             = "main-instance"
  database_version = "POSTGRES_14"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"
    database_flags {
      name  = "max_connections"
      value = 100
    }
  }
}
Note that, at the time of writing, the default max_connections for db-f1-micro is 25; see https://cloud.google.com/sql/docs/postgres/flags#postgres-m
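Once the flag change has taken effect (which may require an instance restart), a quick sanity check from psql:
SHOW max_connections;
-- and how many connections are currently in use:
SELECT count(*) FROM pg_stat_activity;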

Postgres: Checkpoints Are Occurring Too Frequently

We have a powerful Postgres server (64 cores, 384 GB RAM, 16 15k SAS drives, RAID 10), and several times during the day we rebuild several large datasets, which is very write intensive. Apache and Tomcat also run on the same server.
We're getting this warning about 300 times a day while rebuilding these datasets, with long stretches where the warnings average 2 - 5 seconds apart:
2015-01-15 12:32:53 EST [11403]: [10841-1] LOG: checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:32:56 EST [11403]: [10845-1] LOG: checkpoints are occurring too frequently (3 seconds apart)
2015-01-15 12:32:58 EST [11403]: [10849-1] LOG: checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:33:01 EST [11403]: [10853-1] LOG: checkpoints are occurring too frequently (3 seconds apart)
These are the related settings:
checkpoint_completion_target 0.7
checkpoint_segments 64
checkpoint_timeout 5min
checkpoint_warning 30s
wal_block_size 8192
wal_buffers 4MB
wal_keep_segments 5000
wal_level hot_standby
wal_receiver_status_interval 10s
wal_segment_size 16MB
wal_sync_method fdatasync
wal_writer_delay 200ms
work_mem 96MB
shared_buffers 24GB
effective_cache_size 128GB
So that means we're writing 1024 MB worth of WAL files every 2 - 5 seconds, sometimes sustained for 15 - 30 minutes.
1) Do you see any settings we can improve on? Let me know if you need other settings documented.
2) Could we use "SET LOCAL synchronous_commit TO OFF;" at the beginning of these write-intensive transactions to let these WAL writes happen a bit more in the background, having less impact on the rest of the operations?
The data we're rebuilding is stored elsewhere, so on the off chance that the power failed AND the RAID battery backup didn't do its job, we wouldn't be out anything once the dataset gets rebuilt again.
Would "SET LOCAL synchronous_commit TO OFF;" cause any problems if this continues for 15 - 30 minutes? Or cause any problems with our streaming replication, which uses WAL senders?
Thanks!
PS. I'm hoping Samsung starts shipping their SM1715 3.2 TB PCIe enterprise SSD, since I think it would solve our problems nicely.
Your server is generating so much WAL data because wal_level is set to hot_standby. I'm assuming you need this, so the best option to avoid the warnings is to increase your checkpoint_segments. But they are just that, warnings; it's quite common and perfectly normal to see them during bulk updates and data loads. You just happen to be updating frequently.
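For example (a sketch; 256 is an arbitrary illustrative value, on versions before 9.4 that lack ALTER SYSTEM you would edit postgresql.conf instead, and 9.5+ replaced checkpoint_segments with max_wal_size):
ALTER SYSTEM SET checkpoint_segments = 256;  -- allow more WAL between checkpoints
SELECT pg_reload_conf();                     -- checkpoint_segments only needs a reload, not a restart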
Changing synchronous_commit does not change what is written to the WAL, but rather the timing of when the commit returns to allow the OS to buffer those writes.
It may not apply to your schema, but you could potentially save some WAL data by using unlogged tables for your data rebuilds. Your replicas wouldn't have access to those tables, but after the rebuild you would be able to update your logged tables from their unlogged siblings.
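A rough sketch of that pattern, with hypothetical table names dataset and dataset_rebuild:
-- build the expensive intermediate result in an unlogged table (its data is not WAL-logged)
CREATE UNLOGGED TABLE dataset_rebuild (LIKE dataset INCLUDING ALL);
-- ... run the write-intensive rebuild into dataset_rebuild here ...
BEGIN;
SET LOCAL synchronous_commit TO OFF;  -- affects only commit durability, not what is written to WAL
TRUNCATE dataset;
INSERT INTO dataset SELECT * FROM dataset_rebuild;
COMMIT;
DROP TABLE dataset_rebuild;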