Rule of thumb for Max Postgres connections to set? - postgresql

As a rule of thumb, how many max connections should I set my Postgres server to have? For instance, if I have 8 GB of memory, and quad core 3.2 GHZ machine, and the server is dedicated to only Postgres, how many max connections would be safe?

There is no real rule of thumb since it really depends on your load.
If you do lots of tiny queries than you can easily increase the amount of connections.
If you have a few heavy queries, than you will probably increase the work_mem so you'll run out of memory with a lot of connections.
The basic thing is:
don't have more connections than your memory allows.
don't kill and recreate connections if possible (pgbouncer springs to mind)

Related

PostgreSQL use of Linux hugepages

I am having some trouble understanding linux hugepages within PostgreSQL. From what I have googled:
Configured huge pages will be allocated and not swapped out of RAM.
Huge pages may improve performance due to a lower number of pages to be managed by the kernel.
Individual huge pages are contiguous blocks of memory.
My Postgres cluster conf:
shared_buffers set to 4GB
max_connections 30
all other conf is set as default
Huge pages is set but when starting defaulted to not using them. After setting huge pages on in PostgreSQL and in Linux to 4096/2 with vm.nr_hugepages and sysctl -p PostgreSQL would't start because it couldn't allocate memory enough. After trying several vm.nr_hugepages values the lowest it seamed to work was 3500.
My questions are:
How does PostgreSQL calculate the amount of memory it will need in advance?
Read that a memlimit should be set to PostgreSQL after setting hugepages ¿If it is only going to use huge pages wouldn't this be a limit already? Assuming no other process uses them.
Related to previous question: what would happen if it ran out of hugepages?
Will hugepages be used by all PostgreSQL processes, a sort operation for instance. The fact of having to allocate 3500 pages in order to have PostgreSQL starting ok is a bit confusing because I thought they would be mainly used by shared_buffers.
Thanks in advance!
Edit:
The system I am testing on is an Intel 8 cores with 32GB RAM. The main purpose of the setup is ETLs, receive files which will be loaded with COPY (several GBs per file), transform with some more or less complex SQL, persist results in several tables with a design similar to a DWH (star schemas), 40TB of total storage for PostgreSQL (4x10TB drives) and 512GB SSD for Ubuntu server 18.04. Some of the transformations that will be done will require tablescans or scans of a big part of the tables, in DB2 I have the option of using a block-based buffer pool (https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c0009651.html). I thought I could achieve something "similar" with hugepages in PostgreSQL ¿Would this be possible?

Should I decrease max_connections in PostgreSQL when using PgBouncer?

Why shall I decrease max_connections in PostgreSQL when I use PgBouncer? Will there be a difference if I set max_connections in PostgreSQL's config equal 100 or 1000 when I use PgBouncer to limit connections below either?
Each possible connection reserves some resources in shared memory, and some backend private memory is also scaled to it. Reserving this memory when it will never be used is a waste of resources. This was more of an issue in the past, when shared memory resources were much more fiddly than they are on modern OS.
Also, there is some code which needs to iterate over all of those resources, possibly while holding locks, so it takes more time to do that if there is more data to iterate over. The exact nature of the iteration and locks have changed from version to version, as code was optimized to make it more scalable to large number of CPUs.
Neither of these effects is likely to be huge when the most of the possible connections are not actually used. Maybe the most important reason to lower max_connections is to get instant diagnosis in case pgbouncer has been misconfigured and is not doing its job correctly.

PostgreSQL + Apache connection pooling: MaxIdle and MaxActive vs RAM usage

We are running a PostgreSQL database in the Google Cloud. On top of this we have an app. In the app we can configure runtime connection pooling settings for the database.
Our Google SQL server has 30GB ram so the default max_connections is 500 as I read in the Google docs. In our app we have set the following connection pooling (Apache commons pooling) settings:
MaxActive: 200
MaxIdle: 200
MinIdle: 50
We are experiencing issues with these settings. First of all, we often run into the MaxActive limit. I can see a flatline in the connections graph at 200 connections a couple times a day. At those moments our logs are flooded with SQL connection errors.
The server is using around 28GB ram on peak moments (with 200 active connections). So we are close to the RAM limit as well.
Instead of blindly increasing the RAM and MaxActive I was hoping to get some insights on what would be a best practice in our situation. I see 2 solutions to our problem:
Increase RAM, increase MaxActive and increase MaxIdle (not very cost efficient)
Increase MaxActive, keep MaxIdle the same (or even lower) and keep MinIdle the same (or even lower)
Option 1 would be more cost expensive so I am wondering about option 2. Because I lower the Idle connections I would take up less RAM. However, will this have a noticeable impact on performance? I was thought to keep MaxIdle as close to MaxActive as possible, to ensure least overhead in creating new connections.
I have done quite some research, but I came to the conclusion that tuning these settings are very situation specific and there is not really a general best practice on these settings. I could not find a definitive answer to the performance impact of option 1 vs option 2.
Ps. we are also experiencing some slow queries in our app, so of course we can optimize things or change the design of our app to decrease the amount of concurrent connections.
I really hope someone can give some helpful insights / advice / best practices. Thanks a lot in advance!

Increase data insert speed of PostgreSQL

I am stuck in a problem that PostgreSQL data writes are very slow.
I developed my application in Java (using JDBC) to insert data into a PostgreSQL DB. It works well on our remote development server. However, after I deploy it to the production server, it causes a problem.
The insert speed of PostgreSQL on the production server is only ~150 records/s for 200000K records, while it is ~1000 records/s for the same data set on the development server.
Firstly, I tried to change the configuration in postgresql.conf as follows:
effective_cache_size = 4GB
max_wal_size = 2GB
work_mem = 128MB
shared buffers = 512MB
After I changed the configuration and restarted, it only affects the query speed, while the insert speed does not change (~150 records/s).
I have checked my server memory info, there is a lot of free memory ~4GB. The inserter only uses 0.5% of 8GB (~40MB).
So my questions are:
Is this a problem of a storage disk, such as SSD and HDD or virtual
and physical etc.? Why is the insert speed still very slow, although I have changed the configuration? Is there any way
for increasing the insert speed?
Note: the problem does not relate to the insert query structure.
I have used the same query in the same condition elsewhere (I set up an
environment in 2 servers in the same way). I do not know why the
DEVELOPMENT server (4GB) works better than the PRODUCTION server
(8GB).
The only one of your parameters that has an influence on INSERT performance is max_wal_size. High values prevent frequent checkpoints.
Use iostat -x 1 on the database server to see how busy your disks are. If they are quite busy, you are probably I/O bottlenecked. Maybe the I/O subsystem on your test server is better?
If you are running the INSERTs in many small transactions, you may be bottlenecked by fsync to the WAL. The symptom is a busy disk with not much I/O being performed.
In that case batch the INSERTs in larger transactions. The difference you observe could then be due to different configuration: Maybe you set synchronous_commit or (horribile dictu!) fsync to off on the test server.

MongoDB Sharding On One Machine

Does it make sense to implement mongodb sharding with say 100 shards on one beefier machine just to achieve higher concurrenct write into the database as I am told, there is a global lock for each monogod.exe process? Assuming that is possible, will that aproach give me higher write concurrency?
Running multiple mongods on a machine is not a good idea. Every one of the mongod processes will try to use all the available memory, forcing other mongod's memory mapped pages out of memory. This will create an enormous amount of swapping in most cases.
The global database lock is generally not a problem as is demonstrated in: http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
Only use one mongod per machine (but it's fine to add a mongos or config server as well), unless it's for some simple testing.
cheers,
Derick
I totally disagree. We run 8 shards per box in our setup. It consists of two head nodes each with two other machines for replication. 6 boxes total. These are beefy boxes with about 120GB of RAM, 32 Cores and 2TB each. By having 8 shards per box (we could go higher by the way this is set at 8 for historic purposes) we make sure we utilize the CPU efficiently. The RAM sorts itself out. You do have to watch the metrics and make sure you aren't paging too much but with SSD drives (which we have) if you do spill onto the disk drives it isn't too bad.
The only use case where I found running several mongod on the same server was to increase replication speed on high latency connection.
As highlighted by Derick, the write lock is not really your issue when running mongodb.
To answer your question : yes you can demonstrate mongo scaling with several instance per machine (4 instances per server sems to be enough) if your test does not involve too much data (otherwise page out will dramatically decrase your performance, I have already tested it)
However, instances will still compete for resources. All you will manage to do is to shift the database lock issue to a resource lock issue.
Yes, you can and in fact that's what we do for 50+ mil write-heavy database. Just make sure all your indexes per mongod fit into the RAM and there's room for growth and maintenance.
However, there's a small trade-off: Depending on what your target QPS is, this kind of sharing requires machines with more horsepower, whereas sharding on a single machine will not and in most cases you can do away with commodity, cheaper hardware.
Whatever the case is, do the series of performance tests (ageinst IO, Network, PQS etc) and establish your baseline carefully and consider SSD drives for storage and this may sound biased, but Linux XFS storage is also something to consider.