PostgreSQL 9.1 out of memory during CREATE TABLE ... AS SELECT

I am on Ubuntu Linux 11 with PostgreSQL 9.1. When I run CREATE TABLE ... AS SELECT over a dblink against a table of around 2 million rows, I get
ERROR: out of memory
DETAIL: Failed on request of size 432.
So I am taking the contents of an entire table from one database and inserting them into (or rather, creating a table in) another database on the same machine. I am using PostgreSQL's default configuration values, though I experimented with values from pgtune as well, to no avail. During the insert I do see memory usage going up, but the error occurs before my machine's limit is reached. ulimit -a says
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30865
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 30865
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
If I do CREATE TABLE ... AS SELECT inside the same database, it works without problems. Any ideas?
Edit: I tried adjusting the various memory settings in the postgresql.conf and it didn't help. What am I missing?

My guess is that the intermediate result set is being allocated in memory only and cannot be materialized to disk. Your best options are to find a workaround or to work with the dblink developers to correct the problem. Some potential workarounds are:
Create a CSV file with COPY and load that into your target database.
Chunk the query into, say, 100k rows at a time (see the sketch after this list).
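As a sketch of the chunked approach using dblink's cursor functions (the connection string, table names and column list below are placeholders, not taken from the question):

SELECT dblink_connect('src', 'dbname=sourcedb');
SELECT dblink_open('src', 'cur', 'SELECT id, payload FROM big_table');
-- repeat this INSERT until it copies zero rows
INSERT INTO big_table_copy
    SELECT * FROM dblink_fetch('src', 'cur', 100000) AS t(id integer, payload text);
SELECT dblink_close('src', 'cur');
SELECT dblink_disconnect('src');

Because dblink_fetch pulls a bounded number of rows per call, the backend never has to hold the whole 2-million-row result set at once.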
To be clear, my guess is that dblink handles things by allocating a result set, allocating the memory required, and handing the data on to PostgreSQL. It is possible this is done in a way that lets results be proxied through quickly (and transferred over the network connection) without being allocated entirely in memory in the dblink module itself.
For INSERT ... SELECT, however, it may be allocating the entire result set in memory first, and then trying to process it and insert it into the table all at once.
However, this is a gut feeling without a detailed review of the code (I did open dblink.c and scan it quickly). Remember that PostgreSQL is acting simultaneously as a database client to the other server and as a database server itself, so the memory gotchas of both libpq and the backend come together.
Edit: after a little more review it looks like this is mostly right. dblink uses cursors internally, and my guess is that everything is being fetched from the cursor before the insert so it can all be processed at once.

Related

PostgreSQL use of Linux hugepages

I am having some trouble understanding Linux huge pages within PostgreSQL. From what I have googled:
Configured huge pages will be allocated and not swapped out of RAM.
Huge pages may improve performance due to a lower number of pages to be managed by the kernel.
Individual huge pages are contiguous blocks of memory.
My Postgres cluster configuration:
shared_buffers set to 4GB
max_connections 30
all other settings left at their defaults
Huge pages was set, but when starting, PostgreSQL defaulted to not using them. After setting huge_pages = on in PostgreSQL and vm.nr_hugepages = 4096/2 in Linux (applied with sysctl -p), PostgreSQL wouldn't start because it couldn't allocate enough memory. After trying several vm.nr_hugepages values, the lowest that seemed to work was 3500.
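For reference, a rough way to estimate the required vm.nr_hugepages is the recipe from the PostgreSQL documentation: take the postmaster's peak memory and divide by the huge page size (the PID, path and numbers below are illustrative):

head -1 $PGDATA/postmaster.pid            # postmaster PID, e.g. 4170
grep ^VmPeak /proc/4170/status            # e.g. VmPeak: 6490428 kB
grep ^Hugepagesize /proc/meminfo          # e.g. Hugepagesize: 2048 kB
# 6490428 / 2048 is roughly 3170, so:
sysctl -w vm.nr_hugepages=3170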
My questions are:
How does PostgreSQL calculate the amount of memory it will need in advance?
I read that a memory limit should be set for PostgreSQL after setting huge pages. If it is only going to use huge pages, wouldn't that already be a limit (assuming no other process uses them)?
Related to the previous question: what would happen if it ran out of huge pages?
Will huge pages be used by all PostgreSQL processes, for a sort operation for instance? Having to allocate 3500 pages for PostgreSQL to start is a bit confusing, because I thought they would mainly be used by shared_buffers.
Thanks in advance!
Edit:
The system I am testing on is an 8-core Intel machine with 32 GB RAM. The main purpose of the setup is ETL: receive files that are loaded with COPY (several GB per file), transform them with more or less complex SQL, and persist the results in several tables with a design similar to a DWH (star schemas). There is 40 TB of total storage for PostgreSQL (4x10 TB drives) and a 512 GB SSD for Ubuntu Server 18.04. Some of the transformations will require table scans, or scans of a large part of the tables; in DB2 I have the option of using a block-based buffer pool (https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c0009651.html). I thought I could achieve something "similar" with huge pages in PostgreSQL. Would this be possible?

Postgres Checkpointer process always running

I am running a PostgreSQL server and have limited shared_buffers to 4GB.
When I insert a large number of records into the database, the checkpointer process starts consuming RAM. The process neither exits nor releases the RAM, even after a day.
Any idea why this is happening?
That is perfectly fine. The memory reported as allocated to the process is in fact the shared memory of shared_buffers that remains allocated for the entire life of the PostgreSQL server process.
Since it is the job of the checkpointer to read dirty pages from shared buffers and write them to disk, it is to be expected that the process reads a lot of that memory.
If you want to reduce the amount of disk I/O the checkpointer has to perform during bulk inserts, increase max_wal_size.
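A minimal postgresql.conf sketch along those lines (the values are illustrative, not recommendations for your hardware):

max_wal_size = 16GB                  # allow more WAL between checkpoints during bulk loads
checkpoint_timeout = 30min           # time-based checkpoints at most every 30 minutes
checkpoint_completion_target = 0.9   # spread checkpoint writes over most of the interval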
To see how much RAM is still free on your machine, consult the “available” field of the free output.
The checkpointer is a permanent process. It can nap for long periods of time when there is nothing to do, but it doesn't go away while the server is still running. Most of the memory it uses is the same memory being shared by everyone else. The top command (which I think is what you are showing there) includes all the shared memory a process has ever touched, and doesn't prorate it for the number of other processes it is being shared with.
Indeed if shared_buffers is 4GB, then the numbers you report here are suspiciously low, not high.

PostgreSQL Table in memory

I created a database containing a total of 3 tables for a specific purpose. The total size of all tables is about 850 MB - very lean - of which one single table contains about 800 MB of data (including its index) across 5 million records, with about 6000 records added daily.
The system is PostgreSQL on Windows: a Windows 7 laptop with 8 GB RAM and an SSD.
I allocated 2048MB as shared_buffers, 256MB as temp_buffers and 128MB as work_mem.
I execute a single query multiple times against the single table - hoping that the table stays in RAM (hence the above parameters).
But although I see a spike in memory usage during execution (by about 200 MB), I do not see memory consumption stay at 500 MB or more, which it would need to for the data to remain in memory. All running postgres.exe processes show 2-6 MB in Task Manager. Hence, I suspect the LRU cache does not keep the data in memory.
Average query execution time is about 2 seconds (a very simple single-table query), but I need to get it down to about 10-20 ms or even less, purely because the same query will be executed a great many times, and that can only be achieved by keeping the data in memory.
Any advice?
Regards,
Kapil
You should not expect postgres processes to show large memory use, even if the whole database is cached in RAM.
That is because PostgreSQL relies on buffered reads from the operating system buffer cache. In simplified terms, when PostgreSQL does a read(), the OS looks to see whether the requested blocks are cached in the "free" RAM that it uses for disk cache. If the block is in cache, the OS returns it almost instantly. If the block is not in cache the OS reads it from disk, adds it to the disk cache, and returns the block. Subsequent reads will fetch it from the cache unless it's displaced from the cache by other blocks.
That means that if you have enough free memory to fit the whole database in "free" operating system memory, you won't tend to hit the disk for reads.
Depending on the OS, behaviour for disk writes may differ. Linux will write-back cache "dirty" buffers, and will still return blocks from cache even if they've been written to. It'll write these back to the disk lazily unless forced to write them immediately by an fsync() as Pg uses at COMMIT time. When it does that it marks the cached blocks clean, but doesn't flush them. I don't know how Windows behaves here.
The point is that PostgreSQL can be running entirely out of RAM with a 1GB database, even though no PostgreSQL process seems to be using much RAM. Having shared_buffers too high just leads to double-caching and can reduce the amount of RAM available for the OS to cache blocks.
It isn't easy to see exactly what's cached in RAM because Pg relies on the OS cache. That's why I referred you to pg_fincore.
If you're on Windows and this won't work, you really just have to rely on observing disk activity. Does performance monitor show lots of uncached disk reads? Does operating system memory monitoring show lots of memory used for disk cache in the OS?
Make sure that effective_cache_size correctly reflects the RAM used for disk cache. It will help PostgreSQL choose appropriate query plans.
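For example, on the 8 GB laptop described in the question, something in this direction would be a starting point (the values are illustrative and assume the machine is mostly dedicated to this workload):

effective_cache_size = 6GB    # planner hint only; no memory is actually allocated
shared_buffers = 1GB          # kept modest to leave RAM for the OS cache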
You are making the assumption, without apparent evidence, that the query performance you are experiencing is explained by disk read delays, and that it can be improved by in-memory caching. This may not be the case at all. You need to look at explain analyze output and system performance metrics to see what's going on.
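One way to see where the blocks are coming from is EXPLAIN with the BUFFERS option (the table and filter here are placeholders):

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM my_table WHERE some_column = 42;
-- "shared hit" blocks were served from shared_buffers; "read" blocks came from the OS (its cache or the disk)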

How to keep 32-bit MongoDB memory usage down on a changing dataset

I'm using MongoDB on a 32-bit production system, which sucks, but it's out of my control right now. The challenge is to keep the memory usage under ~2.5 GB, since going over this will cause the 32-bit system to crash.
According to the MongoDB team, the best way to track memory usage is to use your operating system's process tracking tools (i.e. ps or htop on Unix systems; Process Explorer on Windows) and watch the virtual memory size.
The DB mainly consists of one table which is continually cycling data, i.e. it receives data at regular intervals from sensors, and every day a cron job wipes all data from before the last 3 days. Over time, the memory usage slowly increases. I took some notes over time using db.serverStats(), db.lectura.totalSize() and ps, shown in the chart below. Note that the size of the table in question has decreased in the last month, but memory usage increased nonetheless.
Now, there is some scope for adjusting how many days of data I store. Today I deleted basically half of the data and then restarted mongodb, and yet mem virtual / mem mapped and, most importantly, memory usage according to ps have hardly changed! Why do these not decrease when I wipe data (and restart)? I have read other questions where people said that Mongo isn't really using all the memory it appears to be using, and that you can't clear the cache or limit memory use. But then how can I ensure I stay under the 2.5 GB limit?
Unless there is a way to stem this gradual increase in memory usage, irrespective of dataset size, it seems to me that the 32-bit version of Mongo is unusable. Note: I don't mind losing a bit of performance if it solves the problem.
To answer why the mapped and virtual memory usage does not decrease with the deletes: the mapped number is what you get when you mmap() the entire set of data files. This does not shrink when you delete records, because although the space is freed up inside the data files, the files themselves are not reduced in size - they are just emptier afterwards.
Virtual will include journal files, and connections, and other non-data related memory usage also, but the same principle applies there. This, and more, is described here:
http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage
So the 2GB storage size limitation on 32-bit actually applies to the data files, whether or not there is data in them. To reclaim deleted space, you will have to run a repair. This is a blocking operation and requires the database to be offline/unavailable while it runs. It will also need up to 2x the original size in free disk space, since it essentially rewrites the data files from scratch.
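For example, with the mongod process stopped (the dbpath is illustrative):

mongod --dbpath /data/db --repair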
This limitation, and the problems it causes, is why the 32-bit version should not be run in production; it is just not suitable. I would recommend moving to a 64-bit version as soon as possible.
By the way, neither of these figures (mapped or virtual) actually represents your resident memory usage, which is what you really want to look at. The best way to do this over time is via MMS, which is the free monitoring service provided by 10gen - it will graph virtual, mapped and resident memory for you over time as well as plenty of other stats.
If you want an immediate view, run mongostat and check out the corresponding memory columns (res, mapped, virtual).
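For example, sampling every two seconds (syntax from the 2.x-era tools; adjust for your version):

mongostat 2
# watch the mapped, vsize and res columns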
In general, when using 64-bit builds with essentially unlimited storage, the data will usually greatly exceed the available memory. Therefore, mongod will use all of the memory it can in terms of resident memory (which is why you should always have swap configured, so the OOM killer does not come into play).
Once that is used, the OS does not stop allocating memory; it will just page out the oldest items to make room for new data (LRU). In other words, the recycling of memory is done for you, and the resident memory level will remain fairly constant.
Your options for stretching 32-bit are limited, but you can try some things. What you run out of is address space, and since each additional database file is larger than the last, you want to avoid crossing the boundary from "n" files to "n+1". It may be worth structuring your data across more databases (with smaller files) so that you can get the maximum amount of actual data into memory with as little "dead space" as possible.
For example, if your database named "mydatabase" consists of the files mydatabase.ns (the namespace file) at 16 MB, mydatabase.0 at 64 MB, mydatabase.1 at 128 MB and mydatabase.2 at 256 MB, then the next file created for this database will be mydatabase.3 at 512 MB. If instead of adding to mydatabase you instead created an additional database "mynewdatabase" it would start life with mynewdatabase.ns at 16 MB and mynewdatabase.0 at 64 MB ... quite a bit smaller than the 512 MB that adding to the original database would be. In fact, you could create 4 new databases for less space than would be consumed by adding a new file to the original database, and because the files are smaller they would be easier to fit into contiguous blocks of memory.
It is a well-known recommendation that 32-bit MongoDB should not be used in production.
Use 64-bit systems.
Period.

What is the suggested number of bytes for each mapping when a file is too large to be memory-mapped at one time?

I am opening files using memory mapping. The files are apparently too big (6 GB on a 32-bit PC) to be mapped in one go, so I am thinking of mapping part of the file at a time and adjusting the offsets for the next mapping.
Is there an optimal number of bytes for each mapping or is there a way to determine such a figure?
Thanks.
There is no optimal size. With a 32-bit process, there is only 4 GB of address space in total, and usually only 2 GB of it is available to user mode. This 2 GB is then fragmented by code and data from the EXE and DLLs, heap allocations, thread stacks, and so on. Given this, you will probably not find more than 1 GB of contiguous space to map a file into memory.
The optimal number depends on your app, but I would be wary of mapping more than 512 MB into a 32-bit process. Even if you limit yourself to 512 MB, you might run into some issues depending on your application. Alternatively, if you can go 64-bit, there should be no issue mapping multiple gigabytes of a file into memory - the address space is so large that this shouldn't cause any problems.
You could use an API like VirtualQuery to find the largest contiguous space - but then you're actually forcing out-of-memory errors to occur elsewhere, since you are removing large amounts of address space.
EDIT: I just realized my answer is Windows-specific, but you didn't say which platform you are discussing. I presume other platforms have similar limiting factors for memory-mapped files.
Does the file need to be memory-mapped?
I've edited 8 GB video files on a 733 MHz PIII (not pleasant, but doable).