I've noticed a significant performance drop when the data is not already in shared_buffers when querying PostgreSQL; the difference can be almost 100x. So in the process of optimizing the query, I was wondering if there is any way to increase performance by increasing shared_buffers.
Then I started to investigate shared_buffers in PostgreSQL, and I found that the recommended value is 25% of the OS memory, with PostgreSQL relying on the OS cache to accelerate queries. But from what I've seen with my own database, reading from disk versus shared_buffers makes a huge difference, so I would like to serve queries from shared_buffers most of the time.
So I wondered: what's the downside of increasing shared_buffers in PostgreSQL? And what if I only increase shared_buffers on my read-only instance?
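For context, this is roughly how I have been comparing the two cases (the table and filter are just placeholders for my real query):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM my_table WHERE id = 42;

    -- cold run:  Buffers: shared read=1234   (pages came from the OS cache or disk)
    -- warm run:  Buffers: shared hit=1234    (pages came from shared_buffers)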
A downside of increasing the buffer cache is double buffering. When you need to read a page into shared_buffers, it might first need to evict an existing page to make room. But then the OS cache might also need to evict one of its own pages to make room to read the page from the actual disk. You end up with the same page sitting in both caches, which wastes cache space. As a result, instead of reading a page from the OS cache you are more likely to have to read it from the actual disk, which is far slower. From a double-buffering perspective, you probably want shared_buffers to be either much less than half of the system RAM (using the OS cache as the main cache) or much larger than half (using shared_buffers as the main cache).
Another downside is that if it is too large, you might start to get out-of-memory errors, invoke the OOM killer, or otherwise destabilize the system.
Another problem is that after some operations, like DROP TABLE, TRUNCATE, or the ending of a COPY in some circumstances, PostgreSQL needs to invalidate a lot of buffers and chooses to do so by scouring the entire buffer cache. If you do a lot of those operations, that time can really add up with large buffer cache settings.
Some workloads (I know about DROP TABLE, but there may be others) perform better with a smaller shared_buffers. But essentially, it is a matter of trial and error (or better yet: reproducible performance tests).
If you can make shared_buffers big enough that it can hold everything you need from the database, that is probably a good choice.
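If the pg_buffercache contrib extension is available, here is a rough sketch of how to see which relations currently occupy shared_buffers, and therefore whether your working set could plausibly fit:

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- number of 8 kB buffers each relation currently holds in shared_buffers
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;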
We know that Kafka uses memory-mapped files for its index files; however, its log files do not use memory mapping.
My question is: why do the index files use memory-mapped files while the log files don't?
Implementing both log and index appends with the mmap approach would introduce a data consistency problem. mmap does not guarantee that the data is flushed from memory to the file (assuming flushing is left to the OS rather than forced with an explicit msync(2)/munmap(2)); if the index update gets flushed but the log data does not get flushed successfully for some reason, the data in the log can no longer be interpreted.
BTW, for append-only data, in the write direction we only need to care about the next block (buffer) to write, so the huge total data size should not be an issue here.
How many bytes can be mapped into memory is limited by the address space. For example, a 32-bit architecture can only address 4 GB, so only smaller portions of a file can be mapped at a time. Kafka log files, which are often much larger, could only have portions mapped at any moment, which would complicate reading them.
However, index files are sparse, which means they are relatively small in size. Mapping them into memory speeds up the lookup process, and that is the primary benefit memory-mapped files offer.
Logs are where the messages are stored; the index files point to positions in the logs.
There is a nice, colorful blog post explaining what is going on.
Having a fast index to improve read performance is a common optimization in databases where writes are append-only (almost all LSM-tree databases do some form of this). Also, as others have pointed out:
indexes are sparse, so they have a smaller memory footprint. Even the sparsity of the index is configurable, which is useful as the data grows.
append-only write patterns are faster than random seeks (especially true for SSDs), and therefore don't need a lot of attention for optimization.
If the log files were mmap'ed, then since physical memory is limited, it could cause frequent page faults, which are a seriously expensive overhead. Using the sendfile system call is more suitable for serving the log data.
As I understand it (after a fair amount of searching online)...
1- If a component of a query (sort, join, etc.) uses more RAM/memory than my work_mem setting, or if the total memory used by all current operations on the server exceeds the available OS memory, the query will start writing to disk.
Is that true?
2- Postgres (and many other good DB engines) uses memory to cache a lot so queries go faster; therefore, the server will show low free memory even if it isn't really starved for memory. So low free memory doesn't really indicate anything other than a good DB engine and healthy utilization.
Is that true?
3- If both #1 and #2 above are true, then holding everything else constant, if I want a broad indicator that my work_mem setting is too low or that there is not enough overall OS memory, should I watch whether the server's free disk space is going down?
Am I thinking about this correctly?
links:
https://www.postgresql.org/docs/current/static/runtime-config-resource.html
http://patshaughnessy.net/2016/1/22/is-your-postgres-query-starved-for-memory
https://www.enterprisedb.com/monitor-cpu-and-memory-percentage-used-each-process-postgresqlppas-9
https://dba.stackexchange.com/questions/18484/tuning-postgresql-for-large-amounts-of-ram
I know I can set log_temp_files and look at individual temp files to tune the work_mem setting, but I wanted an overall gauge I could use to decide whether work_mem might be too low before I start digging through temp files whose sizes exceed my work_mem setting.
I have PostgreSQL 10.
Processing a query takes a number of steps:
generate (all) possible plans
estimate the cost of executing these plans (in terms of resources: disk I/O, buffers, memory, CPU), based on tuning constants and statistics.
pick the "optimal" plan, based on tuning constants
execute the chosen plan.
In most cases, a plan that is expected (in step 2) to need more than your work_mem setting will not be chosen in step 3 (because "spilling to disk" is considered very expensive).
Once step 4 detects that it needs more work_mem than it has, its only choice is to spill to disk. Shit happens... (At least this doesn't rely on the OS page-swapping out the overcommitted memory.)
The rules are very simple:
hash-joins are often optimal but will cost memory
don't try to use more memory than you have
if there is a difference between the expected (step 2) and observed (step 4) memory use, your statistics are wrong. You will be punished by a spill to disk.
a lack of usable indexes will cause hash joins or seqscans.
sorting uses work_mem, too. The mechanism is similar: bad estimates yield bad plans (see the sketch after this list).
CTEs are often (always?) materialized. This will spill to disk once your buffer space overflows.
CTEs don't have statistics, and don't have indexes.
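A quick way to see a sort spilling is a plain EXPLAIN (ANALYZE); a minimal sketch, with made-up table and column names:

    SET work_mem = '4MB';    -- session-level, just for experimenting

    EXPLAIN (ANALYZE)
    SELECT * FROM big_table ORDER BY some_column;

    -- in the Sort node of the plan, compare:
    --   Sort Method: quicksort  Memory: 2048kB        (fit in work_mem)
    --   Sort Method: external merge  Disk: 150000kB   (spilled to temp files)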
A few guidelines/advice:
use a correct data model (and don't denormalize)
use the correct PKs/FKs and secondary indexes.
run ANALYZE the_table_name; to gather fresh statistics after huge modifications to the table's structure or data.
Monitoring:
check the Postgres logfile
check the query plan, compare observed <--> expected
monitor the system resource usage (on Linux: via top/vmstat/iostat)
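For an overall gauge of how often queries spill, the cumulative temp-file counters in pg_stat_database (plus log_temp_files) are a reasonable starting point; a sketch, assuming you are allowed to change server settings:

    -- cumulative temporary-file usage per database since the last stats reset
    SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_bytes
    FROM pg_stat_database
    ORDER BY temp_bytes DESC;

    -- log every temporary file a query creates (0 = no size threshold)
    ALTER SYSTEM SET log_temp_files = 0;
    SELECT pg_reload_conf();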
I am using Pentaho to create ETLs and I am very focused on performance. I developed an ETL process that copies 163,000,000 rows from SQL Server 2008 to PostgreSQL, and it takes 17 hours.
I do not know how good or bad this performance is. Do you know of a way to measure whether the time a process takes is reasonable? At least as a reference to know whether I need to keep working heavily on performance or not.
Furthermore, I would like to know whether it is normal that the ETL process loads 2M rows in the first 2 minutes. Extrapolating from that, the whole load should take about 6 hours, but then the performance decreases and it ends up taking 17 hours.
I have been searching on Google and I cannot find any timing references or explanations about this kind of performance.
Divide and conquer, and proceed by elimination.
First, add a LIMIT to your query so it takes 10 minutes instead of 17 hours; this will make it a lot easier to try different things.
Are the processes running on different machines? If so, measure network bandwidth utilization to make sure it isn't a bottleneck. Transfer a huge file, make sure the bandwidth is really there.
Are the processes running on the same machine? Maybe one is starving the other for IO. Are source and destination the same hard drive? Different hard drives? SSDs? You need to explain...
Examine the IO and CPU usage of both processes. Does one process max out one CPU core?
Does a process max out one of the disks? Check iowait, iops, IO bandwidth, etc.
How many columns? Two INTs, 500 FLOATs, or a huge BLOB with a 12 megabyte PDF in each row? Performance would vary between these cases...
Now, I will assume the problem is on the POSTGRES side.
Create a dummy table, identical to your target table, which has:
Exact same columns (CREATE TABLE dummy (LIKE target_table))
No indexes, no constraints (I think that is the default; double-check the created table)
A BEFORE INSERT trigger on it which returns NULL to drop the row (see the sketch below).
The rows will be processed, just not inserted.
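A minimal sketch of that setup (the target table name is a placeholder):

    CREATE TABLE dummy (LIKE target_table);    -- same columns, no indexes, no constraints

    CREATE FUNCTION discard_row() RETURNS trigger AS $$
    BEGIN
        RETURN NULL;    -- returning NULL from a BEFORE trigger drops the row
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER drop_everything
        BEFORE INSERT ON dummy
        FOR EACH ROW EXECUTE PROCEDURE discard_row();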
Is it fast now? OK, so the problem was insertion.
Do it again, but this time using an UNLOGGED TABLE (or a TEMPORARY TABLE). These are not crash-safe because they bypass the WAL, but for importing data that's OK... if it crashes during the insert you're going to wipe it out and restart anyway.
Still no indexes, no constraints. Is it fast?
If slow => IO write bandwidth issue, possibly caused by something else hitting the disks
If fast => IO is OK, problem not found yet!
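A sketch of the unlogged variant (names are placeholders); since PostgreSQL 9.5 the table can even be switched to logged once the load is done:

    CREATE UNLOGGED TABLE staging (LIKE target_table);

    -- ... bulk load into staging ...

    ALTER TABLE staging SET LOGGED;    -- make it crash-safe after the import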
With the table loaded with data, add the indexes and constraints one by one, and find out whether you have, say, a CHECK that uses a slow SQL function, or an FK into a table that has no index, that kind of stuff. Just check how long it takes to create each constraint.
Note: on an import like this you would normally add indices and constraints after the import.
My gut feeling is that PG is checkpointing like crazy due to the large volume of data and too-low checkpoint settings in the config. Or some issue like that, probably related to random IO writes. You did put the WAL on a fast SSD, right?
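If checkpointing turns out to be the culprit, the knobs to look at would be something like this (the values are purely illustrative, not a recommendation):

    -- let checkpoints happen less often during the bulk load (illustrative values)
    ALTER SYSTEM SET max_wal_size = '8GB';
    ALTER SYSTEM SET checkpoint_timeout = '30min';
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();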
17 hours is too much. Far too much. For 200 million rows, even 6 hours is a lot.
Hints for optimization:
Check the memory size: edit spoon.bat, find the line containing -Xmx, and change it to half your machine's memory size. Details vary with the Java version; example for PDI V7.1.
Check that the query against the source database is not taking too long (because it is too complex, or because of the server's memory size, or something else).
Check the target commit size (try 25,000 for PostgreSQL), that Use batch update for inserts is on, and also that the indexes and constraints are disabled.
Play with Enable lazy conversion in the Table input step. Warning: you may produce errors that are difficult to identify and debug due to data casting.
In the transformation properties you can tune the Nr of rows in rowset (click anywhere, select Properties, then the Miscellaneous tab). On the same tab, check that the transformation is NOT transactional.
I am involved in a project where they got enough RAM to store the entire database in memory. According to the manager, that is what 10gen recommended. This seems counter-intuitive. Is that really the way you want to use MongoDB?
It is not counter-intuitive... I find it quite intuitive, actually.
In How much faster is the memory usually than the disk? you can read:
(...) memory is only about 6 times faster when you're doing sequential access (350 Mvalues/sec for memory compared with 58 Mvalues/sec for disk); but it's about 100,000 times faster when you're doing random access.
So if you can fit all your data in RAM, it is quite good because you are going to be really fast reading your data.
Regarding MongoDB, from the FAQ's:
It’s certainly possible to run MongoDB on a machine with a small amount of free RAM.
MongoDB automatically uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server’s RAM, MongoDB will yield cached memory to the other process.
Technically, the operating system’s virtual memory subsystem manages MongoDB’s memory. This means that MongoDB will use as much free memory as it can, swapping to disk as needed. Deployments with enough memory to fit the application’s working data set in RAM will achieve the best performance.
The problem is that you usually have much more data than available memory. Then you have to go to disk, and disk I/O is slow. For database performance, avoiding full-scan queries is key (and even more important when you have to hit disk). Therefore, if your data set does not fit in memory, you should aim to have indexes for the vast majority of your access patterns and try to fit those indexes in memory:
If you have created indexes for your queries and your working data set fits in RAM, MongoDB serves all queries from memory.
It all depends on the size of your database. I am guessing that your database is actually quite small, otherwise I cannot see how someone at 10gen would give such advice; I mean, not even @Stennie gives such advice (and he is from 10gen, by the way).
Even if your database is small, I don't see how the manager could recommend that. MongoDB does not do memory management of its own as such; it does not "pin" data into pages like memcached or other memory-based databases do.
This means that the paging of mongod's data can be quite unpredictable, i.e. you will spend more time trying to keep things in RAM than paging data in. This is why it is better to just make sure your working set fits and can be loaded quickly; such things depend on your hardware and queries.
@Stennie's comment pretty much sums up the stance you should take with MongoDB.
I am curious about the role played by the shared buffers in Postgres. The shared buffer cache holds recently accessed disk pages, including dirty pages. If a new page needs to be brought in and there is no space left in the cache, a victim page is evicted, and if it is dirty it is written back to disk first.
However, I am confused about this statement-
"PostgreSQL depends on the OS for caching." (http://www.varlena.com/GeneralBits/Tidbits/perf.html#shbuf)
How does Postgres depend on the OS for caching? And how does it change the behavior of the shared buffers?
PostgreSQL uses both the OS cache and its own data cache (shared_buffers). Which of the two matters more depends on how your database is used.
The OS cache is very fast but basic: it simply evicts older data in favor of newer data. It is useful for widely varying query patterns.
The PG cache is slower (though still much faster than disk), but it keeps usage counters for the most-used data. It is useful for recurring results and indexes.
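Those usage counters can be inspected directly with the pg_buffercache contrib extension; a small sketch, assuming the extension can be installed:

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- distribution of usage counts across shared_buffers
    -- (a higher usagecount means the page was accessed more often recently, so it is evicted later)
    SELECT usagecount, count(*) AS buffers
    FROM pg_buffercache
    GROUP BY usagecount
    ORDER BY usagecount;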
I think this link is clearer (and more up-to-date).
My understanding is that shared_buffers is where the PostgreSQL processes work and share information, but above a certain limit (15% to 25% of the server's RAM) diminishing returns make it more attractive to leave the remaining RAM to the OS for its own caching.