PostgreSQL poor cache performance or cache-misses

I compiled the PostgreSQL v14 source code with GCC v12 and --enable-debug.
The benchmark environment is HammerDB v4.1, running an OLTP (TPC-C) test against PostgreSQL.
The goal is to identify cache misses (L1, L2 and L3 caches) and annotate them against the PostgreSQL source code or functions while the PostgreSQL binaries are running.
I tried the following commands:
valgrind --tool=cachegrind ./bin/tclsh8.6 ./hammerdbcli auto test_32vu.tcl (this did not show any PostgreSQL functions or calls)
valgrind --tool=cachegrind --trace-children=yes ./usr/local/postgresqlv14/bin/postgres -D /usr/local/postgres/data
I expected output like the example in "Cachegrind: Why so many cache misses?".
But the above commands did not help. I also tried other tools such as perf.
perf did not help either; I followed this article:
https://developers.redhat.com/blog/2014/03/10/determining-whether-an-application-has-poor-cache-performance-2#amd_
Please let me know the right way to annotate the PostgreSQL source code with cache misses, using Valgrind or other tools such as perf.
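A possible starting point, sketched under several assumptions: PostgreSQL uses one backend process per client connection, so it is usually the backend serving a HammerDB virtual user that needs to be profiled, not the hammerdbcli client or the postmaster alone. The shell functions below attach perf to one backend PID (which can be obtained with SELECT pg_backend_pid(); or from pg_stat_activity) and then annotate the samples, relying on the debug symbols from --enable-debug. The function names, the perf.cachemiss.data file name, the 30-second default and the DRY_RUN switch are made up for this illustration, and the generic cache-references and cache-misses events may map to different hardware counters, or be unavailable, depending on the CPU and kernel.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Sample cache events from one PostgreSQL backend for a fixed time window.
# $1 = backend PID, $2 = seconds to sample (assumed default: 30)
record_cache_misses() {
    run perf record -e cache-references,cache-misses -g \
        -p "$1" -o perf.cachemiss.data -- sleep "${2:-30}"
}

# Show the recorded samples annotated per function and source line.
annotate_cache_misses() {
    run perf annotate -i perf.cachemiss.data --stdio
}
```

For example, record_cache_misses 12345 60 followed by annotate_cache_misses would sample backend 12345 for a minute while the benchmark runs; with DRY_RUN=1 the functions only print the perf commands.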

Related

Can gem5 only simulate executable binary? How to run full system gem5 simulation

I am trying to simulate the effect of hardware changes, such as the cache configuration, on application performance. However, what I want to run are arbitrary applications such as NodeJS, a bash shell, Java...
build/X86/gem5.opt \
configs/example/se.py \
--cmd /usr/bin/node \
--options /path/to/my/node.js
(1) Is this the correct way? Or do I have to feed an executable binary?
Using the command above, I got this error though:
fatal: syscall epoll_create1 (#291) unimplemented.
I found similar Q1 Q2
(2) If I did (1) correctly, how can I fix the errors? There may be more than one unimplemented syscall. The Q2 answer says to try the gem5 full system model. I have little experience with gem5, so can you give me an example of using the gem5 full system model to run node, bash or any other application that is not a plain binary but a command line?

PostgreSQL in-memory [duplicate]

I want to run a small PostgreSQL database which runs in memory only, for each unit test I write. For instance:
@Before
void setUp() {
    String port = runPostgresOnRandomPort();
    connectTo("postgres://localhost:" + port + "/in_memory_db");
    // ...
}
Ideally I'll have a single postgres executable checked into the version control, which the unit test will use.
Something like HSQL, but for postgres. How can I do that?
Where can I get such a Postgres version? How can I instruct it not to use the disk?

(Moving my answer from Using in-memory PostgreSQL and generalizing it):
You can't run Pg in-process, in-memory
I can't figure out how to run in-memory Postgres database for testing. Is it possible?
No, it is not possible. PostgreSQL is implemented in C and compiled to platform code. Unlike H2 or Derby you can't just load the jar and fire it up as a throwaway in-memory DB.
Its storage is filesystem based, and it doesn't have any built-in storage abstraction that would allow you to use a purely in-memory datastore. You can point it at a ramdisk, tmpfs, or other ephemeral file system storage though.
Unlike SQLite, which is also written in C and compiled to platform code, PostgreSQL can't be loaded in-process either. It requires multiple processes (one per connection) because it's a multiprocessing, not a multithreading, architecture. The multiprocessing requirement means you must launch the postmaster as a standalone process.
Use throwaway containers
Since I originally wrote this the use of containers has become widespread, well understood and easy.
It should be a no-brainer to just configure a throw-away postgres instance in a Docker container for your test uses, then tear it down at the end. You can speed it up with hacks like LD_PRELOADing libeatmydata to disable that pesky "don't corrupt my data horribly on crash" feature ;).
There are a lot of wrappers to automate this for you for any test suite and language or toolchain you would like.
Alternative: preconfigure a connection
(Written before easy containerization; no longer recommended)
I suggest simply writing your tests to expect a particular hostname/username/password to work, and having the test harness CREATE DATABASE a throwaway database, then DROP DATABASE at the end of the run. Get the database connection details from a properties file, build target properties, environment variable, etc.
It's safe to use an existing PostgreSQL instance that already holds databases you care about, so long as the user you supply to your unit tests is not a superuser, only a user with CREATEDB rights. At worst you'll create performance issues in the other databases. I prefer to run a completely isolated PostgreSQL install for testing for that reason.
Instead: Launch a throwaway PostgreSQL instance for testing
Alternately, if you're really keen you could have your test harness locate the initdb and postgres binaries, run initdb to create a database, modify pg_hba.conf to trust, run postgres to start it on a random port, create a user, create a DB, and run the tests. You could even bundle the PostgreSQL binaries for multiple architectures in a jar and unpack the ones for the current architecture to a temporary directory before running the tests.
Personally I think that's a major pain that should be avoided; it's way easier to just have a test DB configured. However, it's become a little easier with the advent of include_dir support in postgresql.conf; now you can just append one line, then write a generated config file for all the rest.
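The steps described above (run initdb, set authentication to trust, start postgres on a chosen port, create a database) could look roughly like this in a shell-based test harness. This is a sketch, not a hardened script: the function names, the postgres superuser name, the server.log file and the DRY_RUN switch are assumptions for illustration, and initdb, pg_ctl and createdb are assumed to be on PATH. initdb -A trust is used here instead of editing pg_hba.conf by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = empty (or not yet existing) data directory, $2 = port, $3 = database name
start_throwaway_pg() {
    # -A trust generates a pg_hba.conf that trusts local connections.
    run initdb -D "$1" -A trust -U postgres
    # -w waits until the server accepts connections before returning.
    run pg_ctl -D "$1" -o "-p $2" -l "$1/server.log" -w start
    run createdb -h localhost -p "$2" -U postgres "$3"
}

# $1 = data directory of the throwaway instance
stop_throwaway_pg() {
    [ -n "$1" ] || return 1   # refuse to run rm -rf on an empty path
    run pg_ctl -D "$1" -m fast -w stop
    run rm -rf "$1"
}
```

A harness might call start_throwaway_pg "$(mktemp -d)" 54321 testdb before the suite and stop_throwaway_pg with the same directory afterwards; with DRY_RUN=1 the commands are only printed.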
Faster testing with PostgreSQL
For more information about how to safely improve the performance of PostgreSQL for testing purposes, see a detailed answer I wrote on this topic earlier: Optimise PostgreSQL for fast testing
H2's PostgreSQL dialect is not a true substitute
Some people instead use the H2 database in PostgreSQL dialect mode to run tests. I think that's almost as bad as the Rails people using SQLite for testing and PostgreSQL for production deployment.
H2 supports some PostgreSQL extensions and emulates the PostgreSQL dialect. However, it's just that - an emulation. You'll find areas where H2 accepts a query but PostgreSQL doesn't, where behaviour differs, etc. You'll also find plenty of places where PostgreSQL supports doing something that H2 just can't - like window functions, at the time of writing.
If you understand the limitations of this approach and your database access is simple, H2 might be OK. But in that case you're probably a better candidate for an ORM that abstracts the database because you're not using its interesting features anyway - and in that case, you don't have to care about database compatibility as much anymore.
Tablespaces are not the answer!
Do not use a tablespace to create an "in-memory" database. Not only is it unnecessary, as it won't help performance significantly anyway, but it's also a great way to disrupt access to any other databases you might care about in the same PostgreSQL install. The 9.4 documentation now contains the following warning:
WARNING
Even though located outside the main PostgreSQL data directory,
tablespaces are an integral part of the database cluster and cannot be
treated as an autonomous collection of data files. They are dependent
on metadata contained in the main data directory, and therefore cannot
be attached to a different database cluster or backed up individually.
Similarly, if you lose a tablespace (file deletion, disk failure,
etc), the database cluster might become unreadable or unable to start.
Placing a tablespace on a temporary file system like a ramdisk risks
the reliability of the entire cluster.
because I noticed too many people were doing this and running into trouble.
(If you've done this you can mkdir the missing tablespace directory to get PostgreSQL to start again, then DROP the missing databases, tables etc. It's better to just not do it.)

Or you could create a TABLESPACE in a ramfs / tmpfs and create all your objects there.
I recently was pointed to an article about doing exactly that on Linux. The original link is dead. But it was archived (provided by Arsinclair):
https://web.archive.org/web/20160319031016/http://magazine.redhat.com/2007/12/12/tip-from-an-rhce-memory-storage-on-postgresql/
Warning
This can endanger the integrity of your whole database cluster.
Read the added warning in the manual.
So this is only an option for expendable data.
For unit-testing it should work just fine. If you are running other databases on the same machine, be sure to use a separate database cluster (which has its own port) to be safe.

This is not possible with Postgres. It does not offer an in-process/in-memory engine like HSQLDB or MySQL.
If you want to create a self-contained environment you can put the Postgres binaries into SVN (but it's more than just a single executable).
You will need to run initdb to set up your test database before you can do anything with this. This can be done from a batch file or by using Runtime.exec(). But note that initdb is not fast. You will definitely not want to run it for each test. You might get away with running it once before your test suite though.
However, while this can be done, I'd recommend having a dedicated Postgres installation where you simply recreate your test database before running your tests.
You can re-create the test database from a template database, which makes creating it quite fast (a lot faster than running initdb for each test run).
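The template approach can be sketched as a tiny helper that emits the SQL to drop and re-create the test database from a prepared template. The function name and the database names in the usage line are hypothetical; note that CREATE DATABASE ... TEMPLATE fails while other sessions are connected to the template database.

```shell
#!/bin/sh
# Print SQL that re-creates a test database from a template database.
# $1 = test database name, $2 = template database name
# The names are inserted unquoted, so this sketch assumes plain
# lower-case identifiers without spaces or special characters.
recreate_test_db_sql() {
    printf 'DROP DATABASE IF EXISTS %s;\n' "$1"
    printf 'CREATE DATABASE %s TEMPLATE %s;\n' "$1" "$2"
}
```

Usage could be as simple as recreate_test_db_sql test_db test_template | psql -d postgres, run once before the test suite.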

Now it is possible to run an in-memory instance of PostgreSQL in your JUnit tests via the Embedded PostgreSQL Component from OpenTable: https://github.com/opentable/otj-pg-embedded.
By adding the dependency to the otj-pg-embedded library (https://mvnrepository.com/artifact/com.opentable.components/otj-pg-embedded) you can start and stop your own instance of PostgreSQL in your @Before and @After hooks:
EmbeddedPostgres pg = EmbeddedPostgres.start();
They even offer a JUnit rule to automatically have JUnit starting and stopping your PostgreSQL database server for you:
@Rule
public SingleInstancePostgresRule pg = EmbeddedPostgresRules.singleInstance();

You could use TestContainers to spin up a PostgreSQL Docker container for tests:
http://testcontainers.viewdocs.io/testcontainers-java/usage/database_containers/
TestContainers provides a JUnit @Rule/@ClassRule: this mode starts a database inside a container before your tests and tears it down afterwards.
Example:
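import static org.junit.Assert.assertEquals;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import org.junit.Rule;
import org.junit.Test;
import org.testcontainers.containers.PostgreSQLContainer;

public class SimplePostgreSQLTest {
    // Starts a PostgreSQL container before each test and stops it afterwards.
    @Rule
    public PostgreSQLContainer postgres = new PostgreSQLContainer();

    @Test
    public void testSimple() throws SQLException {
        // Point the connection pool at the container's JDBC URL and credentials.
        HikariConfig hikariConfig = new HikariConfig();
        hikariConfig.setJdbcUrl(postgres.getJdbcUrl());
        hikariConfig.setUsername(postgres.getUsername());
        hikariConfig.setPassword(postgres.getPassword());
        HikariDataSource ds = new HikariDataSource(hikariConfig);
        Statement statement = ds.getConnection().createStatement();
        statement.execute("SELECT 1");
        ResultSet resultSet = statement.getResultSet();
        resultSet.next();
        int resultSetInt = resultSet.getInt(1);
        assertEquals("A basic SELECT query succeeds", 1, resultSetInt);
    }
}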

If you are using NodeJS, you can use pg-mem (disclaimer: I'm the author) to emulate the most common features of a postgres db.
You will have a full in-memory, isolated, platform-agnostic database replicating PG behaviour (it even runs in browsers).
I wrote an article to show how to use it for your unit tests here.

There is now an in-memory version of PostgreSQL from the Russian search company Yandex: https://github.com/yandex-qatools/postgresql-embedded
It's based on Flapdoodle OSS's embed process.
Example usage (from the GitHub page):
// starting Postgres
final EmbeddedPostgres postgres = new EmbeddedPostgres(V9_6);
// predefined data directory
// final EmbeddedPostgres postgres = new EmbeddedPostgres(V9_6, "/path/to/predefined/data/directory");
final String url = postgres.start("localhost", 5432, "dbName", "userName", "password");
// connecting to a running Postgres and feeding up the database
final Connection conn = DriverManager.getConnection(url);
conn.createStatement().execute("CREATE TABLE films (code char(5));");
I've been using it for some time. It works well.
UPDATED: this project is not being actively maintained anymore
Please be advised that the main maintainer of this project has successfully
migrated to the use of Test Containers project. This is the best possible
alternative nowadays.

If you can use Docker, you can mount the PostgreSQL data directory in memory for testing:
docker run --tmpfs=/data -e PGDATA=/data postgres

You can also use PostgreSQL configuration settings (such as those detailed in the question and accepted answer here) to achieve performance without necessarily resorting to an in-memory database.

If you're using java, there is a library I've seen effectively used that provides an in memory "embedded" postgres environment used mostly for unit tests.
https://github.com/opentable/otj-pg-embedded
This might be able to solve your use case if you've come to this search result looking for the answer.

If you have full control over your environment, you arguably want to run PostgreSQL on ZFS.

App to monitor PostgreSQL queries in real time?

I'd like to monitor the queries getting sent to my database from an application. To that end, I've found pg_stat_activity, but more often than not, the rows which are returned read "<IDLE> in transaction". I'm either doing something wrong, am not fast enough to see the queries come through, am confused, or all of the above!
Can someone recommend the most idiot-proof way to monitor queries running against PostgreSQL? I'd prefer some sort of easy-to-use UI based solution (example: SQL Server's "Profiler"), but I'm not too choosy.

PgAdmin offers a pretty easy-to-use tool called Server Status
(Tools -> Server Status).

With PostgreSQL 8.4 or higher you can use the contrib module pg_stat_statements to gather query execution statistics of the database server.
Run the SQL script of this contrib module, pg_stat_statements.sql (on Ubuntu it can be found in /usr/share/postgresql/<version>/contrib), in your database and add this sample configuration to your postgresql.conf (requires a restart):
custom_variable_classes = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = top # top,all,none
pg_stat_statements.save = off
To see what queries are executed in real time you might want to just configure the server log to show all queries or queries with a minimum execution time. To do so set the logging configuration parameters log_statement and log_min_duration_statement in your postgresql.conf accordingly.
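On PostgreSQL 9.4 or later, the minimum-duration logging parameter can also be changed without editing postgresql.conf by hand, using ALTER SYSTEM followed by a configuration reload. The helper below only prints the SQL; its name and the 250 ms threshold in the usage line are arbitrary choices for illustration, and the statements need to be run as a superuser.

```shell
#!/bin/sh
# Print SQL that enables logging of statements slower than a threshold.
# $1 = minimum duration in milliseconds
log_slow_queries_sql() {
    # ALTER SYSTEM writes the setting to postgresql.auto.conf.
    printf "ALTER SYSTEM SET log_min_duration_statement = '%sms';\n" "$1"
    # Reload the configuration so the new value takes effect without a restart.
    printf 'SELECT pg_reload_conf();\n'
}
```

For example, log_slow_queries_sql 250 | psql -d postgres would log every statement that runs for at least 250 ms. The SQL is piped on stdin rather than passed with a single -c option because ALTER SYSTEM cannot run inside a transaction block.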

pg_activity is what we use.
https://github.com/dalibo/pg_activity
It's a great tool with a top-like interface.
You can install and run it on Ubuntu 21.10 with:
sudo apt install pg-activity
pg_activity

If you are using Docker Compose, you can add this line to your docker-compose.yaml file:
command: ["postgres", "-c", "log_statement=all"]
Now you can see the Postgres query logs in the Docker Compose logs with:
docker-compose logs -f
Or, if you want to see only the Postgres logs:
docker-compose logs -f [postgres-service-name]
https://stackoverflow.com/a/58806511/10053470

I haven't tried it myself unfortunately, but I think that pgFouine can show you some statistics.
Although, it seems it does not show you queries in real time, but rather generates a report of queries afterwards, perhaps it still satisfies your demand?
You can take a look at
http://pgfouine.projects.postgresql.org/

See and clear Postgres caches/buffers?

Sometimes I run a Postgres query and it takes 30 seconds. Then, I immediately run the same query and it takes 2 seconds. It appears that Postgres has some sort of caching. Can I somehow see what that cache is holding? Can I force all caches to be cleared for tuning purposes?
I'm basically looking for a Postgres version of the following SQL Server command:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
But I would also like to know how to see what is actually contained in that buffer.

You can see what's in the PostgreSQL buffer cache using the pg_buffercache module. I've done a presentation called "Inside the PostgreSQL Buffer Cache" that explains what you're seeing, and it includes some more complicated queries that help interpret that information.
It's also possible to look at the operating system cache on some systems, see [pg_osmem.py] for one somewhat rough example.
There's no way to clear the caches easily. On Linux you can stop the database server and use the drop_caches facility to clear the OS cache; be sure to heed the warning there to run sync first.
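To make the pg_buffercache suggestion concrete, the sketch below prints a query, adapted from the example in the PostgreSQL documentation for that module, that lists the ten relations of the current database occupying the most shared buffers. The function name is made up, and CREATE EXTENSION assumes PostgreSQL 9.1 or later with the contrib package installed and sufficient privileges.

```shell
#!/bin/sh
# Print SQL that shows which relations occupy the most shared buffers.
buffercache_top_sql() {
    cat <<'SQL'
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT n.nspname, c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
JOIN pg_namespace n ON n.oid = c.relnamespace
GROUP BY n.nspname, c.relname
ORDER BY 3 DESC
LIMIT 10;
SQL
}
```

Running buffercache_top_sql | psql -d mydb (mydb being a placeholder) before and after a query gives a rough picture of what that query pulled into shared_buffers; it says nothing about the OS cache.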

I haven't seen any commands to flush the caches in PostgreSQL. What you see is likely just normal index and data caches being read from disk and held in memory, both by PostgreSQL and by the caches in the OS. To get rid of all that, the only way I know of is:
Shutdown the database server (pg_ctl, sudo service postgresql stop, sudo systemctl stop postgresql, etc.)
echo 3 > /proc/sys/vm/drop_caches
This will clear out the OS file/block caches. This step is very important, though I don't know how to do it on other OSs. (In case of permission denied, try sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" as in that question.)
Start the database server (e.g. sudo service postgresql start, sudo systemctl start postgresql)

Greg Smith's answer about drop_caches was very helpful. I did find it necessary to stop and start the postgresql service, in addition to dropping the caches. Here's a shell script that does the trick. (My environment is Ubuntu 14.04 and PostgreSQL 9.3.)
#!/usr/bin/sudo bash
service postgresql stop
sync
echo 3 > /proc/sys/vm/drop_caches
service postgresql start
I tested with a query that took 19 seconds the first time, and less than 2 seconds on subsequent attempts. After running this script, the query once again took 19 seconds.

I use this command on my linux box:
sync; /etc/init.d/postgresql-9.0 stop; echo 1 > /proc/sys/vm/drop_caches; /etc/init.d/postgresql-9.0 start
It completely gets rid of the cache.

I had this error.
psql:/cygdrive/e/test_insertion.sql:9: ERROR: type of parameter 53
(t_stat_gardien) does not match that when preparing the plan
(t_stat_avant)
I was looking for a way to flush the current plan and found this:
DISCARD PLANS
I added this between my inserts and it solved my problem.

Yes, it is possible to clear both the Postgres shared buffers cache and the OS cache. The solution below is for Windows; others have already given the Linux solution.
As many people have already said, to clear the shared buffers you can just restart Postgres (no need to restart the server). But doing only this won't clear the OS cache.
To clear the OS cache used by Postgres, after stopping the service, use the excellent RamMap (https://technet.microsoft.com/en-us/sysinternals/rammap), from the excellent Sysinternals Suite.
Once you have started RamMap, just click "Empty" -> "Empty Standby List" in the main menu.
Restart Postgres and you'll see that your next query is very slow, because nothing is cached.
You can also run RamMap without stopping Postgres, and you will probably get the "no cache" results you want, since, as people have already said, shared buffers usually have little impact compared to the OS cache. But for a reliable test, I would rather stop Postgres before clearing the OS cache, to make sure.
Note: I don't recommend clearing anything besides the "Standby list" when using RamMap, because the other data is in use in some way, and you can potentially cause problems or lose data if you do that. Remember that you are clearing memory used not only by Postgres files, but by every other app and the OS as well.
Regards, Thiago L.

Yes, PostgreSQL certainly has caching. The size is controlled by the shared_buffers setting. Other than that, as the previous answer mentions, there is the OS file cache, which is also used.
If you want to look at what's in the cache, there is a contrib module called pg_buffercache available (in contrib/ in the source tree, in the contrib RPM, or wherever is appropriate for how you installed it). How to use it is listed in the standard PostgreSQL documentation.
There is no way to clear out the buffer cache other than restarting the server. You can drop the OS cache with the command mentioned in the other answer, provided your OS is Linux.

There is the pg_buffercache module to look into the shared_buffers cache. At some point I needed to drop the cache to run some performance tests on a 'cold' cache, so I wrote a pg_dropcache extension that does exactly this. Please check it out.

this is my shortcut
echo 1 > /proc/sys/vm/drop_caches; echo 2 > /proc/sys/vm/drop_caches; echo 3 > /proc/sys/vm/drop_caches; rcpostgresql stop; rcpostgresql start;

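If you have a dedicated test database, you can set the shared_buffers parameter to 16 (the minimum: 16 buffers, which is 128 kB with the default 8 kB block size). That should effectively disable PostgreSQL's own buffer cache for all queries; the OS file cache is not affected.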

The original heading was "See and Clear" buffers.
Postgres 13 with the pg_buffercache extension provides a way to see the buffer contents; see its doc page.

On OSX there is a purge command for that:
sync && sudo purge
sync - force completion of pending disk writes (flush cache)
purge - force disk cache to be purged (flushed and emptied)
Credit goes to kenorb answering echo 3 > /proc/sys/vm/drop_caches on Mac OSX
