PostgreSQL in-memory [duplicate]

PostgreSQL in-memory [duplicate] - postgresql

I want to run a small PostgreSQL database which runs in memory only, for each unit test I write. For instance:
#Before
void setUp() {
String port = runPostgresOnRandomPort();
connectTo("postgres://localhost:"+port+"/in_memory_db");
// ...
}
Ideally I'll have a single postgres executable checked into the version control, which the unit test will use.
Something like HSQL, but for postgres. How can I do that?
Were can I get such a Postgres version? How can I instruct it not to use the disk?

(Moving my answer from Using in-memory PostgreSQL and generalizing it):
You can't run Pg in-process, in-memory
I can't figure out how to run in-memory Postgres database for testing. Is it possible?
No, it is not possible. PostgreSQL is implemented in C and compiled to platform code. Unlike H2 or Derby you can't just load the jar and fire it up as a throwaway in-memory DB.
Its storage is filesystem based, and it doesn't have any built-in storage abstraction that would allow you to use a purely in-memory datastore. You can point it at a ramdisk, tempfs, or other ephemeral file system storage though.
Unlike SQLite, which is also written in C and compiled to platform code, PostgreSQL can't be loaded in-process either. It requires multiple processes (one per connection) because it's a multiprocessing, not a multithreading, architecture. The multiprocessing requirement means you must launch the postmaster as a standalone process.
Use throwaway containers
Since I originally wrote this the use of containers has become widespread, well understood and easy.
It should be a no-brainer to just configure a throw-away postgres instance in a Docker container for your test uses, then tear it down at the end. You can speed it up with hacks like LD_PRELOADing libeatmydata to disable that pesky "don't corrupt my data horribly on crash" feature ;).
There are a lot of wrappers to automate this for you for any test suite and language or toolchain you would like.
Alternative: preconfigure a connection
(Written before easy containerization; no longer recommended)
I suggest simply writing your tests to expect a particular hostname/username/password to work, and having the test harness CREATE DATABASE a throwaway database, then DROP DATABASE at the end of the run. Get the database connection details from a properties file, build target properties, environment variable, etc.
It's safe to use an existing PostgreSQL instance you already have databases you care about in, so long as the user you supply to your unit tests is not a superuser, only a user with CREATEDB rights. At worst you'll create performance issues in the other databases. I prefer to run a completely isolated PostgreSQL install for testing for that reason.
Instead: Launch a throwaway PostgreSQL instance for testing
Alternately, if you're really keen you could have your test harness locate the initdb and postgres binaries, run initdb to create a database, modify pg_hba.conf to trust, run postgres to start it on a random port, create a user, create a DB, and run the tests. You could even bundle the PostgreSQL binaries for multiple architectures in a jar and unpack the ones for the current architecture to a temporary directory before running the tests.
Personally I think that's a major pain that should be avoided; it's way easier to just have a test DB configured. However, it's become a little easier with the advent of include_dir support in postgresql.conf; now you can just append one line, then write a generated config file for all the rest.
Faster testing with PostgreSQL
For more information about how to safely improve the performance of PostgreSQL for testing purposes, see a detailed answer I wrote on this topic earlier: Optimise PostgreSQL for fast testing
H2's PostgreSQL dialect is not a true substitute
Some people instead use the H2 database in PostgreSQL dialect mode to run tests. I think that's almost as bad as the Rails people using SQLite for testing and PostgreSQL for production deployment.
H2 supports some PostgreSQL extensions and emulates the PostgreSQL dialect. However, it's just that - an emulation. You'll find areas where H2 accepts a query but PostgreSQL doesn't, where behaviour differs, etc. You'll also find plenty of places where PostgreSQL supports doing something that H2 just can't - like window functions, at the time of writing.
If you understand the limitations of this approach and your database access is simple, H2 might be OK. But in that case you're probably a better candidate for an ORM that abstracts the database because you're not using its interesting features anyway - and in that case, you don't have to care about database compatibility as much anymore.
Tablespaces are not the answer!
Do not use a tablespace to create an "in-memory" database. Not only is it unnecessary as it won't help performance significantly anyway, but it's also a great way to disrupt access to any other you might care about in the same PostgreSQL install. The 9.4 documentation now contains the following warning:
WARNING
Even though located outside the main PostgreSQL data directory,
tablespaces are an integral part of the database cluster and cannot be
treated as an autonomous collection of data files. They are dependent
on metadata contained in the main data directory, and therefore cannot
be attached to a different database cluster or backed up individually.
Similarly, if you lose a tablespace (file deletion, disk failure,
etc), the database cluster might become unreadable or unable to start.
Placing a tablespace on a temporary file system like a ramdisk risks
the reliability of the entire cluster.
because I noticed too many people were doing this and running into trouble.
(If you've done this you can mkdir the missing tablespace directory to get PostgreSQL to start again, then DROP the missing databases, tables etc. It's better to just not do it.)

Or you could create a TABLESPACE in a ramfs / tempfs and create all your objects there.
I recently was pointed to an article about doing exactly that on Linux. The original link is dead. But it was archived (provided by Arsinclair):
https://web.archive.org/web/20160319031016/http://magazine.redhat.com/2007/12/12/tip-from-an-rhce-memory-storage-on-postgresql/
Warning
This can endanger the integrity of your whole database cluster.
Read the added warning in the manual.
So this is only an option for expendable data.
For unit-testing it should work just fine. If you are running other databases on the same machine, be sure to use a separate database cluster (which has its own port) to be safe.

This is not possible with Postgres. It does not offer an in-process/in-memory engine like HSQLDB or MySQL.
If you want to create a self-contained environment you can put the Postgres binaries into SVN (but it's more than just a single executable).
You will need to run initdb to setup your test database before you can do anything with this. This can be done from a batch file or by using Runtime.exec(). But note that initdb is not something that is fast. You will definitely not want to run that for each test. You might get away running this before your test-suite though.
However while this can be done, I'd recommend to have a dedicated Postgres installation where you simply recreate your test database before running your tests.
You can re-create the test-database by using a template database which makes creating it quite fast (a lot faster than running initdb for each test run)

Now it is possible to run an in-memory instance of PostgreSQL in your JUnit tests via the Embedded PostgreSQL Component from OpenTable: https://github.com/opentable/otj-pg-embedded.
By adding the dependency to the otj-pg-embedded library (https://mvnrepository.com/artifact/com.opentable.components/otj-pg-embedded) you can start and stop your own instance of PostgreSQL in your #Before and #Afer hooks:
EmbeddedPostgres pg = EmbeddedPostgres.start();
They even offer a JUnit rule to automatically have JUnit starting and stopping your PostgreSQL database server for you:
#Rule
public SingleInstancePostgresRule pg = EmbeddedPostgresRules.singleInstance();

You could use TestContainers to spin up a PosgreSQL docker container for tests:
http://testcontainers.viewdocs.io/testcontainers-java/usage/database_containers/
TestContainers provide a JUnit #Rule/#ClassRule: this mode starts a database inside a container before your tests and tears it down afterwards.
Example:
public class SimplePostgreSQLTest {
#Rule
public PostgreSQLContainer postgres = new PostgreSQLContainer();
#Test
public void testSimple() throws SQLException {
HikariConfig hikariConfig = new HikariConfig();
hikariConfig.setJdbcUrl(postgres.getJdbcUrl());
hikariConfig.setUsername(postgres.getUsername());
hikariConfig.setPassword(postgres.getPassword());
HikariDataSource ds = new HikariDataSource(hikariConfig);
Statement statement = ds.getConnection().createStatement();
statement.execute("SELECT 1");
ResultSet resultSet = statement.getResultSet();
resultSet.next();
int resultSetInt = resultSet.getInt(1);
assertEquals("A basic SELECT query succeeds", 1, resultSetInt);
}
}

If you are using NodeJS, you can use pg-mem (disclaimer: I'm the author) to emulate the most common features of a postgres db.
You will have a full in-memory, isolated, platform-agnostic database replicating PG behaviour (it even runs in browsers).
I wrote an article to show how to use it for your unit tests here.

There is now an in-memory version of PostgreSQL from Russian Search company named Yandex: https://github.com/yandex-qatools/postgresql-embedded
It's based on Flapdoodle OSS's embed process.
Example of using (from github page):
// starting Postgres
final EmbeddedPostgres postgres = new EmbeddedPostgres(V9_6);
// predefined data directory
// final EmbeddedPostgres postgres = new EmbeddedPostgres(V9_6, "/path/to/predefined/data/directory");
final String url = postgres.start("localhost", 5432, "dbName", "userName", "password");
// connecting to a running Postgres and feeding up the database
final Connection conn = DriverManager.getConnection(url);
conn.createStatement().execute("CREATE TABLE films (code char(5));");
I'm using it some time. It works well.
UPDATED: this project is not being actively maintained anymore
Please be adviced that the main maintainer of this project has successfuly
migrated to the use of Test Containers project. This is the best possible
alternative nowadays.

If you can use docker you can mount postgresql data directory in memory for testing
docker run --tmpfs=/data -e PGDATA=/data postgres

You can also use PostgreSQL configuration settings (such as those detailed in the question and accepted answer here) to achieve performance without necessarily resorting to an in-memory database.

If you're using java, there is a library I've seen effectively used that provides an in memory "embedded" postgres environment used mostly for unit tests.
https://github.com/opentable/otj-pg-embedded
This might be able to solve your use case if you've come to this search result looking for the answer.

If have full control over your environment, you arguably want to run postgreSQL on zfs.

Related

Gcloud SQL upgrade postgres 9.6 to 11

I'd like to be able to upgrade my existing cloudsql postgres 9.6 instance to 11 to use some new pg 11 features.
I've been trying to figure out a good migration plan but it seems like the only option available is sql dump and restore. The database is 100Gig+ so this will take quite some time, and I'd like to avoid downtime as much as possible. Are there any options available? I was considering enabling statement logging: log_statement=mod, creating a dump, importing it into a pg-11 instance taking down the db + then scraping the logs to reply the latest updates into the pg-11 instance by downloading the logs and writing a script to re-run the inserts. Seems doable, but doesn't feel nice.
I am wondering if anyone faced this before and has had any other solutions?

Postgres 11 on Cloud SQL is still in Beta. It is not recommended to be using a product that is in Beta on a production environment.
However, should you choose to proceed, you must export the data by either creating a SQL dump or putting the data into a .csv file (depending on your needs)(best practices) create a Postgres 11 instance, and then import the data.
For the data that won’t be in the dump, you can either:
a) Do what you have suggested by logging the queries and then re-run the inserts
b) Create a dump, import it onto the new instance make it live and then take another dump of the old one again, compare to remove duplicates and import the differences. This will be difficult if you have auto-incrementing primary keys.
c) Create the schema on the Postgres 11 instance and deploy it. Then create the dump and import at a later time. If you have primary keys as auto incrementing, alter the schema to start at a value that you would like.

How to load data from S3 to PostgreSQL RDS

I have a need to load data from S3 to Postgres RDS (around 50-100 GB) I don't have the option to use AWS Data Pipeline and I am looking for something similar to using the COPY command to load data in S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.

Originally, this answer was trying to use the S3 to Postgres RDS Functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
Set-up an EC2 instance with psql installed (see below near end of post)
Copy the relevant CSVs to import from S3 to the local instance
Use the psql /copy command to import the files up
This last part is really, really important. If you use the SQL COPY command the entire RDS Postgres role structure will frustrate you to no end. It has a wonky SUPERRDSADMIN role which is not very super at all. However, if you use the psql /copy commany you apparently can do anything. I have confirmed this be the case and have started my uploads succesfully. I will come back and re-edit this post (time permitting) to add relevant documentation steps for the above.
Caveat Emptor: The post below was all the original work I had done trying to get this implemented. I don't want to bury the lead despite multiple efforts (including what can only be described as pathetic tech support from AWS) I don't believe that this feature is ready for prime time. Despite a very simple test environment, easy to replicate, AWS has not provided an effective way to not get the copy statement to crap out as follows:
The actual call to aws_s3.table_import_from_s3(...) is reporting a permission problem between RDS and S3. From my research work with psql this appears to be a C library, probably installed by AWS.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing with the OP because this appears to be the AWS supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension which pulls in aws_commons.
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises supporting all of the same data formats as the postgres COPY command
It currently only appears to support a single file at a time (ie no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
I did not find a way to download just psql, so I had to bring down a full postgres install down to my mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated it is the quickest way to get psql.
Update: Decided that having psql on my mac was a security hole, port forwarding, etc. I found that there is a simple Postgres install available for AMI Linux 2 under the AMI Extras rubric. The install command is fairly simple on your ami instance type.
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use, however, important to keep in mind that any instructions to psql itself are escaped by a \. Documentation on psql can be found here. Recommend going through it at least once before executing the AWS recommended scripts.
To the extent you run tight security and have access to your RDS instances seriously restricted (which I do) don't forget to open up the ports from your AMI instance running Postgres to your RDS instance.
If your preference is a GUI then you can try to use PGAdmin4. It is the AWS recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of the SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product it seems that version 4 may not be the stablest of releases.

http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on
Amazon S3. You can specify the files to be loaded by using an Amazon
S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as
follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
authorization;
update
Another option is to mount s3 and use direct path to the csv with COPY command. I'm not sure If it will hold 100GB effectively, but worth of trying. Here is some list of options on software.
Yet another option would be "parsing" s3 file part by part with something described here to a file and COPY from named pipe, described here
And the most obvious option to just download file to local storage and use COPY I don't cover at all
Also worth of mentioning would be s3_fdw (status unstable). Readme is very laconic, but I assume you could create a foreign table leading to s3 file. Which itself means you can load data to other relation...

PostgreSQL turn off durabilty

I want to make a script that will run postgres in-memory without durability.
I read this page: http://www.postgresql.org/docs/9.1/static/non-durability.html
But I didn't understand how I can set this parameters in script. Could you please, help me?
Thanks for help!

Most of those parameters, like fsync, can only be set in postgresql.conf. Changes are applied by re-starting PostgreSQL. They apply to the whole database cluster - all the databases in that PostgreSQL install. That's because the databases all share a single postmaster, write-ahead log, and set of shared system tables.
The only parameter listed there that you can set at the SQL level in a script is synchronous_commit. By setting synchronous_commit = 'off' you can say "it's OK to lose this transaction if the database crashes in the next few seconds, just make sure it still applies atomically".
I wrote more on this topic in a previous answer, Optimise PostgreSQL for fast testing.
If you want to set the other params with a script you can do so but you have to do it by opening and modifying postgresql.conf using the script, then re-starting PostgreSQL. Text-processing tools like sed make this kind of job easier.

If you're running a debian based linux distro, you can just do something like:
pg_createcluster -d /dev/shm/mypgcluster 8.4 ramcluster
to create a ram based cluster. Note that you'll have to do:
pg_drop cluster 8.4 ramcluster
and recreate it on reboot etc.

Running PostgreSQL in memory only

I want to run a small PostgreSQL database which runs in memory only, for each unit test I write. For instance:
#Before
void setUp() {
String port = runPostgresOnRandomPort();
connectTo("postgres://localhost:"+port+"/in_memory_db");
// ...
}
Ideally I'll have a single postgres executable checked into the version control, which the unit test will use.
Something like HSQL, but for postgres. How can I do that?
Were can I get such a Postgres version? How can I instruct it not to use the disk?

Or you could create a TABLESPACE in a ramfs / tempfs and create all your objects there.
I recently was pointed to an article about doing exactly that on Linux. The original link is dead. But it was archived (provided by Arsinclair):
https://web.archive.org/web/20160319031016/http://magazine.redhat.com/2007/12/12/tip-from-an-rhce-memory-storage-on-postgresql/
Warning
This can endanger the integrity of your whole database cluster.
Read the added warning in the manual.
So this is only an option for expendable data.
For unit-testing it should work just fine. If you are running other databases on the same machine, be sure to use a separate database cluster (which has its own port) to be safe.

This is not possible with Postgres. It does not offer an in-process/in-memory engine like HSQLDB or MySQL.
If you want to create a self-contained environment you can put the Postgres binaries into SVN (but it's more than just a single executable).
You will need to run initdb to setup your test database before you can do anything with this. This can be done from a batch file or by using Runtime.exec(). But note that initdb is not something that is fast. You will definitely not want to run that for each test. You might get away running this before your test-suite though.
However while this can be done, I'd recommend to have a dedicated Postgres installation where you simply recreate your test database before running your tests.
You can re-create the test-database by using a template database which makes creating it quite fast (a lot faster than running initdb for each test run)

Now it is possible to run an in-memory instance of PostgreSQL in your JUnit tests via the Embedded PostgreSQL Component from OpenTable: https://github.com/opentable/otj-pg-embedded.
By adding the dependency to the otj-pg-embedded library (https://mvnrepository.com/artifact/com.opentable.components/otj-pg-embedded) you can start and stop your own instance of PostgreSQL in your #Before and #Afer hooks:
EmbeddedPostgres pg = EmbeddedPostgres.start();
They even offer a JUnit rule to automatically have JUnit starting and stopping your PostgreSQL database server for you:
#Rule
public SingleInstancePostgresRule pg = EmbeddedPostgresRules.singleInstance();

You could use TestContainers to spin up a PosgreSQL docker container for tests:
http://testcontainers.viewdocs.io/testcontainers-java/usage/database_containers/
TestContainers provide a JUnit #Rule/#ClassRule: this mode starts a database inside a container before your tests and tears it down afterwards.
Example:
public class SimplePostgreSQLTest {
#Rule
public PostgreSQLContainer postgres = new PostgreSQLContainer();
#Test
public void testSimple() throws SQLException {
HikariConfig hikariConfig = new HikariConfig();
hikariConfig.setJdbcUrl(postgres.getJdbcUrl());
hikariConfig.setUsername(postgres.getUsername());
hikariConfig.setPassword(postgres.getPassword());
HikariDataSource ds = new HikariDataSource(hikariConfig);
Statement statement = ds.getConnection().createStatement();
statement.execute("SELECT 1");
ResultSet resultSet = statement.getResultSet();
resultSet.next();
int resultSetInt = resultSet.getInt(1);
assertEquals("A basic SELECT query succeeds", 1, resultSetInt);
}
}

If you are using NodeJS, you can use pg-mem (disclaimer: I'm the author) to emulate the most common features of a postgres db.
You will have a full in-memory, isolated, platform-agnostic database replicating PG behaviour (it even runs in browsers).
I wrote an article to show how to use it for your unit tests here.

There is now an in-memory version of PostgreSQL from Russian Search company named Yandex: https://github.com/yandex-qatools/postgresql-embedded
It's based on Flapdoodle OSS's embed process.
Example of using (from github page):
// starting Postgres
final EmbeddedPostgres postgres = new EmbeddedPostgres(V9_6);
// predefined data directory
// final EmbeddedPostgres postgres = new EmbeddedPostgres(V9_6, "/path/to/predefined/data/directory");
final String url = postgres.start("localhost", 5432, "dbName", "userName", "password");
// connecting to a running Postgres and feeding up the database
final Connection conn = DriverManager.getConnection(url);
conn.createStatement().execute("CREATE TABLE films (code char(5));");
I'm using it some time. It works well.
UPDATED: this project is not being actively maintained anymore
Please be adviced that the main maintainer of this project has successfuly
migrated to the use of Test Containers project. This is the best possible
alternative nowadays.

If you can use docker you can mount postgresql data directory in memory for testing
docker run --tmpfs=/data -e PGDATA=/data postgres

You can also use PostgreSQL configuration settings (such as those detailed in the question and accepted answer here) to achieve performance without necessarily resorting to an in-memory database.

If you're using java, there is a library I've seen effectively used that provides an in memory "embedded" postgres environment used mostly for unit tests.
https://github.com/opentable/otj-pg-embedded
This might be able to solve your use case if you've come to this search result looking for the answer.

If have full control over your environment, you arguably want to run postgreSQL on zfs.

Can PostgreSQL be used with an on-disk database?

Currently, I have an application that uses Firebird in embedded mode to connect to a relatively simple database stored as a file on my hard drive. I want to switch to using PostgreSQL to do the same thing (Yes, I know it's overkill). I know that PostgreSQL cannot operate in embedded mode and that is fine - I can leave the server process running and that's OK with me.
I'm trying to figure out a connection string that will achieve this, but have been unsuccessful. I've tried variations on the following:
jdbc:postgresql:C:\myDB.fdb
jdbc:postgresql://C:\myDB.fdb
jdbc:postgresql://localhost:[port]/C:\myDB.fdb
but nothing seems to work. PostgreSQL's directions don't include an example for this case. Is this even possible?

You can trick it. If you are running PostGRESQL on a UNIXlike system, then you should be able to create a RAMDISK and use that for the database storage. Here's a pretty good step by step guide for RAMdisks on Linux.
In general though, I would suggest using SQLITE for an SQL db in RAM type of application.

Postgres databases are not a single file. There will be one file for each table and each index in the data directory, inside a directory for the database. All files will be named with the object ID (OID) of db / table / index.
The JDBC urls point to the database name, not any specific file:
jdbc:postgresql:foodb (localhost is implied)
If by "disk that behaves like memory", you mean that the db only exists for the lifetime of your program, there's no reason why you can't create a db at program start and drop it at program exit. Note that this is just DDL to create the DB, not creating the data dir via the init-db program. You could connect to the default 'postgres' db, create your db then connect to it.

Firebird 2.1 onwards supports global temporary tables, which only exist for the duration of the database connection.
Syntax goes something like CREATE GLOBAL TEMPORARY TABLE ... ON COMMIT PRESERVE ROWS

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse