How can I make and query read-only snapshots in Postgres (or MySQL)? - postgresql

I'd like to create a read-only snapshot of a database at the end of each day, and keep them around for a couple of months.
I'd then like to be able to run queries against a specific (named) snapshot.
Is this possible to achieve elegantly and with minimal resource usage? The database changes only very slowly but holds a few GB of data, so almost all data is common to all snapshots.

The usual way to create a snapshot in PostgreSQL is to use pg_dump/pg_restore.
A much quicker method is to simply use CREATE DATABASE to clone your database.
CREATE DATABASE my_copy_db TEMPLATE my_production_db;
This is much faster than a dump/restore. The only drawback is that the source database must not have any other open connections while it is being copied.
The copy will not be read-only by default, but you could revoke the respective privileges from the users to ensure that it is.
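The daily, named snapshots from the question could be driven by a small script that only builds the SQL to run. The following is a minimal sketch, not a definitive implementation: the `snap_` prefix, the function names and the 60-day retention are assumptions made for illustration. It also uses `ALTER DATABASE ... SET default_transaction_read_only = on` as a soft read-only guard, which is a different technique from revoking privileges (a session can still override it), so treat it as a complement rather than a replacement.

```python
from datetime import date, datetime, timedelta

PREFIX = "snap_"  # hypothetical naming convention for snapshot databases


def snapshot_name(day: date) -> str:
    """Return a name such as snap_2024_01_31, usable as an unquoted identifier."""
    return f"{PREFIX}{day:%Y_%m_%d}"


def create_snapshot_sql(source_db: str, day: date) -> list[str]:
    """Build the statements that clone source_db into a dated snapshot.

    Assumes source_db is a trusted, simple identifier (it is not quoted here).
    The statements must be run outside a transaction block, from a connection
    to a different database, while nobody else is connected to source_db.
    """
    name = snapshot_name(day)
    return [
        f"CREATE DATABASE {name} TEMPLATE {source_db};",
        # Soft guard only: new sessions default to read-only transactions.
        f"ALTER DATABASE {name} SET default_transaction_read_only = on;",
    ]


def expired_snapshots(existing: list[str], today: date, keep_days: int = 60) -> list[str]:
    """Return the snapshot names that are older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in existing:
        if not name.startswith(PREFIX):
            continue  # not one of our snapshots
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y_%m_%d").date()
        except ValueError:
            continue  # prefix matches but the rest is not a date
        if day < cutoff:
            expired.append(name)
    return expired
```

Each name returned by `expired_snapshots` would then be passed to `DROP DATABASE`. Note that every snapshot created this way is a full copy on disk, so the space used grows with the number of snapshots kept.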

Related

Is there a way to show everything that was changed in a PostgreSQL database during a transaction?

I often have to execute complex sql scripts in a single transaction on a large PostgreSQL database and I would like to verify everything that was changed during the transaction.
Verifying each single entry on each table "by hand" would take ages.
Dumping the database before and after the script to plain sql and using diff on the dumps isn't really an option since each dump would be about 50G of data.
Is there a way to show all the data that was added, deleted or modified during a single transaction?
What you are looking for is one of the most searched-for topics when it comes to capturing database changes. It is effectively a kind of version control.
As far as I know, sadly, there is no built-in approach for this in PostgreSQL or MySQL. You can work around it by adding triggers for the operations you care about most.
You can create backup schemas and tables to capture the rows that were updated, created, or deleted.
This way you can achieve what you want. I know the process is fully manual, but it is really effective.
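To show the shape of the trigger approach, here is a minimal sketch that runs without a server by using Python's built-in `sqlite3` module instead of PostgreSQL. The `account` and `account_audit` tables are hypothetical, and PostgreSQL trigger syntax differs (there a trigger calls a function declared with `RETURNS trigger`), so this illustrates the idea rather than code to paste into Postgres.

```python
import sqlite3


def audit_demo() -> list[tuple]:
    """Run a small 'script' against an audited table and return the audit rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);

        -- One audit row per changed row, in the order the changes happened.
        CREATE TABLE account_audit (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, row_id INTEGER, old_balance INTEGER, new_balance INTEGER
        );

        CREATE TRIGGER account_ins AFTER INSERT ON account BEGIN
            INSERT INTO account_audit (op, row_id, old_balance, new_balance)
            VALUES ('INSERT', NEW.id, NULL, NEW.balance);
        END;
        CREATE TRIGGER account_upd AFTER UPDATE ON account BEGIN
            INSERT INTO account_audit (op, row_id, old_balance, new_balance)
            VALUES ('UPDATE', NEW.id, OLD.balance, NEW.balance);
        END;
        CREATE TRIGGER account_del AFTER DELETE ON account BEGIN
            INSERT INTO account_audit (op, row_id, old_balance, new_balance)
            VALUES ('DELETE', OLD.id, OLD.balance, NULL);
        END;
    """)

    # The "complex script" whose effects we want to review afterwards.
    conn.execute("INSERT INTO account VALUES (1, 100)")
    conn.execute("INSERT INTO account VALUES (2, 50)")
    conn.execute("UPDATE account SET balance = 70 WHERE id = 2")
    conn.execute("DELETE FROM account WHERE id = 1")

    rows = conn.execute(
        "SELECT op, row_id, old_balance, new_balance "
        "FROM account_audit ORDER BY seq"
    ).fetchall()
    conn.close()
    return rows
```

Because the audit rows are written inside the same transaction as the changes, they can be inspected before deciding whether to commit or roll back.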
If you need to analyze the script's behaviour only sporadically, then the easiest approach would be to change server configuration parameter log_min_duration_statement to 0 and then back to any value it had before the analysis. Then all of the script activity will be written to the instance log.
This approach is not suitable if your storage is not prepared to accommodate this amount of data, or for systems in which you don't want sensitive client data to be written to a plain-text log file.

How to save Postgres DB entire state and restore it for dev purpose?

I regularly explore different data models in dev, while I have one that is in prod that I should preserve.
Once I'm sure of the model I want, I have to craft a migration so that my production setup becomes this.
Unfortunately, while I can easily git commit my data model definition and migrations, explore, then reset it as many times as I want, I don't know how to do that with Postgres.
What I need is to say "my current schema, tables, functions, triggers and data are currently in a state I want to save". Then explore with it, destroy it, alter it. Then go back the way it was when I saved it.
Is there some kind of "save checkpoint" and "restore checkpoint" for the entire database?
I know of at least 3 concepts that could be used for that: dumps, copying the data files, and using PITR. But I have no idea how to use them efficiently for dev purposes to get something as easy and simple as a git checkout.
Using pg_dump means I either commit all the dumps to git, which is not what I want, or put them aside manually, write the whole procedure in a custom script, and wait for the dump/load. That is really far from the convenience of a git checkout.
Copying the data files requires the database to be restarted and takes twice the disk space.
Using PITR seems very complicated.
Using PITR seems very complicated.
If you want to reset your database for testing purposes and you do have a proper schema migration system in place, you can use Postgres' template system for this.
Create one database that is maintained through your schema migration and reflects the "current state".
If you want to run tests on that, create a new database using the "reference" database as the template. Note that the template can also contain data.
Then run your tests against that new database. To reset it, drop the database and re-create it from the template, e.g.:
create database base_template .... ;
Now populate base_template with everything you need (tables, views, functions, data, ...)
Then create a test database:
create database integration_test template = base_template ...;
Run your tests against the integration_test database. To reset it, simply drop and re-create it:
drop database integration_test;
create database integration_test template = base_template ...;
You just need to be careful that you run your schema migrations against the base_template database.
The only drawback is that you can't have any connections active to the base_template database when you create the clone.
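The reset cycle above, including a way around the active-connection restriction, might be scripted along these lines. This is a minimal sketch under stated assumptions: `reset_statements` is a hypothetical helper, the database names are trusted simple identifiers, and the caller has the privileges needed to terminate other sessions. It relies on the `pg_stat_activity` view and the `pg_terminate_backend` function, which disconnect other sessions forcibly, so it is only suitable for dev and test setups.

```python
def reset_statements(test_db: str, template_db: str) -> list[str]:
    """Build the statements that re-create test_db from template_db.

    Run them one by one, outside a transaction block, from a connection to a
    third database (for example postgres), since neither the template nor the
    database being dropped may have other sessions attached.
    """
    return [
        # Disconnect every other session attached to either database.
        "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
        f"WHERE datname IN ('{test_db}', '{template_db}') "
        "AND pid <> pg_backend_pid();",
        f"DROP DATABASE IF EXISTS {test_db};",
        f"CREATE DATABASE {test_db} TEMPLATE {template_db};",
    ]


if __name__ == "__main__":
    for statement in reset_statements("integration_test", "base_template"):
        print(statement)
```

The printed statements could be piped to `psql` or executed through a driver with autocommit enabled, because `CREATE DATABASE` and `DROP DATABASE` cannot run inside a transaction block.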
Have you considered using something like pg_dump? (https://www.postgresql.org/docs/current/static/backup-dump.html)
You can probably create a bash script to dump your database, then read it back in once you are done experimenting (see the first link, plus ref for psql: https://www.postgresql.org/docs/current/static/app-psql.html)

Is it possible to archive WAL files for one PostgreSQL database within a single instance or must I create a second instance?

I run a couple of PostgreSQL databases (9.3). One of them does not need archiving; the other I'd rather run in WAL archive mode, but I can get away without it.
I now have a need for a database which is archived.
As far as I can tell the setting is on an instance basis, so I wouldn't be able to just choose which databases to archive and which not, which would indicate that I will need to create a new PostgreSQL instance.
Am I missing something?
Also, FWIW, will I be able to create database links between databases on the two instances?
Thanks, --sw
You cannot choose individual databases for archiving: either all databases in a PostgreSQL instance are archived or none are. There is currently no other possibility.
You can send queries to another PostgreSQL instance via the dblink extension or with the Foreign Data Wrapper (FDW) API. The FDW API should be preferred, although dblink still has some uses.

Best way to backup and restore data in PostgreSQL for testing

I'm trying to migrate our database engine from MsSql to PostgreSQL. In our automated test, we restore the database back to "clean" state at the start of every test. We do this by comparing the "diff" between the working copy of the database with the clean copy (table by table). Then copying over any records that have changed. Or deleting any records that have been added. So far this strategy seems to be the best way to go about for us because per test, not a lot of data is changed, and the size of the database is not very big.
Now I'm looking for a way to essentially do the same thing but with PostgreSQL. I'm considering doing the exact same thing with PostgreSQL. But before doing so, I was wondering if anyone else has done something similar and what method you used to restore data in your automated tests.
On a side note - I considered using MsSql's snapshot or backup/restore strategy. The main problem with these methods is that I have to re-establish the db connection from the app after every test, which is not possible at the moment.
If you're okay with some extra storage, and if you (like me) are not particularly interested in re-inventing the wheel by checking for diffs in your own code, you should try creating a new DB (per run) via the template feature of the createdb command (or the CREATE DATABASE statement) in PostgreSQL.
For example:
(from bash) createdb todayDB -T snapshotDB
or
(from psql) CREATE DATABASE todayDB TEMPLATE snapshotDB;
Pros:
In theory, always exact same DB by design (no custom logic)
Replication is a file-transfer (not DB restore). So far less time taken (i.e. doesn't run SQL again, doesn't recreate indexes / restore tables etc.)
Cons:
Takes 2x the disk space (although template could be on a low performance NFS etc)
For my specific situation, I decided to go back to the original solution, which is to compare the "working" copy of the database with the "clean" copy.
There are 3 types of changes.
For INSERT records - find max(id) from clean table and delete any record on working table that has higher ID
For UPDATE or DELETE records - find all records in clean table EXCEPT records found in working table. Then UPSERT those records into working table.
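The two rules above can be sketched as follows. To keep the example runnable without a server it uses Python's built-in `sqlite3` module, with `INSERT OR REPLACE` standing in for the UPSERT (in PostgreSQL that would typically be `INSERT ... ON CONFLICT (id) DO UPDATE`). The `clean` and `working` table names and the `restore_from_clean` helper are hypothetical, and the sketch assumes every table has an ever-increasing integer `id`, so rows added by a test always have an id above the clean maximum.

```python
import sqlite3


def restore_from_clean(conn: sqlite3.Connection) -> None:
    """Bring the working table back in line with the clean table."""
    # Rule 1: rows inserted by the test have ids above the clean maximum.
    conn.execute(
        "DELETE FROM working "
        "WHERE id > (SELECT COALESCE(MAX(id), 0) FROM clean)"
    )
    # Rule 2: rows that were updated or deleted show up as clean rows with no
    # identical counterpart in working; write those back.
    conn.execute(
        "INSERT OR REPLACE INTO working "
        "SELECT * FROM clean EXCEPT SELECT * FROM working"
    )
    conn.commit()


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clean (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE working (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO clean VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO working SELECT * FROM clean;

        -- What a test might do to the working copy.
        UPDATE working SET name = 'changed' WHERE id = 1;
        DELETE FROM working WHERE id = 2;
        INSERT INTO working VALUES (4, 'new');
    """)
    restore_from_clean(conn)
    rows = conn.execute("SELECT id, name FROM working ORDER BY id").fetchall()
    conn.close()
    return rows
```

Since only the changed rows are touched, the application's database connection stays open across tests, which was the constraint that ruled out snapshot or backup/restore strategies in the question.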

libpq code to create, list and delete databases (C++/VC++, PostgreSQL)

I am new to PostgreSQL. My Visual C++ application needs to create multiple tables and add/retrieve data from them.
Each session of my application should create a new and distinct database. I can use the current date and time for a unique database name.
There should also be an option to delete all the databases.
I have worked out how to connect to a database, create tables, and add data to tables. I am not sure how to create a new database for each run, or how to retrieve the number and names of the databases if the user wants to clear all of them.
Please help.
See the libpq examples in the documentation. The example program shows you how to list databases, and in general how to execute commands against the database. The example code there is trivial to adapt to creating and dropping databases.
Creating a database is a simple CREATE DATABASE SQL statement, same as any other libpq operation. You must connect to a temporary database (usually template1) to issue the CREATE DATABASE, then disconnect and make a new connection to the database you just created.
Rather than creating new databases, consider creating new schemas instead. That is much less hassle: all you need to do is change the search_path or prefix your table references, and you don't have to disconnect and reconnect to change schemas. See the documentation on schemas.
I question the wisdom of your design, though. It is rarely a good idea for applications to be creating and dropping databases (or tables, except temporary tables) as a normal part of their operation. Maybe if you elaborated on why you want to do this, we can come up with solutions that may be easier and/or perform better than your current approach.