pg_dump does not full copy of db - postgresql

I have a database db1 which works very slow. It executes hard queries with joins few minutes. I created a copy of that database: db2. Then I created and tuned some indexes and that db2 began to work much faster! After that my goal was to tune db1 working fast as db2. I made copy of db2 (pg_dump -Fc db2 > db2.dump) and restore it to new db1. Then I tested it's speed. But the speed of work not the same! The new db1 much slower (as it was before). What's the problem here? Does pg_dump dumps everything inside db? Data, indexes and so on? Please advice.

After loading,the statictics tables are still empty (they are not part of the dump). You need to run analyze manually.as explained in the postgres manual

Yes pg_dump dump all object definitions, table, view, indexes, type and do on. You have reset the cache and destroy all staistics, so when running the same query in a fresh restore you may encounter bad query times, you can do an ANALYZE to refresh all the stats, but consider that you have now in your cluster db2 + db1 instead of only db1 as before, so the cache is less available for db1, you have to destroy db2 before doing your analyze.

Related

I have loaded wrong psql dump into my database, anyway to revert?

Ok, I screwed up.
I dumped one of my psql (9.6.18) staging database with the following command
pg_dump -U postgres -d <dbname> > db.out
And after doing some testing, I "restored" the data using the following command.
psql -f db.out postgres
Notice the absence of -d option? yup. And that was supposed to be the username.
Annnd as the database happend to have the same name as its user, it overwrote the 'default' database (postgres), which had data that other QAs are using.
I cancelled the operation quickly as soon as I realised my mistake, but the damage was still done. Around 1/3 ~ 1/2 of the database is roughly identical to the staging database - at least in terms of the schema.
Is there any way to revert this? I am still looking for any other dumps if any of these guys made one. But I don't think there is any past two to three months. Seems like I got no choice but to own up and apologise to them in the morning.
Without a recent dump or some sort of PITR replication setup, you can't un-revert this easily. The only option is to manually go through the log of what was restored and remove/alter it in the postgres database. This will work for the schema, the data is another matter. FYI, the postgres database should not really be used as a 'working' database. It is there to be a database to connect to for doing other operations, such as CREATE DATABASE or to bootstrap your way into a cluster. If left empty then the above would not have been a problem. You could have done, from another database, DROP DATABASE postgres; and then CREATE DATABASE postgres.
Do you have a capture of the output of the psql -f db.out postgres run?
Since the pg_dump didn't specify --clean or -c, it should not have overwritten anything, just appended. And if your tables have unique or primary keys, most of the data copy operations should have failed with unique key violations and rolled back. Even one overlapping row (per table) would roll back the entire dataset for that table.
Without having the output, it will be hard to figure out what damage has actually been done.
You should also immediately copy the pg_xlog data someplace safe. If it comes down to it, you might be able to use pg_xlogdump to figure out what changes committed and what did not.

Backup specific tables in AWS RDS Postgres Instance

I have two databases on Amazon RDS, both Postgres. Database 1 and 2
I need to restore an instance from a snapshot of Database 1 for my Staging environment. (Database 2 is my current Staging DB).
However, I want the data from a few of the tables in Database 2 to overwrite the tables in the newly restored snapshot. What is the best way to do this?
When restoring RDS from a Snapshot, a new database instance is created. If you only wish to copy a portion of the snapshot:
Restore the snapshot to a new (temporary) database
Connect to the new database and dump the desired tables using pg_dump
Connect to your staging server and restore the tables using pg_restore (most probably deleting any matching existing tables first)
Delete the temporary database
pg_dump actually outputs SQL commands that are then used to recreate tables and restore data. Look at the content of a dump to understand how the restore process actually works.
I hope this still works for someone else.
With my team we faced a similar issue. We also had 2 Postgres databases and we also just needed to backup some tables from db1 to db2.
What we did is to use a lambda function using Python (from AWS lambda ofc) that connected to both databases and validates if db1.table1 has the same data as db2.table1, if not, then the lambda function should write the missing data from db1.table1 into db2.table1. The approach of using lambda was because we wanted to automate the process due to the main db (let's say db1) is constantly being updated. In addition, it allowed us to only backup our desired tables (let's say 3 tables out of 10), instead of backing up the whole database.
Note: Maybe you want to do these writes using temporary tables to avoid issues with any constraints you have in your tables.

How to restore/rewind my PostgreSQL database

We do nightly full backups of our db and I then use that dump to create my own dev-db. The creation of the dev-db takes roughly 10 minutes so its scheduled every morning by cron before I get to work. So I can now work with an almost live db.
But when I'm testing things it would sometimes be convenient to rollback the full db or just some specific tables to the initial backup. Of course I could do the full recreation of the dev-db but that would make me wait for another 10 minutes before I could run the tests again.
So is there an easy way to restore/rewind the database/table to a specific point in time or from a dump?
I have tried to use pg_restore like this to restore specific tables:
pg_restore -d my-dev-db -n stuff -t tableA -t tableB latest-live-db.dump
I have tried with options like -cand --data-only also. But there seems to be several issues here that I did not foresee:
The old data is not automatically removed when the restored data is copied back.
There is several foreign-key constraints that makes this impossible (correct me if I'm wrong) without explicitly removing the FK before the restore and then adding them back again.
PK-sequences that gets out of order does not concern me at all at this point but that might be an issue as well.
Edit: more things I tested/looked into:
pg_basebackup
A more brute force alternative to pg_basebackup is to stop the db-server, copy the db-files, then start the db-server.
Both of the alternatives above fail because I have several local databases running in the same cluster and that sums up to a lot of data on disk. There is no way to separate the databases this way! So the file copy action here will not give me any speed gain.
I'm assuming you are asking about a database not a cluster. The first thing that comes to my mind is to restore the backup to 2 different dbs, one with the dev_db name and the other with another name like dev_db_back. Then when you need a fresh db drop dev_db and rename dev_db_backup to dev_db with
drop database if exists dev_db;
alter database dev_db_backup rename to dev_db;
After that, to have another source to rename from, restore the backup to dev_db_backup again. This could be done by a script so the dropping, renaming and restoring would be automated. As dropping and renaming are instantaneous just start the script and the renaming is done without a need to wait for the new restore.
If it is common to need repeated restores in less 10 minutes intervals I think you can try to do what you are doing inside a transaction:
begin;
-- alter the db
-- test the alterations
commit; -- or ...
-- rollback;

what will happen if we restore postgresql dump file on existing database?

What will happen if we restore a pgdump file of earlier time on a running db?
I have restored an older sql file over existing database does it harm to DB and its functinality ?
In general, yes, it'll screw up the database. Rows that were deleted in the past will be back. Sequences may be reset. Dropped tables can be re-created. All sorts of things.
Without more details, particularly the command used when restoring the dump and the nature of the dump, it's hard to be sure in this specific case.
If you restored with:
psql -1 -v ON_ERROR_STOP=1 -f the_dump.sql
then it's possibly you might not have any damage, or might only have to re-set some sequences.

Slow database restore from SQL file, considering binary replication to rollback tests database

When using Cucumber with Capybara, I have to load test database data from SQL data dump.
Unfortunately it takes 10s for each scenario, which slows down the tests.
I have found something like: http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#How_to_Replicate
Do you think binary replication will be quicker then using SQL files?
Is there anything I can do to make the restore quicker (I restore just data, not structure)?
What approaches would you recommend to try?
You could try to put your test data into a "template" database (e.g. mydb_template)
To prepare the test scenario, you simply drop your database using DROP DATABASE mydb and recreate based on the template: CREATE DATABASE mydb TEMPLATE = mydb_template;.
Of course you'll need to connect to e.g. template0 or the postgres database in order to be able to drop mydb.
I think this could be faster than importing a dump.
I recall a discussion on the PG mailing list regarding this approach and some performance problems with large "templates" that were fixed with 9.0.
(I restore just data, not structure)
COPY is always fastest for importing just data. Other answer deals with restoring a whole database.