Heroku pg_restore to a database that has changed - postgresql

I'm wondering what I should do if I'm using Heroku Postgres and want to dump the data of App 1.0, then pg_restore that data into a new version of the app, App 2.0. The problem is that App 2.0 has new fields and tables, and the pg_restore documentation says:
... will issue the commands necessary to reconstruct the database to
the state it was in at the time it was saved.
I don't want to reconstruct the database to the state it was in on App 1.0; I only want to take the data and put it in the new database. The tables and fields I added should not conflict with the data in the dump file.
One option would be to pg_restore and "reconstruct the database to the state it was in at the time it was saved" and then run the migrations again. Is that the best way to go? There might be a better way; thanks for your suggestions.

You can try pg_dump --data-only, which skips table creation and only dumps the data rows. Then when you restore, your data will go into existing tables, so you'll need to make sure they already exist in the new database. I'm not sure offhand what will happen if the table definitions are different.
Alternatively, you could do a pg_dump --table <table> for only the tables you want to keep.
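For example, a minimal sketch of that round trip, assuming the two databases' connection URLs are available in environment variables (the variable and file names below are placeholders, not from the question):
# dump only the rows from the App 1.0 database, in custom format
pg_dump --data-only --format=custom --file=app1_data.dump "$APP1_DATABASE_URL"
# load them into the App 2.0 database, which already has the new schema
pg_restore --data-only --no-owner --dbname="$APP2_DATABASE_URL" app1_data.dump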

Related

How to take a backup of specific data from a table and restore it into an existing database and table?

I am working on a PostgreSQL database and I need to take a backup of specific data from a table and restore it into the same or a different database. I am looking for a way to accomplish this task using command-line tools or scripts.
I have tried using the pg_dump and pg_restore command-line utilities, but they seem to only allow for backing up and restoring an entire database or table, rather than specific data within a table.
Is there a way to take a backup of specific data from a table in PostgreSQL and restore it into a database? If so, could you provide an example of the command or script that would accomplish this task?
Thanks in advance for your help!
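For what it's worth, one way to export just a subset of rows is psql's \copy, which accepts an arbitrary query; the table name and filter below are purely hypothetical:
# export selected rows to a CSV file
psql -d sourcedb -c "\copy (SELECT * FROM orders WHERE created_at >= '2023-01-01') TO 'orders_subset.csv' WITH CSV HEADER"
# load them into the matching table in another database
psql -d targetdb -c "\copy orders FROM 'orders_subset.csv' WITH CSV HEADER"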

How to save Postgres DB entire state and restore it for dev purpose?

I regularly explore different data models in dev, while I have one that is in prod that I should preserve.
Once I'm sure of the model I want, I have to craft a migration so that my production setup becomes this.
Unfortunately, while I can easily git commit my data model definition and migrations, explore, then reset them as many times as I want, I don't know how to do that with Postgres.
What I need is to say "my current schema, tables, functions, triggers and data are currently in a state I want to save". Then explore with it, destroy it, alter it. Then go back to the way it was when I saved it.
Is there some kind of "save checkpoint" and "restore checkpoint" for the entire database?
I know of at least 3 concepts that can be used for that: dumps, copying the data files, and using PITR, but I have no idea how to use them efficiently for dev purposes to get something as easy and simple as a git checkout.
Using pg_dump would make me commit everything to git, which is not what I want, or put things aside manually, write the whole procedure in a custom script, and wait for the dump/load. It's really far from the convenience of a git checkout.
Copying the data files requires the DB to be restarted and takes twice the disk space.
Using PITR seems very complicated.
If you want to reset your database for testing purposes and you do have a proper schema migration system in place, you can use Postgres' template system for this.
Create one database that is maintained through your schema migration and reflects the "current state".
If you want to run tests on that, create a new database using the "reference" database as the template. Note that the template can also contain data.
Then run your tests against that new database. To reset it, drop the database and re-create it from the template, e.g.:
create database base_template .... ;
Now populate base_template with everything you need (tables, views, functions, data, ...)
Then create a test database:
create database integration_test template = base_template ...;
Run your tests against the integration_test database. To reset it, simply drop and re-create it:
drop database integration_test;
create database integration_test template = base_template ...;
You just need to be careful that you run your schema migrations against the base_template database.
The only drawback is that you can't have any connections active to the base_template database when you create the clone.
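If a stray session is still connected, one way to clear it (assuming you have the privileges to do so) is to terminate those backends before cloning:
-- kick any remaining sessions off the template database
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'base_template' AND pid <> pg_backend_pid();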
Have you considered using something like pg_dump? (https://www.postgresql.org/docs/current/static/backup-dump.html)
You can probably create a bash script to dump your database, then read it back in once you are done experimenting (see the first link, plus ref for psql: https://www.postgresql.org/docs/current/static/app-psql.html)
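A minimal sketch of such a script, assuming a database named devdb and the custom dump format:
# save a checkpoint
pg_dump --format=custom --file=devdb.checkpoint devdb
# ...experiment, break things...
# restore the checkpoint
dropdb devdb
createdb devdb
pg_restore --dbname=devdb devdb.checkpoint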

SQLite to PostgreSQL data-only transfer (to maintain alembic functionality)

There are a few questions and answers already on PostgreSQL import (as well as the specific SQLite->PostgreSQL situation). This question is about a specific corner-case.
Background
I have an existing, in-production web app written in Python (Pyramid) and using alembic for easy schema migration. Because the database was creaking under unexpectedly high write load (probably due to the convoluted nature of my own code), I've decided to migrate to PostgreSQL.
Data migration
There are a few recommendations on data migration. The simplest one involved using
sqlite3 my.db .dump > sqlitedumpfile.sql
and then importing it with
psql -d newpostgresdb < sqlitedumpfile.sql
This required a bit of editing of sqlitedumpfile.sql; in particular, removing some incompatible operations, changing values (SQLite represents booleans as 0/1), etc. It ended up being too complicated to do programmatically for my data, and too much work to handle manually (some tables had 20k rows or so).
A good tool for data migration, which I eventually settled on, was pgloader, which 'worked' immediately. However, as is typical for a data migration of this sort, this exposed various data inconsistencies in my database which I had to solve at source before migrating: in particular, removing foreign keys to non-unique columns (which had seemed a good idea at the time for convenient joins) and removing orphan rows which relied on rows in other tables that had since been deleted. After these were solved, I could just do
pgloader my.db postgresql:///newpostgresdb
And get all my data appropriately.
The problem?
pgloader worked really well for data but not so well for the table structure itself. This resulted in three problems:-
I had to create a new alembic revision with a ton of changes (mostly datatype related, but also some related to problem 2).
Constraint/index names were unreliable (pgloader generated unique numeric names). There's actually an option to disable this, but it was still a problem because I needed a reliable upgrade path that was replicable in production without me having to manually tweak the alembic code.
Sequences/autoincrement just failed for most primary keys. This broke my webapp, as I was not able to add new rows to some (not all) tables.
In contrast, re-creating a blank database using alembic to maintain the schema works well without changing any of my webapp's code. However, pgloader defaults to overriding existing tables, so this would leave me nowhere, as the data is what really needs migrating.
How do I get proper data migration using a schema I've already defined (and which works)?
What eventually worked was, in summary:-
Create the appropriate database structure in postgresql:///newpostgresdb (I just used alembic upgrade head for this)
Use pgloader to move data over from sqlite to a different database in postgresql. As mentioned in the question, some data inconsistencies need to be solved before this step, but that's not relevant to this question itself.
createdb tempdb
pgloader my.db postgresql:///tempdb
Dump the data in tempdb using pg_dump
pg_dump -a -d tempdb > dumped_postgres_database
Edit the resulting dump to accomplish the following:-
Add SET session_replication_role = replica, because some of my rows have circular references to other rows in the same table
Delete the alembic_version table data, as we're starting a new branch for alembic.
Regenerate any sequences, with the equivalent of SELECT pg_catalog.setval('"table_colname_seq"', (select max(colname) from table)); (a concrete example is given below)
Finally, psql can be used to load the data into your actual database
psql -d newpostgresdb < dumped_postgres_database
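As a concrete instance of the sequence step above, assuming a hypothetical users table whose serial id column is backed by users_id_seq:
-- realign the sequence with the highest id already loaded
SELECT pg_catalog.setval('users_id_seq', (SELECT max(id) FROM users));
-- psql's \ds command lists the sequences if you need to find their names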

Create a dev database with same table structure as production in Postgres 9.3

I want to create a 'development' database for my web application.
I'm using Postgres 9.3, and I would like 'devdb' to have the exact same table structure as my production 'appdb'. I do not want them to share data, but I want devdb to receive any changes made to table structures, if this is possible (i.e. if I add a new table in appdb, I want devdb to also get the new table; same thing if I remove a column).
Do I need to use schemas for this, and if so, how? My appdb currently has a schema of public.
Thanks!
I think your best bet is to use:
pg_dump --schema-only prod | psql dev
To keep the schemas in sync, either drop and reload the dev db, or script your schema changes so you can apply the change to both DBs. You should be doing that anyway, testing changes in dev before applying them to production.
(Tools like Liquibase can be interesting for this).
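A lightweight way to do the drop-and-reload option, using the database names from the question:
# rebuild devdb's schema from appdb; no data is copied
dropdb devdb
createdb devdb
pg_dump --schema-only appdb | psql devdb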
Attempts to link DDL definitions directly are unsafe. They create a dependency from production to dev. That's risky.
For example, if you were to use a table inheritance based approach then a long-running transaction holding a lock on the dev tables might cause delays on production.

How can I backup everything in Postgres 8, including indexes?

When I make a backup in Postgres 8 it only backs up the schemas and data, but not the indexes. How can I do this?
Sounds like you're making a backup using the pg_dump utility. That saves the information needed to recreate the database from scratch. You don't need to dump the contents of the indexes for that to work: you have the schema, and the schema includes the index definitions. If you load this backup, the indexes will be rebuilt from the data as it is loaded back in, the same way they were built in the first place.
If you want to do a physical backup of the database blocks on disk, which will include the indexes, you need to do a PITR backup instead. That's a much more complicated procedure, but the resulting backup will be instantly usable. The pg_dump style backups can take quite some time to restore.
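For reference, on much newer releases than the 8.x asked about here, a file-level backup that includes the indexes can be taken with pg_basebackup:
# copy the whole cluster on disk, indexes included, streaming the WAL needed for consistency
pg_basebackup -D /backups/base -Fp -Xs -P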
If I understand you correctly, you want a dump of the indexes as well as the original table data.
pg_dump will output CREATE INDEX statements at the end of the dump, which will recreate the indexes in the new database.
You can do a PITR backup as suggested by Greg Smith, or stop the database and just copy the data files directly (a file-system-level backup).
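If you want to convince yourself the index definitions really are included in a logical dump, you can list the archive's table of contents (the database and file names here are just examples):
# dump in custom format, then list its contents and look for the INDEX entries
pg_dump -Fc mydb > mydb.dump
pg_restore -l mydb.dump | grep INDEX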