Restore PostgreSQL dump with new primary key values - postgresql

I've got a problem with a PostgreSQL dump / restore. We have a production application running on PostgreSQL 8.4. I need to create some values in the database in the testing environment and then import just this chunk of data into the production environment. The data is generated by the application, and I need to use this approach because it needs testing before going into production.
Now that I've described the environment, here is my problem:
In the testing database, I keep nothing but the data I need to move to the production database. The data is spread across multiple tables linked by foreign keys over multiple levels (like a tree). I then use pg_dump to export the desired tables in binary format.
When I try to import, the database correctly imports the root table entries with new primary key values, but does not import any of the data from the other tables. I believe the problem is that the foreign keys in the child tables no longer recognize the new primary keys.
Is there a way to do such an import so that it automatically assigns correct serial (auto-increment) values to the primary keys of all affected tables in the tree and also updates all foreign keys to match these new primary key values?
I have an idea of how to do this with the assistance of a programming language while connected to both databases, but that would be very problematic for me to achieve since I don't have direct access to the customer's production server.
Thanks in advance!

That seems to me like a complex migration issue. You can create PL/pgSQL migration scripts with INSERTs and use RETURNING to capture the generated serial values, then use those values as foreign keys for the dependent tables further down the tree. I do not know the structure of your tree, but in some cases reading sequence values in advance into arrays may be required for complexity or performance reasons.
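A minimal sketch of the INSERT ... RETURNING idea, assuming hypothetical tables parent(id serial, name text) and child(id serial, parent_id int, value text); it is wrapped in a plpgsql function because DO blocks are not available on 8.4:

    CREATE OR REPLACE FUNCTION import_test_chunk() RETURNS void AS $$
    DECLARE
        new_parent_id integer;
    BEGIN
        -- Insert the root row and capture the serial value that the
        -- production database assigns to it.
        INSERT INTO parent (name)
        VALUES ('imported from test')
        RETURNING id INTO new_parent_id;

        -- Reuse that value as the foreign key of the dependent rows.
        INSERT INTO child (parent_id, value)
        VALUES (new_parent_id, 'first child'),
               (new_parent_id, 'second child');
    END;
    $$ LANGUAGE plpgsql;

    SELECT import_test_chunk();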
Another approach is to examine the production sequence values and estimate sequence values that will not be used in the near future. Fabricate the test data in the test environment so that its serial values will not collide with the production sequence values. Then load that data into the production database and adjust the production sequence values so that the test sequence values will not be handed out. This leaves a gap in your ID sequence, so you must check whether anything (like other processes) relies on the sequence values being continuous.
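A hedged sketch of the gap approach; the sequence name orders_id_seq and the table orders are hypothetical, adjust them to your schema:

    -- In the test environment, start generating IDs far above anything
    -- production will reach soon:
    SELECT setval('orders_id_seq', 1000000);

    -- After loading the test-generated rows into production, move the
    -- production sequence past the highest imported ID so that new
    -- inserts cannot collide:
    SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));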

Related

Idempotency and foreign key constraints when using Postgres pg_dump files with Flyway

I am trying to use some pg_dump generated migration scripts with Flyway. The first migration script is for schema only. The other migration scripts load seed data into various tables using the Postgres COPY command. These seed-data scripts are going to exist as Flyway repeatable migration scripts. This setup presents two issues.
When Flyway loads the seed data from the migration scripts, I'm getting foreign key constraint violations because the seeded tables are not loaded in the correct order. There are a large number of tables to deal with, so is there an easy way to work around this so that I don't have to reorder my COPYs?
Since the seed data is going to be in repeatable migration scripts, these need to be idempotent. Is there a way to do this with the Postgres COPY command? I'm trying to avoid converting it to INSERTs, since that would hurt performance and also make my migration files huge.
The trick here for idempotency is to delete the existing data in the correct dependency order, and when you've done that, to likewise load the data in the correct dependency order. The correct dependency order for deleting is worked out by obtaining the target tables of every foreign key constraint and ensuring that no table's data is ever deleted while it is still the target of a table whose data has yet to be deleted. This list of tables in dependency order is usually called a 'manifest' and is also required for the CREATE statements and for the PostgreSQL COPYs. The public-domain PowerShell-based Flyway Teamworks framework will create the manifest for you.
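A minimal sketch of where that dependency information can come from; each result row pairs a referencing table with the table its foreign key points at, which is the raw material for building the manifest:

    -- List every foreign key as (referencing table, referenced table).
    SELECT conrelid::regclass  AS referencing_table,
           confrelid::regclass AS referenced_table
    FROM pg_constraint
    WHERE contype = 'f';

Ordering the tables so that referencing tables are emptied before the tables they point at, and loaded in the reverse order, keeps both the DELETEs and the COPYs free of constraint violations.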

Can pg_dump set a table's sequence while also excluding its data?

I'm running pg_dump -F custom for database backups, with --exclude-table-data for a very large audit table. I'm then exporting that table's data in a separate dump file. It isn't referentially integral with the main dump.
As part of my restore strategy, I'd like to be able to restore the main dump, bring my app online and continue using the database immediately, then bring the audit data back in behind it. The trouble is that new audit data starts coming in at sequence value 1, so the import of the old audit data fails as soon as it tries to insert over the top of the new rows.
Is it possible to include the setting of the sequence in the main dump without including the table data?
I have considered removing the primary key, but there are other tables I'd also like to do this with, and they definitely do need the PK.
I'm using PostgreSQL 13.
Instead of a sequence, which essentially builds a row number, use UUIDs plus a timestamp, so you have unique values and the insert order doesn't matter. UUIDs are a bit slower than ints.
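A small sketch of that idea, assuming a hypothetical audit table; gen_random_uuid() is built into PostgreSQL 13, so no extension is needed:

    CREATE TABLE audit_log (
        id        uuid        PRIMARY KEY DEFAULT gen_random_uuid(),
        logged_at timestamptz NOT NULL DEFAULT now(),
        details   jsonb
    );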
Another possibility is that you save the last audit ID in another table and then reset the sequence, as in https://www.postgresql.org/docs/9.1/sql-altersequence.html
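A hedged sketch of that second idea, assuming an audit table audit_log with a serial primary key id backed by audit_log_id_seq (all names hypothetical):

    -- In the live database, before taking the main dump, record the current
    -- high-water mark in a tiny table that IS included in the dump:
    CREATE TABLE IF NOT EXISTS audit_id_marker (last_id bigint);
    DELETE FROM audit_id_marker;
    INSERT INTO audit_id_marker SELECT coalesce(max(id), 1) FROM audit_log;

    -- After restoring the main dump, and before the app comes online, bump
    -- the sequence so new audit rows leave room for the old data:
    SELECT setval('audit_log_id_seq', (SELECT last_id FROM audit_id_marker));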

Is there a way to start from max in a primary key in PostgreSQL if I import data from another db?

I have an old MS Access DB and I want to convert it to PostgreSQL. I found DBeaver very useful. Some operations must be done by hand; this is the case with primary keys, which you must set manually. I didn't find another way to do it, so in pgAdmin I'm setting all this stuff up. The client uses this DB constantly, so I can only get the data on a holiday when the client is not working and import it into PostgreSQL. My goal is to have the database ready to receive data from the production database. So far, in pgAdmin, I'm setting the primary key identity to "Always" and setting the start value to the last primary key number by hand. I think this is not the right way to do it, and when I start to import the production data, I don't want to set all that stuff by hand. How can I make the primary key start its auto-increment from the maximum existing ID?
Using an identity column is the right way. Just set the sequence to a value safely above the current maximum. It doesn't matter if you lose a million sequence values.
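A minimal sketch, assuming a hypothetical table customers with an identity column id that was filled by the import; pg_get_serial_sequence finds the backing sequence and setval moves it above the highest imported value:

    SELECT setval(pg_get_serial_sequence('customers', 'id'),
                  coalesce((SELECT max(id) FROM customers), 0) + 1000,
                  false);
    -- The third argument (false) makes the next nextval() return exactly that
    -- value; the +1000 is just a safety margin, since the gap is harmless.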

Build table of tables from other databases in Postgres - (Multiple-Server Parallel Query Execution?)

I am trying to find the best solution for building a database relation. I need something that creates a table containing data split across other tables from different databases. All the tables have exactly the same structure (same number of columns, same names and types).
Within a single database, I would create a parent table with partitions. However, the volume of data is too big for a single database, which is why I am trying to split it. From the Postgres documentation, I think what I am trying to do is "Multiple-Server Parallel Query Execution".
At the moment the only solution I can think of is to build an API over the database addresses and use it to pull data across the network into the main parent database when needed. I also found the Postgres extension Citus, which might do the job, but I don't know how to implement a unique key across multiple databases (or shards, as Citus calls them).
Is there any better way to do it?
Citus would most likely solve your problem. It lets you use unique keys across shards if the key is the distribution column, or if it is a composite key that contains the distribution column.
You can also use a distributed, partitioned table in Citus: that is, a table partitioned on some column (a timestamp?) and hash-distributed on some other column (like the one you use in your existing approach). Query parallelization and data collection are then handled by Citus for you.
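A hedged sketch of such a table; the names (events, device_id, created_at) are hypothetical, and it assumes a Citus version that supports distributing partitioned tables:

    -- Partitioned by time, with the unique key including the distribution column.
    CREATE TABLE events (
        device_id  bigint      NOT NULL,
        created_at timestamptz NOT NULL,
        payload    jsonb,
        PRIMARY KEY (device_id, created_at)
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2024_01 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

    -- Hash-distribute the parent table (and its partitions) across the shards:
    SELECT create_distributed_table('events', 'device_id');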

Enforcing Foreign Key Constraint Over Table From pg_dump With --exclude-table-data

I'm currently working on dumping one of our customer's database in a way that allows us to create new databases from this customer's basic structure, but without bringing along their private data.
So far, I've had success with pg_dump combined with the --exclude-table and --exclude-table-data options, which allowed me to bring only the data I'll effectively need for this task.
However, there are a few tables that mix rows referencing some of the data I left behind with other rows referencing data I had to bring, and this is causing me a few issues during the restore operation. Specifically, when the restore tries to enforce the FOREIGN KEY constraints on certain columns of these tables, it fails because some rows have keys with no matching data in the respective foreign table, precisely because I chose not to bring that table's data!
I know I can log into the database after the restore is complete, delete any rows that reference data that no longer exists and create the constraint myself, but I'd like to automate the process as much as possible. Is there a way to tell pg_dump or pg_restore (or any other program) not to bring rows from table A if they reference table B and table B's data was excluded from the backup? Or to tell Postgres that I'd like that specific foreign key to be active before importing the table's data?
For reference, I'm working with PostgreSQL 9.2 on a RHEL 7 server.
What if you disable foreign key checking when you restore your database dump, and after that remove the orphaned rows from the referencing table?
By the way, I recommend you fix your database schema so there is no chance of wrong tuples being inserted into your database.
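A hedged sketch of that disable-and-clean-up idea, assuming hypothetical tables orders (kept, referencing) and customers (data excluded); disabling ALL triggers also disables the internal foreign key triggers and requires superuser rights:

    -- Before loading the data for the mixed table:
    ALTER TABLE orders DISABLE TRIGGER ALL;
    -- ... run the data restore for that table here ...
    ALTER TABLE orders ENABLE TRIGGER ALL;

    -- Then remove rows whose foreign key points at data that was excluded:
    DELETE FROM orders o
    WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id);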