Can pg_dump set a table's sequence while also excluding its data? - postgresql

I'm running pg_dump -F custom for database backups, with --exclude-table-data for a very large audit table. I then export that table's data to a separate dump file. That data has no referential-integrity ties to the main dump.
As part of my restore strategy, I'd like to be able to restore the main dump, bring my app online and continue using the database immediately, then bring the audit data back in behind it. The trouble is that new audit rows start coming in at sequence value 1, so the later import of the audit data fails as soon as it tries to insert over the top of the new data.
Is it possible to include the setting of the sequence in the main dump without including the table data?
I have considered removing the primary key, but there are other tables I'd also like to do this with, and they definitely do need the PK.
I'm using PostgreSQL 13.

Instead of a sequence-generated row number, you could use UUIDs and a timestamp, so you have unique values and the order of insertion doesn't matter. UUIDs are a bit slower than ints.
Another possibility is to save the last audit ID in another table and reset the sequence from it after the restore, as described at https://www.postgresql.org/docs/9.1/sql-altersequence.html
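A minimal sketch of that second suggestion, assuming the audit table's sequence is called audit_id_seq and the databases are called mydb and restored_db (all hypothetical names). It uses setval rather than the ALTER SEQUENCE ... RESTART form from the linked page, and the helper only builds the SQL, so you can inspect it before running it:

```shell
#!/bin/sh
# Print the SQL that re-seeds a sequence so that the next nextval()
# call returns last_value + 1. Both arguments come from the caller.
seq_reset_sql() {
  seq_name=$1
  last_value=$2
  printf "SELECT setval('%s', %s, true);\n" "$seq_name" "$last_value"
}

# Hypothetical usage (audit_id_seq, mydb and restored_db are assumed names):
#   last=$(psql -At -d mydb -c "SELECT last_value FROM audit_id_seq")
#   psql -d restored_db -c "$(seq_reset_sql audit_id_seq "$last")"
```

Run the second command after restoring the main dump and before the app comes back online, so new audit rows start above the IDs that the separate audit dump will later insert.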

Related

How to synchronise a foreign table with a local table?

I'm using the Oracle foreign data wrapper and would like to keep local copies of some of my foreign tables. Is there another option than having materialized views and refreshing them manually?
Not really, unless you want to add functionality in Oracle:
If you add a trigger on the Oracle table that records all data modifications in another table, you could define a foreign table on that table. Then you can regularly run a function in PostgreSQL that takes the changes since you checked last time and applies them to a PostgreSQL table.
If you understand how “materialized view logs” work in Oracle (I don't, and I think the documentation doesn't tell), you could define a foreign table on that and use it like above. That might be cheaper.
Both of these ideas would still require you to regularly run something in PostgreSQL, but they might be cheaper. Perhaps (if you have the money) you could use Oracle Heterogeneous Services to modify a PostgreSQL table whenever something changes in an Oracle table.

How can I automatically maintain a dump of modified rows in PostgreSQL

So, I have a PostgreSQL DB. For some chosen tables in that DB I want to maintain a plain dump of the rows when they are modified. Note that this dump is not a recovery or backup dump. It is just a file which will hold the incremental rows. That is, whenever a row is inserted or updated, I want it appended to this file or to a file in a folder. The idea is to load that folder into something like Hive periodically so that I can run queries to check previous states of certain rows and columns. Now, these are very high-transaction tables and the dump does not need to be real time. It can be in batches, every hour. I want to avoid a trigger firing hundreds of times every minute. I am looking for something off the shelf that is already available in PostgreSQL. I did some research, but everything I found is about PostgreSQL backup, which is not the exact use case.
I have read some links, such as https://clarkdave.net/2015/02/historical-records-with-postgresql-and-temporal-tables-and-sql-2011/ and "Implementing history of PostgreSQL table", but these are based on insert/update triggers and create the history table in PostgreSQL itself. I want to avoid both. I cannot keep the history in PostgreSQL as it will soon be huge. And I do not want to keep writing to files through a trigger that fires constantly.

Enforcing Foreign Key Constraint Over Table From pg_dump With --exclude-table-data

I'm currently working on dumping one of our customer's database in a way that allows us to create new databases from this customer's basic structure, but without bringing along their private data.
So far, I've had success with pg_dump combined with the --exclude-table and --exclude-table-data options, which allowed me to bring only the data I'll effectively need for this task.
However, there are a few tables that mix rows which reference some of the data I left behind with other rows that reference data I had to bring, and this is causing me a few issues during the restore operation. Specifically, when the restore tries to enforce FOREIGN KEY constraints for certain columns on these tables, it fails because there are some rows with keys that have no matching data in the respective foreign table, because I chose not to bring this table's data!
I know I can log into the database after the restore is complete, delete any rows that reference data that no longer exists and create the constraint myself, but I'd like to automate the process as much as possible. Is there a way to tell pg_dump or pg_restore (or any other program) not to bring rows from table A if they reference table B and table B's data was excluded from the backup? Or to tell Postgres that I'd like that specific foreign key to be active before importing the table's data?
For reference, I'm working with PostgreSQL 9.2 on a RHEL 7 server.
What if you disable foreign key checking when you restore your database dump, and after that remove the orphaned rows from the referring table?
By the way, I recommend that you fix your database schema so there is no chance of wrong tuples being inserted into your database.
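The cleanup step suggested above could look roughly like this. The sketch assumes a single-column foreign key and a custom-format dump, and every table, column, database and file name in it is hypothetical. The helper only prints the DELETE statement. The usage comment shows one way to postpone the constraints: pg_restore's --section option (available in 9.2) loads the schema and data first, and the foreign keys, which live in the post-data section, are created only after the orphans are gone.

```shell
#!/bin/sh
# Print a DELETE that removes rows in the referencing table whose
# foreign-key value has no matching row in the referenced table.
# Rows with a NULL foreign key are kept, since NULL does not violate the constraint.
orphan_delete_sql() {
  child=$1; fk_col=$2; parent=$3; pk_col=$4
  printf 'DELETE FROM %s c WHERE c.%s IS NOT NULL AND NOT EXISTS (SELECT 1 FROM %s p WHERE p.%s = c.%s);\n' \
    "$child" "$fk_col" "$parent" "$pk_col" "$fk_col"
}

# Hypothetical usage (newdb, dump.custom, orders and customers are assumed names):
#   pg_restore -d newdb --section=pre-data --section=data dump.custom
#   psql -d newdb -c "$(orphan_delete_sql orders customer_id customers id)"
#   pg_restore -d newdb --section=post-data dump.custom
```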

SQLite to PostgreSQL data-only transfer (to maintain alembic functionality)

There are a few questions and answers already on PostgreSQL import (as well as the specific SQLite->PostgreSQL situation). This question is about a specific corner-case.
Background
I have an existing, in-production web app written in Python (Pyramid) and using alembic for easy schema migration. Due to the database creaking with unexpectedly high write load (probably due to the convoluted nature of my own code), I've decided to migrate to PostgreSQL.
Data migration
There are a few recommendations on data migration. The simplest one involved using
sqlite3 my.db .dump > sqlitedumpfile.sql
and then importing it with
psql -d newpostgresdb < sqlitedumpfile.sql
This required a bit of editing of sqlitedumpfile. In particular, removing some incompatible operations, changing values (sqlite represents booleans as 0/1) etc. It ended up being too complicated to do programmatically for my data, and too much work to handle manually (some tables had 20k rows or so).
A good tool for data migration which I eventually settled on was pgloader, which 'worked' immediately. However, as is typical for data migration of this sort, this exposed various data inconsistencies in my database which I had to solve at source before doing the migration (in particular, removing foreign keys to non-unique columns which seemed a good idea at the time for convenient joins and removing orphan rows which relied on rows in other tables which had been deleted). After these were solved, I could just do
pgloader my.db postgresql:///newpostgresdb
And get all my data appropriately.
The problem?
pgloader worked really well for data but not so well for the table structure itself. This resulted in three problems:-
1. I had to create a new alembic revision with a ton of changes (mostly datatype related, but also some related to problem 2).
2. Constraint/index names were unreliable (unique numeric names generated). There's actually an option to disable this, and this was a problem because I needed a reliable upgrade path which was replicable in production without me having to manually tweak the alembic code.
3. Sequences/autoincrement just failed for most primary keys. This broke my webapp as I was not able to add new rows for some (not all) databases.
In contrast, re-creating a blank database using alembic to maintain the schema works well without changing any of my webapp's code. However, pgloader defaults to overriding existing tables, so this would leave me nowhere, as the data is what really needs migrating.
How do I get proper data migration using a schema I've already defined (and which works)?
What eventually worked was, in summary:-
1. Create the appropriate database structure in postgresql://newpostgresdb (I just used alembic upgrade head for this).
2. Use pgloader to move the data over from SQLite to a different database in PostgreSQL. As mentioned in the question, some data inconsistencies need to be solved before this step, but that's not relevant to this question itself.
   createdb tempdb
   pgloader my.db postgresql:///tempdb
3. Dump the data in tempdb using pg_dump:
   pg_dump -a -d tempdb > dumped_postgres_database
4. Edit the resulting dump to accomplish the following:-
   - Add SET session_replication_role = replica; because some of my rows have circular references to other rows in the same table.
   - Delete the alembic_version table, as we're restarting a new branch for alembic.
   - Regenerate any sequences, with the equivalent of SELECT pg_catalog.setval('"table_colname_seq"', (select max(colname) from table));
5. Finally, psql can be used to load the data into your actual database:
   psql -d newpostgresdb < dumped_postgres_database
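The edits to the dump described above can be scripted instead of done by hand. The following is a rough sketch, not the script actually used here. It reads a data-only dump on stdin, prepends the session_replication_role setting, and appends one setval per table:column pair given as an argument. It looks up the sequence with pg_get_serial_sequence instead of hard-coding its name, which assumes the columns are serial-style columns that own their sequence. The table and column names in the usage line are hypothetical.

```shell
#!/bin/sh
# Wrap a data-only dump read from stdin:
#   - switch the session to replica mode so foreign-key triggers do not
#     fire while rows with circular references are loaded
#   - append a setval for each "table:column" argument
wrap_dump() {
  echo "SET session_replication_role = replica;"
  cat
  for pair in "$@"; do
    tbl=${pair%%:*}
    col=${pair#*:}
    # An empty table leaves the sequence so that the first nextval() returns 1.
    printf "SELECT pg_catalog.setval(pg_get_serial_sequence('%s', '%s'), COALESCE(MAX(%s), 1), MAX(%s) IS NOT NULL) FROM %s;\n" \
      "$tbl" "$col" "$col" "$col" "$tbl"
  done
}

# Hypothetical usage (users:id and posts:id are assumed names); -T leaves
# the alembic_version table out of the dump instead of deleting it by hand:
#   pg_dump -a -T alembic_version -d tempdb | wrap_dump users:id posts:id | psql -d newpostgresdb
```

Setting session_replication_role normally requires superuser rights, so this would need to run as a suitably privileged user.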

Restore PostgreSQL dump with new primary key values

I've got a problem with a PostgreSQL dump / restore. We have a production application running with PostgreSQL 8.4. I need to create some values in the database in the testing environment and then import just this chunk of data into the production environment. The data is generated by the application and I need to use this approach because it needs testing before going into production.
Now that I described the environment, here is my problem:
In the testing database, I leave nothing but the data I need to move to the production database. The data is spread across multiple tables linked with foreign keys with multiple levels (like a tree). I then use pg_dump to export the desired tables into binary format.
When I try to import, the database will correctly import the root table entries with new primary key values, but does not import any of the data from the other tables. I believe that the problem is that foreign keys on child tables no longer recognize the new primary keys.
Is there a way to achieve such an import which will update all the primary key values of all affected tables in the tree to correct serial (auto increment) values automatically and also update all foreign keys according to these new primary key values?
I have an idea how to do this with the help of a programming language while connected to both databases, but that would be very problematic for me to achieve since I don't have direct access to the customer's production server.
Thanks in advance!
That one seems to me like a complex migration issue. You can create PL/pgSQL migration scripts with inserts and use RETURNING to get the serial values and use them as foreign keys for other tables up the tree. I do not know the structure of your tree, but in some cases reading sequence values in advance into arrays may be required for complexity or performance reasons.
Another approach is to examine the production sequence values and estimate a range of sequence values that will not be used in the near future. Fabricate the test data in the test environment so that its serial values will not collide with production sequence values. Then load that data into the prod database and adjust the sequence values of the prod environment so that the test sequence values will not be used. It will leave a gap in your ID sequence, so you must examine whether anything (like other processes) relies on the sequence values being continuous.
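The arithmetic behind this second approach is simple enough to sketch. Assuming you can read the production sequence's current value and pick a safety margin (the numbers and the sequence name below are made up), the test IDs start past the margin, and once the test rows are loaded the production sequence is moved past the highest test ID:

```shell
#!/bin/sh
# First ID to use for fabricated test rows: production's current value
# plus a margin that production is not expected to consume before the import.
test_range_start() {
  prod_last=$1; margin=$2
  echo $((prod_last + margin + 1))
}

# SQL to run in production after the import, so the sequence continues
# right after the highest imported test ID.
bump_sequence_sql() {
  seq_name=$1; max_test_id=$2
  printf 'ALTER SEQUENCE %s RESTART WITH %s;\n' "$seq_name" $((max_test_id + 1))
}

# Hypothetical usage (items_id_seq is an assumed name):
#   test_range_start 1000 5000            # start test IDs here
#   bump_sequence_sql items_id_seq 6250   # 6250 = highest test ID loaded
```

Before loading, it is worth checking that production has not already consumed the margin, otherwise the import will collide with existing rows.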