What happens to existing data with psql dbname < pg_dump_file

I have a database on AWS RDS. I take a pg_dump of a local version of the database, then run psql dbname < pg_dump_file with the proper arguments for the remote connection to populate it.
I'd like to know what is expected to happen if that rds db already contains data. More specifically:
1. Data present in the local dump, but absent in rds
2. Data present on rds, but absent in the local data
3. Data present in both, but modified
My current understanding:
1. New data will be added and be present in both after upload
2. Data in rds should be unaffected?
3. The data from the pg_dump will be present in both (assuming the same pk, but different fields otherwise)
Is that about correct? I've been reading this, but it's a little thin on how the restore is actually performed, so I'm having a harder time figuring that out. Thanks.
EDIT: following @wildplasser's comment, and looking at the pg_dump file, it appears that the following happens:
CREATE TABLE [....]
ALTER TABLE [setting table owner]
ALTER SEQUENCE [....]
For each table in the db. Then, again one table at a time:
COPY [tablename] (list of cols) FROM stdin;
[data to be copied]
Finally, more ALTER statements to set constraints, foreign keys etc.
So I guess the ultimate answer is "it depends". One could, I suppose, remove the CREATE TABLE [...], ALTER TABLE and ALTER SEQUENCE statements if those objects already exist as they should. I am not positive yet what happens if one tries CREATE TABLE with an existing table (error thrown perhaps?).
Then I guess the COPY statements would overwrite whatever already exists. Or perhaps throw an error. I'll have to test that. I'll write up an answer once I figure it out.

So the answer is a bit dull. It turns out that even if one removes the initial statements before the COPY, if the table has a primary key (and thus a uniqueness constraint) then it won't work:
ERROR: duplicate key value violates unique constraint
So one gets shut down pretty quickly there. One would, I guess, have to rewrite the dump as a list of UPDATE statements instead, but then one might as well write a script to do so. I'm unsure whether pg_dump is all that useful in that case.
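The three cases from the question can also be handled without hand-writing UPDATE statements: load the dump's rows into a staging table and upsert from there. What follows is a minimal sketch, not a tested recipe. It assumes PostgreSQL 9.5 or later (for ON CONFLICT) and a hypothetical table items(id, name, price) with id as the primary key; the staging table name and the sample rows are made up for illustration.

```sql
-- Hypothetical target table: items(id PRIMARY KEY, name, price).
BEGIN;

-- Staging table with the same columns as the target but no primary key,
-- so duplicate ids cannot raise an error while loading. Dropped at COMMIT.
CREATE TEMP TABLE items_staging (LIKE items INCLUDING DEFAULTS) ON COMMIT DROP;

-- Stand-in for the dump's COPY block, pointed at the staging table.
-- (pg_dump emits tab-separated text; CSV is used here only for readability.)
COPY items_staging (id, name, price) FROM stdin WITH (FORMAT csv);
1,widget,9.99
2,gadget,19.50
\.

-- Case 1: ids missing from the target are inserted.
-- Case 2: rows that exist only in the target are never touched.
-- Case 3: ids present in both take the values from the dump.
INSERT INTO items (id, name, price)
SELECT id, name, price
FROM items_staging
ON CONFLICT (id) DO UPDATE
SET name  = EXCLUDED.name,
    price = EXCLUDED.price;

COMMIT;
```

In practice the COPY data would come from the dump file itself, with the COPY target edited to name the staging table.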

Related

Restoring PG database from dump fails due to generated columns

We use our Postgres database dumps as a way to backup/reset our staging DB. As part of that, we frequently remove all rows in the DB and insert the rows from the PG dump. However, the generated columns are included as part of the PG dump, but with their values instead of the DEFAULT keyword.
Trying to run the DB dump triggers "cannot insert into column" errors, since one cannot insert values into a generated column. How do we dump our DB and recreate it from the dump despite the generated columns?
EDIT: Note that we cannot use GENERATED BY DEFAULT or OVERRIDING SYSTEM VALUE since those are only available for identity columns and not generated columns.
EDIT 2: It seems that it's a special case for us that the values are dumped instead of as DEFAULT. Any idea why that might be?

Is it possible to dump from Timescale without hypertable insertions?

I followed the manual on: https://docs.timescale.com/v1.0/using-timescaledb/backup
When I dump it into a binary file everything works out as expected (I can restore it easily).
However, when I dump it into plain-text SQL, the generated insertions target the hypertable internals. Is it possible to generate INSERTs into the table itself instead?
Say I have an 'Auto' table with columns of id,brand,speed
and with only one row: 1,Opel,170
dumping into SQL will result like this:
INSERT INTO _timescaledb_catalog.hypertable VALUES ...
INSERT INTO _timescaledb_internal._hyper_382_8930_chunk VALUES (1, 'Opel',170);
What I need is this (and let TS do the work in the background):
INSERT INTO Auto VALUES (1,'Opel',170);
Is that possible somehow? (I know I can exclude tables from pg_dump but that wouldn't create the needed insertion)
Beatrice. Unfortunately, pg_dump will dump commands that mirror the underlying implementation of Timescale. For example, _hyper_382_8930_chunk is a chunk underlying the auto hypertable that you have.
Might I ask why you don't want pg_dump to behave this way? The plain-text SQL file that pg_dump creates is meant to be replayed with psql (pg_restore handles the binary archive formats). So as long as you dump and restore and see the correct state, there is no problem with dump/restore.
Perhaps you are asking a different question?

Postgres backup and overwrite one table

I have a postgres database, I am trying to backup a table with :
pg_dump --data-only --table=<table> <db> > dump.sql
Then days later I am trying to overwrite it (basically want to erase all data and add the data from my dump) by:
psql -d <db> -c --table=<table> < dump.sql
But it doesn't overwrite; it adds to the table without deleting the existing data.
Any advice would be awesome, thanks!
You have basically two options, depending on your data and fkey constraints.
If there are no fkeys to the table, then the best thing to do is to truncate the table before loading it. Note that truncate behaves a little oddly in transactions, so the best thing to do is (in a transaction block):
1. Lock the table
2. Truncate
3. Load
This will avoid other transactions seeing an empty table.
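The lock, truncate, load sequence could look roughly like the psql script below. It is only a sketch: my_table is a hypothetical stand-in for <table>, dump.sql is the data-only dump from the question, and \i is a psql meta-command rather than SQL, so the script has to be run through psql.

```sql
BEGIN;

-- Take the strongest lock up front so no other session can read or write
-- the table until COMMIT; TRUNCATE would acquire this lock anyway.
LOCK TABLE my_table IN ACCESS EXCLUSIVE MODE;

-- Remove every existing row.
TRUNCATE my_table;

-- Replay the data-only dump inside the same transaction (psql meta-command).
\i dump.sql

-- Other sessions only ever see the old contents or the fully reloaded table.
COMMIT;
```

If anything in the dump fails, the transaction can be rolled back and the original rows are still there.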
If you have fkeys then you may want to load into a temporary table and then do an upsert. In this case you may still want to lock the table to avoid a race condition if it is possible other transactions may want to write to the table (also in a transaction block):
1. Load data into a temporary table
2. Lock the destination table (optional, see above)
3. Use a writeable CTE to "upsert" in the table.
4. Use a separate delete statement to delete data from the table.
Step 3 is a little tricky. You might need to ask a separate question about it, but basically it has two stages (write this in consultation with the docs):
Update existing records
Insert non-existing records
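A hedged sketch of what the temporary-table approach might look like follows. The names my_table, my_table_load and the columns id and payload are hypothetical, and it assumes the temporary table has already been filled from the dump inside the same transaction (step 1).

```sql
BEGIN;

-- Step 1 (assumed done above in this transaction): my_table_load holds
-- the rows from the dump, with the same columns as my_table.

-- Step 2: block concurrent writers, but still allow readers.
LOCK TABLE my_table IN SHARE ROW EXCLUSIVE MODE;

-- Step 3: writeable CTE. First update the rows that already exist,
-- then insert the rows whose id was not updated.
WITH updated AS (
    UPDATE my_table AS t
    SET    payload = s.payload
    FROM   my_table_load AS s
    WHERE  t.id = s.id
    RETURNING t.id
)
INSERT INTO my_table (id, payload)
SELECT s.id, s.payload
FROM   my_table_load AS s
WHERE  NOT EXISTS (SELECT 1 FROM updated AS u WHERE u.id = s.id);

-- Step 4: delete rows that are not in the dump at all.
DELETE FROM my_table AS t
WHERE  NOT EXISTS (SELECT 1 FROM my_table_load AS s WHERE s.id = t.id);

COMMIT;
```

On PostgreSQL 9.5 or later, INSERT ... ON CONFLICT DO UPDATE can replace the writeable CTE in step 3.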
Hope this helps.

how to restore a postgresql database to the exact same state?

I am trying to:
1. create a snapshot of a PostgreSQL database (using pg_dump),
2. do some random tests, and
3. restore to the exact same state as the snapshot, and do some other random tests.
These can happen over many/different days. Also I am in a multi-user environment where I am not DB admin. In particular, I cannot create new DB.
However, when I restore db using
gunzip -c dump_file.gz | psql my_db
changes in step 2 above remain.
For example, if I make a copy of a table:
create table foo1 as (select * from foo);
and then restore, the copied table foo1 remains there.
Could someone explain how I can restore to the exact same state as if step 2 never happened?
-- Update --
Following the comments from @a_horse_with_no_name, I tried to use
DROP OWNED BY my_db_user
to drop all my objects before restore, but I got an error associated with an extension that I cannot control, and my tables remain intact.
ERROR: cannot drop sequence bg_gid_seq because extension postgis_tiger_geocoder requires it
HINT: You can drop extension postgis_tiger_geocoder instead.
Any suggestions?
You have to remove everything that's there by dropping and recreating the database or something like that. pg_dump basically just makes an SQL script that, when applied, will ensure all the tables, stored procs, etc. exist and have their data. It doesn't remove anything.
You can use PostgreSQL Schemas.
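One way to read the schema suggestion, sketched under assumptions: keep everything the tests touch in a dedicated schema (called scratch here, a hypothetical name), dump only that schema, and drop it before each restore. This presumes you own the schema and are allowed to recreate it in my_db, which the question does not confirm.

```sql
-- Hypothetical schema name: scratch. Assumes the snapshot was taken with
--   pg_dump --schema=scratch my_db | gzip > dump_file.gz
-- so that the dump recreates the schema and everything in it.

-- Remove the schema and every object inside it, including leftovers
-- such as a copied table foo1 that the dump knows nothing about.
DROP SCHEMA IF EXISTS scratch CASCADE;

-- Afterwards, replay the dump as before:
--   gunzip -c dump_file.gz | psql my_db
```

Objects outside that schema, such as those belonging to the postgis_tiger_geocoder extension, are not touched by this.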

PostgreSql: duplicate pkey error when inserting a new record into a restored database's table

I used the commands pg_dump and psql to back up my production DB and restore it onto my development server.
Now when I try to simply insert a new record to one of my tables I get the following error message:
ERROR: duplicate key value violates unique constraint
"communication_methods_pkey" DETAIL: Key (id)=(13) already exists.
How come the id is already in use? Do I need to update something to get the id increment counter back on the right track?
It sounds like the sequences used to generate the primary key for each table are not at the correct value. It is interesting that pg_dump did not include a sequence setval at the end of it (I believe it is supposed to).
Postgres recommends the following process to correct sequences: https://wiki.postgresql.org/wiki/Fixing_Sequences
Essentially, it takes you through identifying all your sequences and creating a SQL script that resets each one, so that the next generated value is 1 more than the highest id already in the table.
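For a single table, the fix can be sketched as below. The table name communication_methods is inferred from the constraint name in the error message and the column id from the DETAIL line; both are assumptions, as is the column being backed by a serial or identity sequence.

```sql
-- Reset the sequence behind communication_methods.id so that the next
-- nextval() returns MAX(id) + 1, or 1 if the table is empty.
SELECT setval(
    pg_get_serial_sequence('communication_methods', 'id'),
    COALESCE(MAX(id), 1),   -- value to store in the sequence
    MAX(id) IS NOT NULL     -- is_called: false on an empty table, so 1 is handed out first
)
FROM communication_methods;
```

If pg_get_serial_sequence returns NULL, the column has no owned sequence and the sequence name has to be passed to setval directly.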