postgres: dump partial tables between databases while maintaining sequences - postgresql

I want to copy select bits of several tables from one database to another while maintaining the sequences and schema. I first dump the schema using pg_dump -s but when it comes to copying the data I'm a little at a loss. Here's what I've tried so far:
pg_dump -t <table1> gives me sequences but includes the whole table
copy (SELECT bits from table1) gives me partial tables but doesn't keep the sequences up to date.
How can I keep my sequences up to date while only dumping parts of the tables?

Nothing built-in will dump part of a table for you, so doing a -s/--schema-only dump and writing your own COPY statements is the way to go.
As noted in the pg_dump docs, the -t/--table option will also take a sequence name. You can combine this with the -a/--data-only flag to output just the sequence's setval(...) command:
pg_dump --data-only -t <sequence_name>
Of course, if your sequences are associated with a SERIAL column, you usually don't know (or care) exactly what they're called. In that case, you can (probably) rely on the default <table>_<column>_seq naming convention to dump them all at once:
pg_dump --data-only -t '*_seq'
If you have non-standard sequence names, or if you're unfortunate enough to have a table name that ends in _seq, you might need to generate the -t switches programmatically (pg_dump expects one -t per name rather than a comma-separated list). In bash, something like this would probably do it:
pg_dump --data-only $(psql -tAc "SELECT string_agg('-t ' || oid::regclass::text, ' ') FROM pg_class WHERE relkind = 'S'")
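Putting the pieces together, a minimal sketch of the whole transfer might look like this (the database names, the example table, and the WHERE filter are made up for illustration):
# 1. schema only (tables, sequences, constraints)
pg_dump -s sourcedb > schema.sql
# 2. just the rows you want, as COPY-friendly text
psql -d sourcedb -c "copy (SELECT * FROM table1 WHERE id < 1000) to stdout" > table1.dat
# 3. just the sequence values (setval calls)
pg_dump --data-only -t '*_seq' sourcedb > sequences.sql
# 4. replay everything on the target
psql -d targetdb -f schema.sql
psql -d targetdb -c "copy table1 from stdin" < table1.dat
psql -d targetdb -f sequences.sql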

Related

How to hash an entire table in postgres?

I'd like to get a hash for data in an entire table. I need to compare two databases after migration to validate that the data migration was successful. Is it possible to reliably and reproducibly generate a hash for an entire table in a database?
You can do this from the command line (replacing of course my_database and my_table):
psql my_database -c 'copy my_table to stdout' |sha1sum
If you want to use a query to limit columns, add ordering, etc., just modify the query:
psql my_database -c 'copy (select * from my_table order by my_id_column) to stdout' |sha1sum
Note that this does not hash anything except the column data. No schema information, constraints, indexes, metadata, permissions, etc.
Note also that sha1sum is an arbitrary choice of hashing program; you can pipe this to any program that generates a hash. Common alternatives are sha256sum and md5sum.
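For example, to check that a table survived a migration intact, you could compare the hashes from the two databases (the database, table, and column names here are hypothetical):
old_hash=$(psql old_database -c 'copy (select * from my_table order by my_id_column) to stdout' | sha1sum)
new_hash=$(psql new_database -c 'copy (select * from my_table order by my_id_column) to stdout' | sha1sum)
[ "$old_hash" = "$new_hash" ] && echo "my_table matches" || echo "my_table differs"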

pg_dump with --exclude-table still includes those tables in the background COPY commands it runs?

I am trying to take a backup of a TimescaleDB database, excluding two very big hypertables.
That means that while the backup is running, I would not expect to see any COPY command of the underlying chunks, but I actually do!
Let's say TestDB is my database and it has two big hypertables on schema mySchema called hyper1 and hyper2, as well as other normal tables.
I run the following command:
pg_dump -U user -F t TestDB --exclude-table "mySchema.hyper1" --exclude-table "mySchema.hyper2" > TestDB_Backup.tar
Then I check the running queries (especially because I did not expect it to take this long) and find that several COPY commands are running, one for each chunk of the tables I actually excluded.
This is TimescaleDB version 1.7.4.
Did this ever happen to any of you and what is actually going on here?
P.S. I am sorry I cannot really provide a repro for this, and that this is more of a discussion than an actual programming problem, but I still hope someone has seen this before and can show me what I am missing :)
pg_dump dumps each child table separately and independently from its parent, so when you exclude a hypertable, its chunk tables are still dumped. That is why you see COPY commands for all of the chunks.
Note that excluding hypertables and chunks will not produce a dump that restores correctly into a TimescaleDB instance, since the TimescaleDB metadata will not match the actual state of the database. TimescaleDB maintains catalog tables with information about hypertables and chunks, and to pg_dump those catalogs are just ordinary user tables, so it dumps them (which is important); after a restore, however, they will still describe all of the hypertables and chunks that existed in the database before the dump.
So instead you need to exclude the data of the tables you want to skip (not the hypertables or chunks themselves), which reduces dump and restore time, and then drop the excluded hypertables after the restore. Table data is excluded with the pg_dump option --exclude-table-data. There is an issue in the TimescaleDB GitHub repo that discusses how to exclude hypertable data from a dump, and it suggests generating the exclude string like this:
SELECT string_agg(format($$--exclude-table-data='%s.%s'$$,coalesce(cc.schema_name,c.schema_name), coalesce(cc.table_name, c.table_name)), ' ')
FROM _timescaledb_catalog.hypertable h
INNER JOIN _timescaledb_catalog.chunk c on c.hypertable_id = h.id
LEFT JOIN _timescaledb_catalog.chunk cc on c.compressed_chunk_id = cc.id
WHERE h.schema_name = <foo> AND h.table_name = <bar> ;
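A sketch of how the generated switches could be fed back into pg_dump from the shell (the schema and table names are placeholders, and eval is used so the single quotes produced by format() are stripped by the shell; run it per hypertable, or drop the WHERE clause to cover them all):
EXCLUDES=$(psql -d TestDB -tAc "SELECT string_agg(format(\$\$--exclude-table-data='%s.%s'\$\$, coalesce(cc.schema_name, c.schema_name), coalesce(cc.table_name, c.table_name)), ' ')
  FROM _timescaledb_catalog.hypertable h
  INNER JOIN _timescaledb_catalog.chunk c ON c.hypertable_id = h.id
  LEFT JOIN _timescaledb_catalog.chunk cc ON c.compressed_chunk_id = cc.id
  WHERE h.schema_name = 'mySchema' AND h.table_name = 'hyper1'")
eval pg_dump -U user -Fc -f TestDB_Backup.bak "$EXCLUDES" TestDB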
Alternatively, you can look up the hypertable_id and exclude data from all chunk tables prefixed with it. Find the id in the catalog table _timescaledb_catalog.hypertable:
SELECT id
FROM _timescaledb_catalog.hypertable
WHERE schema_name = 'mySchema' AND table_name = 'hyper1';
Let's say that the id is 2. Then dump the database according to the instructions:
pg_dump -U user -Fc -f TestDB_Backup.bak \
--exclude-table-data='_timescaledb_internal._hyper_2*' TestDB

pg_dump for all metadata and only table data of selected tables

I want to create a script that will dump the whole schema and the data of only a few tables and write it to one file.
Use the --exclude-table-data option of pg_dump to define the tables whose data should be excluded from the dump.
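As a sketch of that approach, you can generate one --exclude-table-data switch for every table whose data you don't want (the database name and the list of tables to keep are assumptions here, and simple unquoted table names are assumed):
# keep data only for "users" and "orders"; every other table is dumped schema-only
EXCLUDES=$(psql -d mydb -tAc "SELECT string_agg('--exclude-table-data=' || schemaname || '.' || tablename, ' ')
  FROM pg_tables
  WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    AND tablename NOT IN ('users', 'orders')")
pg_dump -d mydb $EXCLUDES -f mydb_partial.sql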
Multiple -t switches list the tables you want to take a backup of, e.g.
MacBook-Air:~ vao$ pg_dump -d t -t pg_database -t a -t so | grep 'CREATE TABLE'
CREATE TABLE pg_database (
CREATE TABLE a (
CREATE TABLE so (
This takes a backup of the structure and data of the three tables mentioned. I use grep to hide the other rows and still give an idea of the backup contents.
https://www.postgresql.org/docs/current/static/app-pgdump.html
-t table
--table=table
Dump only tables with names matching table. For this purpose, “table”
includes views, materialized views, sequences, and foreign tables.
Multiple tables can be selected by writing multiple -t switches.

Copy table data from one database to another

I have two databases on the same server and need to copy data from a table in the first db to a table in the second. A few caveats:
Both tables already exist (i.e. I must not drop the 'copy-to' table first; I need to just add the data to the existing table)
The column names differ. So I need to specify exactly which columns to copy, and what their names are in the new table
After some digging I have only been able to find this:
pg_dump -t tablename dbname | psql otherdbname
But the above command doesn't take into account the two caveats I listed.
For a table t, with columns a and b in the source database, and x and y in the target:
psql -d sourcedb -c "copy t(a,b) to stdout" | psql -d targetdb -c "copy t(x,y) from stdin"
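If you only need some of the rows, a variant of the same pipeline works: wrap the source side in a query (the filter below is just an example):
psql -d sourcedb -c "copy (select a, b from t where b > 100) to stdout" | psql -d targetdb -c "copy t(x, y) from stdin"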
I'd use an ETL tool for this. There are free tools available; they can help you change column names, and they are widely used and tested. Most tools allow external schedulers like the Windows task scheduler or cron to run transformations on whatever time schedule you need.
I personally have used Pentaho PDI for similar tasks in the past and it has always worked well for me. For your requirement I'd create a single transformation that first loads the table data from the source database, modify the column names in a "Select Values"-step and then insert the values into the target table using the "truncate" option to remove the existing rows from the target table. If your table is too big to be re-filled each time, you'd need to figure out a delta load procedure.

pg_dump only dumps table info, and very little table data?

pg_dump -U postgres mydb > mydb.bak.sql
From the docs, it doesn't seem like I need to pass any flag to include table data in a dump. Yet the dump resulting from the above only includes data for a strange, tiny subset of tables; aside from their CREATE statements, most tables are only listed as
COPY <tablename> (vals) FROM stdin;
\.
Is there some circumstance where you have to explicitly tell pg_dump to include all table data?