I have two databases on the same server and need to copy data from a table in the first db to a table in the second. A few caveats:
Both tables already exist (i.e. I must not drop the 'copy-to' table first; I just need to add the data to the existing table)
The column names differ. So I need to specify exactly which columns to copy, and what their names are in the new table
After some digging I have only been able to find this:
pg_dump -t tablename dbname | psql otherdbname
But the above command doesn't take into account the two caveats I listed.
For a table t, with columns a and b in the source database, and x and y in the target:
psql -d sourcedb -c "copy t(a,b) to stdout" | psql -d targetdb -c "copy t(x,y) from stdin"
I'd use an ETL tool for this. There are free tools available; they can help you rename columns, and they are widely used and tested. Most tools allow external schedulers such as the Windows Task Scheduler or cron to run transformations on whatever schedule you need.
I personally have used Pentaho PDI for similar tasks in the past and it has always worked well for me. For your requirement I'd create a single transformation that first loads the table data from the source database, renames the columns in a "Select Values" step, and then inserts the values into the target table using the "truncate" option to remove the existing rows from the target table. If your table is too big to be refilled each time, you'd need to figure out a delta-load procedure.
I am trying to take a backup of a TimescaleDB database, excluding two very big hypertables.
That means that while the backup is running, I would not expect to see any COPY command of the underlying chunks, but I actually do!
Let's say TestDB is my database and it has two big hypertables on schema mySchema called hyper1 and hyper2, as well as other normal tables.
I run the following command:
pg_dump -U user -F t TestDB --exclude-table "mySchema.hyper1" --exclude-table "mySchema.hyper2" > TestDB_Backup.tar
Then I check the running queries (esp. because I did not expect it to take this long) and I find out that several COPY commands are running, for each chunk of the tables I actually excluded.
This is TimescaleDB version 1.7.4.
Did this ever happen to any of you and what is actually going on here?
PS: I am sorry I cannot really provide a repro for this, and that this is more of a discussion than an actual programming problem, but I still hope someone has seen this before and can show me what I am missing :)
pg_dump dumps each child table separately and independently of its parent, so when you exclude a hypertable, its chunk tables are still dumped. That is why you see the chunk tables being copied.
Note that excluding hypertables and chunks will not produce a dump that restores correctly into a TimescaleDB instance, since the TimescaleDB metadata will no longer match the actual state of the database. TimescaleDB maintains catalog tables with information about hypertables and chunks; to pg_dump these are just ordinary user tables, so it will dump them (which is important), but once restored they will still list all the hypertables and chunks that were in the database before the dump.
So you need to exclude the data of the tables you want to skip (not the hypertables or chunks themselves), which will reduce dump and restore time; it will then be necessary to drop the excluded hypertables after the restore. You exclude table data with the pg_dump parameter --exclude-table-data. There is an issue in the TimescaleDB GitHub repo that discusses how to exclude hypertable data from a dump and suggests how to generate the exclude string:
SELECT string_agg(format($$--exclude-table-data='%s.%s'$$,coalesce(cc.schema_name,c.schema_name), coalesce(cc.table_name, c.table_name)), ' ')
FROM _timescaledb_catalog.hypertable h
INNER JOIN _timescaledb_catalog.chunk c on c.hypertable_id = h.id
LEFT JOIN _timescaledb_catalog.chunk cc on c.compressed_chunk_id = cc.id
WHERE h.schema_name = <foo> AND h.table_name = <bar> ;
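The query produces a space-separated list of --exclude-table-data options that can be pasted straight into the pg_dump command line, for example (a sketch; the chunk names below are made up for illustration):
pg_dump -U user -F t TestDB \
  --exclude-table-data='_timescaledb_internal._hyper_1_1_chunk' \
  --exclude-table-data='_timescaledb_internal._hyper_1_2_chunk' \
  > TestDB_Backup.tar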
Alternatively, you can find hypertable_id and exclude data from all chunk tables prefixed with the hypertable id. Find hypertable_id from catalog table _timescaledb_catalog.hypertable:
SELECT id
FROM _timescaledb_catalog.hypertable
WHERE schema_name = 'mySchema' AND table_name = 'hyper1';
Let's say that the id is 2. Then dump the database according to the instructions:
pg_dump -U user -Fc -f TestDB_Backup.bak \
--exclude-table-data='_timescaledb_internal._hyper_2*' TestDB
I followed the manual on: https://docs.timescale.com/v1.0/using-timescaledb/backup
When I dump it into a binary file everything works out as expected (I can restore it easily).
However, when I dump it into plain-text SQL, the INSERTs are generated against the internal chunk tables. Is it possible to have the dump create the INSERTs against the hypertable itself?
Say I have an 'Auto' table with columns id, brand, speed
and with only one row: 1,Opel,170
dumping into SQL will produce something like this:
INSERT INTO _timescaledb_catalog.hypertable VALUES ...
INSERT INTO _timescaledb_internal._hyper_382_8930_chunk VALUES (1, 'Opel',170);
What I need is this (and let TS do the work in the background):
INSERT INTO Auto VALUES (1,'Opel',170);
Is that possible somehow? (I know I can exclude tables from pg_dump but that wouldn't create the needed insertion)
Beatrice, unfortunately pg_dump will dump commands that mirror the underlying implementation of Timescale. For example, _hyper_382_8930_chunk is a chunk underlying the Auto hypertable that you have.
Might I ask why you don't want pg_dump to behave this way? The SQL file that pg_dump creates is intended to be replayed by psql (or by pg_restore for the binary formats), so as long as you dump and restore and end up with the correct state, there is no problem with dump/restore.
Perhaps you are asking a different question?
I can't find any system tables, even when joining several together, that show how many records were loaded by a COPY statement. I've looked at a bunch of tables in the pg_catalog schema but haven't found anything.
The system table STL_LOAD_COMMITS shows how many records were loaded per file, but it does not provide a simple way to associate the load with the table being loaded (only the query ID is provided).
However, COPY returns the loaded row count to the client in a consistent way that can be captured and parsed. For example:
INFO: Load into table 'my_table' completed, 7304953 record(s) loaded successfully.
I submit my loads in a bash script and capture this return line. For instance, with my COPY statement in a variable called sql:
copy=`psql -v ON_ERROR_STOP=1 -h $3 -d "$4" -c "$sql"`;
A new variable copy will be created that contains the INFO… line.
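If you also want the bare number, it can be pulled out of that captured line, e.g. (a minimal sketch, assuming the INFO format shown above; adjust the sed pattern if your cluster words it differently):
# extract "7304953" from "... completed, 7304953 record(s) loaded successfully."
rows=$(printf '%s\n' "$copy" | sed -n 's/.*completed, \([0-9][0-9]*\) record(s).*/\1/p')
echo "Loaded $rows rows"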
If you are using the newline character as the row delimiter in your load file, you can query the table STL_LOAD_COMMITS and check the lines_scanned column, which gives you the number of rows copied into the table.
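For example, something like this lists the per-file counts for a given load (a sketch; substitute the query ID of your COPY, which you can get from pg_last_copy_id() when run in the same session, or by searching STL_QUERY):
psql -h <host> -d <db> -c "
    SELECT query, TRIM(filename) AS filename, lines_scanned
      FROM stl_load_commits
     WHERE query = <query_id>
     ORDER BY filename;"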
I have a postgres database, I am trying to backup a table with :
pg_dump --data-only --table=<table> <db> > dump.sql
Then days later I am trying to overwrite it (basically I want to erase all the data and load the data from my dump) by:
psql -d <db> -c --table=<table> < dump.sql
But it doesn't overwrite; it appends to the existing data without deleting anything.
Any advice would be awesome, thanks!
You have basically two options, depending on your data and fkey constraints.
If there are no fkeys to the table, then the best thing to do is to truncate the table before loading it. Note that truncate behaves a little oddly in transactions, so the best thing to do is (in a transaction block):
Lock the table
Truncate
Load
This will avoid other transactions seeing an empty table.
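A sketch of that sequence in psql, assuming the dump was taken with --data-only as above and the target table is called mytable (substitute your own database and table names):
psql -d <db> <<'SQL'
BEGIN;
-- take a lock so other transactions never see the table empty
LOCK TABLE mytable IN ACCESS EXCLUSIVE MODE;
TRUNCATE mytable;
-- replay the data-only dump inside the same transaction
\i dump.sql
COMMIT;
SQL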
If you have fkeys then you may want to load into a temporary table and then do an upsert. In this case you may still want to lock the table to avoid a race condition if it is possible that other transactions may want to write to it (again, in a transaction block):
Load data into a temporary table
Lock the destination table (optional, see above)
Use a writable CTE to "upsert" into the table (a sketch follows below).
Use a separate delete statement to delete data from the table.
Stage 3 is a little tricky. You might need to ask a separate question about it, but basically it has two steps (write this in consultation with the docs):
Update existing records
Insert non-existing records
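A minimal sketch of such a writable CTE, assuming a destination table mytable(id, val) with primary key id and the freshly loaded rows sitting in a staging table mytable_stage (all names here are made up for illustration):
psql -d <db> <<'SQL'
WITH updated AS (
    -- step 1: update rows that already exist in the destination
    UPDATE mytable m
       SET val = s.val
      FROM mytable_stage s
     WHERE m.id = s.id
 RETURNING m.id
)
-- step 2: insert the staged rows that were not updated above
INSERT INTO mytable (id, val)
SELECT s.id, s.val
  FROM mytable_stage s
 WHERE s.id NOT IN (SELECT id FROM updated);
SQL
On PostgreSQL 9.5 or later, INSERT ... ON CONFLICT DO UPDATE is usually a simpler way to express the same upsert.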
Hope this helps.
I've been using both mysql and mysqldump to teach myself how to get data out of one database, but I consider myself a moderate MySQL newbie, just for the record.
What I'd like to do is get a sub-set of one database into a brand new database on a different server, so I need to produce both the db/table creation SQL and the INSERT statements that populate the data.
Let's say my original db has 10 tables with 100 rows each. My new database will have 4 of those tables (all original columns), but a further-refined dataset of 40 rows each. Those 40 rows are isolated with some not-so-short SELECT statements, one for each table.
I'd like to produce .sql file(s) that I can call from mysql to load/insert my exported data. How can I generate those sql files? I have HEARD that you can call a select statement from mysqldump, but haven't seen relevant examples with select statements as long as mine.
Right now I can produce sql output that is just the results set with column names, but no insert code, etc.
Assistance is GREATLY appreciated.
You will probably have to use mysqldump to dump your tables one at a time, using the where clause:
-w, --where='where-condition'
Dump only selected records. Note that quotes are mandatory:
"--where=user='jimf'" "-wuserid>1" "-wuserid<1"
For example:
mysqldump database1 table1 --where='rowid<10'
See docs: http://linux.die.net/man/1/mysqldump
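If the rows are isolated by a longer SELECT, the condition can usually be folded into --where too, including subqueries against other tables in the same database, e.g. (a sketch with made-up table and column names):
mysqldump -u user -p database1 table1 \
  --where="customer_id IN (SELECT id FROM customers WHERE region = 'EU' AND active = 1)" \
  > table1_subset.sql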
mysqldump each table with a where clause like Dennis said above, one table at a time, and merge the scripts with cat:
cat customer.db order.db list.db price.db > my_new_db.db
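The merged file can then be loaded into the new database on the other server, e.g. (assuming a database named newdb already exists there):
mysql -u user -p newdb < my_new_db.db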