How can I combine multiple Postgres database dumps into a single database? - postgresql

I have ~100 Postgres .dump files from different sources. They all have the same schema, just a single table, and a few hundred to a few hundred thousand rows each. However, the data was collected at different locations and now needs to be combined.
So I'd like to merge all the rows from all the databases into one single database, ignoring the ID key. What would be a decent way to do this? I may collect more data in the future from more sources, so it's likely to be a process I need to repeat.

If needed, use pg_restore to convert the dumps into SQL.
Run the SQL dump through
sed '/^COPY .* FROM stdin;$/,/^\\.$/ p;d'
As there is only one table in your data, that will give you the COPY command and the rows needed to load it. Send that to your database to load the data.
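A minimal sketch of that pipeline, assuming custom-format dumps named *.dump and a target database called combined (both placeholders):

#!/bin/bash
# For each dump: emit its data section as SQL, keep only the COPY block,
# and stream it straight into the target database.
for f in *.dump; do
    pg_restore --data-only "$f" \
        | sed '/^COPY .* FROM stdin;$/,/^\\.$/ p;d' \
        | psql combined
done

If the ID column is a primary key and values collide across sources, one option is to load each COPY block into a staging table with the same columns, then INSERT ... SELECT every column except the ID into the real table so its sequence assigns fresh keys.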

Related

How to replicate a Postgres DB with only a sample of the data

I'm attempting to mock a database for testing purposes. What I'd like to do is given a connection to an existing Postgres DB, retrieve the schema, limit the data pulled to 1000 rows from each table, and persist both of these components as a file which can later be imported into a local database.
pg_dump doesn't seem to fulfill my requirements, as there's no way to tell it to retrieve only a limited number of rows from each table; it's all or nothing.
COPY/\copy commands can help fill this gap; however, there doesn't seem to be a way to copy data from multiple tables into a single file. I'd rather avoid having to create a separate file per table. Is there a way to work around this?
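One rough workaround, sketched under the assumption of a source database sourcedb, an output file sample.sql, and tables in the public schema (all placeholder names): dump the schema once with pg_dump -s, then append one COPY ... FROM stdin block per table, each fed by a \copy of a LIMIT query.

#!/bin/bash
# Schema first, then a limited COPY block per table, all appended to one file.
OUT=sample.sql
pg_dump -s sourcedb > "$OUT"

psql -At -d sourcedb -c "SELECT quote_ident(schemaname) || '.' || quote_ident(tablename) FROM pg_tables WHERE schemaname = 'public'" |
while read -r tab; do
    echo "COPY $tab FROM stdin;" >> "$OUT"
    psql -d sourcedb -c "\copy (SELECT * FROM $tab LIMIT 1000) TO STDOUT" >> "$OUT"
    echo '\.' >> "$OUT"
done

The resulting file restores with psql -f sample.sql; note that foreign-key ordering between the sampled tables is not handled here.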

How to save a database with all tables limited to 1000 rows only

I have a database with 20 tables. Currently, we are using the pg_dump command daily to archive our database.
A few tables in this database are very big. We are working on making a light version of this database for testing purposes and small tickets.
So, I need a way to use the pg_dump command and save all tables with only 1000 rows in each. I tried to find anything like that on Google, but without success.
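Since only a few tables are big, one possible approach (the names bigtable1, bigtable2, and proddb below are placeholders) is to dump everything except the big tables' data, then append a 1000-row sample of each big table:

#!/bin/bash
# Full dump minus the big tables' contents, then a limited COPY block for each.
OUT=light_dump.sql
pg_dump --exclude-table-data=bigtable1 --exclude-table-data=bigtable2 proddb > "$OUT"

for tab in bigtable1 bigtable2; do
    echo "COPY $tab FROM stdin;" >> "$OUT"
    psql -d proddb -c "\copy (SELECT * FROM $tab LIMIT 1000) TO STDOUT" >> "$OUT"
    echo '\.' >> "$OUT"
done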

PG_DUMP to Skip Certain Items in PG_LARGEOBJECT?

I am newer to PostgreSQL than Oracle. So, I'm going to explain what I do in Oracle and then ask if anyone knows if there is a way to do this in PostgreSQL.
I have over 300 tables in Oracle. Some of them contain LOBs. Two of the ones I know about, which consume a ton of space to house PDFs, are called JP_PDFS and JP_PRELIMPDFS. When I need a copy of this database DMP to send to someone else, I don't need the contents of these tables for the majority of troubleshooting steps. So, I can export this database in two DMP files: one with the exclude directive and one with the include + content directives:
expdp full=n schemas=mySchema directory=DMPs dumpfile=mySchema_PDFS_schema.dmp logfile=expdp1.log include=TABLE:"IN('JP_PDFS','JP_PRELIMPDFS')" content=metadata_only
expdp full=n schemas=mySchema directory=DMPs dumpfile=mySchema_noPDFS.dmp logfile=expdp2.log exclude=TABLE:"IN('JP_PDFS','JP_PRELIMPDFS')"
Unfortunately, in PostgreSQL all LOBs are stored in PG_LARGEOBJECT. The references/pointers to the actual large-object rows that make up each LOB are stored in the aforementioned tables. But there ARE other tables with LOBs that I DO need exported to the .backup file with pg_dump.
What I want is a way to do in PostgreSQL what I do in the Oracle world. I know how to export only the schemas for JP_PDFS and JP_PRELIMPDFS. But is there a way to tell pg_dump not to include the objects in PG_LARGEOBJECT that are referenced from JP_PDFS and JP_PRELIMPDFS?
Thanks!
No, large objects don't “belong” to anybody. Either all of them are dumped (for a dump of the complete database, or if the -b option of pg_dump was used) or none.
Large objects are cumbersome and require a special API. If the size of your binary data doesn't exceed 1GB, consider using the data type bytea for them. That is much easier to handle and will work like you expect.
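As an illustration of that suggestion (table and column names below are made up, not from the schema above), the PDFs could live inline as bytea in an ordinary table; a plain pg_dump --exclude-table-data can then skip exactly that table's contents, much like the Oracle metadata-only export:

-- Store each PDF inline as bytea instead of as a large object.
CREATE TABLE jp_pdfs_bytea (
    id       bigserial PRIMARY KEY,
    filename text NOT NULL,
    pdf      bytea            -- the document itself, up to 1 GB per value
);

-- Schema only for this table, full data for everything else:
--   pg_dump --exclude-table-data=jp_pdfs_bytea mydb > mydb_noPDFS.backup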

Most efficient way to extract tables from Redshift?

I have a large table (~1e9 rows, ~20 columns) in an AWS Redshift instance. I would like to extract this entire table through the PostgreSQL interface in order to pipe the data into another columnar store. Ideally, the columns would be extracted one at a time while maintaining an identical row ordering, as that would facilitate a lot of work on the receiving (columnar) end.
How can I ensure that the series of SQL queries stay exactly aligned with each other? Thanks!
PS: I am aware of the UNLOAD-through-S3 option, but I am seeking a PostgreSQL option.
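In SQL, result ordering is only guaranteed by an explicit ORDER BY, so the usual way to keep separate extracts aligned is to order every query by the same unique key. A sketch (big_table, row_id, col_a, col_b are placeholder names; row_id is assumed unique):

-- Row N of every result set refers to the same underlying row, because each
-- query sorts on the same unique key; carrying the key along also lets the
-- receiving side verify the alignment.
SELECT row_id, col_a FROM big_table ORDER BY row_id;
SELECT row_id, col_b FROM big_table ORDER BY row_id;
-- ...one query per column, all with the identical ORDER BY clause.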

Dump Postgres 9.3 data with indexes

When using
pg_dump --section post-data
I get a dump that contains the definitions of the indexes but not the index data. As the database I'm working on is really big and complex, recreating the indexes takes a lot of time.
Is there a way to get the index data into my dump, so that I can restore an actually working database?
There is no way to include index data in a logical dump (pg_dump).
There's no way to extract index data from the SQL level, where pg_dump operates, nor any way to write it back. Indexes refer to the physical structure of the table (tuple IDs by page and offset) in a way that isn't preserved across a dump and reload anyway.
You can use a low-level physical copy with pg_basebackup if you want to copy the whole DB, indexes and all. Unlike a pg_dump, you can't restore this to a different PostgreSQL version, you can't dump just one database, etc.; it's all or nothing.
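A minimal pg_basebackup invocation might look like this (host, user, and target directory are placeholders; it requires a replication connection and the same major PostgreSQL version on the restoring side):

# Physical copy of the whole cluster, indexes included.
pg_basebackup -h dbhost -U replicator -D /backups/base -X stream -P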