When using pg_dump --section post-data, I get a dump that contains the definitions of the indexes but not the index data. As the database I'm working on is really big and complex, recreating the indexes takes a lot of time.
Is there a way to get the index data into my dump, so that I can restore an actually working database?
There is no way to include index data in a logical dump (pg_dump).
There's no way to extract index data from the SQL level, where pg_dump operates, nor any way to write it back. Indexes refer to the physical structure of the table (tuple IDs by page and offset) in a way that isn't preserved across a dump and reload anyway.
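To see why copied index entries would be useless after a reload, here is a hypothetical toy model (not PostgreSQL's real page layout). A heap page holds a fixed number of row slots, and an index maps each key to a (page, offset) tuple ID. A logical dump carries only the live rows, so reloading them packs the rows differently and the old tuple IDs point at the wrong slots.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Toy model only, not PostgreSQL's real on-disk format.
TID = Tuple[int, int]             # (page number, offset within the page)
Heap = List[List[Optional[str]]]  # None marks a dead (deleted) tuple
ROWS_PER_PAGE = 3                 # hypothetical, tiny on purpose


def load_heap(rows: Sequence[Optional[str]]) -> Heap:
    """Pack rows into pages in order, as a fresh reload would."""
    return [list(rows[i:i + ROWS_PER_PAGE]) for i in range(0, len(rows), ROWS_PER_PAGE)]


def build_index(heap: Heap) -> Dict[str, TID]:
    """Map each live row's key to the TID where it currently lives."""
    index: Dict[str, TID] = {}
    for page_no, page in enumerate(heap):
        for offset, row in enumerate(page):
            if row is not None:
                index[row] = (page_no, offset)
    return index


def fetch(heap: Heap, tid: TID) -> Optional[str]:
    """Follow a TID to whatever row sits in that slot."""
    page_no, offset = tid
    return heap[page_no][offset]


# Original table: "b" was deleted, leaving a dead slot on page 0.
old_heap = load_heap(["a", "b", "c", "d", "e"])
old_heap[0][1] = None
old_index = build_index(old_heap)

# A logical dump only contains live rows; reloading packs them tightly.
live_rows = [row for page in old_heap for row in page if row is not None]
new_heap = load_heap(live_rows)

print(fetch(new_heap, old_index["d"]))              # e  (old TID, wrong row)
print(fetch(new_heap, build_index(new_heap)["d"]))  # d  (rebuilt index)
```

Rebuilding the index from the reloaded rows is the only way to get TIDs that match the new layout, which is what pg_restore does.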
You can make a low-level physical copy with pg_basebackup if you want to copy the whole cluster, indexes and all. Unlike a pg_dump, you can't restore this to a different PostgreSQL major version, and you can't copy just one database; it's all or nothing.
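A minimal sketch of such a physical copy driven from Python, assuming a reasonably recent PostgreSQL, pg_basebackup on PATH, and a role with REPLICATION privilege that pg_hba.conf allows to connect. The host, user and target directory are hypothetical placeholders. The function only builds the argument list, so nothing runs until you hand it to subprocess.

```python
import subprocess
from typing import List


def basebackup_command(target_dir: str, host: str, user: str, port: int = 5432) -> List[str]:
    """Build a pg_basebackup call that copies the whole cluster, indexes included."""
    return [
        "pg_basebackup",
        "-h", host,
        "-p", str(port),
        "-U", user,         # role needs the REPLICATION privilege
        "-D", target_dir,   # must be empty or not exist yet
        "-X", "stream",     # also stream the WAL needed to make the copy consistent
        "-P",               # show progress
    ]


def run_basebackup(cmd: List[str]) -> None:
    """Actually take the backup; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values; this only prints the command.
    print(" ".join(basebackup_command("/var/backups/pg_base", "db.example.com", "replicator")))
```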
Related
We have a system which stores data in a postgres database. In some cases, the size of the database has grown to several GBs.
When this system is upgraded, the data in that database is backed up and then restored into the new database. Because of the large amount of data, index creation takes a long time (~30 minutes) during the restore, which delays the upgrade.
Is there a way to split the data copy and the indexing into two steps, so that the data is copied first to complete the upgrade and the indexes are built later in the background?
Thanks!
There's no built-in way to do it with pg_dump and pg_restore. But pg_restore's -j option helps a lot.
There is CREATE INDEX CONCURRENTLY. But pg_restore doesn't use it.
It would be quite nice to be able to restore everything except the secondary indexes that no FK constraint depends on, and then build those in a separate phase using CREATE INDEX CONCURRENTLY. But no such support currently exists; you'd have to write it yourself.
You can, however, filter the table of contents used by pg_restore (pg_restore -l writes it out, and pg_restore -L restores only the entries in an edited copy), so you could possibly do some hacky scripting to do the needed work.
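One shape that scripting could take, sketched under assumptions: dump with pg_dump -Fc (or -Fd), write the TOC with pg_restore -l, split it into one list without INDEX entries and one with only them, then run pg_restore -L once per list. The regex assumes the usual pg_restore -l line layout (dump ID; table OID, object OID, object type, schema, name, owner). It is deliberately naive: it does not check whether an FK constraint depends on a unique index, and pg_restore will still issue plain CREATE INDEX for the second list, not CREATE INDEX CONCURRENTLY.

```python
import re
from typing import List, Tuple

# Matches TOC entries such as "3051; 1259 16410 INDEX public orders_customer_idx app".
# "INDEX ATTACH" entries (partitioned indexes) match too and go with the indexes.
INDEX_ENTRY = re.compile(r"^\s*\d+;\s+\d+\s+\d+\s+INDEX\s")


def split_toc(toc_text: str) -> Tuple[List[str], List[str]]:
    """Split `pg_restore -l` output into (everything else, index entries).

    Comment lines (starting with ';') stay in the first list.
    """
    main: List[str] = []
    indexes: List[str] = []
    for line in toc_text.splitlines():
        (indexes if INDEX_ENTRY.match(line) else main).append(line)
    return main, indexes


def write_lists(toc_text: str, main_path: str, index_path: str) -> None:
    """Write the two list files that pg_restore -L can consume."""
    main, indexes = split_toc(toc_text)
    with open(main_path, "w") as f:
        f.write("\n".join(main) + "\n")
    with open(index_path, "w") as f:
        f.write("\n".join(indexes) + "\n")
```

Typical use: pg_restore -l db.dump > toc.txt, run write_lists on its contents, restore with pg_restore -L main.list -d newdb db.dump to finish the upgrade, and later run pg_restore -L index.list -d newdb db.dump (all file and database names here are hypothetical).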
pg_dump has an option, --section, to separate the data from the index creation. The pre-data section refers to the schema, and post-data refers to indexes, triggers and constraints.
From the docs,
--section=sectionname
Only dump the named section. The section name can be pre-data, data, or post-data. This option can be specified more than once to select multiple sections. The default is to dump all sections.
The data section contains actual table data, large-object contents, and sequence values. Post-data items include definitions of indexes, triggers, rules, and constraints other than validated check constraints. Pre-data items include all other data definition items.
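A sketch of how the sections could split an upgrade, assuming a custom-format archive db.dump and an already created target database newdb (both hypothetical): run pre-data and data first so the application can start, and post-data later. Keep in mind that post-data also holds primary keys and unique and foreign-key constraints, so those are not enforced until the last phase has run.

```python
from typing import List

# Order matters: schema, then table contents, then indexes/constraints/triggers.
SECTIONS = ("pre-data", "data", "post-data")


def restore_commands(archive: str, dbname: str, jobs: int = 4) -> List[List[str]]:
    """One pg_restore call per section, in the order they must run."""
    return [
        ["pg_restore", f"--section={section}", "-j", str(jobs), "-d", dbname, archive]
        for section in SECTIONS
    ]


if __name__ == "__main__":
    for cmd in restore_commands("db.dump", "newdb"):
        print(" ".join(cmd))
```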
Maybe this will help :)
I'm trying to move a PostgreSQL database between two servers. There's rsync connectivity between them.
My tables are large, around 200GB in total with nearly 800 million rows across 15 tables. For this volume of data, I found that the COPY command for the key tables was far faster than the usual pg_dump. However, it only dumps the data.
Is there a way to dump only the data this way, but then also dump the database creation script, which will create the tables and, separately, the indexes? I'm thinking of the following sequence:
1. COPY all tables into the file system. Just 15 files, therefore.
2. RSYNC these files to the new server.
3. On the new server, create a fresh PG database: tables, foreign keys etc., but no indexes yet.
4. In this fresh PG database, COPY FROM all the tables, one by one. Slightly painful but worth it.
5. Then create the indexes, all in one go.
I'm looking for ways to have PG on the old server dump the scripts for #3 and #5. The complication in the PG world is the OIDs for tables etc. Will these affect the tables and data on the new server? The pg_dump reference is a bit cryptic.
For #3, just the creation of the "schema" and tables, I could do this:
pg_dump --schema-only mybigdb
Will this carry over all the OIDs and other complications, and is it therefore a good way to complete step #3?
For #5 alone (just the indexes etc.), I'm not sure what I'd do. Will I have to look inside the "schema only" file and separate out the indexes?
Appreciate any pointers.
Funny, the sequence you are describing is a pretty good description of what pg_dump/pg_restore does (with some oversights: e.g., for performance reasons, you wouldn't define a foreign key before you restore the data).
So I think that you should use pg_dump instead of reinventing the wheel.
You can get better performance out of pg_dump as follows:
Use the directory format (-Fd) and parallelize the COPY commands with -j number-of-jobs.
Restore the dump with pg_restore, using -j number-of-jobs to run several parallel workers for the data restore and index creation.
The only drawback is that with the directory format you have to wait for pg_dump to finish before you can start pg_restore. If that is a killer, you could use the custom format (-Fc) and pipe the result straight into pg_restore. That gives up parallelism on both sides, though: pg_dump -j only works with the directory format, and pg_restore -j needs its input to be a regular file or directory, not a pipe.
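The directory-format route from this answer, written as two argument lists. The database names, dump directory and job count are hypothetical. Across two servers, as in the question, you would run the dump on the old host, rsync dump_dir over, and run the restore on the new host against an already created database.

```python
import subprocess
from typing import List, Tuple


def parallel_dump_restore(src_db: str, dst_db: str, dump_dir: str,
                          jobs: int) -> Tuple[List[str], List[str]]:
    """Build the pg_dump -Fd -j and pg_restore -j commands."""
    dump = ["pg_dump", "-Fd", "-j", str(jobs), "-f", dump_dir, src_db]
    restore = ["pg_restore", "-j", str(jobs), "-d", dst_db, dump_dir]
    return dump, restore


def run(src_db: str, dst_db: str, dump_dir: str, jobs: int = 8) -> None:
    """Run both steps locally; pg_dump must finish before pg_restore starts."""
    dump, restore = parallel_dump_restore(src_db, dst_db, dump_dir, jobs)
    subprocess.run(dump, check=True)     # parallel COPY of the tables
    subprocess.run(restore, check=True)  # parallel data load and index builds
```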
While reading the mongodump documentation, I came across this information.
"mongodump only captures the documents in the database in its backup data and does not include index data. mongorestore or mongod must then rebuild the indexes after restoring data."
Considering that indexes are also a critical piece of the database puzzle and have to be rebuilt after a restore, why doesn't mongodump have an option to include the indexes in the backup?
I get that not backing up indexes by default has two advantages:
1. We save time which would otherwise be required for backup and restore of indexes.
2. We save space required for storing the backups.
But why not have it as an option at all?
mongodump creates a binary export of data from a MongoDB database (in BSON format). The index definitions are backed up in <collection>.metadata.json files, so mongorestore can recreate the original data & indexes.
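To see which indexes mongorestore will rebuild, you can read the definitions back out of a metadata file. This is a sketch that assumes the layout mongodump writes: a top-level "indexes" array whose entries carry "key" and "name" fields. Every other field is ignored.

```python
import json
from typing import Any, Dict, List, Tuple


def index_definitions(metadata_text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (index name, key pattern) pairs from one <collection>.metadata.json."""
    metadata = json.loads(metadata_text)
    # Assumed layout: {"indexes": [{"v": ..., "key": {...}, "name": ...}, ...], ...}
    return [(ix["name"], ix["key"]) for ix in metadata.get("indexes", [])]
```

Depending on the mongodump version, numbers may show up as extended-JSON wrappers such as {"$numberInt": "1"} instead of plain integers. This sketch passes them through unchanged.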
There are two main reasons that the actual indexes cannot be backed up with mongodump:
Indexes point to locations in the data files. The data files do not exist if you are only exporting the documents in the data files (rather than taking a full file copy of the data files).
The format of indexes on disk is storage-engine specific, whereas mongodump is intended to be storage-engine independent.
If you want a full backup of data & indexes, you need to backup by copying the underlying data files (typically by using filesystem or EBS snapshots). This is a more common option for larger deployments, as mongodump requires reading all data into the mongod process (which will evict some of your working set if your database is larger than memory).
I've got a Postgres 9.0 database that I frequently take data dumps of.
This database has a lot of indexes, and every time I restore a dump, Postgres starts a background "vacuum cleaner" task (is that right?). That task consumes a lot of processing time and memory to recreate the indexes of the restored dump.
My question is:
Is there a way to dump the database data and the indexes of that database?
If there is a way, is it worth the effort (I mean, will dumping the data with the indexes perform better than the vacuum cleaner)?
Oracle has the "data pump" utility, a faster alternative to imp and exp. Does Postgres have something similar?
Thanks in advance,
Andre
If you use pg_dump twice, once with --schema-only, and once with --data-only, you can cut the schema-only output in two parts: the first with the bare table definitions and the final part with the constraints and indexes.
Something similar can probably be done with pg_restore.
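Here is a sketch of that cut. It assumes plain-format pg_dump --schema-only output, where a comment header such as "-- Name: orders_pkey; Type: CONSTRAINT; Schema: public; Owner: app" precedes every object. Objects whose Type is one pg_dump treats as post-data go to the second part. Everything else stays in the first part, in its original order. (From PostgreSQL 9.2 onward, pg_dump --section=pre-data and --section=post-data do this split for you.)

```python
import re
from typing import List, Tuple

# Object types assumed to belong to pg_dump's post-data section.
POST_DATA_TYPES = {"INDEX", "INDEX ATTACH", "CONSTRAINT", "FK CONSTRAINT", "TRIGGER", "RULE"}
# pg_dump precedes each object with a header like "-- Name: x; Type: INDEX; ..."
HEADER = re.compile(r"^-- Name: .*?; Type: ([^;]+);")


def split_schema(dump_text: str) -> Tuple[str, str]:
    """Split schema-only dump text into (pre part, post part), keeping object order."""
    lines = dump_text.splitlines(keepends=True)
    objects: List[Tuple[int, str]] = []  # (first line of the object's chunk, object type)
    for i, line in enumerate(lines):
        match = HEADER.match(line)
        if match:
            # Include the "--" line that opens the comment block above the header.
            start = i - 1 if i > 0 and lines[i - 1].strip() == "--" else i
            objects.append((start, match.group(1)))

    first = objects[0][0] if objects else len(lines)
    pre: List[str] = lines[:first]  # preamble: SET commands etc.
    post: List[str] = []
    ends = [start for start, _ in objects[1:]] + [len(lines)]
    for (start, obj_type), end in zip(objects, ends):
        chunk = lines[start:end]
        (post if obj_type in POST_DATA_TYPES else pre).extend(chunk)
    return "".join(pre), "".join(post)
```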
Best practice is probably to:
1. restore the schema without indexes, and possibly without constraints,
2. load the data,
3. then create the constraints,
4. and create the indexes.
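The same order written as a psql script, sketched under assumptions. pre_sql and post_sql are files that hold the table definitions and the constraint/index definitions respectively. Each table has a COPY-format data file. All names are hypothetical. The script uses psql's \i to run a file and \copy to load client-side data, and it ends with ANALYZE so the planner has fresh statistics.

```python
from typing import Dict


def load_script(pre_sql: str, post_sql: str, data_files: Dict[str, str]) -> str:
    """Build a psql script: schema, data, then constraints/indexes, then ANALYZE."""
    lines = [r"\set ON_ERROR_STOP on", rf"\i {pre_sql}"]
    for table, path in data_files.items():
        # \copy reads the file on the client, so it need not be on the DB server.
        lines.append(rf"\copy {table} FROM '{path}'")
    lines.append(rf"\i {post_sql}")  # constraints and indexes, built once over full tables
    lines.append("ANALYZE;")
    return "\n".join(lines) + "\n"
```

Save the result as, say, load.sql and run it with psql -d newdb -f load.sql.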
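If an index already exists, a bulk load makes PostgreSQL write to both the table and the index for every row, which is slower than building the index once over the loaded data. A bulk load also leaves your table statistics out of date, and creating the indexes afterwards does not refresh the planner's column statistics either, so run ANALYZE once the load is finished.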
We store scripts that create indexes and scripts that create tables in different files under version control. This is why.
In your case, changing autovacuum settings might help you. You might also consider disabling autovacuum for some tables or for all tables, but that might be a little extreme.
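If you go the per-table route, the autovacuum_enabled storage parameter switches autovacuum off for a table, and RESET returns it to the server-wide setting afterwards. The sketch below generates those statements for a list of hypothetical table names. Quoting is simplified to treating each name as one unqualified identifier and doubling any embedded double quotes.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Minimal identifier quoting: wrap in double quotes, double any inside."""
    return '"' + name.replace('"', '""') + '"'


def autovacuum_toggle(tables: List[str], enabled: bool) -> List[str]:
    """ALTER TABLE statements that disable autovacuum, or reset it to the default."""
    stmts = []
    for t in tables:
        if enabled:
            # RESET falls back to the server-wide autovacuum setting.
            stmts.append(f"ALTER TABLE {quote_ident(t)} RESET (autovacuum_enabled);")
        else:
            stmts.append(f"ALTER TABLE {quote_ident(t)} SET (autovacuum_enabled = false);")
    return stmts
```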
When I make a backup in Postgres 8, it only backs up the schemas and data, not the indexes. How can I include the indexes?
Sounds like you're making a backup using the pg_dump utility. That saves the information needed to recreate the database from scratch. You don't need to dump the information in the indexes for that to work: you have the schema, and the schema includes the index definitions. When you load this backup, the indexes are rebuilt from the data; pg_dump emits the CREATE INDEX statements after the table data, so each index is built once over the loaded table.
If you want to do a physical backup of the database blocks on disk, which will include the indexes, you need to do a PITR backup instead. That's a much more complicated procedure, but the resulting backup will be instantly usable. The pg_dump style backups can take quite some time to restore.
If I understand you correctly, you want a dump of the indexes as well as the original table data.
pg_dump will output CREATE INDEX statements at the end of the dump, which will recreate the indexes in the new database.
You can do a PITR backup as suggested by Greg Smith, or stop the database server and copy its data directory.