Why doesn't mongodump back up indexes? - mongodb

While reading the mongodump documentation, I came across this information.
"mongodump only captures the documents in the database in its backup data and does not include index data. mongorestore or mongod must then rebuild the indexes after restoring data."
Considering that indexes are also a critical piece of the database puzzle and are required to be rebuilt after a restore, why doesn't mongodump have an option to take backups with indexes?
I get that there are two advantages of not backing up indexes as a default option:
1. We save time which would otherwise be required for backup and restore of indexes.
2. We save space required for storing the backups.
But why not have it as an option at all?

mongodump creates a binary export of data from a MongoDB database (in BSON format). The index definitions are backed up in <dbname>.metadata.json files, so mongorestore can recreate the original data & indexes.
There are two main reasons that the actual indexes cannot be backed up with mongodump:
Index entries point to physical locations in the data files. Those locations are meaningless if you are only exporting the documents (rather than taking a full file copy of the data files).
The format of indexes on disk is storage-engine specific, whereas mongodump is intended to be storage-engine independent.
If you want a full backup of data & indexes, you need to back up by copying the underlying data files (typically using filesystem or EBS snapshots). This is the more common option for larger deployments, as mongodump requires reading all data into the mongod process (which will evict some of your working set if your database is larger than memory).
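As a sketch of the dump-and-restore cycle described above (the database name mydb and collection name mycollection are placeholders, not from the question):

```shell
# Dump only the 'mydb' database: documents go to .bson files,
# index definitions to <collection>.metadata.json files
mongodump --db=mydb --out=dump/

# Inspect the index definitions captured for one collection
cat dump/mydb/mycollection.metadata.json

# Restore: mongorestore re-inserts the documents and then
# rebuilds each index listed in the metadata files
mongorestore --db=mydb dump/mydb/
```

The metadata.json file is why no index *data* needs to be in the dump: the definitions alone are enough for the indexes to be rebuilt from the restored documents.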

Related

Postgres backup size is double the database size itself

I'm totally new to Postgres. I needed to take a backup of a Postgres database, and I used the command below:
pg_dump Live backup > Livebackup.bak
The backup completed. However, the backup file is roughly double the size of the original database (the database is 43GB and the backup is 86GB). I expected the backup file to be around the size of the database.
I have already done a full vacuum on this database, but the backup is still larger.
Per the pg_dump docs, the default format is "plain text". That means that pg_dump is generating a large SQL script which can then be imported using the psql command line.
Because it's plain text, any numeric or time types will take up much more space than they do in the database (where they can be stored as integers, floating point numbers, or in a binary format, etc). It's not a surprise to me that your backup file is larger than your database.
I recommend looking at the -F/--format and -Z/--compress options for pg_dump and figuring out the best method for your use case.
I routinely have backups which take about 10-15% of the database size, by using those flags.
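For example (a sketch; the database name Live comes from the question, the output filenames are assumptions):

```shell
# Custom format: compressed by default and restorable with pg_restore
pg_dump -Fc Live > Livebackup.dump

# Or keep the plain-text SQL format but compress it on the way out
pg_dump -Z 9 Live > Livebackup.sql.gz
```

Either option avoids storing numeric and timestamp columns as uncompressed text, which is where most of the size blow-up comes from.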

mongoimport without dropping the data first

I reset my database every night with a mongoimport command. Unfortunately, I understand that it drops the database first then fills it again.
This means that my database is being queried while half-filled. Is there a way to make the mongoimport atomic? This could be achieved by first filling another collection, dropping the first, then renaming the second.
Is that a builtin feature of mongoimport ?
Thanks,
It's unclear what behaviour you want from your nightly process.
If your nightly process is responsible for creating a new dataset then dropping everything first makes sense. But if your nightly process is responsible for adding to an existing dataset then that might suggest using mongorestore (without --drop) since mongorestore's behaviour is:
mongorestore can create a new database or add data to an existing database. However, mongorestore performs inserts only and does not perform updates. That is, if restoring documents to an existing database and collection and existing documents have the same value _id field as the to-be-restored documents, mongorestore will not overwrite those documents.
However, those concerns seem to be secondary to your need to import / restore into your database while it is still in use. I don't think either mongoimport or mongorestore are viable 'write mechanisms' for use when your database is online and available for reads. From your question, you are clearly aware that issues can arise from this but there is no Mongo feature to resolve this for you. You can either:
Take your system offline during the mongoimport or mongorestore and then bring it back online once that process is complete and verified
Use mongoimport or mongorestore to create a side-by-side database and then once this database is ready switch your application to read from that database. This is a variant of a Blue/Green or A/B deployment model.
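A sketch of the second, side-by-side option (the database names staging and live, the collection name products, and the input file are all hypothetical):

```shell
# Import tonight's data into a staging database, not the live one
mongoimport --db=staging --collection=products --file=products.json

# Once the staging data is verified, either repoint the application
# at 'staging', or move the collection into place. renameCollection
# with dropTarget replaces the old collection as part of the rename:
mongosh --eval 'db.adminCommand({
  renameCollection: "staging.products",
  to: "live.products",
  dropTarget: true
})'
```

Note that a cross-database rename has to copy the documents, so it is not instantaneous; repointing the application at the new database is the cleaner Blue/Green switch.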

Moving PostgreSQL - dump all data and table creation, but without indexes

I'm trying to move postgresql between two servers. There's rsync connectivity between the two servers.
My tables are large, around 200GB in total with nearly 800 million rows across 15 tables. For this volume of data, I found that COPY command for the key tables was far faster than the usual pg_dump. However, this only dumps the data.
Is there a way to dump only data this way, but also then dump the database creation script -- which will create the tables, and separately indexes? I'm thinking of the following sequence:
1. COPY all tables into the file system. Just 15 files, therefore.
2. rsync these files to the new server.
3. On the new server, create a fresh PG database: tables, foreign keys etc. But no indexes yet.
4. In this fresh PG database, COPY FROM all the tables, one by one. Slightly painful but worth it.
5. Then create the indexes, all in one go.
I'm seeking ways to get PG on the old server to dump scripts for #3 and #5. The complication in the PG world is the OIDs for tables etc. Will these affect the tables and data on the new server? The pg_dump reference is a bit cryptic in its help material.
For #3, just the creation of the "schema" and tables, I could do this:
pg_dump --schema-only mybigdb
Will this carry all the OIDs and other complications, thereby being a good way to complete step #3?
And for #5 alone, I'm not sure what I'd do. Just the indexes etc. Will I have to look inside the "schema only" file and separate out the indexes?
Appreciate any pointers.
Funny, the sequence you are describing is a pretty good description of what pg_dump/pg_restore does (with some oversights: e.g., for performance reasons, you wouldn't define a foreign key before you restore the data).
So I think that you should use pg_dump instead of reinventing the wheel.
You can get better performance out of pg_dump as follows:
Use the directory format (-Fd) and parallelize the COPY commands with -j number-of-jobs.
Restore the dump with pg_restore and use -j number-of-jobs for several parallel workers for data restore and index creation.
The only drawback is that you have to wait for pg_dump to finish before you can start pg_restore if you use the directory format. If that is a killer, you could use the custom format (-Fc) and pipe the result into pg_restore. That won't allow you to use -j with pg_dump, but you can still parallelize index creation and such with pg_restore -j.
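Concretely, the parallel dump-and-restore can be sketched as follows (the database name mybigdb comes from the question; the paths, hostname, and job counts are examples):

```shell
# Dump in directory format with 4 parallel workers
pg_dump -Fd -j 4 -f /backup/mybigdb.dir mybigdb

# Ship the dump directory to the new server
rsync -a /backup/mybigdb.dir/ newserver:/backup/mybigdb.dir/

# On the new server: create the target database, then restore
# with 4 parallel workers (data load and index builds in parallel)
createdb mybigdb
pg_restore -j 4 -d mybigdb /backup/mybigdb.dir
```

This gives you the COPY-per-table parallelism and the deferred index creation from the five-step plan, without hand-splitting the schema file.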

Dump Postgres 9.3 data with indexes

When using
pg_dump --section post-data
I get a dump that contains the definitions of the indexes but not the index data. As the database I'm working on is really big and complex, recreating the indexes takes a lot of time.
Is there a way to get the index data into my dump, so that I can restore an actually working database?
There is no way to include index data in a logical dump (pg_dump).
There's no way to extract index data from the SQL level, where pg_dump operates, nor any way to write it back. Indexes refer to the physical structure of the table (tuple IDs by page and offset) in a way that isn't preserved across a dump and reload anyway.
You can use a low-level disk copy via pg_basebackup if you want to copy the whole DB, indexes and all. Unlike a pg_dump, you can't restore this to a different PostgreSQL version, and you can't dump just one database; it's all or nothing.
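For example (a sketch; the target directory is an assumption):

```shell
# Physical, cluster-wide base backup that includes the index files
# -Ft writes tar archives, -z gzips them, -P reports progress
pg_basebackup -D /backup/base -Ft -z -P
```

The result restores to an immediately usable cluster with no index rebuild, but only on the same major PostgreSQL version and architecture.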

How can I backup everything in Postgres 8, including indexes?

When I make a backup in Postgres 8 it only backs up the schemas and data, but not the indexes. How can I do this?
Sounds like you're making a backup using the pg_dump utility. That saves the information needed to recreate the database from scratch. You don't need to dump the information in the indexes for that to work. You have the schema, and the schema includes the index definitions. If you load this backup, the indexes will be rebuilt from the data, the same way they were created in the first place: built as new rows are added.
If you want to do a physical backup of the database blocks on disk, which will include the indexes, you need to do a PITR backup instead. That's a much more complicated procedure, but the resulting backup will be instantly usable. The pg_dump style backups can take quite some time to restore.
If I understand you correctly, you want a dump of the indexes as well as the original table data.
pg_dump will output CREATE INDEX statements at the end of the dump, which will recreate the indexes in the new database.
You can do a PITR backup as suggested by Greg Smith, or stop the database and just copy the data files on disk.
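The "stop and copy" approach can be sketched as follows (the data directory path is an assumption; adjust it to your installation):

```shell
# Stop the server so the on-disk files are in a consistent state
pg_ctl -D /var/lib/postgresql/data stop

# Copy the entire data directory, indexes included
cp -a /var/lib/postgresql/data /backup/pgdata-copy

# Restart the server
pg_ctl -D /var/lib/postgresql/data start
```

Unlike PITR, this requires downtime, but the copy is a complete physical backup that needs no index rebuild on restore.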