I'm using mongodump to migrate a 200+ GB database to directoryPerDB. While testing a smaller dataset to find the optimal number of parallel dumps I could run (per this answer), my mongodump keeps resuming from where I left off.
So if I cancel it at 10% and then re-run it, it picks right back up at 10%.
I've tried deleting the dump folder, exporting to an archive file, changing my target and current folders, and restarting mongod, but I can't find any way to force mongodump to start over from the beginning on the collection.
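To illustrate, the per-collection parallel dumps I'm experimenting with look roughly like this (the database and collection names are placeholders):

mongodump --db mydb --collection coll1 --out ./dump &
mongodump --db mydb --collection coll2 --out ./dump &
wait    # each mongodump writes its own BSON + metadata files under ./dump/mydb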
Thanks in advance for your help!
Related
I accidentally deleted a Docker volume (mongo-data:/data/db). I have a copy of that folder, but now when I run docker-compose up the mongodb container doesn't start and exits with mongo_1 exited with code 14. More details of the error and the mongo-data folder are below. Can someone help me, please?
In docker-compose.yml:
volumes:
- ./mongo-data:/data/db
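For context, a minimal sketch of what the relevant service definition might look like (the service name and image tag are assumptions, not taken from the original compose file):

version: "3"
services:
  mongo:
    image: mongo:4.2           # use the same image version the container was running before
    volumes:
      - ./mongo-data:/data/db  # bind-mounts the host folder as the data directory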
Restore from backup files
A step-by-step process to repair the corrupted files from a failed MongoDB in a Docker container:
! Before you start, make a copy of the files. !
Make sure you know which version of the image was running in the container.
Spawn a new container to run the repair process as follows:
docker run -it -v <data folder>:/data/db <image-name>:<image-version> mongod --repair
Once the files are repaired, you can start the containers from the docker-compose file.
If the repair fails, it usually means that the files are corrupted beyond repair. There is still a chance to recover the data by exporting it as described here; a rough sketch of that export workflow follows.
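A rough sketch of salvaging the data by dumping it from a temporary container started on the repaired files (the container name, port mapping, and target host are assumptions; placeholders follow the same <...> convention as above):

docker run -d --name mongo-salvage -p 27017:27017 -v <data folder>:/data/db <image-name>:<image-version>
mongodump --host localhost --port 27017 --out ./salvage-dump        # dump everything that is still readable
mongorestore --host <new mongod host> --port 27017 ./salvage-dump   # load the dump into a fresh instance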
How to secure proper backup files
The database is constantly working with its files, so the files on disk are constantly changing. In addition, the database keeps some of the changes in internal memory buffers before they are flushed to the filesystem. Although database engines do a very good job of ensuring that the database can recover from an abrupt failure by using a 2-stage commit process (first update the transaction log, then the data file), a copy of the files taken at the wrong moment can be corrupted in a way that prevents the database from recovering.
The reason for such corruption is that the copy process is not aware of the database's write progress, which creates a race condition. In very simple words: while the database is in the middle of writing, the copy process creates a copy of the file(s) that is half-updated, and hence corrupted.
When the database writer is in the middle of writing to the files, we call them hot files. "Hot files" is a term from the OS perspective; MongoDB also uses the term "hot backup", which is from the MongoDB perspective and means the backup was taken while the database was running.
To take a proper snapshot (ensuring the files are cold), you need to follow the procedure explained here. In short, the db.fsyncLock() command issued during this process tells the database engine to flush all buffers and stop writing to the files. This makes the files cold, while the database itself remains hot, hence the difference between the terms hot files and hot backup. Once the copy is done, the database is told to resume writing to the filesystem by issuing db.fsyncUnlock().
Note that the process is more complex and can change with different versions of the database. The simplification here is only meant to illustrate the problem with file snapshots. To secure a proper and consistent backup, always follow the documented procedure for the database version that you use.
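A minimal sketch of the lock/copy/unlock sequence, assuming the legacy mongo shell and a data directory at /data/db (both are assumptions; adjust to your deployment):

mongo --eval "db.fsyncLock()"          # flush all buffers and block writes, making the files cold
cp -a /data/db /backups/db-snapshot    # copy the now-cold data files to the backup location
mongo --eval "db.fsyncUnlock()"        # tell the database to resume writing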
Suggested backup method
The preferred backup should always be the data dump method, since this assures that you can restore even in the case of an upgraded/downgraded database engine. MongoDB provides a very useful tool called mongodump that creates database backups by dumping the data instead of copying the files.
For more details on how to use the backup tools, as well as the other backup methods, read the MongoDB Backup Methods chapter of the MongoDB documentation.
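A minimal sketch of the dump/restore round trip, assuming a standalone instance on localhost:27017; the database name and paths are placeholders:

mongodump --host localhost --port 27017 --db mydb --out /backups/dump     # writes BSON data + metadata per collection
mongorestore --host localhost --port 27017 --db mydb /backups/dump/mydb   # restores the dumped database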
I have a database with two schemas. The first schema was built weeks ago and has been stable ever since. The second schema, under the same database on the same server, is populated through an ETL process from the original schema. I built the data twice (approx. 20 hours per build), and I can see the schema takes the 100 GB it requires on the hard drive. But as soon as I connect with pgAdmin 4 or DataGrip, the data gets truncated (deleted), freeing up the space it took. After the second build, before connecting to anything, I made a file-system-level backup (tar file).
For the first backup (3rd trial to keep the schema alive), I uncompressed the tar file and moved it into position, keeping the uncompressed version of the folder in place.
I connected with pgAdmin 4 and the data disappeared again. Then I edited the Postgres configuration's data directory to point at the folder where I initially uncompressed the tar file, to avoid a 2-hour copy and paste. I launched the Postgres server again and, boom, the schema's data was truncated again.
I have no clue how or why this happens. Any advice on where to look next time, before relaunching the server, to pinpoint where that truncate is fired from?
PS:
The tar file is a compressed copy of the "main" folder inside the .../postgresql/11/ folder.
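Roughly how that file-system-level backup was taken (the path shown is the default Debian/Ubuntu layout and is an assumption; the original path is abbreviated above):

sudo service postgresql stop                                      # the server must be stopped for the copy to be consistent
sudo tar -czf main_backup.tar.gz -C /var/lib/postgresql/11 main   # archive the data directory
sudo service postgresql start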
Thanks in advance.
I run the following command:
root#:/home/deploy# mongorestore --db=dbname --collection=collectionname pathtobackupfolder/collectionname.bson
Here's the output:
2016-07-16T00:08:03.513-0400 checking for collection data in pathtobackupfolder/collectionname.bson
2016-07-16T00:08:03.525-0400 reading metadata file from pathtobackupfolder/collectionname.bson
2016-07-16T00:08:03.526-0400 restoring collectionname from file pathtobackupfolder/collectionname.bson
Killed
What's going on? I can't find anything on Google or Stack Overflow about a mongorestore resulting in "Killed". The backup folder that I'm restoring from is a collection of 12875 documents, and yet every time I run the mongorestore it says "Killed", and it always restores a different number of documents that is less than the total: 4793, 2000, 4000, etc.
The machine that I'm performing this call on is "Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-71-generic x86_64)" from Digital Ocean
Any help is appreciated. Thanks.
After trying the mongorestore command a 5th and 6th time after posting this question, more explicit output appeared indicating that it was a memory issue specific to the Digital Ocean droplet. I followed https://www.digitalocean.com/community/tutorials/how-to-add-swap-on-ubuntu-14-04 and the restore finished completely, without errors.
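For reference, the swap setup from that tutorial looks roughly like this (the 4G size is an example; pick one that fits your droplet):

sudo fallocate -l 4G /swapfile    # create the swap file
sudo chmod 600 /swapfile          # restrict access to root
sudo mkswap /swapfile             # mark it as swap space
sudo swapon /swapfile             # enable it immediately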
If you are hitting this inside Docker, just increase the swap memory in Docker's settings.json file.
I have a replica-set.
And I've run out of disk space on my secondary instances.
There is no space on disk to run db.repairDatabase()
Is there any other way to free some disk space?
I was thinking:
bring secondary down
Delete all data
run db.repairDatabase() if deleting data will allow it
Bring it back up.
Will this work?
UPDATE
Worth mentioning that I can't currently SSH into the servers; I'm only using the mongo client right now.
No, that won't work: there has to be a database there to run db.repairDatabase() on. However, what works just as well is to bring the secondary down, delete the database files, and then bring it back up. This forces a re-sync with the primary, which in effect does the same thing as db.repairDatabase(), since it recreates the data files from scratch.
However, in order to delete the data files you'll need to SSH into the instance. If you cannot SSH in, you have fairly significant issues that will interfere with any attempt to recover the secondary.
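A rough sketch of that re-sync, assuming a dbpath of /var/lib/mongodb and an init-script-managed mongod (both assumptions; adjust paths and service names to your setup):

sudo service mongod stop      # bring the secondary down
rm -rf /var/lib/mongodb/*     # delete the data files on the secondary only
sudo service mongod start     # on startup it performs an initial sync from the primary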
I am a newbie to MongoDB. As I started working with a test application (ASP.NET), I found that the db consumes a large amount of disk space, even though the collections hold only small pieces of data, like a single word. Can anybody shed some light on this?
Please correct me if I am wrong. Thanks in advance.
Mongo doesn't shrink previously allocated structures because that would slow the database down. When you need to, run the repair procedure to rebuild the DB and reclaim the unused space. On a live project you should schedule it for off-peak hours.
From the command line:
mongod --repair
From the shell (you have to do it for all dbs, including local, if you go this route):
db.repairDatabase();
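A small sketch of the shell route over every database, assuming the legacy mongo shell:

db.getMongo().getDBNames().forEach(function (name) {
    db.getSiblingDB(name).repairDatabase();   // rebuilds each database's files, reclaiming free space
});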