How to copy some of the data from one MongoDB to another - mongodb

I have an existing MongoDB dump and I would like to cherry pick some of the data to a clean DB.
Is dumping a single collection and restoring it (mongodump & mongorestore) the way to do this?

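You can do this by using the --filter '<JSON>' option on mongorestore.
The JSON is a query document, like the first argument of db.collection.find().
Note that --filter only exists in older mongorestore releases (it was removed in 3.0); with newer tools, apply the filter at dump time with mongodump --query instead.
If you just want to filter by collection, use --collection <collection>.
See the mongorestore documentation for more info.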
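As a rough sketch of how this could look with current versions of the database tools: the first function restores a single namespace from an existing dump directory, the second applies a document filter at dump time. The function names, paths, URIs and the example namespace are placeholders made up for illustration; --nsInclude and --query are the mongorestore and mongodump flags doing the work.

```shell
# Sketch only: function names, paths and URIs are illustrative placeholders.

# Restore just one namespace ("db.collection") from an existing dump directory.
restore_one_collection() {
  dump_dir="$1"     # e.g. ./dump
  target_uri="$2"   # e.g. mongodb://localhost:27017
  namespace="$3"    # e.g. mydb.photos
  mongorestore --uri="$target_uri" --nsInclude="$namespace" "$dump_dir"
}

# Dump only the documents matching a query (needs access to the source server).
# The query is Extended JSON; --query must be combined with --collection.
dump_filtered() {
  source_uri="$1"   # must include the database name, e.g. mongodb://host/mydb
  collection="$2"
  query="$3"        # e.g. '{"status": "active"}'
  out_dir="$4"
  mongodump --uri="$source_uri" --collection="$collection" --query="$query" --out="$out_dir"
}
```

For example, restore_one_collection ./dump mongodb://localhost:27017 mydb.photos would load only mydb.photos from ./dump into the target server.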

Related

Is read and insert operation faster than dump in mongodb

I need to clean up a MongoDB collection of 200 TB by deleting documents with older timestamps. Running a delete on the current collection, which is in use, would slow down the other requests to it, so I am trying to build a new collection that holds only the recent data. I have thought of cloning the collection either by taking a dump of it, or by writing a read-and-write script that reads from the current collection and writes to the cloned collection. My question: is a batched read/write operation (e.g. 1000 documents per read and write) faster than a dump?
EDIT:
I found this, this and this article, and want to know whether writing a script in the way described above is the same as creating an SSH pipe of read and write. For example, is a Node/Python script that fetches 1000 rows from a collection and inserts them into a clone collection the same as ssh *** ". /etc/profile; mongodump -h sourceHost -d yourDatabase … | mongorestore -h targetHost -d yourDatabase ?
I would suggest this approach:
Rename the collection. Your application will immediately create a new empty collection with the old name when it tries to insert some data. You may want to create some indexes on it.
Run mongoexport/mongoimport to import the valid data, i.e. skip the outdated documents.
Yes, in general mongodump/mongorestore might be faster, but with mongoexport you can define a query that limits the data which is exported. It could look like this:
mongoexport --uri "..." --db=yourDatabase --collection=collection --query='{"timestamp": {"$gt": {"$date": "2022-01-01T00:00:00Z"}}}' | mongoimport --uri "..." --db=yourDatabase --collection=collection --numInsertionWorkers=10
Use the numInsertionWorkers parameter to run multiple insertion workers in parallel. It will speed up your inserts.
Do you run a sharded cluster? If so, you should use sh.splitAt() on the new collection; see "How to copy a collection from one database to another in MongoDB".
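The export/import step above could be wrapped roughly like this. It is only a sketch: the function names are made up, and it assumes that the timestamp field is stored as a BSON date, so the cutoff is written in Extended JSON form ({"$date": ...}), which is what recent mongoexport versions expect in --query.

```shell
# Sketch only: function names are hypothetical; "timestamp" is assumed to be a BSON date.

# Build an Extended JSON query selecting documents newer than an ISO-8601 cutoff.
build_cutoff_query() {
  printf '{"timestamp": {"$gt": {"$date": "%s"}}}' "$1"
}

# Stream matching documents from one collection into another without a temp file.
copy_recent_docs() {
  uri="$1"          # connection string including the database name
  src_coll="$2"     # the renamed (old) collection
  dst_coll="$3"     # the new collection the application writes to
  cutoff="$4"       # e.g. 2022-01-01T00:00:00Z
  mongoexport --uri="$uri" --collection="$src_coll" --query="$(build_cutoff_query "$cutoff")" |
    mongoimport --uri="$uri" --collection="$dst_coll" --numInsertionWorkers=10
}
```

Keeping the query builder separate makes it easy to print and check the filter before running it against a large collection.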

Faster way to remove all entries from mongodb collection by dropping collection and recreating schema

When I want to remove all objects from my mongoDB collection comments I do this with this command:
mongo $MONGODB_URI --eval 'db.comments.deleteMany({});'
However, this is super slow when there are millions of records inside the collection.
In a relational db like Postgres I'd simply copy the structure of the collection, create a comments2 collection, drop the comments collection, and rename comments2 to comments.
Is this possible to do in MongoDB as well?
Or are there any other tricks to speed up the process?
Thanks, the answers inspired my own solution. I forgot that MongoDB doesn't have a schema like a relational DB.
So what I did is this:
1. dump an empty collection + the indexes of the collection
mongodump --host=127.0.0.1 --port=7001 --db=coral --collection=comments --query='{"id": "doesntexist"}' --out=./dump
This will create a folder ./dump with the contents comments.bson (empty) and comments.metadata.json
2. Drop the comments collection
mongo mongodb://127.0.0.1:7001/coral --eval 'db.comments.drop();'
3. Import new data new_comments.json (different from comments.bson)
mongoimport --uri=mongodb://127.0.0.1:7001/coral --file=new_comments.json --collection comments --numInsertionWorkers 12
This is way faster than first adding the indexes, and then importing.
4. Add indexes back
mongorestore --uri=mongodb://127.0.0.1:7001/coral --dir dump/coral --nsInclude coral.comments --numInsertionWorkersPerCollection 12
Note that --numInsertionWorkers (and --numInsertionWorkersPerCollection for mongorestore) speeds up the process by dividing the work over 12 CPUs.
You can find out how many CPUs you have on macOS with:
sysctl -n hw.ncpu
If you can log in to the mongo prompt, you can run db.cities.aggregate([{ $match: {} }, { $out: "collection2" }]) and then simply drop the previous collection.
Otherwise, the approach you have posted is the one.
mongoexport.exe /host: /port: /db:test /collection:collection1 /out:collection1.json
mongoimport.exe /host: /port: /db:test /collection:collection2 /file:collection1.json
Thanks,
Neha
For MongoDB version >= 4.0 you can do this via db.comments.renameCollection("comments2"), but it is a fairly resource-intensive operation, and for bigger collections it is better to use mongodump/mongorestore. So the best steps are:
mongodump -d x -c comments --out dump
>use x
>db.comments.drop()
mongorestore -d x -c comments2 dump/x/comments.bson
Please note that deleteMany({}) is an even more resource-intensive operation, since it creates a separate oplog entry for every document you delete and propagates it to all replica set members.

How to export data from a mongo instance and import to other one?

I want to export an entire MongoDB database from mongo instance A to mongo instance B. The two are identical in terms of structure: they have the same collections, and the collections are equal too.
The mongo instances are on different servers, something like staging and dev environments.
The idea is to do it in just one command like:
mongoexport --host="mongodb0.example.com:27017" --db=reporting <to-other-mongo-host>
Is there a way to do it in "one shot", or do I have to do a mongoexport and then a mongoimport?
For export
mongodump -d <database_name> -o <directory_backup>
For restore
mongorestore -d <database_name> <directory_backup>
Not recommended for big data stores. It is very slow, and once you get past 10-20 GB of data it can take hours to restore.
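If a single command is the goal, the dump and restore steps can also be chained through a pipe: with --archive and no file name, mongodump writes the archive to standard output and mongorestore reads it from standard input. The sketch below assumes both servers are reachable from the machine running it; the function name and URIs are placeholders.

```shell
# Sketch only: function name and URIs are illustrative placeholders.
# Stream a dump of one database straight into another server, with no files on disk.
copy_database() {
  source_uri="$1"   # should include the database name, e.g. mongodb://mongodb0.example.com:27017/reporting
  target_uri="$2"   # e.g. mongodb://other-host:27017
  mongodump --uri="$source_uri" --archive |
    mongorestore --uri="$target_uri" --archive
}
```

The data still has to travel over the network, so this saves the intermediate directory rather than the transfer time.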

MongoDB: restore a collection in the mongo shell

I am trying to import/restore a single collection from within MongoDB (i.e. mongorestore cannot be accessed, I think ...?).
Is it possible? What is the command? Ideally, I'd like to include indexes as well. The backup has been produced by mongodump.
Specifically, I am using the IntelliShell from the excellent MongoChef. I perform other commands in this as well, such as renaming existing collections first.

Mongo restore filter is not working

I dumped my mongo database and I want to restore it using a filter.
I am running the following command:
mongorestore -h mongo1.xxx.projects.xyz.biz --db xxx --collection photos --filter '{"_id" : { "$in": [ObjectId("54614b85ec8d83183f368a0d"), ObjectId("542c3b7b91201b49132d16d0"), ObjectId("546cac1691201b7b5438cac8"),ObjectId("546cac96ec8d830a7b33bfc7"),ObjectId("546bfd14ec8d830a8ad63db3"),ObjectId("5487691aec8d837be8426106"),ObjectId("54513bbeec8d8308418ce8f2"),ObjectId("545139b791201b63b43cb135"),ObjectId("549ca49891201b6562012d42"),ObjectId("54a32fdbec8d83019291d433"),ObjectId("54a42132ec8d83019bb5e23e"),ObjectId("54a423ce91201b64f73a9752"),ObjectId("54a42346ec8d83019291d444"),ObjectId("54a4246691201b64f73a9753"),ObjectId("54a425e3ec8d8301a1c3b85d"),ObjectId("54a4264291201b64f29d0db8"),ObjectId("54a4268fec8d8301a1c3b85e"),ObjectId("54a4275e91201b64f73a9755"),ObjectId("54a42e3f91201b64e8ed20d3"),ObjectId("54a42e6c91201b64e8ed20d4"),ObjectId("54a42e96ec8d8301a1c3b860"),ObjectId("54a43473ec8d83019bb5e253"),ObjectId("54a43c3a91201b64f29d0dc0"),ObjectId("54a43f05ec8d83019291d453"),ObjectId("54a43ff591201b64f29d0dc3"),ObjectId("54a4425c91201b64e8ed20dd"),ObjectId("54a442e1ec8d8301a1c3b866"),ObjectId("54a44767ec8d83019bb5e25c"),ObjectId("54a447daec8d8301a1c3b868"),ObjectId("54a450e291201b64f29d0dc9"),ObjectId("54a47e1a91201b64e8ed20e2"),ObjectId("54a48896ec8d83019bb5e26f"),ObjectId("54a48984ec8d830199f07151"),ObjectId("54a48b8c91201b64e8ed20ef"),ObjectId("54a493b1ec8d830199f07158"),ObjectId("54a495b3ec8d8301a1c3b89b"),ObjectId("54a73d11ec8d830199f0718b")]}}' /home/ubuntu/backup/05012015mongodump/dump/xxx/photos.bson
It says
2081 objects found
37 objects processed
But the data on my mongo server is unchanged. Nothing was restored.
Any ideas?
Mongorestore is only for insert
In the official MongoDB mongorestore documentation under the Behavior section the following is written (emphasis mine):
Insert Only
mongorestore can create a new database or add data to an existing database. However, mongorestore performs inserts only and does not perform updates. That is, if restoring documents to an existing database and collection and existing documents have the same value _id field as the to-be-restored documents, mongorestore will not overwrite those documents.
This is the same thing that Neil Lunn stated in his comment.
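If the goal is to overwrite documents that already exist, one possible workaround is to convert the BSON file to JSON with bsondump and load it with mongoimport --mode=upsert, which replaces documents with a matching _id and inserts the rest. This is only a sketch under that assumption: the function name, URI and path are placeholders, it imports the whole file rather than a filtered subset, and it does not restore indexes.

```shell
# Sketch only: function name, URI and file path are illustrative placeholders.
# Convert a BSON dump file to JSON and upsert it, so that documents whose _id
# already exists in the target collection are replaced instead of skipped.
upsert_from_bson() {
  bson_file="$1"    # e.g. dump/xxx/photos.bson
  target_uri="$2"   # connection string including the database name
  collection="$3"   # e.g. photos
  bsondump "$bson_file" |
    mongoimport --uri="$target_uri" --collection="$collection" --mode=upsert
}
```

A blunter alternative is mongorestore --drop, which drops each target collection before restoring it and therefore also discards documents that are not in the dump.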