I have a MongoDB collection with 2.5 million documents, and it may grow up to 3 million. I am using Spring Batch and am trying to copy that collection to another collection. The approaches I have used are as follows:
Inside a tasklet, I created a ProcessBuilder object and called a shell script which executes a mongo query. The content of the shell script is as follows:
> mongo $serverURL/$dbName js-file-to-execute.js
// js file contains the copy command (db.collection.copyTo('newCollection'))
For less data (< 200k documents) it works fine, but for 2 million documents it hangs the mongo server and the job fails with a SocketException.
I used a MongoTemplate and executed a query:
dbMongoTemplate.getDb().getCollection("collection").aggregate(Arrays.asList((DBObject) new BasicDBObject("$out","newCollection")));
This executes the mongo aggregation query db.collection.aggregate([{ $out: "newCollection" }]).
This also worked for collections with less data, but for larger data sets it keeps running until a socket timeout occurs and the job fails at the end.
Can you please suggest an efficient way to copy the data?
// Fastest way to copy a collection in MongoDB
db.getCollection('OriginalCollection').aggregate([ { $out: "ClonedCollection" } ]);
This command copied a collection of 2 million records in about 2-3 minutes.
https://gist.github.com/tejzpr/ff37324a8c26d13fef08c318278c0718
To copy this collection I would suggest using mongodump/mongorestore:
mongodump --db databaseName --collection collectionName --out directory-path
Then copy the directory directory-path to the target machine, and restore it there using:
mongorestore --db databaseName --collection collectionName directory-path
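If both hosts are reachable from one machine, you can also skip the intermediate directory and stream the dump straight into the restore. A sketch, assuming the 3.2+ database tools (which support --archive on stdin/stdout) and a placeholder target-host:
mongodump --db databaseName --collection collectionName --archive | mongorestore --host target-host --archive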
Related
I need to clean a MongoDB collection of 200TB and delete documents older than a given timestamp. I am trying to build a new collection from the old one and then run a delete query, since running a delete on the collection currently in use would slow down other requests to it. I have thought of cloning to a new collection either by taking a dump of the collection, or by creating a read-and-write script that reads from the present collection and writes to the cloned collection. My question is: is a batched read/write operation (e.g., reading and writing 1000 documents at a time) faster than a dump?
EDIT:
I found this, this and this article, and want to know if writing a script in the above-mentioned way is the same as creating an ssh pipe of read and write? E.g., is a node/python script that fetches 1000 rows from a collection and inserts them into a clone collection the same as ssh *** ". /etc/profile; mongodump -h sourceHost -d yourDatabase … | mongorestore -h targetHost -d yourDatabase"?
I would suggest this approach:
Rename the collection (a sketch follows these two steps). Your application will immediately create a new empty collection with the old name when it tries to insert some data. You may want to create some indexes on it.
Run mongoexport/mongoimport to import the valid data, i.e. skip the outdated documents.
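A sketch of the rename in step 1, using the same placeholder names as the export below:
mongo $MONGODB_URI --eval 'db.collection.renameCollection("collection_old")'
With this naming, the old data lives in collection_old and your application recreates an empty collection on its next insert; adjust the source collection in the mongoexport accordingly.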
Yes, in general mongodump/mongorestore might be faster; however, with mongoexport you can define a query and limit the data which is exported. It could look like this:
mongoexport --uri "..." --db=yourDatabase --collection=collection --query='{timestamp: {$gt: ISODate("2022-01-01")}}' | mongoimport --uri "..." --db=yourDatabase --collection=collection --numInsertionWorkers=10
Use the numInsertionWorkers parameter to run multiple insertion workers; it will speed up your inserts.
Do you run a sharded cluster? If yes, then you should use sh.splitAt() on the new collection; see How to copy a collection from one database to another in MongoDB.
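A hedged example of such a pre-split, assuming timestamp is the shard key (the namespace and split point are placeholders):
mongo $MONGODB_URI --eval 'sh.splitAt("yourDatabase.collection", { timestamp: ISODate("2022-01-01") })'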
When I want to remove all documents from my MongoDB collection comments, I do it with this command:
mongo $MONGODB_URI --eval 'db.comments.deleteMany({});'
However, this is super slow when there are millions of records inside the collection.
In a relational DB like Postgres I'd simply copy the structure of the table, create a comments2 table, drop the comments table, and rename comments2 to comments.
Is this possible to do in MongoDB as well?
Or are there any other tricks to speed up the process?
Thanks, the answers inspired my own solution. I forgot that MongoDB doesn't have a schema like a relational DB.
So what I did is this:
1. Dump an empty collection + the indexes of the collection
mongodump --host=127.0.0.1 --port=7001 --db=coral --collection=comments --query='{"id": "doesntexist"}' --out=./dump
This will create a folder ./dump with the contents comments.bson (empty) and comments.metadata.json
2. Drop the comments collection
mongo mongodb://127.0.0.1:7001/coral --eval 'db.comments.drop();'
3. Import new data new_comments.json (different from comments.bson)
mongoimport --uri=mongodb://127.0.0.1:7001/coral --file=new_comments.json --collection comments --numInsertionWorkers 12
This is way faster than first adding the indexes, and then importing.
4. Add indexes back
mongorestore --uri=mongodb://127.0.0.1:7001/coral --dir dump/coral --nsInclude coral.comments --numInsertionWorkersPerCollection 12
Note that --numInsertionWorkers speeds up the process by dividing the work over 12 CPUs.
How many CPUs you have can be found on OS X with:
sysctl -n hw.ncpu
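On Linux, the equivalent is:
nproc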
db.cities.aggregate([{ $match: {} }, { $out: "collection2" }]) works in case you can log in to the mongo prompt; then simply drop the previous collection.
Otherwise, the approach you have posted is the one.
mongoexport.exe --host <host> --port <port> --db test --collection collection1 --out collection1.json
mongoimport.exe --host <host> --port <port> --db test --collection collection2 --file collection1.json
Thanks,
Neha
For MongoDB version >= 4.0 you can do this via db.comments.renameCollection("comments2"), but it is kind of a resource-intensive operation, and for bigger collections it is better to do mongodump/mongorestore. So the best action steps are:
mongodump -d x -c comments -o dump
> use x
> db.comments.drop()
mongorestore -d x -c comments2 dump/x/comments.bson
Please note that deleteMany({}) is an even more resource-intensive operation, since it creates a single oplog entry for every document you delete and propagates the deletes to all replica set members.
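For reference, the rename path mentioned at the start would look like this in the shell (a sketch, assuming database x as above; renameCollection requires the target name to be unused):
> use x
> db.comments.renameCollection("comments2")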
I have a MongoDB database with version 3.6.3. I have another MongoDB database (on another machine) using version 4.4.5 with no documents in it. I want to put the data from the v3.6.3 database into the v4.4.5 database. Can I safely do this using mongoexport and then mongoimport, or do I need to perform more steps?
Yes, mongoexport writes the documents out to a JSON file, and mongoimport can read that file and insert the documents into the new database.
These will transfer only the documents, but not index information. You may want to consider mongodump/mongorestore if you also need to move indexes.
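A minimal sketch of that transfer (hosts, database, and collection names are placeholders):
mongoexport --host=host-with-3.6.3 --db=mydb --collection=mycoll --out=mycoll.json
mongoimport --host=host-with-4.4.5 --db=mydb --collection=mycoll --file=mycoll.json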
I'm using Mongo 3.2. I have two databases on my localhost named client1 and client2.
Now client1 contains a collection named users.
I want to clone this collection to client2.
I have tried:
use client2
db.cloneCollection('localhost:27017', 'client1.users', { 'active' : true })
This outputs
{
"ok" : 0.0,
"errmsg" : "can't cloneCollection from self"
}
Is cloning a collection from one db to another on the same server prohibited?
A few things:
In general, cloneCollection is used to copy between different mongo instances, not to copy on the same instance.
Also, if you're using v4.2 you should stop using copyDB & cloneCollection because they're deprecated (see compatibility-with-v4.2), and start using mongodump & mongorestore or mongoexport & mongoimport.
I would suggest using mongodump & mongorestore:
Because mongodump preserves MongoDB's data types, i.e. BSON types.
mongodump creates a binary dump, whereas mongoexport converts BSON to JSON and mongoimport converts JSON back to BSON while writing, which is why they're slower. Use mongoexport & mongoimport when you want to analyze your collection's data visually or use the JSON data for some other purpose.
You can run the below script in a shell:
declare -a collections=("collectionName1" "collectionName2")
for i in "${collections[@]}"
do
    echo "$i"
    mongodump --host "All-shards" --username=uname --password=password --ssl --authenticationDatabase=admin --db=dbname --collection="$i"
    mongorestore --host=host-shard-name --port=27017 --username=uname --password=psswrd --ssl --authenticationDatabase=admin --db=dbname --collection="$i" ./dump/dbname/"$i".bson
done
To use mongodump, you must run it against a running mongod or mongos instance. These commands assume mongo is properly installed and the path is set up correctly; if not, you can navigate to the mongo folder and run them as ./mongodump & ./mongorestore. The above script is useful if you want to back up multiple collections. You need to specify a few things in the script:
mongodump --host "All-shards": here you need to specify all shards if your MongoDB is a replica set; if not, you can specify localhost:27017.
mongorestore --host=host-shard-name: you have to specify one shard of the replica set, or else your localhost. A few flags here are optional: --ssl, --username, --password.
mongodump will create a folder named dump the first time it runs; it will have sub-folders named after the dumped dbNames, and each sub-folder holds the .bson files named after their dumped collections. So you need to refer to the dbName in the restore command, and the collection name is taken from the variable i: ./dump/dbname/"$i".bson.
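For illustration, the dump layout produced by the script above looks roughly like this:
dump/
  dbname/
    collectionName1.bson
    collectionName1.metadata.json
    collectionName2.bson
    collectionName2.metadata.json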
Note: MongoDB v3.2 is quite old, and in the cloud-based MongoDB service MongoDB Atlas it has already reached its end of life, so please upgrade as soon as possible. If you're looking for a free mongo instance or are just starting with MongoDB, you can try Atlas.
db.cloneCollection() copies data directly between MongoDB instances.
https://docs.mongodb.com/v3.2/reference/method/db.cloneCollection/
That means you cannot clone inside the same mongod instance. Use mongoexport and mongoimport to clone your collection.
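A sketch of that for the databases in the question (same instance, default port):
mongoexport --host=localhost --port=27017 --db=client1 --collection=users --out=users.json
mongoimport --host=localhost --port=27017 --db=client2 --collection=users --file=users.json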
Since 4.2, MongoDB has the $merge aggregation stage, which allows copying from db1.collection to db2.collection.
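A sketch using the names from this question (run while connected to client1; $merge requires MongoDB 4.2+ and by default inserts documents that don't exist in the target):
db.users.aggregate([{ $merge: { into: { db: "client2", coll: "users" } } }])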
I use MongoChef as a UI client for my mongo database. I have a collection which consists of 12,000 records, and I want to export them using MongoChef.
I have tried the available export option, which works fine for up to 3,000 documents, but as the number of records increases the system hangs.
Can you please let me know the best way to export all the documents using MongoChef?
Thanks.
Finally I came to the conclusion to use mongo from the terminal, which is the most efficient way.
I read about primary and secondary databases and executed the following query:
mongoexport --username user --password pass --host host --db database --collection coll --out file_name.json