Is it possible to do bulk update/upsert (not insert) in MongoDB?
If yes, please point me to any docs related to this?
Thanks
You can use the command-line program mongoimport; it should be in your MongoDB bin directory ...
There are two options you'll want to look into to use upsert ...
--upsert                 insert or update objects that already exist
--upsertFields arg       comma-separated fields for the query part of the
                         upsert. You should make sure this is indexed
More info here: http://www.mongodb.org/display/DOCS/Import+Export+Tools
Or just do ...
$ mongoimport --help
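For example, a minimal upsert import might look like the following (the database, collection, file, and field names are only illustrative):
mongoimport --db mydb --collection records --file records.json --upsert --upsertFields md5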
mongo can execute a .js file.
You can put all your update commands in a .js file.
t.js
db.record.update({md5:"a35f10a8339ab678612d1f86be08b81a"},{$set:{algres:[]}},false,true);
db.record.update({md5:"a35f10a8339ab678612d1f86be08b81b"},{$set:{algres:[]}},false,true);
Then run:
mongo 127.0.0.1/test t.js
Bulk updates can also be done in batches, as described in the documentation:
MongoDB Bulk Methods
I use these to import CSV files that I need to massage a bit before importing the data. It's somewhat slow when dealing with updates, but it did my 50K document updates in about 83 seconds, which is still far slower than the mongoimport command.
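For reference, a minimal sketch of the same kind of update done through the 2.6 Bulk API in the mongo shell (reusing the record collection and md5 values from the example above; this is just one way to batch):
// build an unordered bulk operation and queue the updates; nothing is sent yet
var bulk = db.record.initializeUnorderedBulkOp();
bulk.find({md5:"a35f10a8339ab678612d1f86be08b81a"}).upsert().updateOne({$set:{algres:[]}});
bulk.find({md5:"a35f10a8339ab678612d1f86be08b81b"}).upsert().updateOne({$set:{algres:[]}});
// send the whole batch to the server in one round trip
bulk.execute();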
I'm trying to write a mongodump / mongorestore script that would copy our data from the production environment to staging once a week.
Problem is, I need to filter out one of the collections.
I was sure I'd find a way to apply a query only on a specific collection during the mongodump, but it seems like the query statement affects all cloned collections.
So currently I'm running one dump-restore for all the other collections, and one for this specific collection with a query on it.
Am I missing something? Is there a better way to achieve this goal?
Thanks!
It is possible.
--excludeCollection=<string>
Excludes the specified collection from the mongodump output. To exclude multiple collections, specify the --excludeCollection multiple times.
Example
mongodump --db=test --excludeCollection=users --excludeCollection=salaries
See the details in the mongodump documentation.
Important: mongodump writes to a dump/ folder in the current directory by default. If it's already there, it will overwrite everything.
If you need that data, rename the folder or give mongodump an --out directory. Otherwise you don't need to worry.
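As a rough sketch of the two-pass approach described in the question (database, collection, hostname, and query values are purely illustrative):
# dump everything except the collection that needs filtering
mongodump --db=production --excludeCollection=events --out=weekly_dump
# dump the filtered collection separately with a query
mongodump --db=production --collection=events --query='{"archived": false}' --out=weekly_dump
# restore everything into the staging server
mongorestore --host=staging.example.com --drop weekly_dump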
I have the following two documents in a mongo collection:
{
_id: "123",
name: "n1"
}
{
_id: "234",
name: "n2"
}
Let's suppose I read those two documents, and make changes, for example, add "!" to the end of the name.
I now want to save the two documents back.
For a single document there's save; for new documents I can use insert to save an array of documents.
What is the solution for saving updates to those two documents? The update command asks for a query, but I don't need a query; I already have the documents, I just want to save them back...
I can update them one by one, but if there were 2 million documents instead of just two, that would not work so well.
One thing to add: we are currently using Mongo v2.4; we can move to 2.6 if Bulk operations are the only solution for this (they were added in 2.6).
For this you have two options (both present in 2.6):
Bulk tools such as mongoimport and mongorestore.
An upsert command for each document.
The first option works better with a huge number of documents (which is your case). With mongoimport you can use the --upsert flag to overwrite existing documents, or the --upsert --drop flags to drop the existing data and load the new documents.
This option scales well with large amounts of data in terms of I/O and system utilization.
The upsert command works on an in-place update principle. You can use it with a filter, but the drawback is that it runs serially and shouldn't be used for huge data sizes; it performs well only with small data.
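For completeness, the per-document approach of the second option looks roughly like this in the shell (the collection name and values are illustrative); it issues one round trip per document, which is why it does not scale:
db.mycoll.update({_id: "123"}, {_id: "123", name: "n1!"}, {upsert: true});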
When you switch off write concerns, a save doesn't block until the database has written; it returns almost immediately. So with WriteConcern.Unacknowledged, storing 2 million documents with save is a lot quicker than you would think. But skipping write concerns has the drawback that you won't get any errors from the database.
When you don't want to save them one-by-one, bulk operations are the way to go.
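As an illustration only, saving back already-modified documents with the 2.6 Bulk API could look roughly like this (the mycoll collection and the modifiedDocs array are assumptions, not part of the question):
// modifiedDocs holds the documents after the in-memory change (e.g. name + "!")
var bulk = db.mycoll.initializeUnorderedBulkOp();
modifiedDocs.forEach(function (doc) {
    // queue a replacement of each document by _id; nothing is sent yet
    bulk.find({_id: doc._id}).replaceOne(doc);
});
// one call sends the whole batch to the server
bulk.execute();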
I am setting up a new ElasticSearch instance using the mongo-connector Python tool. The tool is working, but only imported around 100k entries from the MongoDB oplog.
However, my collections contain millions of records... Is there a way to pass all the records from each collection through the oplog without modifying the records in any way?
Following the advice of Sammaye, I solved this problem by iterating over the collection, converting each document to JSON, and posting it to the index API via curl. Thanks for the suggestion!
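A rough sketch of that kind of one-off reindex, assuming mongoexport's default one-JSON-document-per-line output and an older index/type style Elasticsearch URL (all names here are illustrative):
# export the collection as one JSON document per line
mongoexport --db mydb --collection records --out records.json
# post each document to the Elasticsearch index API
while read -r doc; do
  curl -s -XPOST "http://localhost:9200/myindex/record" -d "$doc"
done < records.json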
I want to export around 5000 MongoDB Collections to JSON format using a single command. Is it possible to do so?
Please take a look at this script.
It's a bash script that basically does the following:
Read the collection names from MongoDB into a variable
Iterate and call mongoexport for each collection
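In case the linked script goes away, a minimal version of that approach looks roughly like this (the database name and output directory are illustrative):
#!/bin/bash
DB="mydb"
OUT="./export"
mkdir -p "$OUT"
# read the collection names from the database into a variable
COLLECTIONS=$(mongo "$DB" --quiet --eval "db.getCollectionNames().join('\n')")
# iterate and call mongoexport for each collection
for c in $COLLECTIONS; do
  mongoexport --db "$DB" --collection "$c" --out "$OUT/$c.json"
done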
Is it possible to write JSON that will cause mongoimport to append to existing arrays during an upsert? (mongodb 2.0)
It appears that, as of now (9/26/11), this is not possible. Users with this problem are encouraged to write their own import script.
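If you do end up writing your own script, a minimal mongo shell sketch that appends to an array during an upsert (using 2.0-era $pushAll; the collection, field names, and the incoming array are made up for illustration) might look like this:
// each incoming record carries values to append to the matching document's array
var incoming = [{md5: "a35f10a8339ab678612d1f86be08b81a", tags: ["new-tag"]}];
incoming.forEach(function (rec) {
    // $pushAll appends every element of rec.tags; the trailing true enables upsert
    db.record.update({md5: rec.md5}, {$pushAll: {tags: rec.tags}}, true);
});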