Import a whole dataset into one MongoDB document

I'm trying to find a way to import a whole dataset into one MongoDB document. Every solution I've tried only inserts many documents instead of one. Here's what I've tried so far:
mongoimport --db=dbName --collection=collectionName --drop --file=file.json --jsonArray
Thank you in advance!

The problem is solved, thanks to both prasad_ and Joe.
After running the mongoimport command shown in my question, I aggregated with $group to combine everything into one document. There was one catch, which Joe pointed out in a comment: a document is limited to 16MB, so I had to remove some data from the dataset that I wasn't using.
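For reference, a minimal sketch of that aggregation in the mongo shell (the collection and output names here are placeholders, not from the original question):
db.collectionName.aggregate([
  // push every imported document into a single array on one result document
  { $group: { _id: null, data: { $push: "$$ROOT" } } },
  // write that single combined document to its own collection
  { $out: "combined" }
])
The combined document is still subject to the 16MB BSON limit mentioned above.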

Related

mongodump - query for one collection

I'm trying to write a mongodump / mongorestore script that would copy our data from the production environment to staging once a week.
Problem is, I need to filter out one of the collections.
I was sure I'd find a way to apply a query only on a specific collection during the mongodump, but it seems like the query statement affects all cloned collections.
So currently I'm running one dump-restore for all the other collections, and one for this specific collection with a query on it.
Am I missing something? Is there a better way to achieve this goal?
Thanks!
It is possible.
--excludeCollection=<string>
Excludes the specified collection from the mongodump output. To exclude multiple collections, specify the --excludeCollection option multiple times.
Example
mongodump --db=test --excludeCollection=users --excludeCollection=salaries
See the mongodump documentation for details.
Important: mongodump writes to the dump/ directory by default. If that directory already exists, mongodump will overwrite its contents.
If you need that data, rename the folder or give mongodump an --out directory. Otherwise you don't need to worry.
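Putting this together for the weekly copy described in the question, one possible sketch (database, collection, query, host, and paths are placeholders):
# dump everything except the collection that needs filtering
mongodump --db=prod --excludeCollection=events --out=weekly_dump
# dump the filtered collection separately with a query
mongodump --db=prod --collection=events --query='{ "keep": true }' --out=weekly_dump
# restore the combined dump into the staging environment
mongorestore --host=staging.example.com --drop weekly_dump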

is there a way to dump an entire mongodb collection into the oplog?

I am setting up a new Elasticsearch instance using the mongo-connector Python tool. The tool is working, but it only imported around 100k entries from the MongoDB oplog.
However, my collections contain millions of records... Is there a way to pass all the records from each collection through the oplog without modifying the records in any way?
Following Sammaye's advice, I solved this problem by iterating over each collection, converting the documents to JSON, and posting them to the index API via curl. Thanks for the suggestion!
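A rough sketch of that workaround (host, database, collection, and file names are placeholders, and the exact Elasticsearch endpoint depends on your version):
# export a collection as newline-delimited JSON
mongoexport --db=mydb --collection=mycoll --out=mycoll.json
# post each exported document to the Elasticsearch index API
while read -r doc; do
  curl -s -XPOST "http://localhost:9200/mycoll/_doc" \
       -H 'Content-Type: application/json' -d "$doc"
done < mycoll.json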

Imports to Mongo limited to 16MB if using jsonArray

I am using mongo 2.6.1. I want to import data from a JSON file larger than 16 MB. The JSON is an array of documents. According to the documentation, if I use the --jsonArray option the file can only be 16MB; see http://docs.mongodb.org/manual/reference/program/mongoimport/
Strangely, I have already managed to import data > 16 MB (24MB) no problem using mongoimport, by doing:
mongoimport --db mydb --collection product --file products.json --jsonArray
So what is this 16MB limit then?
16 MB is the MongoDB BSON document size limit. It means that no document inside MongoDB can exceed 16 MB.
Note that the JSON representation of a MongoDB document can exceed this limit, since BSON is more compact.
The problem with the --jsonArray flag is that mongoimport reads the whole .json file as a single document first and then imports each of its elements, so it runs into the BSON document size limit.
Solution for new MongoDB versions (2.5.x and later)
I just tested mongoimport with the latest MongoDB 2.6.4 using a very large JSON array (~200 MB) and it worked perfectly.
I'm pretty sure that such an operation was impossible with MongoDB 2.2.x, so it looks like mongodb.org simply forgot to update the mongoimport documentation.
I searched the MongoDB bug tracker and found this issue. According to it, the problem was resolved a year ago and the fix was released with MongoDB 2.5.0.
So, feel free to import large JSON documents!
Solution for old MongoDB versions (prior to 2.5.0)
If you're using an old version of MongoDB, it's still possible to import a large array of documents using the --type json flag instead of --jsonArray. But it assumes a special structure for the file to import from: it's similar to JSON, except that only one document per line is allowed, with no comma after each of them:
{ name: "Widget 1", desc: "This is Widget 1" }
{ name: "Widget 2", desc: "This is Widget 2" }
Strangely, I have already managed to import data > 16 MB (24 MB) with no problem using mongoimport.
If you are happy with the data that's imported this way, you needn't worry about the 16 MB limit. That limit applies to each record (document) in the collection. 16 MB of text is a lot (you could fit an entire book in that much space), so it's extremely unusual for a single record to be more than 16 MB in size.
I faced a similar problem, and I guess the 16 MB limitation still persists in older versions. There is a workaround in any case: turn the JSON array file into a normal newline-delimited JSON file using Linux sed commands that strip the opening and closing brackets and the trailing commas.
You can then import the file with a normal mongoimport command.
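A rough sketch of that sed approach, assuming one array element per line in the input file (file names and the exact expressions depend on your file's layout):
# strip the leading "[", the trailing "]", and the trailing commas
sed -e '1s/^\[//' -e '$s/\]$//' -e 's/},$/}/' products.json > products_nd.json
mongoimport --db mydb --collection product --file products_nd.json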

mongoimport - $addToSet/$push with upsert?

Is it possible to write JSON that will cause mongoimport to append to existing arrays during an upsert? (mongodb 2.0)
It appears that, as of now (9/26/11), this is not possible. Users with this problem are encouraged to write their own import script.
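A minimal sketch of such a hand-rolled import script for the legacy mongo shell (the file name, collection, and field names are placeholders, not part of the original answer):
// run with: mongo mydb pushImport.js
// reads newline-delimited JSON and $pushes each entry onto an array, upserting as needed
var lines = cat("items.json").split("\n");
lines.forEach(function (line) {
  if (line.trim() === "") return;   // skip blank lines
  var doc = JSON.parse(line);
  db.items.update(
    { _id: doc._id },               // match on _id
    { $push: { tags: doc.tag } },   // append to the existing array
    true,                           // upsert
    false                           // multi
  );
});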

Bulk update/upsert in MongoDB?

Is it possible to do bulk update/upsert (not insert) in MongoDB?
If yes, please point me to any docs related to this.
Thanks
You can use the command-line program mongoimport; it should be in your MongoDB bin directory ...
There are two options you'll want to look into to use upsert ...
--upsert                 insert or update objects that already exist
--upsertFields arg       comma-separated fields for the query part of the
                         upsert. You should make sure this is indexed
More info here: http://www.mongodb.org/display/DOCS/Import+Export+Tools
Or just do ...
$ mongoimport --help
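For example, a possible invocation combining those two flags (database, collection, file, and field names are placeholders):
mongoimport --db mydb --collection products --file products.json --upsert --upsertFields sku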
The mongo shell can execute a .js file.
You can put all your update commands in a JS file.
t.js
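// legacy shell form: update(query, update, upsert, multi); here upsert=false, multi=true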
db.record.update({md5:"a35f10a8339ab678612d1f86be08b81a"},{$set:{algres:[]}},false,true);
db.record.update({md5:"a35f10a8339ab678612d1f86be08b81b"},{$set:{algres:[]}},false,true);
then,
mongo 127.0.0.1/test t.js
Bulk updates can also be done in batches, as described in the documentation:
MongoDB Bulk Methods
I use these to import CSV files that I need to massage a bit before importing the data. It's kinda slow when dealing with updates, but it did my 50K document updates in about 83 seconds, which is still far slower than the mongoimport command.
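A minimal sketch of that Bulk API pattern in the mongo shell (collection, match fields, and values are placeholders):
// queue several upserting updates and send them to the server in one batch
var bulk = db.items.initializeUnorderedBulkOp();
bulk.find({ sku: "A1" }).upsert().updateOne({ $set: { qty: 10 } });
bulk.find({ sku: "B2" }).upsert().updateOne({ $set: { qty: 5 } });
bulk.execute();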