I am using mongo 2.6.1. I want to import data from a JSON file > 16 MB. The JSON is an array of documents. According to the documentation, if I use the --jsonArray option the file can only be 16 MB; see http://docs.mongodb.org/manual/reference/program/mongoimport/
Strangely, I have already managed to import data > 16 MB (24MB) no problem using mongoimport, by doing:
mongoimport -db mydb --collection product --file products.json --jsonArray
So what is this 16MB limit then?
16 MB is the MongoDB BSON document size limit. It means that no single document inside MongoDB can exceed 16 MB.
Note that the JSON representation of a MongoDB document can exceed this limit, since BSON is more compact.
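If you want to confirm the limit your server actually enforces, you can ask it directly. This is just a quick sketch using the mongo shell; the value is reported in bytes:

mongo --quiet --eval 'printjson(db.isMaster().maxBsonObjectSize)'
# prints 16777216, i.e. 16 MB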
The problem with the --jsonArray flag is that mongoimport reads the whole .json file as a single document first, and then performs the import on each of its elements, thus hitting the BSON document size limit.
Solution for new MongoDB versions (2.5.x and later)
I just tested mongoimport against the latest MongoDB 2.6.4 using a very large JSON array (~200 MB) and it worked perfectly.
I'm pretty sure that such an operation was impossible with MongoDB 2.2.x. So it looks like mongodb.org simply forgot to update the mongoimport documentation.
I searched the MongoDB bug tracker and found this issue. According to it, this problem was resolved a year ago and the fix was released with MongoDB 2.5.0.
So, feel free to import large JSON documents!
Solution for old MongoDB versions (prior to 2.5.0)
If you're using an old version of MongoDB, it's still possible to import a large array of documents by using the --type json flag instead of --jsonArray. But it assumes a special structure for the file you import from: it's similar to JSON, except that there is exactly one document per line, with no comma after each of them:
{ name: "Widget 1", desc: "This is Widget 1" }
{ name: "Widget 2", desc: "This is Widget 2" }
Strangely, I have already managed to import data > 16 MB (24MB) no problem using mongoimport, by doing:
If you are happy with the data that's imported this way, you needn't worry about the 16 MB limit. That limit applies to each record (document) in the collection. 16 MB of text data is a lot - you could fit an entire book in that much space - so it's extremely unusual for a single record to exceed 16 MB.
I faced a similar problem, and I guess the 16 MB limitation still persists in older versions. There is a way around it in any case: turn the JSON file that contains a JSON array into a newline-delimited JSON file using Linux sed commands, which strip the opening and closing brackets and the trailing comma after each document (a sketch is shown below).
You can then import the file with a normal mongoimport command.
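For illustration, here is a rough sketch of what that clean-up could look like; it assumes the array is pretty-printed with one document per line, an opening [ on the first line and a closing ] on the last line, so adjust the expressions to your actual file layout:

# drop the first and last lines ([ and ]) and the trailing comma after each document
sed '1d;$d' products.json | sed 's/},\s*$/}/' > products_newline.json

# then import it without --jsonArray
mongoimport --db mydb --collection product --file products_newline.json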
Related
I'm trying to find a way to import a whole dataset into one MongoDB document. Every solution I've tried only inserts many documents instead of one. What I've tried so far:
mongoimport --db=dbName --collection=collectionName --drop --file=file.json --jsonArray
Thank you in advance!
The problem is solved, thanks to both prasad_ and Joe.
After running the mongoimport command shown in my question, I just aggregated with $group to get everything into one document. There was still the problem Joe commented on, that a document is limited to 16 MB, which meant I had to remove some data from the dataset that I wasn't using.
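For reference, this is roughly the aggregation I mean; the database, collection and output names are just placeholders, and it only works as long as the combined result stays under the 16 MB document limit:

mongo dbName --quiet --eval '
  db.collectionName.aggregate([
    // push every imported document into a single array on one document
    { $group: { _id: null, docs: { $push: "$$ROOT" } } },
    // write that single combined document out to its own collection
    { $out: "combined" }
  ])
'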
I'm new to MongoDB. I wanted to know: if I initially code my app using plain MongoDB documents and later want to switch to MongoDB GridFS, will switching (with a large, already filled database) be possible?
So, if I am using MongoDB initially and, after the app has been running for some time, the documents exceed the 16 MB size limit, I guess I will have to switch to GridFS. I want to know how easy or difficult it will be to switch to GridFS, and whether that will be possible at all.
Thanks.
GridFS is used to store large files. Internally it divides the data into chunks (255 KB by default). Let me give you an example of saving a PDF file in MongoDB both ways. I am assuming the PDF is 10 MB so that we can see both the normal way and the GridFS way.
Normal Way:
Say you want to store it in the normal_book collection in the testDB database. The whole PDF is stored in this single collection, and when you fetch it using db.normal_book.find(), the whole PDF is loaded into memory.
GridFS way:
In GridFS, we have two collections: one stores the data and the other stores its metadata. The data goes into the fs.chunks collection and the metadata into the fs.files collection. The beauty of GridFS is that you can fetch the whole file at once or fetch its chunks individually.
Now coming to your question: there is no direct way or property to tell MongoDB that you now want to switch to GridFS. You need to reinsert the data into GridFS using the mongofiles command-line tool or one of MongoDB's drivers.
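As a rough sketch of that reinsert step with mongofiles (the file and database names are just examples), you put each large file into GridFS and can then inspect the resulting collections:

# store the 10 MB PDF in GridFS; this creates entries in fs.files and fs.chunks
mongofiles --db testDB put book.pdf

# inspect what was written (metadata in fs.files, data split across fs.chunks)
mongo testDB --quiet --eval 'printjson(db.fs.files.findOne()); print(db.fs.chunks.count())'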
I have a MongoDB collection backup containing many small documents. The backup was produced by mongodump, but when I try to import it using mongorestore I get an error:
AssertionException handling request, closing client connection: 10334 BSONObj size: 18039019 (0x11340EB) is invalid. Size must be between 0 and 16793600(16MB)
I'm running MongoDB version 3.0.3 (from trunk).
Using --batchSize=100 fixes this issue for me every time.
e.g. mongorestore -d my-database --batchSize=100 ./database-dump-directory
Basically, MongoDB requires that each document be less than 16 MB. If you intend to store a document larger than 16 MB, you can use GridFS.
Each document is allocated space using power-of-2 sized allocations.
Your application should keep an eye on the size of the BSON documents it generates. Otherwise, you can use a different data model rather than embedding all the data in one document.
mongorestore sends insert commands in batches, wrapped in an {"applyOps", entries} document. This document is (AFAIK) limited to 16 MB just like any other document.
According to the sources, there are "pathological cases where the array overhead of many small operations can overflow the maximum command size". The variable oplogMaxCommandSize is used to help mongorestore not fail in such cases. It was raised to 16.5 MB at some point during 3.0 development; that was too optimistic, and it was lowered back to 8 MB later (JIRA TOOLS-754).
If you need to, you can adjust that value yourself according to your needs, and then recompile the tools.
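If you go down that road, this is roughly what I mean; the repository URL is the public mongo-tools repo, but the exact file containing oplogMaxCommandSize and the build steps depend on the version you check out, so treat this as a sketch only:

git clone https://github.com/mongodb/mongo-tools.git
cd mongo-tools
# locate the constant discussed in TOOLS-754 and adjust the value to your needs
grep -rn "oplogMaxCommandSize" .
# then rebuild mongorestore following the repository's own README/build instructions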
I have an application where I'm using MongoDB as the database for storing records; the Ruby wrapper for MongoDB I'm using is Mongoid.
Everything was working fine until I hit the error below:
Exceded maximum insert size of 16,000,000 bytes
Can anyone point out how to get rid of this error?
I'm running a MongoDB server that does not have a configuration file (no configuration was provided with the MongoDB source files).
Can anyone help?
You have hit the maximum limit of a single document in MongoDB.
If you save large data files in MongoDB, use GridFS instead.
If your document has too many subdocuments, consider splitting it up and using references instead of nesting (a minimal sketch follows).
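A minimal sketch of what splitting into references can look like (the collection and field names are made up for the example): instead of embedding every item inside one ever-growing parent document, each item becomes its own document that points back to the parent:

mongo mydb --quiet --eval '
  // one small parent document...
  var orderId = ObjectId();
  db.orders.insert({ _id: orderId, customer: "Alice" });
  // ...and each item as its own document referencing the parent,
  // so no single document grows toward the 16 MB limit
  db.order_items.insert({ order_id: orderId, sku: "Widget 1", qty: 2 });
  db.order_items.insert({ order_id: orderId, sku: "Widget 2", qty: 5 });
'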
The limit of 16 MB of data per document is a very well-known limitation.
Use GridFS for storing arbitrary binary data of arbitrary size + metadata.
I just imported my MySQL database (about 2.8 MB in size) into my new Mongo database with a very simple PHP script I built. The import went fine without errors, but when I look at my Mongo database (with RockMongo) I see this: Data Size 8.01m, Storage Size 13.7m.
MongoDB is bigger than MySQL for the same amount of data; is this normal?
Thanks for your help, and sorry for my English.
Yes, it's normal that the "same" data will take up more space in MongoDB. There are a few things you need to take into account:
1) The document _id that's stored for each document (unless you specify your own value for it) is 12 bytes per doc.
2) You're storing the key for each key-value pair in each document, whereas in MySQL the column name is not stored for every single row, so you have that extra overhead in your MongoDB documents too. One way to reduce this is to use shortened key names ("column names") in your docs (see the sketch after this list).
3) mongodb automatically adds padding to allow documents to grow
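For point 2, here is a small sketch of what I mean by shortened key names (the collection and field names are made up): the keys are stored in every single document, so shorter keys mean less per-document overhead:

mongo mydb --quiet --eval '
  // full key names, repeated in every stored document
  db.products_full.insert({ name: "Widget 1", description: "This is Widget 1" });
  // shortened key names carrying the same values
  db.products_short.insert({ n: "Widget 1", d: "This is Widget 1" });
'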
In similar tests, loading data from SQL Server into MongoDB with shortened 2-character document key names instead of the full names used in SQL Server, I see about 25-30% extra space being used in MongoDB.