I have a RethinkDB database which contains 100k documents on a server. I want to get a dump of 500 documents, or 50MB of data, to run tests on my local machine.
Is there a way to specify, on the command line or in a query, that the dump should contain a specific number of documents or data of a specific size?
There isn't an argument like that in the script, but you can do this yourself from one of the client languages.
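For example, a minimal sketch in Python, assuming the classic rethinkdb driver import style (the host, database, and table names below are placeholders), that pulls just the first 500 documents into a JSON file you can load locally:
import json
import rethinkdb as r

# Connect to the remote server (host/port and names are placeholders).
conn = r.connect(host="my-server", port=28015)

# Pull only the first 500 documents instead of dumping the whole table.
docs = list(r.db("mydb").table("mytable").limit(500).run(conn))

with open("sample.json", "w") as f:
    json.dump(docs, f, default=str)  # default=str copes with dates and other non-JSON types

conn.close()
If you want to cap by size rather than by count, you could instead loop over the cursor, keep a running total of len(json.dumps(doc)), and stop once you pass roughly 50MB.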
Related
Issue
I have at least 10 text files (CSV), each reaching 5GB in size. There is no issue when I import the first text file, but when I start importing the second text file it shows the maximum size limit (16MB) error.
My primary purpose for using the database is to search for customers using the customer_id index.
Given below are the details of one CSV file.
Collection Name | Documents | Avg. Document Size | Total Document Size | Num. Indexes | Total Index Size | Properties
Customers       | 8,874,412 | 1.8 KB             | 15.7 GB             | 3            | 262.0 MB         |
To overcome this, the MongoDB community recommends GridFS, but the problem with GridFS is that the data is stored as bytes and it is not possible to query for a specific index in the text file.
I don't know whether it is possible to query for a specific index in a text file when using GridFS. If someone knows, any help is appreciated.
The other solution I thought about was creating multiple instances of MongoDB running on different ports. Is this method feasible?
But most of the tutorials on multiple instances show how to create a replica set, thereby storing the same data on the PRIMARY and the SECONDARY.
SECONDARY instances do not allow writes; they only allow reads.
Is it possible to create multiple instances of MongoDB without creating a replica set, with both read and write operations on them? If yes, how? Can this method overcome the 16MB limit?
The second solution I thought about was sharding the collection. Can this method overcome the 16MB limit? If yes, any help regarding this is appreciated.
Of the two solutions, which is more efficient for searching data (in terms of speed)? As I mentioned earlier, I just want to search for customers in this database.
The error message shows exactly where the problem is: entry #8437: line 13530, column 627
Have a look at that place in the file and correct it there.
The error extraneous " in field ... is quite clear. In your CSV file you have an opening quote " that is not closed, i.e. the rest of the entire file is considered as one single field.
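If the file is too large to inspect by hand, a rough Python sketch like the one below (the file name is a placeholder) can point at candidate lines; any line with an odd number of double-quote characters is suspect, although a field that legitimately spans multiple lines will also trip it:
# Heuristic only: flag lines whose double quotes are unbalanced.
with open("customers.csv", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        if line.count('"') % 2 != 0:
            print(f"possible unclosed quote at line {lineno}: {line[:80]!r}")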
I have a local MongoDB database where I am starting to put some files into GridFS for caching purposes. What I want to know is:
Can I use db.cloneCollection() on another server to clone my fs.* collections? If I do that, will the GridFS system on that server work properly? Essentially I have to "pull" data from another machine that has the files in GridFS; I can't directly add them easily to the production box.
Edit: I was able to get on my destination server and use the following commands from the mongo shell to pull the GridFS system over from another mongo system on our network.
use DBName
db.cloneCollection("otherserver:someport","fs.files")
db.cloneCollection("otherserver:someport","fs.chunks")
For future reference.
The short answer is of course you can; it is only a collection and there is nothing special about it at all. The longer answer is an explanation of what GridFS actually is.
So the very first sentence on the manual page:
GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB.
GridFS is not something that "MongoDB does". Internally to the server it is basically just two collections: one for the reference information and one for the "chunks" that are used to break up the content so no individual document exceeds the 16MB limit. But the most important word here is "specification".
So the server itself does no magic at all. The implementation to store reference data and chunks is all done at the "driver" level, where in fact you can name the collections you wish to use rather than just accept the defaults. So when reading and writing data, it is the "driver" that does the work by pulling the "chunks" contained in the reference document or creating new "chunks" as data is sent to the server.
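As an illustration of that, a minimal PyMongo sketch (the database name and payload are placeholders) where the gridfs driver code, not the server, does the splitting and reassembly:
import gridfs
from pymongo import MongoClient

db = MongoClient()["mydb"]
fs = gridfs.GridFS(db)   # defaults to the fs.files / fs.chunks collections

# put() writes one reference document to fs.files and as many chunk
# documents to fs.chunks as needed; the server just sees ordinary inserts.
file_id = fs.put(b"some large payload" * 1000000, filename="payload.bin")

# get() reads the reference document and stitches the chunks back together.
data = fs.get(file_id).read()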
The other common misconception is that GridFS is the only method for dealing with "files" when sending content to MongoDB. Again in that first sentence, it actually exists as a way to store content that exceeds the 16MB limit for BSON documents.
MongoDB has no problem directly storing binary data in a document as long as the total document does not exceed the 16MB limit. So in most use cases ( small image files used on websites ) the data would be better stored in ordinary documents and thus avoid the overhead of needing to read and write with multiple collections.
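For instance, a small sketch (collection and file names are placeholders) storing an image directly in an ordinary document, with no GridFS involved, as long as the whole document stays under 16MB:
from bson.binary import Binary
from pymongo import MongoClient

images = MongoClient()["mydb"]["images"]

with open("thumbnail.png", "rb") as f:
    images.insert_one({"name": "thumbnail.png", "data": Binary(f.read())})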
So there is no internal server "magic". These are just ordinary collections that you can query, aggregate, mapReduce and even copy or clone.
I have an application where I'm using MongoDB as a database for storing records; the Ruby wrapper for MongoDB I'm using is Mongoid.
Now everything was working fine until I hit the error below:
Exceded maximum insert size of 16,000,000 bytes
Can anyone pinpoint how to get rid of this error?
I'm running a MongoDB server which does not have a configuration file (no configuration file was provided with the MongoDB source files).
Can anyone help?
You have hit the maximum size limit of a single document in MongoDB.
If you save large data files in MongoDB, use GridFS instead.
If your document has too many subdocuments, consider splitting it and use relations instead of nesting.
The limit of 16MB of data per document is a very well-known limitation.
Use GridFS for storing arbitrary binary data of arbitrary size + metadata.
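If you want to catch the problem before the driver raises, one option is to measure the encoded size yourself. A hedged sketch in Python, for illustration (it assumes a PyMongo version where bson.encode is available; the helper name is made up):
import bson

MAX_BSON = 16 * 1024 * 1024

def fits_in_one_document(doc):
    # Encode the document exactly as the driver would and compare against the cap.
    return len(bson.encode(doc)) < MAX_BSON

# If this returns False, move the bulky field to GridFS or split the document
# into related documents before inserting.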
I just imported my MySQL database (size 2.8 MB) into my new Mongo database with a very simple PHP script I built. The import went through without errors, but when I look at my Mongo database (with RockMongo) I see this: Data Size 8.01m, Storage Size 13.7m.
MongoDB is bigger than MySQL for the same amount of data; is this normal?
Thanks for your help, and sorry for my English.
Yes, it's normal that the "same" data will take up more space in MongoDB. There are a few things you need to take into account:
1) The document _id that's stored for each document (unless you are specifying your own value for it) is 12 bytes per doc.
2) You're storing the key for each key-value pair in each document, whereas in MySQL the column name is not stored for every single row, so you have that extra overhead in your MongoDB documents too. One way to reduce this is to use shortened key names ("column names") in your docs, as in the sketch below.
3) MongoDB automatically adds padding to allow documents to grow.
In similar tests, loading data from SQL Server into MongoDB with shortened 2-character document key names instead of the full names as per SQL Server, I see about 25-30% extra space being used in MongoDB.
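As a small illustration of the key-shortening point (the field names here are made up), remapping verbose keys to short ones before inserting means the long strings are never stored per document:
# Hypothetical mapping from descriptive names to short stored keys.
KEY_MAP = {"customer_identifier": "ci", "first_name": "fn", "last_name": "ln"}

def shorten_keys(row):
    return {KEY_MAP.get(k, k): v for k, v in row.items()}

# shorten_keys({"customer_identifier": 42, "first_name": "Ada", "last_name": "Lovelace"})
# -> {"ci": 42, "fn": "Ada", "ln": "Lovelace"}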
I would like to create a database for each customer, but before I do, I would like to know: how many databases can be created in a single instance of MongoDB?
There's no explicit limit, but there are probably some implicit limits due to the maximum number of open file handles / files in a directory on the host OS/filesystem.
see: http://groups.google.com/group/mongodb-user/browse_thread/thread/01727e1af681985a?fwc=2
By default, you can have some 12,000 collections in a single instance of MongoDB (that is, if each collection also has 1 index).
If you want to create more collections, use --nssize when you run the mongod process. You can see this link for more details:
http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections
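As a rough sketch of the per-customer idea in PyMongo (names are placeholders): a database is created lazily the first time you write to it, so there is nothing special to do beyond picking a name.
from pymongo import MongoClient

client = MongoClient()

def customer_db(customer_id):
    # Each customer gets a database named after their id; MongoDB creates it on first write.
    return client[f"customer_{customer_id}"]

customer_db(42)["orders"].insert_one({"item": "widget", "qty": 1})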