How to query through more than 2 million documents in MongoDB?

I have more than 2 million documents in my MongoDB Atlas collection. I am using Mongoose to read/write data in the cloud collection, but because the collection is so large, queries like Model.find() take practically forever unless I add .limit(x).
Is there any approach or concept I should look into for this?
I just want to analyse the documents with some simple queries and then use the data in a React frontend.

Related

How to query documents from an entire MongoDB database containing multiple collections at once (Pymongo)

I have a DB containing thousands of collections, each collection containing documents.
How would I query all these documents from all the collections at once?
I have considered using the $lookup aggregation stage, but as far as I know it only joins one additional collection at a time, whereas I have thousands of collections that I want to combine and query together.
Is there a way to achieve this?
MongoDB operations work on a single collection at a time. As you point out, $lookup can perform a cross-collection lookup, but it won't help you with a large number of collections.
Re-design your data model.
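For illustration only: the naive client-side workaround is to loop over every collection yourself. A minimal PyMongo sketch (the connection string, database name and filter are placeholders), which also shows why this does not scale to thousands of collections:

    # Hypothetical sketch: run the same filter against every collection in the
    # database and merge the results on the client.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    db = client["mydb"]                                # placeholder database name
    query = {"status": "active"}                       # placeholder filter

    results = []
    for name in db.list_collection_names():
        results.extend(db[name].find(query))

    print(f"{len(results)} matching documents across all collections")

With thousands of collections this means thousands of separate queries, which is why the usual advice is the one above: merge the data into one collection with a discriminating field.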

Is it possible to use MongoDB geospatial indexes with GridFS?

I have a large GeoJSON feature collection which is over 16 MB. I am hoping to insert the data into MongoDB so that I can use the geospatial functionality that MongoDB offers ($geoIntersects, $geoWithin, etc.). Due to the large size of the file, I cannot store the data in one MongoDB document.
I have used GridFS to break the file up into several chunks within MongoDB, but I am now unsure whether I can still use the geospatial features that I would like to.
Does anyone know if this is possible and, if so, what's the best way to do something like this?
One way you should be able to achieve what you are describing is to extract the data to be indexed into a separate collection and add indexes on that collection.
GridFS essentially takes the data, splits it into chunks (255 kB by default, always well under the 16 MB document limit) and stores each chunk as a document. I don't see how those chunks could be covered by a geospatial index.
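As a rough sketch of that suggestion (the connection string, collection and field names are invented): store each GeoJSON feature as its own document in a regular collection and put a 2dsphere index on the geometry, keeping GridFS only for archiving the original file.

    from pymongo import MongoClient, GEOSPHERE

    client = MongoClient("mongodb://localhost:27017")   # placeholder URI
    features = client["geo"]["features"]                # placeholder collection

    # One document per GeoJSON feature keeps each document far below 16 MB.
    features.insert_one({
        "properties": {"name": "example feature"},
        "geometry": {"type": "Point", "coordinates": [-73.97, 40.77]},
    })

    # The 2dsphere index enables $geoIntersects, $geoWithin, $near, etc.
    features.create_index([("geometry", GEOSPHERE)])

    nearby = features.find({
        "geometry": {
            "$geoWithin": {
                # centre point plus a 5 mile radius expressed in radians
                "$centerSphere": [[-73.97, 40.77], 5 / 3963.2]
            }
        }
    })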

MongoDB - retrieve multiple documents efficiently

I want to download ~100K records from MongoDB using PyMongo.
find() returns a cursor, which, I guess, goes to the DB (in the cloud) for each single document. This takes ~0.003 s per document, which is not great.
Is there some way to perform a "fetch all" or something similar to reduce the run time?
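One note on the assumption in the question: a PyMongo cursor does not make one round trip per document; it pulls results from the server in batches, and you can enlarge those batches or simply materialize the whole result set. A minimal sketch (database, collection and field names are placeholders):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # placeholder URI
    coll = client["mydb"]["records"]                    # placeholder collection

    # Fetch everything in large batches; the projection keeps the payload small.
    docs = list(
        coll.find({}, {"_id": 0, "field_a": 1, "field_b": 1}).batch_size(10_000)
    )
    print(len(docs), "documents fetched")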

Avoid MongoDB loading indexes

I want to perform a huge query that joins across my database: more than 600 collections with billions of documents. The problem is that if I run that query, my DB will blow up, because MongoDB will try to load the index of each collection.
Is there any way to prevent MongoDB from automatically loading the indexes of each collection?

Mapping datasets to NoSQL (MongoDB) collections

What I have:
I have data for 'n' departments.
Each department has more than 1,000 datasets.
Each dataset has more than 10,000 CSV files (each larger than 10 MB), each with a different schema.
This data will grow even more in the future.
What I want to do:
I want to map this data into MongoDB.
What approaches I tried:
I can't map each dataset to a single document in MongoDB, since documents are limited to 16 MB (4 MB in old versions).
I cannot create a collection for each dataset either, as the maximum number of collections is also limited (< 24,000).
So finally I thought of creating a collection for each department, with one document in that collection for each record in a CSV file belonging to that department.
I want to know from you:
Will there be a performance issue if we map each record to a document?
Is there any maximum limit on the number of documents?
Is there any other design I could use?
Will there be a performance issue if we map each record to a document?
Mapping each record to a document in MongoDB is not a bad design. You can have a look at the FAQ on the MongoDB site,
http://docs.mongodb.org/manual/faq/fundamentals/#do-mongodb-databases-have-tables .
It says:
...Instead of tables, a MongoDB database stores its data in collections, which are the rough equivalent of RDBMS tables. A collection holds one or more documents, which correspond to a record or a row in a relational database table...
Along with the limitation on BSON document size (16 MB), there is also a maximum of 100 levels of document nesting:
http://docs.mongodb.org/manual/reference/limits/#BSON Document Size
...Nested Depth for BSON Documents. Changed in version 2.2. MongoDB supports no more than 100 levels of nesting for BSON documents...
So it's better to go with one document for each record.
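A minimal sketch of that layout, assuming one collection per department and one document per CSV row (the connection string, file name and dataset tag are placeholders):

    import csv
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # placeholder URI
    dept = client["warehouse"]["department_a"]          # one collection per department

    with open("dataset_001.csv", newline="") as f:       # placeholder CSV file
        docs = [{"dataset": "dataset_001", **row} for row in csv.DictReader(f)]

    # Each CSV row becomes one document; insert_many batches the writes.
    if docs:
        dept.insert_many(docs, ordered=False)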
Is there any maximum limit on the number of documents?
No. It's mentioned in the MongoDB reference manual:
...Maximum Number of Documents in a Capped Collection. Changed in version 2.4.
If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents...
Is there any other design I could use?
If your documents are too large, you can consider partitioning them at the application level, but that puts a higher computation burden on the application layer.
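What application-level partitioning could look like, as a hedged sketch: split an oversized logical record into several linked documents and reassemble them when reading (the field names and chunk size are invented):

    # Split an oversized payload into parts that each stay well under 16 MB,
    # tied together by a shared parent_id and an ordering field.
    CHUNK = 8 * 1024 * 1024  # 8 MB per part, an arbitrary illustrative choice

    def save_partitioned(coll, parent_id, payload: bytes):
        parts = [payload[i:i + CHUNK] for i in range(0, len(payload), CHUNK)]
        coll.insert_many(
            [{"parent_id": parent_id, "seq": n, "data": part}
             for n, part in enumerate(parts)]
        )

    def load_partitioned(coll, parent_id) -> bytes:
        parts = coll.find({"parent_id": parent_id}).sort("seq", 1)
        return b"".join(doc["data"] for doc in parts)

For opaque blobs this is essentially what GridFS already provides, which is where the second answer below ends up.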
Will there be a performance issue if we map each record to a document?
That depends entirely on how you search them. If most of your queries touch only a single document, it is likely even faster that way. If the finer document granularity results in a lot of queries that span many documents, it will get slower, because MongoDB can't combine those documents for you on the database side.
Is there any maximum limit on the number of documents?
No.
Is there any other design I could use?
Maybe, but that depends on how you want to query your data. If you are content with treating the files as BLOBs that are retrieved as a whole but never searched or analysed at the database level, you could consider storing them in GridFS. It's a way to store files larger than 16 MB in MongoDB.
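A minimal GridFS sketch with PyMongo (the database name and file are placeholders):

    import gridfs
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # placeholder URI
    fs = gridfs.GridFS(client["warehouse"])             # backed by fs.files / fs.chunks

    # Store a file larger than the 16 MB document limit as one logical blob.
    with open("dataset_001.csv", "rb") as f:
        file_id = fs.put(f, filename="dataset_001.csv")

    # Read it back whole; GridFS reassembles the chunks transparently.
    data = fs.get(file_id).read()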
In general, MongoDB database design doesn't depend so much on what data you have or how much of it there is, but rather on how you want to work with it.