Is it possible to sort 2 million records using MongoDB's sort?
The MongoDB documentation clearly states: "When the sort operation consumes more than 32 megabytes, MongoDB returns an error."
But I have a requirement to sort a huge number of records. How can I do it?
It's possible. The documentation states that the 32 MB limit applies only when MongoDB sorts data in memory, i.e. without using an index.
When the sort operation consumes more than 32 megabytes, MongoDB
returns an error. To avoid this error, either create an index to
support the sort operation or use sort() in conjunction with limit().
The specified limit must result in a number of documents that fall
within the 32 megabyte limit.
I suggest that you add an index on the field you want to sort on with the ensureIndex command (named createIndex since MongoDB 3.0, where ensureIndex is a deprecated alias):
db.coll.ensureIndex({ sortFieldName : 1});
If you're sorting on multiple fields, you will need to add a compound index on the fields you're sorting on (the order of the fields in the index matters):
db.coll.ensureIndex({ sortFieldName1 : 1, sortFieldName2 : 1});
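To see why the order of the fields matters, here is a rough sketch in plain JavaScript of the rule MongoDB documents for serving a sort from an index. It is a simplified model, not the query planner: indexSupportsSort is a hypothetical helper, and it assumes the query has no equality conditions, so the sort keys must form a prefix of the index keys.

```javascript
// Simplified model of the documented rule for serving a sort from an index.
// Assumption: no equality conditions in the query, so the sort keys must be
// a prefix of the index keys. indexSupportsSort is a hypothetical helper.
function indexSupportsSort(indexSpec, sortSpec) {
  const indexKeys = Object.keys(indexSpec);
  const sortKeys = Object.keys(sortSpec);
  if (sortKeys.length === 0 || sortKeys.length > indexKeys.length) return false;

  let sameDirection = true;
  let inverseDirection = true;
  for (let i = 0; i < sortKeys.length; i++) {
    // Sort keys must appear in the same order as the index keys.
    if (sortKeys[i] !== indexKeys[i]) return false;
    const indexDir = indexSpec[indexKeys[i]];
    const sortDir = sortSpec[sortKeys[i]];
    if (sortDir !== indexDir) sameDirection = false;
    if (sortDir !== -indexDir) inverseDirection = false;
  }
  // An index can be walked forwards or backwards, so every direction must
  // match the index, or every direction must be reversed.
  return sameDirection || inverseDirection;
}

const index = { sortFieldName1: 1, sortFieldName2: 1 };
console.log(indexSupportsSort(index, { sortFieldName1: 1, sortFieldName2: 1 }));   // true
console.log(indexSupportsSort(index, { sortFieldName1: -1, sortFieldName2: -1 })); // true
console.log(indexSupportsSort(index, { sortFieldName2: 1, sortFieldName1: 1 }));   // false
console.log(indexSupportsSort(index, { sortFieldName1: 1, sortFieldName2: -1 }));  // false
```

The last two calls are the cases that would fall back to an in-memory sort: the fields are swapped, or the directions are mixed relative to the index.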
While using the MongoDB sort function, I noticed that it examines too many documents; without sort, the query examines only 22 documents and returns them.
I'm trying to get users who bought some products, in descending order of orderTime.
Both products.userIDs and orderTime are indexed.
db.collection.find({$or:[{'products.userIDs':{$in:usersArray}} , {'item.userIDs':{$in:usersArray}}]})
.sort({orderTime:-1})
.explain();
Without sort, the docs examined are 22.
With sort, the docs examined are 1602.
How can I sort the documents without examining so many of them?
Mongo only uses a single index per query (unless there's an $or). If you want that particular query to be well indexed, you'd want an index on {"products.userIDs": 1, "orderTime": 1}. That way, when it finds the documents, they'll already be sorted and it won't need to do the sort in memory. The compound index docs have more details.
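As an illustration of why the documents come back "already sorted", here is a toy model in plain JavaScript of a compound index on a user field followed by orderTime. The data, the scalar userID field (the real products.userIDs is an array), and the newestFirstForUser helper are all made up for this sketch; a real index would seek to the matching range instead of scanning every entry.

```javascript
// Toy model of a compound index on { userID: 1, orderTime: 1 }.
// Hypothetical data; userID is a scalar here to keep the sketch small.
const orders = [
  { _id: 1, userID: "b", orderTime: 30 },
  { _id: 2, userID: "a", orderTime: 20 },
  { _id: 3, userID: "b", orderTime: 10 },
  { _id: 4, userID: "a", orderTime: 40 },
  { _id: 5, userID: "c", orderTime: 25 },
];

// The index is kept ordered by userID, then orderTime. That ordering work
// happens when documents are written, not when they are queried.
const compoundIndex = orders
  .map((doc) => ({ userID: doc.userID, orderTime: doc.orderTime, _id: doc._id }))
  .sort((x, y) =>
    x.userID < y.userID ? -1 : x.userID > y.userID ? 1 : x.orderTime - y.orderTime
  );

// Walking one user's entries back to front yields orderTime descending
// with no sort step at query time. (Linear scan for brevity only.)
function newestFirstForUser(index, userID) {
  const result = [];
  for (let i = index.length - 1; i >= 0; i--) {
    if (index[i].userID === userID) result.push(index[i]._id);
  }
  return result;
}

console.log(newestFirstForUser(compoundIndex, "a")); // [ 4, 2 ]
console.log(newestFirstForUser(compoundIndex, "b")); // [ 1, 3 ]
```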
Let's say I have a collection mycollection that has 1,000,000 records.
How many records will this query return?
const query = firestore.collection('mycollection').get()
I couldn't find that in the docs.
There is no default limit. The query you're showing is asking for all of the documents in mycollection. For large collections, you will need to impose a limit in order to avoid excessive costs and running out of memory.
From the firebase.google.com documentation:
By default, a query retrieves all documents that satisfy the query in
ascending order by document ID. You can specify the sort order for
your data using orderBy(), and you can limit the number of documents
retrieved using limit().
My collection name is trial and the data size is 112 MB.
My query is:
db.trial.find()
and I have added a limit of 10:
db.trial.find.limit(10)
But the limit is not working; the entire query is running.
Replace
db.trial.find.limit(10)
with
db.trial.find().limit(10)
You also mention that the entire collection is being queried. To check, run this:
db.trial.find().limit(10).explain()
It will tell you how many documents it looked at before stopping the query (nscanned). You will see that nscanned will be 10.
The .limit() modifier on its own only limits the results returned by the query that is processed, so it works as designed. In its raw form, with no query conditions, nscanned should simply equal the limit you set:
db.trial.find().limit(10)
If your intent is to only operate on a set number of documents, you can alter this with the $maxScan modifier (deprecated since MongoDB 4.0):
db.trial.find({})._addSpecial( "$maxScan" , 11 )
Which causes the query engine to "give up" after the set number of documents have been scanned. But that should only really matter when there is something meaningful in the query.
If you are actually trying to do "paging", then you are better off using "range" queries with $gt, $lt and their cousins to change the range of documents your query selects.
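A minimal sketch of that range-based paging idea, in plain JavaScript over an in-memory array so it can run without a server. The nextPage helper and the 25 sample documents are assumptions made for illustration; the comment shows the shell query each call is meant to stand in for.

```javascript
// Hypothetical collection contents, ordered by _id as the _id index would be.
const trial = Array.from({ length: 25 }, (_, i) => ({ _id: i + 1 }));

// Stands in for:
//   db.trial.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(pageSize)
// A null lastId means "start from the beginning".
function nextPage(docs, lastId, pageSize) {
  return docs
    .filter((d) => lastId === null || d._id > lastId)
    .slice(0, pageSize);
}

let lastId = null;
const pages = [];
for (;;) {
  const page = nextPage(trial, lastId, 10);
  if (page.length === 0) break;
  pages.push(page.map((d) => d._id));
  // Remember the last _id seen; the next page starts strictly after it.
  lastId = page[page.length - 1]._id;
}

console.log(pages.map((p) => p.length)); // [ 10, 10, 5 ]
```

Each page is selected by a range condition on an indexed field, so the server does not have to walk past all the earlier documents the way it would with skip().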
I am currently developing an app which gets a specific number of documents from a collection if their location coordinates fall within a certain distance. I am using an active record library for CodeIgniter, and the query that is generated is as follows:
db.updates.find({locs: { $near: [72.844102008984, 19.130207090604 ], $maxDistance: 5000 }, posted_on : { $lt :1398425538.1942 },}).sort( { posted_on: -1 } ).limit(10).toArray()
The problem I am facing is that the above query skips a few documents which should actually get pulled. But if I remove the limit(10) from the above query, the proper documents get pulled.
I am not sure: does using limit() in MongoDB omit some results, or does it limit the results to only the closest (nearest) documents?
P.S. The documents skipped when using the limit are not always the same; the results vary randomly.
I suspect you are running into problems with the special nature of the $near query. $near performs both a limit() and a sort() on the cursor returning the results:
Specifies a point for which a geospatial query returns the closest documents first. The query sorts the documents from nearest to farthest.
By default, queries that use a 2d index return a limit of 100 documents; however you may use limit() to change the number of results.
http://docs.mongodb.org/manual/reference/operator/query/near/
The documentation does specifically discuss overriding the limit of 100 with your own limit call:
You can further limit the number of results using cursor.limit().
It is silent on adding your own sort() or both sorting and overriding the limit at the same time. I suspect you are running into side effects of doing both. Note that it's not incorrect to do both - it just may not produce the results you are looking for. I'd suggest trying the same query using $geoWithin
http://docs.mongodb.org/manual/reference/operator/query/geoWithin/
$geoWithin does not apply a sort or a limit on the results, so it gives you something of a more raw result set.
Do you have any identical posted_on dates in the system? I recommend sorting by a second key, perhaps _id. If the sort order is non-deterministic, the system may skip documents in a non-deterministic manner. Adding the _id field to your sort order is generally not that expensive if you have an index on the other fields, as they will already be very close to the correct order and _id is part of all indexes. ("By default, all collections have an index on the _id field, and applications and users may add additional indexes to support important queries and operations." http://docs.mongodb.org/manual/core/index-single/ )
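To make the tie-breaker idea concrete, here is a small plain-JavaScript sketch. The sample documents and the topN helper are made up for illustration; the comparator mirrors what a sort of { posted_on: -1, _id: 1 } would ask the server to do.

```javascript
// Hypothetical documents: three of them share the same posted_on value.
const updates = [
  { _id: 3, posted_on: 100 },
  { _id: 1, posted_on: 200 },
  { _id: 2, posted_on: 100 },
  { _id: 4, posted_on: 100 },
];

// Mirrors .sort({ posted_on: -1, _id: 1 }): newest first, ties broken by _id.
function byPostedOnThenId(a, b) {
  if (a.posted_on !== b.posted_on) return b.posted_on - a.posted_on;
  return a._id - b._id;
}

// Sort a copy, then apply the limit.
function topN(docs, n) {
  return [...docs].sort(byPostedOnThenId).slice(0, n).map((d) => d._id);
}

// The same three documents come back whatever order the input arrives in.
console.log(topN(updates, 3));                // [ 1, 2, 3 ]
console.log(topN([...updates].reverse(), 3)); // [ 1, 2, 3 ]
```

Without the _id comparison, which of the three tied documents makes it under the limit would depend on the order in which they happened to be encountered.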
I have a collection that has to support many different query parameters, so I create an index for every key. I just added a few more and got an error:
pymongo.errors.OperationFailure: add index fails, too many indexes for collection key:{ foo: 1 }
Then I noticed that the maximum number of indexes per collection in MongoDB is just 64. Can I change this number?
The max is built into MongoDB:
http://docs.mongodb.org/manual/reference/limits/
Number of Indexes per Collection
A single collection can have no more than 64 indexes.