Overflow sort stage buffered data usage - mongodb

We have a MongoDB 2.6.4 replica set running and are trying to diagnose this behavior. We are getting the Runner error: Overflow sort stage buffered data usage of 33598393 bytes exceeds internal limit of 33554432 bytes when we expect that we would not. The collection has millions of records and has a compound index that includes the key that is being sorted.
As an example, the index looks like this:
{ from: 1, time : -1, otherA : 1, otherB : 1}
Our find is:
db.collection.find({ from : { $in : ["a", "b"] }, time : { $gte : timestamp },
    otherA : { $in : [...] }, otherB : { $in : [...] } })
    .sort({ time : -1 })
MongoDB parallelizes this query into clauses like this:
{ from : a }, { time : { $gte : timestamp }, ... }
{ from : b }, { time : { $gte : timestamp }, ... }
In the explain output, each stage reports scanAndOrder : false, which implies that the index was used to return the results. This all seems fine; however, the MongoDB client gets the Runner error: Overflow sort stage buffered data usage error. This seems to imply that the sort was done in memory. Is this because it is doing an in-memory merge sort of the clauses? Or is there some other reason that this error could occur?
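One workaround sketch, assuming the same collection and fields as above: in 2.6 the aggregation pipeline accepts an allowDiskUse option, which the cursor's sort() does not, so the sort stage can spill to disk instead of overflowing the 32 MB in-memory limit.
db.collection.aggregate([
    // same predicate as the find() above; timestamp is the caller's variable
    { $match : { from : { $in : ["a", "b"] }, time : { $gte : timestamp } } },
    { $sort : { time : -1 } }
], { allowDiskUse : true })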

I was also facing the same problem of memory overflow.
I am using PHP with MongoDB to manipulate documents.
When I access a collection that has a large set of documents, it throws this error.
As per the following link, MongoDB can sort only up to 32 MB of data at a time:
http://docs.mongodb.org/manual/reference/limits/#Sorted-Documents
So, as per the description given in the MongoDB docs about sorting,
I sorted the array converted from the MongoCursor object in PHP rather than using Mongo's sort() method.
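In mongo shell terms the same idea looks like this (a minimal sketch; the PHP version would be analogous, and the from/time field names are borrowed from the question above):
// Fetch without a server-side sort, then sort client-side, so the 32 MB
// in-memory sort limit never applies. Only viable for result sets that
// comfortably fit in application memory.
var docs = db.collection.find({ from : "a" }).toArray();
docs.sort(function (x, y) { return y.time - x.time; }); // descending by time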
Hope it'll help you.
Thanks.

Related

Matching elements in array documents sometimes gets very slow

I have a MongoDB collection with about 100,000 documents.
Each document has an array with about ~100 elements. It is an array of strings like this:
features: [
"0_Toyota",
"29776_Grey",
"101037_Hybrid",
"240473_Iron Gray",
"46290_Aluminium,Magnesium",
"2787_14",
"9350_1920 x 1080",
"36303_Y",
"310870_N",
"57721_Y"
...
Queries like the one below are very fast, but sometimes they get very slow when a specific extra condition is included inside $and. I have no idea why this happens. When it gets slow, it takes more than 40 seconds. It always happens with the same extra condition; it is very possible that it happens with other conditions as well.
db.products.find({
    $and : [
        { "features" : { "$eq" : "36303_N" } },
        { "features" : { "$eq" : "91135_IPS" } },
        { "features" : { "$eq" : "9350_1366 x 768" } },
        { "features" : { "$eq" : "178874_Y" } },
        { "features" : { "$eq" : "43547_Y" } }
        ...
I'm running the same MongoDB on my Unix laptop and on a Linux server instance.
I also tried indexing the field "features", with the same results.
Using $all in your Mongo query helps you match multiple values within an array.
First, create an index on features.
Then a query like this may help you:
db.products.find( { features: { $all: ["36303_N", "91135_IPS","others..."] } } )
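A minimal sketch of the index-plus-verification step (createIndex is the 3.0+ shell helper; on older shells use ensureIndex):
db.products.createIndex({ features : 1 }) // multikey index over the array
db.products.find({ features : { $all : ["36303_N", "91135_IPS"] } }).explain()
// the plan should show an index scan on features rather than a collection scan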
By the way, if your query is very slow:
check the slow operations recorded in your mongod log;
note your MongoDB version;
check whether any writes happen during the query (writes will block reads in some versions).
I have realized that the order inside $all matters. I changed the order of the elements according to the number of documents in the collection containing each one, ascending, making the query more selective.
Before, the query took ~40 seconds to execute; now, with the elements ordered, it takes ~22 seconds.
Still many seconds anyway.
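A sketch of how that ordering can be derived in the shell, counting each candidate element first (element values taken from the query above):
var elems = ["36303_N", "91135_IPS", "9350_1366 x 768", "178874_Y", "43547_Y"];
// sort ascending by how many documents contain each element,
// so the most selective element comes first in $all
elems.sort(function (a, b) {
    return db.products.count({ features : a }) - db.products.count({ features : b });
});
db.products.find({ features : { $all : elems } })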

Mongodb Sorting returns null instead of data

My Mongodb dataset is like this
{
"_id" : ObjectId("5a27cc4783800a0b284c7f62"),
"action" : "1",
"silent" : "0",
"createdate" : ISODate("2017-12-06T10:53:59.664Z"),
"__v" : 0
}
Now I have to find the data whose action value is 1 and silent value is 0. One more thing: all the data should be returned in descending order.
My Mongodb Query is
db.collection.find({'action': 1, 'silent': 0}).sort({createdate: -1}).exec(function(err, post) {
console.log(post.length);
});
Earlier it worked fine for me, but now I have 121,000 entries in this collection and it returns null.
I know there is some confusion around .sort().
If I remove the sort from the query then everything is fine. Example:
db.collection.find({'action': 1, 'silent': 0}).exec(function(err, post) {
console.log(post.length);// Now it returns data but not on Descending order
});
MongoDB limits the amount of data it will attempt to sort without an index.
This is because Mongo has to sort the data in memory or on disk, both of which can be expensive operations, particularly for queries run frequently.
In most cases, this can be alleviated by creating an index on the field you sort on.
You can create the index with:
db.myColl.createIndex({ createdate : 1 })
Thanks!
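To confirm the sort is now satisfied by the index, a quick check (shell form of the question's query):
db.collection.find({ action : 1, silent : 0 }).sort({ createdate : -1 }).explain()
// with the index in place, the plan walks the createdate index instead of
// buffering documents for an in-memory SORT, so the 32 MB limit no longer applies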

Why is this mongodb query slow when it's indexed?

Why is the following query slow when an index is being utilized?
db.foo.count({$and:[{'metadata.source':'WZ'}, {'metadata.source':'ED'}]})
with the index
{
"v" : 1,
"key" : { 'metadata.source" : 1 },
"name" : "metadata.source_1",
"ns" : "bar.foo"
}
where the metadata field is a JSON Array
The following with a single value returns immediately
db.foo.count({'metadata.source':'WZ'})
Update:
I'm using Mongo v3.0.3. Setup is a sharded replica-set with about 12M documents.
I tried the following with the same delay
db.foo.count({'metadata.source' : { $all : ['WZ', 'ED'] }})
When I check db.currentOp(), it shows the following which seems correct:
"planSummary" : "IXSCAN { metadata.source: 1.0 }"
But the numYields is very high and continues to increase. Does this mean the index does not fit into memory and it is reading from disk? There should be plenty of memory based on my db.foo.stats(). Anything else I should look for to help diagnose?
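One more thing worth checking, as a sketch (the executionStats verbosity is part of explain output in 3.0, and the find below carries the same predicate as the count):
db.foo.find({ 'metadata.source' : { $all : ['WZ', 'ED'] } }).explain('executionStats')
// compare executionStats.totalKeysExamined with nReturned: on a multikey index
// the bounds can cover only one $all value, so every key for that value is
// scanned and the second value is filtered per document, which can be slow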
This is also using the WiredTiger storage engine, which seems to have some noted performance issues. I'm attempting to upgrade to 3.0.7 to see if that resolves the issue.

Complex-ish mongo query runs fairly slow, combination of $and $or $in and regex

I'm running some queries against a MongoDB 2.4.9 server that populate a datatable on a webpage. The user needs to be able to do a substring search across multiple fields, sort the data on various columns, and flip through the results in pages. I have to check multiple fields for matches since the user could be searching for anything related to the documents. There are about 300,000 documents in the collection, so the database is relatively small.
I have indexes created for the created_by, requester, desc.name, metaprogram.id, program.id, and arr.programid fields. I've also created indexes [("created", 1), ("created_by", 1), ("requester", 1)] and [("created_by", 1), ("requester", 1)] at the suggestion of Dex.
It's also worth mentioning that documents might not have all of the fields that are being searched for here. Some documents might have a metaprogram.id but not the other ID fields for example.
An example of a query I might run is
{
    "$query" : {
        "$and" : [
            {
                "created_by" : { "$ne" : "automation" },
                "requester" : { "$in" : ["Broadway", "Spec", "Falcon"] }
            },
            {
                "$or" : [
                    { "requester" : /month/i },
                    { "created_by" : /month/i },
                    { "desc.name" : /month/i },
                    { "metaprogram.id" : { "$in" : [708, 2314, 709] } },
                    { "program.id" : { "$in" : [708, 2314, 709] } },
                    { "arr.programid" : { "$in" : [708, 2314, 709] } }
                ]
            }
        ]
    },
    "$orderby" : {
        "created" : 1
    }
}
with differing orderby, limit, and skip values as well.
Queries on average take 500-1500ms to complete.
I've looked into how to make it faster but haven't been able to come up with anything. Some of the text-search features look handy, but as far as I know each collection supports at most one text index, and text search doesn't support pagination (skips). I'm sure that prefix searching instead of regex substring matching would be faster as well, but I need substring matching.
Is there anything you can think of to improve the speed of a query like this?
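For reference, the prefix-versus-substring difference mentioned above is easy to see in the shell (a sketch; "items" is an assumed collection name, and created_by is indexed per the question):
db.items.find({ created_by : /^month/ }).explain() // anchored, case-sensitive: bounded index range scan
db.items.find({ created_by : /month/i }).explain() // unanchored or case-insensitive: examines every index key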
It's quite hard to optimize a query when it's unpredictable.
Analyze how the system is being used and place indexes on the most popular fields.
Use .explain() to make sure the indexes are being used.
Also limit the results returned to 50 or 100; the user doesn't need to see everything at once.
Try upgrading MongoDB to see if there's a performance improvement.
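A minimal sketch of the explain-and-limit suggestions together (reusing the assumed "items" collection name and the question's own fields):
db.items.find({ "metaprogram.id" : { "$in" : [708, 2314, 709] } })
    .sort({ created : 1 })
    .limit(50)
    .explain()
// verify the winning plan uses one of the indexes listed in the question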
Side note:
You might want to consider using ElasticSearch as a search engine instead of MongoDB. ElasticSearch would store the searchable fields and return the MongoDB IDs for matched results. As a search engine, ElasticSearch is typically an order of magnitude faster than MongoDB.
More info:
How to find queries not using indexes or slow in mongodb
Range query for MongoDB pagination
http://www.elasticsearch.org/overview/

MongoDB: phantom records in $unwind results under heavy load

I have a simple collection of elements like this
{_id: n, xs: [...]}
I'm trying to count total number of elements in all arrays
db.testRace.aggregate([{ $unwind : "$xs" }, { $group : { _id : null, count : { $sum : 1 } } }])
And it works great unless I start to do massive updates of this collection. Under a heavy load of update operations I get a wrong total, slightly bigger than it should be.
It can be easily reproduced.
First generate some test data
for(var i = 1; i <= 1000000; i++) {
db.testRace.insert({_id: i, xs: [i]});
}
Then simulate a lot of updates
while(true) {
var id = Math.floor((Math.random() * 1000000) + 1);
var obj = db.testRace.find({_id: id}).next();
obj.some="change";
db.testRace.update({_id: id}, obj);
}
And while it is running, run the aggregate/$unwind query.
Without load I get the right result, 1000000. But when there are a lot of updates I get bigger numbers, like 1001456.
And if I run a query like this:
db.testRace.aggregate([{ $unwind : "$xs" }, {$group: {_id:"$xs", count:{$sum: 1}}}, { $sort : { count : -1 } }, { $limit : 2 }]);
I get
"result" : [
{
"_id" : 996972,
"count" : 2
},
{
"_id" : 997789,
"count" : 2
}
],
So it seems the aggregation counts some records twice.
Is this expected behaviour, or maybe I'm doing the aggregation wrong?
I tested on a local MongoDB instance, version 2.4.9.
It's expected behavior due to the way MongoDB handles read isolation. When you have a long-running query (and an aggregation that reads every single document is a long-running query) with updates to that data during the query, the updates may affect whether or not the updated data is returned by the query. Depending on what happens when, you could miss a document, receive it once, or receive it twice.
From the source code:
Any data inserted, deleted, or modified during a yield that should be
returned by a query may or may not be returned by that query. The
query could return: nothing; the data before; the data after; or both
the data before and the data after.
In short, there is no isolation between a query and an
insert/delete/update. AKA, READ_UNCOMMITTED.
https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/plan_stage.h
Your aggregation query is yielding mid-query, during which some of the data is updated. This impacts the results of the query.
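As a sketch of why this particular reproduction triggers it (an MMAPv1-era assumption, matching the 2.4.9 version above): the update loop adds a new field, which grows each document and forces MongoDB to move it on disk, and a moved document can be encountered a second time by the scan. An update that modifies documents strictly in place should not produce duplicates:
while (true) {
    var id = Math.floor((Math.random() * 1000000) + 1);
    // $set an existing field to a same-size value: the document does not grow,
    // is not moved on disk, and the concurrent $unwind count should stay at 1000000
    db.testRace.update({ _id : id }, { $set : { xs : [id] } });
}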