Why is this MongoDB query slow when it's indexed?

Why is the following query slow when an index is being utilized?
db.foo.count({$and:[{'metadata.source':'WZ'}, {'metadata.source':'ED'}]})
with the index
{
    "v" : 1,
    "key" : { "metadata.source" : 1 },
    "name" : "metadata.source_1",
    "ns" : "bar.foo"
}
where the metadata field is a JSON Array
The following with a single value returns immediately
db.foo.count({'metadata.source':'WZ'})
Update:
I'm using Mongo v3.0.3. Setup is a sharded replica-set with about 12M documents.
I tried the following with the same delay
db.foo.count({'metadata.source' : { $all : ['WZ', 'ED'] }})
When I check db.currentOp(), it shows the following which seems correct:
"planSummary" : "IXSCAN { metadata.source: 1.0 }"
But the numYields is very high and continues to increase. Does this mean the index does not fit into memory and is being read from disk? There should be plenty of memory based on my db.foo.stats(). Anything else I should look for to help diagnose?
This is also using the wiredTiger storage engine which seems to have some noted performance issues. I'm attempting to upgrade to 3.0.7 to see if that resolves the issue.
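For anyone diagnosing something similar: one way to sanity-check whether the index fits in memory is to compare the collection's index sizes against the server's memory stats. A minimal sketch using standard shell helpers (the collection name is the question's own):
db.foo.stats().indexSizes   // per-index size in bytes; how big is metadata.source_1?
db.serverStatus().mem       // mongod resident/virtual memory to compare against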

Related

why is mongodb not indexing my collection

I have created a collection and added just a name field and tried to apply the following index.
db.names.createIndex({"name":1})
Even after applying the index I see the below result.
db.names.find()
{ "_id" : ObjectId("57d14139eceab001a19f7e82"), "name" : "kkkk" } {
"_id" : ObjectId("57d1413feceab001a19f7e83"), "name" : "aaaa" } {
"_id" : ObjectId("57d14144eceab001a19f7e84"), "name" : "zzzz" } {
"_id" : ObjectId("57d14148eceab001a19f7e85"), "name" : "dddd" } {
"_id" : ObjectId("57d1414ceceab001a19f7e86"), "name" : "rrrrr" }
What am I missing here?
Khans...
The way you built your index is correct; however, building an ascending index on name won't make find() return results in ascending order.
If you need results ordered by name, you have to use
db.names.find().sort({name:1})
What happens when you build an index is that, when you search for data, the Mongo process performs the search behind the scenes in an ordered fashion for faster results.
Please note: if you just want to see output in sorted order, you don't even need an index.
You won't be able to tell whether an index has been successfully created (unless there is a considerable speed improvement) by running a find() command.
Instead, use db.names.getIndexes() to see if the index has been created (it may take some time for it to appear in the index list if you're building the index in the background).
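For example, a quick check of both points (names here are the question's own):
db.names.getIndexes()              // should list name_1 alongside the default _id_ index
db.names.find().sort({ name: 1 })  // ordered output always requires an explicit sort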

Complex-ish mongo query runs fairly slow, combination of $and $or $in and regex

I'm running some queries against a MongoDB 2.4.9 server to populate a datatable on a webpage. The user needs to be able to do a substring search across multiple fields, sort the data on various columns, and flip through the results in pages. I have to check multiple fields for matches since the user could be searching for anything related to the documents. There are about 300,000 documents in the collection, so the database is relatively small.
I have indexes created for the created_by, requester, desc.name, metaprogram.id, program.id, and arr.programid fields. I've also created indexes [("created", 1), ("created_by", 1), ("requester", 1)] and [("created_by", 1), ("requester", 1)] at the suggestion of Dex.
It's also worth mentioning that documents might not have all of the fields that are being searched for here. Some documents might have a metaprogram.id but not the other ID fields for example.
An example of a query I might run is
{
    "$query" : {
        "$and" : [
            {
                "created_by" : { "$ne" : "automation" },
                "requester" : { "$in" : ["Broadway", "Spec", "Falcon"] }
            },
            {
                "$or" : [
                    { "requester" : /month/i },
                    { "created_by" : /month/i },
                    { "desc.name" : /month/i },
                    { "metaprogram.id" : { "$in" : [708, 2314, 709] } },
                    { "program.id" : { "$in" : [708, 2314, 709] } },
                    { "arr.programid" : { "$in" : [708, 2314, 709] } }
                ]
            }
        ]
    },
    "$orderby" : {
        "created" : 1
    }
}
with differing orderby, limit, and skip values as well.
Queries on average take 500-1500ms to complete.
I've looked into how to make it faster, but haven't been able to come up with anything. Some of the text searching stuff looks handy but as far as I know each collection only supports at most one text index and it doesn't support pagination (skips). I'm sure that prefix searching instead of regex substring matches would be faster as well but I need substring matching.
Is there anything you can think of to improve the speed of a query like this?
It's quite hard to optimize a query when it's unpredictable.
Analyze how the system is being used and place indexes on the most popular fields.
Use .explain() to make sure the indexes are being used.
Also limit the results returned to a value of 50 or 100. The user doesn't need to see everything at once.
Try upgrading mongodb to see if there's a performance improvement.
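A rough sketch of that advice against the question's query (query below stands for the $and filter above; nothing else is assumed):
var query = { /* the $and filter from the question */ };
db.collection.find(query).sort({ created: 1 }).explain()           // confirm an index is used
db.collection.find(query).sort({ created: 1 }).skip(0).limit(50)   // page through results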
Side note:
You might want to consider using ElasticSearch as a search engine instead of MongoDB. ElasticSearch would store the searchable fields and return the MongoDB ids for matched results. ElasticSearch is an order of magnitude faster as a search engine than MongoDB.
More info:
How to find queries not using indexes or slow in mongodb
Range query for MongoDB pagination
http://www.elasticsearch.org/overview/

Overflow sort stage buffered data usage

We have a MongoDB 2.6.4 replica set running and are trying to diagnose this behavior. We are getting Runner error: Overflow sort stage buffered data usage of 33598393 bytes exceeds internal limit of 33554432 bytes when we expect that we would not. The collection has millions of records and has a compound index that includes the key being sorted on. As an example, the index looks like this
{ from: 1, time : -1, otherA : 1, otherB : 1}
our find is
db.collection.find({ from : { $in : ["a", "b"] }, time : { $gte : timestamp },
                     otherA : { $in : [...] }, otherB : { $in : [...] } })
             .sort({ time : -1 })
MongoDB parallelizes this query into clauses like this:
{ from : a }, { time : { $gte : timestamp }, ... }
{ from : b }, { time : { $gte : timestamp }, ... }
In the explain output, each stage reports scanAndOrder : false, which implies that the index was used to return the results in order. This all seems fine; however, the MongoDB client gets the Runner error: Overflow sort stage buffered data usage error. This seems to imply that the sort was done in memory. Is this because it is doing an in-memory merge sort of the clauses? Or is there some other reason this error could occur?
I was also facing the same problem of memory overflow.
I am using PHP with MongoDB to manipulate documents.
When I access a collection that holds a large set of documents, it throws this error.
Per the following link, MongoDB can sort only up to 32 MB of data at a time in memory:
http://docs.mongodb.org/manual/reference/limits/#Sorted-Documents
So, per the description given in the MongoDB docs about sorting, I sorted the array converted from the MongoCursor object in PHP rather than using Mongo's sort() method.
Hope it'll help you.
Thanks.
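For readers not using PHP, the same idea as a mongo shell sketch (assuming, as in the question, that the filtered result set is small enough to fit in client memory; timestamp is the question's own variable):
var docs = db.collection.find({ from : { $in : ["a", "b"] }, time : { $gte : timestamp } }).toArray();
docs.sort(function (a, b) { return b.time - a.time; });  // client-side equivalent of sort({ time : -1 })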

Search full document in mongodb for a match

Is there a way to match a value against every array and subdocument inside a document in a MongoDB collection and return the document?
{
    "_id" : "2000001956",
    "trimline1" : "abc",
    "trimline2" : "xyz",
    "subtitle" : "www",
    "image" : {
        "large" : 0,
        "small" : 0,
        "tiled" : 0,
        "cropped" : false
    },
    "Kytrr" : {
        "count" : 0,
        "assigned" : 0
    }
}
For example, if in the above document I search for "xyz" or "ab" or "xy" or "z" or "0", this document should be returned.
I actually have to achieve this on the back end using the C# driver, but a mongo query would also help greatly.
Please advise.
Thanks
You could probably do this using $where:
db.mycollection.find({$where:"JSON.stringify(this).indexOf('xyz')!=-1"})
I'm converting the whole record to a big string and then searching to see if your element is in the resulting string. Beware that this will also match if your xyz appears in the field names!
You can make it iterate through the fields to build a big string of just the values and then search that instead.
This isn't the most elegant way and will involve a full table scan. It will be faster if you look through the individual fields!
While Malcolm's answer above would work, when your collection gets large or you have high traffic, you'll see this fall over pretty quickly. That's for two reasons: first, dropping down to JavaScript is expensive, and second, it will always be a full table scan because $where can't use an index.
MongoDB 2.6 introduced text indexing, which is on by default (it was in beta in 2.4). With it, you can have a full-text index on all the fields in the document. The documentation gives the following example, where a text index is created for every field and the index is named "TextIndex":
db.collection.ensureIndex(
    { "$**": "text" },
    { name: "TextIndex" }
)
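With that wildcard index in place, a search would look something like this (note that $text matches whole words and stems, not arbitrary substrings):
db.collection.find({ $text: { $search: "xyz" } })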

Getting an error while updating a collection attribute name in MongoDB

So I have following structure of MongoDB collection
{ "_id" : ObjectId("516c48631f6c263a24fbbe7a"), "oldname" : 1, "name" : "somename" }
and I want to rename the oldname field to newname so it will look like this:
{ "_id" : ObjectId("516c48631f6c263a24fbbe7a"), "newname" : 1, "name" : "somename" }
so I am writing this command,
db.element_type.update({}, {$rename: {'oldname': 'newname'}}, false, true);
But it is giving me this error
failing update: objects in a capped ns cannot grow
The problem, per the error message, is that you're trying to update a capped collection, presumably with a newname that is longer than the oldname.
You can read about capped collections in the docs. They're designed to maintain their order, which is why you're running into this.
If you must use a capped collection, perhaps you should remove and re-insert instead of updating.
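A minimal sketch of that remove-and-re-insert idea, copying into a fresh collection since documents in a capped collection cannot grow (element_type_new is a hypothetical target name):
db.element_type.find().forEach(function (doc) {
    doc.newname = doc.oldname;        // copy the value under the new field name
    delete doc.oldname;
    db.element_type_new.insert(doc);  // element_type_new is hypothetical
});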