Optimize query performance in MongoDB - mongodb

I have a collection named App and need to query those active (active: true) apps that belong to a particular user (user_id) or are available to all users (by their _id). I use query like this
{
"active" : true,
"$or" : [
{
"user_id" : "111111111111111111111111"
},
{
"_id" : {
"$in" : [
ObjectId("222222222222222222222222"),
ObjectId("333333333333333333333333"),
ObjectId("444444444444444444444444")
]
}
}
]
}
However in db.currentOp(true) I see that this query is running very slowly: lockStats.timeLockedMicros.r is about 3000.
How can I optimize performance of this query? I already have the following indexes on App:
> db.App.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mydb.App"
},
{
"v" : 1,
"key" : {
"active" : 1,
"created_at" : -1
},
"name" : "active_1_created_at_-1",
"ns" : "mydb.App",
"background" : true
},
{
"v" : 1,
"key" : {
"active" : 1,
"user_id" : 1
},
"name" : "active_1_user_id_1",
"ns" : "mydb.App",
"background" : true
}
]

Two issues I see here:
1) You would not need index on the boolean field active as it would have low selectivity and not benefiting query performance.
"If overall selectivity is low, and if MongoDB must read a number of documents to return results, then some queries may perform faster without indexes." source
2) You need an index for user_id because user_id cannot use the compound index you created for active_1_user_id_1
Edit: You can always check index efficiency by doing a explain(true) and look at which indexes are used for that query.

I would try to do the following:
remove all your indexes, your active field has a low cardinality (boolean) and does not help you at all, you are not using created_at, so there is no reason for it.
add an index only on user_id key
change your strings as numbers to numbers.

Related

Mongo DB, document count mismatch for a collection

I have a collection User in mongo. When I do a count on this collection I got 13204951 documents
> db.User.count()
13204951
But when I tried to find the count of non-stale documents like this I got a count of 13208778
> db.User.find({"_id": {$exists: true, $ne: null}}).count()
13208778
> db.User.find({"UserId": {$exists: true, $ne: null}}).count()
13208778
I even tried to get the count of this collection using MongoEngine
user_list = set(User.objects().values_list('UserId'))
len(resume_list)
13208778
Here are the indexes of this User collection
>db.User.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "user_db.User"
},
{
"v" : 1,
"unique" : true,
"key" : {
"UserId" : 1
},
"name" : "UserId_1",
"ns" : "user_db.User",
"sparse" : false,
"background" : true
}
]
Any pointers on how to debug the mismatch in counts from different queries.
refer to this document
On a sharded cluster, db.collection.count() can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.
Also, refer to this question
If you are not using sharding cluster, you can refer to this question
The basic idea is db.{collection}.count() might do some tricks to make it fast to return a count, and it might be not accurate, use a count() with query should be accurate.

Is a mongodb query with 1 indexed field faster than multiple indexed fields?

In the following model a product is owned by a customer. and cannot be ordered by other customers. So I know that in an order by customer 1 there can only be products owned by customer one.
To give you an idea here is a simple version of the data model:
Orders:
{
'customer' : 1
'products' : [
{'productId' : 'a'},
{'productId' : 'b'}
]
}
Products:
{
'id' : 'a'
'name' : 'somename'
'customer' : 1
}
I need to find orders that contain certain products. I know the product id and customer id. I'm free to add/change indexes on my database.
Now my question is. Is it faster to just add a single field index on the product id's and query only using that ID. Or should I go for a compound index with customer and product id?
I'm not sure if this matters, but in my real model the list of products is actually a list of objects which have an amount and a dbref to the product. And the customer is also a dbref.
Here is a full order object:
{
"_id" : 0,
"_class" : "nl.pfa.myprintforce.models.Order",
"orderNumber" : "e35f1fa8-b4c4-4d53-89c9-66abe94a3553",
"status" : "ERROR",
"created" : ISODate("2017-03-30T11:50:50.292Z"),
"finished" : false,
"orderTime" : ISODate("2017-01-12T12:50:50.292Z"),
"expectedDelivery" : ISODate("2017-03-30T11:50:50.292Z"),
"totalItems" : 19,
"orderItems" : [
{
"amount" : 4,
"product" : {
"$ref" : "product",
"$id" : NumberLong(16)
}
},
{
"amount" : 7,
"product" : {
"$ref" : "product",
"$id" : NumberLong(26)
}
},
{
"amount" : 8,
"product" : {
"$ref" : "product",
"$id" : NumberLong(7)
}
}
],
"stateList" : [
{
"timestamp" : ISODate("2017-03-28T11:50:50.074Z"),
"status" : "NEW",
"message" : ""
},
{
"timestamp" : ISODate("2017-03-29T11:50:50.075Z"),
"status" : "IN_PRODUCTION",
"message" : ""
},
{
"timestamp" : ISODate("2017-03-30T11:50:50.075Z"),
"status" : "ERROR",
"message" : "Something went wrong"
}
],
"customer" : {
"$ref" : "customer",
"$id" : ObjectId("58dcf11a71571a24c475c044")
}
}
When I have the following indexes:
1: {"customer" : 1, "orderItems.product" : 1}
2: {"orderItems.product" : 1}
both count queries (I use count to forcefully find all documents without the network transfer):
a: db.getCollection('order').find({
'orderItems.product' : DBRef('product',113)
}).count()
b: db.getCollection('order').find({
'customer' : DBRef('customer',ObjectId("58de009671571a07540a51d5")),
'orderItems.product' : DBRef('product',113)
}).count()
Run with the same time of ~0.007 seconds on a set of 200k.
When I add 1000k record for a different customer (and different products) it does not effect the time at all.
an extended explain shows that:
query 1 just uses index 2.
query 2 uses index 2 but also considered index 1. Perhaps index intersection is used here?
Because if I drop index 1 the results are:
Query a: 0.007 seconds
Query b: 0.035 seconds (5x as long!)
So my conclusion is that with the right indexing both methods work about as fast. However, if you do not need the compound index for anything else it's just a waste of space & write speed.
So: single field index is better in my case.

Why does MongoDB take so long to sort a result with only one object in it?

I am optimizing a fairly complicated query of the form:
db.foo.find({
"$or":[
{"bar1":ObjectID("123123"), baz:false},
{"bar2":ObjectID("123123"), baz:false}
],
"deleted":false}
).sort("modified_on")
And the sort appears to be killing my performance. I have indexes on the modified_date field like so:
{
"v" : 1,
"key" : {
"modified_on" : 1
},
"ns" : "realtalk.customer_profile",
"name" : "modified_on_1"
},
{
"v" : 1,
"key" : {
"modified_on" : -1
},
"ns" : "realtalk.customer_profile",
"name" : "modified_on_-1"
}
These do not seem to speed up the query at all. What is really driving my crazy though is the size of my result. There is only ever a handful of objects returned by this query, often only one. How is it taking mongo so long to sort one thing, and how can I speed it up?

mongodb compound index and single index performance

Consider the following scenario:
100% of the time my query will include a in the query, and sometimes also b.
90% the query will be:
{a:"somevalue"}
and 10% it will be
{a:"somevalue",b:"somevalue"}
What would be the downside to satisfy this with a compound index only (if any)?, like so:
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"name" : "a_1_b_1",
"ns" : "foo.bar"
}
Or would i benefit from adding a second index satisfying only queries on a
{
"v" : 1,
"key" : {
"a" : 1,
"b" : 1
},
"name" : "a_1_b_1",
"ns" : "foo.bar"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "foo.bar"
}
The manual has this to say;
If you have a collection that has a compound index on { a: 1, b: 1 }, as well as an index that consists of the prefix of that index, i.e. { a: 1 }, assuming none of the index has a sparse or unique constraints, then you can drop the { a: 1 } index.
MongoDB will be able to use the compound index in all of situations that it would have used the { a: 1 } index.
In other words, you'll most likely get as good or better performance using a single index, since MongoDB does not have to cache two indexes in memory or update two separate indexes on every insert.

Calling ensureIndex with compound key results in _id field in index object

When I call ensureIndex from the mongo shell on a collection for a compound index an _id field of type ObjectId is auto-generated in the index object.
> db.system.indexes.find();
{ "name" : "_id_", "ns" : "database.coll", "key" : { "_id" : 1 } }
{ "_id" : ObjectId("4ea78d66413e9b6a64c3e941"), "ns" : "database.coll", "key" : { "a.b" : 1, "a.c" : 1 }, "name" : "a.b_1_a.c_1" }
This makes intuitive sense as all documents in a collection need an _id field (even system.indexes, right?), but when I check the indexes generated by morphia's ensureIndex call for the same collection *there is no _id property*.
Looking at morphia's source code, it's clear that it's calling the same code that the shell uses, but for some reason (whether it's the fact that I'm creating a compound index or indexing an Embedded document or both) they produce different results. Can anyone explain this behavior to me?
Not exactly sure how you managed to get an _id field in the indexes collection but both shell and Morphia originated ensureIndex calls for compound indexes do not put an _id field in the index object :
> db.test.ensureIndex({'a.b':1, 'a.c':1})
> db.system.indexes.find({})
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.test", "name" : "_id_" }
{ "v" : 1, "key" : { "a.b" : 1, "a.c" : 1 }, "ns" : "test.test", "name" : "a.b_1_a.c_1" }
>
Upgrade to 2.x if you're running an older version to avoid running into now resolved issues. And judging from your output you are running 1.8 or earlier.