MongoDB. How to set up indexes?

Please help me with indexes in MongoDB.
I have a collection with about 800,000 documents.
There is a query that takes very long to run, about 5 seconds:
{
  "$or": [
    {
      "performer": "534ba408f9cd0ecb51711673",
      "$or": [
        { "performersRole": "534ba30bf9cd0ec151a69522" },
        { "performersRole": { "$exists": false } }
      ]
    },
    {
      "performersRole": "534ba30bf9cd0ec151a69522",
      "notShowInToDo": { "$ne": true }
    }
  ],
  "taskTime": {
    "$gte": 1409774400,
    "$lt": 1409860799
  },
  "$and": [
    {
      "$or": [
        {
          "department": {
            "$in": [
              "5356134ef9cd0e4805672a15",
              "53561368f9cd0e4b05645f3f",
              "53a0357ff9cd0e670537c4b7",
              "53a03594f9cd0e6705389449"
            ]
          }
        },
        { "department": { "$exists": false } }
      ]
    },
    {
      "$or": [
        { "salon": "534f7b3bf9cd0e311e77896f" },
        { "salon": { "$exists": false } }
      ]
    }
  ],
  "isDone": { "$ne": true }
}
Which indexes should I add to optimize it? Thanks for any advice!
Almost all documents have this format:
{
  "_id": "541da66cf535a4a8569dd0ed",
  "title": "test task",
  "taskTime": NumberLong(1411229292),
  "client": "53f876b2f535a4187f9e1264",
  "salon": "534f7c3cf9cd0e91206dd948",
  "track": "541da66cf535a4a8569dd0ec",
  "department": "53a0357ff9cd0e670537c4b7",
  "type": "invitePBP",
  "performersRole": [
    "534ba30bf9cd0ec151a69522"
  ],
  "notShowInToDo": true,
  "#createTime": NumberLong(1411229292),
  "#updateTime": NumberLong(1411229292)
}

Before creating an index, consider the following points:
1. Cut down the depth of the query hierarchy as much as you can;
2. Avoid $and and $or where possible;
3. Avoid $exists where possible, since it will still access the collection even when there is an index on the field;
4. Design the index according to the order in which you want the query to be executed.
Assuming I have understood your requirements correctly, I reconstruct the query as below:
var query = {
  "taskTime": {
    "$gte": 1409774400,
    "$lt": 1409860799
  },
  "isDone": {
    "$ne": true
  },
  "$and": [
    {
      "salon": {
        "$in": [ null, "534f7b3bf9cd0e311e77896f" ]
      }
    },
    {
      "department": {
        "$in": [
          null,
          "5356134ef9cd0e4805672a15",
          "53561368f9cd0e4b05645f3f",
          "53a0357ff9cd0e670537c4b7",
          "53a03594f9cd0e6705389449"
        ]
      }
    }
  ],
  "$or": [
    {
      "performer": "534ba408f9cd0ecb51711673",
      "performersRole": {
        "$in": [ null, "534ba30bf9cd0ec151a69522" ]
      }
    },
    {
      "performersRole": "534ba30bf9cd0ec151a69522",
      "notShowInToDo": {
        "$ne": true
      }
    }
  ]
};
Be careful with null:
Note that {"salon": {"$in": [null, "534f7b3bf9cd0e311e77896f"]}} can be answered entirely from the index {salon: 1} in v2.4, but will still access the collection in v2.6. I don't know the exact reason; my guess is that the definition of null changed (to include the undefined type).
To avoid this issue in v2.6, an alternative is to initialize the salon field with a real value instead of leaving it unset.
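For example, here is a minimal backfill sketch (collection and field names taken from the question; the placeholder value "none" is my own assumption) that gives every document a real salon value so the $exists/null branch can be dropped:
// Hedged sketch: set an arbitrary placeholder on documents missing "salon",
// so the field always exists and can be matched with plain equality/$in.
db.c.update(
  { salon: { $exists: false } },
  { $set: { salon: "none" } },
  { multi: true }
);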
You can try creating the index this way; your feedback is appreciated, since I don't have the real data to test with.
db.c.ensureIndex({taskTime:1, isDone:1, salon:1, department:1}, {name:"bigIndex"});
Adding my test result (1,010,000 documents):
var a = {
  "taskTime": {
    "$gte": 1410443932781,
    "$lt": 1412443932781
  },
  "isDone": {
    "$ne": true
  },
  "$and": [
    {
      "salon": {
        "$in": [ null, "534f7b3bf9cd0e311e77896f", "5420ecdc218ba2fb5353ad5b" ]
      }
    },
    {
      "department": {
        "$in": [
          null,
          "5356134ef9cd0e4805672a15",
          "53561368f9cd0e4b05645f3f",
          "53a0357ff9cd0e670537c4b7",
          "5420ecdc218ba2fb5353ad5d",
          "53a03594f9cd0e6705389449"
        ]
      }
    }
  ],
  "$or": [
    {
      "performer": "534ba408f9cd0ecb51711673",
      "performersRole": {
        "$in": [ null, "5420ecdc218ba2fb5353ad5e" ]
      }
    },
    {
      "performersRole": "5420ecdc218ba2fb5353ad5e",
      "notShowInToDo": {
        "$ne": true
      }
    }
  ]
};
db.c.find(a).explain();
{
"cursor" : "BtreeCursor bigIndex",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 54290,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 54290,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 425,
"nChunkSkips" : 0,
"millis" : 261,
"indexBounds" : {
"taskTime" : [
[
1410443932781,
1412443932781
]
],
"isDone" : [
[
{
"$minElement" : 1
},
true
],
[
true,
{
"$maxElement" : 1
}
]
],
"salon" : [
[
null,
null
],
[
"534f7b3bf9cd0e311e77896f",
"534f7b3bf9cd0e311e77896f"
],
[
"5420ecdc218ba2fb5353ad5b",
"5420ecdc218ba2fb5353ad5b"
]
],
"department" : [
[
null,
null
],
[
"5356134ef9cd0e4805672a15",
"5356134ef9cd0e4805672a15"
],
[
"53561368f9cd0e4b05645f3f",
"53561368f9cd0e4b05645f3f"
],
[
"53a0357ff9cd0e670537c4b7",
"53a0357ff9cd0e670537c4b7"
],
[
"53a03594f9cd0e6705389449",
"53a03594f9cd0e6705389449"
],
[
"5420ecdc218ba2fb5353ad5d",
"5420ecdc218ba2fb5353ad5d"
]
]
},
"server" : "Mars-PC:27017",
"filterSet" : false
}

Related

MongoDB first query taking time in Java Spring Boot application

I have the following piece of Java Spring MongoDB code:
startTime = System.currentTimeMillis();
AggregationResults<MyClass> list = mongoTemplate.aggregate(
        Aggregation.newAggregation(operations), "Post", MyClass.class);
System.out.println("Time taken for query execution -> "
        + (System.currentTimeMillis() - startTime));
When I test this code using JMeter, the first execution shows:
Time taken for query execution -> 3275 ('list' has 16 records)
On the 2nd and subsequent requests it looks like:
Time taken for query execution -> 355 ('list' has 16 records)
The time difference is huge. How can I improve the first call?
When I do Aggregation.newAggregation(operations).toString() I get the following query output. Running the same aggregation query from the shell always takes around 0.350 sec.
{
  "aggregate": "__collection__",
  "pipeline": [
    {
      "$match": {
        "$and": [
          { "postType": "AUTOMATIC" }
        ]
      }
    },
    {
      "$project": {
        "orders.id": 1,
        "postedTotals": 1
      }
    },
    { "$unwind": "$orders" },
    {
      "$group": {
        "_id": "$orders.userId",
        "ae": { "$addToSet": "$orders.userId" }
      }
    },
    { "$sort": { "ae": 1 } }
  ]
}
Running it with .explain().aggregate(...) shows the following:
/* 1 */
{
"stages" : [
{
"$cursor" : {
"query" : {
"$and" : [
{
"postType" : "AUTOMATIC"
}
]
},
"fields" : {
"headerPostedTotals" : 1,
"orders.UserId" : 1,
"_id" : 1
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "post",
"indexFilterSet" : false,
"parsedQuery" : {
"postType" : {
"$eq" : "AUTOMATIC"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"postType" : 1,
"orders.UserId" : 1,
"orders.flightStartDateForQuery" : 1,
"orders.flightEndDateForQuery" : 1,
"postRunDate" : -1
},
"indexName" : "default_filter_index",
"isMultiKey" : true,
"multiKeyPaths" : {
"postType" : [],
"orders.UserId" : [
"orders"
],
"orders.flightStartDateForQuery" : [
"orders"
],
"orders.flightEndDateForQuery" : [
"orders"
],
"postRunDate" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"postType" : [
"[\"AUTOMATIC\", \"AUTOMATIC\"]"
],
"orders.UserId" : [
"[MinKey, MaxKey]"
],
"orders.flightStartDateForQuery" : [
"[MinKey, MaxKey]"
],
"orders.flightEndDateForQuery" : [
"[MinKey, MaxKey]"
],
"postRunDate" : [
"[MaxKey, MinKey]"
]
}
}
},
"rejectedPlans" : []
}
}
},
{
"$project" : {
"_id" : true,
"headerPostedTotals" : true,
"orders" : {
"UserId" : true
}
}
},
{
"$unwind" : {
"path" : "$orders"
}
},
{
"$group" : {
"_id" : "$orders.UserId",
"aes" : {
"$addToSet" : "$orders.UserId"
}
}
},
{
"$sort" : {
"sortKey" : {
"aes" : 1
}
}
}
],
"ok" : 1.0
}

Combine different collection result into one in Mongo

Below is my query. I want the results from shp_tx_survey_with_index and, for each of them, to loop over the shp_counties_with_index collection to get name_1 and name_2, combining both collections. If I run these queries separately I get results, but together this gives me nothing. I want a result like Range_Township, Survey, Section, abstract, centroid, name_1, name_2.
db.shp_tx_survey_with_index.aggregate(
  [
    { $match: { "centroid": { "$ne": null } } },
    { $limit: 5 },
    {
      $project: {
        Range_Township: "$l1surnam",
        Survey: "$l4surnam",
        Section: "$l1surnam",
        abstract: "$abstract_",
        centroid: "$centroid"
      }
    }
  ]
).forEach((obj) => {
  var item = db.shp_counties_with_index.findOne({
    geom_geojson: {
      $nearSphere: {
        $geometry: obj.centroid
      }
    }
  }, { 'name_1': 1, 'name_2': 1 });
});
shp_counties_with_index sample collection
{
"_id" : ObjectId("5846bf55834d5b761f00000a"),
"engtype_2" : "County",
"geom_geojson" : {
"type" : "MultiPolygon",
"coordinates" : [
[
[
[
-73.6516685561232,
34.2445059658098
],
[
-73.6516685623318,
34.2445059757618
],
[
-73.6516685538257,
34.244505973301
],
[
-73.6516685561232,
34.2445059658098
]
]
] ]
},
"name_0" : "United States",
"name_1" : "Michigan",
"name_2" : "Chippewa",
"shape_area" : "0.481851809544",
"shape_leng" : "9.37720288177",
"type_2" : "County",
"validfr_2" : "Unknown",
"validto_2" : "Unknown",
"centroid" : {
"coordinates" : [
-73.65166855807875,
34.244505970785795
],
"type" : "Point"
}
}
shp_tx_survey_with_index sample collection
{
"_id" : ObjectId("5846bf76834d5b761f013fa7"),
"abstract_" : "321.000000000",
"abstract_i" : "322.000000000",
"anum" : "443962",
"area" : "0.0000666764235294",
"geom" : "01060000000100000001030000000100000008000000EC90DE47A07659C0F062332AEA813E403471FBB0A17759C06082096CE6813E4034A2C2ABA17759C0700AAF2731823E40B49BADAAA17759C09092F09440823E401C588E90A17759C000B4279A6A823E400019834C677559C02026721261823E403073564B677559C080C77880E6813E40EC90DE47A07659C0F062332AEA813E40",
"geom_geojson" : {
"type" : "MultiPolygon",
"coordinates" : [
[
[
[
-73.6517272344497,
34.2444627902475
],
[
-73.6517271719931,
34.2444627964974
],
[
-73.6517271718375,
34.2444627914072
],
[
-73.6517272344497,
34.2444627902475
]
]
]
]
},
"geom_text" : "MULTIPOLYGON(((-73.6517272344497 34.2444627902475,-73.6517271719931 34.2444627964974,-73.6517271718375 34.2444627914072,-73.6517272344497 34.2444627902475)))",
"gid" : "271508",
"l1surnam" : "TEMPLETON, J S",
"l2block" : null,
"l3surnum" : "4",
"l4surnam" : null,
"perimeter" : "0.0735082380545",
"probflag" : "0",
"shape_area" : "0.0000666764230571",
"shape_leng" : "0.0735082374282",
"centroid" : {
"coordinates" : [
-73.6517272031436,
34.24446279337245
],
"type" : "Point"
}
}
Thanks in advance.
When you want to combine information from 2 collections in an aggregation pipeline you can use the $lookup operator.
This operator is available from MongoDB 3.2 and up.
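As a minimal sketch of the $lookup shape (not a drop-in answer to the geospatial case above): $lookup in 3.2 joins on simple equality, so it cannot express the $nearSphere match directly. The county_id field below is a hypothetical join key used purely for illustration; the other field and collection names come from the question.
db.shp_tx_survey_with_index.aggregate([
  { $match: { centroid: { $ne: null } } },
  { $limit: 5 },
  {
    $lookup: {
      from: "shp_counties_with_index",  // collection to join with
      localField: "county_id",          // hypothetical key on the survey docs
      foreignField: "_id",              // hypothetical matching key on the counties
      as: "county"                      // matched county docs land in this array
    }
  },
  {
    $project: {
      Range_Township: "$l1surnam",
      Survey: "$l4surnam",
      Section: "$l1surnam",
      abstract: "$abstract_",
      centroid: 1,
      name_1: { $arrayElemAt: ["$county.name_1", 0] },
      name_2: { $arrayElemAt: ["$county.name_2", 0] }
    }
  }
])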

get length of array via variable just created mongodb

I am new to MongoDB. I have a dataset that looks like the following, and I'm trying to write an aggregation query that will determine the number of unique companies with which an individual has been associated.
Schema:
{
"_id" : ObjectId("52cdef7c4bab8bd675297d8b"),
"name" : "AdventNet",
"permalink" : "abc3",
"crunchbase_url" : "http://www.crunchbase.com/company/adventnet",
"homepage_url" : "http://adventnet.com",
"blog_url" : "",
"blog_feed_url" : "",
"twitter_username" : "manageengine",
"category_code" : "enterprise",
"number_of_employees" : 600,
"founded_year" : 1996,
"deadpooled_year" : 2,
"tag_list" : "",
"alias_list" : "Zoho ManageEngine ",
"email_address" : "pr#adventnet.com",
"phone_number" : "925-924-9500",
"description" : "Server Management Software",
"created_at" : ISODate("2007-05-25T19:24:22Z"),
"updated_at" : "Wed Oct 31 18:26:09 UTC 2012",
"overview" : "<p>AdventNet is now Zoho ManageEngine.</p>\n\n<p>Founded in 1996, AdventNet has served a diverse range of enterprise IT, networking and telecom customers.</p>\n\n<p>AdventNet supplies server and network management software.</p>",
"image" : {
"available_sizes" : [
[
[
150,
55
],
"assets/images/resized/0001/9732/19732v1-max-150x150.png"
],
[
[
150,
55
],
"assets/images/resized/0001/9732/19732v1-max-250x250.png"
],
[
[
150,
55
],
"assets/images/resized/0001/9732/19732v1-max-450x450.png"
]
]
},
"products" : [ ],
"relationships" : [
{
"is_past" : true,
"title" : "CEO and Co-Founder",
"person" : {
"first_name" : "Sridhar",
"last_name" : "Vembu",
"permalink" : "sridhar-vembu"
}
},
{
"is_past" : true,
"title" : "VP of Business Dev",
"person" : {
"first_name" : "Neil",
"last_name" : "Butani",
"permalink" : "neil-butani"
}
},
{
"is_past" : true,
"title" : "Usabiliy Engineer",
"person" : {
"first_name" : "Bharath",
"last_name" : "Balasubramanian",
"permalink" : "bharath-balasibramanian"
}
},
{
"is_past" : true,
"title" : "Director of Engineering",
"person" : {
"first_name" : "Rajendran",
"last_name" : "Dandapani",
"permalink" : "rajendran-dandapani"
}
},
{
"is_past" : true,
"title" : "Market Analyst",
"person" : {
"first_name" : "Aravind",
"last_name" : "Natarajan",
"permalink" : "aravind-natarajan"
}
},
{
"is_past" : true,
"title" : "Director of Product Management",
"person" : {
"first_name" : "Hyther",
"last_name" : "Nizam",
"permalink" : "hyther-nizam"
}
},
{
"is_past" : true,
"title" : "Western Regional OEM Sales Manager",
"person" : {
"first_name" : "Ian",
"last_name" : "Wenig",
"permalink" : "ian-wenig"
}
}
],
"competitions" : [ ],
"providerships" : [
{
"title" : "DHFH",
"is_past" : true,
"provider" : {
"name" : "A Small Orange",
"permalink" : "a-small-orange"
}
}
],
"total_money_raised" : "$0",
"funding_rounds" : [ ],
"investments" : [ ],
"acquisition" : null,
"acquisitions" : [ ],
"offices" : [
{
"description" : "Headquarters",
"address1" : "4900 Hopyard Rd.",
"address2" : "Suite 310",
"zip_code" : "94588",
"city" : "Pleasanton",
"state_code" : "CA",
"country_code" : "USA",
"latitude" : 37.692934,
"longitude" : -121.904945
}
],
"milestones" : [ ],
"video_embeds" : [ ],
"screenshots" : [
{
"available_sizes" : [
[
[
150,
94
],
"assets/images/resized/0004/3400/43400v1-max-150x150.png"
],
[
[
250,
156
],
"assets/images/resized/0004/3400/43400v1-max-250x250.png"
],
[
[
450,
282
],
"assets/images/resized/0004/3400/43400v1-max-450x450.png"
]
],
"attribution" : null
}
],
"external_links" : [ ],
"partners" : [ ]
}
Here is the query I tried:
db.companies.aggregate([{
  $match: {
    "relationships.person": { $ne: null }
  }
}, {
  $project: {
    relationships: 1,
    _id: 0
  }
}, {
  $unwind: "$relationships"
}, {
  $group: {
    _id: "$relationships.person",
    count: { $addToSet: "$relationships" }
  }
}])
I think I now need to get the length of the $relationships array? How would I do that?
If you only want the size of the array, you really don't need to unwind...
Just use $size.
Alter your aggregation to:
db.companies.aggregate([{
  $match: {
    "relationships.person": { $ne: null }
  }
}, {
  $project: {
    relationships: 1,
    _id: 0,
    relationship_size: { $size: "$relationships" }
  }
}])
This should give you the result you want.
From the comment I understand you want some more logic in the aggregation; off the top of my head I would alter your aggregation to:
db.companies.aggregate([{
  $match: {
    "relationships.person": { $ne: null }
  }
}, {
  $project: {
    relationships: 1,
    _id: 0
  }
}, {
  $unwind: "$relationships"
}, {
  $group: {
    _id: "$relationships.person.permalink",
    count: { $sum: 1 }
  }
}])
I can't find a "company name" in your relationships array, so I used the permalink property.
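Since the original goal was the number of unique companies per individual, the two ideas above ($size and grouping on person.permalink) could be combined roughly like this. I'm assuming the company-level permalink field from the posted schema is what identifies a company, so treat this as an untested sketch rather than a definitive answer:
db.companies.aggregate([
  { $match: { "relationships.person": { $ne: null } } },
  { $project: { permalink: 1, relationships: 1, _id: 0 } },
  { $unwind: "$relationships" },
  {
    $group: {
      _id: "$relationships.person.permalink",  // one bucket per person
      companies: { $addToSet: "$permalink" }   // distinct company permalinks
    }
  },
  { $project: { unique_companies: { $size: "$companies" } } }
])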

Mongo aggregation combines $match steps, resulting in slow query

This question is a follow-up to Query with $in and $nin doesn't use index. I've tried using aggregation to declare the order of the steps.
db.assets.aggregate([
  {
    "$match": {
      "tags": { "$in": ["blah"] }
    }
  },
  {
    "$match": {
      "tags": { "$nin": ["test"] }
    }
  }
], {"explain": true})
You'd think Mongo would now understand that we want to filter by $in first. Well, you'd be surprised.
{
"stages" : [
{
"$cursor" : {
"query" : {
"$and" : [
{
"tags" : {
"$in" : [
"blah"
]
}
},
{
"tags" : {
"$nin" : [
"test"
]
}
}
]
},
"planError" : "InternalError No plan available to provide stats"
}
}
],
"ok" : 1
}
The planner doesn't even know what to do. It turns out it actually combines both $matches into one query, and then runs into the same problem as Query with $in and $nin doesn't use index, eventually returning the results in about 2-3 seconds (which corresponds to the 2331ms on the linked question).
It looks like you can trick the aggregator by inserting an empty skip step:
db.assets.aggregate([
  {
    "$match": {
      "tags": { "$in": ["blah"] }
    }
  },
  {
    "$skip": 0
  },
  {
    "$match": {
      "tags": { "$nin": ["test"] }
    }
  }
], {"explain": true})
With that, the planner will use the index and the results are returned immediately.
{
"stages" : [
{
"$cursor" : {
"query" : {
"tags" : {
"$in" : [
"blah"
]
}
},
"plan" : {
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"scanAndOrder" : false,
"indexBounds" : {
"tags" : [
[ "blah", "blah" ]
]
},
"allPlans" : [
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"scanAndOrder" : false,
"indexBounds" : {
"tags" : [
[ "blah", "blah" ]
]
}
}
]
}
}
},
{
"$skip" : NumberLong(0)
},
{
"$match" : {
"tags" : {
"$nin" : [
"test"
]
}
}
}
],
"ok" : 1
}
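For completeness, the plans above assume a single-field index on tags already exists; if not, a minimal sketch of creating it (collection name taken from the question) would be:
db.assets.ensureIndex({ tags: 1 });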

"InternalError No plan available on provide stats" on aggregate with explain

When I run my aggregation using explain, as described here I get the following...
{
"stages":[
{
"$cursor":{
...
"planError":"InternalError No plan available to provide stats"
}
Any thoughts on what is going on here? I really need to be able to see what (if any) index is being used in my $match stage.
This seems to be a MongoDB 2.6 bug. Check the JIRA ticket.
I tweaked your query just a bit (adding a match to the front since I don't want to unwind the Tags array for all documents):
db.collection.aggregate(
[
{ $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
{ $unwind : "$Tags" },
{ $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
{ $group: { _id : "$_id", count: { $sum:1 } }},
{$sort: {"count":-1}}
],
{ explain: true }
)
And got:
{
"stages" : [
{
"$cursor" : {
"query" : {
"$or" : [
{
"Tags._id" : "tag1"
},
{
"Tags._id" : "tag2"
}
]
},
"plan" : {
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"scanAndOrder" : false,
"indexBounds" : {
"Tags._id" : [
[
"tag1",
"tag1"
],
[
"tag2",
"tag2"
]
]
},
"allPlans" : [
{
"cursor" : "BtreeCursor ",
"isMultiKey" : false,
"scanAndOrder" : false,
"indexBounds" : {
"Tags._id" : [
[
"tag1",
"tag1"
],
[
"tag2",
"tag2"
]
]
}
}
]
}
}
},
{
"$unwind" : "$Tags"
},
{
"$match" : {
"$or" : [
{
"Tags._id" : "tag1"
},
{
"Tags._id" : "tag2"
}
]
}
},
{
"$group" : {
"_id" : "$_id",
"count" : {
"$sum" : {
"$const" : 1
}
}
}
},
{
"$sort" : {
"sortKey" : {
"count" : -1
}
}
}
],
"ok" : 1
}
While this doesn't quite address why your operation returns a planError, maybe it can help somehow.
Regards
Had the same issue in my Rails app; fixed it by restarting the Rails server.
MongoDB version is 2.6.4.
I worked around this by rebuilding all indexes on the collection. Not exactly elegant, but the error is gone now.
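For reference, rebuilding every index on a collection can be done from the shell with reIndex(); the collection name below is just a placeholder:
db.myCollection.reIndex();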