Problem in using indexes in aggregation pipeline


I have a query like this:
db.UserPosts.aggregate([
{ "$match" : { "Posts.DateTime" : { "$gte" : ISODate("2018-09-04T11:50:58Z"), "$lte" : ISODate("2018-09-05T11:50:58Z") } } },
{ "$match" : { "UserId" : { "$in" : [NUUID("aaaaaaaa-cccc-dddd-dddd-5369b183cccc"), NUUID("vvvvvvvv-bbbb-ffff-cccc-e0af0c8acccc")] } } },
{ "$project" : { "_id" : 0, "UserId" : 1, "Posts" : 1 } },
{ "$unwind" : "$Posts" },
{ "$unwind" : "$Posts.Comments" },
{ "$sort" : {"Posts.DateTime" : -1} },
{ "$skip" : 0 }, { "$limit" : 20 },
{ "$project" : { "_id" : 0, "UserId" : 1, "DateTime" : "$Posts.DateTime", "Title" : "$Posts.Title", "Type" : "$Posts.Comments.Type", "Comment" : "$Posts.Comments.Description" } },
],{allowDiskUse:true})
I have a compound index:
{
"Posts.DateTime" : -1,
"UserId" : 1
}
Posts and Comments are arrays of objects.
I've tried different kinds of indexes, but the $sort stage never uses them. Moving the $sort stage to a different position in the pipeline did not help either. The index does seem to be used for $match, just not for $sort. I also tried two single-field indexes on those fields, and a combination of two single-field indexes plus one compound index, but none of them worked.
I also read the related pages on the MongoDB website:
Compound Indexes
Use Indexes to Sort Query Results
Index Intersection
Aggregation Pipeline Optimization
Could somebody please help me to find the solution?

I solved this problem by changing my data model and moving DateTime to a higher level of the document.
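For context, $sort can only use an index while it runs before stages such as $project, $unwind, and $group, which is why the sort in the question, placed after two $unwind stages, is done in memory. Below is a minimal sketch of one possible remodelling. It assumes a hypothetical Posts collection with one document per post and with UserId and DateTime at the top level; the collection name, the index, and the helper buildPipeline are illustrative and not taken from the original schema.

```javascript
// Hypothetical remodelled collection "Posts": one document per post, with
// UserId and DateTime at the top level instead of inside a Posts array.
// Assumed index: db.Posts.createIndex({ UserId: 1, DateTime: -1 })
function buildPipeline(userIds, from, to, limit) {
  return [
    // $match and $sort come first, so both are eligible to use an index.
    { $match: { UserId: { $in: userIds }, DateTime: { $gte: from, $lte: to } } },
    { $sort: { DateTime: -1 } },
    // Only one array is left to unwind; the limit still counts comments.
    { $unwind: "$Comments" },
    { $limit: limit },
    { $project: { _id: 0, UserId: 1, DateTime: 1, Title: 1,
                  Type: "$Comments.Type", Comment: "$Comments.Description" } }
  ];
}

const pipeline = buildPipeline(
  ["user-a", "user-b"], // placeholder ids; the question uses NUUID values
  new Date("2018-09-04T11:50:58Z"),
  new Date("2018-09-05T11:50:58Z"),
  20
);

// In the mongo shell, where `db` is defined, the pipeline can be run directly.
if (typeof db !== "undefined") {
  db.Posts.aggregate(pipeline, { allowDiskUse: true }).forEach(printjson);
}
```

Whether the planner really uses the index for the sort should be confirmed with explain on the actual data.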

Related

Limit distinct values only if a subelement exists

I have searched here but could not find a clear answer to the following question. In the sample collection mycollection below, how would one select distinct vin numbers only from documents where the status field exists and the status is UNLOCKED?
I have tried
db.getCollection('mycollection').distinct("vin", {$and: [{"decoded_payload.status": {$exists: true}}, {"decoded_payload.status":"UNLOCKED"}]})
but this query hangs indefinitely.
Because the database is large and the query takes so long, I would like to limit the output just to check whether it runs at all, but it seems limit() is not available with .distinct().
In MongoDB, how would one select the distinct vin in the data below, set the limit = 1 and only select based on the status condition (status exists and is equal to "UNLOCKED")?
Would aggregate() be the right choice? How does one use the above conditions with aggregate() and limit() ?
The output in this case would be 34567
{
"_id" : ObjectId("1"),
"vin" : "12345",
"class_name" : "foo",
"decoded_payload" : {
"timestamp" : 1547329250,
"status" : "LOCKED"
}
}
{
"_id" : ObjectId("2"),
"vin" : "23456",
"class_name" : "foo",
"decoded_payload" : {
"timestamp" : 1547329260,
"status" : "LOCKED"
}
}
{
"_id" : ObjectId("3"),
"vin" : "34567",
"class_name" : "bar",
"decoded_payload" : {
"timestamp" : 1547329270,
"status" : "UNLOCKED",
"reservation_id" : "71"
}
}
{
"_id" : ObjectId("4"),
"vin" : "45678",
"class_name" : "baz",
"decoded_payload" : {
"timestamp" : 1547329280,
"reservation_id" : "71"
}
}

You can use this aggregation query to filter the data and return the distinct "vin" values:
db.mycollection.aggregate([
{
$match: {
$and: [{
"decoded_payload.status": { $exists: true }
}, {
"decoded_payload.status": "UNLOCKED"
}]
}
},
{ $limit : 5 }, // You can use this stage after group too
{
$group: { _id: "$vin" }
}
])
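Place the $limit stage before or after the $group stage, depending on your requirement: before $group it limits the number of documents that are grouped, and after $group it limits the number of distinct values returned.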
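An equality match on "decoded_payload.status" never matches documents that lack the field, so the $exists clause is redundant. A slightly shorter sketch of the same idea follows, with the limit applied after $group so that it counts distinct vin values instead of input documents. The variable name is illustrative.

```javascript
const distinctUnlockedVins = [
  // Equality on a field does not match documents where the field is missing.
  { $match: { "decoded_payload.status": "UNLOCKED" } },
  // One output document per distinct vin.
  { $group: { _id: "$vin" } },
  // Stop after the first distinct value.
  { $limit: 1 }
];

// In the mongo shell, where `db` is defined, run it against the collection.
if (typeof db !== "undefined") {
  db.mycollection.aggregate(distinctUnlockedVins).forEach(printjson);
}
```

With the four sample documents from the question, only the document with vin 34567 passes the $match stage. On a large collection, an index on "decoded_payload.status" would probably be needed for either form to finish quickly. That is an assumption about the cause of the hang, not something the question confirms.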

MongoDB select documents where field1 equals nested.field2 in aggregate pipeline

I have joined two collections on one field using '$lookup', while actually I needed two fields to have a unique match. My next step would be to unwind the array containing different values of the second field I need for a unique match and then compare these to the value of the second field it needs to match higher up. However, the second line in the snippet below returns no results.
// Request only the page that has been viewed
{ '$unwind' : '$DSpub.PublicationPages'},
{ '$match' : {'pageId' : '$DSpub.PublicationPages.PublicationPageId' } }
Is there a more appropriate way to do this? Or can I avoid doing this altogether by unwinding the "from" collection before performing the '$lookup', and then match both fields?

This is not as easy as it looks.
A plain $match compares a field against a fixed value, so a string such as '$DSpub.PublicationPages.PublicationPageId' is treated as a literal, not as a reference to another field. To work around this, use a $project stage to compute a boolean flag and then filter on that flag with $match.
Please see the example below.
Given an input collection like this:
[{
"_id" : ObjectId("56be1b51a0f4c8591f37f62b"),
"name" : "Alice",
"sub_users" : [{
"_id" : ObjectId("56be1b51a0f4c8591f37f62a")
}
]
}, {
"_id" : ObjectId("56be1b51a0f4c8591f37f62a"),
"name" : "Bob",
"sub_users" : [{
"_id" : ObjectId("56be1b51a0f4c8591f37f62a")
}
]
}
]
We want to keep only the documents where _id and docs.sub_users._id are the same, where docs is the $lookup output.
db.collecction.aggregate([{
$lookup : {
from : "collecction",
localField : "_id",
foreignField : "_id",
as : "docs"
}
}, {
$unwind : "$docs"
}, {
$unwind : "$docs.sub_users"
}, {
$project : {
_id : 0,
fields : "$$ROOT",
matched : {
$eq : ["$_id", "$docs.sub_users._id"]
}
}
}, {
$match : {
matched : true
}
}
])
This gives the following output:
{
"fields" : {
"_id" : ObjectId("56be1b51a0f4c8591f37f62a"),
"name" : "Bob",
"sub_users" : [
{
"_id" : ObjectId("56be1b51a0f4c8591f37f62a")
}
],
"docs" : {
"_id" : ObjectId("56be1b51a0f4c8591f37f62a"),
"name" : "Bob",
"sub_users" : {
"_id" : ObjectId("56be1b51a0f4c8591f37f62a")
}
}
},
"matched" : true
}
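On MongoDB 3.6 and later, the $expr operator allows aggregation expressions inside $match, so the two fields can be compared without a helper flag. The sketch below uses the field names from the question and assumes that pageId is a top-level field of the joined document. The helper matchFieldsEqual is a hypothetical name introduced only for illustration.

```javascript
// Build a $match stage that keeps documents whose two fields are equal.
// Both arguments are field paths written with a leading "$".
function matchFieldsEqual(leftPath, rightPath) {
  // Inside $expr, "$..." strings are evaluated as field paths, not literals.
  return { $match: { $expr: { $eq: [leftPath, rightPath] } } };
}

// Stages to place after the $lookup from the question.
const viewedPageStages = [
  { $unwind: "$DSpub.PublicationPages" },
  matchFieldsEqual("$pageId", "$DSpub.PublicationPages.PublicationPageId")
];
```

These stages would be appended to the existing pipeline after the $lookup stage. On servers older than 3.6, the $project flag approach shown above is still needed.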

Mongodb: Indexing for Aggregate sort limit query?

I am in the process of moving from mysql to mongodb. Started learning mongodb yesterday.
I have a big mysql table (over 4 million rows, with over 300 fields each) which I am moving to mongodb.
Let's assume the products table has the following fields -
_id, category, and 300+ other fields.
To find the top 5 categories in the products along with their count, I have the following mysql query
Select category, count(_id) as N from products group by category order by N DESC limit 5;
I have an index on category field and this query takes around 4.4 sec in mysql.
Now, I have successfully moved this table to mongodb and this is my corresponding query for finding top 5 categories with their counts.
db.products.aggregate([{$group : {_id:"$category", N:{$sum:1}}},{$sort:{N: -1}},{$limit:5}]);
I again have an index on category but the query doesn't seem to be using it (explain : true says so) and it is also taking around 13.5 sec for this query.
Having read more about mongodb aggregation pipeline optimization, I found out that we need to use sort prior to aggregation for index to work but I am sorting on the derived field from aggregation so can't bring it before the aggregate function.
How do I optimize queries like these in mongodb?
=========================================================================
Output of explain
db.products.aggregate([{$group : {_id:"$category",N:{$sum:1}}},{$sort:{N: -1}},{$limit:5}], { explain: true });
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"category" : 1,
"_id" : 0
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb.products",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [ ]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
}
}
},
{
"$group" : {
"_id" : "$category",
"N" : {
"$sum" : {
"$const" : 1
}
}
}
},
{
"$sort" : {
"sortKey" : {
"N" : -1
},
"limit" : NumberLong(5)
}
}
],
"ok" : 1
}

There are currently some limitations on what the aggregation framework can do to improve performance in your use case. However, you should be able to speed up the query by sorting on category first. This lets the pipeline use the index you have added and should speed up the $group stage that follows:
db.products.aggregate([
{ "$sort" : { "category" : 1 } },
{ "$group" : { "_id" : "$category", "N" : { "$sum" : 1 } } },
{ "$sort" : { "N" : -1 } },
{ "$limit" : 5 }
]);
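To check whether a reordered pipeline really switches from a collection scan to an index scan, the winning plan in the explain output can be inspected. Here is a small sketch of a helper that lists the stage names in a plan. The function name planStages is hypothetical, and the sample plan is the one from the explain output in the question.

```javascript
// Collect every "stage" name from a winning plan, following the
// inputStage / inputStages links that nest child stages.
function planStages(plan) {
  if (!plan) return [];
  const children = (plan.inputStages || [])
    .concat(plan.inputStage ? [plan.inputStage] : []);
  return children.reduce(
    (names, child) => names.concat(planStages(child)),
    [plan.stage]
  );
}

// Winning plan copied from the explain output in the question.
const winningPlan = {
  stage: "COLLSCAN",
  filter: { $and: [] },
  direction: "forward"
};

// true here: the original pipeline scans the whole collection.
const usesCollectionScan = planStages(winningPlan).includes("COLLSCAN");
```

In the explain output shown in the question, the plan sits under stages[0].$cursor.queryPlanner.winningPlan. The exact shape can differ between server versions, so that path is an assumption to verify locally.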

Slow $group in mongodb

I am working to fetch data from mongodb using $group. I have modified my query to
db.mydata.aggregate([{ $match: {"CreatedOn": {$lte: ISODate("2015-10-27T03:45:09Z"),
"$gte": ISODate("2015-09-09T07:37:27.526Z")}} },
{"$group" : { "_id" : "$myIP" , "total" : { "$sum" : "$SuccessCount"}}},
{ "$project" : { "myIP" : "$_id" , "_id" : 0 , "Total" : "$total"}},
{ "$sort" : { "Total" : -1}}, { "$limit" : 10}])
But it takes more than 2 minutes to execute, even for a small amount of data. I have created an index on CreatedOn and another index on myIP.
My documents have a structure like this:
{ "_id" : ObjectId("55d33d7045cedc287ed840a3"),
"myIP" : "10.10.10.1","SuccessCount" : 1,
"CreatedOn":ISODate("2015-10-27T03:45:09Z")
}
I want the total success count for each myIP, with the highest totals at the top.

Aggregation query returning array of all objects for mongodb

I'm using mongo for the first time. I'm trying to aggregate some documents in a collection using the query below. Instead the query returns an object with a key "result" that contains an array of all the documents that fit with $match.
Below is the query.
db.events_2015_04_10.aggregate([
{$group:{
_id: "$uid",
count: {$sum: 1},
},
$match : {promo:"bc40100abc8d4eb6a0c68f81f4a756c7", evt:"login"}
}
]
);
Below is a sample document in the collection:
{
"_id" : ObjectId("552712c3f92ea17426000ace"),
"product" : "Mobile Safari",
"venue_id" : NumberLong(71540),
"uid" : "dd542fea6b4443469ff7bf1f56472eac",
"ag" : 0,
"promo" : "bc40100abc8d4eb6a0c68f81f4a756c7",
"promo_f" : NumberLong(1),
"brand" : NumberLong(17),
"venue" : "ovation_2480",
"lt" : 0,
"ts" : ISODate("2015-04-10T00:01:07.734Z"),
"evt" : "login",
"mac" : "00:00:00:00:00:00",
"__ns__" : "wifipromo",
"pvdr" : NumberLong(42),
"os" : "iPhone",
"cmpgn" : "fc6de34aef8b4f57af0b8fda98d8c530",
"ip" : "192.119.43.250",
"lng" : 0,
"product_ver" : "8"
}
I'm trying to get it all grouped by uid's with the total sum of each group... What is the correct way to achieve this?

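Try the following aggregation pipeline, which puts the $match stage first and the $group stage after it. Each stage must be its own object in the pipeline array, and the $match has to run before $group because promo and evt no longer exist on the grouped documents: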
db.events_2015_04_10.aggregate([
{
$match: {
promo: "bc40100abc8d4eb6a0c68f81f4a756c7",
evt: "login"
}
},
{
$group: {
_id: "$uid",
count: {
$sum: 1
}
}
}
])
