[MongoDB][.NET Core] Improve performance of Count and Group By for more than 50 million records

I'm using MongoDB.Driver (.NET Core) with MongoDB. I'm writing an aggregation over my data, and the count and group stages are very slow: my API call takes around 2 minutes.
Is there any way to improve the performance of the count and group? Or is there an alternative way to write this query?
public async Task<(List<string>, List<int>, int)> CountByType()
{
var collection = _connectionData.GetCollection<abc>("abc");
var records_actor = await collection
.Aggregate()
.Group(new BsonDocument { { "_id", "$actor_type" }, { "count", new BsonDocument("$sum", 1) } })
.Project(new BsonDocument{
{ "count", 1 },
{ "_id", 0 },
{ "name_type", new BsonDocument(
"$switch", new BsonDocument(
"branches", new BsonArray{
new BsonDocument{
{ "case",new BsonDocument(
"$eq",new BsonArray{"$_id",2})
},
{ "then", "System" }
},
new BsonDocument{
{ "case",new BsonDocument(
"$eq",new BsonArray{"$_id",1})
},
{ "then", "App" }
}
})
)
}
})
.Sort(new BsonDocument { { "count", -1 } })
.ToListAsync();
}
I have an index on the actor_type field, but the index doesn't improve anything.

Related

What query uses less resources in MongoDB?

I am getting familiar with Lucene and MongoDB Atlas Search, and I have a question about query efficiency.
Which one of those queries uses fewer resources?
If there are better queries for performing the below task, please let me know.
I want to return all movies (sample_mflix) that match on a title value. The movies must be for a specific year (should not return any movie that is not for that year), and I would like to return movies with "$gte" values for movies.awards.nominations & movies.awards.wins.
The first query seems more complex (which seems to increase resource utilization - query complexity?). This query also is not returning values for that year only. That makes me think that there is probably a better way to do this with Atlas search.
The second query uses the $search and a $match in different stages. It has a simple Lucene search (which might return more movies than the first query?), and the match operator will filter the results. The second query is more precise - from my tests, it respects the year constraint. If I apply a limit stage, would this be a better solution?
If those queries were executed in the same scenario, which one would be more efficient, and why? (Apologies, the second query is formatted for the .NET driver.)
new BsonArray
{
new BsonDocument("$search",
new BsonDocument
{
{ "index", "nostoreindex" },
{ "compound",
new BsonDocument
{
{ "must",
new BsonDocument("near",
new BsonDocument
{
{ "path", "year" },
{ "origin", 2000 },
{ "pivot", 1 }
}) },
{ "must",
new BsonDocument("text",
new BsonDocument
{
{ "query", "poor" },
{ "path", "title" }
}) },
{ "should",
new BsonDocument("range",
new BsonDocument
{
{ "path", "awards.nominations" },
{ "gte", 1 }
}) },
{ "should",
new BsonDocument("range",
new BsonDocument
{
{ "path", "awards.wins" },
{ "gte", 1 }
}) }
} }
})
}
VS
var searchStage =
new BsonDocument("$search",
new BsonDocument
{
{ "index", "nostoreindex" },
{ "text",
new BsonDocument
{
{ "query", title },
{ "path", "title" }
} }
});
var matchStage = new BsonDocument("$match",
new BsonDocument("$and",
new BsonArray
{
new BsonDocument("year",
new BsonDocument("$eq", year)),
new BsonDocument("awards.nominations",
new BsonDocument("$gte", nominations)),
new BsonDocument("awards.wins",
new BsonDocument("$gte", awards))
})
);
When using Atlas Search, it is better to avoid a $match stage after your $search stage. Every document returned by $search then has to be looked up in mongod by _id before the filter can be applied, which can be quite slow.
So, generally, you should try to keep your search and filters "in Lucene" where possible, to avoid extra IO and comparisons.
In your case, you are using near, which does not filter: it returns all results, ordered by how close they are to the origin. You should use range instead, which filters the results and can speed up your query.
near is used to score results higher the closer they are to a specific value, which can simulate a sort. For example, if you want results with more 'awards.wins' to score higher, you could add near: { path: 'awards.wins', origin: 10000, pivot: 1 }; the closer the value is to 10000, the higher the score.
new BsonArray
{
    new BsonDocument("$search",
    new BsonDocument
    {
        { "index", "nostoreindex" },
        { "compound",
        new BsonDocument
        {
            // A BsonDocument rejects duplicate element names by default,
            // so the clauses of each type are grouped into one array.
            { "must",
            new BsonArray
            {
                // gte == lte keeps only documents for exactly this year
                new BsonDocument("range",
                new BsonDocument
                {
                    { "path", "year" },
                    { "gte", 2000 },
                    { "lte", 2000 }
                }),
                new BsonDocument("text",
                new BsonDocument
                {
                    { "query", "poor" },
                    { "path", "title" }
                })
            } },
            { "should",
            new BsonArray
            {
                new BsonDocument("range",
                new BsonDocument
                {
                    { "path", "awards.nominations" },
                    { "gte", 1 }
                }),
                new BsonDocument("range",
                new BsonDocument
                {
                    { "path", "awards.wins" },
                    { "gte", 1 }
                })
            } }
        } }
    })
}
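For comparison, here is a minimal sketch of the same idea in mongo shell syntax. The helper name buildSearchStage and its parameters are made up for illustration; the index name and field paths come from the question. It keeps the year filter inside $search by using a range clause with equal gte and lte bounds, and it groups the clauses of each type into an array.

```javascript
// Hypothetical helper (not part of any driver): builds a $search stage that
// filters on an exact year inside Atlas Search instead of in a later $match.
function buildSearchStage(title, year) {
  return {
    $search: {
      index: "nostoreindex", // index name taken from the question
      compound: {
        // must clauses are required, so they act as filters
        must: [
          { range: { path: "year", gte: year, lte: year } },
          { text: { query: title, path: "title" } }
        ],
        // should clauses are optional here: matching them only raises the score
        should: [
          { range: { path: "awards.nominations", gte: 1 } },
          { range: { path: "awards.wins", gte: 1 } }
        ]
      }
    }
  };
}

const pipeline = [buildSearchStage("poor", 2000), { $limit: 10 }];
console.log(JSON.stringify(pipeline, null, 2));
```

If the nominations and wins thresholds are hard requirements rather than ranking preferences, those two range clauses would move from should into must.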

MongoDb $projection query on C#

I need help building a MongoDB query with the C# driver. What I'm trying to do is compute a date difference in milliseconds and then keep only the results where that difference is greater than or equal to a specific number.
The MongoDB query that I use in the mongo shell is:
db.getCollection('Coll').aggregate(
[
{$project : {
"dateInMillis" : {$subtract: [ new Date(), "$UpdateDate" ]},
"Param2": "$Param2",
"Param3": "$Param3"}
},
{$match :{ dateInMillis : { $gte : 2662790910}}}
],
{
allowDiskUse : true
});
What would be the equivalent C# expression?
I've been trying to build the query in many different ways without success.
I finally found a way to run the aggregate query through the MongoDB C# driver. I don't know if it's the most efficient way, but it works.
var project = new BsonDocument()
{
{
"$project",
new BsonDocument
{
{"dateInMillis", new BsonDocument
{
{
"$subtract", new BsonArray() {new BsonDateTime(DateTime.UtcNow), "$UpdateDate" }
}
}
},
{
"Param2", "$Param2"
},
{
"Param3", "$Param3"
},
{
"_id", 0
}
}
}
};
var match = new BsonDocument()
{
{
"$match",
new BsonDocument
{
{
"dateInMillis",
new BsonDocument {
{
"$gte",
intervalInMilliseconds
}
}
}
}
}
};
var collection = db.GetCollection<CollClass>("Coll");
var pipeline = new[] { project, match };
var resultPipe = collection.Aggregate<CollClassRS>(pipeline);
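On the efficiency question, here is a hedged sketch in mongo shell syntax. The helper names are made up for illustration. The first function builds the same two stages as the code above. The second builds an equivalent filter directly on UpdateDate, since "now minus UpdateDate is at least X" is the same condition as "UpdateDate is at most now minus X". A $match on the stored field placed first in the pipeline may be able to use an index on UpdateDate, assuming one exists, whereas a $match on a computed field cannot.

```javascript
// Hypothetical helper: the two-stage pipeline from the question.
// `now` is evaluated on the client, as in the C# code above.
function buildAgePipeline(now, minAgeMs) {
  return [
    {
      $project: {
        // Subtracting one date from another yields milliseconds
        dateInMillis: { $subtract: [now, "$UpdateDate"] },
        Param2: "$Param2",
        Param3: "$Param3",
        _id: 0
      }
    },
    { $match: { dateInMillis: { $gte: minAgeMs } } }
  ];
}

// Hypothetical alternative: same condition expressed on the stored field.
function buildCutoffMatch(now, minAgeMs) {
  const cutoff = new Date(now.getTime() - minAgeMs);
  return { $match: { UpdateDate: { $lte: cutoff } } };
}

// The threshold from the question, shown in days for readability.
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const thresholdMs = 2662790910;
console.log((thresholdMs / MS_PER_DAY).toFixed(2) + " days");

const now = new Date();
console.log(JSON.stringify(buildAgePipeline(now, thresholdMs)));
console.log(JSON.stringify(buildCutoffMatch(now, thresholdMs)));
```

The threshold used in the question works out to a little under 31 days.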

$elemMatch equivalent in spring data mongodb

I need to know the Spring Data MongoDB equivalent of the query below:
db.inventory.find( {
qty: { $all: [
{ "$elemMatch" : { size: "M", num: { $gt: 50} } },
{ "$elemMatch" : { num : 100, color: "green" } }
] }
} )
I was able to find the answer. This can be done in Spring Data MongoDB using the following code:
Query query = new Query();
query.addCriteria(Criteria.where("qty").elemMatch(Criteria.where("size").is("M").and("num").gt(50).elemMatch(Criteria.where("num").is(100).and("color").is("green"))));
I think the code in your answer generates the query below:
{ "qty" : { "$elemMatch" : { "num" : 100 , "color" : "green"}}}
I don't think that's what you need.
It only checks the last elemMatch expression, not all of them.
Try this instead:
query = new Query();
Criteria first = Criteria.where("qty").elemMatch(Criteria.where("size").is("M").and("num").gt(50));
Criteria two = Criteria.where("qty").elemMatch(Criteria.where("num").is(100).and("color").is("green"));
query.addCriteria(new Criteria().andOperator(first, two));
Just implemented $all with $elemMatch with spring data Criteria API:
var elemMatch1 = new Criteria()
.elemMatch(Criteria.where("size").is("M").and("num").gt(50));
var elemMatch2 = new Criteria()
.elemMatch(Criteria.where("num").is(100).and("color").is("green"));
var criteria = Criteria.where("qty")
.all(elemMatch1.getCriteriaObject(), elemMatch2.getCriteriaObject());
mongoTemplate.find(Query.query(criteria), Inventory.class);
Note: the important part is calling the getCriteriaObject() method inside Criteria.all(...) for each Criteria.elemMatch(...) element.
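To make the difference between the answers concrete, here is a plain JavaScript illustration of the semantics only. It is not how MongoDB evaluates the query, and the helper names and sample data are made up. With $all over two $elemMatch expressions, each condition must be satisfied by some array element (not necessarily the same one), whereas a query containing only the last $elemMatch checks just that one condition.

```javascript
// Illustration of the matching semantics; hypothetical helpers, not a MongoDB API.
const elemMatch = (pred) => (arr) => Array.isArray(arr) && arr.some(pred);
const all = (...matchers) => (arr) => matchers.every((m) => m(arr));

const isMediumOver50 = (e) => e.size === "M" && e.num > 50;
const isGreen100 = (e) => e.num === 100 && e.color === "green";

// Mirrors: qty: { $all: [ { $elemMatch: ... }, { $elemMatch: ... } ] }
const bothConditions = all(elemMatch(isMediumOver50), elemMatch(isGreen100));
// Mirrors a query that kept only the last $elemMatch
const lastOnly = elemMatch(isGreen100);

// Sample array: has a green element with num 100, but no "M" element over 50
const qty = [
  { size: "S", num: 10, color: "blue" },
  { size: "L", num: 100, color: "green" }
];

console.log(bothConditions(qty)); // false
console.log(lastOnly(qty));       // true
```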
Just to make @Vaibhav's answer a bit clearer.
Given this document in the DB:
{
"modified": true,
"items": [
{
"modified": true,
"created": false
},
{
"modified": false,
"created": false
},
{
"modified": true,
"created": true
}
]
}
You could use the following query if you need items where both attributes of an item are true:
Query query = new Query();
query.addCriteria(Criteria.where("modified").is(true));
query.addCriteria(Criteria.where("items")
.elemMatch(Criteria.where("modified").is(true)
.and("created").is(true)));
Here is an example of how to query with OR inside elemMatch:
Query query = new Query();
query.addCriteria(Criteria.where("modified").is(true));
query.addCriteria(Criteria.where("items")
.elemMatch(new Criteria().orOperator(
Criteria.where("modified").is(true),
Criteria.where("created").is(true))));
Hi, I implemented this in Kotlin too, extended with if statements to build a dynamic query :)
val map = segmentFilter.segmentValueMap.map {
val segmentCriteria = where("segment").isEqualTo(it.key)
if (it.value.isNotEmpty()) {
segmentCriteria.and("segmentValue").`in`(it.value)
}
if (sovOverviewInDateRange != null) {
segmentCriteria.and("date").lte(sovOverviewInDateRange.recent).gte(sovOverviewInDateRange.older)
}
Criteria().elemMatch(segmentCriteria)
}
criteriaQuery.and("sovOverview").all(map.map { it.criteriaObject })
sovOverviews: {
$all: [
{
$elemMatch: {
segment: "Velikost",
segmentValue: "S"
}
},
{
$elemMatch: {
segment: "Kategorie"
}
}
]
}

Error while using aggregation with limit property on mongodb in asp.net MVC4.0?

I use MongoDB and MVC 4.0.
The code below gives me an error. I tried many different ways, but it always shows this error:
"Command 'aggregate' failed: exception: A pipeline stage specification
object must contain exactly one field. (response: { "errmsg" :
"exception: A pipeline stage specification object must contain exactly
one field.", "code" : 16435, "ok" : 0.0 })"
My code:
var matchSumcount2 = new BsonDocument
{
{
"$group",
new BsonDocument
{
{ "_id", new BsonDocument
{
{
"Device","$Device"
}
}
},
{
"Clicks",new BsonDocument
{
{
"$sum","$Clicks"
}
}
},
{
"Day",new BsonDocument
{
{
"$sum",1
}
}
}
}
},
{
"$limit",50
}
};
var database = MongoDbManager.GetDatabase();
var pipeline = new[] { matchSumcount2 };
var list = database.GetCollection("rnd").Aggregate(pipeline);
I only want the first 50 records and then perform the aggregation.
What am I doing wrong here? Any suggestions or a code sample to do this?
I made the comment above, but you didn't understand. I apologize for not being more clear. I'll use code examples to show you what is wrong.
You are doing this (effectively):
{
{ $group: { _id: { Device: "$Device" } } },
{ $limit: 50 }
}
But that is wrong. $group and $limit should not be siblings in a document. They should be elements in an array.
[
{ $group: { _id: { Device: "$Device" } } },
{ $limit: 50 }
]
Like I mentioned in the comment, I cannot see the start of your code, so I can only make an assumption based on the end. Your first line is probably a new BsonDocument(). That is wrong. It should be a new BsonArray():
var pipeline = new BsonArray();
pipeline.Add(new BsonDocument
{
{ "$group", new BsonDocument { { "_id", new BsonDocument { { "Device", "$Device" } } } } }
});
pipeline.Add(new BsonDocument("$limit", 50));
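Since the question says "I only want the first 50 records and then perform the aggregation", stage order also matters once the pipeline is an array: stages run in the order listed. The sketch below, in mongo shell syntax, shows both orders. The isValidPipeline helper is a made-up check that mirrors the rule behind error 16435; it is not a driver or server function.

```javascript
// Each array element is one stage object with exactly one key.
const groupStage = {
  $group: {
    _id: { Device: "$Device" },
    Clicks: { $sum: "$Clicks" },
    Day: { $sum: 1 }
  }
};
const limitStage = { $limit: 50 };

// Group everything, then keep at most 50 groups.
const groupThenLimit = [groupStage, limitStage];
// Take the first 50 input documents, then group only those.
const limitThenGroup = [limitStage, groupStage];

// Hypothetical check: every stage must contain exactly one field.
const isValidPipeline = (stages) =>
  Array.isArray(stages) && stages.every((s) => Object.keys(s).length === 1);

console.log(isValidPipeline(groupThenLimit)); // true
console.log(isValidPipeline(limitThenGroup)); // true
// $group and $limit as siblings in one object, as in the failing code
console.log(isValidPipeline([{ ...groupStage, ...limitStage }])); // false
```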

Project fields in mongo Database Command with fulltext search

I'm trying to project fields in the result of a mongo full-text search, but so far with no luck. The command is as follows:
var textSearchCommand = new CommandDocument
{
{ "text", "mycollection" },
{ "search", keyword },
{"project", "_id:1, Name:1"}
};
I've tried other approaches to the projection, but with no luck. What would be the correct syntax?
You need to make a BsonDocument for your "project" value instead of using a string:
var textSearchCommand = new CommandDocument
{
{ "text", "mycollection" },
{ "search", keyword },
{ "project", new BsonDocument { { "_id", 1 }, { "Name", 1 } } }
};