How to perform case-insensitive aggregation grouping in MongoDB?

Let's say that I want to group documents in MongoDB by the Description field.
Running the following (case-sensitive by default):
db['Products'].aggregate(
{ $group: {
_id: { 'Description': "$Description" },
count: { $sum: 1 },
docs: { $push: "$_id" }
}},
{ $match: {
count: { $gt : 1 }
}}
);
on my sample data gives me 1000 results, which is fine.
But now I expect that running a case-insensitive query (using $toLower) should give me at most 1000 results:
db['Products'].aggregate(
{ $group: {
_id: { 'Description': {$toLower: "$Description"} },
count: { $sum: 1 },
docs: { $push: "$_id" }
}},
{ $match: {
count: { $gt : 1 }
}}
);
But instead I get more than 1000 results. That can't be right, can it? Entries that differ only by case should get grouped together and yield fewer groupings in total ... I think.
So then probably my aggregation query is wrong! Which brings me to my question:
How should case-insensitive aggregation grouping in MongoDb be performed?

Your approach to case-insensitive grouping is correct, so perhaps your observation is not? ;)
Try this example:
// insert two documents
db.getCollection('test').insertOne({"name" : "Test"}) // uppercase 'T'
db.getCollection('test').insertOne({"name" : "test"}) // lowercase 't'
// perform the grouping
db.getCollection('test').aggregate({ $group: { "_id": { $toLower: "$name" }, "count": { $sum: 1 } } }) // case insensitive
db.getCollection('test').aggregate({ $group: { "_id": "$name", "count": { $sum: 1 } } }) // case sensitive
You may have a typo somewhere?
The documentation also states that
$toLower only has a well-defined behavior for strings of ASCII characters.
Perhaps that's what's biting you here?
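Another possible explanation, offered as an assumption since the asker's data isn't shown: the pipeline ends with a $match on count > 1, so it reports only the groups that contain duplicates, not all groups. Lowercasing can never increase the total number of groups, but it can increase the number of groups with more than one member. The plain JavaScript sketch below (no MongoDB involved; the sample descriptions and the duplicateGroups helper are made up for illustration) mimics the $group and $match steps:

```javascript
// Plain JavaScript stand-in for the $group + $match pipeline (no MongoDB needed).
// The sample descriptions are hypothetical.
const docs = [
  { _id: 1, Description: "Widget" },
  { _id: 2, Description: "widget" },
  { _id: 3, Description: "Gadget" },
  { _id: 4, Description: "Gadget" },
  { _id: 5, Description: "Bolt" },
];

// Mimics: $group by keyFn(doc) collecting count and docs, then $match count > 1.
function duplicateGroups(documents, keyFn) {
  const groups = new Map();
  for (const doc of documents) {
    const key = keyFn(doc);
    if (!groups.has(key)) groups.set(key, { _id: key, count: 0, docs: [] });
    const group = groups.get(key);
    group.count += 1;
    group.docs.push(doc._id);
  }
  return [...groups.values()].filter((group) => group.count > 1);
}

const caseSensitive = duplicateGroups(docs, (d) => d.Description);
const caseInsensitive = duplicateGroups(docs, (d) => d.Description.toLowerCase());

console.log(caseSensitive.length);   // 1 (only "Gadget" is duplicated)
console.log(caseInsensitive.length); // 2 ("widget" and "gadget")
```

"Widget" and "widget" each have a count of 1 when grouped case-sensitively, so $match drops both; grouped case-insensitively they form one group with a count of 2, which passes the filter. If that is what is happening, comparing the number of groups before the $match stage should show the expected decrease.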

Related

Mongoose - filter matched documents and assign the resultant length to a field

I have this collection(some irrelevant fields were omitted for brevity):
clients: {
userId: ObjectId,
clientSalesValue: Number,
currentDebt: Number,
}
Then I have this query that matches all the clients for a specific user, then calculates the sum of all debts and sales and put those results in a separate field each of them:
await clientsCollection.aggregate([
{
$match: { userId: new ObjectId(userId) }
},
{
$group: {
_id: null,
totalSalesValue: { $sum: '$clientSalesValue' },
totalDebts: { $sum: '$currentDebt' },
}
},
{
$unset: ['_id']
}
]).exec();
This works as expected: it returns an array with only one item, which is an object. But now I also need to include in that resulting object a field for the number of debtors, that is, the number of clients that have currentDebt > 0. How can I do that in the same query? Is it possible?
PS: I cannot modify the $match condition; it always needs to return all the clients for the corresponding user.
To include a count of how many matching documents have a positive currentDebt, you can use the $sum and $cond operators like so:
await clientsCollection.aggregate([
{
$match: { userId: new ObjectId(userId) }
},
{
$group: {
_id: null,
totalSalesValue: { $sum: '$clientSalesValue' },
totalDebts: { $sum: '$currentDebt' },
numDebtors: {
$sum: {
$cond: [{ $gt: ['$currentDebt', 0] }, 1, 0]
}
},
}
},
{
$unset: ['_id']
}
]).exec();
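To make the conditional sum concrete, here is a plain JavaScript sketch of what the $group stage computes. It runs without MongoDB, and the three sample clients are made up for illustration:

```javascript
// Hypothetical clients that already passed the $match stage.
const clients = [
  { clientSalesValue: 100, currentDebt: 0 },
  { clientSalesValue: 250, currentDebt: 40 },
  { clientSalesValue: 75, currentDebt: 10 },
];

// Mimics $group with _id: null, i.e. a single group over all documents.
const totals = clients.reduce(
  (acc, client) => {
    acc.totalSalesValue += client.clientSalesValue; // $sum: '$clientSalesValue'
    acc.totalDebts += client.currentDebt;           // $sum: '$currentDebt'
    acc.numDebtors += client.currentDebt > 0 ? 1 : 0; // $sum of $cond [gt 0, 1, 0]
    return acc;
  },
  { totalSalesValue: 0, totalDebts: 0, numDebtors: 0 }
);

console.log(totals); // { totalSalesValue: 425, totalDebts: 50, numDebtors: 2 }
```

Each document contributes 1 or 0 to numDebtors, so every client is still included in the two totals while only the debtors are counted.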

Count nested wildcard array mongodb query

I have the following data of users and model cars:
[
{
"user_id":"ebebc012-082c-4e7f-889c-755d2679bdab",
"car_1a58db0b-5449-4d2b-a773-ee055a1ab24d":1,
"car_37c04124-cb12-436c-902b-6120f4c51782":0,
"car_b78ddcd0-1136-4f45-8599-3ce8d937911f":1
},
{
"user_id":"f3eb2a61-5416-46ba-bab4-459fbdcc7e29",
"car_1a58db0b-5449-4d2b-a773-ee055a1ab24d":1,
"car_0d15eae9-9585-4f49-a416-46ff56cd3685":1
}
]
I want to see how many users have a car_ with the value 1 using mongodb, something like:
{"car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 2}
For this example.
The issue is that I will never know in advance what the car_ fields are going to be called; they have a random structure (wildcard).
Notes:
car_id and user_id are at the same level.
The car_id is not given; I simply want to know, for the entire database, which are the most common car_ fields with value 1.
$group by _id and convert root object to array using $objectToArray,
$unwind deconstruct root array
$match filter root.v is 1
$group by root.k and get total count
db.collection.aggregate([
{
$group: {
_id: "$_id",
root: { $first: { $objectToArray: "$$ROOT" } }
}
},
{ $unwind: "$root" },
{ $match: { "root.v": 1 } },
{
$group: {
_id: "$root.k",
count: { $sum: 1 }
}
}
])
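The same steps can be traced in plain JavaScript, using the two sample users from the question. This is only an illustration of the logic (Object.entries plays the role of $objectToArray plus $unwind); it does not talk to MongoDB:

```javascript
// The two sample documents from the question.
const users = [
  {
    user_id: "ebebc012-082c-4e7f-889c-755d2679bdab",
    "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 1,
    "car_37c04124-cb12-436c-902b-6120f4c51782": 0,
    "car_b78ddcd0-1136-4f45-8599-3ce8d937911f": 1,
  },
  {
    user_id: "f3eb2a61-5416-46ba-bab4-459fbdcc7e29",
    "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 1,
    "car_0d15eae9-9585-4f49-a416-46ff56cd3685": 1,
  },
];

const counts = {};
for (const user of users) {
  // Object.entries yields [key, value] pairs, like $objectToArray + $unwind.
  for (const [key, value] of Object.entries(user)) {
    if (value !== 1) continue;            // like $match { "root.v": 1 }
    counts[key] = (counts[key] || 0) + 1; // like $group by root.k with $sum: 1
  }
}

console.log(counts["car_1a58db0b-5449-4d2b-a773-ee055a1ab24d"]); // 2
```

Note that the filter only looks at the value, so any other field that happens to equal 1 would be counted as well. If that is a concern, the $match stage could also restrict the key, for example with "root.k": /^car_/ (an addition that is not part of the original answer).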

SQL to Mongo Aggregation

Hi I want to change my sql query to mongo aggregation.
select c.year, c.minor_category, count(c.minor_category) from Crime as c
group by c.year, c.minor_category having c.minor_category = (
Select cc.minor_category from Crime as cc where cc.year=c.year group by
cc.minor_category order by count(*) desc, cc.minor_category limit 1)
I tried something like this:
db.crimes.aggregate({
$group: {
"_id": {
year: "$year",
minor_category :"$minor_category",
count: {$sum: "$minor_category"}
}
},
},
{
$match : {
minor_category: ?
}
})
But I am stuck at $match, which is the equivalent of HAVING; I don't know how to write subqueries in Mongo like the one in my SQL query.
Can anybody help me?
OK, based on the confirmation above, the query below should work.
db.crime.aggregate
([
{"$group":{"_id":{"year":"$year","minor":"$minor"},"count":{"$sum":1}}},
{"$project":{"year":"$_id.year","count":"$count","minor":"$_id.minor","document":"$$ROOT"}},
{"$sort":{"year":1,"count":-1}},
{"$group":{"_id":{"year":"$year"},"orig":{"$first":"$document"}}},
{"$project":{"_id":0,"year":"$orig._id.year","minor":"$orig._id.minor","count":"$orig.count"}}
])
This translates into the following MongoDB query:
db.crime.aggregate({
$group: { // group by year and minor_category
_id: {
"year": "$year",
"minor_category": "$minor_category"
},
"count": { $sum: 1 }, // count all documents per group,
}
}, {
$sort: {
"count": -1, // sort descending by count
"_id.minor_category": 1 // and ascending by minor_category (nested under _id after the $group)
}
}, {
$group: { // now we get the highest element per year
_id: "$_id.year", // so group by year
"minor_category": { $first: "$_id.minor_category" }, // and get the first (we've sorted the data) value
"count": { $first: "$count" } // same here
}
}, {
$project: { // remove the _id field and add the others in the right order (if needed)
"_id": 0,
"year": "$_id",
"minor_category": "$minor_category",
"count": "$count"
}
})
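For readers who want to check the logic without a database, the group, sort, take-first sequence can be sketched in plain JavaScript. The sample crimes are made up, and the string comparison is a simple code-unit comparison chosen for this sketch:

```javascript
// Hypothetical sample data.
const crimes = [
  { year: 2015, minor_category: "Theft" },
  { year: 2015, minor_category: "Theft" },
  { year: 2015, minor_category: "Burglary" },
  { year: 2016, minor_category: "Burglary" },
  { year: 2016, minor_category: "Assault" },
];

// Step 1: count documents per (year, minor_category), like the first $group.
const counts = new Map();
for (const crime of crimes) {
  const key = JSON.stringify([crime.year, crime.minor_category]);
  counts.set(key, (counts.get(key) || 0) + 1);
}
const grouped = [...counts].map(([key, count]) => {
  const [year, minor_category] = JSON.parse(key);
  return { year, minor_category, count };
});

// Step 2: sort by count descending, then minor_category ascending, like $sort.
grouped.sort((a, b) => {
  if (b.count !== a.count) return b.count - a.count;
  if (a.minor_category < b.minor_category) return -1;
  return a.minor_category > b.minor_category ? 1 : 0;
});

// Step 3: keep the first entry seen for each year, like $group with $first.
const topByYear = new Map();
for (const entry of grouped) {
  if (!topByYear.has(entry.year)) topByYear.set(entry.year, entry);
}

console.log(topByYear.get(2015)); // { year: 2015, minor_category: 'Theft', count: 2 }
console.log(topByYear.get(2016)); // { year: 2016, minor_category: 'Assault', count: 1 }
```

In 2016 both categories have a count of 1, so the secondary sort on the category name decides the winner, which matches the "order by count(*) desc, cc.minor_category limit 1" part of the SQL.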

Best usage for MongoDB Aggregate request

I would like to retrieve a list of document _ids (with a limit), ranked in descending order by their timestamp, based on a list of ObjectIds.
Corresponding to this:
db.collection.aggregate( [ { $match: { _id: { $in: [ObjectId("X"), ObjectId("Y") ] } } }, { $sort: { timestamp: -1 } }, { $group: { _id: "$_id" } }, { $skip: 0 }, { $limit: 100 } ] )
Knowing that the list from the loop may contain far more than 1000 ObjectIds (in the $in array), do you think my solution is viable? Isn't there a faster and less resource-intensive way?
Best Regards.

mongodb aggregation framework group + project

I have the following issue:
This query returns 1 result, which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
"result" : [
{
"_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
"version" : 1.2000000000000002
}
],
"ok" : 1
}
This query (I just added a projection so I can later query for the entire document) returns multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
"result" : [
{
"_id" : ObjectId("5139310a3899d457ee000003")
},
{
"_id" : ObjectId("513931053899d457ee000002")
},
{
"_id" : ObjectId("513930fd3899d457ee000001")
}
],
"ok" : 1
}
Found the answer.
1. First I need to get all the _ids:
db.items.aggregate( [
{ '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
{ '$group': { _id: '$item.id', dbid: { $max: "$_id" } } }
]);
2. Then I need to query the documents:
db.items.find({ _id: { '$in': "IDs returned from aggregate" } });
which will look like this:
db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
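As for why the second query in the question behaved differently: there, $group and $project are written as two keys of one stage object. Each pipeline stage is meant to be its own object holding exactly one operator, and current MongoDB versions reject a stage object with more than one field. How the older server in the question treated the extra key is not something this sketch can confirm; the output only suggests that the $group was not applied. The snippet below just compares the two shapes as plain JavaScript data (stageOperators is a hypothetical helper, not a MongoDB function):

```javascript
// Shape used in the question: one object holding both $group and $project.
const combined = [
  { $group: { _id: "$id", version: { $max: "$version" } }, $project: { _id: 1 } },
];

// Each stage in its own object, in the order it should run.
const separated = [
  { $group: { _id: "$id", version: { $max: "$version" } } },
  { $project: { _id: 1 } },
];

// Hypothetical helper: lists the operator names found in each stage object.
const stageOperators = (pipeline) => pipeline.map((stage) => Object.keys(stage));

console.log(stageOperators(combined));  // [ [ '$group', '$project' ] ]
console.log(stageOperators(separated)); // [ [ '$group' ], [ '$project' ] ]
```

Passing the second shape to db.items.aggregate runs the $group first and then projects the grouped _id.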
(I know it's late, but I'm still answering so that other people don't have to search for the right answer somewhere else.)
See Deka's answer; it will do the job.
Not all accumulators are available in $project stage. We need to consider what we can do in project with respect to accumulators and what we can do in group. Let's take a look at this:
db.companies.aggregate([{
$match: {
funding_rounds: {
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
funding: {
$push: {
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
}
}
}
}, ]).pretty()
Here the $match stage keeps only the companies whose funding_rounds array is not empty. The array is then unwound, so the $sort and later stages see one document for each element of the funding_rounds array for every company. The first thing we do with those documents is $sort them by:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the $group stage, which groups by company name, the array is built using $push. $push is specified as part of the document that is the value of a field we name in the $group stage. We can push any valid expression. Here we push documents built from raised_amount and funded_year, and each one is appended to the end of the array being accumulated. The output of the $group stage is therefore a stream of documents whose _id holds the company name and whose funding field holds the accumulated array.
Notice that $push is available in $group stages but not in $project stage. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream of documents.
$project, on the other hand, works with one document at a time. So inside a $project stage we can, for example, calculate an average over an array within an individual document. But accumulating a value as documents pass through one at a time, pushing a new value for each of them, is something the $project stage is not designed to do. For that type of operation we want to use $group.
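The difference can be illustrated with plain JavaScript arrays, as a rough analogy only (the documents and field names below are made up, and nothing here calls MongoDB). A $project-like step maps each document on its own, while a $group-like step with $push builds one array out of many documents:

```javascript
// Hypothetical input documents.
const docs = [
  { name: "Acme", scores: [10, 20, 30], amount: 5 },
  { name: "Acme", scores: [4, 6], amount: 7 },
  { name: "Zeta", scores: [8], amount: 1 },
];

// $project-like: each output document depends on exactly one input document.
const projected = docs.map((doc) => ({
  name: doc.name,
  avgScore: doc.scores.reduce((sum, x) => sum + x, 0) / doc.scores.length,
}));

// $group-like with $push: one array per name, accumulated across documents.
const grouped = {};
for (const doc of docs) {
  if (!grouped[doc.name]) grouped[doc.name] = [];
  grouped[doc.name].push(doc.amount);
}

console.log(projected.map((p) => p.avgScore)); // [ 20, 5, 8 ]
console.log(grouped);                          // { Acme: [ 5, 7 ], Zeta: [ 1 ] }
```

The map call never looks at more than one document, which is why the average over an array inside a document fits $project, whereas the loop carries state from one document to the next, which is the job of $group.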
Let's take a look at another example:
db.companies.aggregate([{
$match: {
funding_rounds: {
$exists: true,
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
first_round: {
$first: "$funding_rounds"
},
last_round: {
$last: "$funding_rounds"
},
num_rounds: {
$sum: 1
},
total_raised: {
$sum: "$funding_rounds.raised_amount"
}
}
}, {
$project: {
_id: 0,
company: "$_id.company",
first_round: {
amount: "$first_round.raised_amount",
article: "$first_round.source_url",
year: "$first_round.funded_year"
},
last_round: {
amount: "$last_round.raised_amount",
article: "$last_round.source_url",
year: "$last_round.funded_year"
},
num_rounds: 1,
total_raised: 1,
}
}, {
$sort: {
total_raised: -1
}
}]).pretty()
In the $group stage, we're using the $first and $last accumulators. As with $push, we can't use $first and $last as accumulators in $project stages, because $project stages are not designed to accumulate values across multiple documents. Rather, they're designed to reshape documents one at a time. (MongoDB 4.4 added array expression operators that are also named $first and $last, but those act on an array inside a single document.) The total number of rounds is calculated using the $sum operator: the value 1 is added once for every document grouped under a given _id value, so the result is a count of those documents. The $project stage may seem complex, but it's just making the output pretty: it reshapes first_round and last_round and carries num_rounds and total_raised over from the $group output.
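As a rough plain JavaScript analogy of that pipeline (the unwound funding rounds below are made up and reduced to three fields), $first and $last simply pick the first and last document seen per group after sorting, while the two $sum accumulators count and add:

```javascript
// Hypothetical documents as they would look after $unwind.
const rounds = [
  { name: "Acme", funded_year: 2010, raised_amount: 300 },
  { name: "Acme", funded_year: 2008, raised_amount: 100 },
  { name: "Zeta", funded_year: 2012, raised_amount: 50 },
  { name: "Acme", funded_year: 2009, raised_amount: 200 },
];

// Like $sort on funded_year ascending (month and day left out of this sketch).
const sorted = [...rounds].sort((a, b) => a.funded_year - b.funded_year);

// Like $group by company name with $first, $last and two $sum accumulators.
const byCompany = new Map();
for (const round of sorted) {
  let group = byCompany.get(round.name);
  if (!group) {
    group = { company: round.name, first_round: round, last_round: round, num_rounds: 0, total_raised: 0 };
    byCompany.set(round.name, group);
  }
  group.last_round = round;                 // $last: overwritten on every document
  group.num_rounds += 1;                    // $sum: 1
  group.total_raised += round.raised_amount; // $sum: "$funding_rounds.raised_amount"
}

// Like the final $sort on total_raised descending.
const result = [...byCompany.values()].sort((a, b) => b.total_raised - a.total_raised);

console.log(result[0].company, result[0].num_rounds, result[0].total_raised); // Acme 3 600
```

Because the input is sorted first, first_round is the earliest round and last_round the latest; without the sort, $first and $last would just reflect whatever order the documents arrived in.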