MongoDB distinct aggregation

I'm working on a query to find the city with the most zip codes in each state:
db.zips.distinct("state", db.zips.aggregate([
{ $group:
{ _id: {
state: "$state",
city: "$city"
},
numberOfzipcodes: {
$sum: 1
}
}
},
{ $sort: {
numberOfzipcodes: -1
}
}
])
)
The aggregate part of the query seems to work fine, but when I add the distinct I get an empty result.
Is this because I have state in the _id? Can I do something like distinct("_id.state")?

You can use $addToSet with the aggregation framework to collect distinct values.
For example:
db.collectionName.aggregate([{
$group: {_id: null, uniqueValues: {$addToSet: "$fieldName"}}
}])
Or extended to get your unique values into a proper list rather than a sub-document inside a null _id record:
db.collectionName.aggregate([
{ $group: {_id: null, myFieldName: {$addToSet: "$myFieldName"}}},
{ $unwind: "$myFieldName" },
{ $project: { _id: 0 }},
])
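To make the effect of the two pipelines above concrete, here is a minimal plain-JavaScript sketch of what they compute. It runs without a MongoDB server; the sample documents are hypothetical, and only the field name myFieldName comes from the answer.

```javascript
// Hypothetical sample documents (not from the original answer).
const sampleDocs = [
  { myFieldName: "a" },
  { myFieldName: "b" },
  { myFieldName: "a" },
];

// $group with $addToSet: keep each value exactly once.
const uniqueValues = new Set(sampleDocs.map((d) => d.myFieldName));

// $unwind + $project {_id: 0}: one output document per distinct value.
const distinctDocs = [...uniqueValues].map((v) => ({ myFieldName: v }));

console.log(distinctDocs);
```

Note that $addToSet does not guarantee any particular order for the collected values, so add a $sort stage if the order matters.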

Distinct and the aggregation framework are not inter-operable.
Instead you just want:
db.zips.aggregate([
{$group:{_id:{city:'$city', state:'$state'}, numberOfzipcodes:{$sum:1}}},
{$sort:{numberOfzipcodes:-1}},
{$group:{_id:'$_id.state', city:{$first:'$_id.city'},
numberOfzipcode:{$first:'$numberOfzipcodes'}}}
]);
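The three stages above can be traced with a small plain-JavaScript sketch. This is only an illustration of the logic, assuming a hypothetical handful of zips documents; it does not talk to MongoDB.

```javascript
// Hypothetical sample of the zips collection (one document per zip code).
const sampleZips = [
  { _id: "10001", city: "NEW YORK", state: "NY" },
  { _id: "10002", city: "NEW YORK", state: "NY" },
  { _id: "14201", city: "BUFFALO", state: "NY" },
  { _id: "60601", city: "CHICAGO", state: "IL" },
];

// Stage 1 ($group): count zip codes per (state, city) pair.
const pairCounts = new Map();
for (const { state, city } of sampleZips) {
  const key = JSON.stringify([state, city]);
  pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
}

// Stage 2 ($sort): order the pairs by count, largest first.
const sortedPairs = [...pairCounts.entries()]
  .map(([key, n]) => {
    const [state, city] = JSON.parse(key);
    return { state, city, numberOfzipcodes: n };
  })
  .sort((a, b) => b.numberOfzipcodes - a.numberOfzipcodes);

// Stage 3 ($group with $first): keep the first (largest) row seen per state.
const topCityByState = {};
for (const row of sortedPairs) {
  if (!(row.state in topCityByState)) topCityByState[row.state] = row;
}

console.log(topCityByState);
```

The second grouping only works because the rows arrive already sorted, which is why the $sort stage has to come before it.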

SQL Query: (group by & count of distinct)
select city,count(distinct(emailId)) from TransactionDetails group by city;
Equivalent mongo query would look like this:
db.TransactionDetails.aggregate([
{$group:{_id:{"CITY" : "$cityName"},uniqueCount: {$addToSet: "$emailId"}}},
{$project:{_id:0, CITY:"$_id.CITY", uniqueCustomerCount:{$size:"$uniqueCount"}}}
]);
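As a rough illustration of the count-distinct-per-group idea, the same computation can be sketched in plain JavaScript. The sample transactions are hypothetical; only the field names cityName and emailId are taken from the query above.

```javascript
// Hypothetical sample of TransactionDetails documents.
const sampleTransactions = [
  { cityName: "Pune", emailId: "a@example.com" },
  { cityName: "Pune", emailId: "a@example.com" },
  { cityName: "Pune", emailId: "b@example.com" },
  { cityName: "Delhi", emailId: "c@example.com" },
];

// $group with $addToSet: one set of distinct emailIds per city.
const emailsByCity = new Map();
for (const { cityName, emailId } of sampleTransactions) {
  if (!emailsByCity.has(cityName)) emailsByCity.set(cityName, new Set());
  emailsByCity.get(cityName).add(emailId);
}

// $project with $size: report the size of each set.
const uniqueCustomersPerCity = [...emailsByCity].map(([city, emails]) => ({
  CITY: city,
  uniqueCustomerCount: emails.size,
}));

console.log(uniqueCustomersPerCity);
```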

You can call $setUnion on a single array, which also filters dupes:
{ $project: {Package: 1, deps: {'$setUnion': '$deps.Package'}}}

Related

How to filter array (of objects) inside one document in mongo db based on some condition

I have the below docs collection structure.
I'm able to filter the documents with various approaches, but I'm not able to filter the array inside the documents.
{
"_id": "",
"employee": {
"EmployeeAttributeValues": {
"EmployeeAttributeValue": [
{.....
},
{.....
},
{.....
},
{.....
}
]
}
}
}
Kindly help me with how to filter the EmployeeAttributeValue array based on some condition.
You can use the $where operator for custom filtering:
https://docs.mongodb.com/v4.2/reference/operator/query/where/
db.test.aggregate([
{ $match: {_id: <ID>}},
{ $unwind: '$<ARRAY>'},
{ $match: {'<ARRAY>.a': {$gt: 3}}},
{ $group: {_id: '$_id', list: {$push: '$<ARRAY>.a'}}}
])
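The unwind / match / group sequence above can be followed step by step in a plain-JavaScript sketch. The document and its items array are hypothetical stand-ins for the <ID> and <ARRAY> placeholders; nothing here queries MongoDB.

```javascript
// Hypothetical document; `items` stands in for the <ARRAY> placeholder.
const sampleDoc = { _id: 1, items: [{ a: 1 }, { a: 5 }, { a: 7 }] };

// $unwind: one row per array element, each carrying the parent _id.
const unwoundRows = sampleDoc.items.map((item) => ({ _id: sampleDoc._id, item }));

// $match: keep only the elements that satisfy the condition (a > 3).
const matchedRows = unwoundRows.filter((row) => row.item.a > 3);

// $group with $push: gather the surviving values back under the parent _id.
const regrouped = {
  _id: sampleDoc._id,
  list: matchedRows.map((row) => row.item.a),
};

console.log(regrouped);
```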

MongoDB Group by field, count it and sort it desc

I have the following document structure:
{
..
"mainsubject" : {
"code": 2768,
"name": "Abc"
}
}
Now I need a list of all mainsubject.code's and how often they are used.
In SQL I would do something like this:
SELECT mainsubject_code, COUNT(*) AS 'count'
FROM products
GROUP BY mainsubject_code
ORDER BY count
I already was able to group it and count it:
db.products.aggregate([
{"$group" : {_id:"$mainsubject.code", count:{$sum:1}}}
]);
But how to sort it?
db.coll.aggregate([
{
$group: {
_id: "$mainsubject.code",
countA: { $sum: 1}
}
},
{
$sort:{$mainsubject.code:1}
}
])
did not work?
Looking at your SQL query, it seems you want to sort by count. So in the Mongo query you should also use countA as the sort field.
db.coll.aggregate([
{
$group: {
_id: "$mainsubject.code",
countA: { $sum: 1}
}
},
{
$sort:{'countA':1}
}
])
You have to sort by _id field that is the name of the field resulting from the $group stage of your aggregation pipeline. So, modify your query in this way:
db.coll.aggregate([
{
$group: {
_id: "$mainsubject.code",
countA: { $sum: 1}
}
},
{
$sort:{_id:1}
}
])
In this way you're sorting by _id ascending. Your SQL equivalent query is actually sorting by count and to achieve this you can change the $sort stage to:
$sort:{"countA":1}
Use Sort By Count ($sortByCount)
Groups incoming documents based on the value of a specified expression, then computes the count of documents in each distinct group.
db.coll.aggregate([ { $sortByCount: "$mainsubject.code" } ])
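For reference, here is a minimal plain-JavaScript sketch of the group, count, and sort-descending behaviour described above. The sample products are hypothetical and the code is an emulation, not a call into MongoDB.

```javascript
// Hypothetical sample of the products collection.
const sampleProducts = [
  { mainsubject: { code: 2768, name: "Abc" } },
  { mainsubject: { code: 2768, name: "Abc" } },
  { mainsubject: { code: 1001, name: "Xyz" } },
];

// Group by mainsubject.code and count the documents in each group.
const codeTally = new Map();
for (const product of sampleProducts) {
  const code = product.mainsubject.code;
  codeTally.set(code, (codeTally.get(code) || 0) + 1);
}

// Sort the groups by count, largest first.
const codesByCount = [...codeTally]
  .map(([code, count]) => ({ _id: code, count }))
  .sort((a, b) => b.count - a.count);

console.log(codesByCount);
```

$sortByCount always orders by descending count; for ascending order, use an explicit $group followed by $sort instead.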

MongoDB: How can I get a count of a field in a collection grouped by first character and matching a 2nd field?

Following this question's answer (https://stackoverflow.com/a/20817040/2656506) I was able to group a field based on its first character with this command:
db.kits.aggregate({ $group: {_id: {$substr: ['$kit', 0, 1]}, count: {$sum: 1}}})
But I can't figure out how I can additionally group only those documents which match an additional condition like _id: 'abc' in the same query. Can it be done in one query?
Thanks in advance!
Add a $match pipeline stage to your aggregation query:
db.kits.aggregate(
[
{
$match: {
_id: 'abc'
}
},
{
$group: {
_id: {
$substr: ['$kit', 0, 1]
},
count: {$sum: 1}
}
}
]
)
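The filter-then-group order can be illustrated with a plain-JavaScript sketch. The sample kits and the owner field used as the match condition are hypothetical (the question only says "an additional condition"); the point is that filtering happens before counting.

```javascript
// Hypothetical sample of the kits collection; `owner` is an assumed field.
const sampleKits = [
  { kit: "alpha", owner: "abc" },
  { kit: "apex", owner: "abc" },
  { kit: "beta", owner: "abc" },
  { kit: "atlas", owner: "xyz" },
];

// $match: keep only the documents that satisfy the extra condition.
const matchedKits = sampleKits.filter((k) => k.owner === "abc");

// $group on $substr(kit, 0, 1): count the remaining kits by first character.
const countsByFirstChar = {};
for (const { kit } of matchedKits) {
  const first = kit.substring(0, 1);
  countsByFirstChar[first] = (countsByFirstChar[first] || 0) + 1;
}

console.log(countsByFirstChar);
```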

How to determine if average is 0 vs null?

I've got a Mongo database where I run some aggregation queries. Here's the simplified query I want to run:
db.coll.aggregate([
{ $group: {
_id: '$fieldA',
fieldB: { $avg: '$fieldB' }
} },
])
It groups data by fieldA and calculates average for fieldB. Anyway, some rows in result set have 0 as value for fieldB. There can be 2 reasons for that:
Average value IS 0.
All documents in a group didn't have fieldB (or had null as a value); and Mongo behavior is to return 0 in that case.
Is it possible to determine which scenario took place for each row in resulting selection without issuing other query and without leaving aggregation pipeline?
UPDATE
I can't filter out null fields, as I'm aggregating several fields, like this:
db.coll.aggregate([
{ $group: {
_id: '$fieldA',
fieldB: { $avg: '$fieldB' },
fieldC: { $avg: '$fieldC' }
} },
])
Some of the documents may have fieldB but not fieldC and vice versa.
You can filter the data by using $match before your $group operation.
db.coll.aggregate([
{ $match: { fieldB : {$ne : null }}},
{ $group: {
_id: '$fieldA',
fieldB: { $avg: '$fieldB' }
} },
])
This way you will get only documents that have fieldB set.
UPDATE
You can't use the $avg that way but you can find out if all values are NULL using $min operator:
db.coll.aggregate([
{ $group: {
_id: '$fieldA',
fieldB: { $avg: '$fieldB' } ,
fieldBAllNullOrMin: { $min: '$fieldB' }
} },
])
The $min operator will return null if all values are null; otherwise it will return the minimum value (but only in 2.4+ versions of MongoDB).
You can use the $max (or $min) operator to determine whether all
instances of fieldB in a group are null or missing, as the $max (or
$min) operator returns null in that case. Given this aggregation
pipeline:
c.aggregate([
{$group: {
_id: '$fieldA',
avg: {$avg: '$fieldB'},
max: {$max: '$fieldB'},
}}
])
with these documents:
c.insert({fieldA: 1, fieldB: 3})
c.insert({fieldA: 1, fieldB: -3})
the result is:
{"_id": 1, "avg": 0, "max": 3}
whereas with these documents:
c.insert({fieldA: 1})
c.insert({fieldA: 1})
the result is:
{"_id": 1, "avg": 0, "max": null}
The null value for the max field tells you that fieldB was null or
missing in all documents in the group.
Hope this helps,
Bruce
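The two cases from the answer above can be reproduced in a small plain-JavaScript sketch. The helper names maxOrNull, avgOrZero, and summarize are hypothetical, and avgOrZero deliberately returns 0 for an empty group only to mirror the behaviour described in the question; this is an emulation, not MongoDB itself.

```javascript
// Collect the numeric values of `field`, skipping null and missing ones.
function numericValues(docs, field) {
  return docs.map((d) => d[field]).filter((v) => typeof v === "number");
}

// Largest value, or null when no document in the group has one.
function maxOrNull(docs, field) {
  const values = numericValues(docs, field);
  return values.length === 0 ? null : Math.max(...values);
}

// Average; 0 for an empty group, mirroring the behaviour in the question.
function avgOrZero(docs, field) {
  const values = numericValues(docs, field);
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// One summary row per group, like the $group stage above.
const summarize = (docs) => ({
  avg: avgOrZero(docs, "fieldB"),
  max: maxOrNull(docs, "fieldB"),
});

const groupWithValues = [{ fieldA: 1, fieldB: 3 }, { fieldA: 1, fieldB: -3 }];
const groupWithoutValues = [{ fieldA: 1 }, { fieldA: 1 }];

console.log(summarize(groupWithValues));
console.log(summarize(groupWithoutValues));
```

Both groups report an average of 0, but only the group with no fieldB values reports null for max, which is what tells the two cases apart.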

How to simply count the documents by a given criteria (mongo aggregation framework)?

I would like to simply count the documents. What would be the correct way to do the following:
db.my_collection.aggregate({
$match: { // go by the indexed field
date: {
$gte: new Date(2013,1,20),
$lte: new Date(2013,1,27)
}
}
},{
$match: { // go by some other field
someField: 'someValue'
}
},{
$count: { // $sum? $group? $anythingElse?
// ???????
}
})
You should use $group with $sum. Something like this:
$group: {
_id: null,
count: {$sum: 1}
}
See also the SQL to Aggregation Framework mapping chart in the MongoDB documentation.
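As a rough illustration of what the two $match stages followed by the counting $group compute, here is a plain-JavaScript sketch over a hypothetical in-memory collection. The field names and date bounds follow the question; the sample documents are made up.

```javascript
// Hypothetical sample documents for my_collection.
const sampleCollection = [
  { date: new Date(2013, 1, 21), someField: "someValue" },
  { date: new Date(2013, 1, 25), someField: "otherValue" },
  { date: new Date(2013, 2, 5), someField: "someValue" },
  { date: new Date(2013, 1, 26), someField: "someValue" },
];

// Same bounds as in the question (JavaScript months are zero-based).
const rangeStart = new Date(2013, 1, 20);
const rangeEnd = new Date(2013, 1, 27);

// First $match: restrict to the date range.
const inDateRange = sampleCollection.filter(
  (d) => d.date >= rangeStart && d.date <= rangeEnd
);

// Second $match: restrict by the other field.
const withSomeValue = inDateRange.filter((d) => d.someField === "someValue");

// $group {_id: null, count: {$sum: 1}}: a single row holding the total.
const countResult = { _id: null, count: withSomeValue.length };

console.log(countResult);
```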