Mongo $group with $project

I am trying to get the keyword count along with parentId, categoryId and llcId.
My documents look like this:
{
"_id" : ObjectId("5673f5b1e4b0822f6f0a5b89"),
"keyword" : "electronic content management system",
"llcId" : "CL1K9B",
"categoryId" : "CL1K8V",
"parentId" : "CL1K8V"
}
I tried $group followed by $project:
db.keyword.aggregate([
{
$group: {
_id: "$llcId",
total: {$sum: 1},
}
},
{
$project: {
categoryId: 1, total: 1
}
}
])
It gives me results like:
{ "_id" : "CL1KJQ", "total" : 17 }
{ "_id" : "CL1KKW", "total" : 30 }
But I also need the actual data in the result, e.g. llcId, categoryId, keyword, total. I tried to display categoryId and keyword by using $project, but it displays only _id and total. What am I missing?

To get the keyword count, group the documents by the keyword field, then use the $sum accumulator to count the documents in each group. As for the other fields: since you are grouping all the documents by the keyword value, the best you can do is use the $first operator, which returns the value from the first document in each group. Otherwise, you can use the $push operator to return an array of the field values for each group:
var pipeline = [
{
"$group": {
"_id": "$keyword",
"total": { "$sum": 1 },
"llcId": { "$first": "$llcId"},
"categoryId": { "$first": "$categoryId"},
"parentId": { "$first": "$parentId"}
}
}
];
db.keyword.aggregate(pipeline)
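To make the $sum/$first semantics concrete, here is a plain-JavaScript sketch that mimics that $group stage on an in-memory array. It is not the MongoDB shell or driver API; the function name groupByKeyword and the sample documents are hypothetical.

```javascript
// Illustrative in-memory emulation of:
//   { $group: { _id: "$keyword", total: { $sum: 1 },
//               llcId: { $first: "$llcId" }, ... } }
// Plain JavaScript only, not the MongoDB API.
function groupByKeyword(docs) {
  const groups = new Map(); // Map keeps keys in first-insertion order
  for (const doc of docs) {
    if (!groups.has(doc.keyword)) {
      // The first document seen for a keyword supplies the $first values.
      groups.set(doc.keyword, {
        _id: doc.keyword,
        total: 0,
        llcId: doc.llcId,
        categoryId: doc.categoryId,
        parentId: doc.parentId,
      });
    }
    groups.get(doc.keyword).total += 1; // $sum: 1 counts documents
  }
  return [...groups.values()];
}

// Hypothetical sample documents shaped like the one in the question.
const keywordDocs = [
  { keyword: "ecm system", llcId: "CL1K9B", categoryId: "CL1K8V", parentId: "CL1K8V" },
  { keyword: "ecm system", llcId: "CL1KJQ", categoryId: "CL1KKW", parentId: "CL1K8V" },
  { keyword: "crm system", llcId: "CL1KKW", categoryId: "CL1K8V", parentId: "CL1K8V" },
];

console.log(groupByKeyword(keywordDocs));
```

Note how the second "ecm system" document's llcId (CL1KJQ) does not appear in the output: that is the information $first discards, and why $push or $addToSet is the alternative when every value matters.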

You are grouping by llcId, so a single llcId can have more than one categoryId.
If you want categoryId in your result, you have to include it in your $group stage. For example:
db.keyword.aggregate([
{
$group: {
_id: "$llcId",
total: {$sum: 1},
categoryId:{$max:"$categoryId"}
}
},
{
$project: {
categoryId: 1, total: 1
}
}])

Related

MongoDB - Filter Array and Get distinct counts

My MongoDB document looks like below:
{
"_id" : ObjectId("5fb1828a6dbd2e5c533e2378"),
"email" : "hskasd#gmail.com",
"fname" : "JOSE",
"appt" : [
{
"date" : "12/04/2020",
"time" : "0900"
},
{
"date" : "12/05/2020",
"time" : "1000"
}
]
}
Both appt.date and appt.time are strings.
I need to filter the records whose appt array contains appt.date: "12/04/2020", then find all distinct appt.time values for that date along with their count.
I tried to use an aggregation pipeline but cannot get it to work. How can I solve this in MongoDB 2.6.11?
You can try:
$match on the appt.date condition to filter the main documents
$unwind to deconstruct the appt array
$match on the appt.date condition again to filter the subdocuments
$group by null and collect the unique times with $addToSet
$project to get the count of distinct times with $size ($addFields would also work, but it requires MongoDB 3.4+)
db.collection.aggregate([
{ $match: { "appt.date": "12/04/2020" } },
{ $unwind: "$appt" },
{ $match: { "appt.date": "12/04/2020" } },
{
$group: {
_id: null,
time: { $addToSet: "$appt.time" }
}
},
{
$project: {
_id: 0,
time: 1,
count: { $size: "$time" }
}
}
])
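If it helps to see what each stage does, here is a plain-JavaScript sketch that mimics the same pipeline on an in-memory array. It is not the MongoDB API; the function name distinctTimesForDate and the sample documents are hypothetical.

```javascript
// Illustrative in-memory emulation of
//   $match -> $unwind -> $match -> $group ($addToSet) -> $project ($size)
// Plain JavaScript only, not the MongoDB API.
function distinctTimesForDate(docs, date) {
  const times = new Set(); // plays the role of $addToSet
  for (const doc of docs) {
    // First $match: keep only documents with at least one appt on `date`.
    if (!doc.appt.some((a) => a.date === date)) continue;
    // $unwind + second $match: visit each appt, keep only those on `date`.
    for (const appt of doc.appt) {
      if (appt.date === date) times.add(appt.time);
    }
  }
  // $project: the unique times and their count ($size).
  // MongoDB does not guarantee the order of an $addToSet array;
  // this Set simply keeps insertion order.
  const time = [...times];
  return { time, count: time.length };
}

// Hypothetical sample data in the question's shape.
const apptDocs = [
  { fname: "JOSE", appt: [{ date: "12/04/2020", time: "0900" }, { date: "12/05/2020", time: "1000" }] },
  { fname: "ANA", appt: [{ date: "12/04/2020", time: "0900" }, { date: "12/04/2020", time: "1100" }] },
  { fname: "LUIS", appt: [{ date: "12/06/2020", time: "0800" }] },
];

console.log(distinctTimesForDate(apptDocs, "12/04/2020"));
```

The first $match matters mainly for performance in the real pipeline: it can use an index on appt.date before $unwind multiplies the documents.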

Combining group and project in mongoDB aggregation framework

My document looks like this:
{
"_id" : ObjectId("5748d1e2498ea908d588b65e"),
"some_item" : {
"_id" : ObjectId("5693afb1b49eb7d5ed97de14"),
"item_property_1" : 1.0,
"item_property_2" : 2.0
},
"timestamp" : "2016-05-28",
"price_information" : {
"arbitrary_value" : 111,
"hourly_rates" : [
{
"price" : 74.45,
"hour" : "0"
},
{
"price" : 74.45,
"hour" : "1"
},
{
"price" : 74.45,
"hour" : "2"
}
]
}
}
I averaged the price per day with:
db.hourly.aggregate([
{$match: {timestamp : "2016-05-28"}},
{$unwind: "$price_information.hourly_rates"},
{$group: { _id: "$unique_item_identifier", total_price: { $avg: "$price_information.hourly_rates.price"}}}
]);
I am struggling to bring (project) other params into the result set. I would also like to have some_item and timestamp in the result set. I tried to use a $project: {some_item: 1, total_price: 1, ...} within the query, but that wasn't right.
My desired output would be like:
{
"_id" : ObjectId("5693afb1b49eb7d5ed97de27"),
"someItem" : {
"_id" : ObjectId("5693afb1b49eb7d5ed97de14"),
"item_property_1" : 1.0,
"item_property_2" : 2.0
},
"timestamp" : "2016-05-28",
"price_information" : {
"avg_price": 34
}
}
If somebody could give me a hint on how to project the grouping and the other params into the result set, I would be thankful.
Best
Rob
If you are using MongoDB 3.2 or newer, you can use $avg in the $project stage, since there it returns the average of the specified expression or list of expressions for each document, e.g.:
db.hourly.aggregate([
{ "$match": { "timestamp": "2016-05-28" } },
{
"$project": {
"price_information": {
"avg_price": { "$avg": "$price_information.hourly_rates.price" }
},
"someItem": "$some_item",
"timestamp": 1,
}
}
]);
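To show what that per-document $avg computes, here is a plain-JavaScript sketch applied to one in-memory document. It is not the MongoDB API; the function name projectAvgPrice and the sample prices are hypothetical, and ObjectIds are represented as plain strings.

```javascript
// Illustrative emulation of the 3.2+ $project stage:
//   price_information.avg_price: { $avg: "$price_information.hourly_rates.price" }
// computed per document, with no $unwind/$group. Plain JavaScript only.
function projectAvgPrice(doc) {
  // The field path resolves to the array of prices in hourly_rates.
  const prices = doc.price_information.hourly_rates.map((r) => r.price);
  // $avg ignores non-numeric values and yields null if none are numeric.
  const nums = prices.filter((p) => typeof p === "number");
  const avg = nums.length ? nums.reduce((sum, p) => sum + p, 0) / nums.length : null;
  return {
    _id: doc._id,
    price_information: { avg_price: avg },
    someItem: doc.some_item, // renames some_item to someItem
    timestamp: doc.timestamp,
  };
}

// Hypothetical document in the question's shape.
const hourlyDoc = {
  _id: "5748d1e2498ea908d588b65e",
  some_item: { _id: "5693afb1b49eb7d5ed97de14", item_property_1: 1.0, item_property_2: 2.0 },
  timestamp: "2016-05-28",
  price_information: {
    arbitrary_value: 111,
    hourly_rates: [
      { price: 70, hour: "0" },
      { price: 74, hour: "1" },
      { price: 78, hour: "2" },
    ],
  },
};

console.log(projectAvgPrice(hourlyDoc));
```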
In previous versions of MongoDB, $avg is available in the $group stage only. So to include the other fields, use the $first operator in your grouping:
db.hourly.aggregate([
{ "$match": { "timestamp": "2016-05-28" } },
{ "$unwind": "$price_information.hourly_rates" },
{
"$group": {
"_id": "$_id",
"avg_price": { "$avg": "$price_information.hourly_rates.price" },
"someItem": { "$first": "$some_item" },
"timestamp": { "$first": "$timestamp" },
}
},
{
"$project": {
"price_information": { "avg_price": "$avg_price" },
"someItem": 1,
"timestamp": 1
}
}
]);
Note: the result of the $first operator in a $group stage depends largely on how the documents entering that stage are ordered, as well as on the group-by key. Because $first returns the value from the first document in each group of documents that share the same group-by key, a $sort stage should logically precede the $group stage so that the input documents arrive in a defined order. $first is only sensible to use when you know the order in which the data is being processed.
However, because the pipeline above groups by the main document's _id key, every document in a group comes from the same source document. So $first applied to fields that were not unwound (i.e. anything other than the flattened price_information.hourly_rates fields) is guaranteed to return the original value, and no pre-sort stage is needed in this case.

Mongo find duplicates for entries for two or more fields

I have documents like this:
{
"_id" : ObjectId("557eaf444ba222d545c3dffc"),
"foreing" : ObjectId("538726124ba2222c0c0248ae"),
"value" : "test"
}
I want to find all documents which have duplicated values for pair foreing & value.
You can easily identify the duplicates by running the following aggregation pipeline operation:
db.collection.aggregate([
{
"$group": {
"_id": { "foreing": "$foreing", "value": "$value" },
"uniqueIds": { "$addToSet": "$_id" },
"count": { "$sum": 1 }
}
},
{ "$match": { "count": { "$gt": 1 } } }
])
The $group stage in the first step groups the documents by the foreing and value field values and uses the $addToSet operator to build an array of the _id values in each group as the uniqueIds field. This gives you an array of unique expression values for each group. The $sum operator gets the total number of documents in each group, for use in the later pipeline stages.
In the second pipeline stage, use the $match operator to filter out all documents with a count of 1. The filtered-out documents represent unique index keys.
The remaining documents will be those in the collection that have duplicate key values for pair foreing & value.
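The same grouping logic can be sketched in plain JavaScript on an in-memory array, which may help when checking the pipeline's results by hand. It is not the MongoDB API; the function name findDuplicatePairs and the sample documents are hypothetical, with ObjectIds represented as strings.

```javascript
// Illustrative emulation of
//   $group: { _id: { foreing, value }, uniqueIds: { $addToSet: "$_id" }, count: { $sum: 1 } }
//   $match: { count: { $gt: 1 } }
// Plain JavaScript only, not the MongoDB API.
function findDuplicatePairs(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Serialize the compound key so equal pairs map to the same entry.
    const key = JSON.stringify([doc.foreing, doc.value]);
    if (!groups.has(key)) {
      groups.set(key, { _id: { foreing: doc.foreing, value: doc.value }, uniqueIds: new Set(), count: 0 });
    }
    const group = groups.get(key);
    group.uniqueIds.add(doc._id); // $addToSet
    group.count += 1; // $sum: 1
  }
  return [...groups.values()]
    .filter((g) => g.count > 1) // $match: { count: { $gt: 1 } }
    .map((g) => ({ ...g, uniqueIds: [...g.uniqueIds] }));
}

// Hypothetical documents.
const pairDocs = [
  { _id: "a1", foreing: "538726124ba2222c0c0248ae", value: "test" },
  { _id: "a2", foreing: "538726124ba2222c0c0248ae", value: "test" },
  { _id: "a3", foreing: "538726124ba2222c0c0248ae", value: "other" },
];

console.log(findDuplicatePairs(pairDocs));
```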
We only have to group on the two keys and select the groups with a count greater than 1 to find the duplicates.
Query:
db.mycollection.aggregate([
{ $group: {
_id: { foreing: "$foreing", value: "$value" },
count: { $sum: 1 },
docs: { $push: "$_id" }
}},
{ $match: {
count: { $gt : 1 }
}}
])
The output will look like this (older shells, such as 2.4, wrap the results in a result document):
{
"result" : [
{
"_id" : {
"foreing" : 1,
"value" : 2
},
"count" : 2,
"docs" : [
ObjectId("34567887654345678987"),
ObjectId("34567887654345678987")
]
}
],
"ok" : 1
}
Reference: How to find mongo documents with the same field

mongodb count num of distinct values per field/key

Is there a query for calculating how many distinct values a field contains in the DB?
For example, I have a field for country and there are 8 types of country values (spain, england, france, etc...).
If someone adds more documents with a new country, I would like the query to return 9.
Is there an easier way than group and count?
MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.
There is a shell db.collection.distinct() helper as well:
> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]
> db.countries.distinct('country').length
4
As noted in the MongoDB documentation:
Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.
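Conceptually, distinct(...).length is "collect the values into a set and take its size". The plain-JavaScript sketch below shows that on an in-memory array, including the question's 8 → 9 scenario. It is not the MongoDB API; the function name countDistinct and the data are hypothetical, and unlike distinct it does not split array-valued fields into separate elements. Skipping documents that lack the field is an assumption of this sketch.

```javascript
// Illustrative emulation of db.collection.distinct(field).length:
// collect the field's values into a Set and take its size.
// Assumption for this sketch: documents missing the field are skipped.
function countDistinct(docs, field) {
  const values = new Set();
  for (const doc of docs) {
    if (doc[field] !== undefined) values.add(doc[field]);
  }
  return values.size;
}

// Hypothetical data: 8 distinct countries, with "Spain" repeated.
const countryNames = ["Spain", "England", "France", "Australia", "Italy", "Germany", "Portugal", "Ireland", "Spain"];
const countryDocs = countryNames.map((country) => ({ country }));
console.log(countDistinct(countryDocs, "country")); // 8

// Adding a document with a new country raises the count to 9.
countryDocs.push({ country: "Norway" });
console.log(countDistinct(countryDocs, "country")); // 9
```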
Here is an example of using the aggregation API. To complicate the case, we group by case-insensitive words from an array property of the document.
db.articles.aggregate([
{
$match: {
keywords: { $not: {$size: 0} }
}
},
{ $unwind: "$keywords" },
{
$group: {
_id: {$toLower: '$keywords'},
count: { $sum: 1 }
}
},
{
$match: {
count: { $gte: 2 }
}
},
{ $sort : { count : -1} },
{ $limit : 100 }
]);
which gives results such as:
{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }
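For readers tracing that pipeline stage by stage, here is a plain-JavaScript sketch of the same logic on an in-memory array. It is not the MongoDB API; the function name topKeywords and the sample articles are hypothetical.

```javascript
// Illustrative emulation of the pipeline above:
//   $match (non-empty keywords) -> $unwind -> $group on $toLower ->
//   $match count >= 2 -> $sort count desc -> $limit
// Plain JavaScript only, not the MongoDB API.
function topKeywords(articles, minCount = 2, limit = 100) {
  const counts = new Map();
  for (const article of articles) {
    // Missing or empty keyword arrays contribute nothing, as after $unwind.
    for (const kw of article.keywords ?? []) {
      // $toLower; MongoDB documents $toLower as well-defined only for ASCII,
      // while JavaScript's toLowerCase also handles non-ASCII letters.
      const key = kw.toLowerCase();
      counts.set(key, (counts.get(key) ?? 0) + 1); // $sum: 1
    }
  }
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .filter((g) => g.count >= minCount) // $match: { count: { $gte: 2 } }
    .sort((a, b) => b.count - a.count) // $sort: { count: -1 }
    .slice(0, limit); // $limit
}

// Hypothetical sample articles.
const sampleArticles = [
  { keywords: ["Obesity", "Cancer"] },
  { keywords: ["obesity", "HIV"] },
  { keywords: [] },
  { keywords: ["OBESITY", "cancer"] },
];

console.log(topKeywords(sampleArticles));
```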
With MongoDB 3.4.4 and newer, you can use the $arrayToObject operator and a $replaceRoot pipeline stage to get the counts.
For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:
db.users.aggregate([
{ "$group": {
"_id": { "$toLower": "$role" },
"count": { "$sum": 1 }
} },
{ "$group": {
"_id": null,
"counts": {
"$push": { "k": "$_id", "v": "$count" }
}
} },
{ "$replaceRoot": {
"newRoot": { "$arrayToObject": "$counts" }
} }
])
Example Output
{
"user" : 67,
"superuser" : 5,
"admin" : 4,
"moderator" : 12
}
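The two-$group-plus-$arrayToObject pattern can be sketched in plain JavaScript on an in-memory array, where Object.fromEntries plays the part of $arrayToObject. This is not the MongoDB API; the function name roleCounts and the sample users are hypothetical.

```javascript
// Illustrative emulation of the two $group stages plus $replaceRoot/$arrayToObject.
// Plain JavaScript only, not the MongoDB API.
function roleCounts(users) {
  // First $group: count per lower-cased role.
  const counts = new Map();
  for (const user of users) {
    const role = user.role.toLowerCase();
    counts.set(role, (counts.get(role) ?? 0) + 1);
  }
  // Second $group: push { k, v } pairs into one array.
  const pairs = [...counts].map(([k, v]) => ({ k, v }));
  // $arrayToObject: turn the { k, v } pairs into a single document.
  return Object.fromEntries(pairs.map(({ k, v }) => [k, v]));
}

// Hypothetical sample users.
const sampleUsers = [
  { role: "user" }, { role: "User" }, { role: "admin" }, { role: "moderator" }, { role: "USER" },
];

console.log(roleCounts(sampleUsers));
```

Note that $arrayToObject needs the k values to be strings, which holds here because the first $group keys on $toLower.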
I wanted a more concise answer and came up with the following, using the documentation on aggregation and $group:
db.countries.aggregate([{"$group": {"_id": "$country", "count":{"$sum": 1}}}])
You can use Mongo Shell Extensions. It's a single .js import that you can append to your $HOME/.mongorc.js, or load programmatically if you're also coding in Node.js/io.js.
Sample
For each distinct value of a field, count the occurrences in documents, optionally filtered by a query:
> db.users.distinctAndCount('name', {name: /^a/i})
{
"Abagail": 1,
"Abbey": 3,
"Abbie": 1,
...
}
The field parameter can also be an array of fields:
> db.users.distinctAndCount(['name','job'], {name: /^a/i})
{
"Austin,Educator" : 1,
"Aurelia,Educator" : 1,
"Augustine,Carpenter" : 1,
...
}
To find the distinct values of field_1 in a collection while also applying a WHERE condition, you can do the following:
db.your_collection_name.distinct('field_1', {WHERE condition here and it should return a document})
So, finding the distinct names in a collection where age > 25 looks like this (take .length of the result to get their number):
db.your_collection_name.distinct('names', {'age': {"$gt": 25}})
Hope it helps!
I use this query:
var collection = "countries"; var field = "country";
db[collection].distinct(field).forEach(function(value){print(field + ", " + value + ": " + db[collection].count({[field]: value}))})
Output:
country, England: 3536
country, France: 238
country, Australia: 1044
country, Spain: 16
This query first retrieves all the distinct values and then counts the number of occurrences of each one (one extra count query per value).
If you're on MongoDB 3.4+, you can use $count in an aggregation pipeline:
db.users.aggregate([
{ $group: { _id: '$country' } },
{ $count: 'countOfUniqueCountries' }
]);

Can the MongoDB aggregation framework $group return an array of values?

How flexible is the aggregate function for output formatting in MongoDB?
Data format:
{
"_id" : ObjectId("506ddd1900a47d802702a904"),
"port_name" : "CL1-A",
"metric" : "772.0",
"port_number" : "0",
"datetime" : ISODate("2012-10-03T14:03:00Z"),
"array_serial" : "12345"
}
Right now I'm using this aggregate function to return an array of datetimes, an array of metrics, and a count:
{$match : { 'array_serial' : array,
'port_name' : { $in : ports},
'datetime' : { $gte : from, $lte : to}
}
},
{$project : { port_name : 1, metric : 1, datetime: 1}},
{$group : { _id : "$port_name",
datetime : { $push : "$datetime"},
metric : { $push : "$metric"},
count : { $sum : 1}}}
Which is nice, and very fast, but is there a way to format the output so there's one array per datetime/metric? Like this:
[
{
"_id" : "portname",
"data" : [
["2012-10-01T00:00:00.000Z", 1421.01],
["2012-10-01T00:01:00.000Z", 1361.01],
["2012-10-01T00:02:00.000Z", 1221.01]
]
}
]
This would greatly simplify the front-end as that's the format the chart code expects.
Combining two fields into an array of values with the Aggregation Framework is possible, but definitely isn't as straightforward as it could be (at least as at MongoDB 2.2.0).
Here is an example:
db.metrics.aggregate(
// Find matching documents first (can take advantage of index)
{ $match : {
'array_serial' : array,
'port_name' : { $in : ports},
'datetime' : { $gte : from, $lte : to}
}},
// Project desired fields and add an extra $index for # of array elements
{ $project: {
port_name: 1,
datetime: 1,
metric: 1,
index: { $const:[0,1] }
}},
// Split into document stream based on $index
{ $unwind: '$index' },
// Re-group data using conditional to create array [$datetime, $metric]
{ $group: {
_id: { id: '$_id', port_name: '$port_name' },
data: {
$push: { $cond:[ {$eq:['$index', 0]}, '$datetime', '$metric'] }
},
}},
// Sort results
{ $sort: { _id:1 } },
// Final group by port_name with data array and count
{ $group: {
_id: '$_id.port_name',
data: { $push: '$data' },
count: { $sum: 1 }
}}
)
MongoDB 2.6 made this a lot easier by introducing $map, which allows a simpler form of array transposition:
db.metrics.aggregate([
{ "$match": {
"array_serial": array,
"port_name": { "$in": ports},
"datetime": { "$gte": from, "$lte": to }
}},
{ "$group": {
"_id": "$port_name",
"data": {
"$push": {
"$map": {
"input": [0,1],
"as": "index",
"in": {
"$cond": [
{ "$eq": [ "$$index", 0 ] },
"$datetime",
"$metric"
]
}
}
}
},
"count": { "$sum": 1 }
}}
])
Much like the $unwind approach, you supply an array of two values as the "input" to the $map operation and then replace each of those values with the field value you want via the $cond operation.
This removes all the pipeline juggling that earlier releases required to transform the document, and leaves the aggregation to the job at hand: accumulating per "port_name" value. The transformation to an array is no longer a problem area.
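The $map/$cond trick in the 2.6 pipeline can be mimicked in plain JavaScript on an in-memory array, which shows why mapping over [0, 1] yields one [datetime, metric] pair per document. This is not the MongoDB API; the function name pairsByPort and the sample documents are hypothetical.

```javascript
// Illustrative emulation of the 2.6 pipeline's $group stage:
//   data: { $push: { $map: { input: [0, 1], as: "index",
//     in: { $cond: [{ $eq: ["$$index", 0] }, "$datetime", "$metric"] } } } }
// Plain JavaScript only, not the MongoDB API.
function pairsByPort(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.port_name)) {
      groups.set(doc.port_name, { _id: doc.port_name, data: [], count: 0 });
    }
    const group = groups.get(doc.port_name);
    // $map over [0, 1]: index 0 becomes datetime, any other index becomes metric.
    group.data.push([0, 1].map((index) => (index === 0 ? doc.datetime : doc.metric)));
    group.count += 1; // $sum: 1
  }
  return [...groups.values()];
}

// Hypothetical sample documents in the question's shape.
const metricDocs = [
  { port_name: "CL1-A", metric: "772.0", datetime: new Date("2012-10-03T14:03:00Z") },
  { port_name: "CL1-A", metric: "780.5", datetime: new Date("2012-10-03T14:04:00Z") },
  { port_name: "CL1-B", metric: "12.0", datetime: new Date("2012-10-03T14:03:00Z") },
];

console.log(JSON.stringify(pairsByPort(metricDocs)));
```

In the question's sample document, metric is stored as a string ("772.0"), so the pairs carry strings; chart code expecting numbers would need a conversion (for example $toDouble, available from MongoDB 4.0).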
Building arrays in the aggregation framework without $push and $addToSet is something that seems to be lacking. I've tried to get this to work before, and failed. It would be awesome if you could just do:
data : {$push: [$datetime, $metric]}
in the $group, but that doesn't work.
Also, building "literal" objects like this doesn't work:
data : {$push: {literal:[$datetime, $metric]}}
or even data : {$push: {literal:$datetime}}
I hope they eventually come up with some better ways of massaging this sort of data.