group operations over arrays using Mongo aggregation framework - mongodb

I'm using MongoDB 2.2. I would like to use the new Aggregation Framework to query my documents, but the fields hold arrays.
Here is an example of my $project result:
{
"type" : [
"ads-get-yyy",
"ads-get-zzz"
],
"count" : [
NumberLong(0),
NumberLong(10)
],
"latency" : [
0.9790918827056885,
0.9790918827056885
]
}
I want to group by type so that, for example, for "ads-get-yyy" I know the average count and the average latency.
I would like something similar to the following query, but one that works on the elements of each array:
db.test.aggregate(
{
$project : {
"type" : 1,
"count" : 1,
"latency" : 1
}
},{
$group : {
_id: {type : "$type"},
count: {$avg: "$count"},
latency: {$avg: "$latency"}
}
});

I'm just learning the new aggregation framework too, but I think you first need to $unwind the types so that you can group by them. Something like:
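db.test.aggregate([{
$project : {
"type" : 1,
"count" : 1,
"latency" : 1
}
},{
// Unwinding only "type" would leave count and latency as whole arrays,
// so also record each element's position (includeArrayIndex needs MongoDB 3.2+)
$unwind : { path: "$type", includeArrayIndex: "idx" }
},{
$group : {
_id: {type : "$type"},
// take the count/latency at the same position as the type ($arrayElemAt, MongoDB 3.2+)
count: {$avg: {$arrayElemAt: ["$count", "$idx"]}},
latency: {$avg: {$arrayElemAt: ["$latency", "$idx"]}}
}
}]);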
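The essential point is that type[i] has to be paired with count[i] and latency[i] by position before averaging. The plain-JavaScript sketch below mimics that pairing on in-memory documents so the expected averages can be checked without a server. avgByType and the second sample document are hypothetical; this illustrates the semantics and is not MongoDB code.

```javascript
// Hypothetical in-memory model of a position-paired $unwind followed by $group/$avg.
function avgByType(docs) {
  const groups = new Map(); // type -> running sums for that type
  for (const doc of docs) {
    // One row per array position i, like $unwind with includeArrayIndex
    doc.type.forEach((type, i) => {
      const g = groups.get(type) || { countSum: 0, latencySum: 0, n: 0 };
      g.countSum += doc.count[i]; // value at the same position, like $arrayElemAt
      g.latencySum += doc.latency[i];
      g.n += 1;
      groups.set(type, g);
    });
  }
  // Like $group with $avg on count and latency
  const result = {};
  for (const [type, g] of groups) {
    result[type] = { count: g.countSum / g.n, latency: g.latencySum / g.n };
  }
  return result;
}

const docs = [
  { type: ["ads-get-yyy", "ads-get-zzz"], count: [0, 10], latency: [0.9790918827056885, 0.9790918827056885] },
  { type: ["ads-get-yyy"], count: [4], latency: [0.5] } // hypothetical second document
];
console.log(avgByType(docs));
```

With the hypothetical second document, "ads-get-yyy" averages the counts 0 and 4, while "ads-get-zzz" keeps its single value of 10.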

Related

How to find duplicate records in MongoDB with a query

I have the collection below and need to find its duplicate records. How can I do that? Below is one sample document; the collection has more than 10,000 records.
{
"_id" : 1814099,
"eventId" : "LAS012",
"eventName" : "CustomerTab",
"timeStamp" : ISODate("2018-12-31T20:09:09.820Z"),
"eventMethod" : "click",
"resourceName" : "CustomerTab",
"targetType" : "",
"resourseUrl" : "",
"operationName" : "",
"functionStatus" : "",
"results" : "",
"pageId" : "CustomerPage",
"ban" : "290824901",
"jobId" : "87377713",
"wrid" : "87377713",
"jobType" : "IBJ7FXXS",
"Uid" : "sc343x",
"techRegion" : "W",
"mgmtReportingFunction" : "N",
"recordPublishIndicator" : "Y",
"__v" : 0
}
We can first collect one _id per eventId (the copy to keep) using:
const data = await db.collection.aggregate([
{
$group: {
_id: "$eventId",
id: {
"$first": "$_id"
}
}
},
{
$group: {
_id: null,
uniqueIds: {
$push: "$id"
}
}
}
]);
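Then a second query finds every document whose _id is not in that list, which are exactly the redundant copies:
// the pipeline yields one document { _id: null, uniqueIds: [...] }; assuming data holds the
// result array (as with Mongoose's Model.aggregate), the list is in data[0]
db.collection.find({_id: {$nin: data[0].uniqueIds}})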
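Here is a hypothetical in-memory sketch of that two-step approach (findRedundant and the sample documents are made up for illustration). One caveat: $first picks whichever document reaches the $group first, so without a preceding $sort the kept copy is not predictable; the sketch simply keeps the first one in array order.

```javascript
// Hypothetical sketch: keep the first _id per eventId, return every other document.
function findRedundant(docs) {
  // Like $group: { _id: "$eventId", id: { $first: "$_id" } }
  const keptIdByEvent = new Map();
  for (const doc of docs) {
    if (!keptIdByEvent.has(doc.eventId)) keptIdByEvent.set(doc.eventId, doc._id);
  }
  // Like the second $group that pushes all kept ids into uniqueIds
  const uniqueIds = new Set(keptIdByEvent.values());
  // Like find({ _id: { $nin: uniqueIds } })
  return docs.filter((doc) => !uniqueIds.has(doc._id));
}

const sample = [
  { _id: 1, eventId: "LAS012" },
  { _id: 2, eventId: "LAS013" },
  { _id: 3, eventId: "LAS012" },
  { _id: 4, eventId: "LAS012" }
];
console.log(findRedundant(sample).map((d) => d._id)); // [ 3, 4 ]
```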
Another way is to find the eventIds that are duplicated:
db.collection.aggregate([
{"$group" : { "_id": "$eventId", "count": { "$sum": 1 } } },
{"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } }
])
To get only the duplicates, keep just the groups whose count is greater than one. A $match stage after the $group does this filtering: it looks at the count field and keeps counts greater than 1 using the $gt ("greater than") operator. It looks like the following:
db.collection.aggregate([
{$group: {
_id: {eventId: "$eventId"},
uniqueIds: {$addToSet: "$_id"},
count: {$sum: 1}
}
},
{$match: {
count: {"$gt": 1}
}
}
]);
This assumes that eventId is meant to be unique, so any eventId shared by more than one document marks a duplicate.
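The group-then-filter idea can be checked with a hypothetical in-memory sketch (duplicateGroups and the sample events are invented for illustration); it produces documents of the same shape as the pipeline output.

```javascript
// Hypothetical sketch of $group (by eventId) followed by $match (count > 1).
function duplicateGroups(docs) {
  const groups = new Map(); // eventId -> { ids, count }
  for (const doc of docs) {
    const g = groups.get(doc.eventId) || { ids: new Set(), count: 0 };
    g.ids.add(doc._id); // like $addToSet: "$_id" (the real $addToSet does not guarantee order)
    g.count += 1; // like $sum: 1
    groups.set(doc.eventId, g);
  }
  // Like $match: { count: { $gt: 1 } }
  return [...groups]
    .filter(([, g]) => g.count > 1)
    .map(([eventId, g]) => ({ _id: { eventId }, uniqueIds: [...g.ids], count: g.count }));
}

const events = [
  { _id: 1, eventId: "LAS012" },
  { _id: 2, eventId: "LAS013" },
  { _id: 3, eventId: "LAS012" }
];
console.log(JSON.stringify(duplicateGroups(events)));
// [{"_id":{"eventId":"LAS012"},"uniqueIds":[1,3],"count":2}]
```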

Aggregation query returning array of all objects for mongodb

I'm using MongoDB for the first time. I'm trying to aggregate the documents in a collection with the query below, but instead of grouped results the query returns an object with a "result" key containing an array of all the documents that match the $match criteria.
Below is the query.
db.events_2015_04_10.aggregate([
{$group:{
_id: "$uid",
count: {$sum: 1},
},
$match : {promo:"bc40100abc8d4eb6a0c68f81f4a756c7", evt:"login"}
}
]
);
Below is a sample document in the collection:
{
"_id" : ObjectId("552712c3f92ea17426000ace"),
"product" : "Mobile Safari",
"venue_id" : NumberLong(71540),
"uid" : "dd542fea6b4443469ff7bf1f56472eac",
"ag" : 0,
"promo" : "bc40100abc8d4eb6a0c68f81f4a756c7",
"promo_f" : NumberLong(1),
"brand" : NumberLong(17),
"venue" : "ovation_2480",
"lt" : 0,
"ts" : ISODate("2015-04-10T00:01:07.734Z"),
"evt" : "login",
"mac" : "00:00:00:00:00:00",
"__ns__" : "wifipromo",
"pvdr" : NumberLong(42),
"os" : "iPhone",
"cmpgn" : "fc6de34aef8b4f57af0b8fda98d8c530",
"ip" : "192.119.43.250",
"lng" : 0,
"product_ver" : "8"
}
I'm trying to group everything by uid, with the total count for each group. What is the correct way to do this?
Try the following aggregation pipeline. Each stage must be its own object in the pipeline array; in your query, $match was an extra key inside the $group stage object. Here the $match stage comes first and the $group stage after it:
db.events_2015_04_10.aggregate([
{
$match: {
promo: "bc40100abc8d4eb6a0c68f81f4a756c7",
evt: "login"
}
},
{
$group: {
_id: "$uid",
count: {
$sum: 1
}
}
}
])
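Stage order matters because each stage only sees the output of the previous one. Here is a hypothetical in-memory sketch of this $match-then-$group pipeline (countByUid and the sample events are invented for illustration):

```javascript
// Hypothetical sketch: filter first ($match), then count per uid ($group with $sum: 1).
function countByUid(events, promo) {
  const counts = new Map();
  for (const e of events) {
    if (e.promo !== promo || e.evt !== "login") continue; // $match
    counts.set(e.uid, (counts.get(e.uid) || 0) + 1); // $group: { _id: "$uid", count: { $sum: 1 } }
  }
  return [...counts].map(([uid, count]) => ({ _id: uid, count }));
}

const PROMO = "bc40100abc8d4eb6a0c68f81f4a756c7";
const sampleEvents = [
  { uid: "u1", promo: PROMO, evt: "login" },
  { uid: "u1", promo: PROMO, evt: "login" },
  { uid: "u2", promo: PROMO, evt: "login" },
  { uid: "u2", promo: PROMO, evt: "logout" }, // dropped by the $match
  { uid: "u3", promo: "other", evt: "login" } // dropped by the $match
];
console.log(countByUid(sampleEvents, PROMO));
```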

mongodb - filter out some values when doing $unwind

I have the following customer order data in MongoDB:
{
"_id" : 7,
"customer name" : "John Smith",
"OrderItem" : [
{
"product_category" : "Mobile",
"price" : 900
},
{
"product_category" : "Computer",
"price" : 4200.48
},
{
"product_category" : "TV",
"price" : 670.20
},
{
"product_category" : "TV",
"price" : 960.52
}
]
}
I need the average price per product category, so that the result looks like this:
{
"_id" : 7,
"customer name" : "John Smith",
"OrderItem" : [
{
"product_category" : "Mobile",
"price" : 900
},
{
"product_category" : "Computer",
"price" : 4200.48
},
{
"product_category" : "TV",
"price" : 815.36
}
]
}
I tried to use $unwind but I'm not sure how to group the results. Any help?
Use the aggregation framework with a pipeline of four stages. The first stage, $match, filters the document stream so that only matching documents (the document with _id = 7 in your case) pass, unmodified, into the next stage. The $unwind stage then deconstructs the OrderItem array field, outputting one document per array element, which you can group on. The $group stage groups those documents by product_category and applies the $avg accumulator to the price in each group. Finally, $project reshapes each document in the stream to produce the desired output. Your aggregation would look like:
db.collection.aggregate([
{
"$match": {"_id": 7}
},
{
"$unwind": "$OrderItem"
},
{
"$group": {
"_id": "$OrderItem.product_category",
"average_price": {
"$avg": "$OrderItem.price"
}
}
},
{
"$project": {
"_id": 0,
"product_category" : "$_id",
"average_price": 1
}
}
])
Result:
{
"result" : [
{
"average_price" : 4200.48,
"product_category" : "Computer"
},
{
"average_price" : 815.36,
"product_category" : "TV"
},
{
"average_price" : 900,
"product_category" : "Mobile"
}
],
"ok" : 1
}
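As a rough check of those numbers, here is a hypothetical in-memory sketch of the same $unwind, $group and $project steps (averageByCategory is an invented name). Floating-point addition means the TV average may print as 815.36 or as a value a tiny fraction away from it.

```javascript
// Hypothetical sketch of $unwind -> $group ($avg by category) -> $project.
function averageByCategory(order) {
  const sums = new Map(); // category -> { total, n }
  for (const item of order.OrderItem) { // $unwind: one row per OrderItem element
    const s = sums.get(item.product_category) || { total: 0, n: 0 };
    s.total += item.price;
    s.n += 1;
    sums.set(item.product_category, s);
  }
  // $group average, then $project renames _id to product_category
  return [...sums].map(([product_category, s]) => ({ product_category, average_price: s.total / s.n }));
}

const order = {
  _id: 7,
  "customer name": "John Smith",
  OrderItem: [
    { product_category: "Mobile", price: 900 },
    { product_category: "Computer", price: 4200.48 },
    { product_category: "TV", price: 670.2 },
    { product_category: "TV", price: 960.52 }
  ]
};
console.log(averageByCategory(order));
```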
First unwind OrderItem, then group the items and use $avg to calculate the average. The aggregation below does this:
db.collectionName.aggregate([
{"$match": {"customer name": "John Smith"}}, // match the given customer name
{"$unwind": "$OrderItem"}, // one document per OrderItem element
{"$group": {"_id": "$OrderItem.product_category",
"avg": {"$avg": "$OrderItem.price"} // average price per category
}}
]).pretty()
The query above returns the following results:
{ "_id" : "Computer", "avg" : 4200.48 }
{ "_id" : "TV", "avg" : 815.36 }
{ "_id" : "Mobile", "avg" : 900 }
But that result does not match your expected output, so group twice to get exactly that shape:
db.collectionName.aggregate([
{"$match": {"customer name": "John Smith"}}, // match the given criteria
{"$unwind": "$OrderItem"}, // one document per OrderItem element
{"$group": {"_id": "$OrderItem.product_category", // first group: average per category
"customerName": {"$first": "$customer name"},
"id": {"$first": "$_id"},
"avg": {"$avg": "$OrderItem.price"}}},
{"$group": {"_id": "$id", // second group: rebuild the OrderItem array per customer
"customer name": {"$first": "$customerName"},
"OrderItem": {"$push": {"product_category": "$_id", "price": "$avg"}}}}
]).pretty()
Alternatively, group by both the document _id and the category, then regroup by _id; this handles every customer at once without a $match:
db.collection.aggregate([
{$unwind: "$OrderItem"},
{$group: {
_id: {id: "$_id", cat: "$OrderItem.product_category"},
name: {$first: "$customer name"},
price: {$avg: "$OrderItem.price"}
}},
{$group: {
_id: "$_id.id",
OrderItem: {$push: {product_category: "$_id.cat", price: "$price"}},
"customer name": {$first: "$name"}
}}
])
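The two $group stages can be traced with a hypothetical in-memory sketch (regroupOrders is an invented name): the first pass averages per (document _id, category) pair, and the second pass pushes one entry per category back into an OrderItem array for each _id.

```javascript
// Hypothetical sketch of the two $group stages: (id, category) averages, then regroup by id.
function regroupOrders(orders) {
  const byIdCat = new Map(); // first $group: key = (id, category)
  for (const o of orders) {
    for (const item of o.OrderItem) { // $unwind
      const key = JSON.stringify([o._id, item.product_category]);
      const g = byIdCat.get(key) ||
        { id: o._id, cat: item.product_category, name: o["customer name"], total: 0, n: 0 };
      g.total += item.price;
      g.n += 1;
      byIdCat.set(key, g);
    }
  }
  const byId = new Map(); // second $group: key = id, $push one entry per category
  for (const g of byIdCat.values()) {
    const d = byId.get(g.id) || { _id: g.id, "customer name": g.name, OrderItem: [] };
    d.OrderItem.push({ product_category: g.cat, price: g.total / g.n });
    byId.set(g.id, d);
  }
  return [...byId.values()];
}

const orders = [{
  _id: 7,
  "customer name": "John Smith",
  OrderItem: [
    { product_category: "Mobile", price: 900 },
    { product_category: "Computer", price: 4200.48 },
    { product_category: "TV", price: 670.2 },
    { product_category: "TV", price: 960.52 }
  ]
}];
console.log(JSON.stringify(regroupOrders(orders), null, 2));
```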

MongoDB nested grouping

I have the following MongoDB data model:
{
"_id" : ObjectId("53725814740fd6d2ee0ca2bb"),
"date" : "2014-01-01",
"establishmentId" : 1,
"products" : [
{
"productId" : 1,
"price" : 7.03,
"someOtherInfo" : 325,
"somethingElse" : 6878
},
{
"productId" : 2,
"price" : 4.6,
"someOtherInfo" : 243,
"somethingElse" : 1757
},
{
"productId" : 3,
"price" : 2.14,
"someOtherInfo" : 610,
"somethingElse" : 5435
},
{
"productId" : 4,
"price" : 1.45,
"someOtherInfo" : 627,
"somethingElse" : 5762
},
{
"productId" : 5,
"price" : 3.9,
"someOtherInfo" : 989,
"somethingElse" : 3752
}
]
}
What is the fastest way to get the average price across all establishments? Is there a better data model to achieve this?
An aggregation operation should handle this well. I'd suggest looking into the $unwind operation.
Something along these lines should work (just as an example):
db.collection.aggregate(
{$match: {<query parameters>}},
{$unwind: "$products"},
{
$group: {
_id: "<blank or field(s) to group by before averaging>",
$avg: "$price"
}
}
);
An aggregation built in this style should produce a JSON object that has the data you want.
Because the example above has syntax errors (the $avg has no output field name, and the price path needs the products. prefix), here is a more direct, working answer:
db.collection.aggregate([
{ "$unwind": "$products" },
{ "$group": {
"_id": null,
"avgprice": { "$avg": "$products.price" }
}}
])
The usage of the aggregation framework here is to first $unwind the array, which is a way to "de-normalize" the content in the array into separate documents.
Then the $group stage passes null as the _id, which means "group everything", and passes $products.price (note the dot notation) into the $avg operator, which returns the overall average across all the sub-document entries in all the documents in the collection.
See the full operator reference for more information.
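Note that this averages every product entry, so a document with many products weighs more than one with few products. That is not the same as averaging each establishment's own average. A hypothetical in-memory sketch showing the difference (overallAvgPrice, perDocumentAvg and the second document are invented for illustration):

```javascript
// Average over all unwound product entries, like $unwind + $group { _id: null, $avg }
function overallAvgPrice(docs) {
  let total = 0;
  let n = 0;
  for (const doc of docs) {
    for (const p of doc.products) { total += p.price; n += 1; }
  }
  return n === 0 ? null : total / n;
}

// For comparison: the average of each document's own average
function perDocumentAvg(docs) {
  const avgs = docs.map((d) => d.products.reduce((s, p) => s + p.price, 0) / d.products.length);
  return avgs.reduce((s, a) => s + a, 0) / avgs.length;
}

const establishments = [
  { establishmentId: 1, products: [7.03, 4.6, 2.14, 1.45, 3.9].map((price) => ({ price })) },
  { establishmentId: 2, products: [{ price: 1 }] } // hypothetical
];
console.log(overallAvgPrice(establishments)); // about 3.353 (20.12 / 6)
console.log(perDocumentAvg(establishments)); // about 2.412 ((3.824 + 1) / 2)
```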
The best solution I found was:
db.collection.aggregate([
{$match:{date:{$gte:"2014-01-01",$lte:"2014-01-31"},establishmentId:{$in:[1,2,3,4,5,6]}}},
{ "$unwind": "$products" },
{ "$group": {
"_id": {date:"$date",product:"$products.productId"},
"avgprice": { "$avg": "$products.price" }
}}
])
I also found that it is much better to $match first and $unwind afterwards, so that fewer documents have to be unwound, which makes the whole pipeline faster. A $match at the start of the pipeline can also use an index.
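The effect of filtering before unwinding can be seen in a hypothetical in-memory sketch of this pipeline (avgByDateAndProduct and the sample days are invented): documents rejected by the match are never expanded into per-product rows.

```javascript
// Hypothetical sketch: $match first, then $unwind only the surviving documents, then $group.
function avgByDateAndProduct(docs, from, to, establishmentIds) {
  const groups = new Map(); // "date|productId" -> { date, product, total, n }
  for (const doc of docs) {
    // $match: "YYYY-MM-DD" strings compare correctly as plain strings
    if (doc.date < from || doc.date > to || !establishmentIds.includes(doc.establishmentId)) continue;
    for (const p of doc.products) { // $unwind
      const key = `${doc.date}|${p.productId}`;
      const g = groups.get(key) || { date: doc.date, product: p.productId, total: 0, n: 0 };
      g.total += p.price;
      g.n += 1;
      groups.set(key, g);
    }
  }
  // $group output: _id { date, product } and avgprice
  return [...groups.values()].map((g) => ({ _id: { date: g.date, product: g.product }, avgprice: g.total / g.n }));
}

const days = [
  { date: "2014-01-01", establishmentId: 1, products: [{ productId: 1, price: 7.03 }] },
  { date: "2014-01-01", establishmentId: 2, products: [{ productId: 1, price: 6.97 }] },
  { date: "2014-01-01", establishmentId: 9, products: [{ productId: 1, price: 100 }] }, // filtered out
  { date: "2014-02-01", establishmentId: 1, products: [{ productId: 1, price: 50 }] } // filtered out
];
console.log(avgByDateAndProduct(days, "2014-01-01", "2014-01-31", [1, 2, 3, 4, 5, 6]));
```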

Obtaining $group result with group count

Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" : "John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query posts by tag, group the results by tag, and display them paginated. This is where the aggregation framework comes in; I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
The catch is that to build the paginator I need to know the length of the output array. I know I can get it with:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} },
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that discards the output of the previous stage (the first $group). Is there a way to combine the two operations while preserving each stage's output? I know the output of the whole aggregate operation can be converted to an array in the client language and counted, but the pipeline output might exceed the 16 MB limit. Also, running the same query again just to obtain the count seems wasteful.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Use $project to save each tag and its count into a tmp subdocument.
Then use $push or $addToSet in a final $group to collect the tmp subdocuments into a data list.
Code:
db.test.aggregate(
{$unwind: '$tags'},
{$group:{_id: '$tags', count:{$sum:1}}},
{$project:{tmp:{tag:'$_id', count:'$count'}}},
{$group:{_id:null, total:{$sum:1}, data:{$addToSet:'$tmp'}}}
)
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
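Here is a hypothetical in-memory sketch of the same shape (tagCountsWithTotal and the sample posts are invented): count each tag, wrap each tag/count pair in a subdocument, and collect them all under a single _id: null document whose total is the number of distinct tags. In MongoDB that single result document is itself subject to the 16 MB document size limit, so this approach does not remove the size concern from the question for very large tag sets.

```javascript
// Hypothetical sketch: tag counts plus their total, returned as one document.
function tagCountsWithTotal(posts) {
  const counts = new Map(); // $unwind tags + $group { _id: "$tags", count: { $sum: 1 } }
  for (const post of posts) {
    for (const tag of post.tags) counts.set(tag, (counts.get(tag) || 0) + 1);
  }
  const data = [...counts].map(([tag, count]) => ({ tag, count })); // $project into tmp
  return { _id: null, total: data.length, data }; // final $group: total + collected data
}

const posts = [
  { title: "Lorem ipsum", tags: ["SOME", "RANDOM", "TAGS"] },
  { title: "Dolor sit", tags: ["RANDOM", "TAGS1"] } // hypothetical second post
];
console.log(JSON.stringify(tagCountsWithTotal(posts)));
```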
I'm not sure you need the aggregation framework for this, other than for counting all the tags, e.g.:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
To paginate through the posts for a single tag, you can just use normal query syntax, like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)
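The page arithmetic behind skip() and limit() can be sketched as a small helper. pageParams is hypothetical, and the total would come from a separate count (for example db.posts.countDocuments({tags: "RANDOM"}) in recent shells and drivers). It is also worth adding a sort (for example on _id) so that pages stay stable between requests, since MongoDB does not guarantee result order without one.

```javascript
// Hypothetical helper: 1-based page number -> skip/limit values plus page count.
function pageParams(page, pageSize, totalCount) {
  const totalPages = Math.max(1, Math.ceil(totalCount / pageSize));
  const current = Math.min(Math.max(1, page), totalPages); // clamp out-of-range pages
  return { skip: (current - 1) * pageSize, limit: pageSize, totalPages };
}

// Page 2 with 10 posts per page, 35 matching posts in total
console.log(pageParams(2, 10, 35)); // { skip: 10, limit: 10, totalPages: 4 }
```

Page 2 at 10 posts per page corresponds to the skip(10).limit(10) call shown above.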