{
"_id" : ObjectId("56bd8e9de517259412a743ab"),
"user_token" : "mzXhdbCu",
"sender_details" : {
"location" : "XYZ",
"zipcode" : "610208"
},
"shipping_address" : {
"location" : "ABC",
"zipcode" : "602578"
}
}
I have been trying to count the number of instances of each unique zipcode across both $sender_details.zipcode and $shipping_address.zipcode. I tried the following code:
db.ac_consignments.aggregate({
$group: {
_id: {
"zipcode":"$sender_details.zipcode",
"szipcode":"$shipping_address.zipcode"
},
count: {"$sum":1}
}
})
The output I receive is this:
{
"result" : [
{
"_id" : {
"zipcode" : "610208",
"szipcode" : "602578"
},
"count" : 7
},
{
"_id" : {
"zipcode" : "602578",
"szipcode" : "678705"
},
"count" : 51
}
],
"ok" : 1
}
But what I require is the total count of each zipcode, whether it appears in $sender_details.zipcode or in $shipping_address.zipcode. So an output like this:
{
"result" : [
{
"_id" : {
"zipcode" : "610208"
},
"count" : 7
},
{
"_id" : {
"zipcode" : "602578"
},
"count" : 51
},
{
"_id" : {
"zipcode" : "678705"
},
"count" : 51
}
],
"ok" : 1
}
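The following pipeline should work for you. It relies on building an array from field paths inside $project, which is supported from MongoDB 3.2 onwards: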
db.getCollection('ac_consignments').aggregate([
{
$project: {
zipcode: [ "$sender_details.zipcode", "$shipping_address.zipcode" ]
}
},
{
$unwind: "$zipcode"
},
{
$group: {
_id: "$zipcode",
count: { $sum: 1 }
}
}
])
which produces output like this
/* 1 */
{
"_id" : "610208",
"count" : 1.0
}
/* 2 */
{
"_id" : "610209",
"count" : 2.0
}
/* 3 */
{
"_id" : "602578",
"count" : 1.0
}
/* 4 */
{
"_id" : "602579",
"count" : 2.0
}
when using the following as sample data
/* 1 */
{
"_id" : ObjectId("56bd8e9de517259412a743ab"),
"user_token" : "mzXhdbCu",
"sender_details" : {
"location" : "XYZ",
"zipcode" : "610208"
},
"shipping_address" : {
"location" : "ABC",
"zipcode" : "602578"
}
}
/* 2 */
{
"_id" : ObjectId("56bd8e9de517259412a743ac"),
"user_token" : "mzXhdbCu",
"sender_details" : {
"location" : "XYZ",
"zipcode" : "610209"
},
"shipping_address" : {
"location" : "ABC",
"zipcode" : "602579"
}
}
/* 3 */
{
"_id" : ObjectId("56bd8e9de517259412a753ac"),
"user_token" : "mzXhdbCu",
"sender_details" : {
"location" : "XYZ",
"zipcode" : "610209"
},
"shipping_address" : {
"location" : "ABC",
"zipcode" : "602579"
}
}
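To make the three stages easier to follow, here is a minimal in-memory sketch in plain JavaScript of what the pipeline computes on the sample data above. It does not talk to MongoDB; the array `consignments` and the helper `countZipcodes` are hypothetical names introduced only for this illustration.

```javascript
// In-memory stand-in for the three sample documents
// (only the fields the pipeline reads are kept).
const consignments = [
  { sender_details: { zipcode: "610208" }, shipping_address: { zipcode: "602578" } },
  { sender_details: { zipcode: "610209" }, shipping_address: { zipcode: "602579" } },
  { sender_details: { zipcode: "610209" }, shipping_address: { zipcode: "602579" } },
];

// Hypothetical helper mirroring the pipeline:
//   $project -> build a two-element array per document
//   $unwind  -> visit each element of that array
//   $group   -> count occurrences per distinct value ($sum: 1)
function countZipcodes(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const zipcodes = [doc.sender_details.zipcode, doc.shipping_address.zipcode];
    for (const zipcode of zipcodes) {
      counts.set(zipcode, (counts.get(zipcode) || 0) + 1);
    }
  }
  return [...counts].map(([zipcode, count]) => ({ _id: zipcode, count }));
}

const zipcodeCounts = countZipcodes(consignments);
console.log(zipcodeCounts);
```

Note that a document whose sender and shipping zipcodes are identical contributes two to that zipcode's count, both in this sketch and in the pipeline.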
Update for older versions, where $project cannot build an array from field paths:
db.getCollection('ac_consignments').aggregate([
{
$project: {
sender_zip: "$sender_details.zipcode",
shipping_zip: "$shipping_address.zipcode",
party: { $literal: ["sender_zip", "shipping_zip"] }
}
},
{
$unwind: "$party"
},
{
$group: {
_id: "$_id",
zipcode: {
$push: {
$cond: [
{ $eq: ["$party", "sender_zip"] },
"$sender_zip",
"$shipping_zip"
]
}
}
}
},
{
$unwind: "$zipcode"
},
{
$group: {
_id: "$zipcode",
count: { $sum: 1 }
}
}
])
I'm trying to calculate the frequency of site visits using a Mongo aggregate function, i.e., in a given week, on how many days did a user visit the site?
{ "_id" : ObjectId("5f7720caf2b93af8d566bc7c"), "email" : "blah#gmail.com", "timestamp" : ISODate("2020-09-29T17:59:00Z") }
{ "_id" : ObjectId("5f7720dcf2b93af8d566ffb7"), "email" : "blah#gmail.com", "timestamp" : ISODate("2020-09-30T01:01:00Z") }
{ "_id" : ObjectId("5f7721bbf2b93af8d56aadc4"), "email" : "yack#gmail.com", "timestamp" : ISODate("2020-10-01T09:58:00Z") }
{ "_id" : ObjectId("5f771e9ff2b93af8d55c57a9"), "email" : "yack#gmail.com", "timestamp" : ISODate("2020-09-26T04:12:00Z") }
{ "_id" : ObjectId("5f771e9ff2b93af8d55c5f6b"), "email" : "yack#gmail.com", "timestamp" : ISODate("2020-09-26T04:22:00Z") }
{ "_id" : ObjectId("5f771eeaf2b93af8d55dc45c"), "email" : "yack#gmail.com", "timestamp" : ISODate("2020-09-27T04:11:00Z") }
Output I'd like:
[
{ "_id": "blah#gmail.com", "dow": [ 2, 3 ], "visits": 2 }, // Visited Tuesday and Wednesday
{ "_id": "yack#gmail.com", "dow": [ 0, 1, 2, 3, 4, 5 ], "visits": 6 } // Visited Sunday through to Friday
]
I can get each email/dow pair as a record, but I'm not sure where to go from here...
[
{
$group: {
_id: { email: "$email", dow: { $dayOfWeek: "$timestamp" } },
}
}
]
Outputs:
{ "_id" : { "email" : "blah#gmail.com", "dow" : 2 } }
{ "_id" : { "email" : "blah#gmail.com", "dow" : 3 } }
{ "_id" : { "email" : "yack#gmail.com", "dow" : 0 } }
{ "_id" : { "email" : "yack#gmail.com", "dow" : 1 } }
{ "_id" : { "email" : "yack#gmail.com", "dow" : 2 } }
{ "_id" : { "email" : "yack#gmail.com", "dow" : 3 } }
{ "_id" : { "email" : "yack#gmail.com", "dow" : 4 } }
{ "_id" : { "email" : "yack#gmail.com", "dow" : 5 } }
Thanks!
You can use another $group statement:
db.collection.aggregate([
{
$group: {
_id: { email: "$email", dow: { $dayOfWeek: "$timestamp" } }
}
},
{
$group: {
_id: "$_id.email",
dow: { $push: "$_id.dow" },
visits: { $sum: 1 }
}
}
])
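As a rough illustration of the two $group stages, the sketch below reproduces them in plain JavaScript on the six sample documents from the question. It runs without MongoDB; `visitsLog` and `visitDaysPerUser` are hypothetical names. One detail worth keeping in mind: $dayOfWeek numbers days from 1 (Sunday) to 7 (Saturday), so the sketch adds 1 to `getUTCDay()` to match, and the numbers differ from the 0-based ones in the desired output.

```javascript
// In-memory copy of the sample documents (email and timestamp only).
const visitsLog = [
  { email: "blah#gmail.com", timestamp: new Date("2020-09-29T17:59:00Z") },
  { email: "blah#gmail.com", timestamp: new Date("2020-09-30T01:01:00Z") },
  { email: "yack#gmail.com", timestamp: new Date("2020-10-01T09:58:00Z") },
  { email: "yack#gmail.com", timestamp: new Date("2020-09-26T04:12:00Z") },
  { email: "yack#gmail.com", timestamp: new Date("2020-09-26T04:22:00Z") },
  { email: "yack#gmail.com", timestamp: new Date("2020-09-27T04:11:00Z") },
];

// Hypothetical helper mirroring the two $group stages.
function visitDaysPerUser(docs) {
  // Stage 1: keep one entry per distinct (email, day-of-week) pair.
  const pairs = new Map();
  for (const doc of docs) {
    const dow = doc.timestamp.getUTCDay() + 1; // 1 = Sunday ... 7 = Saturday (UTC)
    pairs.set(JSON.stringify([doc.email, dow]), { email: doc.email, dow });
  }
  // Stage 2: regroup by email, pushing each day and counting the pairs.
  const byEmail = new Map();
  for (const { email, dow } of pairs.values()) {
    const entry = byEmail.get(email) || { _id: email, dow: [], visits: 0 };
    entry.dow.push(dow);
    entry.visits += 1;
    byEmail.set(email, entry);
  }
  return [...byEmail.values()];
}

const visitSummary = visitDaysPerUser(visitsLog);
console.log(visitSummary);
```

Because the first stage collapses repeated visits on the same day, `visits` counts distinct days rather than raw page hits.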
Business Problem:
We are seeing single customerOrderNumbers whose versions all have the status “INACTIVE.” This is a problem for multiple reasons. My goal is to pull a list of customerOrderNumbers that have ONLY INACTIVE statuses.
Database and Query: XXX_ORDERMGMT_1
db.getCollection('customerOrder').aggregate([
{
$match: { 'orderDocument.accountInfo.ban': '123456' }
},
{
$group: {
_id: {
customerOrderNumber: '$orderReference.customerOrderNumber',
status: '$orderReference.customerOrderStatus'
},
count: { $sum: 1 }
}
}
])
OUTPUT:
/* 1 */
{ "_id" : { "customerOrderNumber" : "123", "status" : "COMPLETED" }, "count" : 1.0 }
/* 2 */
{ "_id" : { "customerOrderNumber" : "123", "status" : "INACTIVE" }, "count" : 2.0 }
DESIRED_OUTPUT:
/* 1 */
{ "_id" : { "customerOrderNumber" : "123", "statusGroupings" : [ { "status" : "COMPLETED", "status_cnt" : 1.0 }, { "status" : "INACTIVE", "status_cnt" : 2.0 } ] }, "count" : 3.0 }
(My approach was to pull all customerOrders by status and count, parse the result into a relational format, and keep only the customerOrderNumbers whose versions are all INACTIVE. This may not be the best way, and I'm open to suggestions.)
The following query can get us the expected output:
db.getCollection("customerOrder").aggregate([
{
$match:{
"orderDocument.accountInfo.ban":"123456"
}
},
{
$group:{
"_id":{
"customerOrderNumber":"$orderReference.customerOrderNumber",
"status":"$orderReference.customerOrderStatus"
},
"customerOrderNumber":{
$first:"$orderReference.customerOrderNumber"
},
"status":{
$first:"$orderReference.customerOrderStatus"
},
"count":{
$sum:1
}
}
},
{
$group:{
"_id":"$customerOrderNumber",
"customerOrderNumber":{
$first:"$customerOrderNumber"
},
"statusGroupings":{
$push:{
"status":"$status",
"status_cnt":"$count"
}
},
"count":{
$sum:"$count"
}
}
},
{
$project:{
"_id":0
}
}
]).pretty()
Data set:
{
"_id" : ObjectId("5d9bf7204ed5d873f39a773b"),
"orderDocument" : {
"accountInfo" : {
"ban" : "123456"
}
},
"orderReference" : {
"customerOrderNumber" : 1,
"customerOrderStatus" : "INACTIVE"
}
}
{
"_id" : ObjectId("5d9bf7204ed5d873f39a773c"),
"orderDocument" : {
"accountInfo" : {
"ban" : "123456"
}
},
"orderReference" : {
"customerOrderNumber" : 1,
"customerOrderStatus" : "COMPLETED"
}
}
{
"_id" : ObjectId("5d9bf7204ed5d873f39a773d"),
"orderDocument" : {
"accountInfo" : {
"ban" : "123456"
}
},
"orderReference" : {
"customerOrderNumber" : 1,
"customerOrderStatus" : "INACTIVE"
}
}
{
"_id" : ObjectId("5d9bf7204ed5d873f39a773e"),
"orderDocument" : {
"accountInfo" : {
"ban" : "123456"
}
},
"orderReference" : {
"customerOrderNumber" : 1,
"customerOrderStatus" : "INACTIVE"
}
}
Output:
{
"customerOrderNumber" : 1,
"statusGroupings" : [
{
"status" : "COMPLETED",
"status_cnt" : 1
},
{
"status" : "INACTIVE",
"status_cnt" : 3
}
],
"count" : 4
}
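The stated goal was a list of customerOrderNumbers whose versions are all INACTIVE. As a sketch of that last step, the plain JavaScript below filters documents shaped like the output above. It is a client-side illustration, not part of the original answer; `groupedOrders` and `onlyInactive` are hypothetical names, and the second order is invented purely so that something matches.

```javascript
// Documents shaped like the aggregation output above.
const groupedOrders = [
  {
    customerOrderNumber: 1,
    statusGroupings: [
      { status: "COMPLETED", status_cnt: 1 },
      { status: "INACTIVE", status_cnt: 3 },
    ],
    count: 4,
  },
  {
    // Hypothetical order, not in the original data set.
    customerOrderNumber: 2,
    statusGroupings: [{ status: "INACTIVE", status_cnt: 2 }],
    count: 2,
  },
];

// Hypothetical helper: keep orders whose every status grouping is INACTIVE.
function onlyInactive(orders) {
  return orders
    .filter(
      (order) =>
        order.statusGroupings.length > 0 &&
        order.statusGroupings.every((g) => g.status === "INACTIVE")
    )
    .map((order) => order.customerOrderNumber);
}

const inactiveOnly = onlyInactive(groupedOrders);
console.log(inactiveOnly);
```

Order 1 is excluded because one of its versions is COMPLETED; only the hypothetical order 2 remains.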
How can I get the total number of seats available for a particular movie (the seats in all the theatres showing that movie) from the MongoDB schema below?
I need to write a Mongo query to get the results.
{
"_id" : ObjectId("5d637b5ce27c7d60e5c42ae7"),
"name" : "Bangalore",
"movies" : [
{
"name" : "KGF",
"theatres" : [
{
"name" : "PVR",
"seats" : 45
},
{
"name" : "IMAX",
"seats" : 46
}
]
},
{
"name" : "Avengers",
"theatres" : [
{
"name" : "IMAX",
"seats" : 50
}
]
}
],
"_class" : "com.BMS_mongo.ZZ_BMS_mongo_demo.Entity.CityInfo"
}
I have written this code :
db.cities.aggregate( [
{ "$unwind" : "$movies" }, { "$unwind" : "$theatres" } ,
{ "$group" : { _id : "$movies.theatres.seats" ,
total : { "$sum" : "$seats" } }
}
] )
The following query can get us the expected output:
db.collection.aggregate([
{
$unwind:"$movies"
},
{
$unwind:"$movies.theatres"
},
{
$group:{
"_id":"$movies.name",
"movie":{
$first:"$movies.name"
},
"totalSeats":{
$sum:"$movies.theatres.seats"
}
}
},
{
$project:{
"_id":0
}
}
]).pretty()
Data set:
{
"_id" : ObjectId("5d637b5ce27c7d60e5c42ae7"),
"name" : "Bangalore",
"movies" : [
{
"name" : "KGF",
"theatres" : [
{
"name" : "PVR",
"seats" : 45
},
{
"name" : "IMAX",
"seats" : 46
}
]
},
{
"name" : "Avengers",
"theatres" : [
{
"name" : "IMAX",
"seats" : 50
}
]
}
],
"_class" : "com.BMS_mongo.ZZ_BMS_mongo_demo.Entity.CityInfo"
}
Output:
{ "movie" : "Avengers", "totalSeats" : 50 }
{ "movie" : "KGF", "totalSeats" : 91 }
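For readers who want to see what the two $unwind stages and the $group amount to, here is a small plain-JavaScript sketch over the same data set. It runs in memory without MongoDB; `cities` and `totalSeatsPerMovie` are hypothetical names used only here.

```javascript
// In-memory copy of the sample city document (fields used by the pipeline).
const cities = [
  {
    name: "Bangalore",
    movies: [
      {
        name: "KGF",
        theatres: [
          { name: "PVR", seats: 45 },
          { name: "IMAX", seats: 46 },
        ],
      },
      {
        name: "Avengers",
        theatres: [{ name: "IMAX", seats: 50 }],
      },
    ],
  },
];

// Hypothetical helper: the nested loops play the role of the two $unwind
// stages, and the Map accumulates the $sum per movie name.
function totalSeatsPerMovie(docs) {
  const totals = new Map();
  for (const city of docs) {
    for (const movie of city.movies) {
      for (const theatre of movie.theatres) {
        totals.set(movie.name, (totals.get(movie.name) || 0) + theatre.seats);
      }
    }
  }
  return [...totals].map(([movie, totalSeats]) => ({ movie, totalSeats }));
}

const seatTotals = totalSeatsPerMovie(cities);
console.log(seatTotals);
```

Since the grouping key is the movie name alone, seats for the same movie in different cities would be summed together, just as in the pipeline.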
Query:
db.movie.aggregate([{ $unwind: { path: "$movies",} },
{ $unwind: { path: "$movies.theatres",} },
{ $group: { _id: "$movies.name", "moviename": { $first: "$movies.name" },
"totalSeats": { $sum: "$movies.theatres.seats" }} }])
I got the answer using this query:
db.cities.aggregate( [
{ "$match" : { "name" : "Bangalore" } },
{ "$unwind" : "$movies" } ,
{ "$match" : {"movies.name" : "KGF"} },
{ "$unwind" : "$movies.theatres" },
{ "$group" : { _id : "$movies.name", total : { "$sum" : "$movies.theatres.seats"
} } }
] )
With this data:
{
"_id" : ObjectId("576948b4999274493425c08a"),
"virustotal" : {
"scan_id" : "4a6c3dfc6677a87aee84f4b629303c40bb9e1dda283a67236e49979f96864078-1465973544",
"sha1" : "fd177b8c50b457dbec7cba56aeb10e9e38ebf72f",
"resource" : "4a6c3dfc6677a87aee84f4b629303c40bb9e1dda283a67236e49979f96864078",
"response_code" : 1,
"scan_date" : "2016-06-15 06:52:24",
"results" : [
{
"sig" : "Gen:Variant.Mikey.29601",
"vendor" : "MicroWorld-eScan"
},
{
"sig" : null,
"vendor" : "nProtect"
},
{
"sig" : null,
"vendor" : "CAT-QuickHeal"
},
{
"sig" : "HEUR/QVM07.1.0000.Malware.Gen",
"vendor" : "Qihoo-360"
}
]
}
},
{
"_id" : ObjectId("5768f214999274362f714e8b"),
"virustotal" : {
"scan_id" : "3d283314da4f99f1a0b59af7dc1024df42c3139fd6d4d4fb4015524002b38391-1466529838",
"sha1" : "fb865b8f0227e9097321182324c959106fcd8c27",
"resource" : "3d283314da4f99f1a0b59af7dc1024df42c3139fd6d4d4fb4015524002b38391",
"response_code" : 1,
"scan_date" : "2016-06-21 17:23:58",
"results" : [
{
"sig" : null,
"vendor" : "Bkav"
},
{
"sig" : null,
"vendor" : "ahnlab"
},
{
"sig" : null,
"vendor" : "MicroWorld-eScan"
},
{
"sig" : "Mal/DrodZp-A",
"vendor" : "Qihoo-360"
}
]
}
}
I'm trying to group by and count the vendor when sig is not null in order to obtain something like:
{
"_id" : "Qihoo-360",
"count" : 2
},
{
"_id" : "MicroWorld-eScan",
"count" : 1
},
{
"_id" : "Bkav",
"count" : 0
},
{
"_id" : "CAT-QuickHeal",
"count" : 0
}
At the moment with this code:
db.analysis.aggregate([
{ $unwind: "$virustotal.results" },
{
$group : {
_id : "$virustotal.results.vendor",
count : { $sum : 1 }
}
},
{ $sort : { count : -1 } }
])
I'm getting everything:
{
"_id" : "Qihoo-360",
"count" : 2
},
{
"_id" : "MicroWorld-eScan",
"count" : 2
},
{
"_id" : "Bkav",
"count" : 1
},
{
"_id" : "CAT-QuickHeal",
"count" : 1
}
How can I count 0 if the sig is null?
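You need a conditional expression inside the $sum operator that checks whether "$virustotal.results.sig" is null. The comparison operator $gt does this: in the BSON comparison order described in the documentation, null sorts below strings, so { "$gt": [ "$virustotal.results.sig", null ] } is true only when a signature is present.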
You can restructure your pipeline by adding this expression as follows:
db.analysis.aggregate([
{ "$unwind": "$virustotal.results" },
{
"$group" : {
"_id": "$virustotal.results.vendor",
"count" : {
"$sum": {
"$cond": [
{ "$gt": [ "$virustotal.results.sig", null ] },
1, 0
]
}
}
}
},
{ "$sort" : { "count" : -1 } }
])
Sample Output
/* 1 */
{
"_id" : "Qihoo-360",
"count" : 2
}
/* 2 */
{
"_id" : "MicroWorld-eScan",
"count" : 1
}
/* 3 */
{
"_id" : "Bkav",
"count" : 0
}
/* 4 */
{
"_id" : "CAT-QuickHeal",
"count" : 0
}
/* 5 */
{
"_id" : "nProtect",
"count" : 0
}
/* 6 */
{
"_id" : "ahnlab",
"count" : 0
}
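The effect of the $cond inside $sum can be sketched in plain JavaScript on the two sample documents from the question. This is an in-memory illustration only; `analysis` and `countDetectionsPerVendor` are hypothetical names, and the null check stands in for the $gt comparison on the string signatures in this data.

```javascript
// In-memory copy of the two sample documents (results arrays only).
const analysis = [
  { virustotal: { results: [
    { sig: "Gen:Variant.Mikey.29601", vendor: "MicroWorld-eScan" },
    { sig: null, vendor: "nProtect" },
    { sig: null, vendor: "CAT-QuickHeal" },
    { sig: "HEUR/QVM07.1.0000.Malware.Gen", vendor: "Qihoo-360" },
  ] } },
  { virustotal: { results: [
    { sig: null, vendor: "Bkav" },
    { sig: null, vendor: "ahnlab" },
    { sig: null, vendor: "MicroWorld-eScan" },
    { sig: "Mal/DrodZp-A", vendor: "Qihoo-360" },
  ] } },
];

// Hypothetical helper: every vendor gets a group, but only results with a
// non-null signature add 1 to it (the $cond yields 1 or 0).
function countDetectionsPerVendor(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const result of doc.virustotal.results) {
      const detected = result.sig !== null && result.sig !== undefined ? 1 : 0;
      counts.set(result.vendor, (counts.get(result.vendor) || 0) + detected);
    }
  }
  return [...counts]
    .map(([vendor, count]) => ({ _id: vendor, count }))
    .sort((a, b) => b.count - a.count); // like { $sort: { count: -1 } }
}

const vendorCounts = countDetectionsPerVendor(analysis);
console.log(vendorCounts);
```

Adding 0 instead of filtering the null results out is what keeps vendors such as Bkav in the output with a count of 0.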
I replaced null with None and the numbers increased, but they still do not seem correct.
Running the query in the mongo shell, I get:
{
"_id" : "Kaspersky",
"count" : 176.0
}
and from Python:
Kaspersky 64
One of these two is wrong :)
So I'm trying to work out which part of the Python query is written incorrectly compared to the mongo shell one.
I ran a simple query.
In the mongo shell:
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "null"} } }})
results: 176
db.analysis.count( { "virustotal.results" : { $elemMatch : { "vendor": "Kaspersky", "sig": {$gt: null} } }})
results: 0
Then I tried in python:
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "null"} } }})
results: 568
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "None"} } }})
results: 568
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$gt": "None"} } }})
results: 64
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$gt": "null"} } }})
results: 6
It is hard to say which value is correct! I suppose 176, but I am not able to reproduce it in Python...
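One possible contributor to the differing numbers, offered as an assumption rather than a diagnosis of this data: "null" and "None" in quotes are ordinary strings, not the null value (which is null in the shell and unquoted None in Python). Comparing against them therefore compares text. The plain JavaScript sketch below shows the effect on a few invented entries; `kasperskyResults` and the signature names are hypothetical.

```javascript
// Hypothetical results for one vendor: one missing signature, two present.
const kasperskyResults = [
  { vendor: "Kaspersky", sig: null },
  { vendor: "Kaspersky", sig: "Trojan.Hypothetical" },
  { vendor: "Kaspersky", sig: "worm.hypothetical" },
];

const count = (predicate) => kasperskyResults.filter(predicate).length;

// "Not equal to the string 'null'" is true for every entry, real nulls included.
const neStringNull = count((r) => r.sig !== "null");

// "Not equal to the null value" is the check that was actually intended.
const neRealNull = count((r) => r.sig !== null);

// Greater-than against a string is a text comparison, so the result
// depends on how each signature sorts relative to that string.
const gtStringNone = count((r) => typeof r.sig === "string" && r.sig > "None");
const gtStringNull = count((r) => typeof r.sig === "string" && r.sig > "null");

console.log({ neStringNull, neRealNull, gtStringNone, gtStringNull });
```

Here "Trojan.Hypothetical" sorts after "None" but before "null", which is why the two string comparisons disagree; only the comparison against the real null value counts exactly the entries that have a signature.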
I am running this query:
db.analytics.aggregate([
{
$match: {"event":"USER_SENTIMENT"}
},
{ $group: {
_id: {brand:"$data.brandId",sentiment:"$data.sentiment"},
count: {$sum : 1}
}
},
{ $group: {
_id: "$_id.brand",
sentiments: {$addToSet : {sentiment:"$_id.sentiment", count:"$count"}}
}
}
])
which generates this:
{
"result" : [
{
"_id" : 57,
"sentiments" : [
{
"sentiment" : "Meh",
"count" : 4
}
]
},
{
"_id" : 376,
"sentiments" : [
{
"sentiment" : "Meh",
"count" : 1
},
{
"sentiment" : "Happy",
"count" : 1
},
{
"sentiment" : "Confused",
"count" : 1
}
]
}
],
"ok" : 1
}
But what I want is this:
[
{
"_id" : 57,
"Meh" : 4
},
{
"_id" : 376,
"Meh" : 1,
"Happy" : 1,
"Confused" : 1
}
]
Any idea how to transform that? The sticking point for me is turning a value into a key.