How to aggregate queries in MongoDB

I have a document collection that looks like the following:
{
name : "tester"
, activity: [
{
gear: "glasses"
, where: "outside"
}
, {
gear: "hat"
, where: "inside"
}
, {
gear: "glasses"
, where: "car"
}
]
}
How do I query the collection to return only documents with multiple activities that contain the value of "gear":"glasses"?
Thanks!

I think this is possible without the aggregation framework, if you need the full documents filtered by your condition:
db.collection.find({
"activity": {$elemMatch: {gear:"glasses"}},
"activity.1" : {$exists: 1}
})
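To check what this filter keeps, here is a small plain JavaScript sketch that mirrors the two conditions on in-memory objects. It is not MongoDB code: `matchesFilter` is a hypothetical helper name, and the "single" and "mixed" documents are made up for illustration. It shows that the filter requires at least two activities and at least one "glasses" entry, which is not the same as requiring two "glasses" entries.

```javascript
// Plain JavaScript mirror of the find() filter above; this is not MongoDB code.
// matchesFilter is a hypothetical helper name.
function matchesFilter(doc) {
  const activity = Array.isArray(doc.activity) ? doc.activity : [];
  // Mirrors "activity": {$elemMatch: {gear: "glasses"}}
  const hasGlasses = activity.some((a) => a.gear === "glasses");
  // Mirrors "activity.1": {$exists: 1}, i.e. the array has a second element
  const hasSecondElement = activity.length > 1;
  return hasGlasses && hasSecondElement;
}

const sampleDocs = [
  { name: "tester", activity: [
    { gear: "glasses", where: "outside" },
    { gear: "hat", where: "inside" },
    { gear: "glasses", where: "car" },
  ] },
  // The two documents below are made up for illustration.
  { name: "single", activity: [{ gear: "glasses", where: "home" }] },
  { name: "mixed", activity: [
    { gear: "glasses", where: "office" },
    { gear: "hat", where: "park" },
  ] },
];

const keptNames = sampleDocs.filter(matchesFilter).map((d) => d.name);
console.log(keptNames); // names of the documents the filter keeps
```

The "mixed" document passes even though it has only one "glasses" activity, so if you need two or more "glasses" entries you have to count them.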

This is going to be ugly with the aggregation framework, but it can be done:
db.collection.aggregate([
{$match: {"activity.gear": "glasses"}},
{$unwind: "$activity"},
{$group: {
_id: {_id: "$_id", name: "$name"},
_count: {$sum: {$cond: [{$eq: ["glasses", "$activity.gear"]}, 1, 0]}}
}},
{$match: {_count: {$gt: 1}}}
])
When analyzing the above query, I would recommend walking through it step by step. Start with just the first "$match", then the "$match" and "$unwind", and so on. You will see how each stage works.
The response is not the full document. If you are looking for the full document, include a $project step that passes through a dummy activity, and reconstruct the full document on the output.
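As a rough aid for walking through the stages, this plain JavaScript sketch computes the same per-document count on in-memory objects. It is not MongoDB code: `countGear` and `runPipelineSketch` are hypothetical names, and the numeric `_id` values and the "mixed" document are made up.

```javascript
// Plain JavaScript mirror of the pipeline above; this is not MongoDB code.
// countGear and runPipelineSketch are hypothetical helper names.
function countGear(doc, gear) {
  // $unwind + $sum/$cond: add 1 for every activity whose gear matches
  return (doc.activity || []).filter((a) => a.gear === gear).length;
}

function runPipelineSketch(docs, gear) {
  return docs
    // $group: one row per document, keyed by _id and name
    .map((doc) => ({
      _id: { _id: doc._id, name: doc.name },
      _count: countGear(doc, gear),
    }))
    // Final $match: {_count: {$gt: 1}} (this also covers the initial $match)
    .filter((row) => row._count > 1);
}

const activityDocs = [
  { _id: 1, name: "tester", activity: [
    { gear: "glasses", where: "outside" },
    { gear: "hat", where: "inside" },
    { gear: "glasses", where: "car" },
  ] },
  // Made-up document with a single "glasses" activity.
  { _id: 2, name: "mixed", activity: [
    { gear: "glasses", where: "office" },
    { gear: "hat", where: "park" },
  ] },
];

const rows = runPipelineSketch(activityDocs, "glasses");
console.log(JSON.stringify(rows));
```

Only "tester" survives, because it is the only document with more than one "glasses" activity.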

You can also try this:
db.collection.find( { activity: { $elemMatch: { gear: "glasses" } } } )

Related

Sort and assign the order to query in mongodb

I'd like to sort a collection, then add a virtual property to each result that holds its numerical position in the sorted output.
So for example, we have a collection called calls, and we'd like to ascertain the current call queue priority as a number so it can be synced to our CRM via reverse ETL.
We have to do this inside the query itself because we don't have an intermediary step where we could add logic to compute it.
My current query is:
db.getCollection('callqueues').aggregate([
{
$match: {
'invalidated': false,
'assigned_agent': null
}
},
{ $sort: {
score: -1, _id: -1
} },
{
$addFields: {
order: "<NEW ORDER PROPERTY HERE>",
}
},
])
So I was wondering how I would insert their order as a virtual property, where the first element after the sort should be 1, the second 2, etc.
One option (since MongoDB version 5.0) is to use $setWindowFields for this:
db.collection.aggregate([
{$match: {invalidated: false, assigned_agent: null}},
{$setWindowFields: {
sortBy: {score: -1, _id: -1},
output: {
order: {
$sum: 1,
window: {documents: ["unbounded", "current"]}
}
}
}}
])
See how it works on the playground example
EDIT: If your MongoDB version is earlier than 5.0, you can use a less efficient query involving $group and $unwind:
db.collection.aggregate([
{$match: {invalidated: false, assigned_agent: null}},
{$sort: {score: -1, _id: -1}},
{$group: {_id: 0, data: {$push: "$$ROOT"}}},
{$unwind: {path: "$data", includeArrayIndex: "order"}},
{$replaceRoot: {newRoot: {$mergeObjects: ["$data", {order: {$add: ["$order", 1]}}]}}}
])
See how it works on the playground example < 5.0
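Both pipelines assign the same value: the 1-based position of each document after filtering and sorting. A plain JavaScript sketch of that logic on in-memory objects follows. It is not MongoDB code: `rankCalls` is a hypothetical name, the sample documents are made up, and the numeric `_id` values are an assumption (real ObjectId values compare differently).

```javascript
// Plain JavaScript mirror of the ranking logic; this is not MongoDB code.
// rankCalls is a hypothetical helper name; sample data is made up.
function rankCalls(calls) {
  return calls
    // $match: invalidated is false and assigned_agent is null (or missing)
    .filter((c) => c.invalidated === false && c.assigned_agent == null)
    // $sort / sortBy: score descending, then _id descending
    .sort((a, b) => b.score - a.score || b._id - a._id)
    // Running document count, which equals the 1-based position
    .map((c, index) => ({ ...c, order: index + 1 }));
}

const callQueue = [
  { _id: 1, score: 10, invalidated: false, assigned_agent: null },
  { _id: 2, score: 30, invalidated: false, assigned_agent: null },
  { _id: 3, score: 30, invalidated: false, assigned_agent: null },
  { _id: 4, score: 50, invalidated: true, assigned_agent: null },
  { _id: 5, score: 40, invalidated: false, assigned_agent: "agent-7" },
];

const ranked = rankCalls(callQueue);
console.log(ranked.map((c) => [c._id, c.order]));
```

The tie on score between `_id` 2 and 3 is broken by `_id`, which is why including `_id` in the sort keeps the order stable.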

Find missing elements from the passed array in a MongoDB query

For example:
animals = ['cat','mat','rat'];
The collection contains only 'cat' and 'mat'.
I want the query to return 'rat', which is not in the collection.
The collection contains:
[
{
_id:objectid,
animal:'cat'
},
{
_id:objectid,
animal:'mat'
}
]
db.collection.find({'animal':{$nin:animals}})
(or)
db.collection.find({'animal':{$nin:['cat','mat','rat']}})
EDIT:
One option is:
1. Use $facet to $group all existing values into a set. Using $facet allows the pipeline to continue even if the collection is empty, as #leoll2 mentioned.
2. $project with $cond to handle both cases: with or without data.
3. Find the set difference.
db.collection.aggregate([
{$facet: {data: [{$group: {_id: 0, animals: {$addToSet: "$animal"}}}]}},
{$project: {
data: {
$cond: [{$gt: [{$size: "$data"}, 0]}, {$first: "$data"}, {animals: []}]
}
}},
{$project: {data: "$data.animals"}},
{$project: {_id: 0, missing: {$setDifference: [animals, "$data"]}}}
])
See how it works on the playground example - with data or playground example - without data
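The same idea expressed on in-memory data, as a plain JavaScript sketch rather than MongoDB code: collect the values that exist, then keep the wanted values that are not among them. `missingValues` is a hypothetical helper name. Unlike this sketch, $setDifference does not guarantee the order of its output.

```javascript
// Plain JavaScript mirror of the set-difference logic; this is not MongoDB code.
// missingValues is a hypothetical helper name.
function missingValues(wanted, docs) {
  // $group + $addToSet: the set of values present in the collection
  const present = new Set(docs.map((d) => d.animal));
  // $setDifference: wanted values that are not present, without duplicates
  return [...new Set(wanted)].filter((v) => !present.has(v));
}

const wantedAnimals = ["cat", "mat", "rat"];
const animalDocs = [{ animal: "cat" }, { animal: "mat" }];

const missingWithData = missingValues(wantedAnimals, animalDocs);
const missingWithoutData = missingValues(wantedAnimals, []); // empty collection case
console.log(missingWithData, missingWithoutData);
```

The second call covers the empty-collection case that the $facet and $cond stages handle in the pipeline.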

How to iterate through a set to get field value in MongoDB

Can somebody please tell me if it is possible to iterate through a set to create a field value for a key in a MongoDB result? If I have a $facet stage in the pipeline like:
'missing': [{'$group': {'_id': '$foo', 'woo': {'$addToSet': '$wwo'}}},
{'$project': {'missing_woo': {'$setDifference': [woo_set, '$woo']}}}]
I would like to get a result where each missing value appears under the same key, like
{'missing_woo': 'missing_woo1'}, {'missing_woo': 'missing_woo2'},... {'missing_woo': 'missing_wooN'}
so that I can iterate through the set generated at $project and create the field values.
You can simply use $unwind:
db.collection.aggregate([
{
$facet: {
missing: [
{$group: {_id: "$foo", woo: {$addToSet: "$wwo"}}},
{$project: {_id: 0, missing_woo:
{$setDifference: [
[
"woo1",
"woo2",
"wooN",
"missing_woo1",
"missing_woo2",
"missing_wooN"
],
"$woo"
]
}
}
},
{$unwind: "$missing_woo"}
]
}
}
])
See how it works on the playground example
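For intuition, $unwind turns one document holding an array into one document per array element. A plain JavaScript sketch of that behavior on in-memory objects follows. It is not MongoDB code: `unwindSketch` is a hypothetical name, and it only mirrors the default behavior without extra options.

```javascript
// Plain JavaScript mirror of the default $unwind behavior; not MongoDB code.
// unwindSketch is a hypothetical helper name.
function unwindSketch(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing or null fields produce no output documents by default
    if (value === undefined || value === null) return [];
    // A non-array value is treated like a single-element array
    if (!Array.isArray(value)) return [doc];
    // One output document per element; an empty array produces nothing
    return value.map((item) => ({ ...doc, [field]: item }));
  });
}

const projected = [
  { missing_woo: ["missing_woo1", "missing_woo2", "missing_wooN"] },
];

const unwound = unwindSketch(projected, "missing_woo");
console.log(JSON.stringify(unwound));
```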

mongo $project not projecting original values

I am new to MongoDB, and NoSQL in general, and I am trying to use MongoDB's aggregate function to aggregate data from one collection to be inserted into another. An example of the original collection would be this:
Original Collection
{
supplier: 'aldi',
timestamp: '1492807458',
user: 'eddardstark#gmail.com',
hasBeenAggregated:false,
items:[{
name: 'butter',
supplier: 'aldi',
expiry: '1492807458',
amount: 454,
measureSymbol: 'g',
cost: 2.19
},{
name: 'milk',
supplier: 'aldi',
expiry: '1492807458',
amount: 2000,
measureSymbol: 'ml',
cost: 1.49
}]
}
An example of the output I am trying to achieve would be:
New Collection
{
user:'eddardstark#gmail.com',
amount: 3.68,
isIncome: false,
title: 'food_shopping',
timestamp: '1492807458'
}
The aggregation function that I am using is:
Aggregation
var result = db.runCommand({
aggregate: 'food_transactions',
pipeline: [
{$match: {hasBeenAggregated: false}},
{$unwind: '$items'},
{$group:{_id: '$_id',amount:{$sum: '$items.cost'}}},
{$project: {
_id:0,
user:1,
amount:1,
isIncome: {$literal: false},
title:{$literal: 'food_shopping'},
timestamp:1
}}
]
});
printjson(result)
This aggregation function does not return the user or timestamp fields. Instead, I get the following output:
Output
{
"amount" : 3.6799999999999997,
"isIncome" : false,
"title" : "food_shopping"
}
If I don't group the results and perform the calculations in the $project stage, the fields are all projected correctly, but obviously, there is a new document created for each sub-document in the items array and that rather defeats the purpose of the aggregation.
What am I doing wrong?
Update your $group stage to include all the fields you wish to project further down the pipeline.
To include the user and timestamp fields, you can use $first:
{$group:{_id: '$_id', user:{$first:'$user'}, timestamp:{$first:'$timestamp'}, amount:{$sum: '$items.cost'}}},
Additionally, if you are on version 3.4 or later, you can simplify your aggregation as shown below.
Use $reduce to sum the cost of all items within a single document. To total across all documents, you can add a $group after the $reduce.
db.collection.aggregate([
{$match: {hasBeenAggregated: false}},
{$project: {
_id:0,
user:1,
amount: {
$reduce: {
input: "$items",
initialValue: 0,
in: { $add : ["$$value", "$$this.cost"] }
}
},
isIncome: {$literal: false},
title:{$literal: 'food_shopping'},
timestamp:1
}}
])
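The $reduce expression behaves like an array reduce: it starts at initialValue and adds each item's cost. A plain JavaScript sketch on the sample document from the question is below. It is not MongoDB code, `toTransaction` is a hypothetical name, and the rounding step at the end is an optional extra that is not part of the pipeline above.

```javascript
// Plain JavaScript mirror of the $project/$reduce stage; not MongoDB code.
// toTransaction is a hypothetical helper name.
const receipt = {
  user: "eddardstark#gmail.com",
  timestamp: "1492807458",
  hasBeenAggregated: false,
  items: [
    { name: "butter", cost: 2.19 },
    { name: "milk", cost: 1.49 },
  ],
};

function toTransaction(doc) {
  // $reduce: start at 0 and add each item's cost ($$value + $$this.cost)
  const amount = doc.items.reduce((value, item) => value + item.cost, 0);
  return {
    user: doc.user,
    amount,
    isIncome: false,
    title: "food_shopping",
    timestamp: doc.timestamp,
  };
}

const transaction = toTransaction(receipt);

// Optional extra (an assumption, not in the answer): round to two decimals
// to hide floating-point noise in the summed amount.
const roundedAmount = Math.round(transaction.amount * 100) / 100;
console.log(transaction, roundedAmount);
```

Because the sum is computed per document, user and timestamp are still available and no $first is needed.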

mongodb find matches based on count aggregation

I have a mongodb collection like this:
{"uid": "01370mask4",
"title": "hidden",
"post": "hidden",
"postTime": "01-23, 2016",
"unixPostTime": "1453538601",
"upvote": [2, 3]}
and I'd like to select post records from the users with more than 5 posts. The structure should be the same; I just don't need the documents from users who don't have many posts.
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } }
]
)
Now I'm stuck on how to use the count values in a find. I searched but didn't find any method to add the count values back to the same collection by uid. Saving the aggregation output and joining it with the collection does not seem to be supported by MongoDB. Please advise, thanks!
Update:
Sorry that I didn't make it clear earlier. Thanks for your prompt answers! I want a subset of the original collection, with post text, post timestamp, etc. I don't want a subset of the aggregation output.
If there aren't millions of documents, you can take a shortcut and do this with one aggregate query followed by a find query.
Aggregate query:
var users = db.collection.aggregate(
[
{$group:{_id:'$uid', count:{$sum:1}}},
{$match:{count:{$gt:5}}},
{$group:{_id:null,users:{$push:'$_id'}}}
]
).toArray()[0]['users']
Then it's a straightforward query to find the posts of those users:
db.collection.find({uid: {$in: users}})
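The two steps can be traced on in-memory data with a plain JavaScript sketch: count posts per uid, keep the uids above the threshold, then filter the original posts. This is not MongoDB code: `postsFromActiveUsers` is a hypothetical name and the sample posts are made up.

```javascript
// Plain JavaScript mirror of the aggregate-then-find approach; not MongoDB code.
// postsFromActiveUsers is a hypothetical helper name; sample data is made up.
function postsFromActiveUsers(posts, threshold) {
  // $group: count posts per uid
  const counts = new Map();
  for (const post of posts) {
    counts.set(post.uid, (counts.get(post.uid) || 0) + 1);
  }
  // $match + $push: uids with more than `threshold` posts
  const users = new Set(
    [...counts].filter(([, count]) => count > threshold).map(([uid]) => uid)
  );
  // find({uid: {$in: users}}): original documents, unchanged
  return posts.filter((post) => users.has(post.uid));
}

const samplePosts = [];
for (let i = 0; i < 6; i++) samplePosts.push({ uid: "user-a", title: "post " + i });
for (let i = 0; i < 2; i++) samplePosts.push({ uid: "user-b", title: "post " + i });

const selected = postsFromActiveUsers(samplePosts, 5);
console.log(selected.length, selected[0]);
```

The returned objects are the original post documents, which is what the question asks for, rather than the grouped counts.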
Just add a $match after your $group with the right condition and it works:
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } },
{ $match : { count : { $gt : 5 } } }
]
)
Please try this to select users with more than 5 posts. To keep the original fields, use $first; if $uid is unique, add the fields as below.
db.collection.aggregate([
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
])
If there are multiple values for the same $uid, you should use $push to collect them into an array in the $group.
If you want to save the result to the db, try it as below:
var cur = db.collection.aggregate(
[
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
]
)
cur.forEach(function(doc) {
db.collection.update({_id: doc._id}, {/*the field should be updated */});
});