This seems like an easy question, but I can't seem to figure it out after trying for a substantial amount of time.
I have a MongoDB collection that has the schema {user, documentID, rating}. Ratings are on a scale of 1-5, so the collection might look something like:
userA, documentA, 5
userA, documentB, 5
userB, documentA, 1
userC, documentB, 2
(and so on...)
Is there a way I can directly find the count of each rating on a single document with a single query? The desired output is something like:
documentA:{
"1": 23,
"2": 24,
"3": 131,
"4": 242,
"5": 500
}
I've read about how to use aggregate to group fields, but I'm not sure how it can be used to return the count of each distinct value (i.e. 1-5).
Will really appreciate any help provided!
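You can achieve this using aggregation. The query would look like this:
db.collection.aggregate([
  // Count the rows for every (document, rating) pair.
  { $group: {
      _id: { document: "$documentID", rating: "$rating" },
      sum: { $sum: 1 }
  } }
])
Without a $match stage this counts every document; put { $match: { documentID: "documentA" } } first to restrict it to a single one. For the sample data, the output would look like: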
{_id: {"document": "documentA", "rating": 1}, "sum": 1}
{_id: {"document": "documentA", "rating": 5}, "sum": 1}
{_id: {"document": "documentB", "rating": 2}, "sum": 1}
{_id: {"document": "documentB", "rating": 5}, "sum": 1}
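The grouped rows can be folded into the nested shape the question asks for in application code. Here is a minimal sketch in Python, assuming the rows arrive as dicts shaped like the output above; the function name pivot_counts is made up for this example.

```python
from collections import defaultdict


def pivot_counts(rows):
    """Fold {_id: {document, rating}, sum} rows into {document: {rating: count}}."""
    result = defaultdict(dict)
    for row in rows:
        key = row["_id"]
        # Ratings become string keys to match the desired output shape.
        result[key["document"]][str(key["rating"])] = row["sum"]
    return dict(result)


# Rows copied from the aggregation output shown above.
rows = [
    {"_id": {"document": "documentA", "rating": 1}, "sum": 1},
    {"_id": {"document": "documentA", "rating": 5}, "sum": 1},
    {"_id": {"document": "documentB", "rating": 2}, "sum": 1},
    {"_id": {"document": "documentB", "rating": 5}, "sum": 1},
]
counts = pivot_counts(rows)
print(counts)
```

Ratings that never occur for a document are simply absent from its inner dict, so fill in zeros afterwards if all five keys are required.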
I need to create an aggregation pipeline that returns price ranges for each product category.
What I need to avoid is loading all available categories and then calling the database again for each one with a $match on that category. There must be a better way to do it.
Product documents
{
  Price: 500,
  Category: 'A'
},
{
  Price: 7500,
  Category: 'A'
},
{
  Price: 340,
  Category: 'B'
},
{
  Price: 60,
  Category: 'B'
}
Now I could use a $group stage to group the prices into an array by their category.
{
  _id: "$Category",
  Prices: {
    $addToSet: "$Price"
  }
}
Which would result in
{
  _id: 'A',
  Prices: [500, 7500]
},
{
  _id: 'B',
  Prices: [340, 60]
}
But if I use a $bucketAuto stage after this, I cannot group by multiple properties, so it would not take the categories into account.
I have tried the following
{
  groupBy: "$Prices",
  buckets: 5,
  output: {
    Count: { $sum: 1 }
  }
}
This does not take categories into account, but I need the generated buckets to be organised by category. Either having the category field within the _id as well or have it as another field and have 5 buckets for each distinct category:
{
  _id: {min: 500, max: 7500, category: 'A'},
  Count: 2
},
{
  _id: {min: 60, max: 340, category: 'B'},
  Count: 2
}...
Query1
If you want to group by category and find the max and min price for each category, you can do it like this:
aggregate(
  [{"$group":
    {"_id": "$Category",
     "min-price": {"$min": "$Price"},
     "max-price": {"$max": "$Price"}}}])
Query2
If you want to group by category and then bucket the prices inside each category's array (5 buckets, as in your example), you can do it with a trick that lets stage operators run on the contents of an array.
The trick is to have one extra collection that contains only a single empty document, [{}]. You $lookup into it, pass the array in through "let", $unwind it inside the lookup pipeline, and then apply whatever stage you want.
Here we unwind the prices and run $bucketAuto on them with 5 buckets, so each category gets its own price ranges:
aggregate(
  [{"$group": {"_id": "$Category", "prices": {"$push": "$Price"}}},
   {"$lookup":
     {"from": "coll_with_1_empty_doc",
      "let": {"prices": "$prices", "category": "$_id"},
      "pipeline":
        [{"$set": {"prices": "$$prices"}},
         {"$unwind": "$prices"},
         {"$bucketAuto": {"groupBy": "$prices", "buckets": 5}}],
      "as": "bucket-prices"}}])
If none of the above works, please give sample documents and the expected output.
I'm trying to get a small table from a MongoDB aggregation, for example, the number of fatal accidents by year.
I want to get all the years, even if the count is null (zero).
MongoDB query:
[
  {"$match": {"city": "myCity"}},
  {"$group": {
    "_id": "$accident_year",
    "count": {"$sum": 1}
  }},
  {"$sort": {"_id": 1}}
]
actual result:
[
  {"_id": "2015", "count": 2},
  {"_id": "2017", "count": 4},
  {"_id": "2018", "count": 6},
  {"_id": "2019", "count": 2}
]
desired result:
[
  {"_id": "2015", "count": 2},
  {"_id": "2016", "count": 0},
  {"_id": "2017", "count": 4},
  {"_id": "2018", "count": 6},
  {"_id": "2019", "count": 2}
]
Thank you
I don't think it's possible to get an output of 0 for years that are not listed in the collection the way you have the aggregation pipeline created now. I don't see a way for the $group stage to know what values don't exist.
Since you are sorting the results, your application code that receives the output could check if years are listed and manipulate the aggregation pipeline results to include missing years. This is probably the best option.
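Filling the gaps in application code could look like the sketch below. It assumes the pipeline results arrive as a list of dicts with string years in "_id", as in the question, and that every year between the first and last returned year should appear; the function name fill_missing_years is made up for this example.

```python
def fill_missing_years(rows):
    """Return one row per year in the covered range, using 0 for absent years."""
    if not rows:
        return []
    # Years are stored as strings in the question, so convert for arithmetic.
    counts = {int(row["_id"]): row["count"] for row in rows}
    first, last = min(counts), max(counts)
    return [
        {"_id": str(year), "count": counts.get(year, 0)}
        for year in range(first, last + 1)
    ]


# The "actual result" from the question.
actual = [
    {"_id": "2015", "count": 2},
    {"_id": "2017", "count": 4},
    {"_id": "2018", "count": 6},
    {"_id": "2019", "count": 2},
]
filled = fill_missing_years(actual)
for row in filled:
    print(row)
```

If the table has to cover a fixed range instead (for example years with no accidents before 2015), pass the bounds in explicitly rather than deriving them from the data.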
Here is a hack to get around this if you really want the results in your aggregation pipeline:
Create a document for every year in your collection. Inside each of those documents, add a field called something like isAccident and set it to 0. For example:
{
  "_id": {"$oid": "5e662fca1c9d440000c1aa71"},
  "accident_year": "2018",
  "isAccident": 0
}
Then you can update your pipeline to have a $project stage before the $group stage that assigns isAccident a value of 1 on every document that does not already have the field. In your $group stage you can then sum the isAccident field.
[
  {$project: {
    _id: 1,
    accident_year: 1,
    isAccident: { $ifNull: ["$isAccident", 1] }
  }},
  {$group: {
    _id: "$accident_year",
    count: { $sum: "$isAccident" }
  }}
]
This will give you the results you're expecting. Beware that if others later group and count the accidents in this collection without realizing you've created these extra documents for the years, their counts will be off by one for each year.
I have this data in a collection:
{id:1, types:{'A':4, 'B': 3, 'C':12}}
{id:1, types:{'A':8, 'B': 2, 'C':11}}
{id:2, types:{'A':7, 'B': 6, 'C':14}}
{id:3, types:{'A':1, 'B': 9, 'C':15}}
I want to query for the total of each type for id:1 but I also want to know the totals for each type for all ids in a single query. I would like the output to look something like this:
{id:1, types:{'A':12, 'B':5, 'C':23, 'sumA':20,'sumB':20,'sumC':52}}
I can do this by calling 2 separate queries. One query containing
{$match: {id:1}}
And one that does not have a $match option. But I would like to know if it can be done in a single query.
Edit: types A,B and C are dynamic so I won't know the values beforehand.
Thanks!
You can use the aggregation query below.
It uses a $group stage with $sum to calculate the overall totals, and $cond inside $sum to limit a second set of totals to a specific id.
db.col.aggregate([
  {"$group": {
    "_id": null,
    "sumA": {"$sum": "$types.A"},
    "sumB": {"$sum": "$types.B"},
    "sumC": {"$sum": "$types.C"},
    "A": {"$sum": {"$cond": [{"$eq": ["$id", 1]}, "$types.A", 0]}},
    "B": {"$sum": {"$cond": [{"$eq": ["$id", 1]}, "$types.B", 0]}},
    "C": {"$sum": {"$cond": [{"$eq": ["$id", 1]}, "$types.C", 0]}}
  }}
])
Because the types are dynamic, update the documents to the structure below, with each type stored as a key/value pair:
{id:1, types:[{"k":"A", "v":4}, {"k":"B", "v":3}, {"k":"C", "v":12}]}
{id:1, types:[{"k":"A", "v":8}, {"k":"B", "v":2}, {"k":"C", "v":11}]}
{id:2, types:[{"k":"A", "v":7}, {"k":"B", "v":6}, {"k":"C", "v":14}]}
{id:3, types:[{"k":"A", "v":1}, {"k":"B", "v":9}, {"k":"C", "v":15}]}
Aggregation query:
db.col.aggregate([
  {"$unwind": "$types"},
  {"$group": {
    "_id": "$types.k",
    "sum": {"$sum": "$types.v"},
    "type": {"$sum": {"$cond": [{"$eq": ["$id", 1]}, "$types.v", 0]}}
  }}
])
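To check what the unwind-and-group query computes, the same totals can be reproduced in plain Python. This is only a sketch of the logic over the sample data in the key/value structure, not how MongoDB executes the pipeline; the function name totals_by_type is made up for this example.

```python
from collections import defaultdict

# Sample documents in the key/value structure.
docs = [
    {"id": 1, "types": [{"k": "A", "v": 4}, {"k": "B", "v": 3}, {"k": "C", "v": 12}]},
    {"id": 1, "types": [{"k": "A", "v": 8}, {"k": "B", "v": 2}, {"k": "C", "v": 11}]},
    {"id": 2, "types": [{"k": "A", "v": 7}, {"k": "B", "v": 6}, {"k": "C", "v": 14}]},
    {"id": 3, "types": [{"k": "A", "v": 1}, {"k": "B", "v": 9}, {"k": "C", "v": 15}]},
]


def totals_by_type(docs, target_id):
    """Per type key: 'sum' over all ids, 'type' over target_id only."""
    totals = defaultdict(lambda: {"sum": 0, "type": 0})
    for doc in docs:
        for entry in doc["types"]:       # plays the role of $unwind
            bucket = totals[entry["k"]]  # plays the role of grouping on types.k
            bucket["sum"] += entry["v"]
            if doc["id"] == target_id:   # plays the role of the $cond
                bucket["type"] += entry["v"]
    return dict(totals)


totals = totals_by_type(docs, 1)
for key, value in sorted(totals.items()):
    print(key, value)
```

Because the keys are read from the data, nothing in the function has to know the type names in advance, which is the point of the restructuring.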
I've built a relations graph in a MongoDB collection, for example:
{ "user_id": 1, "follower_id": 2 }
{ "user_id": 1, "follower_id": 3 }
{ "user_id": 2, "follower_id": 1 }
{ "user_id": 2, "follower_id": 3 }
{ "user_id": 3, "follower_id": 4 }
{ "user_id": 5, "follower_id": 2 }
This represents a directed graph, with one edge per document.
Is there an efficient way to remove "leafs" from the graph? In the example I'd like to remove node 4 from the graph, because that node only has one link with node 3 and remove node 5 because only node 2 links to it.
Or to say it with graph terminology: only keep vertices with indegree > 1 or outdegree > 1
The short answer is no: there is no efficient way to do what you want with a schema like this. It can be done by iterating over all nodes, for example using the aggregation framework, and removing nodes as a separate operation, but I think that is all that can be done. Assuming the edges are in a collection called graph, it could be something like the code below, but it is far from efficient:
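db.graph.aggregate([
  // Emit each edge twice, once per endpoint.
  {$project: {index: {$literal: [0, 1]}, user_id: 1, follower_id: 1}},
  {$unwind: "$index"},
  {$project: {id: {$cond: [{$eq: ["$index", 0]}, "$user_id", "$follower_id"]}}},
  // Count how many edges touch each node and keep nodes with at most one.
  {$group: {_id: "$id", count: {$sum: 1}}},
  {$match: {count: {$lte: 1}}}
]).forEach(function(node) {
  // Delete every edge that touches the leaf node, in either direction.
  db.graph.deleteMany({$or: [{user_id: node._id}, {follower_id: node._id}]});
});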
You could use a more document-like schema if you want operations like this to be efficient:
{
  user_id: 1,
  follows: [2, 3],
  followed_by: [2]
}
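The leaf test itself (a node touched by at most one edge) is easy to check outside the database. Below is a small sketch in Python over the edge list from the question; it mirrors the counting idea only, and the function name find_leaves is made up for this example.

```python
from collections import Counter

# Edges copied from the question.
edges = [
    {"user_id": 1, "follower_id": 2},
    {"user_id": 1, "follower_id": 3},
    {"user_id": 2, "follower_id": 1},
    {"user_id": 2, "follower_id": 3},
    {"user_id": 3, "follower_id": 4},
    {"user_id": 5, "follower_id": 2},
]


def find_leaves(edges):
    """Return the ids of nodes that appear in at most one edge."""
    degree = Counter()
    for edge in edges:
        # Count both endpoints, ignoring edge direction.
        degree[edge["user_id"]] += 1
        degree[edge["follower_id"]] += 1
    return {node for node, count in degree.items() if count <= 1}


leaves = find_leaves(edges)
# Drop every edge that touches a leaf, in either direction.
remaining = [
    e for e in edges
    if e["user_id"] not in leaves and e["follower_id"] not in leaves
]
print(sorted(leaves))
print(remaining)
```

Removing leaves can expose new ones in other graphs, so repeat the pass until find_leaves returns an empty set if every leaf must go.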
I have an array stored in each document in a MongoDB database. I need to compute a score for each element in this array and aggregate the scores by another field in the array element.
It's hard for me to explain what I am trying to do in English, so here is a Python example of what I am looking to do.
records = [
    {"state": "a", "initvalue": 1, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 4}]},
    {"state": "a", "initvalue": 5, "data": [{"time": 1, "value": 7}, {"time": 2, "value": 9}]},
    {"state": "b", "initvalue": 4, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 1}]},
    {"state": "b", "initvalue": 5, "data": [{"time": 1, "value": 3}, {"time": 2, "value": 2}]}
]

def sign(record):
    # State "a" scores positively, every other state negatively.
    return 1 if record["state"] == "a" else -1

def score(record):
    return [{"time": element["time"], "score": sign(record) * (element["value"] - record["initvalue"])} for element in record["data"]]

scores = []
for record in records:
    scores += score(record)

# Sum the scores per time entry.
sums = {}
for entry in scores:
    if entry["time"] not in sums:
        sums[entry["time"]] = 0
    sums[entry["time"]] += entry["score"]

print('{:>4} {:>5}'.format('time', 'score'))
for time, value in sums.items():
    print('{:>4} {:>5}'.format(time, value))
This computes a slightly different score function for state a and for state b and then aggregates the scores across each time entry.
Here is the result
time score
1 7
2 13
I am trying to figure out how to do this in mongo, without pulling the records into python and reinventing aggregation.
Thanks for the help!
OK, I figured this out. Once I really understood how pipelines work and learned about the $cond operator, everything came together.
from pymongo import MongoClient

client = MongoClient()
result = client.mydb.foo.aggregate([
    {'$project': {'_id': 0, 'data': 1, 'initvalue': 1, 'state': 1}},
    {'$unwind': '$data'},
    {'$project': {
        'time': '$data.time',
        'score': {'$multiply': [
            {'$cond': [{'$eq': ['$state', 'a']}, 1, -1]},
            {'$subtract': ['$data.value', '$initvalue']}
        ]}
    }},
    {'$group': {
        '_id': '$time',
        'score': {'$sum': '$score'}
    }},
    {'$project': {'_id': 0, 'time': '$_id', 'score': 1}}
])
# In PyMongo 3 and later, aggregate() returns a cursor that can be iterated directly.
for record in result:
    print(record)
This yields the desired result:
{'score': 13, 'time': 2}
{'score': 7, 'time': 1}