Record Aggregation in Mongo across an Array - mongodb

Each document in my MongoDB collection stores an array. I need to compute a score for each element of that array and then aggregate the scores by another field of the array element.
It's hard for me to explain this in English, so here is a Python example of what I am trying to do.
records = [
{"state": "a", "initvalue": 1, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 4}]},
{"state": "a", "initvalue": 5, "data": [{"time": 1, "value": 7}, {"time": 2, "value": 9}]},
{"state": "b", "initvalue": 4, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 1}]},
{"state": "b", "initvalue": 5, "data": [{"time": 1, "value": 3}, {"time": 2, "value": 2}]}
]
def sign(record):
    return 1 if record["state"] == "a" else -1
def score(record):
    return [{"time": element["time"], "score": sign(record) * (element["value"] - record["initvalue"])} for element in record["data"]]
scores = []
for record in records:
    scores += score(record)
sums = {}
for score in scores:
    if score["time"] not in sums:
        sums[score["time"]] = 0
    sums[score["time"]] += score["score"]
print '{:>4} {:>5}'.format('time', 'score')
for time, value in sums.iteritems():
    print '{:>4} {:>5}'.format(time, value)
This computes a slightly different score function for state a and for state b and then aggregates the scores across each time entry.
Here is the result
time score
   1     7
   2    13
I am trying to figure out how to do this in mongo, without pulling the records into python and reinventing aggregation.
Thanks for the help!

OK, I figured this out. Once I really understood how pipelines work and learned about the $cond operator, everything came together.
from pymongo import MongoClient
client = MongoClient()
result = client.mydb.foo.aggregate([
    {'$project': {'_id': 0, 'data': 1, 'initvalue': 1, 'state': 1}},
    {'$unwind': '$data'},
    {'$project': {
        'time': '$data.time',
        'score': {'$multiply': [
            {'$cond': [{'$eq': ['$state', 'a']}, 1, -1]},
            {'$subtract': ['$data.value', '$initvalue']}
        ]}
    }},
    {'$group': {
        '_id': '$time',
        'score': {'$sum': '$score'}
    }},
    {'$project': {'_id': 0, 'time': '$_id', 'score': 1}}
])
for record in result['result']:
    print record
This yields the desired result
{u'score': 13, u'time': 2}
{u'score': 7, u'time': 1}
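For anyone on a newer driver: in PyMongo 3 and later, aggregate() returns a cursor rather than a dict with a 'result' key, so you iterate over it directly. Below is a minimal sketch of the same pipeline as plain dicts, plus a pure-Python reference to check the numbers against. The function names are hypothetical, the $sort stage is my addition (not part of the answer above), and the commented-out call assumes a reachable server.

```python
def build_score_pipeline():
    """Return the aggregation stages from the answer above as plain dicts."""
    return [
        {"$unwind": "$data"},
        {"$project": {
            "_id": 0,
            "time": "$data.time",
            "score": {"$multiply": [
                {"$cond": [{"$eq": ["$state", "a"]}, 1, -1]},
                {"$subtract": ["$data.value", "$initvalue"]},
            ]},
        }},
        {"$group": {"_id": "$time", "score": {"$sum": "$score"}}},
        {"$project": {"_id": 0, "time": "$_id", "score": 1}},
        # Added for a stable output order; not in the original answer.
        {"$sort": {"time": 1}},
    ]


def expected_totals(records):
    """Pure-Python reference for the same computation, for checking results."""
    totals = {}
    for record in records:
        sign = 1 if record["state"] == "a" else -1
        for element in record["data"]:
            score = sign * (element["value"] - record["initvalue"])
            totals[element["time"]] = totals.get(element["time"], 0) + score
    return totals


# Assumed usage with PyMongo 3+ (not run here):
#   from pymongo import MongoClient
#   for row in MongoClient().mydb.foo.aggregate(build_score_pipeline()):
#       print(row)
```

Running expected_totals on the sample records from the question gives the same totals as the table there, which makes it a handy cross-check for the server-side result.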

Related

MongoDB: Upsert with array filter

I have collection like this:
mc.db.collection.insert_many([
{"key_array": [1], "another_array": ["a"]},
{"key_array": [2, 3], "another_array": ["b"]},
{"key_array": [4], "another_array": ["c", "d"]},
])
And I'm using this kind of updates:
mc.db.collection.update_one(
{"key_array": 5},
{"$addToSet": {"another_array": "f"}},
upsert=True
)
This works well when a document matches, but I run into trouble when it upserts:
It creates a document with a non-array key_array field, like this:
{
"_id": ObjectId(...)
"key_array": 5,
"another_array": ["f"]
}
while I want to have this one
{
"_id": ObjectId(...)
"key_array": [5],
"another_array": ["f"]
}
Also, I cannot query with {"key_array": [5]}, because that would not match existing arrays with more than one element.
So, is there a way to keep this behavior for updates and still get the correct document structure on inserts?
Any help will be appreciated.
This should help.
https://www.mongodb.com/docs/manual/reference/operator/update/setOnInsert/
mc.db.collection.update_one(
{"key_array": 5},
{
"$addToSet": {"another_array": "f"},
"$setOnInsert": {"key_array": [5], ...}
},
upsert=True
)
How about this one?
db.collection.update({
    "key_array": 5
},
{
    "$addToSet": {
        "another_array": "f",
    },
    "$set": {
        "key_array": [
            5
        ]
    }
},
{
    "upsert": true
})
https://mongoplayground.net/p/4YdhKuzr2I6
OK, I finally solved this with two consecutive updates: the first is the upsert from the question, with a non-array value in the query, and the second converts the field to an array if it has any other type.
from pymongo import UpdateOne
bulk = [
    UpdateOne(
        {"key_array": 6},
        {"$addToSet": {"another_array": "e"}},
        upsert=True
    ),
    UpdateOne(
        {
            "$and": [{"key_array": 6},
                     {"key_array": {"$not": {"$type": "array"}}}]
        },
        [{"$set": {"key_array": ["$key_array"]}}]
    )
]
mc.db.collection.bulk_write(bulk)
But I'm still looking for a more elegant way to do such a thing.
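One possible single-update variant, offered as an untested sketch: on upsert the server seeds the new document from the equality clauses of the query, so if the query uses $elemMatch instead of a plain equality, key_array should not be copied from the filter and $setOnInsert can supply the array form. That seeding behavior is my assumption and should be verified against your server version. Note also that $elemMatch only matches documents where key_array is already an array. The helper name is hypothetical.

```python
def build_array_upsert(key, item):
    """Return (filter, update) for one upsert that keeps key_array an array."""
    # $elemMatch is not a plain equality clause, so (as assumed here) the
    # upsert does not seed key_array from the filter on insert.
    query = {"key_array": {"$elemMatch": {"$eq": key}}}
    update = {
        # Applied both when a document matches and when one is inserted.
        "$addToSet": {"another_array": item},
        # Applied only when the upsert inserts a new document.
        "$setOnInsert": {"key_array": [key]},
    }
    return query, update


# Assumed usage (requires a live server, not run here):
#   query, update = build_array_upsert(5, "f")
#   mc.db.collection.update_one(query, update, upsert=True)
```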

Searching mongo for array of objects

Let's say I have the following data
{ "value": "apples", "category": 0 }
{ "value": "bananas", "category": 1 }
{ "value": "apples", "category": 2 }
{ "value": "avocados", "category": 2 }
I want to search the database for an array of objects.
If I didn't care about category it would be
.find({'value': {$in:["apples,bananas"]}})
How can I add the category field? I want for example all the apples with category: 0 and bananas with category: 1.
I don't want to 'for' loop and find them one by one.
If you only want to retrieve specific combinations you could use the $or operator:
.find({
$or: [
{'value': 'apples', 'category': 0},
{'value': 'bananas', 'category': 1}
]
})
First of all it's {$in:["apples","bananas"]} - an array of strings.
To match several fields you can use $or condition, if I understand the question:
.find({$or: [
{"value": "apples", "category": 0},
{"value": "bananas", "category": 1}
]})
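If the list of (value, category) pairs comes from code rather than being typed by hand, the $or document can be generated. A small sketch in Python, with hypothetical helper names; the matches function only mimics the equality semantics locally so the query can be sanity-checked without a server. As far as I know the server rejects an empty $or array, hence the guard.

```python
def build_pair_query(pairs):
    """Build a $or filter with one {value, category} clause per pair."""
    if not pairs:
        raise ValueError("need at least one (value, category) pair")
    return {"$or": [{"value": value, "category": category}
                    for value, category in pairs]}


def matches(doc, query):
    """Local stand-in for the $or equality match, for quick checks only."""
    return any(all(doc.get(field) == wanted for field, wanted in clause.items())
               for clause in query["$or"])


# Assumed usage with PyMongo (not run here):
#   collection.find(build_pair_query([("apples", 0), ("bananas", 1)]))
```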

Get the elements that satisfy the criteria from the array in the document from MongoDB collection?

If I have some documents in a collection like these:
db.test.find()
[{
    author_id: 1,
    reviews: [{article_id: 1, score: 10, ...},
              {article_id: 2, score: 7, ...},
              {article_id: 3, score: 9, ...}
              ...
             ]
},
{
    author_id: 2,
    reviews: [{article_id: 2, score: 8, ...},
              {article_id: 4, score: 3, ...}
              ...
             ]
},
...
]
How can I get a list of all reviews that satisfy a criterion, e.g. article_id equal to 2, like this:
[{article_id: 2, author_id: 1, score: 7},
{article_id: 2, author_id: 2, score: 8}
...
]
I'm new to MongoDB ^_^.
You can use the aggregation below in MongoDB 3.6 or later.
$filter with $arrayElemAt outputs the first matching review, and $mergeObjects combines that review document with author_id.
$replaceRoot then promotes the merged document to the top level.
db.col.aggregate({
    "$replaceRoot": {
        "newRoot": {
            "$mergeObjects": [
                {"author_id": "$author_id"},
                {"$arrayElemAt": [
                    {"$filter": {
                        "input": "$reviews",
                        "as": "rv",
                        "cond": {"$eq": ["$$rv.article_id", 2]}
                    }},
                    0]}
            ]
        }
    }
})
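Because the pipeline above takes element 0 of the filtered array, it keeps one review per author. If an author can review the same article more than once, or if authors without a matching review should be left out, an $unwind plus $match pipeline is an alternative. This is a sketch with hypothetical function names, not the answerer's method; the second function is a pure-Python equivalent for checking results locally.

```python
def build_reviews_pipeline(article_id):
    """Stages that return every review of article_id, one row per review."""
    return [
        # Cheap pre-filter: skip authors with no matching review at all.
        {"$match": {"reviews.article_id": article_id}},
        {"$unwind": "$reviews"},
        # After $unwind each document holds one review, so filter again.
        {"$match": {"reviews.article_id": article_id}},
        {"$project": {
            "_id": 0,
            "author_id": 1,
            "article_id": "$reviews.article_id",
            "score": "$reviews.score",
        }},
    ]


def reviews_for_article(docs, article_id):
    """Pure-Python equivalent of the pipeline, for checking results."""
    return [
        {"article_id": review["article_id"],
         "author_id": doc["author_id"],
         "score": review["score"]}
        for doc in docs
        for review in doc["reviews"]
        if review["article_id"] == article_id
    ]


# Assumed usage with PyMongo (not run here):
#   list(db.test.aggregate(build_reviews_pipeline(2)))
```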

Mongodb aggregating likert scale

This seems like an easy question, but I can't seem to figure it out after trying for a substantial amount of time.
I have a mongodb collection that has the schema {user, documentID, rating}. Ratings are on a scale of 1-5, so the collection might look something like:
userA, documentA, 5
userA, documentB, 5
userB, documentA, 1
userC, documentB, 2
(and so on...)
Is there a way I can directly find the count of each rating on a single document with a single query? The desired output is something like:
documentA:{
"1": 23,
"2": 24,
"3": 131,
"4": 242,
"5": 500
}
I've read about using aggregate to group fields, but I'm not sure how to use it to return the count of each distinct value (i.e. 1-5).
I would really appreciate any help!
You can achieve this using aggregation.
The query would look like:
db.collection.aggregate([
    { $group:
        { _id: { document: "$documentID", rating: "$rating" },
          sum: { $sum: 1 }
        }
    }
])
For the sample rows in the question, the output would look like:
{_id: {"document": "documentA", "rating": 1}, "sum": 1}
{_id: {"document": "documentA", "rating": 5}, "sum": 1}
{_id: {"document": "documentB", "rating": 2}, "sum": 1}
{_id: {"document": "documentB", "rating": 5}, "sum": 1}
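The grouped rows are one step away from the nested shape the question asks for. A minimal client-side sketch in Python that pivots them; the function name is hypothetical, and it assumes rows shaped like the output above (an _id with document and rating, plus a sum).

```python
from collections import defaultdict


def pivot_rating_counts(rows):
    """Turn {_id: {document, rating}, sum} rows into {document: {rating: count}}."""
    counts = defaultdict(dict)
    for row in rows:
        key = row["_id"]
        # Ratings become string keys to match the desired output in the question.
        counts[key["document"]][str(key["rating"])] = row["sum"]
    return dict(counts)


# Assumed usage with PyMongo (not run here):
#   pivot_rating_counts(collection.aggregate([...the $group stage above...]))
```

Ratings nobody has given simply do not appear in the result; fill in zeros for 1-5 afterwards if the consumer expects every key.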

pymongo "match" doesn't filter out the correct date

I wrote a query that should return data within a certain date (3/14 in this case), but the result contains data up to 3/29 (and nothing from 3/14).
My match is {'$lte': datetime.datetime(2016, 3, 14, 23, 59, 59, 999, tzinfo=tzutc()), '$gt': datetime.datetime(2016, 3, 14, 0, 0, tzinfo=tzutc())}, which should only select data within that date, and my query command is
[
    {'$match': match},
    {'$unwind': '$'+needed_field},
    {'$group': {
        "_id": {
            "date": {
                "$concat": [
                    {"$substr": [{"$year": "$time"}, 0, 4]},
                    "-",
                    {"$substr": [{"$month": "$time"}, 0, 2]},
                    "-",
                    {"$substr": [{"$dayOfMonth": "$time"}, 0, 2]},
                ]
            },
            "state": "$needed_field.state"
        },
        "count": {"$sum": 1}}
    }]
(a little messy, sorry)
This query returns me something up to 3/29 for some reason. Am I not setting my match correctly?
It turns out I was working with two different time fields: there is another time inside needed_field, and that is the one $match was filtering on, while $group was using the top-level $time.
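One way to make this kind of mix-up harder is to pass the time field name once and use it for both the filter and the grouping. The sketch below does that with hypothetical function names. It also uses a half-open range (from midnight up to, but not including, the next midnight), which avoids two small issues in the original match: the 999 there is microseconds, not milliseconds, and $gt excludes midnight itself. $dateToString replaces the $concat/$substr construction and zero-pads month and day, so the date strings differ slightly from the original.

```python
from datetime import datetime, timedelta, timezone


def day_range(year, month, day):
    """Half-open UTC range covering exactly one calendar day."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    return {"$gte": start, "$lt": start + timedelta(days=1)}


def build_daily_count_pipeline(array_field, time_field, year, month, day):
    """Count unwound array elements per (date, state), using one time field."""
    return [
        {"$unwind": "$" + array_field},
        # Filtering after $unwind applies the range to each array element.
        {"$match": {time_field: day_range(year, month, day)}},
        {"$group": {
            "_id": {
                # Same field as the $match above, so the two cannot drift apart.
                "date": {"$dateToString": {"format": "%Y-%m-%d",
                                           "date": "$" + time_field}},
                "state": "$" + array_field + ".state",
            },
            "count": {"$sum": 1},
        }},
    ]


# Assumed usage (field names are guesses based on the question):
#   build_daily_count_pipeline("needed_field", "needed_field.time", 2016, 3, 14)
```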