Record Aggregation in Mongo across an Array

Each document in my MongoDB collection stores an array. I need to compute a score for every element of that array and then aggregate the scores by another field of the array element.
It's hard for me to explain this in English, so here is a Python example of what I want to do.
records = [
    {"state": "a", "initvalue": 1, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 4}]},
    {"state": "a", "initvalue": 5, "data": [{"time": 1, "value": 7}, {"time": 2, "value": 9}]},
    {"state": "b", "initvalue": 4, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 1}]},
    {"state": "b", "initvalue": 5, "data": [{"time": 1, "value": 3}, {"time": 2, "value": 2}]}
]
def sign(record):
    # +1 for state "a", -1 for any other state
    return 1 if record["state"] == "a" else -1
def score(record):
    # one {"time", "score"} entry per element of the record's data array
    return [{"time": element["time"], "score": sign(record) * (element["value"] - record["initvalue"])} for element in record["data"]]
scores = []
for record in records:
    scores += score(record)
# sum the scores for each time value
sums = {}
for entry in scores:
    if entry["time"] not in sums:
        sums[entry["time"]] = 0
    sums[entry["time"]] += entry["score"]
print('{:>4} {:>5}'.format('time', 'score'))
for time, value in sums.items():
    print('{:>4} {:>5}'.format(time, value))
This computes a slightly different score function for state a and for state b and then aggregates the scores across each time entry.
Here is the result:
time score
   1     7
   2    13
I am trying to figure out how to do this in MongoDB itself, without pulling the records into Python and reinventing aggregation.
Thanks for the help!

OK, I figured this out. Once I really understood how pipelines work and learned about the $cond operator, everything came together.
from pymongo import MongoClient
client = MongoClient()
result = client.mydb.foo.aggregate([
    # keep only the fields the later stages need
    {'$project': {'_id': 0, 'data': 1, 'initvalue': 1, 'state': 1}},
    # emit one document per element of the data array
    {'$unwind': '$data'},
    # signed score: +1 for state "a", -1 otherwise
    {'$project': {
        'time': '$data.time',
        'score': {'$multiply': [
            {'$cond': [{'$eq': ['$state', 'a']}, 1, -1]},
            {'$subtract': ['$data.value', '$initvalue']}
        ]}
    }},
    # sum the scores for each time value
    {'$group': {
        '_id': '$time',
        'score': {'$sum': '$score'}
    }},
    # rename _id back to time
    {'$project': {'_id': 0, 'time': '$_id', 'score': 1}}
])
# PyMongo 3+ returns a cursor; older versions returned a dict with a 'result' key
for record in result:
    print(record)
This yields the desired result:
{'score': 13, 'time': 2}
{'score': 7, 'time': 1}
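To see what each stage of the pipeline above contributes, the stages can be mimicked in plain Python on the sample records from the question. This is only an illustrative sketch: the helper names (unwind, project_score, group_sum) are made up here and are not MongoDB or PyMongo APIs.

```python
from collections import defaultdict

records = [
    {"state": "a", "initvalue": 1, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 4}]},
    {"state": "a", "initvalue": 5, "data": [{"time": 1, "value": 7}, {"time": 2, "value": 9}]},
    {"state": "b", "initvalue": 4, "data": [{"time": 1, "value": 2}, {"time": 2, "value": 1}]},
    {"state": "b", "initvalue": 5, "data": [{"time": 1, "value": 3}, {"time": 2, "value": 2}]},
]


def unwind(docs, field):
    """Like $unwind: emit one copy of each document per array element."""
    for doc in docs:
        for element in doc[field]:
            yield {**doc, field: element}


def project_score(docs):
    """Like the second $project: compute a signed score per element."""
    for doc in docs:
        sign = 1 if doc["state"] == "a" else -1  # mirrors the $cond expression
        yield {
            "time": doc["data"]["time"],
            "score": sign * (doc["data"]["value"] - doc["initvalue"]),
        }


def group_sum(docs, key, field):
    """Like $group with $sum: total `field` for each distinct `key`."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[key]] += doc[field]
    return dict(totals)


sums = group_sum(project_score(unwind(records, "data")), "time", "score")
print(sums)  # {1: 7, 2: 13}
```

The totals agree with the table in the question, which is a quick way to convince yourself the stage order is right before running it against a real collection.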

Related

MongoDB: Upsert with array filter

I have a collection like this:
mc.db.collection.insert_many([
    {"key_array": [1], "another_array": ["a"]},
    {"key_array": [2, 3], "another_array": ["b"]},
    {"key_array": [4], "another_array": ["c", "d"]},
])
And I'm using updates like this:
mc.db.collection.update_one(
    {"key_array": 5},
    {"$addToSet": {"another_array": "f"}},
    upsert=True
)
This works well for updates, but I run into trouble when it upserts:
it creates a document with a non-array key_array field, like this:
{
    "_id": ObjectId(...),
    "key_array": 5,
    "another_array": ["f"]
}
while I want this one:
{
    "_id": ObjectId(...),
    "key_array": [5],
    "another_array": ["f"]
}
Also, I cannot use a {"key_array": [5]} style query, because it won't match an existing array with more than one element.
So, is there any way to keep this behavior on updates and still get the correct document structure on inserts?
Any help will be appreciated.

This should help.
https://www.mongodb.com/docs/manual/reference/operator/update/setOnInsert/
mc.db.collection.update_one(
    {"key_array": 5},
    {
        "$addToSet": {"another_array": "f"},
        "$setOnInsert": {"key_array": [5], ...}
    },
    upsert=True
)
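The same update can be wrapped in a small helper so that the scalar in the filter and the one-element array in $setOnInsert always stay in step. This is a sketch only: build_array_upsert is a hypothetical name, it merely builds the two dictionaries, and I have not verified the resulting upsert against a live server.

```python
def build_array_upsert(key, new_item):
    """Return (query, update) dicts for an upsert keyed on an array member.

    Hypothetical helper for illustration, not part of PyMongo.
    """
    # Equality on an array field matches documents whose array contains `key`.
    query = {"key_array": key}
    update = {
        # Applied whether the operation updates or inserts.
        "$addToSet": {"another_array": new_item},
        # Applied only when the upsert inserts a new document.
        "$setOnInsert": {"key_array": [key]},
    }
    return query, update


query, update = build_array_upsert(5, "f")
print(query)
print(update)
# Intended use, assuming `mc` is a connected MongoClient as in the question:
# mc.db.collection.update_one(query, update, upsert=True)
```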

How about this one?
db.collection.update(
    {"key_array": 5},
    {
        "$addToSet": {"another_array": "f"},
        "$set": {"key_array": [5]}
    },
    {"upsert": true}
)
https://mongoplayground.net/p/4YdhKuzr2I6

OK, I finally solved this with two consecutive updates: the first, as in the question, upserts with a non-array query field, and the second converts the field to an array if it is of any other type.
from pymongo import UpdateOne
bulk = [
    UpdateOne(
        {"key_array": 6},
        {"$addToSet": {"another_array": "e"}},
        upsert=True
    ),
    UpdateOne(
        {
            "$and": [{"key_array": 6},
                     {"key_array": {"$not": {"$type": "array"}}}]
        },
        [{"$set": {"key_array": ["$key_array"]}}]
    )
]
mc.db.collection.bulk_write(bulk)
But I'm still looking for a more elegant way to do such a thing.

Searching mongo for array of objects

Let's say I have the following data
{ "value": "apples", "category": 0 }
{ "value": "bananas", "category": 1 }
{ "value": "apples", "category": 2 }
{ "value": "avocados", "category": 2 }
I want to search the database for an array of objects.
If I didn't care about category it would be
.find({'value': {$in:["apples,bananas"]}})
How can I add the category field? I want for example all the apples with category: 0 and bananas with category: 1.
I don't want to 'for' loop and find them one by one.

If you only want to retrieve specific combinations you could use the $or operator:
.find({
    $or: [
        {'value': 'apples', 'category': 0},
        {'value': 'bananas', 'category': 1}
    ]
})

First of all, it's {$in:["apples","bananas"]}, i.e. an array of strings.
If I understand the question, you can use an $or condition to match several fields:
.find({$or: [
    {"value": "apples", "category": 0},
    {"value": "bananas", "category": 1}
]})
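When the wanted combinations come from a list, the $or filter can be generated rather than written by hand. The sketch below assumes PyMongo-style dictionaries; build_pair_filter and matches are hypothetical names, and matches is only a tiny local stand-in for the server so the filter can be checked without a database.

```python
def build_pair_filter(pairs):
    """Build an $or filter matching any of the given (value, category) pairs."""
    return {"$or": [{"value": value, "category": category} for value, category in pairs]}


def matches(doc, query):
    """Local stand-in for the server: True if doc satisfies any $or clause."""
    return any(
        all(doc.get(field) == expected for field, expected in clause.items())
        for clause in query["$or"]
    )


docs = [
    {"value": "apples", "category": 0},
    {"value": "bananas", "category": 1},
    {"value": "apples", "category": 2},
    {"value": "avocados", "category": 2},
]

query = build_pair_filter([("apples", 0), ("bananas", 1)])
found = [doc for doc in docs if matches(doc, query)]
print(found)
# With a real collection the same dict would be passed as collection.find(query).
```

MongoDB rejects an empty $or array, so a caller would need to handle an empty list of pairs separately.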

Get the elements that satisfy the criteria from the array in the document from MongoDB collection?

If I have some documents in a collection like these:
db.test.find()
[{
    author_id: 1,
    reviews: [{article_id: 1, score: 10, ...},
              {article_id: 2, score: 7, ...},
              {article_id: 3, score: 9, ...}
              ...
    ]
},
{
    author_id: 2,
    reviews: [{article_id: 2, score: 8, ...},
              {article_id: 4, score: 3, ...}
              ...
    ]
},
...
]
How can I get a list containing all reviews that satisfy some criterion, e.g. article_id equals 2, like this:
[{article_id: 2, author_id: 1, score: 7},
 {article_id: 2, author_id: 2, score: 8}
 ...
]
I'm new to MongoDB ^_^.

You can use the aggregation below in MongoDB 3.6.
$filter with $arrayElemAt outputs the matching review, followed by $mergeObjects to combine the matching review document with author_id.
$replaceRoot promotes the merged document to the top level.
db.col.aggregate([{
    "$replaceRoot":{
        "newRoot":{
            "$mergeObjects":[
                {"author_id":"$author_id"},
                {"$arrayElemAt":[
                    {"$filter":{
                        "input":"$reviews",
                        "as":"rv",
                        "cond":{"$eq":["$$rv.article_id",2]}
                    }},
                0]}
            ]
        }
    }
}])
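Because $arrayElemAt takes index 0, the pipeline above keeps only the first matching review per author. If an author can review the same article more than once, a different technique, $unwind followed by $match, keeps every match. The sketch below is an assumption-laden alternative, not the answer's method: article_reviews_pipeline and article_reviews_local are hypothetical names, and the local function only mirrors the expected output so it can be checked without a server.

```python
def article_reviews_pipeline(article_id):
    """Stages that flatten reviews and keep those for one article."""
    return [
        # Skip authors who never reviewed the article.
        {"$match": {"reviews.article_id": article_id}},
        # One document per review.
        {"$unwind": "$reviews"},
        # Keep only the reviews of the wanted article.
        {"$match": {"reviews.article_id": article_id}},
        {"$project": {
            "_id": 0,
            "article_id": "$reviews.article_id",
            "author_id": 1,
            "score": "$reviews.score",
        }},
    ]


def article_reviews_local(docs, article_id):
    """Plain-Python equivalent, for checking the expected shape."""
    return [
        {"article_id": review["article_id"],
         "author_id": doc["author_id"],
         "score": review["score"]}
        for doc in docs
        for review in doc["reviews"]
        if review["article_id"] == article_id
    ]


docs = [
    {"author_id": 1, "reviews": [
        {"article_id": 1, "score": 10},
        {"article_id": 2, "score": 7},
        {"article_id": 3, "score": 9},
    ]},
    {"author_id": 2, "reviews": [
        {"article_id": 2, "score": 8},
        {"article_id": 4, "score": 3},
    ]},
]

matching = article_reviews_local(docs, 2)
print(matching)
# With PyMongo this would be run as collection.aggregate(article_reviews_pipeline(2)).
```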

Mongodb aggregating likert scale

This seems like an easy question, but I can't seem to figure it out after trying for a substantial amount of time.
I have a mongodb collection that has the schema {user, documentID, rating}. Ratings are on a scale of 1-5, so the collection might look something like:
userA, documentA, 5
userA, documentB, 5
userB, documentA, 1
userC, documentB, 2
(and so on...)
Is there a way I can directly find the count of each rating on a single document with a single query? The desired output is something like:
documentA: {
    "1": 23,
    "2": 24,
    "3": 131,
    "4": 242,
    "5": 500
}
I've read about how to use aggregate to group fields, but I'm not sure how it can be used to return the count of each distinct value (i.e. 1-5).
I'd really appreciate any help!

You can achieve this using aggregation.
The query would look like this:
db.collection.aggregate([
    { $group: {
        _id: { document: "$documentID", rating: "$rating" },
        sum: { $sum: 1 }
    }}
])
The output would look like this:
{_id: {"document": "documentA", "rating": 1}, "sum": 1}
{_id: {"document": "documentA", "rating": 5}, "sum": 1}
{_id: {"document": "documentB", "rating": 2}, "sum": 1}
{_id: {"document": "documentB", "rating": 5}, "sum": 1}
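The grouped rows are one per (document, rating) pair, whereas the question asks for one object per document keyed by rating. A small client-side reshaping step closes that gap. This is a sketch under assumptions: pivot_rating_counts is a hypothetical helper, the rows are the sample output above, and ratings with no votes are filled in as 0.

```python
from collections import defaultdict


def pivot_rating_counts(rows):
    """Turn $group rows into {document: {"1": n, ..., "5": n}}."""
    # Every document starts with a zero count for each rating 1-5.
    result = defaultdict(lambda: {str(r): 0 for r in range(1, 6)})
    for row in rows:
        key = row["_id"]
        result[key["document"]][str(key["rating"])] = row["sum"]
    return dict(result)


rows = [
    {"_id": {"document": "documentA", "rating": 1}, "sum": 1},
    {"_id": {"document": "documentA", "rating": 5}, "sum": 1},
    {"_id": {"document": "documentB", "rating": 2}, "sum": 1},
    {"_id": {"document": "documentB", "rating": 5}, "sum": 1},
]

counts = pivot_rating_counts(rows)
print(counts["documentA"])  # {'1': 1, '2': 0, '3': 0, '4': 0, '5': 1}
```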

pymongo "match" doesn't filter out the correct date

I wrote a query that should return data within a certain date (3/14 in this case), but the result contains dates up to 3/29 (and nothing from 3/14).
My match is {'$lte': datetime.datetime(2016, 3, 14, 23, 59, 59, 999, tzinfo=tzutc()), '$gt': datetime.datetime(2016, 3, 14, 0, 0, tzinfo=tzutc())}, which should only get data within that date, and my query command is
[
    {'$match': match},
    {'$unwind': '$'+needed_field},
    {'$group': {
        "_id": {
            "date": {
                "$concat": [
                    {"$substr": [{"$year": "$time"}, 0, 4]},
                    "-",
                    {"$substr": [{"$month": "$time"}, 0, 2]},
                    "-",
                    {"$substr": [{"$dayOfMonth": "$time"}, 0, 2]},
                ]
            },
            "state": "$needed_field.state"
        },
        "count": {"$sum": 1}
    }}
]
(a little messy, sorry)
This query returns me something up to 3/29 for some reason. Am I not setting my match correctly?

It turns out I was looking at a different time field: needed_field contains its own time, and that nested field is the one $match was filtering on, while $group used the top-level $time.
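One way to avoid this kind of mismatch is to derive both the $match range and the $group date parts from a single field path. The sketch below assumes the standard library's timezone.utc instead of dateutil's tzutc, uses a half-open range (midnight up to, but excluding, the next midnight), and leaves out the $unwind stage and the state key for brevity. daily_count_stages is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def daily_count_stages(time_field, year, month, day):
    """Build $match and $group stages that both use the same time field."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    path = "$" + time_field  # field path used inside aggregation expressions
    return [
        # Half-open range: start <= time < end
        {"$match": {time_field: {"$gte": start, "$lt": end}}},
        {"$group": {
            "_id": {
                "year": {"$year": path},
                "month": {"$month": path},
                "day": {"$dayOfMonth": path},
            },
            "count": {"$sum": 1},
        }},
    ]


stages = daily_count_stages("time", 2016, 3, 14)
print(stages[0])
```

Passing the field name once means the filter and the grouping cannot drift onto different time fields.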
