MongoDB: count number of matches for OR condition query

Given a MongoDB collection with nested documents
collection = client.test.or_example
new_documents = [
{'_id': 1, 'proportions':{'A': 0.3, 'B': 0.1}},
{'_id': 2, 'proportions':{'C': 0.3, 'D': 0.1}},
{'_id': 3, 'proportions':{'A': 0.3, 'C': 0.3}},
{'_id': 4, 'proportions':{'B': 0.1, 'D': 0.3}},
{'_id': 5, 'proportions':{'A': 0.1, 'B': 0.3}}]
collection.insert_many(new_documents)
I can construct a query that uses OR conditions
collection.find({'$or': [{'proportions.A': {'$gt': 0.2}},
{'proportions.B': {'$gt': 0.2}},
{'proportions.C': {'$gt': 0.2}}]})
which returns four documents (ids 1, 2, 3, 5). Now I'd like to sort these documents by the number of OR conditions they satisfy, so 3, 1, 2, 5 (with respective number of matches 2, 1, 1, 1).
I've been experimenting with counting the number of OR matches in an aggregation pipeline, but can't get it to work. I've managed to create a related field "coverage", but my current attempt at "number_matches" isn't valid syntax.
results = collection.aggregate([
{
'$match': {'$or': [{'proportions.A': {'$gt': 0.2}},
{'proportions.B': {'$gt': 0.2}},
{'proportions.C': {'$gt': 0.2}}]}
},
{
'$addFields':
{
'coverage': {'$sum': [ '$proportions.A', '$proportions.B', '$proportions.C']},
'number_matches': {'$sum': [ {'cond': [{'proportions.A': {'$gt': 0.2}}, 1, 0]},
{'cond': [{'proportions.B': {'$gt': 0.2}}, 1, 0]},
{'cond': [{'proportions.C': {'$gt': 0.2}}, 1, 0]} ] }
}
},
{
'$sort': {'number_matches': -1}
}
])
Also, my current try feels rather convoluted, so I hope there might be a simpler way.
I'm looking for a solution that works with MongoDB 3.4, but in case there's a more elegant or faster solution for 3.6, I'd also be interested in that.

You may need to add an intermediate step:
1. Execute the $match.
2. Create a match flag for each proportion (matchA, matchB, matchC).
3. Sum the match flags.
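The steps above can be sketched in Python roughly as follows. This is a minimal sketch rather than a tested solution: the helper name build_match_count_pipeline is hypothetical, the _id tie-breaker in the sort is an added assumption, and the multi-key sort relies on Python 3.7+ dict ordering. The main change from the attempt in the question is that aggregation expressions use `$cond` with the operator-first form `{'$gt': ['$field', value]}` instead of query-style conditions.

```python
def build_match_count_pipeline(fields, threshold):
    """Build a pipeline that counts how many OR conditions each document meets."""
    paths = ['proportions.' + name for name in fields]
    # Step 1: keep only documents that satisfy at least one condition.
    match_stage = {'$match': {'$or': [{p: {'$gt': threshold}} for p in paths]}}
    # Steps 2 and 3: one 0/1 flag per condition, summed into number_matches.
    flags = [{'$cond': [{'$gt': ['$' + p, threshold]}, 1, 0]} for p in paths]
    add_stage = {'$addFields': {'number_matches': {'$sum': flags}}}
    # Most matches first; _id breaks ties (an assumption, not in the question).
    sort_stage = {'$sort': {'number_matches': -1, '_id': 1}}
    return [match_stage, add_stage, sort_stage]


pipeline = build_match_count_pipeline(['A', 'B', 'C'], 0.2)
# With a live connection: results = collection.aggregate(pipeline)
```

Both `$addFields` and `$sum` over a list of expressions are available in MongoDB 3.4, so this should not require 3.6.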

Related

Mongodb aggregate frequencies of every field (dichotomous) in one query

I am fairly new to MongoDB and am trying to develop a good way of evaluating a so-called multiple-choice question.
The data looks like this:
db.test.insertMany([
{
Q2_1: 1,
Q2_2: -77,
Q2_3: 1
},
{
Q2_1: -77,
Q2_2: -77,
Q2_3: 1
},
{
Q2_1: 1,
Q2_2: 0,
Q2_3: 0
},
{
Q2_1: 0,
Q2_2: 1,
Q2_3: 0
}
])
In this example we have 4 probands who gave answers to 3 items.
Every field can contain one of three values: -77, 0, or 1.
-77: the proband did not see the item, so it counts toward neither 'base' nor 'abs'.
0: the proband saw the item but did not choose it (counts toward 'base' but not 'abs').
1: the proband saw the item and chose it (counts toward both 'base' and 'abs').
Now I want a result for every item, with the key taken from the field name (Q2_1 has the key 1, and so on).
Item 1 was seen by 3 probands, so its 'base' is 3.
It was chosen by 2 probands, so its 'abs' is 2.
Therefore its 'perc' is 2/3, i.e. 0.666666.
expected result array:
[
{
"key": 1,
"abs": 2,
"base": 3,
"perc": 0.6666666666
},
{
"key": 2,
"abs": 1,
"base": 2,
"perc": 0.5
},
{
"key": 3,
"abs": 2,
"base": 4,
"perc": 0.5
}
]
Is it possible to do this evaluation in one aggregation query and get this expected result array?
thanks for help :-)
Query
$objectToArray to turn the field names into data (you should not store data in field names; field names are for the schema only, and the MongoDB query language is not designed for data stored in field names)
$unwind and $replaceRoot
$group with two condition-based accumulators, base and abs
add perc and fix the key: split on _ and take the second part
sort by key
*The query is longer because data stored in field names does not work well in MongoDB, so try to avoid it.
db.test.aggregate(
[{"$unset": ["_id"]}, {"$set": {"data": {"$objectToArray": "$$ROOT"}}},
{"$unwind": "$data"}, {"$replaceRoot": {"newRoot": "$data"}},
{"$group":
{"_id": "$k",
"base": {"$sum": {"$cond": [{"$eq": ["$v", -77]}, 0, 1]}},
"abs": {"$sum": {"$cond": [{"$eq": ["$v", 1]}, 1, 0]}}}},
{"$set": {"key": {"$toInt": {"$arrayElemAt": [{"$split": ["$_id", "_"]}, 1]}}}},
{"$set": {"_id": "$$REMOVE", "perc": {"$divide": ["$abs", "$base"]}}},
{"$sort": {"key": 1}}])
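To check the logic of the $group stage against the expected result, the same computation can be written in plain Python. This is only an illustrative sketch: the function name evaluate_items and the NOT_SEEN constant are hypothetical, and the guard against an empty base is an addition (in the pipeline, $divide raises an error when the divisor is zero).

```python
from collections import defaultdict

NOT_SEEN = -77  # marker for "proband did not see the item"


def evaluate_items(docs):
    """Compute abs, base and perc per item, keyed by the number after '_'."""
    totals = defaultdict(lambda: {'abs': 0, 'base': 0})
    for doc in docs:
        for field, value in doc.items():
            if field == '_id':
                continue
            key = int(field.split('_')[1])  # 'Q2_1' -> 1
            if value != NOT_SEEN:
                totals[key]['base'] += 1    # item was seen
            if value == 1:
                totals[key]['abs'] += 1     # item was chosen
    return [
        {'key': key, 'abs': t['abs'], 'base': t['base'],
         'perc': t['abs'] / t['base'] if t['base'] else None}
        for key, t in sorted(totals.items())
    ]


docs = [
    {'Q2_1': 1, 'Q2_2': -77, 'Q2_3': 1},
    {'Q2_1': -77, 'Q2_2': -77, 'Q2_3': 1},
    {'Q2_1': 1, 'Q2_2': 0, 'Q2_3': 0},
    {'Q2_1': 0, 'Q2_2': 1, 'Q2_3': 0},
]
result = evaluate_items(docs)
```

For the four sample documents this gives abs/base pairs of 2/3, 1/2 and 2/4 for items 1, 2 and 3, matching the expected result array in the question.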

MongoDB aggregate query for values in an array

So I have data that looks like this:
{
_id: 1,
ranking: 5,
tags: ['Good service', 'Clean room']
}
Each of these stands for a review. There can be multiple reviews with a ranking of 5. The tags field can be filled with up to 4 different tags.
The 4 tags are: 'Good service', 'Good food', 'Clean room', 'Need improvement'
I want to make a MongoDB aggregate query that says: for each ranking (1-5), give me the number of times each tag occurred for that ranking.
So an example result might look like this, _id being the ranking:
[
{ _id: 5,
totalCount: 5,
tags: {
goodService: 1,
goodFood: 3,
cleanRoom: 1,
needImprovement: 0
}
},
{ _id: 4,
totalCount: 7,
tags: {
goodService: 0,
goodFood: 2,
cleanRoom: 3,
needImprovement: 0
}
},
...
]
I'm having trouble counting the occurrences of each tag. Any help would be appreciated.
You can try the aggregation below.
db.colname.aggregate([
{"$unwind":"$tags"},
{"$group":{
"_id":{
"ranking":"$ranking",
"tags":"$tags"
},
"count":{"$sum":1}
}},
{"$group":{
"_id":"$_id.ranking",
"totalCount":{"$sum":"$count"},
"tags":{"$push":{"tags":"$_id.tags","count":"$count"}}
}}
])
To get key-value pairs instead of an array, you can replace $push with $mergeObjects (available from version 3.6):
"tags":{"$mergeObjects":{"$arrayToObject":[[["$_id.tags","$count"]]]}}
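The two $group stages can be mirrored in plain Python to see what the pipeline produces. This is a rough sketch, not the answer's code: the function name count_tags_by_ranking is hypothetical, and the reviews with _id 2 and 3 are made-up sample data. Note that totalCount here is the number of tag occurrences per ranking, not the number of reviews, and that tags which never occur for a ranking are simply absent rather than reported as 0.

```python
from collections import Counter, defaultdict


def count_tags_by_ranking(reviews):
    """Emulate $unwind on tags followed by the two $group stages."""
    # $unwind + first $group: count each (ranking, tag) pair.
    pair_counts = Counter(
        (review['ranking'], tag)
        for review in reviews
        for tag in review['tags']
    )
    # Second $group: regroup by ranking, keeping per-tag counts and their sum.
    grouped = defaultdict(lambda: {'totalCount': 0, 'tags': {}})
    for (ranking, tag), count in pair_counts.items():
        grouped[ranking]['totalCount'] += count
        grouped[ranking]['tags'][tag] = count
    return dict(grouped)


reviews = [
    {'_id': 1, 'ranking': 5, 'tags': ['Good service', 'Clean room']},
    {'_id': 2, 'ranking': 5, 'tags': ['Good food', 'Clean room']},  # hypothetical
    {'_id': 3, 'ranking': 4, 'tags': ['Good food']},                # hypothetical
]
summary = count_tags_by_ranking(reviews)
```

If zero counts or camelCase keys such as goodService are needed, they would have to be filled in afterwards, either in application code or in an extra pipeline stage.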

How can I merge two documents in MongoDB

I have two documents in one collection.
{id: 1, list_data: [1, 2, 4, 5]}
{id: 1, list_data: [2, 5, 8, 9]}
I want to merge this data into one document.
{id: 1, list_data: [1, 2, 4, 5, 8, 9]}
How can I do this job?
Please help me.
Thanks.
According to the MongoDB documentation:
Aggregation operations group values from multiple documents together,
and can perform a variety of operations on the grouped data to return
a single result.
Please refer to the aggregation query below.
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: {
path:'$list_data'
}
},
// Stage 2
{
$group: {
_id:{id:'$id'},
list_data:{$addToSet:'$list_data'}
}
},
// Stage 3
{
$project: {
'_id.id':1,
"list_data":1
}
},
]
);
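In the above query, the documents are processed through multiple stages of the aggregation pipeline: $unwind emits one document per list_data element, $group collects the distinct values per id with $addToSet, and $project selects the output fields.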
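The effect of $unwind followed by $group with $addToSet can be illustrated in plain Python. This is a sketch under assumptions: the function name merge_lists_by_id is hypothetical, and the sorting step is an addition, since $addToSet does not guarantee the order of elements in the resulting array.

```python
def merge_lists_by_id(docs):
    """Union the list_data values of all documents that share the same id."""
    merged = {}
    for doc in docs:
        # A set drops duplicates, like $addToSet after $unwind.
        merged.setdefault(doc['id'], set()).update(doc['list_data'])
    # Sorting is added here only to get a stable, readable order.
    return [{'id': key, 'list_data': sorted(values)}
            for key, values in merged.items()]


docs = [
    {'id': 1, 'list_data': [1, 2, 4, 5]},
    {'id': 1, 'list_data': [2, 5, 8, 9]},
]
merged_docs = merge_lists_by_id(docs)
```

If the order matters in the pipeline itself, the merged array would need to be sorted in a later stage or in application code.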

How to query items in mongodb with a particular dictionary including at least one non zero value?

Suppose I have the following item structure:
{
"_id": "12325523623453254",
"blas": {
"blaA": 0,
"blaB": 0,
"blaC": 0,
"blaD": 1,
}
}
I'd like to find the items whose "blas" dictionary includes at least one non-zero value.
You can do this with an $or query that uses dot notation in the keys to access the fields within blas:
db.test.find({$or: [
{'blas.blaA': {$ne: 0}},
{'blas.blaB': {$ne: 0}},
{'blas.blaC': {$ne: 0}},
{'blas.blaD': {$ne: 0}}
]})
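When the list of keys is long or only known at run time, the same $or filter can be generated instead of typed out. The sketch below assumes a Python driver such as pymongo; the helper name any_nonzero_filter is hypothetical. One caveat: $ne also matches documents in which the field is missing altogether, so a document without blas.blaA would satisfy the first condition.

```python
def any_nonzero_filter(parent, keys):
    """Build an $or filter matching documents where any parent.key is not 0."""
    return {'$or': [{parent + '.' + key: {'$ne': 0}} for key in keys]}


query = any_nonzero_filter('blas', ['blaA', 'blaB', 'blaC', 'blaD'])
# With a live connection: cursor = collection.find(query)
```

The resulting dictionary has the same shape as the shell query above, so it can be passed to find() unchanged.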

Pymongo $size operator

Is there an equivalent of the $size operator for query conditions in pymongo?
For example,
{'a': {'$size': 3}}
for {a: [1,2,3]}
I don't quite understand your question, but if you're asking if db.foo.find({a: {$size: 3}}) would return the document {a: [1, 2, 3]}, then the answer is yes.