Finding all docs that match minimum amount of array values - mongodb

This is my first time writing a MongoDB query; I managed to get to a certain point but am now stuck. I looked into the $match property but am not sure it is relevant.
The query below will return all the user documents that contain at least one given role.
roles := []string{"admin", "super_admin", "manager", "student"}
a.db.Collection("users").Find(ctx, bson.M{"roles": bson.M{"$in": roles}})
// db.users.find({roles: { $in: ["admin", "super_admin", "manager", "student"] }})
What I need now is to specify minimum matching criteria. For example, the user document must match at least 2 of the given roles (it doesn't matter which ones). I expect I will need something like the EQ, GTE, GT, LT, LTE operators.
Update
It is ok to just handle minimum match so happy to ignore all the listed operators above.

I am not sure there is any other easy way to achieve this. You can use aggregation operators in find() if you are using MongoDB v4.4+, or you can use aggregate(). I don't know the Go syntax, but I can do it as a MongoDB driver query:
$reduce to iterate over the roles array: set the initial value to 0, then check whether the current role is in your input roles; if so, add one to the accumulated value, otherwise return the existing value
check the result with $gte to require that the count is at least 2
db.users.find({
  $expr: {
    $gte: [
      {
        $reduce: {
          input: "$roles",
          initialValue: 0,
          in: {
            $cond: [
              { $in: ["$$this", ["admin", "super_admin", "manager", "student"]] },
              { $add: ["$$value", 1] },
              "$$value"
            ]
          }
        }
      },
      2 // input your number
    ]
  }
})
Playground
Using aggregate():
Playground
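For readers unfamiliar with $reduce, the counting step can be sketched in plain JavaScript. This is a rough equivalent for illustration only; `countMatchingRoles` is a hypothetical helper, not part of any driver.

```javascript
// Plain-JS sketch of the $reduce counting logic (illustrative data).
const wantedRoles = ["admin", "super_admin", "manager", "student"];

function countMatchingRoles(docRoles, wanted) {
  // Mirrors $reduce: start at 0, add 1 whenever the current
  // element is found in the wanted list, otherwise keep the value.
  return docRoles.reduce(
    (value, role) => (wanted.includes(role) ? value + 1 : value),
    0
  );
}

// A document with roles ["admin", "guest"] matches only once,
// so it fails a minimum-match threshold of 2.
const matches = countMatchingRoles(["admin", "guest"], wantedRoles);
console.log(matches >= 2); // false
```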

Following up on the accepted answer, this is the Go version of it, for those who might need a reference.
filter := bson.M{
    "$expr": bson.M{
        "$gte": []interface{}{
            bson.M{
                "$reduce": bson.M{
                    "input":        "$roles",
                    "initialValue": 0,
                    "in": bson.M{
                        "$cond": []interface{}{
                            bson.M{
                                "$in": []interface{}{
                                    "$$this",
                                    roles, // This is your []string slice
                                },
                            },
                            bson.M{
                                "$add": []interface{}{
                                    "$$value",
                                    1,
                                },
                            },
                            "$$value",
                        },
                    },
                },
            },
            minMatch, // This is the limit
        },
    },
}

Related

MongoDB aggregate different collections

I've been trying to do this aggregation since yesterday with no luck; hope you guys can give me some ideas. A little background: I have 2 collections, one for results and another for questions. Results is basically where people solve questions, so it can have 1 question or up to 99 if I'm not mistaken. This is the simplified schema:
Results Schema:
_id: ObjectId("6010664ac5c4f77f26f5d005")
questions: [
  {
    _id: ObjectId("5d2627c94bb703bfcc910763"),
    correct: false
  },
  {
    _id: ObjectId("5d2627c94bb703bfcc910764"),
    correct: true
  },
  {
    _id: ObjectId("5d2627c94bb703bfcc910765"),
    correct: true
  }
]
So, on this specific object, the user answered 3 questions and got 2 of them correct.
Questions Schema:
_id: ObjectId("5d2627c94bb703bfcc910763")
What I'm struggling to do is: for each element in the Questions schema, check whether the question was answered, i.e. whether an _id in the questions array equals an _id in the Questions schema (yes, multiple entries in the array can share the same _id, but each Questions schema _id is unique). If the question was answered, I need to check whether correct = true; if so, I add it to a correctAnswer variable, and if not, to wrongAnswer.
I've tried many things so far with conditions, group to get the $sum of correct and wrong answers, lookup to join both collections but so far, I can't even get to show just the aggregation result.
One of the things I tried (I was taking baby steps first), though as mentioned before I couldn't even get the result printed:
Result.aggregate([
  {
    $lookup: {
      from: 'question',
      localField: 'questions._id',
      foreignField: '_id',
      as: 'same'
    }
  }
])
But this gets me both collections combined, and 'same' comes as empty array, tried using match with also no luck.
I also did a $project to just get the information I wanted
$project: {
_id: 0,
questions: {
_id: 1,
correct: 1
}
},
Tried using $group:
$group: {
_id: "$_id",
$cond: {if: {"$correct": { $eq: true}}, then: {testField: {$sum: 1}}, else: {testField: 0}}
}
And as I said, I was just taking baby steps, so the testField was being set manually; I also tried many other things from Stack Overflow.
Would appreciate the help and sorry for the very long text, just wanted to put in some examples that I did and tried.
TLDR: Need to find a question from the Results Schema where _id matches an _id from the Questions Schema, if there is, check if correct: true or correct: false. Update Questions Schema accordingly with how many were correct and how many were wrong for each question from Questions Schema.
Example: newField: {correctAnswer: 4, wrongAnswer: 3} so in this case, there were 7 questions from the Result schema question array that matched an _id from Question Schema, 4 had correct: true and 3 had correct: false. Then it goes on like this for the rest of Question Schema
For a scenario where "$lookup" can't be used because the Question collection is in a different database, the Result collection may be used to generate output documents to update the Question collection.
Here's one way to do it.
db.Result.aggregate([
  {
    "$unwind": "$questions"
  },
  {
    "$group": {
      "_id": "$questions._id",
      "correct": {
        "$push": "$questions.correct"
      }
    }
  },
  {
    "$project": {
      "newField": {
        "correctAnswers": {
          "$size": {
            "$filter": {
              "input": "$correct",
              "as": "bool",
              "cond": "$$bool"
            }
          }
        },
        "wrongAnswers": {
          "$size": {
            "$filter": {
              "input": "$correct",
              "as": "bool",
              "cond": { "$not": "$$bool" }
            }
          }
        }
      }
    }
  }
])
Try it on mongoplayground.net.
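What the $unwind / $group / $project stages compute can be reproduced in plain JavaScript as a sanity check (the sample result documents below are made up for illustration):

```javascript
// Plain-JS sketch of the pipeline above: gather each question's
// `correct` booleans, then count true and false entries.
const results = [
  { questions: [ { _id: "q1", correct: false }, { _id: "q2", correct: true } ] },
  { questions: [ { _id: "q1", correct: true } ] }
];

const byQuestion = {}; // _id -> array of booleans, like the $push stage
for (const r of results) {
  for (const q of r.questions) {
    (byQuestion[q._id] = byQuestion[q._id] || []).push(q.correct);
  }
}

// $filter + $size: count true and false entries per question
const output = Object.entries(byQuestion).map(([id, bools]) => ({
  _id: id,
  newField: {
    correctAnswers: bools.filter(b => b).length,
    wrongAnswers: bools.filter(b => !b).length
  }
}));
// q1 -> { correctAnswers: 1, wrongAnswers: 1 }, q2 -> { correctAnswers: 1, wrongAnswers: 0 }
```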
I don't know of a way to "$lookup" and update at the same time. There's probably a better way to do this, but the aggregation pipeline below creates the documents that could be used in a subsequent update. The pipeline correctly counts repeat questions by a single Result _id, in case someone keeps trying a question until they get it right. One possible issue is that if a question has no Result answers, then no "newField": { "correctAnswers": 0, "wrongAnswers": 0 } document is created.
db.Question.aggregate([
  {
    // lookup documents in Result that match _id
    "$lookup": {
      "from": "Result",
      "localField": "_id",
      "foreignField": "questions._id",
      "as": "results"
    }
  },
  {
    // unwind everything
    "$unwind": "$results"
  },
  {
    // more of everything
    "$unwind": "$results.questions"
  },
  {
    // only keep answers that match question
    "$match": {
      "$expr": { "$eq": [ "$_id", "$results.questions._id" ] }
    }
  },
  {
    // reassemble and count correct/wrong answers
    "$group": {
      "_id": "$_id",
      "correct": {
        "$sum": {
          "$cond": [ { "$eq": [ "$results.questions.correct", true ] }, 1, 0 ]
        }
      },
      "wrong": {
        "$sum": {
          "$cond": [ { "$eq": [ "$results.questions.correct", false ] }, 1, 0 ]
        }
      }
    }
  },
  {
    // project what you want as output
    "$project": {
      newField: {
        correctAnswers: "$correct",
        wrongAnswers: "$wrong"
      }
    }
  }
])
Try it on mongoplayground.net.
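As for the "subsequent update" mentioned above, one way (a sketch, not from the original answer) is to turn the pipeline's output documents into bulkWrite operations. The `aggOutput` sample below just mirrors the shape the pipeline produces:

```javascript
// Hypothetical follow-up: build bulk update operations for the
// Question collection from the aggregation output.
const aggOutput = [
  { _id: "q1", newField: { correctAnswers: 4, wrongAnswers: 3 } },
  { _id: "q2", newField: { correctAnswers: 1, wrongAnswers: 0 } }
];

const ops = aggOutput.map(doc => ({
  updateOne: {
    filter: { _id: doc._id },
    update: { $set: { newField: doc.newField } }
  }
}));
// ops could then be passed to db.Question.bulkWrite(ops)
```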

Comparing number values as strings in MongoDb

I am currently trying to compare two strings which are numbers in a find query. I need to use strings because the numbers would cause an overflow in Javascript if I saved them as such.
The value to compare against comes via an API call, and the stored value is inside an array:
const auction = await this.findOne({
  $and: [{
    $or: [
      { 'bids.amount': amount },
      { 'bids.signature': signature },
      { 'bids.amount': { $gte: amount } }
    ]
  }, { 'tokenId': tokenId }, { 'isActive': true }]
});
How would I change the query in order to handle the strings as numbers, so my comparison would actually be correct?
The below query assumes that bids is an array like
bids = [{"amount": "1224245", "signature": "234454523"}, ...]
If signature is not a number, remove the $toLong from signature.
aggregate([
  { "$match": {
    "$expr": {
      "$and": [
        { "$eq": [ "$tokenId", "tokenId_VAR" ] },
        { "$eq": [ "$isActive", true ] },
        { "$reduce": {
          "input": "$bids",
          "initialValue": false,
          "in": {
            "$or": [
              "$$value",
              { "$gte": [ { "$toLong": "$$this.amount" }, "amount_VAR" ] },
              { "$eq": [ { "$toLong": "$$this.signature" }, "signature_VAR" ] }
            ]
          }
        }}
      ]
    }
  }}
])
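The reason $toLong is needed at all: both JavaScript and MongoDB compare strings lexicographically, character by character, which gives wrong answers for numbers of different lengths. A two-line demonstration:

```javascript
// Why string comparison is wrong for numbers: "9" sorts after "10"
// because the first characters "9" and "1" are compared.
const asStrings = "9" > "10"; // true  -- lexicographic
const asNumbers = 9 > 10;     // false -- the numeric answer
console.log(asStrings, asNumbers);
```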

Inner Join on two Fields

I have the following schemas
var User = mongoose.Schema({
  email: { type: String, trim: true, index: true, unique: true, sparse: true },
  password: String,
  name: { type: String, trim: true, index: true, unique: true, sparse: true },
  gender: String,
});
var Song = Schema({
  track: { type: Schema.Types.ObjectId, ref: 'track' }, // Track can be deleted
  author: { type: Schema.Types.ObjectId, ref: 'user' },
  url: String,
  title: String,
  photo: String,
  publishDate: Date,
  views: [{ type: Schema.Types.ObjectId, ref: 'user' }],
  likes: [{ type: Schema.Types.ObjectId, ref: 'user' }],
  collaborators: [{ type: Schema.Types.ObjectId, ref: 'user' }],
});
I want to select all users (without the password value), but I want each user to include all the songs where he is the author or one of the collaborators and that were published in the last 2 weeks.
What is the best strategy to perform this action (binding between user._id and song.collaborators)? Can it be done in one query?
It's very possible in one request, and the basic tool for this with MongoDB is $lookup.
I would think this actually makes more sense to query from the Song collection instead, since your criteria is that they must be listed in one of two properties on that collection.
Optimal INNER Join - Reversed
Presuming the actual "model" names are what is listed above:
var today = Date.now(),
    oneDay = 1000 * 60 * 60 * 24,
    twoWeeksAgo = new Date(today - ( oneDay * 14 ));
var userIds; // Should be assigned as an 'Array`, even if only one
Song.aggregate([
  { "$match": {
    "$or": [
      { "author": { "$in": userIds } },
      { "collaborators": { "$in": userIds } }
    ],
    "publishedDate": { "$gt": twoWeeksAgo }
  }},
  { "$addFields": {
    "users": {
      "$setIntersection": [
        userIds,
        { "$setUnion": [ ["$author"], "$collaborators" ] }
      ]
    }
  }},
  { "$lookup": {
    "from": User.collection.name,
    "localField": "users",
    "foreignField": "_id",
    "as": "users"
  }},
  { "$unwind": "$users" },
  { "$group": {
    "_id": "$users._id",
    "email": { "$first": "$users.email" },
    "name": { "$first": "$users.name" },
    "gender": { "$first": "$users.gender" },
    "songs": {
      "$push": {
        "_id": "$_id",
        "track": "$track",
        "author": "$author",
        "url": "$url",
        "title": "$title",
        "photo": "$photo",
        "publishedDate": "$publishedDate",
        "views": "$views",
        "likes": "$likes",
        "collaborators": "$collaborators"
      }
    }
  }}
])
That to me is the most logical course as long as it's an "INNER JOIN" you want from the results, meaning that "all users MUST have a mention on at least one song" in the two properties involved.
The $setUnion takes the "unique list" ( ObjectId is unique anyway ) of combining those two. So if an "author" is also a "collaborator" then they are only listed once for that song.
The $setIntersection "filters" the list from that combined list to only those that were specified in the query condition. This removes any other "collaborator" entries that would not have been in the selection.
The $lookup does the "join" on that combined data to get the users, and the $unwind is done because you want the User to be the main detail. So we basically reverse the "array of users" into "array of songs" in the result.
Also, since the main criteria is from Song, then it makes sense to query from that collection as the direction.
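The two set operations in the $addFields stage can be mirrored in plain JavaScript to see exactly what survives (illustrative string ids stand in for ObjectIds):

```javascript
// Plain-JS sketch of the $setUnion / $setIntersection steps.
const userIds = ["u1", "u2"];
const song = { author: "u1", collaborators: ["u3", "u1"] };

// $setUnion: unique combined list of author + collaborators,
// so an author who is also a collaborator is listed only once.
const combined = [...new Set([song.author, ...song.collaborators])];
// -> ["u1", "u3"]

// $setIntersection: keep only the users we queried for,
// dropping any other collaborators on the song.
const users = combined.filter(id => userIds.includes(id));
// -> ["u1"]
```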
Optional LEFT Join
Doing this the other way around is where the "LEFT JOIN" is wanted, being "ALL Users" regardless if there are any associated songs or not:
User.aggregate([
  { "$lookup": {
    "from": Song.collection.name,
    "localField": "_id",
    "foreignField": "author",
    "as": "authors"
  }},
  { "$lookup": {
    "from": Song.collection.name,
    "localField": "_id",
    "foreignField": "collaborators",
    "as": "collaborators"
  }},
  { "$project": {
    "email": 1,
    "name": 1,
    "gender": 1,
    "songs": { "$setUnion": [ "$authors", "$collaborators" ] }
  }}
])
So the listing of the statement "looks" shorter, but it is forcing "two" $lookup stages in order to obtain results for possible "authors" and "collaborators" rather than one. So the actual "join" operations can be costly in execution time.
The rest is pretty straightforward in applying the same $setUnion, but this time to the "result arrays" rather than the original source of the data.
If you wanted similar "query" conditions to above on the "filter" for the "songs" and not the actual User documents returned, then for LEFT Join you actually $filter the array content "post" $lookup:
User.aggregate([
  { "$lookup": {
    "from": Song.collection.name,
    "localField": "_id",
    "foreignField": "author",
    "as": "authors"
  }},
  { "$lookup": {
    "from": Song.collection.name,
    "localField": "_id",
    "foreignField": "collaborators",
    "as": "collaborators"
  }},
  { "$project": {
    "email": 1,
    "name": 1,
    "gender": 1,
    "songs": {
      "$filter": {
        "input": { "$setUnion": [ "$authors", "$collaborators" ] },
        "as": "s",
        "cond": {
          "$and": [
            { "$setIsSubset": [
              userIds,
              { "$setUnion": [ ["$$s.author"], "$$s.collaborators" ] }
            ]},
            { "$gte": [ "$$s.publishedDate", twoWeeksAgo ] }
          ]
        }
      }
    }
  }}
])
Which would mean that by LEFT JOIN Conditions, ALL User documents are returned but the only ones which will contain any "songs" will be those that met the "filter" conditions as being part of the supplied userIds. And even those users which were contained in the list will only show those "songs" within the required range for publishedDate.
The main addition within the $filter is the $setIsSubset operator, which is a short way of comparing the supplied list in userIds to the "combined" list from the two fields present in the document. Noting here that the "current user" already had to be "related" due to the earlier conditions of each $lookup.
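The $setIsSubset check itself is simple to express in plain JavaScript, which may make its behavior clearer (illustrative ids):

```javascript
// $setIsSubset in plain JS: true when every element of the first
// list also appears in the second.
const isSubset = (a, b) => a.every(x => b.includes(x));

console.log(isSubset(["u1"], ["u1", "u3"]));       // true
console.log(isSubset(["u1", "u2"], ["u1", "u3"])); // false
```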
MongoDB 3.6 Preview
A new "sub-pipeline" syntax available for $lookup from the MongoDB 3.6 release means that rather than "two" $lookup stages as shown for the LEFT Join variant, you can in fact structure this as a "sub-pipeline", which also optimally filters content before returning results:
User.aggregate([
  { "$lookup": {
    "from": Song.collection.name,
    "let": {
      "user": "$_id"
    },
    "pipeline": [
      { "$match": {
        "$or": [
          { "author": { "$in": userIds } },
          { "collaborators": { "$in": userIds } }
        ],
        "publishedDate": { "$gt": twoWeeksAgo },
        "$expr": {
          "$or": [
            { "$eq": [ "$$user", "$author" ] },
            { "$setIsSubset": [ ["$$user"], "$collaborators" ] }
          ]
        }
      }}
    ],
    "as": "songs"
  }}
])
And that is all there is to it in that case, since $expr allows usage of the $$user variable declared in "let" to be compared with each entry in the song collection to select only those that are matching in addition to the other query criteria. The result being only those matching songs per user or an empty array. Thus making the whole "sub-pipeline" simply a $match expression, which is pretty much the same as additional logic as opposed to fixed local and foreign keys.
So you could even add a stage to the pipeline following $lookup to filter out any "empty" array results, making the overall result an INNER Join.
So personally I would go for the first approach when you can and only use the second approach where you need to.
NOTE: There are a couple of options here that don't really apply as well. The first being a special case of $lookup + $unwind + $match coalescence in which whilst the basic case applies to the initial INNER Join example it cannot be applied with the LEFT Join Case.
This is because in order for a LEFT Join to be obtained, the usage of $unwind must be implemented with preserveNullAndEmptyArrays: true, and this breaks the rule of application in that the unwinding and matching cannot be "rolled up" within the $lookup and applied to the foreign collection "before" returning results.
Hence why it is not applied in the sample and we use $filter on the returned array instead, since there is no optimal action that can be applied to the foreign collection "before" the results are returned, and nothing stopping all results for songs matching on simply the foreign key from returning. INNER Joins are of course different.
The other case is .populate() with mongoose. The most important distinction being that .populate() is not a single request, but just a programming "shorthand" for actually issuing multiple queries. So at any rate, there would actually be multiple queries issued and always requiring ALL results in order to apply any filtering.
Which leads to the limitation on where the filtering is actually applied, and generally means that you cannot really implement "paging" concepts when you utilize "client side joins" that require conditions to be applied on the foreign collection.
There are some more details on this on Querying after populate in Mongoose, and an actual demonstration of how the basic functionality can be wired in as a custom method in mongoose schema's anyway, but actually using the $lookup pipeline processing underneath.

Finding documents based on the minimum value in an array

my document structure is something like :
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 12,
},
{
source: 'b',
value: 10,
},
...
]
},
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 24,
},
{
source: 'b',
value: 36,
},
...
]
}
the value of the various sources in options will keep getting updated on a frequent basis (every few minutes or hours);
assume the size of the options array doesn't change, i.e. no extra elements are added to the list
my queries are of the following type:
-find all documents where the min_value of the options falls between some limits.
I could first do an $unwind on options (and then take the min) and then run comparison queries, but I am new to Mongo and not sure how performance
is affected by the unwind operation. The number of documents of this type would be a few million.
Or does anyone have any suggestions around changing the document structure that could help me simplify this query? (Apart from creating separate documents per source - that would involve a lot of data duplication.)
Thanks!
Using $unwind is indeed quite expensive, most notably so with larger arrays, but there is a cost in all cases of usage. There are a couple of ways to approach this without needing $unwind and without real structural changes.
Pure Aggregation
In the basic case, as of the MongoDB 3.2.x release series, the $min operator can work directly on an array of values in a "projection" sense in addition to its standard grouping accumulator role. This means that, with the help of the related $map operator for processing elements of an array, you can get the minimal value without using $unwind:
db.collection.aggregate([
  // Still makes sense to use an index to select only possible documents
  { "$match": {
    "options": {
      "$elemMatch": {
        "value": { "$gte": minValue, "$lt": maxValue }
      }
    }
  }},
  // Provides a logical filter to remove non-matching documents
  { "$redact": {
    "$cond": {
      "if": {
        "$let": {
          "vars": {
            "min_value": {
              "$min": {
                "$map": {
                  "input": "$options",
                  "as": "option",
                  "in": "$$option.value"
                }
              }
            }
          },
          "in": { "$and": [
            { "$gte": [ "$$min_value", minValue ] },
            { "$lt": [ "$$min_value", maxValue ] }
          ]}
        }
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  // Optionally return the min_value as a field
  { "$project": {
    "min_value": {
      "$min": {
        "$map": {
          "input": "$options",
          "as": "option",
          "in": "$$option.value"
        }
      }
    }
  }}
])
The basic case for getting the "minimum" value from the array ( done inside of $let since we want to use the result "twice" in logical conditions; this helps us not repeat ourselves ) is to first extract the "value" data from the "options" array. This is done using $map.
The output of $map is an array with just those values, so this is supplied as the argument to $min, which then returns the minimum value for that array.
Using $redact is sort of like a $match pipeline stage with the difference that rather than needing a field to be "present" in the document being examined, you instead just form a logical condition with calculations.
In this case the condition is $and where "both" the logical forms of $gte and $lt return true against the calculated value ( from $let as "$$min_value" ).
The $redact stage then has the special arguments to apply to $$KEEP the document when the condition is true or $$PRUNE the document from results when it is false.
It's all very much like doing $project and then $match to actually project the value into the document before filtering in another stage, but all done in one stage. Of course you might actually want to $project the resulting field in what you return, but it generally cuts the workload if you remove non-matched documents "first" using $redact instead.
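The $map + $min combination is easy to check in plain JavaScript (sample document, not real data):

```javascript
// Plain-JS sketch: pull out the values, then take the minimum.
const doc = { options: [ { source: "a", value: 12 }, { source: "b", value: 10 } ] };

const values = doc.options.map(o => o.value); // $map -> [12, 10]
const minValue = Math.min(...values);         // $min -> 10

// The $redact condition then keeps the document only when
// minValue falls inside the requested range.
const keep = minValue >= 5 && minValue < 11;  // true
```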
Updating Documents
Of course I think the best option is to actually keep the "min_value" field in the document rather than work it out at run-time. So this is a very simple thing to do when adding to or altering array items during update.
For this there is the $min "update" operator. Use it when appending with $push:
db.collection.update(
  { "_id": id },
  {
    "$push": { "options": { "source": "a", "value": 9 } },
    "$min": { "min_value": 9 }
  }
)
Or when updating a value of an element:
db.collection.update(
  { "_id": id, "options.source": "a" },
  {
    "$set": { "options.$.value": 9 },
    "$min": { "min_value": 9 }
  }
)
If the current "min_value" in the document is greater than the argument in $min, or the key does not yet exist, then the value given will be written. If the argument is greater than or equal to the existing value, the existing value stays in place since it is already the smaller one.
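Those semantics can be summarized in a tiny plain-JS sketch (`applyMinUpdate` is a hypothetical helper, purely for illustration):

```javascript
// $min update semantics: the stored value only changes when the
// candidate is smaller, or when the field is missing entirely.
function applyMinUpdate(current, candidate) {
  return current === undefined ? candidate : Math.min(current, candidate);
}

console.log(applyMinUpdate(12, 9));        // 9  -- candidate wins
console.log(applyMinUpdate(7, 9));         // 7  -- existing value stays
console.log(applyMinUpdate(undefined, 9)); // 9  -- field created
```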
You can even set all your existing data with a simple "bulk" operations update:
var ops = [];
db.collection.find({ "min_value": { "$exists": false } }).forEach(function(doc) {
  // Queue operations
  ops.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      "update": {
        "$min": {
          "min_value": Math.min.apply(
            null,
            doc.options.map(function(option) {
              return option.value
            })
          )
        }
      }
    }
  });
  // Write once in 1000 documents
  if ( ops.length == 1000 ) {
    db.collection.bulkWrite(ops);
    ops = [];
  }
});
// Clear any remaining operations
if ( ops.length > 0 )
  db.collection.bulkWrite(ops);
Then with a field in place, it is just a simple range selection:
db.collection.find({
"min_value": {
"$gte": minValue, "$lt": maxValue
}
})
So it really should be in your best interests to keep a field ( or fields if you regularly need different conditions ) in the document since that provides the most efficient query.
Of course, the new functions of aggregation $min along with $map also make this viable to use without a field, if you prefer more dynamic conditions.

How to search embedded array

I want to get all matching values, using $elemMatch.
// create test data
db.foo.insert({values:[0,1,2,3,4,5,6,7,8,9]})
db.foo.find({}, {
  'values': {
    '$elemMatch': {
      '$gt': 3
    }
  }
});
My expected result is {values:[3,4,5,6,7,8,9]}, but the actual result is {values:[4]}.
I read the MongoDB documentation, and I understand this is the specification.
How do I retrieve multiple matching values?
Also, I use 'skip' and 'limit'.
Any ideas?
Using Aggregation:
db.foo.aggregate([
{$unwind:"$values"},
{$match:{"values":{$gt:3}}},
{$group:{"_id":"$_id","values":{$push:"$values"}}}
])
You can add further filter condition in the $match, if you would like to.
You can't achieve this using the $elemMatch operator since, as the MongoDB docs say:
The $elemMatch projection operator limits the contents of an array
field that is included in the query results to contain only the array
element that matches the $elemMatch condition.
Note
The elements of the array are documents.
If you look carefully at the documentation on $elemMatch, or its query counterpart the positional $ operator, you will see that only the "first" matched element is returned by this type of "projection".
What you are looking for is actually "manipulation" of the document contents where you want to "filter" the content of the array in the document rather than return the original or "matched" element, as there can be only one match.
For true "filtering" you need the aggregation framework, as there is more support there for document manipulation:
db.foo.aggregate([
  // No point selecting documents that do not match your condition
  { "$match": { "values": { "$gt": 3 } } },
  // Unwind the array to de-normalize as documents
  { "$unwind": "$values" },
  // Match to "filter" the array
  { "$match": { "values": { "$gt": 3 } } },
  // Group back to the array form
  { "$group": {
    "_id": "$_id",
    "values": { "$push": "$values" }
  }}
])
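As a sanity check of the logic (not the database behavior), the $unwind / $match / $group round trip amounts to a per-document array filter in plain JavaScript:

```javascript
// Plain-JS equivalent of unwinding, matching, and regrouping
// (sample data matches the question's document).
const docs = [ { _id: 1, values: [0,1,2,3,4,5,6,7,8,9] } ];

const filtered = docs.map(d => ({
  _id: d._id,
  values: d.values.filter(v => v > 3)
}));
// -> [ { _id: 1, values: [4,5,6,7,8,9] } ]
```

One difference worth noting: the aggregation drops documents that have no matching values at all (they never survive the $match), whereas this plain filter would keep them with an empty array.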
Or with modern versions of MongoDB from 2.6 and onwards, where the array values are "unique" you could do this:
db.foo.aggregate([
  { "$project": {
    "values": {
      "$setDifference": [
        { "$map": {
          "input": "$values",
          "as": "el",
          "in": {
            "$cond": [
              { "$gt": [ "$$el", 3 ] },
              "$$el",
              false
            ]
          }
        }},
        [false]
      ]
    }
  }}
])
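The $map / $setDifference trick can be sketched in plain JavaScript (illustrative data; a plain array filter stands in for $setDifference, which, as noted above, also assumes the values are unique):

```javascript
// Map non-matching elements to `false`, then remove the `false`
// entries from the result -- mirroring $map + $setDifference.
const values = [0,1,2,3,4,5,6,7,8,9];

const mapped = values.map(el => (el > 3 ? el : false));
// -> [false, false, false, false, 4, 5, 6, 7, 8, 9]

const result = mapped.filter(el => el !== false);
// -> [4, 5, 6, 7, 8, 9]
```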