MongoDB find document with most matched elements when using $in operator - mongodb

I have the following schema. It is an index of car parts that is used to automatically find part types from keywords.
nameEnglish: {
type: String
},
keywords: [{
type: String
}]
I have two documents in the database:
[
{ _id: 1, nameEnglish: 'abcdef', keywords: ['a', 'b', 'c', 'd', 'e', 'f'] },
{ _id: 2, nameEnglish: 'cde', keywords: ['c', 'd', 'e'] }
]
What I want to do now is query this collection for the document with the most matched keywords.
myArr = ['b', 'c', 'f'];
db.collection.find({ keywords: { $in: myArr } });
I want this to always return the first document, since it has 3 matched keywords, and the second has only one. How can I achieve this?

You can try the aggregation below. It uses $setIntersection to find the elements common to the keywords array and the input array, $size to count them, $sort (descending) to rank the documents by that count, and $limit 1 to output the most matched document.
You can drop the count field by adding a project exclusion, {$project:{count:0}}, as the final stage.
db.col.aggregate(
[{
$addFields: {
"count": {
$size: {
$setIntersection: ["$keywords", myArr]
}
}
}
}, {
$sort: {
"count": -1
}
}, {
$limit: 1
}]
)
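To make the ranking logic easier to follow, here is a minimal in-memory sketch in plain JavaScript, assuming the two sample documents from the question. It does not talk to MongoDB; the helper names `countMatches` and `bestMatch` are hypothetical and only mirror what the $setIntersection, $size, $sort and $limit stages compute.

```javascript
// Sample data copied from the question.
const docs = [
  { _id: 1, nameEnglish: "abcdef", keywords: ["a", "b", "c", "d", "e", "f"] },
  { _id: 2, nameEnglish: "cde", keywords: ["c", "d", "e"] },
];
const myArr = ["b", "c", "f"];

// Hypothetical helper: mirrors $size applied to $setIntersection.
// Using a Set means duplicate keywords are counted once, as in set semantics.
function countMatches(keywords, wanted) {
  const wantedSet = new Set(wanted);
  return new Set(keywords.filter((k) => wantedSet.has(k))).size;
}

// Hypothetical helper: mirrors $addFields, then $sort descending, then $limit 1.
function bestMatch(collection, wanted) {
  const ranked = collection
    .map((doc) => ({ ...doc, count: countMatches(doc.keywords, wanted) }))
    .sort((a, b) => b.count - a.count);
  return ranked.length > 0 ? ranked[0] : null;
}

const best = bestMatch(docs, myArr);
console.log(best._id, best.count); // prints: 1 3
```

Document 1 shares 'b', 'c' and 'f' with the input, document 2 shares only 'c', so document 1 is ranked first.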

You need to use aggregation with $setIntersection to get the number of matches, sort by the number of matches, and limit to the expected number of results.
db.col.aggregate([
{$addFields : {noOfMatch :{$size :{$setIntersection : ["$keywords", ['b', 'c', 'f']]}}}},
{$sort : {"noOfMatch" : -1}},
{$limit : 1}
])

Related

MongoDB Aggregation Pipeline filter based on properties of child objects within array

Given documents with the following structure:
{ 'id': 1, name: 'bob', type: 'foo', children: [{'id': 2}, {'id': 3}]}
{ 'id': 2, name: 'bob', type: 'foo' }
{ 'id': 3, name: 'bob', type: 'bar' }
{ 'id': 4, name: 'bob', type: 'foo', children: [{'id': 5}, {'id': 6}]}
{ 'id': 5, name: 'bob', type: 'foo' }
{ 'id': 6, name: 'bob', type: 'foo' }
How could I write an aggregate pipeline query to find all the documents where, if they have children, all the children are of type foo (and the parent is of type 'foo')?
Additional notes:
children is an array of objects with a property referencing other documents in the same collection
not all documents have children
changing the document structure is not an option
I've looked into $unwind and $lookup, but this results in many documents and I only want the parent document at the end of this.
After some additional toying with the aggregation pipeline API, here is one potential solution.
The steps are:
First $match based on the type criterion, to ensure only the parent documents with the appropriate type are used subsequently in the pipeline.
Perform a simple $lookup on the child documents. Although this doesn't appear to be explicitly documented, $lookup can use properties of nested objects in arrays with no difficulty.
Perform a final match on the resulting documents, making use of $elemMatch and some negation to achieve the desired effect.
Here's what that looks like using Robo3T (should be easily translated to other query clients):
Note: In this particular case, id is just a placeholder for whatever the documents are being joined on, it is not the "official" _id mongo field
db.getCollection('items').aggregate([
{ $match : { "type": "foo" } },
{
$lookup: {
from: "items",
localField: "children.id",
foreignField: "id",
as: "items"
}
},
{
$match : {
"items": {
$not: {
$elemMatch: {
"type": { $ne: "foo" }
}
}
}
}
}
])
This will exclude documents 1 and 3, since 3 has a type of "bar" and 1 includes it.
This may not be an optimal solution; I have not tested it on large datasets. Also, the final match using $elemMatch is quite messy, so recommendations for improvements on that are welcome.
I have written a solution; please take a look at the query. Solution checkup link: https://mongoplayground.net/p/cu7Mf8XZHDI
This query is similar to the question: MongoDB (Mongoose) how to returning all document fields using $elemMatch
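The three steps of the pipeline above can be sketched in plain JavaScript against the sample documents from the question. This is only an in-memory illustration of the logic, not a replacement for the aggregation; the function name `parentsWithOnlyFooChildren` is hypothetical.

```javascript
// Sample data copied from the question.
const items = [
  { id: 1, name: "bob", type: "foo", children: [{ id: 2 }, { id: 3 }] },
  { id: 2, name: "bob", type: "foo" },
  { id: 3, name: "bob", type: "bar" },
  { id: 4, name: "bob", type: "foo", children: [{ id: 5 }, { id: 6 }] },
  { id: 5, name: "bob", type: "foo" },
  { id: 6, name: "bob", type: "foo" },
];

// Hypothetical helper mirroring the three pipeline stages.
function parentsWithOnlyFooChildren(collection) {
  const byId = new Map(collection.map((doc) => [doc.id, doc]));
  return collection
    // Stage 1 ($match): keep only documents of type "foo".
    .filter((doc) => doc.type === "foo")
    // Stage 2 ($lookup): resolve children.id against the same collection.
    // Documents without children get an empty joined array.
    .map((doc) => ({
      doc,
      joined: (doc.children || [])
        .map((child) => byId.get(child.id))
        .filter((child) => child !== undefined),
    }))
    // Stage 3 ($not + $elemMatch): drop documents where any joined child
    // has a type other than "foo".
    .filter(({ joined }) => !joined.some((child) => child.type !== "foo"))
    .map(({ doc }) => doc.id);
}

const keptIds = parentsWithOnlyFooChildren(items);
console.log(keptIds); // prints: [ 2, 4, 5, 6 ]
```

Documents without children pass the last step because an empty joined array contains no non-"foo" element, which matches the "if they have children" wording of the question.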
db.collection.find({},
{
students: {
$filter: {
input: "$students",
as: "student",
cond: {
$or: [
{
$eq: [
"$$student.age",
8
]
},
{
$eq: [
"$$student.age",
15
]
}
]
}
}
}
})

MongoDB Find values passed in that don't match

Currently stuck with an issue using MongoDB aggregation. I have an array of '_ids' that I need to check exist in a specific collection.
Example:
I have 3 records in 'Collection 1' with _id 1,2,3. I can find the matching values using:
$match: {
_id: {
$in: [1, 2, 3, 4]
}
}
However, what I want to know is which of the values I passed in (1, 2, 3, 4) don't match a record. (In this case _id 4 will not have a matching record.)
So instead of returning the records with _id 1, 2, 3, it needs to return the _id that doesn't exist. In this example, '_id: 4'.
The query should also disregard any extra records in the collection. For example, if the collection held records with IDs 1-10 and I passed in a query to determine whether the _ids 1, 7, 15 existed, the value I'm expecting would be along the lines of '_id: 15 doesn't exist'.
My first thought was to use $project within an aggregation to hold each _id that was passed in, and then attach each record in the collection to the matching _id passed in. E.g.:
Record 1:
{
_id: 1,
Collection1: [
record details: ...,
...
...
]
},
{
_id: 2,
Collection1: [] // This _id passed in, doesn't have a matching collection
}
However, I can't seem to get a working example in this instance. Any help would be appreciated!
If the input documents are:
{ _id: 1 },
{ _id: 2 },
{ _id: 5 },
{ _id: 10 }
And the array to match is:
var INPUT_ARRAY = [ 1, 7, 15 ]
The following aggregation:
db.test.aggregate( [
{
$match: {
_id: {
$in: INPUT_ARRAY
}
}
},
{
$group: {
_id: null,
matches: { $push: "$_id" }
}
},
{
$project: {
ids_not_exist: { $setDifference: [ INPUT_ARRAY, "$matches" ] },
_id: 0
}
}
] )
Returns:
{ "ids_not_exist" : [ 7, 15 ] }
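The $setDifference step boils down to "input ids minus the ids that exist". A minimal in-memory sketch in plain JavaScript, assuming the sample ids above (the helper name `missingIds` is hypothetical and nothing here queries MongoDB):

```javascript
// Ids of the sample input documents and the array to check, as given above.
const existingIds = [1, 2, 5, 10];
const INPUT_ARRAY = [1, 7, 15];

// Hypothetical helper: mirrors $setDifference: [ INPUT_ARRAY, "$matches" ].
// Duplicates in the input are collapsed, as with set semantics.
function missingIds(inputIds, collectionIds) {
  const present = new Set(collectionIds);
  return [...new Set(inputIds)].filter((id) => !present.has(id));
}

const idsNotExist = missingIds(INPUT_ARRAY, existingIds);
console.log(idsNotExist); // prints: [ 7, 15 ]
```

One difference worth keeping in mind: if none of the input ids exist, the $match stage passes no documents on, so $group emits nothing and the aggregation returns an empty result, whereas this sketch returns the whole input list.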
Are you looking for $not ?
MDB Docs

Counting data per user with mongo aggregation framework

I have a collection, where each document contains user_ids as a property, which is an Array field. Example document(s) would be :
[{
_id: 'i3oi1u31o2yi12o3i1',
unique_prop: 33,
prop1: 'some string value',
prop2: 212,
user_ids: [1, 2, 3 ,4]
},
{
_id: 'i3oi1u88ffdfi12o3i1',
unique_prop: 34,
prop1: 'some string value',
prop2: 216,
user_ids: [2, 3 ,4]
},
{
_id: 'i3oi1u8834432ddsda12o3i1',
unique_prop: 35,
prop1: 'some string value',
prop2: 211,
user_ids: [2]
}]
My goal is to get number of documents per user, so sample output would be :
[
{user_id: 1, count: 1},
{user_id: 2, count: 3},
{user_id: 3, count: 2},
{user_id: 4, count: 2}
]
I've tried a couple of things, none of which worked. Lastly I tried:
aggregate([
{ $group: {
_id: { unique_prop: "$unique_prop"},
users: { "$addToSet": "$user_ids" },
count: { "$sum": 1 }
}}
])
But it just returned the users per document. I'm still trying to learn, so any resource or advice would help.
You need to $unwind the "user_ids" array and, in the $group stage, count the number of times each "id" appears in the collection.
db.collection.aggregate([
{ "$unwind": "$user_ids" },
{ "$group": { "_id": "$user_ids", "count": {"$sum": 1 }}}
])
MongoDB aggregation performs computations on groups of values from documents in a collection and returns the computed result by executing its stages in a pipeline.
Based on that description, please try executing the following aggregate query in the MongoDB shell.
db.collection.aggregate(
// Pipeline
[
// Stage 1
{
$unwind: "$user_ids"
},
// Stage 2
{
$group: {
_id:{user_id:'$user_ids'},
total:{$sum:1}
}
},
// Stage 3
{
$project: {
_id:0,
user_id:'$_id.user_id',
count:'$total'
}
},
]
);
In the aggregate query above, the $unwind operator first breaks the user_ids array of each document into one document per array element. The $group stage then groups those documents by the value of user_ids and counts the documents in each group, and the $project stage renames the results to user_id and count.
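As a rough illustration of what $unwind followed by $group computes, here is an in-memory sketch in plain JavaScript using the user_ids arrays from the question. The function name `countDocsPerUser` is hypothetical, and the final sort is added only to make the output order predictable.

```javascript
// user_ids arrays copied from the sample documents in the question.
const documents = [
  { unique_prop: 33, user_ids: [1, 2, 3, 4] },
  { unique_prop: 34, user_ids: [2, 3, 4] },
  { unique_prop: 35, user_ids: [2] },
];

// Hypothetical helper: the nested loop plays the role of $unwind,
// the Map update plays the role of $group with { $sum: 1 }.
function countDocsPerUser(collection) {
  const counts = new Map();
  for (const doc of collection) {
    for (const userId of doc.user_ids || []) {
      counts.set(userId, (counts.get(userId) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([user_id, count]) => ({ user_id, count }))
    .sort((a, b) => a.user_id - b.user_id);
}

const perUser = countDocsPerUser(documents);
console.log(perUser);
```

For the sample data this gives a count of 1 for user 1, 3 for user 2, and 2 each for users 3 and 4, which is the output the question asks for.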

Compare document array size to other document field

The document might look like:
{
_id: 'abc',
programId: 'xyz',
enrollment: 'open',
people: ['a', 'b', 'c'],
maxPeople: 5
}
I need to return all documents where enrollment is open and the length of people is less than maxPeople
I got this to work with $where:
const
exists = ['enrollment', 'maxPeople', 'people'],
query = _.reduce(exists, (existsQuery, field) => {
existsQuery[field] = {'$exists': true}; return existsQuery;
}, {});
query['$and'] = [{enrollment: 'open'}];
query['$where'] = 'this.people.length<this.maxPeople';
return db.coll.find(query, {fields: {programId: 1, maxPeople: 1, people: 1}});
But could I do this with aggregation, and why would it be better?
Also, if aggregation is better/faster, I don't understand how I could convert the above query to use aggregation. I'm stuck at:
db.coll.aggregate([
{$project: {ab: {$cmp: ['$maxPeople','$someHowComputePeopleLength']}}},
{$match: {ab:{$gt:0}}}
]);
UPDATE:
Based on @chridam's answer, I was able to implement a solution like so. Note the $and in the $match, for those of you who need a similar query:
return Coll.aggregate([
{
$match: {
$and: [
{"enrollment": "open"},
{"times.start.dateTime": {$gte: new Date()}}
]
}
},
{
"$redact": {
"$cond": [
{"$lt": [{"$size": "$students" }, "$maxStudents" ] },
"$$KEEP",
"$$PRUNE"
]
}
}
]);
The $redact pipeline operator in the aggregation framework should work for you in this case. It recursively descends through the document structure and acts at each level according to the conditions you specify. The concept can be a bit tricky to grasp, but basically the operator lets you process the logical condition with the $cond operator and use the special variables $$KEEP to "keep" the document where the condition is true or $$PRUNE to "remove" the document where the condition is false.
This operation is similar to having a $project pipeline that selects the fields in the collection and creates a new field holding the result of the logical condition, followed by a $match, except that $redact, which restricts the contents of the result set based on the access required to view the data, uses a single pipeline stage and is more efficient.
To run a query on all documents where enrollment is open and the length of people is less than maxPeople, include a $redact stage as in the following:
db.coll.aggregate([
{ "$match": { "enrollment": "open" } },
{
"$redact": {
"$cond": [
{ "$lt": [ { "$size": "$people" }, "$maxPeople" ] },
"$$KEEP",
"$$PRUNE"
]
}
}
])
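The condition that $redact evaluates per document can be sketched as a plain JavaScript predicate. This is an in-memory illustration only: the first document is the one from the question, the other two are hypothetical additions to show the rejected cases, and the helper name `hasOpenSeats` is made up. The type checks stand in for the $exists guards of the original $where query.

```javascript
const programs = [
  // Document from the question.
  { _id: "abc", programId: "xyz", enrollment: "open", people: ["a", "b", "c"], maxPeople: 5 },
  // Hypothetical: open but already full.
  { _id: "def", programId: "xyz", enrollment: "open", people: ["a", "b"], maxPeople: 2 },
  // Hypothetical: has room but enrollment is closed.
  { _id: "ghi", programId: "xyz", enrollment: "closed", people: [], maxPeople: 5 },
];

// Hypothetical helper: mirrors $match on enrollment plus the
// $lt: [ { $size: "$people" }, "$maxPeople" ] condition ($$KEEP when true).
function hasOpenSeats(doc) {
  return (
    doc.enrollment === "open" &&
    Array.isArray(doc.people) &&
    typeof doc.maxPeople === "number" &&
    doc.people.length < doc.maxPeople
  );
}

const openIds = programs.filter(hasOpenSeats).map((doc) => doc._id);
console.log(openIds); // prints: [ 'abc' ]
```

In the aggregation itself, $size expects an array, so documents where people is missing would need to be filtered out beforehand (for example with an $exists check in the $match stage).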
You can use:
1 $project that creates a new field holding the result of comparing the array size of people to maxPeople
1 $match that matches on the previous comparison result and on enrollment being open
The query is:
db.coll.aggregate([{
$project: {
_id: 1,
programId: 1,
enrollment: 1,
cmp: {
$cmp: ["$maxPeople", { $size: "$people" }]
}
}
}, {
$match: {
$and: [
{ cmp: { $gt: 0 } },
{ enrollment: "open" }
]
}
}])

How to aggregate a result by condition

I have a collection based on which I need to calculate a score which is as simple as this.
Calculate score for all students in my collection
If a student belongs to class 'A' or 'B', he gets a score of 5; else if he belongs to class 'C' or 'D', he gets 4
Student:
{
name:"Aster",
classes:['A','B']
}
Aggregation doesn't allow the $in operator in $cond, so how do I proceed?
PS: Excuse the brevity; sent on the go.
Not sure this fully covers your problem, but you can use $setIsSubset in MongoDB 2.6:
db.collection.aggregate([
{ $project: { name: 1 ,
grade: { $cond: [{$setIsSubset: ["$classes", ["A","B"]]}, 5, 4]}
}
}])
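To see what the $setIsSubset condition actually tests, here is a small in-memory sketch in plain JavaScript. The helper names `isSubset` and `grade` and the second student are hypothetical; only the Aster document comes from the question.

```javascript
// Hypothetical helper: mirrors $setIsSubset: [ candidate, allowed ],
// true when every element of candidate is also in allowed.
function isSubset(candidate, allowed) {
  const allowedSet = new Set(allowed);
  return candidate.every((c) => allowedSet.has(c));
}

// Hypothetical helper: mirrors $cond: [ <subset test>, 5, 4 ].
function grade(student) {
  return isSubset(student.classes, ["A", "B"]) ? 5 : 4;
}

const aster = { name: "Aster", classes: ["A", "B"] };
const mixed = { name: "Hypothetical", classes: ["A", "C"] };

console.log(grade(aster)); // prints: 5
console.log(grade(mixed)); // prints: 4
```

Note that the subset test requires all of a student's classes to be in ['A', 'B'], so a student with classes ['A', 'C'] gets 4. If "belongs to A or B" should mean "has at least one of them", a non-empty intersection test would be the closer fit.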
In case classes can be either string or array:
db.collection.aggregate([
{ $project: { name: 1 ,
grade: {$cond: [{$or: [{$eq: ["$classes", "A"]},
{$eq: ["$classes", "B"]},
{$setIsSubset: ["$classes", ["A","B"]]}]},
5, 4]}
}
}])