MongoDB intersection of an array with a list of embedded documents - mongodb

Let me explain the exact use case.
I have an array, let's say ratings = [1,2,3,4],
and I have a MongoDB record:
{
"_id": "1232123",
"data": [
{
"rating": 1,
"reviewed_on": "datetime"
},
{
"rating": 5,
"reviewed_on": "datetime"
}
]
}
Something like the above. I want to fetch the matching records, but keep in the data field only those entries whose rating appears in the given array.
Expected output:
{"_id": '1232123', "data": [{"rating": 1, "reviewed_on": "datetime"}]}
One approach I could think of is to fetch all the results and then filter them at the application level, but the data set is large, so I would prefer to handle it at the database level.
Let me know if the question is not clear and if you want me to add any specific data. Thanks

There are plenty of ways you can do this. Here is one way using $filter:
[{
$match: {
_id: "1232123"
}
}, {
$project: {
data: {
$filter: {
input: "$data",
cond: {
$in: ["$$this.rating", [1, 2, 3, 4]]
}
}
}
}
}]
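As a rough illustration of what the $filter stage computes, here is a plain JavaScript sketch that runs without a database. The helper name filterByRatings and the in-memory record are made up for this example; on a real collection the pipeline would still do this work on the server.

```javascript
// In-memory sketch of the $filter + $in projection (no MongoDB calls).
const ratings = [1, 2, 3, 4];

const record = {
  _id: "1232123",
  data: [
    { rating: 1, reviewed_on: "datetime" },
    { rating: 5, reviewed_on: "datetime" },
  ],
};

// Hypothetical helper: keep only the embedded documents whose rating
// appears in the allowed list, mirroring cond: { $in: [...] }.
function filterByRatings(doc, allowed) {
  const allowedSet = new Set(allowed);
  return {
    _id: doc._id,
    data: doc.data.filter((entry) => allowedSet.has(entry.rating)),
  };
}

const filtered = filterByRatings(record, ratings);
console.log(JSON.stringify(filtered));
```

Only the entry with rating 1 survives, which matches the expected output in the question.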

Related

Is there a way to give order field to the result of MongoDB aggregation?

Is there any way to give order or rankings to MongoDB aggregation results?
My result is:
{
"score": 100,
"name": "John"
},
{
"score": 80,
"name": "Jane"
},
{
"score": 60,
"name": "Lee"
}
My wanted result is:
{
"score": 100,
"name": "John",
"rank": 1
},
{
"score": 80,
"name": "Jane",
"rank": 2
},
{
"score": 60,
"name": "Lee",
"rank": 3
}
I know there is an option called includeArrayIndex, but it only works with the $unwind stage.
Is there any way to give a rank without using $unwind?
Using $unwind requires grouping on my collection, and I'm afraid the grouping stage would be too large to process.
Another way is to use $map and add the rank to each document from its array index. No $unwind stage is needed, because the result is a single array field that you can access directly by its key name, as shown in the last line of the code. (This assumes the documents reach the $group stage already in the desired order.)
$group by null and push every document into a root array,
$map over the root array: get the index of the current object with $indexOfArray, add 1 to it with $add because indexes start at 0, and use that as the rank field; then merge the current object with the rank field using $mergeObjects
let result = await db.collection.aggregate([
{
$group: {
_id: null,
root: {
$push: "$$ROOT"
}
}
},
{
$project: {
_id: 0,
root: {
$map: {
input: "$root",
in: {
$mergeObjects: [
"$$this",
{
rank: { $add: [{ $indexOfArray: ["$root", "$$this"] }, 1] }
}
]
}
}
}
}
}
]);
// you can access result using root key
let finalResult = result[0]['root'];
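To make the index-to-rank idea concrete, here is a small plain JavaScript sketch of the same transformation on an in-memory array. It is not the aggregation itself, just an illustration, and it assumes the documents are already ordered by score, highest first.

```javascript
// In-memory sketch of the $map + index ranking (no MongoDB calls).
const docs = [
  { score: 100, name: "John" },
  { score: 80, name: "Jane" },
  { score: 60, name: "Lee" },
];

// Assumption: docs are already sorted by score in descending order.
// Array.prototype.map hands us the index directly, so the lookup that
// $indexOfArray performs in the pipeline is not needed here.
const ranked = docs.map((doc, index) => ({ ...doc, rank: index + 1 }));

console.log(JSON.stringify(ranked));
```

The spread plays the role of $mergeObjects, and index + 1 plays the role of the $add around $indexOfArray.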

MongoDB Find values passed in that don't match

I'm currently stuck on an issue using MongoDB aggregation. I have an array of _ids, and I need to check whether they exist in a specific collection.
Example:
I have 3 records in 'Collection 1' with _id 1,2,3. I can find the matching values using:
$match: {
_id: {
$in: [1, 2, 3, 4]
}
}
However, what I want to know is which of the values I passed in (1, 2, 3, 4) don't match a record. (In this case, _id 4 has no matching record.)
So instead of returning the records with _id 1, 2, 3, it needs to return the _id that doesn't exist. In this example, that is '_id: 4'.
The query should also disregard any extra records in the collection. For example, if the collection held records with IDs 1-10 and I passed in a query to determine whether _ids 1, 7, 15 existed, the value I'm expecting would be along the lines of '_id: 15 doesn't exist'.
My first thought was to use $project within an aggregation to hold each _id that was passed in, and then attach each record in the collection to the matching _id passed in. E.g.:
Record 1:
{
_id: 1,
Collection1: [
record details: ...,
...
...
]
},
{
_id: 2,
Collection1: [] // This _id passed in, doesn't have a matching collection
}
However, I can't seem to get a working example in this instance. Any help would be appreciated!
If the input documents are:
{ _id: 1 },
{ _id: 2 },
{ _id: 5 },
{ _id: 10 }
And the array to match is:
var INPUT_ARRAY = [ 1, 7, 15 ]
The following aggregation:
db.test.aggregate( [
{
$match: {
_id: {
$in: INPUT_ARRAY
}
}
},
{
$group: {
_id: null,
matches: { $push: "$_id" }
}
},
{
$project: {
ids_not_exist: { $setDifference: [ INPUT_ARRAY, "$matches" ] },
_id: 0
}
}
] )
Returns:
{ "ids_not_exist" : [ 7, 15 ] }
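The three stages can be sketched in plain JavaScript to show what each one contributes. This is only an in-memory illustration with the sample ids from above; the variable names are made up for the example.

```javascript
// In-memory sketch of $match + $group + $setDifference (no MongoDB calls).
const collectionIds = [1, 2, 5, 10]; // the _id values stored in the collection
const inputArray = [1, 7, 15];       // the ids we want to check

// $match with $in, then $group with $push: the requested ids that exist.
const stored = new Set(collectionIds);
const matches = inputArray.filter((id) => stored.has(id));

// $setDifference: requested ids that are not among the matches.
const matched = new Set(matches);
const idsNotExist = inputArray.filter((id) => !matched.has(id));

console.log(JSON.stringify({ ids_not_exist: idsNotExist }));
```

One thing to watch for in the real pipeline: if none of the ids match, $match passes nothing on, $group emits no document, and the aggregation returns an empty result instead of the full input array. The caller should treat an empty result as "none of them exist".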
Are you looking for $not? (See the MongoDB docs.)

How to find documents with child collection containing only certain value

I have the following JSON structure:
{
"id": "5cea8bde0c80ee2af9590e7b",
"name": "Sofitel",
"pricePerNight": 88,
"address": {
"city": "Rome",
"country": "Italy"
},
"reviews": [
{
"userName": "John",
"rating": 10,
"approved": true
},
{
"userName": "Marry",
"rating": 7,
"approved": true
}
]
}
I want to find a list of similar documents where ALL rating values of the reviews meet a certain criterion, e.g. less than 8. The document above wouldn't qualify, as one of the reviews has rating 10.
With Querydsl in the following form I still obtain that document:
BooleanExpression filterByRating = qHotel.reviews.any().rating.lt(8);
You can use $filter and $match to filter out the documents that you don't need. The following query should do it:
Note: The cond in the $filter is the opposite of your criterion. Since you need ratings less than 8, the filter looks for ratings greater than or equal to 8.
db.qHotel.aggregate([
{
$addFields: {
tempReviews: {
$filter: {
input: "$reviews",
as: "review",
cond: { $gte: [ "$$review.rating", 8 ] } // Opposite of < 8, which is >= 8
}
}
}
},
{
$match : {
tempReviews : [] // This will exclude the documents for which there is at least one review with review.rating >= 8
}
}
]);
The result will contain an empty field named tempReviews; you can use $project to remove it.
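The "filter for the opposite, then require an empty result" trick can be sketched in plain JavaScript. The second hotel below is invented for the example, and the whole thing is an in-memory illustration rather than a query.

```javascript
// In-memory sketch of $addFields/$filter followed by $match on [] (no MongoDB calls).
const hotels = [
  {
    name: "Sofitel",
    reviews: [
      { userName: "John", rating: 10, approved: true },
      { userName: "Marry", rating: 7, approved: true },
    ],
  },
  {
    name: "Hypothetical Inn", // made-up document for illustration
    reviews: [
      { userName: "Ann", rating: 5, approved: true },
      { userName: "Bob", rating: 7, approved: true },
    ],
  },
];

const qualifying = hotels.filter((hotel) => {
  // $filter: collect the reviews that violate the criterion (rating >= 8).
  const tempReviews = hotel.reviews.filter((review) => review.rating >= 8);
  // $match: keep the hotel only if no review violates it.
  return tempReviews.length === 0;
});

console.log(qualifying.map((hotel) => hotel.name));
```

In plain JavaScript the same condition could be written as hotel.reviews.every((review) => review.rating < 8). Note that in this sketch a hotel with an empty reviews array also qualifies, since it has no violating review.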

MongoDB sort by array size with large number of documents

I have an article collection which stores a list of tags as follows:
{
id: 1,
title: "Sample title",
tags: ["tag1", "tag2", "tag3", "tag4"]
}
In order to match articles to a user's interests, I use the aggregation stages $match and $setIntersection
to count how many tags the user's interests and an article's tags have in common, then sort to get the best matches.
db.article.aggregate([
{
"$match": {
"tags": {"$in": ["tag1", ..., "tag100"]}
}
},
{
"$project": {
"tags_match": {
"$setIntersection": ["$tags", ["tag1", ..., "tag100"]]
}
}
},
{
"$project": {
"tags_match_size": {
"$size": "$tags_match"
}
}
},
// descending, so the best matches come first
{"$sort": {"tags_match_size": -1}},
{"$limit": 40}
]);
It works fine if I have a few hundred documents in the article collection. Now I have around 1M articles, and it takes around half an hour to finish.
I can't create an index on "tags_match_size" to make it faster, as it is a new field computed in the aggregation query.
How can I make the query run faster?
Thank you.
Create an index on the tags field. Only the first $match stage can use it; the later stages work on computed values that cannot be indexed.
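To see why only the first stage can benefit from an index, it may help to look at the pipeline's steps as a plain JavaScript sketch. The sample articles beyond the first one and the helper name bestMatches are invented for the example; this is an in-memory illustration, not a performance fix.

```javascript
// In-memory sketch of $match -> $setIntersection/$size -> $sort -> $limit.
const userTags = ["tag1", "tag2", "tag3"];

const articles = [
  { id: 1, title: "Sample title", tags: ["tag1", "tag2", "tag3", "tag4"] },
  { id: 2, title: "Hypothetical A", tags: ["tag2", "tag9"] }, // made up
  { id: 3, title: "Hypothetical B", tags: ["tag7", "tag8"] }, // made up
];

// Hypothetical helper that mirrors the pipeline stages one by one.
function bestMatches(allArticles, interests, limit) {
  const wanted = new Set(interests);
  return allArticles
    // $match with $in: a lookup on stored values, which an index can serve.
    .filter((article) => article.tags.some((tag) => wanted.has(tag)))
    // $setIntersection + $size: computed per document, so not indexable.
    .map((article) => ({
      id: article.id,
      tags_match_size: new Set(article.tags.filter((tag) => wanted.has(tag))).size,
    }))
    // $sort descending on the computed value, then $limit.
    .sort((a, b) => b.tags_match_size - a.tags_match_size)
    .slice(0, limit);
}

const top = bestMatches(articles, userTags, 40);
console.log(JSON.stringify(top));
```

Everything after the initial filter has to touch each remaining candidate, so the cost depends mainly on how many articles the first $match lets through.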

MongoDB Nested Array Intersection Query

Thank you in advance for your help.
I have a MongoDB database structured like this:
{
'_id' : objectID(...),
'userID' : id,
'movies' : [{
'movieID' : movieID,
'rating' : rating
}]
}
My question is:
I want to search for a specific user, for example the one with 'userID' : 3, and get all his movies. Then I want to get all the other users that have at least 15 movies with the same 'movieID'. From that group I want to select only the users that share those 15 movies and also have one extra 'movieID' that I choose.
I already tried aggregation, but failed. If I do single queries, such as getting all the movies of one user and then cycling through every other user's movies and comparing them, it takes a lot of time.
Any ideas?
Thank you
There are a couple of ways to do this using the aggregation framework.
Here is a simple set of data as an example:
{
"_id" : ObjectId("538181738d6bd23253654690"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 2, "rating": 6 },
{ "_id": 3, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654691"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 4, "rating": 6 },
{ "_id": 2, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654692"),
"movies": [
{ "_id": 2, "rating": 5 },
{ "_id": 5, "rating": 6 },
{ "_id": 6, "rating": 7 }
]
}
Using the first "user" as an example, now you want to find if any of the other two users have at least two of the same movies.
For MongoDB 2.6 and upwards you can simply use the $setIntersection operator along with the $size operator:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document if you want to keep more than `_id`
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
}},
// Unwind the array
{ "$unwind": "$movies" },
// Build the array back with just `_id` values
{ "$group": {
"_id": "$_id",
"movies": { "$push": "$movies._id" }
}},
// Find the "set intersection" of the two arrays
{ "$project": {
"movies": {
"$size": {
"$setIntersection": [
[ 1, 2, 3 ],
"$movies"
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
This is still possible in earlier versions of MongoDB that do not have those operators, just using a few more steps:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document along with the "set" to match
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
"set": { "$cond": [ 1, [ 1, 2, 3 ], 0 ] }
}},
// Unwind both those arrays
{ "$unwind": "$movies" },
{ "$unwind": "$set" },
// Group back the count where both `_id` values are equal
{ "$group": {
"_id": "$_id",
"movies": {
"$sum": {
"$cond":[
{ "$eq": [ "$movies._id", "$set" ] },
1,
0
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
In Detail
That may be a bit to take in, so we can take a look at each stage and break those down to see what they are doing.
$match : You do not want to operate on every document in the collection, so this is an opportunity to remove the items that cannot possibly match, even if there is still more work to do to find the exact ones. The obvious things are to exclude the same "user" and then only match the documents that have at least one of the same movies as was found for that "user".
The next thing that makes sense is to consider that when you want to match n entries, only documents with a "movies" array of at least n elements can possibly contain the matches. The use of $and here looks funny and is not specifically required, but if the required number of matches were 4, then that part of the statement would look like this:
"$and": [
{ "movies": { "$not": { "$size": 1 } } },
{ "movies": { "$not": { "$size": 2 } } },
{ "movies": { "$not": { "$size": 3 } } }
]
So you basically "rule out" arrays that cannot possibly be long enough to have n matches. Note that this $size operator in the query form is different from $size in the aggregation framework. There is no way, for example, to use it with an inequality operator such as $gt, as its purpose is specifically to match the requested "size". Hence this query form, which lists all of the possible sizes that are too small.
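Since the list of excluded sizes grows with n, it is easier to generate it than to type it out. Here is a small sketch of how that could look; the helper name buildSizeExclusions is made up for this example, and it only builds the plain query objects.

```javascript
// Hypothetical helper: for a required match count n, build the clauses
// that exclude arrays of size 1 .. n-1, which cannot contain n matches.
function buildSizeExclusions(field, n) {
  const clauses = [];
  for (let size = 1; size < n; size++) {
    clauses.push({ [field]: { $not: { $size: size } } });
  }
  return clauses;
}

// For 4 required matches this yields the three clauses shown above.
const exclusions = buildSizeExclusions("movies", 4);
console.log(JSON.stringify(exclusions, null, 1));
```

The result can be placed under the "$and" key of the $match stage. For n = 1 the list is empty, and since $and does not accept an empty array, the key should simply be left out in that case.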
$project : There are a few purposes in this statement, of which some differ depending on the MongoDB version you have. Firstly, and optionally, a document copy is being kept under the _id value so that these fields are not modified by the rest of the steps. The other part here is keeping the "movies" array at the top of the document as a copy for the next stage.
What is also happening in the version presented for pre 2.6 versions is that there is an additional array representing the _id values of the "movies" to match. The usage of the $cond operator here is just a way of creating a "literal" representation of the array. Funnily enough, MongoDB 2.6 introduces an operator known as $literal to do exactly this without the funny way we are using $cond right here.
$unwind : To do anything further the movies array needs to be unwound as in either case it is the only way to isolate the existing _id values for the entries that need to be matched against the "set". So for the pre 2.6 version you need to "unwind" both of the arrays that are present.
$group : For MongoDB 2.6 and greater you are just grouping back to an array that only contains the _id values of the movies with the "ratings" removed.
Pre 2.6, since all values are presented "side by side" (and with lots of duplication), you are comparing the two values to see if they are the same. The $cond operator returns 1 where they are equal and 0 where they are not. This is passed directly to $sum to total up the number of array elements that match the required "set".
$project: This is the part that differs for MongoDB 2.6 and greater. Since you have pushed back an array of the "movies" _id values, you then use $setIntersection to compare those arrays directly. As the result of this is an array containing the elements that are the same, it is wrapped in a $size operator in order to determine how many elements were returned in that matching set.
$match: This is the final stage, which keeps only those documents whose count of intersecting elements is greater than or equal to the required number.
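The two counting strategies can be compared side by side in a plain JavaScript sketch using the sample data from above. The function names are made up for the example, the ObjectId values are written as plain strings, and the two counts only agree under the assumption that a user never lists the same movie twice.

```javascript
// In-memory sketch of both counting approaches (no MongoDB calls).
const targetSet = [1, 2, 3]; // movie ids of the reference user

const otherUsers = [
  { _id: "538181738d6bd23253654691",
    movies: [{ _id: 1, rating: 5 }, { _id: 4, rating: 6 }, { _id: 2, rating: 7 }] },
  { _id: "538181738d6bd23253654692",
    movies: [{ _id: 2, rating: 5 }, { _id: 5, rating: 6 }, { _id: 6, rating: 7 }] },
];

// 2.6+ style: reduce to ids, intersect with the set, take the size.
function countBySetIntersection(movies, set) {
  const ids = new Set(movies.map((movie) => movie._id));
  return set.filter((id) => ids.has(id)).length;
}

// Pre-2.6 style: pair every movie with every set member (the double
// $unwind) and add 1 for each equal pair ($cond inside $sum).
function countByCrossProduct(movies, set) {
  let total = 0;
  for (const movie of movies) {
    for (const id of set) {
      total += movie._id === id ? 1 : 0;
    }
  }
  return total;
}

// Final $match: keep users with at least 2 movies in common.
const similar = otherUsers
  .filter((user) => countBySetIntersection(user.movies, targetSet) >= 2)
  .map((user) => user._id);

console.log(similar);
```

The cross-product version inspects movies.length * set.length pairs per user, which is where the extra memory and work of the pre-2.6 pipeline comes from.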
Final
That is basically how you do it. The approach for versions prior to 2.6 is a bit clunkier and will require a bit more memory, because every array member is duplicated for each of the possible values of the set, but it is still a valid way to do this.
All you need to do is apply this with the larger value of n that your conditions require, and of course make sure the original user's list actually has at least n movies. Otherwise, just generate this on n-1 from the length of the "user's" array of "movies".