I have the following JSON structure:
{
"id": "5cea8bde0c80ee2af9590e7b",
"name": "Sofitel",
"pricePerNight": 88,
"address": {
"city": "Rome",
"country": "Italy"
},
"reviews": [
{
"userName": "John",
"rating": 10,
"approved": true
},
{
"userName": "Marry",
"rating": 7,
"approved": true
}
]
}
I want to find a list of similar documents where ALL rating values of the reviews meet a certain criterion, e.g. less than 8. The document above wouldn't qualify, as one of the reviews has a rating of 10.
With Querydsl in the following form, I still obtain that document:
BooleanExpression filterByRating = qHotel.reviews.any().rating.lt(8);
You can use $filter and $match to filter out the documents that you don't need. The following query should do it.
Note: the cond in the $filter is the opposite of your criterion. Since you need ratings less than 8, in this case you need ratings greater than or equal to 8.
db.qHotel.aggregate([
{
$addFields: {
tempReviews: {
$filter: {
input: "$reviews",
as: "review",
cond: { $gte: [ "$$review.rating", 8 ] } // Opposite of < 8, which is >= 8
}
}
}
},
{
$match : {
tempReviews : [] // This will exclude the documents for which there is at least one review with review.rating >= 8
}
}
]);
Each resulting document will still contain an empty tempReviews field; you can remove it with a $project stage (or with $unset in MongoDB 4.2+).
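To make the difference between the two predicates concrete, here is a small plain-Python model. It is only an illustration on an in-memory dict (it does not call MongoDB or Querydsl), and the function names are made up: `any_review_below` mirrors what `reviews.any().rating.lt(8)` asks for, while `all_reviews_below` mirrors the `$filter`/`$match` pair.

```python
# Plain-Python model of the two predicates; it does not talk to MongoDB.
hotel = {
    "name": "Sofitel",
    "reviews": [
        {"userName": "John", "rating": 10, "approved": True},
        {"userName": "Marry", "rating": 7, "approved": True},
    ],
}


def any_review_below(doc, limit):
    # reviews.any().rating.lt(limit): true if at least one review is below limit.
    return any(review["rating"] < limit for review in doc["reviews"])


def all_reviews_below(doc, limit):
    # $filter keeps the reviews that break the rule (rating >= limit) ...
    temp_reviews = [review for review in doc["reviews"] if review["rating"] >= limit]
    # ... and $match {tempReviews: []} keeps the document only if none are left.
    return temp_reviews == []


print(any_review_below(hotel, 8))   # True: Marry's 7 satisfies "any"
print(all_reviews_below(hotel, 8))  # False: John's 10 survives the $filter
```

A document whose reviews array is empty also passes the "all" check in this model, because nothing survives the $filter; the pipeline behaves the same way, since $match then sees tempReviews: [].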
I have the following documents in MongoDB:
from pymongo import MongoClient
client = MongoClient(host='my_host', port=27017)
database = client.forecast
collection = database.regions
collection.delete_many({})
regions = [
{
'id': 'DE',
'sites': [
{
'name': 'paper_factory',
'energy_consumption': 1000
},
{
'name': 'chair_factory',
'energy_consumption': 2000
},
]
},
{
'id': 'FR',
'sites': [
{
'name': 'pizza_factory',
'energy_consumption': 3000
},
{
'name': 'foo_factory',
'energy_consumption': 4000
},
]
}
]
collection.insert_many(regions)
Now I would like to copy the property sites.energy_consumption to a new field sites.new_field for each site:
set_stage = {
"$set": {
"sites.new_field": "$sites.energy_consumption"
}
}
pipeline = [set_stage]
collection.aggregate(pipeline)
However, instead of copying the individual value per site, all site values are collected and added as an array. Instead of 'new_field': [1000, 2000], I would like to get 'new_field': 1000 for the first site. This is what I currently get:
{
"_id": ObjectId("61600c11732a5d6b103ba6be"),
"id": "DE",
"sites": [
{
"name": "paper_factory",
"energy_consumption": 1000,
"new_field": [
1000,
2000
]
},
{
"name": "chair_factory",
"energy_consumption": 2000,
"new_field": [
1000,
2000
]
}
]
},
{
"_id": ObjectId("61600c11732a5d6b103ba6bf"),
"id": "FR",
"sites": [
{
"name": "pizza_factory",
"energy_consumption": 3000,
"new_field": [
3000,
4000
]
},
{
"name": "foo_factory",
"energy_consumption": 4000,
"new_field": [
3000,
4000
]
}
]
}
What expression can I use to refer only to the corresponding entry of the array?
Is there some sort of current-index operator:
$sites[<current_index>].energy_consumption
or an alternative dot operator (similar to the difference between * multiplication and .* element-wise matrix multiplication)?
$sites:energy_consumption
Or is this a bug?
Edit
I also tried to use the "$" positional operator, e.g. with
sites.$.new_field
or
$sites.$.energy_consumption
but then I get the error
FieldPath field names may not start with '$'
Related:
https://docs.mongodb.com/manual/reference/operator/aggregation/set/#std-label-set-add-field-to-embedded
In MongoDB how do you use $set to update a nested value/embedded document?
If a field belongs to the elements of an array, referencing it selects the values from all of the elements:
{ar: [{"a": 1}, {"a": 2}]}
"$ar.a" = [1, 2]
Also, you can't mix update operators with aggregation: you can't use things like $sites.$.energy_consumption. In an aggregation you have to use aggregation operators, the only exception being the $match stage, where you can use query operators.
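A rough plain-Python model of why "$sites.energy_consumption" yields an array. The helper `resolve_path` is hypothetical and only covers the simple case of an array of subdocuments; real field-path resolution has more edge cases (nested arrays, missing fields).

```python
def resolve_path(doc, path):
    """Mimic aggregation field-path resolution for a simple case:
    when a path segment crosses an array, the value is collected
    from every element, producing an array."""
    value = doc
    for key in path.lstrip("$").split("."):
        if isinstance(value, list):
            # Crossing an array: take `key` from every subdocument element.
            value = [item[key] for item in value if isinstance(item, dict) and key in item]
        else:
            value = value[key]
    return value


region = {
    "id": "DE",
    "sites": [
        {"name": "paper_factory", "energy_consumption": 1000},
        {"name": "chair_factory", "energy_consumption": 2000},
    ],
}

print(resolve_path({"ar": [{"a": 1}, {"a": 2}]}, "$ar.a"))  # [1, 2]
print(resolve_path(region, "$sites.energy_consumption"))   # [1000, 2000]
```

In the question, that whole array is the value $set writes into each site's new_field, which is exactly the behaviour reported.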
Query
An alternative, slightly different solution from yours, using $setField:
I guess it will be faster, but probably only by a little.
There is no need to use JavaScript; it will be slower.
This requires MongoDB 5.0 or later, since $setField is a new operator.
aggregate(
[{"$set":
{"sites":
{"$map":
{"input":"$sites",
"in":
{"$setField":
{"field":"new_field",
"input":"$$this",
"value":"$$this.energy_consumption"}}}}}}]
)
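The same answer in the question's pymongo style, plus a plain-Python model of what $map and $setField do per element. The models (`set_field`, `apply_map_set_field`) are illustrative only; the `pipeline` list is what would be passed to `collection.aggregate(pipeline)`, and it needs MongoDB 5.0+ for $setField.

```python
# The $setField answer as a pymongo pipeline (requires MongoDB 5.0+).
pipeline = [
    {"$set": {
        "sites": {
            "$map": {
                "input": "$sites",
                "in": {
                    "$setField": {
                        "field": "new_field",
                        "input": "$$this",
                        "value": "$$this.energy_consumption",
                    }
                },
            }
        }
    }}
]


def set_field(obj, field, value):
    # Rough model of $setField: a copy of obj with one field added or replaced.
    return {**obj, field: value}


def apply_map_set_field(doc):
    # Rough model of the $map: $$this is bound to one element at a time,
    # so "$$this.energy_consumption" is a single number, not an array.
    sites = [set_field(this, "new_field", this["energy_consumption"]) for this in doc["sites"]]
    return {**doc, "sites": sites}


region = {
    "id": "DE",
    "sites": [
        {"name": "paper_factory", "energy_consumption": 1000},
        {"name": "chair_factory", "energy_consumption": 2000},
    ],
}
for site in apply_map_set_field(region)["sites"]:
    print(site["name"], site["new_field"])
# paper_factory 1000
# chair_factory 2000
```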
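Use $addFields with $map, rebuilding each site from the current element ($$s). Note that new_field must reference "$$s.energy_consumption"; mapping over "$sites" again inside "in" would reproduce the whole array. Updating with an aggregation pipeline requires MongoDB 4.2+, and updateMany applies it to every document:
db.collection.updateMany({},
[
{
"$addFields": {
"sites": {
$map: {
input: "$sites",
as: "s",
in: {
name: "$$s.name",
energy_consumption: "$$s.energy_consumption",
new_field: "$$s.energy_consumption"
}
}
}
}
}
])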
I found the following ugly workarounds, which set the complete sites array instead of only specifying a new field with dot notation:
a) based on a JavaScript function ($function, MongoDB 4.4+)
set_stage = {
"$set": {
"sites": {
"$function": {
"body": "function(sites) {return sites.map(site => {site.new_field = site.energy_consumption; return site})}",
"args": ["$sites"],
"lang": "js"
}
}
}
}
b) based on $map and $mergeObjects
set_stage = {
"$set": {
"sites": {
"$map": {
"input": "$sites",
"in": {
"$mergeObjects": ["$$this", {
"new_field": "$$this.energy_consumption"
}]
}
}
}
}
}
If there is some kind of $$this context for the dot operator expression, allowing a more elegant solution, please let me know.
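The $map/$mergeObjects workaround is easy to generalize from Python. `copy_field_stage` below is a hypothetical helper (not part of pymongo) that builds the $set stage for any array field, and `merge_objects` is a plain-Python model of $mergeObjects, where later objects override earlier ones, so the existing fields of each site are kept and new_field is added.

```python
def copy_field_stage(array_field, source, target):
    """Hypothetical helper: build a $set stage that copies `source` to
    `target` inside every element of `array_field` ($map + $mergeObjects)."""
    return {
        "$set": {
            array_field: {
                "$map": {
                    "input": f"${array_field}",
                    "in": {"$mergeObjects": ["$$this", {target: f"$$this.{source}"}]},
                }
            }
        }
    }


def merge_objects(*objs):
    # Plain-Python model of $mergeObjects: fields from later objects win.
    merged = {}
    for obj in objs:
        merged.update(obj)
    return merged


stage = copy_field_stage("sites", "energy_consumption", "new_field")
site = {"name": "paper_factory", "energy_consumption": 1000}
print(merge_objects(site, {"new_field": site["energy_consumption"]}))
# {'name': 'paper_factory', 'energy_consumption': 1000, 'new_field': 1000}
```

Note that `collection.aggregate([stage])` only returns transformed documents; to persist the change, the same stage can be used as an update pipeline, e.g. `collection.update_many({}, [stage])` (MongoDB 4.2+).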
I will explain the exact use case
I have an array, let's say ratings = [1, 2, 3, 4],
and I have a MongoDB record like this:
{
"_id": "1232123",
"data": [
{
"rating": 1,
"reviewed_on": "datetime"
},
{
"rating": 5,
"reviewed_on": "datetime"
}
]
}
So I want to fetch records, filter the entries in their data field, and return only the entries whose rating matches one of the values in the array.
Expected output:
{"_id": '1232123', "data": [{"rating": 1, "reviewed_on": "datetime"}]}
One approach I could think of is to fetch all the results and then filter them at the application level, but the set is large, so I would prefer to handle it at the database level.
Let me know if the question is not clear or if you want me to add any specific data. Thanks!
There are plenty of ways you can do this; here is one using $filter:
[{
$match: {
_id: "1232123"
}
}, {
$project: {
data: {
$filter: {
input: "$data",
cond: {
$in: ["$$this.rating", [1, 2, 3, 4]]
}
}
}
}
}]
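Since the ratings list lives in the application, it can be passed straight into the pipeline. `build_pipeline` is a hypothetical helper producing the answer's pipeline in pymongo form, and `apply_filter` is a plain-Python model of the $filter step, run here on the record from the question.

```python
ratings = [1, 2, 3, 4]

record = {
    "_id": "1232123",
    "data": [
        {"rating": 1, "reviewed_on": "datetime"},
        {"rating": 5, "reviewed_on": "datetime"},
    ],
}


def build_pipeline(record_id, allowed_ratings):
    # Hypothetical helper: the answer's pipeline with the ratings list
    # supplied by the application instead of being hard-coded.
    return [
        {"$match": {"_id": record_id}},
        {"$project": {
            "data": {
                "$filter": {
                    "input": "$data",
                    "cond": {"$in": ["$$this.rating", list(allowed_ratings)]},
                }
            }
        }},
    ]


def apply_filter(doc, allowed_ratings):
    # Plain-Python model of the $project/$filter stage ($project keeps _id).
    allowed = set(allowed_ratings)
    return {"_id": doc["_id"], "data": [item for item in doc["data"] if item["rating"] in allowed]}


print(apply_filter(record, ratings))
# {'_id': '1232123', 'data': [{'rating': 1, 'reviewed_on': 'datetime'}]}
```

With pymongo this would be `collection.aggregate(build_pipeline("1232123", ratings))`; without an "as" field, $filter names the current element $$this, which the cond relies on.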
I have a collection like this:
{
"_id": "201503110040020021",
"Line": "1", // several documents may have this Line value
"LineStart": ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND": [{
"Secuence": 10,
"Title": 1,
},
{
"Secuence": 183,
"Title": 613,
},
...
],
} {
"_id": "201503110040020022",
"Line": "1", // several documents may have this Line value
"LineStart": ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND": [{
"Secuence": 10,
"Title": 1,
},
],
}
SSCEXPEND is an array. I am trying to count the size of the SSCEXPEND array and return only the documents whose count is greater than or equal to 2. My query is something like this:
db.entity.aggregate(
[
{
$project: {
SSCEXPEND_count: {$size: "$SSCEXPEND"}
}
},
{
$match: {
"SSCEXPEND_count2": {$gte: ["$SSCEXPEND_count",2]}
}
}
]
)
I am expecting the output to be only the first document, whose array size is greater than or equal to 2.
The $project part is working fine and I am able to get the counts, but I need to return only the documents whose count is greater than or equal to two, and my $match part is not working. Can anyone tell me where I am going wrong?
You need to project the other fields as well, and your $match stage just needs to query the newly created field to filter the documents by array size. Something like the following should work:
db.entity.aggregate([
{
"$project": {
"Line": 1,
"LineStart": 1, "SSCEXPEND": 1,
"SSCEXPEND_count": { "$size": "$SSCEXPEND" }
}
},
{
"$match": {
"SSCEXPEND_count": { "$gte": 2 }
}
}
])
Sample Output:
/* 0 */
{
"result" : [
{
"_id" : "201503110040020021",
"Line" : "1",
"LineStart" : ISODate("2015-03-11T06:49:35.000Z"),
"SSCEXPEND" : [
{
"Secuence" : 10,
"Title" : 1
},
{
"Secuence" : 183,
"Title" : 613
}
],
"SSCEXPEND_count" : 2
}
],
"ok" : 1
}
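Why the original $match returned nothing: $match reads a query, so {"SSCEXPEND_count2": {"$gte": ["$SSCEXPEND_count", 2]}} compares a field named SSCEXPEND_count2 (which does not exist) with the literal array ["$SSCEXPEND_count", 2]. The sketch below models the working pipeline in plain Python and also shows, as a pymongo dict, the $expr form (MongoDB 3.6+), which lets $match accept the aggregation-style $gte the question attempted. The sample documents are trimmed copies of the ones in the question, and `project_then_match` is a hypothetical name.

```python
# $expr (MongoDB 3.6+) lets $match evaluate an aggregation expression directly.
expr_pipeline = [
    {"$match": {"$expr": {"$gte": [{"$size": "$SSCEXPEND"}, 2]}}}
]

docs = [
    {"_id": "201503110040020021", "Line": "1",
     "SSCEXPEND": [{"Secuence": 10, "Title": 1}, {"Secuence": 183, "Title": 613}]},
    {"_id": "201503110040020022", "Line": "1",
     "SSCEXPEND": [{"Secuence": 10, "Title": 1}]},
]


def project_then_match(documents, min_count):
    # $project adds SSCEXPEND_count = $size of the array (in MongoDB, $size
    # raises an error if the field is missing or not an array).
    projected = [{**doc, "SSCEXPEND_count": len(doc["SSCEXPEND"])} for doc in documents]
    # $match then uses an ordinary query on the new field: {"$gte": min_count}.
    return [doc for doc in projected if doc["SSCEXPEND_count"] >= min_count]


print([doc["_id"] for doc in project_then_match(docs, 2)])  # ['201503110040020021']
```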
This is actually a very simple query, where the trick is to use a property of "dot notation" to test the array. All you really need to ask for is documents where the array element at index 1 $exists, which means the array must contain 2 elements or more (test index 2 instead to require 3 or more):
db.entity.find({ "SSCEXPEND.1": { "$exists": true } })
It's the fastest way to do it and can even use an index. There is no need for calculations in aggregation operations.
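The dot-notation trick generalizes to "at least n elements" by testing index n - 1. `min_length_filter` is a hypothetical helper that builds that filter for pymongo's `find`, and `index_exists` models the check on one array in plain Python.

```python
def min_length_filter(field, n):
    # Hypothetical helper: "array has at least n elements" expressed as
    # "the element at index n - 1 exists" (requires n >= 1).
    if n < 1:
        raise ValueError("n must be at least 1")
    return {f"{field}.{n - 1}": {"$exists": True}}


def index_exists(array, index):
    # Plain-Python model of the check on a single array value.
    return isinstance(array, list) and len(array) > index


print(min_length_filter("SSCEXPEND", 2))                # {'SSCEXPEND.1': {'$exists': True}}
print(index_exists([{"Secuence": 10, "Title": 1}], 1))  # False
```

Used with pymongo, this would look like `collection.find(min_length_filter("SSCEXPEND", 2))`.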
I have a document like this:
{
timestamp: ISODate("2013-10-10T23:00:00.000Z"),
values: {
0: 25,
1: 2,
3: 16,
4: 12,
5: 10
}
}
Two questions:
How can I get the "argmax" 0 from the nested document in values?
If I have multiple documents like this, can I query for all documents with an "argmax" of 2, for instance?
You really need to change the way you are structuring your documents, as what you have right now is not good. Nested objects like yours cannot be "traversed" in normal query operations, so there is no way of efficiently searching "across keys".
The only way to do this in a query is JavaScript evaluation via $where, which means no index can be used to optimise the search. (Later MongoDB versions, 3.4.4+, also offer $objectToArray, which converts such a sub-document into an array inside an aggregation.) It is also basically a "brute force" match against every document in the collection:
db.collection.find(function() {
var values = this.values;
return Object.keys(values).some(function(key) {
return values[key] == 2;
});
})
That only finds a value of "2" in the nested keys. To find whether the "maximum" value is "2", you would do:
db.collection.find(function() {
var values = this.values;
return Math.max.apply(null, Object.keys(values).map(function(key) {
return values[key];
})) == 2;
})
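For comparison, here is the same logic done on the client, e.g. after fetching documents with pymongo. It is a plain-Python sketch with hypothetical function names, and it has the same brute-force cost as $where, since every document must be examined; it does, however, also answer the "argmax" part of the question directly. It assumes the driver returns the sub-document keys as strings, as BSON field names always are.

```python
# The sample document as a driver would return it: BSON keys are strings.
doc = {"values": {"0": 25, "1": 2, "3": 16, "4": 12, "5": 10}}


def argmax_key(values):
    # Key of the largest value; on ties, the first key in iteration order wins.
    return max(values, key=values.get)


def contains_value(values, target):
    # Same test as the first $where function: any key holding target.
    return any(v == target for v in values.values())


def max_value(values):
    # Same quantity the second $where function compares with 2.
    return max(values.values())


print(argmax_key(doc["values"]))         # 0
print(contains_value(doc["values"], 2))  # True
print(max_value(doc["values"]))          # 25
```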
So brute force is not good. It is better to structure your document with "values" as an array. Then all native queries work fine:
{
"timestamp": ISODate("2013-10-10T23:00:00.000Z"),
"values": [25, 2, 16, 12, 10]
}
Now you can do:
db.collection.aggregate([
{ "$match": { "values": 2 } },
{ "$unwind": "$values" },
{ "$group": {
"_id": "$_id",
"timestamp": { "$first": "$timestamp" },
"values": { "$push": "$values" },
"maxVal": { "$max": "$values" }
}},
{ "$match": { "maxVal": 2 } }
])
This might at first glance seem more cumbersome, but building it from native operators instead of JavaScript evaluation makes it much more efficient. It is also more efficient because the initial $match can use an index to check whether the "values" array contains 2 at all, without testing all content in looped code.
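A plain-Python model of the four stages above, useful for checking the logic on small data. The second document (and both string timestamps) are invented for the example, and `max_equals` is a hypothetical name.

```python
# Two sample documents; the second one is invented so that something matches.
docs = [
    {"_id": 1, "timestamp": "2013-10-10T23:00:00Z", "values": [25, 2, 16, 12, 10]},
    {"_id": 2, "timestamp": "2013-10-11T23:00:00Z", "values": [2, 1]},
]


def max_equals(documents, target):
    # $match {"values": target}: keep documents whose array contains target.
    candidates = [doc for doc in documents if target in doc["values"]]
    # $unwind "$values": one (document, element) row per array element.
    rows = [(doc["_id"], doc["timestamp"], v) for doc in candidates for v in doc["values"]]
    # $group by _id with $first, $push and $max.
    groups = {}
    for _id, timestamp, value in rows:
        group = groups.setdefault(
            _id, {"_id": _id, "timestamp": timestamp, "values": [], "maxVal": value}
        )
        group["values"].append(value)
        group["maxVal"] = max(group["maxVal"], value)
    # Final $match {"maxVal": target}.
    return [group for group in groups.values() if group["maxVal"] == target]


print(max_equals(docs, 2))
# [{'_id': 2, 'timestamp': '2013-10-11T23:00:00Z', 'values': [2, 1], 'maxVal': 2}]
```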
The main work is in testing the "max" value of the array, so even if this is not the most shining example, you can see how a normal query operation can now be combined with JavaScript evaluation to make that process faster:
db.collection.find({
"values": 2,
"$where": function() {
return Math.max.apply(null,this.values) == 2;
}
})
So the initial "values": 2 will filter the documents immediately for those that contain "2", and the subsequent expression merely filters down further those documents where the "max" value of the array is "2" as well.
Moreover, if it was your intention to query for "maximum" values like this on a regular basis, then you would be better off storing this value as a discrete field in the document itself, like so:
{
"timestamp": ISODate("2013-10-10T23:00:00.000Z"),
"values": [25, 2, 16, 12, 10],
"minValue": 2,
"maxValue": 25
}
Then finding documents with the "maximum value" of 2 is as simple as:
db.collection.find({ "maxValue": 2 })
Or the largest "max" within all documents:
db.collection.find().sort({ "maxValue": -1 }).limit(1)
Or even both "min" and "max" from all documents at the same time:
db.collection.aggregate([
{ "$group": {
"_id": null,
"minValue": { "$min": "$minValue" },
"maxValue": { "$max": "$maxValue" }
}}
])
Maintaining this data when adding new "values" is a simple matter of employing the $min and $max update operators as you update the document. So to add "26" to the values:
db.collection.update(
{ "timestamp": ISODate("2013-10-10T23:00:00.000Z") },
{
"$push": { "values": 26 },
"$min": { "minValue": 26 },
"$max": { "maxValue": 26 }
}
)
This only adjusts minValue or maxValue when the new value is, respectively, less than the current minimum or greater than the current maximum:
{
"timestamp": ISODate("2013-10-10T23:00:00.000Z"),
"values": [25, 2, 16, 12, 10, 26],
"minValue": 2,
"maxValue": 26
}
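A plain-Python model of that update, showing that $min and $max only move the stored bounds outward while $push appends. `push_min_max` is a hypothetical name, and the timestamp field is left out for brevity.

```python
def push_min_max(doc, value):
    # Plain-Python model of {"$push": {"values": v}, "$min": {"minValue": v},
    # "$max": {"maxValue": v}}; like $min/$max, a missing field is set to v.
    updated = dict(doc)
    updated["values"] = list(doc.get("values", [])) + [value]
    updated["minValue"] = min(doc.get("minValue", value), value)
    updated["maxValue"] = max(doc.get("maxValue", value), value)
    return updated


before = {"values": [25, 2, 16, 12, 10], "minValue": 2, "maxValue": 25}
print(push_min_max(before, 26))
# {'values': [25, 2, 16, 12, 10, 26], 'minValue': 2, 'maxValue': 26}
```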
Therefore it should be clear why the structure is important, and why nested objects should be avoided in favour of an array when you intend to traverse the data, whether analysing the document itself or working across multiple documents in a collection.
Thank you in advance for your help.
I have a MongoDB database structured like this:
{
'_id' : objectID(...),
'userID' : id,
'movies' : [{
'movieID' : movieID,
'rating' : rating
}]
}
My question is:
I want to search for a specific user, for example one with 'userID': 3, and get all of their movies. Then I want to get all the other users that have at least 15 movies with the same 'movieID'. From that group, I want to select only the users that have those 15 movies in common and also have one extra 'movieID' that I choose.
I already tried aggregation but failed. If I do single queries instead, such as getting all the movies of one user and then cycling through every other user's movies to compare them, it takes a lot of time.
Any ideas?
Thank you
There are a couple of ways to do this using the aggregation framework.
Here is a simple set of data as an example:
{
"_id" : ObjectId("538181738d6bd23253654690"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 2, "rating": 6 },
{ "_id": 3, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654691"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 4, "rating": 6 },
{ "_id": 2, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654692"),
"movies": [
{ "_id": 2, "rating": 5 },
{ "_id": 5, "rating": 6 },
{ "_id": 6, "rating": 7 }
]
}
Using the first "user" as an example, you now want to find whether either of the other two users has at least two of the same movies.
For MongoDB 2.6 and upwards you can simply use the $setIntersection operator along with the $size operator:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document if you want to keep more than `_id`
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
}},
// Unwind the array
{ "$unwind": "$movies" },
// Build the array back with just `_id` values
{ "$group": {
"_id": "$_id",
"movies": { "$push": "$movies._id" }
}},
// Find the "set intersection" of the two arrays
{ "$project": {
"movies": {
"$size": {
"$setIntersection": [
[ 1, 2, 3 ],
"$movies"
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
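A plain-Python model of the 2.6+ pipeline on the sample data above, with ObjectIds shortened to plain strings (an assumption for the sketch). `similar_users` is a hypothetical name; it uses Python set intersection in place of $setIntersection and $size.

```python
users = [
    {"_id": "538181738d6bd23253654690",
     "movies": [{"_id": 1, "rating": 5}, {"_id": 2, "rating": 6}, {"_id": 3, "rating": 7}]},
    {"_id": "538181738d6bd23253654691",
     "movies": [{"_id": 1, "rating": 5}, {"_id": 4, "rating": 6}, {"_id": 2, "rating": 7}]},
    {"_id": "538181738d6bd23253654692",
     "movies": [{"_id": 2, "rating": 5}, {"_id": 5, "rating": 6}, {"_id": 6, "rating": 7}]},
]


def similar_users(all_users, user_id, min_common):
    target = next(u for u in all_users if u["_id"] == user_id)
    wanted = {m["_id"] for m in target["movies"]}
    result = []
    for u in all_users:
        if u["_id"] == user_id:
            continue  # $match: "_id": {"$ne": ...}
        common = wanted & {m["_id"] for m in u["movies"]}  # $setIntersection
        if len(common) >= min_common:  # $size, then $match {"$gte": min_common}
            result.append(u["_id"])
    return result


print(similar_users(users, "538181738d6bd23253654690", 2))  # ['538181738d6bd23253654691']
```

It returns only the second user, who shares movies 1 and 2; the third user shares only movie 2.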
This is still possible in earlier versions of MongoDB that do not have those operators, just using a few more steps:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document along with the "set" to match
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
"set": { "$cond": [ 1, [ 1, 2, 3 ], 0 ] }
}},
// Unwind both those arrays
{ "$unwind": "$movies" },
{ "$unwind": "$set" },
// Group back the count where both `_id` values are equal
{ "$group": {
"_id": "$_id",
"movies": {
"$sum": {
"$cond":[
{ "$eq": [ "$movies._id", "$set" ] },
1,
0
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
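The pre-2.6 counting trick can be modelled in plain Python as a count over all pairs. The sketch (with hypothetical function names) also compares it with the intersection size used by the 2.6+ version; the two agree as long as movie ids are not repeated within an array.

```python
def count_matches_unwind(movie_ids, wanted):
    # Unwinding both arrays yields every (movie id, set value) pair;
    # {"$sum": {"$cond": [{"$eq": [...]}, 1, 0]}} then counts equal pairs.
    return sum(1 if movie_id == value else 0 for movie_id in movie_ids for value in wanted)


def count_matches_set(movie_ids, wanted):
    # The 2.6+ version: size of $setIntersection (duplicates ignored).
    return len(set(movie_ids) & set(wanted))


wanted = [1, 2, 3]
for movie_ids in ([1, 4, 2], [2, 5, 6]):
    print(movie_ids, count_matches_unwind(movie_ids, wanted), count_matches_set(movie_ids, wanted))
# [1, 4, 2] 2 2
# [2, 5, 6] 1 1
```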
In Detail
That may be a lot to take in, so let's look at each stage and break down what it is doing.
$match: You do not want to operate on every document in the collection, so this is an opportunity to remove the items that cannot possibly match, even if there is still more work to do to find the exact ones. The obvious things are to exclude the same "user" and then only match the documents that have at least one of the same movies as were found for that "user".
The next thing that makes sense is that when you want to match n entries, only documents whose "movies" array is larger than n-1 can actually contain matches. The use of $and here looks funny and is not strictly required, but if the required matches were 4, that part of the statement would look like this:
"$and": [
{ "movies": { "$not": { "$size": 1 } } },
{ "movies": { "$not": { "$size": 2 } } },
{ "movies": { "$not": { "$size": 3 } } }
]
So you basically "rule out" arrays that are not long enough to have n matches. Note that this $size operator in the query form is different from $size in the aggregation framework. There is no way, for example, to use it with an inequality operator such as $gt, as its purpose is to match exactly the requested "size". Hence this query form, which lists each of the sizes that are too small.
$project: There are a few purposes in this stage, some of which differ depending on your MongoDB version. First, and optionally, a copy of the document is kept under the _id value so that these fields are not modified by the rest of the steps. The other part keeps the "movies" array at the top level of the document as a copy for the next stage.
In the pre-2.6 version there is also an additional array representing the _id values of the "movies" to match. The $cond operator is used here just to create a "literal" representation of the array. Funnily enough, MongoDB 2.6 introduces an operator called $literal that does exactly this without the odd use of $cond.
$unwind: To do anything further, the movies array needs to be unwound, as in either case this is the only way to isolate the existing _id values of the entries that need to be matched against the "set". For the pre-2.6 version you need to unwind both of the arrays that are present.
$group: For MongoDB 2.6 and greater you are just grouping back into an array that contains only the _id values of the movies, with the "ratings" removed.
Pre-2.6, since all values are presented "side by side" (with lots of duplication), you compare the two values to see whether they are the same. The $cond operator returns 1 where that is true and 0 where it is false, and $sum totals these to count the array elements that are in the required "set".
$project: This is the part that differs for MongoDB 2.6 and greater. Since you have pushed back an array of the "movies" _id values, you use $setIntersection to compare those arrays directly. As the result is an array of the elements common to both, it is wrapped in a $size operator to determine how many elements are in that matching set.
$match: The final stage matches only those documents whose count of intersecting elements is greater than or equal to the required number.
Final
That is basically how you do it. The pre-2.6 approach is a bit clunkier and requires more memory, because each array member is duplicated for every possible value of the set, but it is still a valid way to do this.
All you need to do is apply this with a larger n to meet your conditions, and of course make sure the original user actually has the required n movies to match. Otherwise, just generate this on n-1 from the length of the "user's" array of "movies".
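Building the "$and" list by hand gets tedious for larger n. `min_size_clause` below is a hypothetical helper that generates it for any n, matching the four-match example above when called with n = 4. An empty list (n = 1) should not be passed to "$and", which MongoDB rejects when empty; the index-exists test, {"movies.<n-1>": {"$exists": true}}, is a single-condition alternative.

```python
def min_size_clause(field, n):
    """Hypothetical helper: the "$and" list that rules out arrays with
    1 .. n-1 elements, using the exact-match query form of $size.
    Returns [] for n <= 1; omit "$and" entirely in that case."""
    return [{field: {"$not": {"$size": k}}} for k in range(1, n)]


print(min_size_clause("movies", 4))
# [{'movies': {'$not': {'$size': 1}}}, {'movies': {'$not': {'$size': 2}}}, {'movies': {'$not': {'$size': 3}}}]
```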