How to search embedded array - mongodb

I want to get all matching values, using $elemMatch.
// create test data
db.foo.insert({values:[0,1,2,3,4,5,6,7,8,9]})
db.foo.find({},{
'values':{
'$elemMatch':{
'$gt':3
}
}
}) ;
My expecected result is {values:[3,4,5,6,7,8,9]} . but , really result is {values:[4]}.
I read mongo document , I understand this is specification.
How do I search for multi values ?
And more, I use 'skip' and 'limit'.
Any idea ?

Using Aggregation:
db.foo.aggregate([
{$unwind:"$values"},
{$match:{"values":{$gt:3}}},
{$group:{"_id":"$_id","values":{$push:"$values"}}}
])
You can add further filter condition in the $match, if you would like to.
You can't achieve this using an $elemMatch operator since, mongoDB doc says:
The $elemMatch projection operator limits the contents of an array
field that is included in the query results to contain only the array
element that matches the $elemMatch condition.
Note
The elements of the array are documents.

If you look carefully at the documentation on $elemMatch or the counterpart to query of the positional $ operator then you would see that only the "first" matched element is returned by this type of "projection".
What you are looking for is actually "manipulation" of the document contents where you want to "filter" the content of the array in the document rather than return the original or "matched" element, as there can be only one match.
For true "filtering" you need the aggregation framework, as there is more support there for document manipulation:
db.foo.aggregate([
// No point selecting documents that do not match your condition
{ "$match": { "values": { "$gt": 3 } } },
// Unwind the array to de-normalize as documents
{ "$unwind": "$values },
// Match to "filter" the array
{ "$match": { "values": { "$gt": 3 } } },
// Group by to the array form
{ "$group": {
"_id": "$_id",
"values": { "$push": "$values" }
}}
])
Or with modern versions of MongoDB from 2.6 and onwards, where the array values are "unique" you could do this:
db.foo.aggregate([
{ "$project": {
"values": {
"$setDifference": [
{ "$map": {
"input": "$values",
"as": "el",
"in": {
"$cond": [
{ "$gt": [ "$$el", 3 ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])

Related

if mongodb match inside aggregation returns nothing, how to make a new query?

I use match to select some documents from the collection, and then output all other documents except those found.
If match doesn't find any documents, then I need to display all available documents from the collection.
How can this be done?
Without an example I don't know if I've understood correctly, but you can try this aggregation query (or add this aggregation stages into your query).
The ide is using $facet create two ways:
Frist way: Match the value
Second way: Get everything
And use $project to output one of these options using $cond and $size.
Into the $project if the array returned in the "exists way" is 0 (any result) the result is no_exists(i.e. all values) otherwise is the exists value.
db.collection.aggregate([
{
"$facet": {
"exists": [
{
"$match": {
// your match
}
}
],
"no_exists": []
}
},
{
"$project": {
"result": {
"$cond": {
"if": {
"$eq": [
{
"$size": "$exists"
},
0
]
},
"then": "$no_exists",
"else": "$exists"
}
}
}
}
])
Example here where value exists and output only the value, and here where not exists and output all collection.

Encapsulate/Transform an Array into an Array of Objects

I am currently in the process of modifying a schema and I need to do a relatively trivial transform using the aggregation framework and a bulkWrite.
I want to be able to take this array:
{
...,
"images" : [
"http://example.com/...",
"http://example.com/...",
"http://example.com/..."
]
}
and aggregate to a similar array where the original value is encapsulated:
{
...,
"images" : [
{url: "http://example.com/..."},
{url: "http://example.com/..."},
{url: "http://example.com/..."}
]
}
This slow query works, but it is ridiculously expensive to unwind an entire collection.
[
{
$match: {}
},
{
$unwind: {
path : "$images",
}
},
{
$group: {
_id: "$_id",
images_2: {$addToSet: {url: "$images"}}
}
},
]
How can this be achieved with project or some other cheaper aggregation?
$map expression should do the job, try this:
db.col.aggregate([
{
$project: {
images: {
$map: {
input: '$images',
as: 'url',
in: {
url: '$$url'
}
}
}
}
}
]);
You don't need to use the bulkWrite() method for this.
You can use the $map aggregation array operator to apply an expression to each element element in your array.
Here, the expression simply create a new object where the value is the item in the array.
let mapExpr = {
"$map": {
"input": "$images",
"as": "imageUrl",
"in": { "url": "$$imageUrl }
}
};
Finally you can use the $out aggregation pipeline operator to overwrite your collection or write the result into a different collection.
Of course $map is not an aggregation pipeline operator so which means that the $map expression must be use in a pipeline stage.
The way you do this depends on your MongoDB version.
The best way is in MongoDB 3.4 using $addFields to change the value of the "images" field in your document.
db.collection.aggregate([
{ "$addFields": { "images": mapExpr }},
{ "$out": "collection }
])
From MongoDB 3.2 backwards, you need to use the $project pipeline stage but you also need to include all the other fields manually in your document
db.collection.aggregate([
{ "$project": { "images": mapExpr } },
{ "$out": "collection }
])

Return only matched sub-document elements within a nested array

The main collection is retailer, which contains an array for stores. Each store contains an array of offers (you can buy in this store). This offers array has an array of sizes. (See example below)
Now I try to find all offers, which are available in the size L.
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"XS",
"S",
"M"
]
},
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
}
I've try this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect some Output like that:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
}
But the Output of my Query contains also the non matching offer with size XS,X and M.
How I can force MongoDB to return only the offers, which matched my query?
Greetings and thanks.
So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained so that the elements returned only match the condition of the query.
The real answer is of course that unless you are really saving a lot of bandwidth by filtering out such detail then you should not even try, or at least beyond the first positional match.
MongoDB has a positional $ operator which will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the "outer" most array element.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
)
In this case, it means the "stores" array position only. So if there were multiple "stores" entries, then only "one" of the elements that contained your matched condition would be returned. But, that does nothing for the inner array of "offers", and as such every "offer" within the matchd "stores" array would still be returned.
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$.offers.$': 1 }
)
The only tools MongoDB actually has to do this level of manipulation is with the aggregation framework. But the analysis should show you why you "probably" should not do this, and instead just filter the array in code.
In order of how you can achieve this per version.
First with MongoDB 3.2.x with using the $filter operation:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [ ["L"], "$$offer.size" ]
}
}
}
}
}
},
"as": "store",
"cond": { "$ne": [ "$$store.offers", [] ]}
}
}
}}
])
Then with MongoDB 2.6.x and above with $map and $setDifference:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$setDifference": [
{ "$map": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$setDifference": [
{ "$map": {
"input": "$$store.offers",
"as": "offer",
"in": {
"$cond": {
"if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
"then": "$$offer",
"else": false
}
}
}},
[false]
]
}
}
}
},
"as": "store",
"in": {
"$cond": {
"if": { "$ne": [ "$$store.offers", [] ] },
"then": "$$store",
"else": false
}
}
}},
[false]
]
}
}}
])
And finally in any version above MongoDB 2.2.x where the aggregation framework was introduced.
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$unwind": "$stores" },
{ "$unwind": "$stores.offers" },
{ "$match": { "stores.offers.size": "L" } },
{ "$group": {
"_id": {
"_id": "$_id",
"storeId": "$stores._id",
},
"offers": { "$push": "$stores.offers" }
}},
{ "$group": {
"_id": "$_id._id",
"stores": {
"$push": {
"_id": "$_id.storeId",
"offers": "$offers"
}
}
}}
])
Lets break down the explanations.
MongoDB 3.2.x and greater
So generally speaking, $filter is the way to go here since it is designed with the purpose in mind. Since there are multiple levels of the array, you need to apply this at each level. So first you are diving into each "offers" within "stores" to examime and $filter that content.
The simple comparison here is "Does the "size" array contain the element I am looking for". In this logical context, the short thing to do is use the $setIsSubset operation to compare an array ("set") of ["L"] to the target array. Where that condition is true ( it contains "L" ) then the array element for "offers" is retained and returned in the result.
In the higher level $filter, you are then looking to see if the result from that previous $filter returned an empty array [] for "offers". If it is not empty, then the element is returned or otherwise it is removed.
MongoDB 2.6.x
This is very similar to the modern process except that since there is no $filter in this version you can use $map to inspect each element and then use $setDifference to filter out any elements that were returned as false.
So $map is going to return the whole array, but the $cond operation just decides whether to return the element or instead a false value. In the comparison of $setDifference to a single element "set" of [false] all false elements in the returned array would be removed.
In all other ways, the logic is the same as above.
MongoDB 2.2.x and up
So below MongoDB 2.6 the only tool for working with arrays is $unwind, and for this purpose alone you should not use the aggregation framework "just" for this purpose.
The process indeed appears simple, by simply "taking apart" each array, filtering out the things you don't need then putting it back together. The main care is in the "two" $group stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id values at all levels, so these just need to be included at every level of grouping.
But the problem is that $unwind is very costly. Though it does have purpose still, it's main usage intent is not to do this sort of filtering per document. In fact in modern releases it's only usage should be when an element of the array(s) needs to become part of the "grouping key" itself.
Conclusion
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costly if implemented incorrectly.
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find().
In general though, those listings still have an amount of complexity to them, and indeed unless you are really drastically reducing the content returned by such filtering in a way that makes a significant improvement in bandwidth used between the server and client, then you are better of filtering the result of the initial query and basic projection.
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
).forEach(function(doc) {
// Technically this is only "one" store. So omit the projection
// if you wanted more than "one" match
doc.stores = doc.stores.filter(function(store) {
store.offers = store.offers.filter(function(offer) {
return offer.size.indexOf("L") != -1;
});
return store.offers.length != 0;
});
printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated the only "real" diffrerence would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
But unless you are doing this in a modern release with only $match and $project, then the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
In all cases, you get the same result:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size" : [
"S",
"L",
"XL"
]
}
]
}
]
}
as your array is embeded we cannot use $elemMatch, instead you can use aggregation framework to get your results:
db.retailers.aggregate([
{$match:{"stores.offers.size": 'L'}}, //just precondition can be skipped
{$unwind:"$stores"},
{$unwind:"$stores.offers"},
{$match:{"stores.offers.size": 'L'}},
{$group:{
_id:{id:"$_id", "storesId":"$stores._id"},
"offers":{$push:"$stores.offers"}
}},
{$group:{
_id:"$_id.id",
stores:{$push:{_id:"$_id.storesId","offers":"$offers"}}
}}
]).pretty()
what this query does is unwinds arrays (twice), then matches size and then reshapes the document to previous form. You can remove $group steps and see how it prints.
Have a fun!
It's also works without aggregate.
here is the solution link:https://mongoplayground.net/p/Q5lxPvGK03A
db.collection.find({
"stores.offers.size": "L"
},
{
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [
[
"L"
],
"$$offer.size"
]
}
}
}
}
}
},
"as": "store",
"cond": {
"$ne": [
"$$store.offers",
[]
]
}
}
}
})

Finding documents based on the minimum value in an array

my document structure is something like :
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 12,
},
{
source: 'b',
value: 10,
},
...
]
},
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 24,
},
{
source: 'b',
value: 36,
},
...
]
}
the value of various sources in options will keep getting updated on a frequent basis(evey few mins or hours),
assume the size of options array doesnt change, i.e. no extra elements are added to the list
my queries are of the following type:
-find all documents where the min_value of all the options falls between some limit.
I could first do an unwind on options(and then take min) and then run comparison queries, but I am new to mongo and not sure how performance
is affected by unwind operation. The number of documents of this type would be about a few million.
Or does anyone has any suggestions around changing the document structure which could help me simplify this query? ( apart from creating separate documents per source - it would involves lot of data duplication )
Thanks!
Using $unwind is indeed quite expensive, most notably so with larger arrays, but there is a cost in all cases of usage. There are a couple of way to approach not needing $unwind here without real structural changes.
Pure Aggregation
In the basic case, as of MongoDB 3.2.x release series the $min operator can work directly on an array of values in a "projection" sense in addition to it's standard grouping accumulator role. This means that with the help of the related $map operator for processing elements of an array, you can then get the minimal value without using $unwind:
db.collection.aggregate([
// Still makes sense to use an index to select only possible documents
{ "$match": {
"options": {
"$elemMatch": {
"value": { "$gte": minValue, "$lt": maxValue }
}
}
}},
// Provides a logical filter to remove non-matching documents
{ "$redact": {
"$cond": {
"if": {
"$let": {
"vars": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
},
"in": { "$and": [
{ "$gte": [ "$$min_value", minValue ] },
{ "$lt": [ "$$min_value", maxValue ] }
]}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
// Optionally return the min_value as a field
{ "$project": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
}}
])
The basic case is to get the "minimum" value from the array ( done inside of $let since we want to use the result "twice" in logical conditions. Helps us not repeat ourselves ) is to first extract the "value" data from the "options" array. This is done using $map.
The output of $map is an array with just those values, so this is supplied as the argument to $min, which then returns the minimum value for that array.
Using $redact is sort of like a $match pipeline stage with the difference that rather than needing a field to be "present" in the document being examined, you instead just form a logical condition with calculations.
In this case the condition is $and where "both" the logical forms of $gte and $lt return true against the calculated value ( from $let as "$$min_value" ).
The $redact stage then has the special arguments to apply to $$KEEP the document when the condition is true or $$PRUNE the document from results when it is false.
It's all very much like doing $project and then $match to actually project the value into the document before filtering in another stage, but all done in one stage. Of course you might actually want to $project the resulting field in what you return, but it generally cuts the workload if you remove non-matched documents "first" using $redact instead.
Updating Documents
Of course I think the best option is to actually keep the "min_value" field in the document rather than work it out at run-time. So this is a very simple thing to do when adding to or altering array items during update.
For this there is the $min "update" operator. Use it when appending with $push:
db.collection.update({
{ "_id": id },
{
"$push": { "options": { "source": "a", "value": 9 } },
"$min": { "min_value": 9 }
}
})
Or when updating a value of an element:
db.collection.update({
{ "_id": id, "options.source": "a" },
{
"$set": { "options.$.value": 9 },
"$min": { "min_value": 9 }
}
})
If the current "min_value" in the document is greater than the argument in $min or the key does not yet exist then the value given will be written. If it is greater than, the existing value stays in place since it is already the smaller value.
You can even set all your existing data with a simple "bulk" operations update:
var ops = [];
db.collection.find({ "min_value": { "$exists": false } }).forEach(function(doc) {
// Queue operations
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$min": {
"min_value": Math.min.apply(
null,
doc.options.map(function(option) {
return option.value
})
)
}
}
}
});
// Write once in 1000 documents
if ( ops.length == 1000 ) {
db.collection.bulkWrite(ops);
ops = [];
}
});
// Clear any remaining operations
if ( ops.length > 0 )
db.collection.bulkWrite(ops);
Then with a field in place, it is just a simple range selection:
db.collection.find({
"min_value": {
"$gte": minValue, "$lt": maxValue
}
})
So it really should be in your best interests to keep a field ( or fields if you regularly need different conditions ) in the document since that provides the most efficient query.
Of course, the new functions of aggregation $min along with $map also make this viable to use without a field, if you prefer more dynamic conditions.

Mongodb Aggregate a $slice to get an element in exact position from a nested array

I would like to retrieve a value from a nested array where it exists at an exact position within the array.
I want to create name value pairs by doing $slice[0,1] for the name and then $slice[1,1] for the value.
Before I attempt to use aggregate, I want to attempt a find within a nested array. I can do what I want on a single depth array in a document as shown below:
{
"_id" : ObjectId("565cc5261506995581569439"),
"a" : [
4,
2,
8,
71,
21
]
}
I apply the following: db.getCollection('anothertest').find({},{_id:0, a: {$slice:[0,1]}})
and I get:
{
"a" : [
4
]
}
This is fantastic. However, what if the array I want to $slice [0,1] is located within the document at objectRawOriginData.Reports.Rows.Rows.Cells?
If I can first of all FIND then I want to apply the same as an AGGREGATE.
Your best bet here and especially if your application is not yet ready for release is to hold off until MongoDB 3.2 for deployment, or at least start working with a release candidate in the interim. The main reason being is that the "projection" $slice does not work with the aggregation framework, as do not other forms of array matching projection as well. But this has been addressed for the upcoming release.
This is going to give you a couple of new operators, being $slice and even $arrayElemAt which can be used to address array elements by position in the aggregation pipeline.
Either:
db.getCollection('anothertest').aggregate([
{ "$project": {
"_id": 0,
"a": { "$slice": ["$a",0,1] }
}}
])
Which returns the familiar:
{ "a" : [ 4 ] }
Or:
db.getCollection('anothertest').aggregate([
{ "$project": {
"_id": 0,
"a": { "$arrayElemAt": ["$a", 0] }
}}
])
Which is just the element and not an array:
{ "a" : 4 }
Until that release becomes available other than in release candidate form, the currently available operators make it quite easy for the "first" element of the array:
db.getCollection('anothertest').aggregate([
{ "$unwind": "$a" },
{ "$group": {
"_id": "$_id",
"a": { "$first": "$a" }
}}
])
Through use of the $first operator after $unwind. But getting another indexed position becomes horribly iterative:
db.getCollection('anothertest').aggregate([
{ "$unwind": "$a" },
// Keeps the first element
{ "$group": {
"_id": "$_id",
"first": { "$first": "$a" },
"a": { "$push": "$a" }
}},
{ "$unwind": "$a" },
// Removes the first element
{ "$redact": {
"$cond": {
"if": { "$ne": [ "$first", "$a" ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
// Top is now the second element
{ "$group": {
"_id": "$_id",
"second": { "$first": "$a" }
}}
])
And so on, and also a lot of handling to alter that to deal with arrays that might be shorter than the "nth" element you are looking for. So "possible", but ugly and not performant.
Also noting that is "not really" working with "indexed positions", and is purely matching on values. So duplicate values would easily be removed, unless there was another unique identifier per array element to work with. Future $unwind also has the ability to project an array index, which is handy for other purposes, but the other operators are more useful for this specific case than that feature.
So for my money I would wait till you had the feature available to be able to integrate this in an aggregation pipeline, or at least re-consider why you believe you need it and possibly design around it.