The problem: given documents that each contain two arrays whose elements are sub-documents, I want to find the documents that essentially have:
"obj1.a" === "obj2.b"
So given the following sample documents, though in practice the arrays would be much larger, how can this be done?
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
],
"obj2": [
{ "a": "c", "b": "b" },
{ "a": "c", "b": "c" }
]
},
{
"obj1": [
{ "a": "a", "b": "b" }
],
"obj2": [
{ "a": "a", "b": "a" }
]
}
One approach might be to compare these with JavaScript and the $where operator, but looping over large arrays from within JavaScript doesn't sound very favorable.
Another approach is using the aggregation framework to do the comparison, but this involves unwinding two arrays on top of each other which can create a lot of documents to be processed in the pipeline:
db.objects.aggregate([
{ "$unwind": "$obj1" },
{ "$unwind": "$obj2" },
{ "$project": {
"match": { "$eq": [ "$obj1.a", "$obj2.b" ] }
}},
{ "$group": {
"_id": "$_id",
"match": { "$max": "$match" }
}},
{ "$match": { "match": true } }
])
Where performance is a concern, it is easy to see how the number of documents actually processed through $project and $group can end up many times larger than the number of documents originally in the collection.
So in order to do this there needs to be some way of comparing the array elements without performing an $unwind on those arrays and then grouping the documents back together. How could this be done?
You can get this sort of result using the $map operator that was introduced in MongoDB 2.6. This operates by taking an input array and allowing an expression to be evaluated over each element producing a new array as the result:
db.objects.aggregate([
{ "$project": {
"match": {
"$size": {
"$setIntersection": [
{ "$map": {
"input": "$obj1",
"as": "el",
"in": { "$concat": ["$$el.a",""] }
}},
{ "$map": {
"input": "$obj2",
"as": "el",
"in": { "$concat": ["$$el.b",""] }
}}
]
}
}
}},
{ "$match": { "match": { "$gte": 1 } } }
])
Here this is used with the $setIntersection and $size operators. As $map returns just the property values you want to compare from the elements, you end up with two arrays containing only those values.
The only catch is that the "in" option for $map currently requires an operator to be present within the Object {} notation of its arguments. You cannot presently say:
"in": "$$el.a"
To get around this, $concat is used here to join the string value with an empty string. Other operators can be used for different types, or even $ifNull, which is fairly generic and gets around "type" problems:
"in": { "$ifNull": [ "$$el.a", false ] }
The $setIntersection that wraps these is used to determine which values of those "sets" are the same, and returns its result as another array containing only the matching values.
Finally the $size operator here is an aggregation operator that returns the actual "size" of the array as an integer. This can be used in the following $match to filter out any results that did not return a "size" of 1 or greater.
Essentially this does all the work that was done in four individual stages, where the first two exponentially grow the number of documents to be processed, within two simple stages, all without increasing the number of documents received as input.
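To make the comparison concrete, here is the same logic replicated in plain JavaScript over the two sample documents (just a sketch of what the pipeline computes, not MongoDB code):

```javascript
// Replicate the $map / $setIntersection / $size logic in plain JavaScript
const docs = [
  {
    obj1: [ { a: "a", b: "b" }, { a: "a", b: "c" } ],
    obj2: [ { a: "c", b: "b" }, { a: "c", b: "c" } ]
  },
  {
    obj1: [ { a: "a", b: "b" } ],
    obj2: [ { a: "a", b: "a" } ]
  }
];

const matches = docs.filter(doc => {
  // $map: project just the values being compared
  const aValues = doc.obj1.map(el => el.a);
  const bSet = new Set(doc.obj2.map(el => el.b));
  // $setIntersection: distinct values common to both arrays
  const common = [...new Set(aValues)].filter(v => bSet.has(v));
  // $match with $gte: 1 — keep documents with at least one common value
  return common.length >= 1;
});
// Only the second sample document has "obj1.a" === "obj2.b" ("a" === "a")
```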
Related
I have a collection with data that looks sort of like this
{
"part": [
{ "a": "1", "b": "a" },
{ "a": "23", "b": "b" },
{ "a": "4", "b": "c" },
]
}
What I would like is a way of searching for documents where the concatenation of all the "a" parts equals the string I am searching for.
For example, "1234" should match the document above, but "124" should not.
Is this possible with MongoDB?
You can do it with Aggregation framework:
$match with $eq - To filter only documents where concatenated a properties of part array are equal to the input string.
$reduce with $concat - To concatenate all a properties of part array for each document.
db.collection.aggregate([
{
"$match": {
"$expr": {
"$eq": [
"1234",
{
"$reduce": {
"input": "$part",
"initialValue": "",
"in": {
"$concat": [
"$$value",
"$$this.a"
]
}
}
}
]
}
}
}
])
Working example
You can use aggregate with $reduce to join the strings, then $match to filter on your string.
Here is the playground.
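In case it helps to see what that $reduce with $concat actually computes, here is the equivalent in plain JavaScript (a sketch only; the variable names are made up):

```javascript
// Plain-JavaScript sketch of the $reduce / $concat comparison
const doc = {
  part: [
    { a: "1", b: "a" },
    { a: "23", b: "b" },
    { a: "4", b: "c" }
  ]
};

// $reduce with $concat: fold the "a" values into one string
const joined = doc.part.reduce((value, el) => value + el.a, "");

// $eq against the search string
const matches1234 = joined === "1234"; // true
const matches124  = joined === "124";  // false
```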
I have a requirement where I need to aggregate two records that both have an array field with different values. When I aggregate these records, the result should have one array with the unique values from both arrays. Here is an example:
First record
{ Host:"abc.com" ArtId:"123", tags:[ "tag1", "tag2" ] }
Second record
{ Host:"abc.com" ArtId:"123", tags:[ "tag2", "tag3" ] }
After aggregation on host and artid i need result like this:
{ Host: "abc.com", ArtId: "123", count :"2", tags:[ "tag1", "tag2", "tag3" ]}
I tried $addToSet in the group statement but it gives me tags like this: [["tag1","tag2"],["tag2","tag3"]]
Could you please help me with how I can achieve this in aggregation?
TLDR;
Modern releases should use $reduce with $setUnion after the initial $group, as shown:
db.collection.aggregate([
{ "$group": {
"_id": { "Host": "$Host", "ArtId": "$ArtId" },
"count": { "$sum": 1 },
"tags": { "$addToSet": "$tags" }
}},
{ "$addFields": {
"tags": {
"$reduce": {
"input": "$tags",
"initialValue": [],
"in": { "$setUnion": [ "$$value", "$$this" ] }
}
}
}}
])
You were right in finding the $addToSet operator, but when working with content in an array you generally need to process it with $unwind first. This "de-normalizes" the array entries, essentially making a "copy" of the parent document with each array entry as a singular value in the field. That is what you need here in order to avoid the behavior you are seeing.
Your "count" poses an interesting problem though, but easily solved through the use of a "double unwind" after an initial $group operation:
db.collection.aggregate([
// Group on the compound key and get the occurrences first
{ "$group": {
"_id": { "Host": "$Host", "ArtId": "$ArtId" },
"tcount": { "$sum": 1 },
"ttags": { "$push": "$tags" }
}},
// Unwind twice because "ttags" is now an array of arrays
{ "$unwind": "$ttags" },
{ "$unwind": "$ttags" },
// Now use $addToSet to get the distinct values
{ "$group": {
"_id": "$_id",
"tcount": { "$first": "$tcount" },
"tags": { "$addToSet": "$ttags" }
}},
// Optionally $project to get the fields out of the _id key
{ "$project": {
"_id": 0,
"Host": "$_id.Host",
"ArtId": "$_id.ArtId",
"count": "$tcount",
"tags": "$ttags"
}}
])
That final bit with $project is also there because I used "temporary" names for each of the fields in other stages of the aggregation pipeline. This is because there is an optimization in $project that "copies" the fields from an existing stage in the order they already appeared "before" any "new" fields are added to the document.
Otherwise the output would look like:
{ "count":2 , "tags":[ "tag1", "tag2", "tag3" ], "Host": "abc.com", "ArtId": "123" }
Where the fields are not in the order you might expect. Trivial really, but it matters to some people, so it is worth explaining why, and how to handle it.
So $unwind does the work to keep the items separated and not in arrays, and doing the $group first allows you to get the "count" of the occurrences of the "grouping" key.
The $first operator used later "keeps" that "count" value, as it just got "duplicated" for every value present in the "tags" array. It's all the same value anyway so it does not matter. Just pick one.
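What the "double unwind" followed by $addToSet achieves can be sketched in plain JavaScript as a flatten-and-dedupe (illustrative only; grouped stands in for the output of the first $group stage):

```javascript
// What the double $unwind + $addToSet achieves: flatten and dedupe
const grouped = {
  tcount: 2,
  ttags: [ [ "tag1", "tag2" ], [ "tag2", "tag3" ] ] // result of $push
};

// The first $unwind flattens the outer array, the second the inner ones;
// $addToSet then keeps each distinct value exactly once
const tags = [...new Set(grouped.ttags.flat())];
// tags is [ "tag1", "tag2", "tag3" ]
```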
I have a MongoDB collection indicators/
It returns statistical data such as:
/indicators/population
{
id: "population"
data : [
{
country : "A",
value : 100
},
{
country : "B",
value : 150
}
]
}
I would like to be able to limit the response to specific countries.
MongoDB doesn't seem to support this, so should I:
Restructure the MongoDB collection setup to allow this via native find()
Extend my API so that it allows filtering of the data array before returning to client
Other?
This is actually a very simple operation that just involves "projection" using the positional $ operator in order to match a given condition. In the case of a "singular" match that is:
db.collection.find(
{ "data.country": "A" },
{ "data.$": 1 }
)
And that will match the first element in the array which matches the condition as given in the query.
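In plain JavaScript terms, the positional projection behaves like Array.prototype.find over the array (a sketch of the semantics, not how the server implements it):

```javascript
// The positional $ projection keeps only the FIRST matching element;
// in plain JavaScript that is what .find() does
const doc = {
  id: "population",
  data: [
    { country: "A", value: 100 },
    { country: "B", value: 150 }
  ]
};

const first = doc.data.find(el => el.country === "A");
// { country: "A", value: 100 }
```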
For more than one match, you need to invoke the aggregation framework for MongoDB:
db.collection.aggregate([
// Match documents that are possible first
{ "$match": {
"data.country": "A"
}},
// Unwind the array to "de-normalize" the documents
{ "$unwind": "$data" },
// Actually filter the now "expanded" array items
{ "$match": {
"data.country": "A"
}},
// Group back together
{ "$group": {
"_id": "$_id",
"data": { "$push": "$data" }
}}
])
Or with MongoDB 2.6 or greater, a little bit cleaner, or at least without the $unwind:
db.collection.aggregate([
// Match documents that are possible first
{ "$match": {
"data.country": "A"
}},
// Filter out the array in place
{ "$project": {
"data": {
"$setDifference": [
{
"$map": {
"input": "$data",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.country", "A" },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
If my understanding of the problem is correct, then you can use:
db.population.find({ "data.country": { "$in": [ "A", "C" ] } });
Let's say we have records of following structure in database.
{
"_id": 1234,
"tags" : [ "t1", "t2", "t3" ]
}
Now, I want to check if database contains a record with any of the tags specified in array tagsArray which is [ "t3", "t4", "t5" ]
I know about the $in operator, but I not only want to know whether any record in the database has any of the tags specified in tagsArray, I also want to know which tag of the record matches one of the tags in tagsArray (i.e. t3 in the case of the record mentioned above).
That is, I want to compare two arrays (one of the record and other given by me) and find out the common element.
I need to have this expression along with many other expressions in the query, so projection operators like $ and $elemMatch won't be of much use. (Or is there a way they can be used without having to iterate over all records?)
I think I can use $where operator but I don't think that is the best way to do this.
How can this problem be solved?
There are a few approaches to do what you want; it just depends on your version of MongoDB. These are shell responses, but the content is basically a JSON representation, which is not hard to translate into DBObject entities for Java, or into JavaScript to be executed on the server, so that really does not change things.
The first and the fastest approach is with MongoDB 2.6 and greater where you get the new set operations:
var test = [ "t3", "t4", "t5" ];
db.collection.aggregate([
{ "$match": { "tags": {"$in": test } }},
{ "$project": {
"tagMatch": {
"$setIntersection": [
"$tags",
test
]
},
"sizeMatch": {
"$size": {
"$setIntersection": [
"$tags",
test
]
}
}
}},
{ "$match": { "sizeMatch": { "$gte": 1 } } },
{ "$project": { "tagMatch": 1 } }
])
The new operators there are $setIntersection, which does the main work, and the $size operator, which measures the array size and helps with the later filtering. This ends up as a basic comparison of "sets" in order to find the items that intersect.
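As a sanity check, the intersection that $setIntersection computes can be replicated in plain JavaScript over the sample record (illustrative only, not MongoDB code):

```javascript
// Plain-JavaScript sketch of $setIntersection between the stored
// "tags" array and the supplied "test" array
const test = [ "t3", "t4", "t5" ];
const doc = { _id: 1234, tags: [ "t1", "t2", "t3" ] };

const testSet = new Set(test);
// Distinct stored tags that also appear in the input list
const tagMatch = [...new Set(doc.tags)].filter(t => testSet.has(t));
const sizeMatch = tagMatch.length;
// tagMatch is [ "t3" ] and sizeMatch is 1, so the document
// passes the { "$gte": 1 } filter
```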
If you have an earlier version of MongoDB then this is still possible, but you need a few more stages and this might affect performance somewhat depending if you have large arrays:
var test = [ "t3", "t4", "t5" ];
db.collection.aggregate([
{ "$match": { "tags": {"$in": test } }},
{ "$project": {
"tags": 1,
"match": { "$const": test }
}},
{ "$unwind": "$tags" },
{ "$unwind": "$match" },
{ "$project": {
"tags": 1,
"matched": { "$eq": [ "$tags", "$match" ] }
}},
{ "$match": { "matched": true }},
{ "$group": {
"_id": "$_id",
"tagMatch": { "$push": "$tags" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gte": 1 } }},
{ "$project": { "tagMatch": 1 }}
])
Or if all of that seems too involved, or your arrays are large enough to make a performance difference, then there is always mapReduce:
var test = [ "t3", "t4", "t5" ];
db.collection.mapReduce(
function () {
var intersection = this.tags.filter(function(x){
return ( test.indexOf( x ) != -1 );
});
if ( intersection.length > 0 )
emit ( this._id, intersection );
},
function(){},
{
"query": { "tags": { "$in": test } },
"scope": { "test": test },
"output": { "inline": 1 }
}
)
Note that in all cases the $in operator still helps you to reduce the results even though it is not the full match. The other common element is checking the "size" of the intersection result to reduce the response.
All pretty easy to code up, convince the boss to switch to MongoDB 2.6 or greater if you are not already there for the best results.
I have a collection which contains documents with multiple arrays. These are generally quite large, but for purposes of explaining you can consider the following two documents:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" },
{ "a": "a", "b": "b" }
],
"obj2": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
]
},
{
"obj1": [
{ "a": "c", "b": "b" }
],
"obj2": [
{ "a": "c", "b": "c" }
]
}
The idea is to just get the matching elements in the array to the query. There are multiple matches required and within multiple arrays so this is not within the scope of what can be done with projection and the positional $ operator. The desired result would be like:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "b" }
],
"obj2": [
{ "a": "a", "b": "b" },
]
},
A traditional approach would be something like this:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$unwind": "$obj1" },
{ "$match": {
"obj1.a": "a",
"obj1.b": "b"
}},
{ "$unwind": "$obj2" },
{ "$match": { "obj2.b": "b" }},
{ "$group": {
"_id": "$_id",
"obj1": { "$addToSet": "$obj1" },
"obj2": { "$addToSet": "$obj2" }
}}
])
But the use of $unwind there for both arrays causes the overall set to use a lot of memory and slows things down. There are also possible problems there with $addToSet and splitting the $group stages for each array can make things even slower.
So I am looking for a process that is not so intensive but arrives at the same result.
Since MongoDB 3.0 we have the $filter operator, which makes this really quite simple:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$project": {
"obj1": {
"$filter": {
"input": "$obj1",
"as": "el",
"cond": {
"$and": [
{ "$eq": [ "$$el.a", "a" ] },
{ "$eq": [ "$$el.b", "b" ] }
]
}
}
},
"obj2": {
"$filter": {
"input": "$obj2",
"as": "el",
"cond": { "$eq": [ "$$el.b", "b" ] }
}
}
}}
])
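To illustrate, here is what those two $filter expressions compute, written as plain JavaScript over the first sample document (a sketch of the logic, not MongoDB code):

```javascript
// What the two $filter stages compute, in plain JavaScript
const doc = {
  obj1: [
    { a: "a", b: "b" },
    { a: "a", b: "c" },
    { a: "a", b: "b" }
  ],
  obj2: [
    { a: "a", b: "b" },
    { a: "a", b: "c" }
  ]
};

const result = {
  // cond: { $and: [ { $eq: [...] }, { $eq: [...] } ] }
  obj1: doc.obj1.filter(el => el.a === "a" && el.b === "b"),
  // cond: { $eq: [ "$$el.b", "b" ] }
  obj2: doc.obj2.filter(el => el.b === "b")
};
// obj1 keeps two matching elements, obj2 keeps one
```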
MongoDB 2.6 introduces the $map operator which can act on arrays in place without the need to $unwind. Combined with some other logical operators and additional set operators that have been added to the aggregation framework there is a solution to this problem and others.
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$project": {
"obj1": {
"$setDifference": [
{ "$map": {
"input": "$obj1",
"as": "el",
"in": {
"$cond": [
{ "$and": [
{ "$eq": [ "$$el.a", "a" ] },
{ "$eq": [ "$$el.b", "b" ] }
]},
"$$el",
false
]
}
}},
[false]
]
},
"obj2": {
"$setDifference": [
{ "$map": {
"input": "$obj2",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.b", "b" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
The core of this is the $map operator, which works like an internalized $unwind by processing all of the array elements, but also allows operations to act on those elements within the same statement. Typically this would take several pipeline stages, but here it can all be done within a single $project, $group or $redact stage.
In this case the inner processing uses the $cond operator, which combines with a logical condition in order to return a different result for true or false. Here we use the $eq operator to test values of the fields contained in the current element, in much the same way as a separate $match pipeline stage would. The $and condition is another logical operator, combining the results of multiple conditions on the element, much as the $elemMatch operator would within a $match pipeline stage.
Finally, since the $cond operator returns either the value of the current element or false when the condition is not true, we need to "filter" any false values from the array produced by the $map operation. This is where the $setDifference operator comes in: it compares the two input arrays and returns the difference. So when compared against an array containing only false as its element, the result is the elements that were returned from $map, minus the false elements produced by $cond where the conditions were not met.
The result contains only the matching elements from each array, without having to run through separate pipeline stages for $unwind, $match and $group.
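A plain-JavaScript sketch of that $map / $cond / $setDifference combination may make the mechanism clearer (illustrative only):

```javascript
// $map with $cond: map each element to itself when it matches, else false
const obj1 = [
  { a: "a", b: "b" },
  { a: "a", b: "c" },
  { a: "a", b: "b" }
];

const mapped = obj1.map(el =>
  (el.a === "a" && el.b === "b") ? el : false
);
// mapped is [ {...}, false, {...} ]

// $setDifference against [false]: strip out the false placeholders
const filtered = mapped.filter(el => el !== false);
```

One caveat: the real $setDifference treats its inputs as sets, so duplicate matching elements would also be collapsed into one, whereas the plain filter above keeps both copies.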
To return more than one match:
const { timeSlots } = req.body;
let ts = [];
for (const slot of timeSlots) {
ts.push({
$eq: ['$$timeSlots.id',slot.id],
});
}
const products = await Product.aggregate<ProductDoc>([
{
$match: {
_id: req.params.productId,
recordStatus: RecordStatus.Active,
},
},
{
$project: {
timeSlots: {
$filter: {
input: '$timeSlots',
as: 'timeSlots',
cond: {
$or: ts,
},
},
},
name: 1,
mrp: 1,
},
},
]);
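The filter that the dynamically built $or of $eq conditions performs is equivalent to this plain-JavaScript sketch (the sample data here is hypothetical, just to show the shape of the logic):

```javascript
// Keep only the product time slots whose id appears in the request
const requested = [ { id: 1 }, { id: 3 } ];        // stands in for req.body.timeSlots
const product = {
  name: "demo",
  timeSlots: [ { id: 1 }, { id: 2 }, { id: 3 } ]   // stands in for the stored slots
};

// The $or of $eq conditions is just a membership test on the ids
const wanted = new Set(requested.map(slot => slot.id));
const timeSlots = product.timeSlots.filter(slot => wanted.has(slot.id));
// [ { id: 1 }, { id: 3 } ]
```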