Project field defined by another field's value - mongodb

I have a document structured like so:
{
  mode: "b",
  a: [0,1,2],
  b: [1,4,5],
  c: [2,2]
}
And I want to project the field that equals mode. The end result should be something like:
data: [1,4,5] // since mode == "b", it returns b's value
I tried $$CURRENT[$mode], but it looks like you can't use brackets like that in mongo. I tried using a local variable like so:
$let: {
  vars: { mode: "$mode" },
  in: "$$CURRENT.$$mode"
}
but that doesn't work either. I'm considering using $switch and then manually putting in all the possible modes. But I'm wondering if there is a better way to do it.

You are looking in the wrong place, but if you can use $switch then you have MongoDB 3.4, and you can use $objectToArray, which is actually the correct thing to do. Your problem is you are trying to "dynamically" refer to a property by the "value" of its "key name". You cannot do that, so $objectToArray instead makes the "key" a "value".
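As a quick illustration of that conversion (the literal document here is just a throw-away example, not part of your data), an expression like:
{ "$objectToArray": { "a": 1, "b": 2 } }
returns:
[ { "k": "a", "v": 1 }, { "k": "b", "v": 2 } ]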
So given your document:
{ "mode": "a", "a": [0,1,2], "b": [1,4,5], "c": [2,2] }
Then you do the aggregate, using $map and $filter to work with the converted elements as an array:
db.sample.aggregate([
  { "$project": {
    "_id": 0,
    "mode": 1,
    "data": {
      "$arrayElemAt": [
        { "$map": {
          "input": {
            "$filter": {
              "input": { "$objectToArray": "$$ROOT" },
              "cond": { "$eq": [ "$$this.k", "$mode" ] }
            }
          },
          "in": "$$this.v"
        }},
        0
      ]
    }
  }}
])
Or using $let and $indexOfArray if that seems more sensible to you:
db.sample.aggregate([
  { "$project": {
    "_id": 0,
    "mode": 1,
    "data": {
      "$let": {
        "vars": { "doc": { "$objectToArray": "$$ROOT" } },
        "in": {
          "$arrayElemAt": [
            "$$doc.v",
            { "$indexOfArray": [ "$$doc.k", "$mode" ] }
          ]
        }
      }
    }
  }}
])
Which matches the selected field:
{
"mode" : "a",
"data" : [
0.0,
1.0,
2.0
]
}
If you look at "just" what $objectToArray is doing here, then the reasons should be self-evident:
{
"data" : [
{
"k" : "_id",
"v" : ObjectId("597915787dcd6a5f6a9b4b98")
},
{
"k" : "mode",
"v" : "a"
},
{
"k" : "a",
"v" : [
0.0,
1.0,
2.0
]
},
{
"k" : "b",
"v" : [
1.0,
4.0,
5.0
]
},
{
"k" : "c",
"v" : [
2.0,
2.0
]
}
]
}
So now, instead of there being an "object" with named properties, the "array" consistently contains "k" naming the "key" and "v" containing the "value". This is easy to $filter to obtain the desired result, or you can basically use any method that works with arrays to obtain the match.
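To make the extraction concrete: for the sample document (where "mode" is "a"), the $filter condition in the first pipeline reduces that k/v array down to just
[ { "k": "a", "v": [ 0.0, 1.0, 2.0 ] } ]
and the $map plus $arrayElemAt simply pull out the "v" of that single remaining element.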

Related

Break Down Multiple Arrays to Documents with Comparison

I'd like to break down a record into multiple new records in mongo. How do I do it?
Current data:
{"user_id": 123, "scores": [65, 71, 79, 80], "materials": ["A", "B", "C", "D"]}
And I want to create the data below from the one above:
{"user_id": 123, "score_original": 65, "score_improvement": 6, "material": "A"},
{"user_id": 123, "score_original": 71, "score_improvement": 8, "material": "B"},
{"user_id": 123, "score_original": 79, "score_improvement": 1, "material": "C"}
The question could really do with some clarification, but if your intention is to expand each item with a comparison to the "next" array element, and the arrays are always of equal length for direct comparison, then there are a couple of approaches.
That is, there is a "simple way" (at the end) and more complex ways, depending on your needs. So to step through them so you understand what is involved in each:
Aggregate
With a modern MongoDB release you can use the aggregation framework to merge and compare the array elements and then expand into new items like so:
db.getCollection('junk').aggregate([
  { "$project": {
    "user_id": 1,
    "data": {
      "$map": {
        "input": {
          "$slice": [
            { "$zip": {
              "inputs": [
                "$scores",
                { "$map": {
                  "input": {
                    "$reverseArray": {
                      "$reduce": {
                        "input": { "$reverseArray": "$scores" },
                        "initialValue": [],
                        "in": {
                          "$concatArrays": [
                            "$$value",
                            [
                              [
                                "$$this",
                                { "$subtract": [
                                  { "$arrayElemAt": [
                                    { "$ifNull": [ { "$arrayElemAt": [ "$$value", -1 ] }, [] ] },
                                    0
                                  ]},
                                  "$$this"
                                ]}
                              ]
                            ]
                          ]
                        }
                      }
                    }
                  },
                  "in": { "$arrayElemAt": [ "$$this", -1 ] }
                }},
                "$materials"
              ]
            }},
            { "$subtract": [ { "$size": "$scores" }, 1 ] }
          ]
        },
        "in": {
          "score_original": { "$arrayElemAt": [ "$$this", 0 ] },
          "score_improvement": { "$arrayElemAt": [ "$$this", 1 ] },
          "material": { "$arrayElemAt": [ "$$this", 2 ] }
        }
      }
    }
  }},
  { "$unwind": "$data" },
  { "$replaceRoot": {
    "newRoot": {
      "$arrayToObject": {
        "$concatArrays": [
          [{ "k": "user_id", "v": "$user_id" }],
          { "$objectToArray": "$data" }
        ]
      }
    }
  }}
])
Which returns the desired result:
/* 1 */
{
"user_id" : 123.0,
"score_original" : 65.0,
"score_improvement" : 6.0,
"material" : "A"
}
/* 2 */
{
"user_id" : 123.0,
"score_original" : 71.0,
"score_improvement" : 8.0,
"material" : "B"
}
/* 3 */
{
"user_id" : 123.0,
"score_original" : 79.0,
"score_improvement" : 1.0,
"material" : "C"
}
Much of the work is done with $reduce from "reversed" array content via $reverseArray since you want to compare to the "next" item. It's generally easier to do a "last" comparison than try to work with calculated index values in the aggregation framework operations, so this is why you "reverse" it.
The basic premise for the "improvement" values is to work through the "reversed" array, comparing the present value to the last entry in the output array and calculating the difference using $subtract. Since you need to output both the "improvement" and keep the "previous" value available, the value for comparison is extracted via $arrayElemAt along with $ifNull checks.
These are stored in "array pairs" for output before feeding to the next operation. Naturally you $reverseArray again to maintain the original order with the new output.
Since there are now essentially "three" arrays of values, one way of "combining" these into one is $zip which would make an array of arrays for each of the elements. It's not the only way, but again it's probably a bit clearer to read than juggling index values for extraction.
Then of course you use $map to get to the final "object" form for each array entry. But not before applying $slice, since the "last" array element is being discarded due to having no "improvement" over its "next" item, which does not exist. At least that's following the logic you seem to present.
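If those operators are new to you, their behavior is easy to sketch in isolation with throw-away literal values (these inputs are just for illustration, not taken from the pipeline):
{ "$reverseArray": [ 65, 71, 79, 80 ] }                           // [ 80, 79, 71, 65 ]
{ "$zip": { "inputs": [ [ 65, 71, 79 ], [ "A", "B", "C" ] ] } }   // [ [ 65, "A" ], [ 71, "B" ], [ 79, "C" ] ]
{ "$slice": [ [ 1, 2, 3, 4 ], 3 ] }                               // [ 1, 2, 3 ]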
The final parts are simply using $unwind to turn the array construct into separate documents, and then reshaping the final output. Here this is applied using $replaceRoot as well as the $objectToArray and $arrayToObject operators to construct a new root document without explicit naming. However this may as well just be a simple $project instead:
{ "$project": {
"user_id": 1,
"score_original": "$data.score_original",
"score_improvement": "$data.score_improvement",
"material": "$data.material"
}}
So there are different ways that can be applied both there and in the "object" construction of the array as well. It's just that the newer operators such as $objectToArray require MongoDB 3.4.4 at least. All other things can be done with MongoDB 3.4.
Aggregate Alternate
You can alternately just work with the array indexes supplied using $range where available:
db.getCollection('junk').aggregate([
  { "$project": {
    "_id": 0,
    "user_id": 1,
    "data": {
      "$map": {
        "input": { "$range": [ 0, { "$subtract": [ { "$size": "$scores" }, 1 ] } ] },
        "as": "r",
        "in": {
          "score_original": { "$arrayElemAt": [ "$scores", "$$r" ] },
          "score_improvement": {
            "$subtract": [
              { "$arrayElemAt": [ "$scores", { "$add": [ "$$r", 1 ] } ] },
              { "$arrayElemAt": [ "$scores", "$$r" ] }
            ]
          },
          "material": { "$arrayElemAt": [ "$materials", "$$r" ] }
        }
      }
    }
  }},
  { "$unwind": "$data" },
  { "$replaceRoot": {
    "newRoot": {
      "$arrayToObject": {
        "$concatArrays": [
          [{ "k": "user_id", "v": "$user_id" }],
          { "$objectToArray": "$data" }
        ]
      }
    }
  }}
])
That has the same output and also follows the basic logic as shown in the following approaches.
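For reference, $range simply generates a sequence of index values, so with the four-element "scores" array the $map input above resolves to something like:
{ "$range": [ 0, 3 ] }   // [ 0, 1, 2 ]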
Map Reduce
If you don't have a MongoDB 3.4 supporting the operators used, then you can always apply mapReduce and simply calculate and emit for each array value:
db.getCollection('junk').mapReduce(
  function() {
    for ( var i = 0; i < this.scores.length - 1; i++ ) {
      var id = this._id.valueOf() + '_' + i;
      emit(id, {
        "user_id": this.user_id,
        "score_original": this.scores[i],
        "score_improvement": this.scores[i+1] - this.scores[i],
        "material": this.materials[i]
      });
    }
  },
  function() {},
  { "out": { "inline": 1 } }
)
This does have its own specific output via the rules of mapReduce, which should be evident in the construction of the "unique" _id value to emit:
"results" : [
{
"_id" : "59e4144331be3474a2f28a92_0",
"value" : {
"user_id" : 123.0,
"score_original" : 65.0,
"score_improvement" : 6.0,
"material" : "A"
}
},
{
"_id" : "59e4144331be3474a2f28a92_1",
"value" : {
"user_id" : 123.0,
"score_original" : 71.0,
"score_improvement" : 8.0,
"material" : "B"
}
},
{
"_id" : "59e4144331be3474a2f28a92_2",
"value" : {
"user_id" : 123.0,
"score_original" : 79.0,
"score_improvement" : 1.0,
"material" : "C"
}
}
],
You should note that, aside from being far less complex in implementation, there is actually no "reducer" function defined at all. Which should also lead to the inevitable conclusion here.
Iterate the cursor
This really is just a basic cursor iteration and expansion you are asking for, so that is all you really need to do. Which means working from the base defined in our mapper function as a simple shell abstraction:
db.getCollection('junk').find().forEach(d => {
  for ( var i = 0; i < d.scores.length - 1; i++ ) {
    printjson({
      "user_id": d.user_id,
      "score_original": d.scores[i],
      "score_improvement": d.scores[i+1] - d.scores[i],
      "material": d.materials[i]
    })
  }
})
Which gives the output as desired:
{
"user_id" : 123,
"score_original" : 65,
"score_improvement" : 6,
"material" : "A"
}
{
"user_id" : 123,
"score_original" : 71,
"score_improvement" : 8,
"material" : "B"
}
{
"user_id" : 123,
"score_original" : 79,
"score_improvement" : 1,
"material" : "C"
}
And it really is that simple.
The base lesson here is that "whilst you can" ask a database to do complicated things, unless it actually results in a significant reduction in the data load returned from the server, the usual best case is to simply process the data in native client code instead.
Even if the presented data in the question was obtained from some other aggregation operation, it would still generally be better at this stage to simply iterate the cursor result for the final transformation.
And if the transformation were required for further aggregation operations, then by all means follow the first process. However, if the data presented is actually obtained by aggregation already and there is a need to transform it in further aggregation, then you should probably examine the existing aggregation process you have, since you may not even need the intermediate state with multiple arrays, which is where most of the complexity comes from.

How to Join Arrays in the Same Document?

I would like to combine the data in one collection using the IDs of the two arrays.
An example is shown below.
{
"_id": ObjectId ("5976fd2eb0adec0a32fa9831"),
"People": [
{
"_id": 1, <--- ID
"Name": "jane"
},
{
"_id": 2, <--- ID
"Name": "Mark"
}
],
"Contents": [
{
"userID": 2, <--- People ID
"Text": "111"
},
{
"userID": 1, <--- People ID
"Text": "Hi"
}
]
}
I want to turn the document above into the one below.
{
"_id": ObjectId ("5976fd2eb0adec0a32fa9831"),
"People": [
{
"_id": 1,
"Name" : "Jane"
},
{
"_id": 2,
"Name": "Mark"
}
],
"Contents": [
{
"userID": 2,
"Name": "Mark", <-- Adding
"Text": "111",
},
{
"userID": 1,
"Name": "Jane", <-- Adding
"Text": "Hi",
}
]
}
I have tried various things like $lookup or $unwind of .aggregate() but I cannot get the result.
You want $map and $indexOfArray ideally:
db.collection.aggregate([
  { "$addFields": {
    "Contents": {
      "$map": {
        "input": "$Contents",
        "as": "c",
        "in": {
          "userID": "$$c.userID",
          "Name": {
            "$arrayElemAt": [
              "$People.Name",
              { "$indexOfArray": [ "$People._id", "$$c.userID" ] }
            ]
          },
          "Text": "$$c.Text"
        }
      }
    }
  }}
])
Which basically grabs the value from the other array via $arrayElemAt for the matching "index" returned by $indexOfArray.
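In isolation, and with throw-away literal arrays standing in for the "People" fields, those two operators behave like this:
{ "$indexOfArray": [ [ 1, 2 ], 2 ] }            // 1
{ "$arrayElemAt": [ [ "jane", "Mark" ], 1 ] }   // "Mark"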
If you need to fall back to a MongoDB version without that operator, then you could use $filter instead:
db.collection.aggregate([
  { "$addFields": {
    "Contents": {
      "$map": {
        "input": "$Contents",
        "as": "c",
        "in": {
          "userID": "$$c.userID",
          "Name": {
            "$arrayElemAt": [
              { "$map": {
                "input": {
                  "$filter": {
                    "input": "$People",
                    "as": "p",
                    "cond": { "$eq": [ "$$p._id", "$$c.userID" ] }
                  }
                },
                "as": "p",
                "in": "$$p.Name"
              }},
              0
            ]
          },
          "Text": "$$c.Text"
        }
      }
    }
  }}
])
Here you basically $filter the other array down to the matching elements in comparison, and simply return the first matching element by the 0 index with $arrayElemAt.
In either case, there is no need to "self-join" using $lookup, and that's just really unnecessary overhead best avoided.
From the document in the question you get the following:
/* 1 */
{
"_id" : ObjectId("5976fd2eb0adec0a32fa9831"),
"People" : [
{
"_id" : 1.0,
"Name" : "jane"
},
{
"_id" : 2.0,
"Name" : "Mark"
}
],
"Contents" : [
{
"userID" : 2.0,
"Name" : "Mark",
"Text" : "111"
},
{
"userID" : 1.0,
"Name" : "jane",
"Text" : "Hi"
}
]
}
Generally speaking though, there is no real need for any aggregation operators at all here, as this sort of operation is generally best left to post-processing of the cursor. In fact, since you are actually "adding" data to the document being returned, it's better to do that modification after the document is sent over the network.
As a common idiom of the above shown as JavaScript for the shell:
db.collection.find().map( d =>
  Object.assign(
    d,
    {
      "Contents": d.Contents.map( c =>
        Object.assign(c, {
          "Name": d.People.map(p => p.Name)[d.People.map(p => p._id).indexOf(c.userID)]
        })
      )
    }
  )
)
Produces the exact same result, and is generally a bit easier on the eyes to read and interpret.

Retrieve specific element of a nested document

Just cannot figure this out. This is the document format from a MongoDB collection of jobs, which is derived from an XML file whose layout I have no control over:
{
"reference" : [ "93417" ],
"Title" : [ "RN - Pediatric Director of Nursing" ],
"Description" : [ "...a paragraph or two..." ],
"Classifications" : [
{
"Classification" : [
{
"_" : "Nurse / Midwife",
"name" : [ "Category" ]
},
{
"_" : "FL - Jacksonville",
"name" : [ "Location" ],
},
{
"_" : "Permanent / Full Time",
"name" : [ "Work Type" ],
},
{
"_" : "Some Health Care Org",
"name" : [ "Company Name" ],
}
]
}
],
"Apply" : [
{
"EmailTo" : [ "jess#recruiting.co" ]
}
]
}
The intention is to pull a list of jobs from the DB, to include 'Location', which is buried down there as the second document at 'Classifications.Classification._'.
I've tried various 'aggregate' permutations of $project, $unwind, $match, $filter, $group… but I don't seem to be getting anywhere. Experimenting with just retrieving the company name, I was expecting this to work:
db.collection(JOBS_COLLECTION).aggregate([
  { "$project": { "meta": "$Classifications.Classification" } },
  { "$project": { "meta": 1, "_id": 0 } },
  { "$unwind": "$meta" },
  { "$match": { "meta.name": "Company Name" } },
  { "$project": { "Company": "$meta._" } },
])
But that pulled everything for every record, thus:
[{
"Company":[
"Nurse / Midwife",
"TX - San Antonio",
"Permanent / Full Time",
"Some Health Care Org"
]
}, { etc etc }]
What am I missing, or misusing?
Ideally, with MongoDB 3.4 available, you would simply $project and use the array operators $map, $filter and $reduce: the latter to "compact" the arrays and the former two to extract the relevant element and detail. $arrayElemAt then takes just the "element" from the array(s):
db.collection(JOBS_COLLECTION).aggregate([
  { "$match": { "Classifications.Classification.name": "Location" } },
  { "$project": {
    "_id": 0,
    "output": {
      "$arrayElemAt": [
        { "$map": {
          "input": {
            "$filter": {
              "input": {
                "$reduce": {
                  "input": "$Classifications.Classification",
                  "initialValue": [],
                  "in": {
                    "$concatArrays": [ "$$value", "$$this" ]
                  }
                }
              },
              "as": "c",
              "cond": { "$eq": [ "$$c.name", ["Location"] ] }
            }
          },
          "as": "c",
          "in": "$$c._"
        }},
        0
      ]
    }
  }}
])
Or even skip the $reduce, which merely applies $concatArrays to "merge", and simply grab the "first" array index (since there is only one) using $arrayElemAt:
db.collection(JOBS_COLLECTION).aggregate([
  { "$match": { "Classifications.Classification.name": "Location" } },
  { "$project": {
    "_id": 0,
    "output": {
      "$arrayElemAt": [
        { "$map": {
          "input": {
            "$filter": {
              "input": { "$arrayElemAt": [ "$Classifications.Classification", 0 ] },
              "as": "c",
              "cond": { "$eq": [ "$$c.name", ["Location"] ] }
            }
          },
          "as": "c",
          "in": "$$c._"
        }},
        0
      ]
    }
  }}
])
That makes the operation compatible with MongoDB 3.2, which you "should" be running at least.
That in turn allows you to consider an alternate syntax for MongoDB 3.4 using $indexOfArray, with the "first" array index assigned to an initial variable via $let to somewhat shorten the syntax:
db.collection(JOBS_COLLECTION).aggregate([
  { "$match": { "Classifications.Classification.name": "Location" } },
  { "$project": {
    "_id": 0,
    "output": {
      "$let": {
        "vars": {
          "meta": {
            "$arrayElemAt": [
              "$Classifications.Classification",
              0
            ]
          }
        },
        "in": {
          "$arrayElemAt": [
            "$$meta._",
            { "$indexOfArray": [
              "$$meta.name", [ "Location" ]
            ]}
          ]
        }
      }
    }
  }}
])
If indeed you consider that to be "shorter", that is.
In the other sense though, much like above there is an "array inside an array", so in order to process it you $unwind twice, which is effectively what the $concatArrays inside $reduce is countering in the ideal case:
db.collection(JOBS_COLLECTION).aggregate([
  { "$match": { "Classifications.Classification.name": "Location" } },
  { "$unwind": "$Classifications" },
  { "$unwind": "$Classifications.Classification" },
  { "$match": { "Classifications.Classification.name": "Location" } },
  { "$project": { "_id": 0, "output": "$Classifications.Classification._" } }
])
All statements actually produce:
{
"output" : "FL - Jacksonville"
}
Which is the matching value of "_" in the inner array element for the "Location" as selected by your original intent.
Keeping in mind of course that all statements really should be preceded with the relevant $match statement as shown:
{ "$match": { "Classifications.Classification.name": "Location" } },
Since without that you could be processing documents unnecessarily that do not actually contain an array element matching that condition. Of course this may not be the case due to the nature of the documents, but it's generally good practice to make sure the "initial" selection always matches the conditions of the details you later intend to "extract".
All of that said, even if this is the result of a direct import from XML, the structure should be changed, since it does not present itself efficiently for queries. MongoDB documents do not work the way XPath does in terms of issuing queries. Therefore anything "XML-like" is not going to be a good structure, and if the "import" process cannot be changed to a more accommodating format, then there should at least be a "post process" to manipulate this into separate storage in a more usable form.
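As a rough sketch of what such a post process could look like (run from the shell here; the "jobs" collection name and the "meta" field are purely illustrative assumptions, not part of your setup), you could flatten each Classification entry into an ordinary keyed sub-document so that things like "Location" become directly addressable:
db.jobs.find().forEach(d => {
  var meta = {};
  // each entry looks like { "_": "FL - Jacksonville", "name": [ "Location" ] }
  d.Classifications[0].Classification.forEach(c => {
    meta[c.name[0]] = c._;
  });
  // store the flattened form alongside the original fields
  db.jobs.updateOne({ "_id": d._id }, { "$set": { "meta": meta } });
})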

Querying MongoDB array of similar objects

I have to work with an old MongoDB where objects in one collection are structured like this.
{
"_id": ObjectId("57fdfcc7a7c81fde38b79a3d"),
"parameters": [
{
"key": "key1",
"value": "value1"
},
{
"key": "key2",
"value": "value2"
}
]
}
The problem is that parameters is an array of objects, which makes efficient querying difficult. There can be about 50 different objects, which all have "key" and "value" properties. Is it possible to make a query, where the query targets "key" and "value" inside one object? I've tried
db.collection.find({$and:[{"parameters.key":"value"}, {"parameters.value":"another value"}]})
but this query hits all the objects in the parameters array.
EDIT. Nikhil Jagtiani found a solution to my original question, but actually I need to be able to target multiple objects inside the parameters array, e.g. check keys and values in two different objects in the parameters array.
Please refer to the mongo shell aggregate query below:
db.collection.aggregate([
  { "$unwind": "$parameters" },
  { "$match": {
    "parameters.key": "key1",
    "parameters.value": "value1"
  }}
])
1) Stage 1 - Unwind : Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
2) Stage 2 - Match : Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
Without aggregation, queries will return the entire document even if one subdocument matches. This pipeline will only return the required subdocuments.
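Against the sample document in the question, the result of that pipeline should look something like:
{
  "_id": ObjectId("57fdfcc7a7c81fde38b79a3d"),
  "parameters": {
    "key": "key1",
    "value": "value1"
  }
}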
Edit: If you need to specify multiple key/value pairs, what you need is $in on the parameters field.
db.collection.aggregate([
  { "$unwind": "$parameters" },
  { "$match": {
    "parameters": {
      "$in": [
        { "key": "key1", "value": "value1" },
        { "key": "key2", "value": "value2" }
      ]
    }
  }}
])
will match the following two pairs of key-values as subdocuments:
1) { "key" : "key1", "value" : "value1" }
2) { "key" : "key2", "value" : "value2" }
There is a $filter operator in the aggregation framework which is perfect for such queries. It's a bit verbose but very efficient; you can use it as follows:
db.surveys.aggregate([
  { "$match": {
    "$and": [
      {
        "parameters.key": "key1",
        "parameters.value": "val1"
      },
      {
        "parameters.key": "key2",
        "parameters.value": "val2"
      }
    ]
  }},
  { "$project": {
    "parameters": {
      "$filter": {
        "input": "$parameters",
        "as": "item",
        "cond": {
          "$or": [
            { "$and": [
              { "$eq": [ "$$item.key", "key1" ] },
              { "$eq": [ "$$item.value", "val1" ] }
            ]},
            { "$and": [
              { "$eq": [ "$$item.key", "key2" ] },
              { "$eq": [ "$$item.value", "val2" ] }
            ]}
          ]
        }
      }
    }
  }}
])
You can also do this with more set operators in MongoDB 2.6 without using $unwind:
db.surveys.aggregate([
  { "$match": {
    "$and": [
      {
        "parameters.key": "key1",
        "parameters.value": "val1"
      },
      {
        "parameters.key": "key2",
        "parameters.value": "val2"
      }
    ]
  }},
  { "$project": {
    "parameters": {
      "$setDifference": [
        { "$map": {
          "input": "$parameters",
          "as": "item",
          "in": {
            "$cond": [
              { "$or": [
                { "$and": [
                  { "$eq": [ "$$item.key", "key1" ] },
                  { "$eq": [ "$$item.value", "val1" ] }
                ]},
                { "$and": [
                  { "$eq": [ "$$item.key", "key2" ] },
                  { "$eq": [ "$$item.value", "val2" ] }
                ]}
              ]},
              "$$item",
              false
            ]
          }
        }},
        [false]
      ]
    }
  }}
])
For a solution with MongoDB 2.4, you would need to use the $unwind operator unfortunately:
db.surveys.aggregate([
  { "$match": {
    "$and": [
      {
        "parameters.key": "key1",
        "parameters.value": "val1"
      },
      {
        "parameters.key": "key2",
        "parameters.value": "val2"
      }
    ]
  }},
  { "$unwind": "$parameters" },
  // after $unwind each document holds a single "parameters" entry, so keep either matching pair
  { "$match": {
    "$or": [
      {
        "parameters.key": "key1",
        "parameters.value": "val1"
      },
      {
        "parameters.key": "key2",
        "parameters.value": "val2"
      }
    ]
  }},
  { "$group": {
    "_id": "$_id",
    "parameters": { "$push": "$parameters" }
  }}
]);
Is it possible to make a query, where the query targets "key" and "value" inside one object?
This is possible if you know which object (id) you are going to query up front (to be given as an input parameter in the find query). If that is not possible then we can try the approach below for efficient querying.
Build an index on parameters.key and, if needed, also on parameters.value. This would considerably improve query performance.
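For example (the compound form shown here is just one option; a single-field multikey index on "parameters.key" alone may be enough depending on your queries):
db.collection.createIndex({ "parameters.key": 1, "parameters.value": 1 })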
Please see
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/core/index-multikey/

How to address arrays with mongodb Set Operators

Using the example zipcodes collection, I have a query like this:
db.zipcodes.aggregate([
  { "$match": { "state": { "$in": [ "PA", "NY" ] } } },
  { "$group": { "_id": { "city": "$city" }, "ZipsPerCity": { "$addToSet": "$_id" } } },
  { "$match": { "ZipsPerCity": { "$size": 2 } } },
]).pretty()
This is just an example that looks for cities (in the state of NY and PA) that have 2 zipcodes:
{
"_id" : {
"city" : "BETHLEHEM"
},
"ZipsPerCity" : [
"18018",
"18015"
]
}
{
"_id" : {
"city" : "BEAVER SPRINGS"
},
"ZipsPerCity" : [
"17843",
"17812"
]
}
Now suppose that I want to compare the "BEAVER SPRINGS" zip codes to the "BETHLEHEM" zip codes using the "$setDifference" set operator. I tried using the "$setDifference" operator in a $project stage, like this:
db.zipcodes.aggregate([
  { "$match": { "state": { "$in": [ "PA", "NY" ] } } },
  { "$group": { "_id": { "city": "$city" }, "ZipsPerCity": { "$addToSet": "$_id" } } },
  { "$match": { "ZipsPerCity": { "$size": 2 } } },
  { "$project": {
    "int": { "$setDifference": [
      "$_id.city.BETHLEHEM.ZipsPerCity",
      "$_id.city.BEAVER SPRINGS.ZipsPerCity"
    ]}
  }}
]).pretty()
That doesn't even look right, let alone produce results. No errors though.
How would you refer to a couple of arrays built using $addToSet like this, using $setDifference (or any of the set operators)?
The first thing about what you are trying to do here is that the arrays you want to compare are actually in two different documents. All of the aggregation framework operators in fact work on only one document at a time, with the exception of $group which is meant to "aggregate" documents and possibly $unwind which essentially turns one document into many.
In order to compare you would need the data to occur in one document, or at least be "paired" in some way. So there is a technique to do that:
db.zipcodes.aggregate([
  { "$match": { "state": { "$in": [ "PA", "NY" ] } } },
  { "$group": {
    "_id": "$city",
    "ZipsPerCity": { "$addToSet": "$_id" }
  }},
  { "$match": { "ZipsPerCity": { "$size": 2 } } },
  { "$group": {
    "_id": null,
    "A": { "$min": {
      "$cond": [
        { "$eq": [ "$_id", "BETHLEHEM" ] },
        { "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
        false
      ]
    }},
    "B": { "$min": {
      "$cond": [
        { "$eq": [ "$_id", "BEAVER SPRINGS" ] },
        { "city": "$_id", "ZipsPerCity": "$ZipsPerCity" },
        false
      ]
    }}
  }},
  { "$project": {
    "A": 1,
    "B": 1,
    "C": { "$setDifference": [ "$A.ZipsPerCity", "$B.ZipsPerCity" ] }
  }}
])
That is a little contrived, and I am well aware that the actual result set has more than two cities, but the point is to illustrate that the arrays/sets sent to the "set operators" such as $setDifference need to be in the same document.
The result here compares the "left" array with the "right" array, returning the members from the "left" that are different to the "right". Both sets are unique here with no overlap so the results should be expected:
{
"_id" : null,
"A" : {
"city" : "BETHLEHEM",
"ZipsPerCity" : [
"18018",
"18015"
]
},
"B" : {
"city" : "BEAVER SPRINGS",
"ZipsPerCity" : [
"17843",
"17812"
]
},
"C" : [
"18018",
"18015"
]
}
This is really better illustrated with actual "sets" with common members. So this document:
{ "A" : [ "A", "A", "B", "C", "D" ], "B" : [ "B", "C" ] }
Responds to $setDifference:
{ "C" : [ "A", "D" ] }
And $setEquals:
{ "C" : false }
$setIntersection:
{ "C" : [ "B", "C" ] }
$setUnion:
{ "C" : [ "B", "D", "C", "A" ] }
$setIsSubset, reversing the order to $B, $A:
{ "C" : true }
The other set operators $anyElementTrue and $allElementsTrue are likely most useful when used along with the $map operator which can re-shape arrays and evaluate conditions against each element.
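As a small sketch of that pairing (the condition here is illustrative only), you can test whether any element of an array meets a condition without unwinding anything:
{ "$anyElementTrue": [
  { "$map": {
    "input": "$A",
    "as": "el",
    "in": { "$eq": [ "$$el", "D" ] }
  }}
] }
which returns true for the sample document above, because its "A" array contains a "D".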
A very good usage of $map is alongside $setDifference, where you can "filter" array contents without using $unwind:
db.arrays.aggregate([
  { "$project": {
    "A": {
      "$setDifference": [
        { "$map": {
          "input": "$A",
          "as": "el",
          "in": {
            "$cond": [
              { "$eq": [ "$$el", "A" ] },
              "$$el",
              false
            ]
          }
        }},
        [false]
      ]
    }
  }}
])
That can be very handy when you have a lot of results in the pipeline and you do not want to "expand" out all of those results by "unwinding" the array. But note that this is a "set" and as such only one element matching "A" is returned:
{ "A" : ["A"] }
So the things to keep in mind here are:
You operate only within the "same" document at a time.
The results are generally "sets", which means they are both "unique" and "un-ordered".
Overall that should be a decent run-down on what the set operators are and how you use them.