MongoDB many-to-many relations among sub-documents

TL;DR: Imagine the first $match stage gives you several documents, and you want to refine them internally, as $redact does. The problem is that the sub-documents are related to each other, and you want to make $where-like checks among them. How can one accomplish that? I cannot $unwind, because it causes performance problems (a 1.5 MB document with 5 arrays of length 1000; a single unwind produces 1000 documents of about 1 MB each).
My schema looks like:
{
userName: "user44",
userID: "44",
posts : [
...
{
title : "post1",
id : "123"
...
},
{
title : "post2",
id : "124"
...
},
...
],
comments: [
...
{
id: "1910",
postId : "123",
commentTitle : "comment1",
comment : "some comment",
user: "user13"
},
{
id: "1911",
postId : "124",
commentTitle : "comment2",
comment : "some comment",
user: "user22"
},
{
id: "1912",
postId : "124",
commentTitle : "comment2",
comment : "some comment",
user: "user22"
},
...
],
commentUpvotes: [
...
{
id : 12,
commentId : "1910",
upvotedBy: "user91",
upvoteDate: 1000,
},
{
id: 13,
commentId : "1910",
upvotedBy: "user92",
upvoteDate: 2000
},
{
id: 14,
commentId : "1911",
upvotedBy: "user92",
upvoteDate: 2100
},
...
]
}
Although this has nothing to do with my database, the original schema is exactly as above. The example is a user collection in which I store the user's posts, the comments made on those posts by other users, and commentUpvotes, which records who upvoted. Don't think about the logic of its design and contents; I made them up, and please don't suggest any other schema.
Question: I am looking for a way to find the posts and comments that were upvoted after a specific date, such as
db.users.find({ "commentUpvotes.upvoteDate" : { $gte: 0 } })
and result:
{
"_id" : ObjectId("539065d3cd0f2aac5f55778e"),
"posts" : [
{
title : "post1",
id : "123"
...
},
{
title : "post2",
id : "124"
...
},
],
"comments" : [
{
id: 1910,
postId : "123",
title : "comment1",
comment : "some comment",
user: "user13"
},
{
id: 1911,
postId : "124",
title : "comment2",
comment : "some comment",
user: "user22"
},
],
"commentUpVotes" : [
{
id : 12,
commentId : "1910",
upvotedBy: "user91",
upvoteDate: 1000,
},
{
id: 13,
commentId : "1910",
upvotedBy: "user92",
upvoteDate: 2000
},
{
id: 14,
commentId : "1911",
upvotedBy: "user92",
upvoteDate: 2100
}
]
}
NOTE: This is a follow-up question; the former one can be found here. I wanted to extend it a bit in this one.

I let this sit for a while, as I did comment on your last question about what the basic process to do this would be. I also commented that $redact is not the animal for this type of operation, for reasons that are too involved to explain in addition to the answer here. Suffice to say that you need to know the filtered values, not just filter them.
Much as before, you still need some usage of $unwind, but rather than the traditional usage that can blow out the number of documents to process in the pipeline, it is only used after the array contents have been filtered. The only real difference here is that we are mindful that the "filtered array" is actually going to contain more than one element, so you handle it appropriately:
db.users.aggregate([
{ "$match": {
"commentUpvotes.upvoteDate": { "$gte": 0 }
}},
{ "$project": {
"posts": 1,
"comments": 1,
"commentUpVotes": {
"$setDifference": [
{
"$map": {
"input": "$commentUpvotes",
"as": "el",
"in": {
"$cond": [
{ "$gte": [ "$$el.upvoteDate", 0 ] },
"$$el",
false
]
}
}
},
[false]
]
}
}},
{ "$project": {
"posts": 1,
"comments": 1,
"kcommentUpVotes": "$commentUpVotes",
"commentUpVotes": 1
}},
{ "$unwind": "$commentUpVotes" },
{ "$project": {
"posts": 1,
"comments": {
"$setDifference": [
{
"$map": {
"input": "$comments",
"as": "el",
"in": {
"$cond": [
{
"$eq": [
{ "$substr": [ "$$el.id", 0, 4 ] },
"$commentUpVotes.commentId"
]
},
"$$el",
false
]
}
}
},
[false]
]
},
"commentUpVotes": "$kcommentUpVotes"
}},
{ "$unwind": "$comments" },
{ "$group": {
"_id": "$_id",
"posts": { "$first": "$posts" },
"comments": { "$addToSet": "$comments" },
"kcomments": { "$addToSet": "$comments" },
"commentUpVotes": { "$first": "$commentUpVotes" }
}},
{ "$unwind": "$comments" },
{ "$project": {
"posts": {
"$setDifference": [
{
"$map": {
"input": "$posts",
"as": "el",
"in": {
"$cond": [
{
"$eq": [
"$$el.id",
"$comments.postId"
]
},
"$$el",
false
]
}
}
},
[false]
]
},
"comments": "$kcomments",
"commentUpVotes": 1
}},
{ "$unwind": "$posts" },
{ "$group": {
"_id": "$_id",
"posts": { "$addToSet": "$posts" },
"comments": { "$first": "$comments" },
"commentUpVotes": { "$first": "$commentUpVotes" }
}}
])
So there is a point here to understand exactly what each stage ( or the repeated process ) is doing and why the $unwind operations here are important.
If you take the first $project here into consideration, the result returned is always going to be an array. This is how the "filtering" with $map works, and makes perfect sense as you are expecting the possibility of several ( in this example, all ) matches.
The important part takes place before you try to match those values against another array in your document as when you look at the anatomy of a $map the point is to compare the element against a singular value. This is why you need to $unwind in order to get at those "singular" values to compare.
So aside from keeping a copy of the "filtered" array to make things a bit cleaner, let's skip down to the part after matching against the "comments" array. Since the "commentUpvotes" array was "unwound", there is now a copy of the document for each of its elements, each with its own filtered version of the "comments" array. Note that each of those result arrays can only contain a single element.
As these are indeed arrays, in order to combine them between documents you need to unwind these "single element" arrays and then group them back together. Keep in mind here that while there were "three" matches for "commentUpvotes", there are only "two" comments that would match, because "two" of the "three" matches share the same commentId. This is where grouping back using $addToSet becomes important, as you do not want to duplicate that matching comment.
Once all the matched elements are in the array it's time again to $unwind and repeat.
So the general premise remains the same as the previous example and question. Indeed the approach here can be considered a "version 2.0" of the previous listing as it will cater for singular and "many" matches in all cases.
The one "caveat" to mention here is the basic principle that these items are indeed related and there is no "orphaned" detail in any of the arrays. The obvious reason for this is that anything tested for a match from one array to the other that did not match would result in an empty array. There may be other matches, but if one of those tests came up empty then you would have to handle the empty array that was produced.
Handling that final note is easy: simply test the $size of the result and otherwise enter a single value of false, filtering that out at a later stage. But for the purpose of the exercise I am considering that your "relationships" are indeed intact, and leaving any additional handling up to your own implementation.
The end result is of course you get the desired result without resorting to the same level of "blowout" by simply unwinding the unfiltered arrays against each other and trying to match equality against those records.
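To make the intent of the pipeline easier to follow, here is a minimal sketch in plain JavaScript of what it computes for a single document, working in memory rather than in the aggregation framework. The function name `refineUser` and the trimmed sample document are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical in-memory equivalent of the pipeline above, for one document.
// It assumes the ids on both sides of each relation compare equal as strings.
function refineUser(user, minDate) {
  // 1. Keep only the upvotes at or after the given date.
  const commentUpvotes = user.commentUpvotes.filter(function (u) {
    return u.upvoteDate >= minDate;
  });

  // 2. Keep only the comments referenced by a remaining upvote.
  const commentIds = new Set(commentUpvotes.map(function (u) {
    return String(u.commentId);
  }));
  const comments = user.comments.filter(function (c) {
    return commentIds.has(String(c.id));
  });

  // 3. Keep only the posts referenced by a remaining comment.
  const postIds = new Set(comments.map(function (c) {
    return String(c.postId);
  }));
  const posts = user.posts.filter(function (p) {
    return postIds.has(String(p.id));
  });

  return { posts: posts, comments: comments, commentUpvotes: commentUpvotes };
}

// Trimmed-down sample document (made up for this sketch).
const sampleUser = {
  posts: [
    { title: "post1", id: "123" },
    { title: "post2", id: "124" }
  ],
  comments: [
    { id: "1910", postId: "123" },
    { id: "1911", postId: "124" },
    { id: "1912", postId: "124" }
  ],
  commentUpvotes: [
    { id: 12, commentId: "1910", upvoteDate: 1000 },
    { id: 13, commentId: "1910", upvoteDate: 2000 },
    { id: 14, commentId: "1911", upvoteDate: 2100 }
  ]
};

// With a threshold of 2050 only upvote 14, comment "1911" and post "124" remain.
const refined = refineUser(sampleUser, 2050);
```

Using a set of ids at each step is what removes the duplicates that $addToSet takes care of in the pipeline: two upvotes pointing at the same comment still yield that comment only once.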

I found a way to make it work without using $unwind, using $redact with $$ROOT. As you know, $redact scans the document from parent to child, so to make comparisons among sub-documents I needed to use $$ROOT.
Since everything is processed inside the document, I believe this is the most efficient way. I would still be glad if someone proposed a better way. There are not many resources on $redact, and I believe the code below can still be improved:
db.users.aggregate([
  // first query match
  { "$match": { "commentUpvotes.upvoteDate": { "$gte": 0 } } },
  // exclude commentUpvotes: levels without upvoteDate pass through
  { "$redact": {
    "$cond": {
      "if": {
        "$or": [
          { "$gte": [ "$upvoteDate", 0 ] },
          { "$not": "$upvoteDate" }
        ]
      },
      "then": "$$DESCEND",
      "else": "$$PRUNE"
    }
  }},
  // exclude comments: keep a comment only if a remaining upvote refers to it
  { "$redact": {
    "$cond": {
      "if": {
        "$or": [
          { "$not": "$postId" },
          { "$anyElementTrue": {
            "$map": {
              "input": "$$ROOT.commentUpvotes",
              "as": "el",
              "in": { "$eq": [ "$$el.commentId", "$id" ] }
            }
          }}
        ]
      },
      "then": "$$DESCEND",
      "else": "$$PRUNE"
    }
  }},
  // exclude posts: keep a post only if a remaining comment refers to it
  { "$redact": {
    "$cond": {
      "if": {
        "$or": [
          { "$not": "$title" },
          { "$anyElementTrue": {
            "$map": {
              "input": "$$ROOT.comments",
              "as": "el",
              "in": { "$eq": [ "$$el.postId", "$id" ] }
            }
          }}
        ]
      },
      "then": "$$DESCEND",
      "else": "$$PRUNE"
    }
  }}
])
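For readers less familiar with $redact, the parent-to-child walk can be sketched in plain JavaScript. This is only a rough emulation under stated assumptions: the names `redact`, `decide` and `isPlainObject` are made up for illustration, `root` stands in for $$ROOT, and only the $$DESCEND and $$PRUNE outcomes are modelled.

```javascript
// Hypothetical emulation of how $redact walks a document.
// decide(level, root) plays the role of the $cond expression and must
// return "DESCEND" or "PRUNE"; root stands in for $$ROOT.
function redact(level, decide, root) {
  root = root || level;
  if (decide(level, root) === "PRUNE") {
    return undefined; // drop this level and everything below it
  }
  const out = {};
  Object.keys(level).forEach(function (key) {
    const value = level[key];
    if (Array.isArray(value)) {
      // Embedded documents in arrays are evaluated one by one;
      // scalar array entries are kept as they are.
      out[key] = value
        .map(function (item) {
          return isPlainObject(item) ? redact(item, decide, root) : item;
        })
        .filter(function (item) { return item !== undefined; });
    } else if (isPlainObject(value)) {
      const kept = redact(value, decide, root);
      if (kept !== undefined) out[key] = kept;
    } else {
      out[key] = value; // scalars at a kept level are retained
    }
  });
  return out;
}

function isPlainObject(value) {
  return Object.prototype.toString.call(value) === "[object Object]";
}

// Made-up sample, mirroring the "exclude comments" stage: levels without
// postId pass through, comments stay only if an upvote in the root refers to them.
const redactSample = {
  userID: "44",
  comments: [
    { id: "1910", postId: "123" },
    { id: "1912", postId: "124" }
  ],
  commentUpvotes: [ { id: 12, commentId: "1910", upvoteDate: 1000 } ]
};

const redacted = redact(redactSample, function (level, root) {
  if (level.postId === undefined) return "DESCEND";
  const referenced = root.commentUpvotes.some(function (u) {
    return u.commentId === level.id;
  });
  return referenced ? "DESCEND" : "PRUNE";
});
```

The sketch also shows why the stages are chained: each stage sees the document as the previous stage left it, so the "exclude posts" check only looks at comments that survived.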

Related

Retrieve specific element of a nested document

Just cannot figure this out. This is the document format from a MongoDB of jobs, which is derived from an XML file the layout of which I have no control over:
{
"reference" : [ "93417" ],
"Title" : [ "RN - Pediatric Director of Nursing" ],
"Description" : [ "...a paragraph or two..." ],
"Classifications" : [
{
"Classification" : [
{
"_" : "Nurse / Midwife",
"name" : [ "Category" ]
},
{
"_" : "FL - Jacksonville",
"name" : [ "Location" ],
},
{
"_" : "Permanent / Full Time",
"name" : [ "Work Type" ],
},
{
"_" : "Some Health Care Org",
"name" : [ "Company Name" ],
}
]
}
],
"Apply" : [
{
"EmailTo" : [ "jess#recruiting.co" ]
}
]
}
The intention is to pull a list of jobs from the DB, to include 'Location', which is buried down there as the second document at 'Classifications.Classification._'.
I've tried various 'aggregate' permutations of $project, $unwind, $match, $filter, $group… but I don't seem to be getting anywhere. Experimenting with just retrieving the company name, I was expecting this to work:
db.collection(JOBS_COLLECTION).aggregate([
{ "$project" : { "meta": "$Classifications.Classification" } },
{ "$project" : { "meta": 1, _id: 0 } },
{ "$unwind" : "$meta" },
{ "$match": { "meta.name" : "Company Name" } },
{ "$project" : { "Company" : "$meta._" } },
])
But that pulled everything for every record, thus:
[{
"Company":[
"Nurse / Midwife",
"TX - San Antonio",
"Permanent / Full Time",
"Some Health Care Org"
]
}, { etc etc }]
What am I missing, or misusing?
Ideally, with MongoDB 3.4 available, you would simply $project and use the array operators $map, $filter and $reduce: the last to "compact" the nested arrays, and the first two to extract the relevant element and detail. Also, $arrayElemAt takes just the "element" from the array(s):
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": {
"$reduce": {
"input": "$Classifications.Classification",
"initialValue": [],
"in": {
"$concatArrays": [ "$$value", "$$this" ]
}
}
},
"as": "c",
"cond": { "$eq": [ "$$c.name", ["Location"] ] }
}
},
"as": "c",
"in": "$$c._"
}},
0
]
}
}}
])
Or even skip the $reduce which is merely applying the $concatArrays to "merge" and simply grab the "first" array index ( since there is only one ) using $arrayElemAt:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$arrayElemAt": [
{ "$map": {
"input": {
"$filter": {
"input": { "$arrayElemAt": [ "$Classifications.Classification", 0 ] },
"as": "c",
"cond": { "$eq": [ "$$c.name", ["Location"] ] }
}
},
"as": "c",
"in": "$$c._"
}},
0
]
}
}}
])
That makes the operation compatible with MongoDB 3.2, which you "should" be running at least.
Which in turn allows you to consider alternate syntax for MongoDB 3.4 using $indexOfArray based on the initial input variable of the "first" array index using $let to somewhat shorten the syntax:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": {
"_id": 0,
"output": {
"$let": {
"vars": {
"meta": {
"$arrayElemAt": [
"$Classifications.Classification",
0
]
}
},
"in": {
"$arrayElemAt": [
"$$meta._",
{ "$indexOfArray": [
"$$meta.name", [ "Location" ]
]}
]
}
}
}
}}
])
If indeed you consider that to be "shorter", that is.
In the other sense though, much like above, there is an "array inside an array", so in order to process it you $unwind twice, which is effectively what the $concatArrays inside $reduce is countering in the ideal case:
db.collection(JOBS_COLLECTION).aggregate([
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$unwind": "$Classifications" },
{ "$unwind": "$Classifications.Classification" },
{ "$match": { "Classifications.Classification.name": "Location" } },
{ "$project": { "_id": 0, "output": "$Classifications.Classification._" } }
])
All statements actually produce:
{
"output" : "FL - Jacksonville"
}
Which is the matching value of "_" in the inner array element for the "Location" as selected by your original intent.
Keeping in mind of course that all statements really should be preceded with the relevant $match statement as shown:
{ "$match": { "Classifications.Classification.name": "Location" } },
Since without that you would be possibly processing documents unnecessarily, which did not actually contain an array element matching that condition. Of course this may not be the case due to the nature of the documents, but it's generally good practice to make sure the "initial" selection always matches the conditions of details you later intend to "extract".
All of that said, even if this is the result of a direct import from XML, the structure should be changed since it does not efficiently present itself for queries. MongoDB documents do not work how XPATH does in terms of issuing queries. Therefore anything "XML Like" is not going to be a good structure, and if the "import" process cannot be changed to a more accommodating format, then there should at least be a "post process" to manipulate this into a separate storage in a more usable form.
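As an illustration of such a "post process", here is a minimal sketch in plain JavaScript that unwraps the single-element arrays and keys the classifications by name. The function name `flattenJob` and the target field names (`reference`, `title`, `description`, `classifications`) are assumptions made for this example, and the `Apply` block is left out for brevity.

```javascript
// Hypothetical post-processing step: unwrap the single-element arrays
// produced by the XML import and key the classifications by their name.
function flattenJob(job) {
  const classifications = {};
  (job.Classifications || []).forEach(function (wrapper) {
    (wrapper.Classification || []).forEach(function (entry) {
      // entry.name is a one-element array such as [ "Location" ]
      classifications[entry.name[0]] = entry._;
    });
  });
  return {
    reference: job.reference[0],
    title: job.Title[0],
    description: job.Description[0],
    classifications: classifications
  };
}

// Sample taken from the question, shortened.
const importedJob = {
  reference: [ "93417" ],
  Title: [ "RN - Pediatric Director of Nursing" ],
  Description: [ "...a paragraph or two..." ],
  Classifications: [
    { Classification: [
      { "_": "Nurse / Midwife", name: [ "Category" ] },
      { "_": "FL - Jacksonville", name: [ "Location" ] }
    ]}
  ]
};

const flatJob = flattenJob(importedJob);
// flatJob.classifications.Location is now a plain string
```

With documents stored in a shape like this, the location becomes an ordinary field that can be queried and projected directly, without any of the array gymnastics above.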

Get Distinct list of two properties using MongoDB 2.4

I have an article collection:
{
_id: 9999,
authorId: 12345,
coAuthors: [23456,34567],
title: 'My Article'
},
{
_id: 10000,
authorId: 78910,
title: 'My Second Article'
}
I'm trying to figure out how to get a list of distinct author and co-author ids out of the database. I have tried push, concat, and addToSet, but can't seem to find the right combination. I'm on 2.4.6 so I don't have access to setUnion.
Whilst $setUnion would be the "ideal" way to do this, there is another way that basically involves "switching" between a "type" to alternate which field is picked:
db.collection.aggregate([
{ "$project": {
"authorId": 1,
"coAuthors": { "$ifNull": [ "$coAuthors", [null] ] },
"type": { "$const": [ true,false ] }
}},
{ "$unwind": "$coAuthors" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"$cond": [
"$type",
"$authorId",
"$coAuthors"
]
}
}},
{ "$match": { "_id": { "$ne": null } } }
])
And that is it. You may know the $const operation as the $literal operator from MongoDB 2.6. It has always been there, but was only documented and given an "alias" at the 2.6 release.
Of course the $unwind operations in both cases produce more "copies" of the data, but this is grouping for "distinct" values so it does not matter. Just depending on the true/false alternating value for the projected "type" field ( once unwound ) you just pick the field alternately.
Also this little mapReduce does much the same thing:
db.collection.mapReduce(
function() {
emit(this.authorId,null);
if ( this.hasOwnProperty("coAuthors"))
this.coAuthors.forEach(function(id) {
emit(id,null);
});
},
function(key,values) {
return null;
},
{ "out": { "inline": 1 } }
)
For the record, $setUnion is of course a lot cleaner and more performant:
db.collection.aggregate([
{ "$project": {
"combined": {
"$setUnion": [
{ "$map": {
"input": ["A"],
"as": "el",
"in": "$authorId"
}},
{ "$ifNull": [ "$coAuthors", [] ] }
]
}
}},
{ "$unwind": "$combined" },
{ "$group": {
"_id": "$combined"
}}
])
So there the only real concerns are converting the singular "authorId" to an array via $map and feeding an empty array where the "coAuthors" field is not present in the document.
Both output the same distinct values from the sample documents:
{ "_id" : 78910 }
{ "_id" : 23456 }
{ "_id" : 34567 }
{ "_id" : 12345 }
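The "type" switching trick from the first pipeline can be followed step by step in plain JavaScript. The sketch below is only an in-memory walk-through of the same logic, with the hypothetical function name `distinctAuthorIds`; each comment names the pipeline stage it mirrors.

```javascript
// Hypothetical in-memory walk-through of the $unwind / $cond pipeline.
function distinctAuthorIds(articles) {
  const seen = new Set();
  articles.forEach(function (article) {
    // $ifNull: a missing coAuthors becomes [null] so that $unwind keeps the document
    const coAuthors = article.coAuthors || [null];
    coAuthors.forEach(function (coAuthor) {      // $unwind: "$coAuthors"
      [true, false].forEach(function (type) {    // $unwind: "$type"
        // $cond: pick authorId when type is true, otherwise the co-author
        const id = type ? article.authorId : coAuthor;
        if (id !== null) seen.add(id);           // final $match with $ne: null
      });
    });
  });
  return Array.from(seen);                       // $group on the picked id
}

const sampleArticles = [
  { _id: 9999, authorId: 12345, coAuthors: [23456, 34567], title: "My Article" },
  { _id: 10000, authorId: 78910, title: "My Second Article" }
];

const authorIds = distinctAuthorIds(sampleArticles);
```

Every unwound copy contributes either its author or one co-author, and collecting the picks in a set plays the part of the final $group.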

How to find document and single subdocument matching given criterias in MongoDB collection

I have a collection of products. Each product contains an array of items.
> db.products.find().pretty()
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "01.02.2100",
"purchasePrice" : 1,
"sellingPrice" : 10,
"count" : 15
},
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
So, can you please advise me how to query MongoDB to retrieve all products with only the single item whose date is equal to the date I pass to the query as a parameter?
The result for "31.08.2014" must be:
{
"_id" : ObjectId("54023e8bcef998273f36041d"),
"shop" : "shop1",
"name" : "product1",
"items" : [
{
"date" : "31.08.2014",
"purchasePrice" : 10,
"sellingPrice" : 1,
"count" : 5
}
]
}
What you are looking for is the positional $ operator and "projection". For a single field you need to match the required array element using "dot notation", for more than one field use $elemMatch:
db.products.find(
{ "items.date": "31.08.2014" },
{ "shop": 1, "name":1, "items.$": 1 }
)
Or the $elemMatch for more than one matching field:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ "shop": 1, "name":1, "items.$": 1 }
)
These work for a single array element only though and only one will be returned. If you want more than one array element to be returned from your conditions then you need more advanced handling with the aggregation framework.
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$unwind": "$items" },
{ "$match": { "items.date": "31.08.2014" } },
{ "$group": {
"_id": "$_id",
"shop": { "$first": "$shop" },
"name": { "$first": "$name" },
"items": { "$push": "$items" }
}}
])
Or possibly in shorter/faster form since MongoDB 2.6 where your array of items contains unique entries:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$project": {
"shop": 1,
"name": 1,
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.date", "31.08.2014" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
Or possibly with $redact, but a little contrived:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$redact": {
"$cond": [
{ "$eq": [ { "$ifNull": [ "$date", "31.08.2014" ] }, "31.08.2014" ] },
"$$DESCEND",
"$$PRUNE"
]
}}
])
More modern, you would use $filter:
db.products.aggregate([
{ "$match": { "items.date": "31.08.2014" } },
{ "$addFields": {
"items": {
"$filter": {
"input": "$items",
"cond": { "$eq": [ "$$this.date", "31.08.2014" ] }
}
}
}}
])
And with multiple conditions, the $elemMatch and $and within the $filter:
db.products.aggregate([
{ "$match": {
"items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}
}},
{ "$addFields": {
"items": {
"$filter": {
"input": "$items",
"cond": {
"$and": [
{ "$eq": [ "$$this.date", "31.08.2014" ] },
{ "$eq": [ "$$this.purchasePrice", 1 ] }
]
}
}
}
}}
])
So it just depends on whether you always expect a single element to match or multiple elements, and then which approach is better. But where possible the .find() method will generally be faster since it lacks the overhead of the other operations, although those last two forms do not lag far behind at all.
As a side note, your "dates" are represented as strings which is not a very good idea going forward. Consider changing these to proper Date object types, which will greatly help you in the future.
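As a rough illustration of that conversion, the sketch below parses the strings into Date objects. It assumes the strings are in day.month.year order and should be read as midnight UTC; the helper name `parseDottedDate` is made up for this example.

```javascript
// Hypothetical helper: convert a "DD.MM.YYYY" string into a Date at
// midnight UTC. Returns null when the string does not describe a real date.
function parseDottedDate(text) {
  const match = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(text);
  if (!match) return null;
  const day = Number(match[1]);
  const month = Number(match[2]);
  const year = Number(match[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Reject values that overflowed into another month, such as 31.02.2014
  if (date.getUTCFullYear() !== year ||
      date.getUTCMonth() !== month - 1 ||
      date.getUTCDate() !== day) {
    return null;
  }
  return date;
}

const itemDate = parseDottedDate("31.08.2014");
```

One practical reason for the change: as strings, "01.02.2100" sorts before "31.08.2014", so range conditions such as $gte only behave chronologically once the values are real dates.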
Based on Neil Lunn's code I work with this solution; it automatically includes all first-level keys (but you could also exclude keys if you want):
db.products.find(
{ "items.date": "31.08.2014" },
{ items: { $elemMatch: { date: "31.08.2014" } } }
)
With multiple requirements:
db.products.find(
{ "items": {
"$elemMatch": { "date": "31.08.2014", "purchasePrice": 1 }
}},
{ items: { $elemMatch: { "date": "31.08.2014", "purchasePrice": 1 } } }
)
Mongo supports dot notation for sub-queries.
See: http://docs.mongodb.org/manual/reference/glossary/#term-dot-notation
Depending on your driver, you want something like:
db.products.find({"items.date":"31.08.2014"});
Note that the attribute is in quotes for dot notation, even if usually your driver doesn't require this.

select documents with sub arrays that match some critieria

I have a collection with documents such as:
{
_id: "1234",
_class: "com.acme.classA",
a_collection: [
{
otherdata: 'somedata',
type: 'a'
},
{
otherdata: 'bar',
type: 'a'
},
{
otherdata: 'foo',
type: 'b'
}
],
lastChange: ISODate("2014-08-17T22:25:48.918Z")
}
I want to find a document by id together with a subset of its sub-array. For example, I want to find the document with id "1234" and only the a_collection elements whose type is 'a', giving this result:
{
_id: "1234",
_class: "com.acme.classA",
a_collection: [
{
otherdata: 'somedata',
type: 'a'
},
{
otherdata: 'bar',
type: 'a'
}
],
lastChange: ISODate("2014-08-17T22:25:48.918Z")
}
I have tried this :
db.collection_name.aggregate({
$match: {
'a_collection.type': 'a'
}
},
{
$unwind: "$a_collection"
},
{
$match: {
"a_collection.type": 'a'
}
},
{
$group: {
_id: "$_id",
a_collection: {
$addToSet: "$a_collection"
},
}
}).pretty()
but this doesn't return the other properties (such as 'lastChange').
What is the correct way to do this?
Are you using PHP?
And is this the only way you can get the "text"?
Maybe you can rewrite it so that it is valid JSON, something like this:
{
"_id": "1234",
"_class": "com.acme.classA",
"a_collection": [
{
"otherdata": "somedata",
"type": "a"
},
{
"otherdata": "bar",
"type": "a"
},
{
"otherdata": "foo",
"type": "b"
}
]
}
Then you can use the json_decode() function from PHP to make an array, and then you can search it and return only the needed data.
Edit: I misread the question. Are you looking for a function like this?
db.inventory.find( {
$or: [ { _id: "1234" }, { 'a_collection.type': 'a' }]
} )
I found the code here: http://docs.mongodb.org/manual/tutorial/query-documents/
This is the correct query:
db.collection_name.aggregate({
$match: {
'a_collection.type': 'a'
}
},
{
$unwind: "$a_collection"
},
{
$match: {
"a_collection.type": 'a'
}
},
{
$group: {
_id: "$_id",
a_collection: {
$addToSet: "$a_collection"
},
lastChange : { $first : "$lastChange" }
}
}).pretty()
Something is very strange about your desired query (and your pipelines). First of all, _id is a reserved field with a unique index on it. The result of finding all documents with _id = "1234" can only be 0 or 1 documents. Second, to find documents with a_collection.type = "a" for some element of the array a_collection, you don't need the aggregation framework. You just need a find query:
> db.test.find({ "a_collection.type" : "a" })
So all the work here appears to be winnowing the subarray of one document down to just those elements with a_collection.type = "a". Why do you have these objects in the same document if most of what you do is split them up and eliminate some to find a result set? How common and how truly necessary is it to harvest just the array elements with a_collection.type = "a"? Perhaps you want to model your data differently so a query like
> db.test.find({ <some condition>, "a_collection.type" : "a" })
returns you the correct documents. I can't say how you can do it best with the given information, but I can say that your current approach strongly suggests revision is needed (and I'm happy to help with suggestions if you include further information or post a new question).
I would agree with the answer you have submitted yourself, except that in MongoDB 2.6 and greater there is a better way to do this with $map and $setDifference, which were both introduced in that version. Where available, this approach is much faster:
db.collection.aggregate([
{ "$match": { "a_collection.type": "a" } },
{ "$project": {
"_class": 1,
"lastChange": 1,
"a_collection": {
"$setDifference": [
{ "$map": {
"input": "$a_collection",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "a" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
So that has no $group or initial $unwind, which can both be costly operations, nor the second $match stage. So MongoDB 2.6 does it better.
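Since the same $map inside $setDifference shape is needed every time an array is filtered this way, it can be wrapped in a small helper that builds the expression. The helper name `filterExpr` is hypothetical, and the sketch assumes the element variable is always called "el".

```javascript
// Hypothetical helper that builds the $map + $setDifference expression,
// so that the condition is the only part that changes between uses.
function filterExpr(inputPath, condition) {
  return {
    "$setDifference": [
      { "$map": {
        "input": inputPath,
        "as": "el",
        // elements failing the condition are replaced by false ...
        "in": { "$cond": [ condition, "$$el", false ] }
      }},
      // ... and the false entries are then removed as a set difference
      [false]
    ]
  };
}

const subsetPipeline = [
  { "$match": { "a_collection.type": "a" } },
  { "$project": {
    "_class": 1,
    "lastChange": 1,
    "a_collection": filterExpr("$a_collection", { "$eq": [ "$$el.type", "a" ] })
  }}
];
// In the shell this would be passed as db.collection.aggregate(subsetPipeline)
```

One thing to keep in mind with this technique: $setDifference treats its inputs as sets, so identical array elements collapse into one and the order of the result is not guaranteed.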

Mongodb 1to1 relation among subdocuments

I have a huge collection where each document has subdocuments that have relations among them. My schema looks like this:
{
userName: "user44",
userID: "44",
posts : [
...
{
title : "post1",
id : "123"
...
},
{
title : "post2",
id : "124"
...
},
...
],
comments: [
...
{
id: 1910,
postId : "123",
title : "comment1",
comment : "some comment",
user: "user13"
},
{
id: 1911,
postId : "124",
title : "comment2",
comment : "some comment",
user: "user22"
},
...
],
commentUpvotes: [
...
{
id : 12,
commentId : "1910",
upvotedBy: "user91"
},
{
id: 13,
commentId : "1910",
upvotedBy: "user92"
},
...
]
}
Although this has nothing to do with my database, the original schema is exactly as above. The example is a user collection in which I store the user's posts, the comments made on those posts by other users, and commentUpvotes, which records who upvoted. Don't think about the logic of its design, and please don't suggest any other schema.
Question: db.users.find({"commentUpvotes.id" : 12}) should return this document, but only with the comment (1910) and post (123) that this upvote was made to. I solved it with $unwind, which caused performance problems. Please suggest a way to solve it without unwinding. Any ideas on that?
Considering the "indentation" I am using in the listing, this may actually look longer than what you are doing, but really it isn't.
This is another really good example of using $map, as available in MongoDB 2.6 and greater. There is still some use of $unwind, but the arrays being "unwound" actually only ever have one element in them. So please forgive my "Highlander" references, which I could not resist :)
db.users.aggregate([
// Match your document or documents
{ "$match": {
"commentUpvotes.id": 12
}},
// Get the one "up-votes" entry that matches
{ "$project": {
"posts": 1,
"comments": 1,
"commentUpVotes": {
"$setDifference": [
{
"$map": {
"input": "$commentUpvotes",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.id", 12 ] },
"$$el",
false
]
}
}
},
[false]
]
}
}},
// There is only one!
{ "$unwind": "$commentUpVotes" },
// Get the one comments entry that matches
{ "$project": {
"posts": 1,
"comments": {
"$setDifference": [
{
"$map": {
"input": "$comments",
"as": "el",
"in": {
"$cond": [
{
"$eq": [
{ "$substr": [ "$$el.id", 0, 4 ] },
"$commentUpVotes.commentId"
]
},
"$$el",
false
]
}
}
},
[false]
]
},
"commentUpVotes": 1
}},
// And there is only one!
{ "$unwind": "$comments" },
// Get the one post that matches
{ "$project": {
"posts": {
"$setDifference": [
{
"$map": {
"input": "$posts",
"as": "el",
"in": {
"$cond": [
{
"$eq": [
"$$el.id",
"$comments.postId"
]
},
"$$el",
false
]
}
}
},
[false]
]
},
"comments": 1,
"commentUpVotes": 1
}},
// Optionally group back to arrays. There can be only one!
{ "$group": {
"_id": "$_id",
"posts": { "$first": "$posts" },
"comments": { "$push": "$comments" },
"commentUpVotes": { "$push": "$commentUpVotes" }
}}
])
So the end result would be:
{
"_id" : ObjectId("539065d3cd0f2aac5f55778e"),
"posts" : [
{
"title" : "post1",
"id" : "123"
}
],
"comments" : [
{
"id" : 1910,
"postId" : "123",
"title" : "comment1",
"comment" : "some comment",
"user" : "user13"
}
],
"commentUpVotes" : [
{
"id" : 12,
"commentId" : "1910",
"upvotedBy" : "user91"
}
]
}
I know you asked for "no schema changes", but it is not really a schema change to say that it is a good idea to keep your id values here of a consistent type. Currently you are mixing integers and strings in this process ( I hope it's just an example ), which is not a good idea.
Thus there is some "limited casting" employed here using $substr; however, your actual solution may vary in how to really do this. I strongly suggest fixing the data if it really does need fixing.
At any rate, a pretty cool usage for $map
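To see why the "limited casting" deserves that warning, here is a rough JavaScript stand-in for the $substr comparison used in the pipeline. It assumes, as the answer describes, that the numeric id is turned into a string and cut to four characters; the function names are made up for illustration.

```javascript
// Rough stand-in for { "$substr": [ "$$el.id", 0, 4 ] }: the value is
// turned into a string and cut to at most four characters.
function substrCast(value, start, length) {
  return String(value).slice(start, start + length);
}

// Mirrors the $eq between the cast comment id and the upvote's commentId.
function matchesWithSubstr(comment, upvote) {
  return substrCast(comment.id, 0, 4) === upvote.commentId;
}

// Alternative that does not depend on the number of digits.
function matchesNormalized(comment, upvote) {
  return String(comment.id) === String(upvote.commentId);
}

const fourDigits = matchesWithSubstr({ id: 1910 }, { commentId: "1910" });
const fiveDigits = matchesWithSubstr({ id: 19100 }, { commentId: "19100" });
const fiveDigitsNormalized = matchesNormalized({ id: 19100 }, { commentId: "19100" });
```

In this sketch the four-character cut matches the sample ids but silently stops matching once an id grows a fifth digit, which is one more reason to store both sides of the relation with the same type.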