Select documents with sub arrays that match some criteria - mongodb

I have a collection with documents such as:
{
_id: "1234",
_class: "com.acme.classA",
a_collection: [
{
otherdata: 'somedata',
type: 'a'
},
{
otherdata: 'bar',
type: 'a'
},
{
otherdata: 'foo',
type: 'b'
}
],
lastChange: ISODate("2014-08-17T22:25:48.918Z")
}
I want to find all documents by id, returning only a subset of the sub array. For example, I want to find all documents with id "1234" where a_collection.type is 'a', giving this result:
{
_id: "1234",
_class: "com.acme.classA",
a_collection: [
{
otherdata: 'somedata',
type: 'a'
},
{
otherdata: 'bar',
type: 'a'
}
],
lastChange: ISODate("2014-08-17T22:25:48.918Z")
}
I have tried this:
db.collection_name.aggregate({
$match: {
'a_collection.type': 'a'
}
},
{
$unwind: "$a_collection"
},
{
$match: {
"a_collection.type": 'a'
}
},
{
$group: {
_id: "$_id",
a_collection: {
$addToSet: "$a_collection"
},
}
}).pretty()
but this doesn't return the other properties (such as 'lastChange').
What is the correct way to do this?

Are you using PHP?
And is this the only way you can get the "text"?
Maybe you can rewrite it so that it is a JSON element,
something like this:
{
"_id": "1234",
"_class": "com.acme.classA",
"a_collection": [
{
"otherdata": "somedata",
"type": "a"
},
{
"otherdata": "bar",
"type": "a"
},
{
"otherdata": "foo",
"type": "b"
}
]
}
Then you can use the json_decode() function from PHP to turn it into an array, and then you can search it and return only the data you need.
Edit: I misread the question. Are you looking for a query like this?
db.inventory.find( {
$or: [ { _id: "1234" }, { 'a_collection.type': 'a' }]
} )
I found the code here: http://docs.mongodb.org/manual/tutorial/query-documents/

This is the correct query:
db.collection_name.aggregate({
$match: {
'a_collection.type': 'a'
}
},
{
$unwind: "$a_collection"
},
{
$match: {
"a_collection.type": 'a'
}
},
{
$group: {
_id: "$_id",
a_collection: {
$addToSet: "$a_collection"
},
lastChange : { $first : "$lastChange" }
}
}).pretty()

Something is very strange about your desired query (and your pipelines). First of all, _id is a reserved field with a unique index on it. The result of finding all documents with _id = "1234" can only be 0 or 1 documents. Second, to find documents with a_collection.type = "a" for some element of the array a_collection, you don't need the aggregation framework. You just need a find query:
> db.test.find({ "a_collection.type" : "a" })
So all the work here appears to be winnowing the subarray of one document down to just those elements with a_collection.type = "a". Why do you have these objects in the same document if most of what you do is split them up and eliminate some to find a result set? How common and how truly necessary is it to harvest just the array elements with a_collection.type = "a"? Perhaps you want to model your data differently so a query like
> db.test.find({ <some condition>, "a_collection.type" : "a" })
returns you the correct documents. I can't say how you can do it best with the given information, but I can say that your current approach strongly suggests revision is needed (and I'm happy to help with suggestions if you include further information or post a new question).

I would agree with the answer you have submitted yourself, except that in MongoDB 2.6 and greater there is a better way to do this with $map and $setDifference, which were both introduced in that version. Where available, this approach is much faster:
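db.collection.aggregate([
{ "$match": { "a_collection.type": "a" } },
{ "$project": {
// keep the other top-level fields that the $group version lost
"_class": 1,
"lastChange": 1,
// $project needs a field name for the computed array
"a_collection": {
"$setDifference": [
// $map takes a document, not an array, as its argument
{ "$map": {
"input": "$a_collection",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.type", "a" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])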
This pipeline has no $group or initial $unwind stage, both of which can be costly, and it no longer needs the second $match stage either. So on MongoDB 2.6 and later this is the better option.
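On MongoDB 3.2 and later, the $filter operator expresses the same idea more directly. The following is only a sketch: it assumes the collection and field names from the question, and filterByType is a hypothetical helper that mirrors in plain JavaScript what the $project stage does to a single document, so the expected shape can be checked without a server.

```javascript
// Pipeline sketch (assumes MongoDB 3.2+): keep only array elements of type "a".
const pipeline = [
  { $match: { _id: "1234", "a_collection.type": "a" } },
  { $project: {
      _class: 1,
      lastChange: 1,
      a_collection: {
        $filter: {
          input: "$a_collection",
          as: "el",
          cond: { $eq: ["$$el.type", "a"] }
        }
      }
  } }
];
// In the shell this would be run as: db.collection_name.aggregate(pipeline)

// Hypothetical helper: the same filtering applied to one in-memory document.
function filterByType(doc, type) {
  return Object.assign({}, doc, {
    a_collection: doc.a_collection.filter(el => el.type === type)
  });
}

const doc = {
  _id: "1234",
  _class: "com.acme.classA",
  a_collection: [
    { otherdata: "somedata", type: "a" },
    { otherdata: "bar", type: "a" },
    { otherdata: "foo", type: "b" }
  ],
  lastChange: new Date("2014-08-17T22:25:48.918Z")
};

const result = filterByType(doc, "a");
console.log(result.a_collection.map(el => el.otherdata));
```

Unlike $setDifference, $filter does not treat the array as a set, so duplicate elements and the original order are kept.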

Related

Combining data from 2 mongoDB collections into 1 document

I want to filter 2 collections and return one document.
I have 2 MongoDB collections modelled as such
Analytics_Region
_id:5ecf3445365eca3e58ff57c0,
type:"city"
name:"Toronto"
CSD:"3520005"
CSDTYPE:"C"
PR:"35"
PRNAME:"Ontario"
geometry:Object
country:"CAN"
updatedAt:2021-04-23T18:25:50.774+00:00
province:"ON"
Analytics_Region_Custom
_id:5ecbe871d8ab4ab6845c5142
geometry:Object
name:"henry12"
user:5cbdd019b9d9170007d15990
__v:0
I want to output a single collection in alphabetical order by name,
{
_id: 5ecbe871d8ab4ab6845c5142,
name: "henry12",
type: "custom",
province: null
},
{
_id:5ecf3445365eca3e58ff57c0,
name:"Toronto"
type:"city"
province:"ON",
}
Things to note: In the output, we have added a type of "custom" for every document in Analytics_Region_custom. We also add a province of "null" for every document.
So far I looked into $lookup (to fetch results from another collection) but it does not seem to work for my needs since it adds an array onto every document
You can use $unionWith.
The documents of the second collection are added to the pipeline (with no check for duplicates), and from those documents we project the fields:
if type is missing => "custom"
if province is missing => null
*If either field exists but holds a falsy value such as false, 0 or null, the old value is kept; the new value is used only when the field is missing.
Test code here
db.coll1.aggregate([
{
"$unionWith": {
"coll": "coll2"
}
},
{
"$project": {
"_id": "$_id",
"name": "$name",
"type": {
"$cond": [
{
"$ne": [
{
"$type": "$type"
},
"missing"
]
},
"$type",
"custom"
]
},
"province": {
"$cond": [
{
"$ne": [
{
"$type": "$province"
},
"missing"
]
},
"$province",
null
]
}
}
},
{
"$sort": {
"name": 1
}
}
])
$unionWith to perform a union of both collections
$project to project only the fields that you want
$sort to sort by the name field
db.orders.aggregate([
{
$unionWith: "inventory"
},
{
$project: {
_id: 1,
name: 1,
province: { $cond: { if: "$province", then: "$province", else: null } },
type: { $cond: { if: "$type", then: "$type", else: "custom" } }
}
},
{
$sort: { name: 1 }
}
])
Working example
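The two answers above differ in one detail: the first substitutes a default only when the field is missing, while the second substitutes it whenever the field evaluates to false. A small plain-JavaScript sketch of that difference follows; defaultIfMissing, defaultIfFalse and isMongoFalse are hypothetical names, and isMongoFalse only approximates how $cond evaluates a value (null, 0, false and missing count as false, while an empty string counts as true).

```javascript
// Approximation of how $cond treats a value as a boolean (assumption: covers
// only null, missing, false and 0; everything else counts as true).
function isMongoFalse(value) {
  return value === undefined || value === null || value === false || value === 0;
}

// First answer: replace only when the field does not exist at all.
function defaultIfMissing(doc, field, fallback) {
  return Object.prototype.hasOwnProperty.call(doc, field) ? doc[field] : fallback;
}

// Second answer: replace whenever the field evaluates to false.
function defaultIfFalse(doc, field, fallback) {
  return isMongoFalse(doc[field]) ? fallback : doc[field];
}

const custom = { _id: "5ecbe871d8ab4ab6845c5142", name: "henry12" };
const odd = { _id: "x", name: "odd", type: 0 };

console.log(defaultIfMissing(custom, "type", "custom")); // field absent
console.log(defaultIfMissing(odd, "type", "custom"));    // field present, value kept
console.log(defaultIfFalse(odd, "type", "custom"));      // field present, value replaced
```

For the sample documents in the question both variants give the same output; they only diverge when a stored value is itself null, 0 or false.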

MongoDB Aggregation: How to check if an object containing multiple properties exists in an array

I have an array of objects and I want to check if there is an object that matches multiple properties. I have tried using $in and $and but it does not work the way I want it to.
Here is my current implementation.
I have an array like
"choices": [
{
"name": "choiceA",
"id": 0,
"l": "k"
},
{
"name": "choiceB",
"id": 1,
"l": "j"
},
{
"name": "choiceC",
"id": 2,
"l": "l"
}
]
I am trying to write aggregation code that can check if there is an object that contains both "id":2 and "l":"j" properties. My current implementation checks if there is an object containing the first property then checks if there is an object containing the second one.
How can I get my desired results?
Below, see my aggregation query. The full code is here
db.poll.aggregate([
{
"$match": {
"_id": 100
}
},
{
$project: {
numberOfVotes: {
$and: [
{
$in: [
2,
"$choices.id"
]
},
{
$in: [
"j",
"$choices.l"
]
}
]
},
}
}
])
The above query returns true, yet no object in the array has both of the properties "id": 2 and "l": "j". I know the code works as written. How can I get my desired results?
You want to use something like $elemMatch
db.collection.find({
choices: {
$elemMatch: {
id: 2,
l: "j"
}
}
})
MongoPlayground
EDIT
In an aggregation $project stage I would use $filter
db.poll.aggregate([
{
"$match": {
"_id": 100
}
},
{
$project: {
numberOfVotes: {
$gt: [
{
$size: {
$filter: {
input: "$choices",
as: "choice",
cond: {
$and: [
{
$eq: [
"$$choice.id",
2
]
},
{
$eq: [
"$$choice.l",
"j"
]
}
]
}
}
}
},
0
]
}
}
}
])
MongoPlayground
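The difference between the question's $and/$in check and the $elemMatch or $filter check can be seen on the sample array in plain JavaScript. This is only an illustration of the semantics, not a replacement for the query; the variable names are made up for the example.

```javascript
const choices = [
  { name: "choiceA", id: 0, l: "k" },
  { name: "choiceB", id: 1, l: "j" },
  { name: "choiceC", id: 2, l: "l" }
];

// What the original pipeline tests: each condition may be satisfied
// by a different element of the array.
const anyIdAndAnyL =
  choices.some(c => c.id === 2) && choices.some(c => c.l === "j");

// What $elemMatch (or $filter with $and) tests: a single element
// must satisfy both conditions at once.
const sameElement = choices.some(c => c.id === 2 && c.l === "j");

console.log(anyIdAndAnyL, sameElement); // true false
```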

MongoDB - multiple queries based on condition

I have a query that looks something like this.
employees.aggregate(
[{ "$match":
{"$and": [
{"$or": [
{name : { "$regex": param, "$options":"i"}},
{title : { "$regex": param, "$options":"i"}},
]},
{ tenure : true }
]}
},
{"$sort":{experience : -1}},
{"$limit" : 100}
])
I would like to update this query to do something like this:
search the employees collection where name = param and tenure = true
if data exists, then sort the results by experience and limit the results to 100
if no results are found, then search the same collection using title, with no need to sort the results.
Can someone please help with this?
You need to apply conditional sorting in the query. Another important point is that a literal value for $regex must be a string (in quotes). You can't pass it like
$regex: User
you need to pass it as a string.
db.hotspot.aggregate(
[{
"$match": {
"$and": [{
"$or": [{
name: {
"$regex": "SD",
"$options": "i"
}
},
{
radius: {
"$regex": "250",
"$options": "i"
}
},
]
},
{
infinite: false
}
]
}
},
{
$project: {
sort: {
$cond: {
if: {
$eq: ["$radius", 250]
},
then: "$name",
else: "$_id"
}
}
}
},
{
$sort: {
sort: 1
}
}
])
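The fallback described in the question (search by name first, and only search by title when nothing is found) can also be done as two separate queries issued from application code. The sketch below only builds the two pipelines and chooses between their results. The names escapeRegex, buildNameSearch, buildTitleSearch and search are hypothetical, runAggregate stands in for whatever executes a pipeline and returns an array, and keeping tenure: true in the fallback is an assumption, since the question does not say.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Step 1: match on name, tenured only, sorted by experience, capped at 100.
function buildNameSearch(param) {
  return [
    { $match: { name: { $regex: escapeRegex(param), $options: "i" }, tenure: true } },
    { $sort: { experience: -1 } },
    { $limit: 100 }
  ];
}

// Step 2 (fallback): match on title, unsorted.
// Assumption: the tenure condition still applies here.
function buildTitleSearch(param) {
  return [
    { $match: { title: { $regex: escapeRegex(param), $options: "i" }, tenure: true } }
  ];
}

// runAggregate is a placeholder: it receives a pipeline and returns an array.
function search(param, runAggregate) {
  const byName = runAggregate(buildNameSearch(param));
  return byName.length > 0 ? byName : runAggregate(buildTitleSearch(param));
}

// Usage with a stand-in executor that finds nothing by name.
const fakeRun = pipeline =>
  pipeline[0].$match.name ? [] : [{ title: "Senior Dev", tenure: true }];
console.log(search("dev", fakeRun));
```

Two round trips are needed in the fallback case, but each query stays simple and the sort is only paid for when the name search returns data.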

Get Distinct list of two properties using MongoDB 2.4

I have an article collection:
{
_id: 9999,
authorId: 12345,
coAuthors: [23456,34567],
title: 'My Article'
},
{
_id: 10000,
authorId: 78910,
title: 'My Second Article'
}
I'm trying to figure out how to get a list of distinct author and co-author ids out of the database. I have tried push, concat, and addToSet, but can't seem to find the right combination. I'm on 2.4.6 so I don't have access to setUnion.
Whilst $setUnion would be the "ideal" way to do this, there is another way that basically involves "switching" between the values of a "type" field to alternate which field is picked:
db.collection.aggregate([
{ "$project": {
"authorId": 1,
"coAuthors": { "$ifNull": [ "$coAuthors", [null] ] },
"type": { "$const": [ true,false ] }
}},
{ "$unwind": "$coAuthors" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"$cond": [
"$type",
"$authorId",
"$coAuthors"
]
}
}},
{ "$match": { "_id": { "$ne": null } } }
])
And that is it. You may know the $const operation as the $literal operator from MongoDB 2.6. It has always been there, but was only documented and given an "alias" at the 2.6 release.
Of course the $unwind operations in both cases produce more "copies" of the data, but this is grouping for "distinct" values so it does not matter. Just depending on the true/false alternating value for the projected "type" field ( once unwound ) you just pick the field alternately.
Also this little mapReduce does much the same thing:
db.collection.mapReduce(
function() {
emit(this.authorId,null);
if ( this.hasOwnProperty("coAuthors"))
this.coAuthors.forEach(function(id) {
emit(id,null);
});
},
function(key,values) {
return null;
},
{ "out": { "inline": 1 } }
)
For the record, $setUnion is of course a lot cleaner and more performant:
db.collection.aggregate([
{ "$project": {
"combined": {
"$setUnion": [
{ "$map": {
"input": ["A"],
"as": "el",
"in": "$authorId"
}},
{ "$ifNull": [ "$coAuthors", [] ] }
]
}
}},
{ "$unwind": "$combined" },
{ "$group": {
"_id": "$combined"
}}
])
So there the only real concerns are converting the singular "authorId" to an array via $map and feeding an empty array where the "coAuthors" field is not present in the document.
Both output the same distinct values from the sample documents:
{ "_id" : 78910 }
{ "_id" : 23456 }
{ "_id" : 34567 }
{ "_id" : 12345 }
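For checking the expected output on a handful of sample documents, the same union can be computed client-side with a Set. This is only a verification aid in plain JavaScript, not a substitute for the server-side pipelines; distinctAuthorIds is a hypothetical helper name.

```javascript
const articles = [
  { _id: 9999, authorId: 12345, coAuthors: [23456, 34567], title: "My Article" },
  { _id: 10000, authorId: 78910, title: "My Second Article" }
];

// Collect every authorId and every coAuthors entry into one Set,
// treating a missing coAuthors field as an empty array.
function distinctAuthorIds(docs) {
  const ids = new Set();
  for (const doc of docs) {
    ids.add(doc.authorId);
    for (const id of doc.coAuthors || []) {
      ids.add(id);
    }
  }
  return [...ids];
}

console.log(distinctAuthorIds(articles));
```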

Mongodb many to many relations among sub-documents

TL;DR: Imagine the first $match stage gave you several documents, but you want to refine them inside, like $redact does. The problem is that your sub-documents have relations, and you want to make $where-like checks among them. How can one accomplish that? I cannot $unwind, because it causes performance problems (with a 1.5 MB document holding 5 arrays of length 1000, a single unwind produces 1000 documents of roughly 1 MB each).
My schema looks like:
{
userName: "user44",
userID: "44",
posts : [
...
{
title : "post1",
id : "123"
...
},
{
title : "post2",
id : "124"
...
},
...
],
comments: [
...
{
id: "1910",
postId : "123",
commentTitle : "comment1",
comment : "some comment",
user: "user13"
},
{
id: "1911",
postId : "124",
title : "comment2",
commentTitle : "some comment",
user: "user22"
},
{
id: "1912",
postId : "124",
title : "comment2",
commentTitle : "some comment",
user: "user22"
},
...
],
commentUpvotes: [
...
{
id : 12,
commentId : "1910",
upvotedBy: "user91",
upvoteDate: 1000,
},
{
id: 13,
commentId : "1910",
upvotedBy: "user92",
upvoteDate: 2000
},
{
id: 14,
commentId : "1911",
upvotedBy: "user92",
upvoteDate: 2100
},
...
]
}
Although this has nothing to do with my database, the original schema is exactly as above. So, the example above is a user collection, where I store the posts of the user, the comments made on those posts by other users, and commentUpvotes to store information about who upvoted. Don't think about the logic of its design and contents; I made them up, and please don't suggest any other schema.
Question: I am looking for a way to find the posts and comments which have been upvoted after a specific date, such as
db.users.find({ "commentUpvotes.upvoteDate" : { $gte: 0 } })
and result:
{
"_id" : ObjectId("539065d3cd0f2aac5f55778e"),
"posts" : [
{
title : "post1",
id : "123"
...
},
{
title : "post2",
id : "124"
...
},
],
"comments" : [
{
id: 1910,
postId : "123",
title : "comment1",
comment : "some comment",
user: "user13"
},
{
id: 1911,
postId : "124",
title : "comment2",
comment : "some comment",
user: "user22"
},
],
"commentUpVotes" : [
{
id : 12,
commentId : "1910",
upvotedBy: "user91",
upvoteDate: 1000,
},
{
id: 13,
commentId : "1910",
upvotedBy: "user92",
upvoteDate: 2000
},
{
id: 14,
commentId : "1911",
upvotedBy: "user92",
upvoteDate: 2100
}
]
}
NOTE: This is a follow-up question; the former one can be found here. I wanted to extend it a bit in this one.
I let this sit for a while, as I did comment to you on the last question what the basic process would be to do this. I also commented that $redact is not the animal for this type of operation, for reasons that are too involved to explain in addition to the answer here. Suffice to say that you need to know the filtered values and not just filter them.
Much as was shown before, you still need some usage of $unwind, but rather than the traditional usage that can blow out the number of documents to process in the pipeline, it is only used after the array contents have been filtered. The only real difference here is that we are mindful that the "filtered array" is actually going to contain more than one element, so you handle it appropriately:
db.users.aggregate([
{ "$match": {
"commentUpvotes.upvoteDate": { "$gte": 0 }
}},
{ "$project": {
"posts": 1,
"comments": 1,
"commentUpVotes": {
"$setDifference": [
{
"$map": {
"input": "$commentUpvotes",
"as": "el",
"in": {
"$cond": [
{ "$gte": [ "$$el.upvoteDate", 0 ] },
"$$el",
false
]
}
}
},
[false]
]
}
}},
{ "$project": {
"posts": 1,
"comments": 1,
"kcommentUpVotes": "$commentUpVotes",
"commentUpVotes": 1
}},
{ "$unwind": "$commentUpVotes" },
{ "$project": {
"posts": 1,
"comments": {
"$setDifference": [
{
"$map": {
"input": "$comments",
"as": "el",
"in": {
"$cond": [
{
"$eq": [
{ "$substr": [ "$$el.id", 0, 4 ] },
"$commentUpVotes.commentId"
]
},
"$$el",
false
]
}
}
},
[false]
]
},
"commentUpVotes": "$kcommentUpVotes"
}},
{ "$unwind": "$comments" },
{ "$group": {
"_id": "$_id",
"posts": { "$first": "$posts" },
"comments": { "$addToSet": "$comments" },
"kcomments": { "$addToSet": "$comments" },
"commentUpVotes": { "$first": "$commentUpVotes" }
}},
{ "$unwind": "$comments" },
{ "$project": {
"posts": {
"$setDifference": [
{
"$map": {
"input": "$posts",
"as": "el",
"in": {
"$cond": [
{
"$eq": [
"$$el.id",
"$comments.postId"
]
},
"$$el",
false
]
}
}
},
[false]
]
},
"comments": "$kcomments",
"commentUpVotes": 1
}},
{ "$unwind": "$posts" },
{ "$group": {
"_id": "$_id",
"posts": { "$addToSet": "$posts" },
"comments": { "$first": "$comments" },
"commentUpVotes": { "$first": "$commentUpVotes" }
}}
])
So there is a point here to understand exactly what each stage ( or the repeated process ) is doing and why the $unwind operations here are important.
If you take the first $project here into consideration, the result returned is always going to be an array. This is how the "filtering" with $map works, and makes perfect sense as you are expecting the possibility of several ( in this example, all ) matches.
The important part takes place before you try to match those values against another array in your document as when you look at the anatomy of a $map the point is to compare the element against a singular value. This is why you need to $unwind in order to get at those "singular" values to compare.
So aside from keeping a copy of the "filtered" array to make things a bit cleaner, let's skip down to the part after matching against the "comments" array. Since the "commentUpvotes" array was "unwound", there is now a copy for each document with its own filtered version of the array there. Note that each result array can only contain a single element.
As these are indeed arrays, in order to combine them between documents you need to unwind these "single element" arrays and then group them back together. Keep in mind here that while there were "three" matches for "commentUpvotes" there are only "two" comments that would match but of the "three" matches "two" of them share the same id. This is where grouping back using $addToSet becomes important as you do not want to duplicate that matching post.
Once all the matched elements are in the array it's time again to $unwind and repeat.
So the general premise remains the same as the previous example and question. Indeed the approach here can be considered a "version 2.0" of the previous listing as it will cater for singular and "many" matches in all cases.
The one "caveat" to mention here is the basic principle that these items are indeed related and there is no "orphaned" detail in any of the arrays. The obvious reason for this is that anything that was tested to match from one array to the other and did not match would result in an empty array. There may be other matches, but if one of those tests came up empty then you would have to handle the empty array that was produced.
Handling that final note is easy: simply test the $size of the result, insert a single value of false when it is empty, and filter that out at a later stage. But for the purpose of the exercise I am considering that your "relationships" are indeed intact, and leaving any additional handling up to your own implementation.
The end result is of course you get the desired result without resorting to the same level of "blowout" by simply unwinding the unfiltered arrays against each other and trying to match equality against those records.
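The cascade that the pipeline performs (filter the upvotes, then keep the comments they reference, then keep the posts those comments reference) may be easier to follow in plain JavaScript. The refine function below is hypothetical and works on one in-memory document shaped like the question's sample; it is meant for checking the expected result on test data, not as a replacement for the aggregation.

```javascript
// Hypothetical helper: apply the upvote -> comment -> post cascade to one document.
function refine(user, minDate) {
  // 1. Keep upvotes at or after the given date.
  const upvotes = user.commentUpvotes.filter(u => u.upvoteDate >= minDate);
  // 2. Keep comments referenced by at least one remaining upvote.
  const commentIds = new Set(upvotes.map(u => u.commentId));
  const comments = user.comments.filter(c => commentIds.has(c.id));
  // 3. Keep posts referenced by at least one remaining comment.
  const postIds = new Set(comments.map(c => c.postId));
  const posts = user.posts.filter(p => postIds.has(p.id));
  return { posts, comments, commentUpvotes: upvotes };
}

const user = {
  posts: [
    { title: "post1", id: "123" },
    { title: "post2", id: "124" }
  ],
  comments: [
    { id: "1910", postId: "123", user: "user13" },
    { id: "1911", postId: "124", user: "user22" },
    { id: "1912", postId: "124", user: "user22" }
  ],
  commentUpvotes: [
    { id: 12, commentId: "1910", upvotedBy: "user91", upvoteDate: 1000 },
    { id: 13, commentId: "1910", upvotedBy: "user92", upvoteDate: 2000 },
    { id: 14, commentId: "1911", upvotedBy: "user92", upvoteDate: 2100 }
  ]
};

const refined = refine(user, 2050);
console.log(refined.posts.map(p => p.id), refined.comments.map(c => c.id));
```

With a threshold of 0 every upvote passes, so comment "1912", which has no upvotes, is the only element dropped.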
I found a way to make it work without using $unwind, with $redact + $$ROOT. As you know, $redact scans the document from parent to child, so to make comparisons among subdocuments I needed to use $$ROOT.
Since everything is processed inside the document, I believe this is the most efficient way. I will still be glad if someone proposes better ways. There are not many resources on $redact, and I believe the code below can still be improved:
// first query match
{
"$match": {
"commentUpvotes.upvoteDate": {
"$gte": 0
}
}
},
// exclude commentUpvotes
{
$redact: {
$cond: {
if: {
$or: [
{
$gte: [
"$upvoteDate",
0
]
},
{
$not: "$upvoteDate"
}
]
},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
},
// exclude comments
{
$redact: {
$cond: {
if: {
$or: [
{
$not: "$postId"
},
{
$anyElementTrue: { $map: {
input: "$$ROOT.commentUpvotes",
as: "el",
in: { $cond: { if: { $eq: [ "$$el.commentId", "$id" ] },
then: true, else: false
}
}
}
}
}
]
},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
},
// exclude posts
{
$redact: {
$cond: {
if: {
$or: [
{
$not: "$title"
},
{
$anyElementTrue: {
$map: {
input: "$$ROOT.comments",
as: "el",
in: {
$cond: {
if: {
$eq: [
"$$el.postId",
"$id"
]
},
then: true,
else: false
}
}
}
}
}
]
},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}