Nested Query Mongodb - mongodb

I have this beautiful JSON, and I'm trying to use a MongoDB query to get all comments with file_id 12, so this is what I would like to get back: [4,5,7,10,11,15].
I tried this query, but the file_id condition is completely ignored by the engine:
db.collection.distinct("changes.comments",
{"my_uuid":"bf48e757-1a65-4546-bf24-2bb001effddd",
"changes":{$elemMatch:{file_id:12}} }
)
Output:
{
"_id" : ObjectId("5342bf796b03d7ffc834afcc"),
"my_uuid" : "bf48e757-1a65-4546-bf24-2bb001effddd",
"changes" : [
{
"file_id" : 12,
"version" : 1,
"comments" : [
4,
5,
7
],
"lastseen" : 1394640549
},
{
"file_id" : 12,
"version" : 2,
"comments" : [
10,
11,
15
],
"lastseen" : 1394640511
},
{
"file_id" : 234,
"version" : 1,
"comments" : [
100,
110,
150
],
"lastseen" : 1394640555
}
]
}
Thanks in advance

You can use the aggregation framework to achieve what you want. Although the query looks complex for what you are trying to do, it's simple once you get the hang of it.
db.collection.aggregate([
// Get only documents where "my_uuid" equals "bf48e757-1a65-4546-bf24-2bb001effddd"
{"$match":{"my_uuid":"bf48e757-1a65-4546-bf24-2bb001effddd"}},
// Unwind the "changes" array
{"$unwind":"$changes"},
// Get only elements of the "changes" array where "file_id" equals 12
{"$match":{"changes.file_id":12}},
// Unwind the "comments" array
{"$unwind":"$changes.comments"},
// Group by _id and add the comments to array only if not already present
{"$group":{_id:"$_id", comments:{$addToSet:"$changes.comments"}}},
// Cleanup the output
{"$project":{_id:0, comments:1}}
])
Output:
{
"result" : [
{
"comments" : [
4,
5,
7,
10,
11,
15
]
}
],
"ok" : 1
}
EDIT: Including my_uuid in the results is fairly straight-forward. We just need to group by my_uuid instead of _id:
db.collection.aggregate([
{"$match":{"my_uuid":"bf48e757-1a65-4546-bf24-2bb001effddd"}},
{"$unwind":"$changes"},
{"$match":{"changes.file_id":12}},
{"$unwind":"$changes.comments"},
{"$group":{_id:"$my_uuid", comments:{$addToSet:"$changes.comments"}}},
{"$project":{_id:0, my_uuid:"$_id", comments:1}}
])
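To see what each stage of the pipelines above does, here is a minimal in-memory sketch in plain JavaScript. It does not call MongoDB at all; `commentsForFile` is a hypothetical helper and `doc` is a trimmed copy of the sample document, so treat it as an illustration of the stage logic only.

```javascript
// In-memory stand-in for the sample document from the question (trimmed).
const doc = {
  my_uuid: "bf48e757-1a65-4546-bf24-2bb001effddd",
  changes: [
    { file_id: 12, version: 1, comments: [4, 5, 7] },
    { file_id: 12, version: 2, comments: [10, 11, 15] },
    { file_id: 234, version: 1, comments: [100, 110, 150] },
  ],
};

// Hypothetical helper mirroring $unwind / $match / $unwind / $addToSet
// for a single document.
function commentsForFile(document, fileId) {
  const unique = new Set();
  for (const change of document.changes) {     // $unwind: "$changes"
    if (change.file_id !== fileId) continue;    // $match: {"changes.file_id": fileId}
    for (const comment of change.comments) {    // $unwind: "$changes.comments"
      unique.add(comment);                      // $addToSet: "$changes.comments"
    }
  }
  return [...unique];
}

console.log(commentsForFile(doc, 12)); // [ 4, 5, 7, 10, 11, 15 ]
```

Note that a JavaScript Set keeps insertion order, whereas $addToSet does not guarantee any particular order in the resulting array.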

Currently there is no straightforward way of pulling out only the matching elements of an array. The $elemMatch operator only ensures that at least one of the documents within the array satisfies the condition you provide. The query will, however, always return the entire document. One way to achieve what you are looking for is:
db.sample4.aggregate({$unwind:"$changes"},{$match:{"changes.file_id":12}},{$project:{"changes.comments":1,"_id":0}});
These topics are covered elsewhere on Stack Overflow, where a map-reduce approach is also listed. If the requirement were to return only the first matching element, you could have projected using changes.comments.$:1, e.g.:
db.sample4.find({"changes":{$elemMatch:{"file_id":12}} },{"changes.comments.$":1} )
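The difference between the three behaviours described in this answer can be sketched in plain JavaScript. This is only an analogy run on an in-memory copy of the data; `matchesDocument`, `firstMatch` and `allMatches` are hypothetical names, not MongoDB functions.

```javascript
// Trimmed in-memory copy of the sample document.
const sampleDoc = {
  changes: [
    { file_id: 12, comments: [4, 5, 7] },
    { file_id: 12, comments: [10, 11, 15] },
    { file_id: 234, comments: [100, 110, 150] },
  ],
};

// Like $elemMatch in a query: answers "does at least one element match?",
// which selects the whole document and filters nothing inside it.
const matchesDocument = (d, fileId) => d.changes.some((c) => c.file_id === fileId);

// Like a positional projection: only the first matching element.
const firstMatch = (d, fileId) => d.changes.find((c) => c.file_id === fileId);

// Like $unwind followed by $match: every matching element.
const allMatches = (d, fileId) => d.changes.filter((c) => c.file_id === fileId);

console.log(matchesDocument(sampleDoc, 12));            // true
console.log(firstMatch(sampleDoc, 12).comments);        // [ 4, 5, 7 ]
console.log(allMatches(sampleDoc, 12).map((c) => c.comments));
```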

Related

MongoDB $or + sort + index. How to avoid sorting in memory?

I am having trouble creating a proper index for my mongo query that would avoid the SORT stage. I am not even sure that is possible in my case. Here is my query with execution stats:
db.getCollection('test').find(
{
"$or" : [
{
"a" : { "$elemMatch" : { "_id" : { "$in" : [4577] } } },
"b" : { "$in" : [290] },
"c" : { "$in" : [35, 49, 57, 101, 161, 440] },
"d" : { "$lte" : 399 }
},
{
"e" : { "$elemMatch" : { "numbers" : { "$in" : ["1K0407151AC", "0K20N51150A"] } } },
"d" : { "$lte" : 399 }
}]
})
.sort({ "X" : 1, "d" : 1, "Y" : 1, "Z" : 1 }).explain("executionStats")
The fields 'm', 'a' and 'e' are arrays, that is why 'm' is not included in any index.
If you check the execution stats screenshot, you will see that memory usage is pretty close to maximum and unfortunately I had cases where the query failed to execute because of the 32MB limit.
Index for the first part of the $or query:
{
"a._id" : 1,
"X" : 1,
"d" : 1,
"Y" : 1,
"Z" : 1,
"b" : 1,
"c" : 1
}
Index for the second part of the $or query:
{
"e.numbers" : 1,
"X" : 1,
"d" : 1,
"Y" : 1,
"Z" : 1
}
The indexes are used by the query, but not for sorting. Instead of a SORT stage I would like to see a SORT_MERGE stage, but no success so far. If I run the parts of the $or separately, each of them is able to use its index to avoid sorting in memory. That is OK as a workaround, but I would need to merge and re-sort the results in the application.
MongoDB version is 3.4.2. I checked that and that question. My query is the result. Probably I missed something?
Edit: mongo documents look like that:
{
"_id" : "290_440_K760A03",
"Z" : "K760A03",
"c" : 440,
"Y" : "NPS",
"b" : 290,
"X" : "Schlussleuchte",
"e" : [
{
"..." : 184,
"numbers" : [
"0K20N51150A"
]
}
],
"a" : [
{
"_id" : 4577,
"..." : [
{
"..." : [
{
"..." : "R",
}
]
}
]
},
{
"_id" : 4578
}
],
"d" : 101,
"m" : [
"AT",
"BR",
"CH"
],
"moreFields":"..."
}
Edit 2: removed the field "m" from the query to decrease complexity and attached a test collection dump for anyone who wants to help :)
Here is the solution-
I just added one document in my test collection as shown in your question (edit part). Then I created below four indices-
1. {"m":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}
2. {"a._id":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1}
3. {"m":1,"X":1,"d":1,"Y":1,"Z":1}
4. {"e.numbers":1,"X":1,"d":1,"Y":1,"Z":1}
When I then ran the given query with execution stats, it showed the SORT_MERGE stage as expected.
Here is the explanation-
MongoDB has a guideline called equality-sort-range (ESR) that says a lot about how we should order the fields in our indices. I just followed this rule and kept the index in that order, so here the index should be {Equality fields, "X":1,"d":1,"Y":1,"Z":1, Range fields}. You can see that the query has a range condition on field "d" only ("d" : { "$lte" : 101 }), but "d" is already covered by the sort fields of the index ("X":1,"d":1,"Y":1,"Z":1), so we can skip the range part (i.e. field "d") at the end of the index.
If "d" had NOT been part of the sort or equality predicates, I would have added it as the range field and my index would have looked like {Equality fields, "X":1,"Y":1,"Z":1,"d":1}.
Now my index is {Equality fields, "X":1,"d":1,"Y":1,"Z":1} and I am only concerned about the equality fields. To figure them out, I checked the find predicates of the query and found two conditions combined by the OR operator.
The first condition has equality on "a._id", "b", "c", "m" ("d" has a range, not equality). So I would need an index like "a._id":1,"m":1,"b":1,"c":1,"X":1,"d":1,"Y":1,"Z":1, but this gives an error because it has two array fields, "a._id" and "m". As we know, Mongo doesn't allow a compound index on parallel arrays, so it will fail. I therefore created two separate indexes and let the query planner choose whichever it prefers. That is how I got the first and second index.
The second condition of the OR operator has "e.numbers" and "m". Both are array fields, so I had to create two indices as I did for the first condition, and that's how I got my third and fourth index.
Each branch of the OR is planned separately and can use only one index at a time, so I created all of these indices because I don't know in advance which one the query planner will pick for each branch.
Note: If you are concerned about index size, you can keep only one index from the first two and one from the last two. Alternatively, keep all four and hint the proper index to Mongo if you know better than the query planner which one should be used.
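As a rough illustration of the equality-sort-range ordering described above, here is a small sketch in plain JavaScript. `buildEsrIndex` is a hypothetical helper, not part of any MongoDB driver; it only assembles a key specification and skips range fields that the sort already covers, which is the shortcut this answer takes for "d".

```javascript
// Hypothetical helper: assemble an index key spec in
// equality -> sort -> range order, skipping fields already present.
function buildEsrIndex(equalityFields, sortSpec, rangeFields) {
  const keys = {};
  for (const field of equalityFields) keys[field] = 1;
  for (const [field, direction] of Object.entries(sortSpec)) {
    if (!(field in keys)) keys[field] = direction;
  }
  for (const field of rangeFields) {
    if (!(field in keys)) keys[field] = 1; // only added when the sort does not cover it
  }
  return keys;
}

const sortSpec = { X: 1, d: 1, Y: 1, Z: 1 };

// One index per $or branch, using the array field of that branch.
const firstBranch = buildEsrIndex(["a._id", "b", "c"], sortSpec, ["d"]);
const secondBranch = buildEsrIndex(["e.numbers"], sortSpec, ["d"]);

console.log(Object.keys(firstBranch).join(", "));  // a._id, b, c, X, d, Y, Z
console.log(Object.keys(secondBranch).join(", ")); // e.numbers, X, d, Y, Z
```

The two results correspond to indices 2 and 4 in the list above; the resulting objects could be passed to createIndex, assuming the field names match your collection.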

MongoDB: Retrieving an entire array from a specific document

I have set up some test data in mongoDB that has the following form:
{
"_id" : ObjectId("579ab44c0f9f0dc3aeec42ab"),
"name" : "Bob",
"references" : [ 1, 2, 3, 4, 5, 6 ]
}
{
"_id" : ObjectId("579ab7a20f9f0dc3aeec42ac"),
"name" : "Jeff",
"references" : [ 11, 12, 13, 14, 15 ]
}
I want to be able to return the references array only for Bob. Currently I am able to return the complete Document for Bob with the following query:
db.test_2.find({"name" : "Bob"}).pretty()
Basically the general question is how to return an array for a single document in a collection in MongoDB? If I could get any help for this that would be much appreciated!
You can add a projection document to limit the fields returned.
For example:
db.products.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 } )
Take a look at the documentation:
https://docs.mongodb.com/manual/reference/method/db.collection.find/#db.collection.find
The other option would be to select the field from the given document (if you use it in a loop for example).
In any case mongo will return a json document which you need to take the array from.
Regards
Jony
You can do this...
db.test_2.findOne({ "name": "Bob" }, { references: 1, _id: 1 })
P.S this is with MongoDB v4.2
db.test_2.find({ "name": "Bob" }, { "references": 1 });
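To make the effect of the projection concrete, here is a minimal in-memory sketch in plain JavaScript. `project` is a hypothetical helper that imitates only the simplest inclusion-projection behaviour (listed fields plus _id unless it is excluded); it is not how the server implements projections.

```javascript
// In-memory copy of Bob's document (the ObjectId is kept as a plain string here).
const bob = {
  _id: "579ab44c0f9f0dc3aeec42ab",
  name: "Bob",
  references: [1, 2, 3, 4, 5, 6],
};

// Hypothetical helper imitating a simple inclusion projection:
// keep fields marked with 1, and keep _id unless it is set to 0.
function project(document, spec) {
  const result = {};
  if (spec._id !== 0 && "_id" in document) result._id = document._id;
  for (const [field, include] of Object.entries(spec)) {
    if (field !== "_id" && include === 1 && field in document) {
      result[field] = document[field];
    }
  }
  return result;
}

// What { references: 1 } would leave in the returned document.
const projected = project(bob, { references: 1 });

// The array itself still has to be read off the returned document.
const referencesOnly = project(bob, { references: 1, _id: 0 }).references;

console.log(projected);
console.log(referencesOnly); // [ 1, 2, 3, 4, 5, 6 ]
```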

MongoDB Why this error : can't append to array using string field name: comments

I have a DB structure like below:
{
"_id" : 1,
"comments" : [
{
"_id" : 2,
"content" : "xxx"
}
]
}
I push a new subdocument into the comments field. That works fine:
db.test.update(
{"_id" : 1, "comments._id" : 2},
{$push : {"comments.$.comments" : {_id : 3, content:"xxx"}}}
)
after that the DB structure:
{
"_id" : 1,
"comments" : [
{
"_id" : 2,
"comments" : [
{
"id" : 3,
"content" : "xxx"
}
],
"content" : "xxx"
}
]
}
But when I push a new subdocument into the comment whose id is 3, there is an error:
db.test.update(
{"_id" : 1, "comments.comments.id" : 3},
{$push : {"comments.comments.$.comments" : {id : 4, content:"xxx"}}}
)
error message:
can't append to array using string field name: comments
Well, it makes total sense if you think about it. MongoDB has the advantage, and the disadvantage, of magically resolving certain things for you.
When you query the database for a specific regular field like this:
{ field : "value" }
the query {field:"value"} makes total sense. It wouldn't if value were part of an array, but Mongo handles that for you, so if the structure is:
{ field : ["value", "anothervalue"] }
Mongo iterates through all the elements and matches "value" against the field, and you don't have to think about it. It works perfectly, but only at one level, because it's impossible to guess what you want to do if you have multiple levels.
In your case the first query works because it is exactly this one-level case:
db.test.update(
{"_id" : 1, "comments._id" : 2},
{$push : {"comments.$.comments" : {_id : 3, content:"xxx"}}}
)
It matches _id at the first level and comments._id at the second level. It gets an array as a result, but Mongo is able to resolve it.
But in the second case, think about what you are asking for. Let's isolate the where clause:
{"_id" : 1, "comments.comments.id" : 3},
"Give me the records from the main collection with _id:1" (one doc)
"And the comments whose inner comments have an id = 3" (array * array)
The first level, comments.id, is resolved easily. The second is not, because comments already returns an array and one more level makes it an array of arrays. Mongo gets an array of arrays as a result, and it's not possible to push a document into all the records of the array.
The solution is to narrow your where clause so that it selects a unique document in comments (it could be the first one). That is not a good solution, though, because you never know the position of the document you're looking for. Using the shell, I think the only way to be accurate is to do it in two steps. Check this query, which works (it is not the solution anyway) but "solves" the multiple-array part by fixing it to the first record:
db.test.update(
{"_id" : 1, "comments.0.comments._id" : 3},
{$push : {"comments.0.comments.$.comments" : {id : 4, content:"xxx"}}}
)
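The two-step approach mentioned above could look roughly like this: first read the document and locate the array positions in application code, then build a $push that uses explicit indexes instead of the positional $. This is a sketch in plain JavaScript on an in-memory copy of the document; `findNestedPath` and `buildPushUpdate` are hypothetical helpers, and the inner field is assumed to be called id as in the stored structure shown in the question.

```javascript
// In-memory copy of the stored document after the first update.
const stored = {
  _id: 1,
  comments: [
    { _id: 2, content: "xxx", comments: [{ id: 3, content: "xxx" }] },
  ],
};

// Step 1 (application side): find the outer and inner array positions
// of the nested comment with the given id.
function findNestedPath(document, innerId) {
  for (let i = 0; i < document.comments.length; i++) {
    const inner = document.comments[i].comments || [];
    const j = inner.findIndex((c) => c.id === innerId);
    if (j !== -1) return `comments.${i}.comments.${j}.comments`;
  }
  return null; // no nested comment with that id
}

// Step 2: build the update document with explicit indexes in the path.
function buildPushUpdate(document, innerId, newComment) {
  const path = findNestedPath(document, innerId);
  return path === null ? null : { $push: { [path]: newComment } };
}

const update = buildPushUpdate(stored, 3, { id: 4, content: "xxx" });
console.log(JSON.stringify(update));
// {"$push":{"comments.0.comments.0.comments":{"id":4,"content":"xxx"}}}
```

The resulting object could then be passed as the second argument of db.test.update({"_id" : 1}, ...). Keep in mind that the positions can change between the read and the write if other clients modify the arrays in the meantime.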

Map reduce in mongodb

I have mongo documents in this format.
{"_id" : 1,"Summary" : {...},"Examples" : [{"_id" : 353,"CategoryId" : 4},{"_id" : 239,"CategoryId" : 28}, ... ]}
{"_id" : 2,"Summary" : {...},"Examples" : [{"_id" : 312,"CategoryId" : 2},{"_id" : 121,"CategoryId" : 12}, ... ]}
How can I map/reduce them to get a hash like:
{ [ result[categoryId] : count_of_examples , .....] }
I.e. count of examples of each category.
I have 30 categories at all, all specified in Categories collection.
If you can use 2.1 (dev version of upcoming release 2.2) then you can use Aggregation Framework and it would look something like this:
db.collection.aggregate( [
{$project:{"CatId":"$Examples.CategoryId","_id":0}},
{$unwind:"$CatId"},
{$group:{_id:"$CatId","num":{$sum:1} } },
{$project:{CategoryId:"$_id",NumberOfExamples:"$num",_id:0 }}
] );
The first step projects the subfield of Examples (CategoryId) into a top-level field of the document (not necessary, but it helps with readability). Then we unwind the array, which creates a separate document for each array value of CatId. Next we do a "group by" and count them (I assume each instance of CategoryId is one example, right?). Last, we use projection again to relabel the fields and make the result look like this:
"result" : [
{
"CategoryId" : 12,
"NumberOfExamples" : 1
},
{
"CategoryId" : 2,
"NumberOfExamples" : 1
},
{
"CategoryId" : 28,
"NumberOfExamples" : 1
},
{
"CategoryId" : 4,
"NumberOfExamples" : 1
}
],
"ok" : 1

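Since the question asked about map/reduce, here is what the map and reduce steps amount to, sketched in plain JavaScript on an in-memory array. It is not MongoDB's mapReduce command; `docs` contains only the examples that are visible in the question (the elided "..." parts are left out), and `emitted` and `counts` are illustrative names.

```javascript
// Only the array elements visible in the question; elided parts are omitted.
const docs = [
  { _id: 1, Examples: [{ _id: 353, CategoryId: 4 }, { _id: 239, CategoryId: 28 }] },
  { _id: 2, Examples: [{ _id: 312, CategoryId: 2 }, { _id: 121, CategoryId: 12 }] },
];

// "map": emit one [CategoryId, 1] pair per example.
const emitted = docs.flatMap((d) => d.Examples.map((e) => [e.CategoryId, 1]));

// "reduce": sum the emitted values per CategoryId.
const counts = emitted.reduce((acc, [categoryId, n]) => {
  acc[categoryId] = (acc[categoryId] || 0) + n;
  return acc;
}, {});

console.log(counts); // one entry per CategoryId, each with a count of 1 here
```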
How to query recent comments in Mongodb

The post document looks like this:
{
...
comments: [{
_id:...
body:...
createDate:...
},
...
]
}
How do I get the 10 most recent comments from the collection?
If your comments are always in a predictable order (i.e. newest first, or newest last), then you can use the $slice operator to return just a subset of the full comments field when querying:
test> db.foo.save({name: "hello", comments: [1, 2, 3, 4, 5]})
test> db.foo.find({}, {comments: {$slice: 3}})
{ "_id" : ObjectId("4ec7d1c8e72da9b6f31e2528"), "name" : "hello", "comments" : [ 1, 2, 3 ] }
test> db.foo.find({}, {comments: {$slice: -3}})
{ "_id" : ObjectId("4ec7d1c8e72da9b6f31e2528"), "name" : "hello", "comments" : [ 3, 4, 5 ] }
You can read more about controlling the returned fields at http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields
There is no way to partially select items from an embedded document based on a condition. No matter what, it will return the entire array of documents. You have to do the filtering in your application code. That's the only way.
But I recommend having a separate collection for comments. That way you can use skip and limit on the result set.
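If the filtering is done in application code as suggested, it could look like the following sketch in plain JavaScript. The comment values are made-up sample data, and `recentComments` is a hypothetical helper; the last lines show the array analogue of what $slice: 3 and $slice: -3 return.

```javascript
// Made-up sample post; the comments are deliberately not in date order.
const post = {
  comments: [
    { _id: 1, body: "first", createDate: new Date("2011-11-01T10:00:00Z") },
    { _id: 2, body: "third", createDate: new Date("2011-11-03T10:00:00Z") },
    { _id: 3, body: "second", createDate: new Date("2011-11-02T10:00:00Z") },
  ],
};

// Hypothetical helper: newest comments first, limited to `limit` entries.
// The array is copied first so the original order is left untouched.
function recentComments(p, limit) {
  return [...p.comments]
    .sort((a, b) => b.createDate - a.createDate)
    .slice(0, limit);
}

const newestTwo = recentComments(post, 2).map((c) => c._id);
console.log(newestTwo); // [ 2, 3 ]

// Array analogue of $slice on an already ordered array.
const ordered = [1, 2, 3, 4, 5];
console.log(ordered.slice(0, 3)); // [ 1, 2, 3 ]
console.log(ordered.slice(-3));   // [ 3, 4, 5 ]
```

For the question as asked, the call would be recentComments(post, 10).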