Understanding overlapping MongoDB projections

I'm struggling to understand how overlapping projections work in MongoDB.
Here's a quick example to illustrate my conundrum (results from the in-browser MongoDB console).
First, I created and inserted a simple document:
var doc = {
    id: "universe",
    systems: {
        1: {
            name: "milky_way",
            coords: { x: 1, y: 1 }
        }
    }
}
db.test.insert(doc);
// doc successfully inserted
Next, I try a somewhat odd projection:
db.test.find({}, {"systems.1.coords":1, "systems.1":1});
//result
"_id" : ObjectId("537fd3541cdcaf1ba735becb"),
"systems" : {
"1" : {
"coords" : {
"y" : 1,
"x" : 1
}
}
}
}
I expected to see the entirety of "system 1," including the name field. But it appears the deeper path to "systems.1.coords" overwrote the shallower path to just "systems.1"?
I decide to test this "deeper path overrides shallower path" theory:
db.test.find({}, {"systems.1.coords.x":1, "systems.1.coords":1});
//result
"_id" : ObjectId("537fd3541cdcaf1ba735becb"),
"systems" : {
"1" : {
"coords" : {
"y" : 1, // how'd this get here, but for the shallower projection?
"x" : 1
}
}
}
}
Here, my deeper projection didn't override the shallower one.
What gives? How is MongoDB dealing with overlapping projections? I can't find the logic to it.
EDIT:
My confusion was stemming from what counted as a "top level" path.
This worked like I expected: .findOne({}, {"systems.1":1, "systems":1}) (i.e., a full set of systems is returned, notwithstanding that I started with what appeared to be a "narrower" projection).
However, this did not work like I expected: .findOne({}, {"systems.1.name":1, "systems.1":1}) (i.e., only the name field of systems.1 is returned).
In short, going more than "one dot" deep leads to the overwriting discussed in the accepted answer.
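For reference, here is a minimal sketch of the two EDIT cases side by side (the outputs in the comments are just the shapes described above, abbreviated; also note that newer MongoDB servers, 4.4 and later, reject overlapping projection paths like these with a path-collision error instead of merging them):
// "one dot" deep: the broader "systems" projection wins and the full set comes back
db.test.findOne({}, {"systems.1": 1, "systems": 1});
// => { "_id": ..., "systems": { "1": { "name": "milky_way", "coords": { "x": 1, "y": 1 } } } }
// more than "one dot" deep: the deeper "systems.1.name" path overwrites "systems.1"
db.test.findOne({}, {"systems.1.name": 1, "systems.1": 1});
// => { "_id": ..., "systems": { "1": { "name": "milky_way" } } }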

You cannot do this sort of projection using .find(), as the projections it allows are basic field selection. What you are talking about is document re-shaping, and for that you can use the $project operator with the .aggregate() method.
So, using your initial example:
db.test.aggregate([
    { "$project": {
        "coords": "$systems.1.coords",
        "systems": 1
    }}
])
That will give you output like this:
{
    "_id" : ObjectId("537fe2127cb762d14e2a1007"),
    "systems" : {
        "1" : {
            "name" : "milky_way",
            "coords" : {
                "x" : 1,
                "y" : 1
            }
        }
    },
    "coords" : {
        "x" : 1,
        "y" : 1
    }
}
Note the different field naming there as well. If for no other reason, the version coming from .find() would result in overlapping paths ("systems" is the same) for the levels of fields you were trying to select, so they cannot be projected as two separate fields the way they can be here.
In much the same way, consider a statement like the following:
db.test.aggregate([
    { "$project": {
        "systems": {
            "1": {
                "coords": "$systems.1.coords"
            }
        },
        "systems": 1
    }}
])
So that is not telling you it's invalid; it's just that one of the fields in the projection is overwriting the other, since essentially at the top level they are both called "systems".
This is basically what you end up with when trying to do something like this with the projection method available to .find(). So the essential part is that you need a different field name, and that is what the aggregation framework (though it is not actually aggregating here) allows you to do.
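As a further sketch of that idea (the output field names here are purely illustrative, not something from the question), you could pull out just the pieces the question was after, the name and coords of system "1", under distinct top-level names:
db.test.aggregate([
    { "$project": {
        "systemName": "$systems.1.name",
        "systemCoords": "$systems.1.coords"
    }}
])
// expected shape, roughly:
// { "_id": ObjectId(...), "systemName": "milky_way", "systemCoords": { "x": 1, "y": 1 } }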

Related

MongoDB - Get highest value of child

I'm trying to get the highest value of a child field. If I have two documents like this:
{
    "_id" : ObjectId("5585b8359557d21f44e1d857"),
    "test" : {
        "number" : 1,
        "number2" : 1
    }
}
{
    "_id" : ObjectId("5585b8569557d21f44e1d858"),
    "test" : {
        "number" : 2,
        "number2" : 1
    }
}
How would I get the highest value of key "number"?
Using dot notation:
db.testSOF.find().sort({'test.number': -1}).limit(1)
To get the highest value of the key "number" you could use two approaches here. You could use the aggregation framework, where the pipeline would look like this:
db.collection.aggregate([
    {
        "$group": {
            "_id": 0,
            "max_number": {
                "$max": "$test.number"
            }
        }
    }
])
Result:
/* 0 */
{
    "result" : [
        {
            "_id" : 0,
            "max_number" : 2
        }
    ],
    "ok" : 1
}
or you could use the find() cursor as follows
db.collection.find().sort({"test.number": -1}).limit(1)
max() does not work the way you would expect it to in SQL for Mongo. This is perhaps going to change in future versions but, as of now, max and min are to be used with indexed keys, primarily internally for sharding.
See http://www.mongodb.org/display/DOCS/min+and+max+Query+Specifiers
Unfortunately, for now the only way to get the max value is to sort the collection descending on that value and take the first.
db.collection.find("_id" => x).sort({"test.number" => -1}).limit(1).first()
quoted from: Getting the highest value of a column in MongoDB
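One practical note, as an assumption on my part rather than anything from the quoted answer: if this query runs often, an index on "test.number" keeps the sort-and-limit approach from scanning the whole collection (a single-field index can be walked in either direction, so the index's own sort order matters little):
db.testSOF.createIndex({ "test.number": -1 })
db.testSOF.find().sort({ "test.number": -1 }).limit(1)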

MongoDB / Morphia - Projection not working on recursive objects?

I have a test object which works as a node in a tree, containing 0 or more child instances of the same type. I'm persisting it in MongoDB and querying it with Morphia.
I perform the following query:
db.TestObject.find( {}, { _id: 1, childrenTestObjects: 1 } ).limit(6).sort( {_id: 1 } ).pretty();
Which results in:
{ "_id" : NumberLong(1) }
{ "_id" : NumberLong(2) }
{ "_id" : NumberLong(3) }
{ "_id" : NumberLong(4) }
{
    "_id" : NumberLong(5),
    "childrenTestObjects" : [
        {
            "stringValue" : "6eb887126d24e8f1cd8ad5033482c781",
            "creationDate" : ISODate("1997-05-24T00:00:00Z"),
            "childrenTestObjects" : [
                {
                    "stringValue" : "2ab8f86410b4f3bdcc747699295eb5a4",
                    "creationDate" : ISODate("2024-10-10T00:00:00Z"),
                    "_id" : NumberLong(7)
                }
            ],
            "_id" : NumberLong(6)
        }
    ]
}
That's awesome, but also a little surprising. I'm having two issues with the results:
1) When I do a projection, it only applies to the top-level elements. The children elements still return other properties not in the projection (stringValue and creationDate). I'd like the field selection to apply to all documents and sub-documents of the same type. This tree has an undetermined number of sub-items, so I can't specify that in the query explicitly. How can I accomplish that?
2) To my surprise, limit applied to sub-documents! You see that there was one embedded document with id 6. I was expecting to see 6 top-level documents with N sub-documents, but instead got just 5. How do I tell MongoDB to return 6 top-level elements, regardless of what is embedded in them? Without that, having a consistent pagination system is impossible.
All your help has made learning MongoDB way faster and I really appreciate it! Thanks!
As for 1), projections retain fields in the results. In this case that field is childrenTestObjects, which happens to be a document, so MongoDB returns that entire field, which is, of course, the entire sub-document. Projections are not recursive, so you'd have to specify each field explicitly.
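For example, a sketch of what "specify each field explicitly" looks like for two levels of nesting (it obviously stops at a fixed depth, which is exactly the limitation being described):
db.TestObject.find(
    {},
    {
        "_id": 1,
        "childrenTestObjects._id": 1,
        "childrenTestObjects.childrenTestObjects._id": 1
    }
).limit(6).sort({ "_id": 1 })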
As for 2), that doesn't sound right. It would help to see the query results without the projection added (full documents in each returned document) and we can take it from there.

Average a Sub Document Field Across Documents in Mongo

For a given record id, how do I get the average of a sub document field if I have the following in MongoDB:
/* 0 */
{
    "item" : "1",
    "samples" : [
        {
            "key" : "test-key",
            "value" : "1"
        },
        {
            "key" : "test-key2",
            "value" : "2"
        }
    ]
}
/* 1 */
{
    "item" : "1",
    "samples" : [
        {
            "key" : "test-key",
            "value" : "3"
        },
        {
            "key" : "test-key2",
            "value" : "4"
        }
    ]
}
I want to get the average of the values where key = "test-key" for a given item id (in this case 1). So the average should be avg(1, 3) = 2.
Thanks
You'll need to use the aggregation framework. The aggregation will end up looking something like this:
db.stack.aggregate([
    { $match : { "samples.key" : "test-key" } },
    { $unwind : "$samples" },
    { $match : { "samples.key" : "test-key" } },
    { $project : { "new_key" : "$samples.key", "new_value" : "$samples.value" } },
    { $group : { "_id" : "$new_key", answer : { $avg : "$new_value" } } }
])
The best way to think of the aggregation framework is like an assembly line. The query itself is an array of JSON documents, where each sub-document represents a different step in the assembly.
Step 1: $match
The first step is a basic filter, like a WHERE clause in SQL. We place this step first to filter out all documents that do not contain an array element containing test-key. Placing this at the beginning of the pipeline allows the aggregation to use indexes.
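As a side note (the index itself is my assumption, not something given in the question), that only helps if a suitable index exists; a multikey index on the embedded key field is the usual choice:
// index covering the leading $match on the embedded field
db.stack.createIndex({ "samples.key": 1 })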
Step 2: $unwind
The second step, $unwind, is used for separating each of the elements in the "samples" array so we can perform operations across all of them. If you run the query with just that step, you'll see what I mean.
Long story short:
{ name : "bob",
  children : [ { "name" : "mary" }, { "name" : "sue" } ]
}
becomes two documents:
{ name : "bob", children : { "name" : "mary" } }
{ name : "bob", children : { "name" : "sue" } }
Step 3: $match
The third step, $match, is an exact duplicate of the first $match stage, but has a different purpose. Since it follows $unwind, this stage filters out the former array elements, now separate documents, that don't match the filter criteria. In this case, we keep only the documents where samples.key = "test-key".
Step 4: $project (Optional)
The fourth step, $project, restructures the document. In this case, I pulled the items out of the array so I could reference them directly. Using the example above:
{ name : "bob", children : [ { "name" : mary } ] }
becomes
{ new_name : "bob", new_child_name : mary }
Note that this step is entirely optional; later stages could be completed even without this $project after a few minor changes. In most cases $project is entirely cosmetic; aggregations have numerous optimizations under the hood such that manually including or excluding fields in a $project should not be necessary.
Step 5: $group
Finally, $group is where the magic happens. The _id value is what you would be "grouping by" in the SQL world. The second field says to average over the value that I defined in the $project step. You can easily substitute $sum to perform a sum, but a count operation is typically done the following way: my_count : { $sum : 1 }.
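For instance, a count-per-key variant of the same pipeline might look like this (just a sketch; my_count is the name from the snippet above, and the optional $project stage is dropped):
db.stack.aggregate([
    { $match : { "samples.key" : "test-key" } },
    { $unwind : "$samples" },
    { $match : { "samples.key" : "test-key" } },
    { $group : { "_id" : "$samples.key", my_count : { $sum : 1 } } }
])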
The most important thing to note here is that the majority of the work being done is to format the data to a point where performing the operation is simple.
Final Note
Lastly, I wanted to note that this would not work on the example data provided since samples.value is defined as text, which can't be used in arithmetic operations. If you're interested, changing the type of a field is described here: MongoDB How to change the type of a field
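A rough sketch of that conversion in the shell, in the spirit of the linked question (it rewrites every document in place, so treat it as an illustration rather than a drop-in fix):
db.stack.find().forEach(function (doc) {
    // convert each samples.value from a string to a number
    doc.samples.forEach(function (s) { s.value = parseFloat(s.value); });
    db.stack.save(doc);
});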

MongoDB micro-optimization of batch inserts? Or is this an important optimization?

Premise: update statements are harmless since the driver by default works in one-way messaging (as long as getLastError isn't used).
Question: Is the following fragment the best way to do this in MongoDB for high-volume inserts? Is it possible to fold steps 2 and 3?
Edit: old buggy form, see below
// step 1 : make sure the top-level document is present (an upsert in the real example)
db.test.insert({ x: 1 })
// step 2 : make sure the sub-document entry is present
db.test.update({ x: 1 }, { "$addToSet": { "u": { i: 1, p: 2 } } }, false)
// step 3 : increment an integer within the sub-document
db.test.update({ x: 1, "u.i": 1 }, { "$inc": { "u.$.c": 1 } }, false)
I have a feeling there is no way out of operation 3, since the $ operator requires priming in the query part of the update. Amirite?
If this is the best way to do things, can I get creative in my code and go nuts with update operations?
Edit: new form
There was a bug in my logic; thanks, Gates. I still want to fold the updates if possible :D
// make sure the top-level entry exists and increase the incidence counter
db.test.update({ x: 1 }, { $inc: { i: 1 } }, true) // 1
// implicitly creates the array
db.test.update({ x: 1, u: { $not: { $elemMatch: { i: 1 } } } },
    { $push: { u: { i: 1, p: 2, c: 0 } } }) // 2
db.test.update({ x: 1, "u.i": 1 }, { $inc: { "u.$.c": 1 } }, false) // 3
Notes: $addToSet is not useful in this case, since it does an element-wise match; there is no way to express which elements in an array may be mutable, as in C++ OO bitwise-comparison parlance.
Question is pointless: the data model is wrong. Please vote to close. (OP)
So, the first thing to note is that the $ positional operator is a little sketchy. It has a lot of "gotchas": it doesn't play well with upserts, it only affects the first true match, etc.
To understand "folding" of #2 and #3, you need to look at the output of your commands:
db.test.insert({ x: 1 })
{ x: 1 } // DB value
db.test.update({ x: 1 }, { "$addToSet": { "u": { i: 1, p: 2 } } }, false)
{ x: 1, u: [ { i: 1, p: 2 } ] } // DB value
db.test.update({ x: 1, "u.i": 1 }, { "$inc": { "u.$.c": 1 } }, false)
{ x: 1, u: [ { i: 1, p: 2, c: 1 } ] } // DB value
Based on the sequence you provided, the whole thing can be rolled into a single update.
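One way to read that, purely as a sketch (the answer does not spell the single update out): since the end state above is { x: 1, u: [ { i: 1, p: 2, c: 1 } ] }, a single upsert can produce it directly, although only on the first pass; re-running it would push a second element instead of incrementing.
db.test.update(
    { x: 1 },
    { $push: { u: { i: 1, p: 2, c: 1 } } },
    true // upsert: creates { x: 1 } if missing and pushes the element with its count
)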
If you're only looking to roll together #2 & #3, then you're worried about matching 'u.i':1 with u.$.c. But there are some edge cases here you have to clarify.
Let your starting document be the following:
{
    x: 1,
    u: [
        { i: 1, p: 2 },
        { i: 1, p: 3 }
    ]
}
What do you expect from running update #3?
As written you get:
{
    x: 1,
    u: [
        { i: 1, p: 2, c: 1 },
        { i: 1, p: 3 }
    ]
}
Is this correct? Is that first document legal (semantically correct)? Depending on the answers, this may actually be an issue of document structure.
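One possible restructuring, purely as a sketch of what "an issue of document structure" could mean here (the keyed-subdocument layout is my assumption, not the OP's final model): key the entries by their i value, so the whole sequence from the new form collapses into a single upsert.
db.test.update(
    { x: 1 },
    { $inc: { i: 1, "u.1.c": 1 }, $set: { "u.1.p": 2 } },
    true // upsert
)
// resulting shape: { x: 1, i: <count>, u: { "1": { p: 2, c: <count> } } }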