Can aggregate functions be used with a variable in the field name? - mongodb

I have some documents that look like this:
{
(...stuff...)
something: {
first: {
0: 3,
1: 5,
2: 2
},
second: {
0: 1,
1: 9,
2: 7
}
}
}
For the sake of simplicity in this question, I'll assume that my $match only hits this one document. What I'd like to do is, in an aggregate command, add up the 0s, and add up the 1s, and add up the 2s, so that I can produce something like this:
something.0: 4 (something.first.0 + something.second.0)
something.1: 14 (something.first.1 + something.second.1)
something.2: 9 (something.first.2 + something.second.2)
Is this something that can be done, or do I need to change my document schema to rearrange the nested documents such that all the 0s are together, etc?

If you know the names of all the subdocuments you want to sum, you can use the Aggregation Framework in MongoDB 2.2+ and a projection with the $add operator:
db.something.aggregate(
{ $project: {
something: {
0: { $add: [ "$something.first.0", "$something.second.0" ] },
1: { $add: [ "$something.first.1", "$something.second.1" ] },
2: { $add: [ "$something.first.2", "$something.second.2" ] }
}
}}
)
Sample output:
{
"result" : [
{
"_id" : ObjectId("517ae4914bc9ade96ca6402d"),
"something" : {
"0" : 4,
"1" : 14,
"2" : 9
}
}
],
"ok" : 1
}
If you don't know the names (i.e. you are trying to add the 0th element of every subdocument in a document), you will need to use MapReduce or implement the logic in your application code instead.
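For the application-code route, a minimal sketch in plain JavaScript might look like the following. The helper name sumByKey is made up for illustration, and the document is assumed to have already been fetched from the collection:

```javascript
// Hypothetical helper: sums same-named keys across every subdocument of
// `something`, without knowing the subdocument names ("first", "second", ...).
function sumByKey(something) {
  const totals = {};
  for (const sub of Object.values(something)) {
    for (const [key, value] of Object.entries(sub)) {
      totals[key] = (totals[key] || 0) + value;
    }
  }
  return totals;
}

// Sample document shape taken from the question.
const sampleDoc = {
  something: {
    first: { 0: 3, 1: 5, 2: 2 },
    second: { 0: 1, 1: 9, 2: 7 }
  }
};

// Totals per key: 0 -> 4, 1 -> 14, 2 -> 9
console.log(sumByKey(sampleDoc.something));
```

This only sums within one document; summing across several matched documents would need an extra loop over the cursor.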

Related

Query and Update Child Documents without knowing keys

I have a collection with documents having the following format
{
name: "A",
details : {
matchA: {
comment: "Hello",
score: 5
},
matchI: {
score: 10
},
lastMatch:{
score: 5
}
}
},
{
name: "B",
details : {
match2: {
score: 5
},
match7: {
score: 10
},
firstMatch:{
score: 5
}
}
}
I don't immediately know the names of the keys that are children of details; they don't follow a known format, there can be different numbers of them, etc.
I would like to write a query which will update the children in such a manner that any subdocument with a score less than 5, gets a new field added (say lowScore: true).
I've looked around a bit and I found $ and $elemMatch, but those only work on arrays. Is there an equivalent for subdocuments? Is there some way of doing it using the aggregation pipeline?
I don't think you can do that using a normal update(). There is a way through the aggregation framework, but the aggregation framework itself cannot alter any persisted data. So you will need to loop through the results and update your documents individually, e.g. as described here: Aggregation with update in mongoDB
This is the required query to transform your data into what you need for the subsequent update:
collection.aggregate([{
  $addFields: {
    "details": {
      $objectToArray: "$details" // transform "details" into a uniform array of key-value pairs
    }
  }
}, {
  $unwind: "$details" // flatten the array created above
}, {
  $match: {
    "details.v.score": {
      $lt: 10 // filter out anything that's not relevant to us
      // (note: this uses a different threshold than the requested "score less than 5" so that the sample data returns some results)
    },
    "details.v.lowScore": { // not strictly required, but checking for the field you want to create makes sense in case you run the query repeatedly
      $exists: false
    }
  }
}, {
  $project: {
    "fieldsToUpdate": "$details.k" // keep only the key name of each matching subdocument
  }
}])
Running this query returns:
/* 1 */
{
"_id" : ObjectId("59cc0b6afab2f8c9e1404641"),
"fieldsToUpdate" : "matchA"
}
/* 2 */
{
"_id" : ObjectId("59cc0b6afab2f8c9e1404641"),
"fieldsToUpdate" : "lastMatch"
}
/* 3 */
{
"_id" : ObjectId("59cc0b6afab2f8c9e1404643"),
"fieldsToUpdate" : "match2"
}
/* 4 */
{
"_id" : ObjectId("59cc0b6afab2f8c9e1404643"),
"fieldsToUpdate" : "firstMatch"
}
You could then $set your new field "lowScore" using a cursor as described in the linked answer above.
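As a rough sketch of that last step, the result rows can be turned into one filter/update pair per subdocument. The helper name buildLowScoreUpdates is hypothetical, and plain strings stand in for the ObjectId values so the snippet runs on its own:

```javascript
// Hypothetical helper: turns each aggregation result row into an update
// specification that sets lowScore on the matching subdocument.
function buildLowScoreUpdates(rows) {
  return rows.map((row) => ({
    filter: { _id: row._id },
    // The computed property name builds the dotted path,
    // e.g. "details.matchA.lowScore".
    update: { $set: { [`details.${row.fieldsToUpdate}.lowScore`]: true } }
  }));
}

// Rows shaped like the aggregation output above (strings stand in for ObjectIds).
const resultRows = [
  { _id: "59cc0b6afab2f8c9e1404641", fieldsToUpdate: "matchA" },
  { _id: "59cc0b6afab2f8c9e1404641", fieldsToUpdate: "lastMatch" },
  { _id: "59cc0b6afab2f8c9e1404643", fieldsToUpdate: "match2" }
];

for (const spec of buildLowScoreUpdates(resultRows)) {
  console.log(JSON.stringify(spec));
}
```

Each pair could then be passed to an update call on the collection (for example updateOne(spec.filter, spec.update), assuming your shell or driver version provides it).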

How do I use spring mongo data to query based on the position of an element in an array

I have a mongo collection "test" which contains elements like so (the nodes array is a set whose order is meaningful):
"test" : {
"superiorID" : 1,
"nodes" : [
{
"subID" : 2
},
{
"subID" : 1
},
{
"subID" : 3
}
]
}
or
"test" : {
"superiorID" : 4,
"nodes" : [
{
"subID" : 2
},
{
"subID" : 1
},
{
"subID" : 3
}
]
}
I am using spring Criteria to try and build a mongo query which will return to me all elements where the 'subID' equals a user input id 'inputID' AND the 'superiorID' position is NOT before the 'inputID' (if the superior id is even in the sub ids which is not required).
So for example, if my user input was 3, I would NOT want to pull the first document but I WOULD want to pull the second document (the first has a superior that exists in the nodes BEFORE the user input node; the second's superior id is not equal to the user input).
I know that the $indexOfArray function exists but I don't know how to translate this to Criteria.
You can get the result you are looking for through the aggregation framework. I've made a pseudo query to show what you should be looking for. It returns showMe = false for doc1 and showMe = true for doc2, which you could obviously match on. You do not need two $project stages for this query; I only did that to make a working query which is also easy-ish to read. This will not be a very fast query. If you want fast queries you might want to rethink your data structure.
db.getCollection('test').aggregate([
{ "$project":
{
"superiorIndex": {"$indexOfArray" : [ "$nodes.subID","$superiorID" ]},
"inputIndex": {"$indexOfArray" : [ "$nodes.subID",3 ]},
}
},
{ "$project":
{
"showMe" :
{
$cond:
{
if: { $eq: [ "$superiorIndex", -1 ] },
then: true,
else: {$gt:[ "$superiorIndex","$inputIndex"]}
}
}
}
}
])
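To make the logic of the two projections easier to check, here is a plain-JavaScript mirror of it. This is only a sketch of the same comparison, not a MongoDB or Spring call, and the function name showMe is borrowed from the projected field for illustration:

```javascript
// Plain-JavaScript mirror of the pipeline logic above.
function showMe(doc, inputID) {
  const subIDs = doc.nodes.map((node) => node.subID);
  // indexOf returns -1 when the value is absent, like $indexOfArray.
  const superiorIndex = subIDs.indexOf(doc.superiorID);
  const inputIndex = subIDs.indexOf(inputID);
  return superiorIndex === -1 ? true : superiorIndex > inputIndex;
}

// The two sample documents from the question.
const doc1 = { superiorID: 1, nodes: [{ subID: 2 }, { subID: 1 }, { subID: 3 }] };
const doc2 = { superiorID: 4, nodes: [{ subID: 2 }, { subID: 1 }, { subID: 3 }] };

console.log(showMe(doc1, 3)); // false: superior 1 sits before input 3
console.log(showMe(doc2, 3)); // true: superior 4 is not in nodes at all
```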
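db.collection.find({"nodes.2.subID": 2}) will match documents where the element at index 2 of the nodes field (the third element) has a subID of 2. Note that the dotted field name has to be quoted.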

Grouping documents in pairs using mongo aggregation

I have a collection of items,
[ a, b, c, d ]
And I want to group them in pairs such as,
[ [ a, b ], [ b, c ], [ c, d ] ]
This will be used in calculating the differences between each item in the original collection, but that part is solved using several techniques such as the one in this question.
I know that this is possible with map reduce, but I want to know if it's possible with aggregation.
Edit: Here's an example,
The collection of items; each item is an actual document.
[
{ val: 1 },
{ val: 3 },
{ val: 6 },
{ val: 10 },
]
Grouped version:
[
[ { val: 1 }, { val: 3 } ],
[ { val: 3 }, { val: 6 } ],
[ { val: 6 }, { val: 10 } ]
]
The resulting collection (or aggregation result):
[
{ diff: 2 },
{ diff: 3 },
{ diff: 4 }
]
This is something that just cannot be done with the aggregation framework, and the only current MongoDB method available for this type of operation is mapReduce.
The reason is that the aggregation framework has no way of referring to any document in the pipeline other than the present one. This actually applies to "grouping" pipeline stages as well, since even though things are grouped on a "key" you can't really deal with individual documents in the way you want to.
MapReduce on the other hand has one feature available that allows you to do what you want here, and it's not even "directly" related to aggregation. It is in fact the ability to have "globally scoped variables" across all stages. And having a "variable" to basically "store the last document" is all you need to achieve your result.
So it's quite simple code, and there is in fact no "reduction" required:
db.collection.mapReduce(
function () {
if (lastVal != null)
emit( this._id, this.val - lastVal );
lastVal = this.val;
},
function() {}, // reducer is not called
{
"scope": { "lastVal": null },
"out": { "inline": 1 }
}
)
Which gives you a result much like this:
{
"results" : [
{
"_id" : ObjectId("54a425a99b8bcd6f73e2d662"),
"value" : 2
},
{
"_id" : ObjectId("54a425a99b8bcd6f73e2d663"),
"value" : 3
},
{
"_id" : ObjectId("54a425a99b8bcd6f73e2d664"),
"value" : 4
}
],
"timeMillis" : 3,
"counts" : {
"input" : 4,
"emit" : 3,
"reduce" : 0,
"output" : 3
},
"ok" : 1
}
That's really just picking "something unique" as the emitted _id value rather than anything specific, because all this is really doing is the difference between values on differing documents.
Global variables are usually the solution to these types of "pairing" aggregations or producing "running totals". Right now the aggregation framework has no access to global variables, even though it might well be a nice thing to have. The mapReduce framework has them, so it is probably fair to say that they should be available to the aggregation framework as well.
Right now they are not though, so stick with mapReduce.
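The "remember the last document" idea is easy to see outside of MongoDB. Here is a small sketch in plain JavaScript, assuming the documents arrive in the order you want to pair them; the function name pairwiseDiffs is made up for illustration:

```javascript
// Walks the documents once, keeping the previous value in a variable,
// which plays the role of the scoped lastVal in the mapReduce call.
function pairwiseDiffs(docs) {
  const diffs = [];
  let lastVal = null;
  for (const doc of docs) {
    if (lastVal !== null) {
      diffs.push({ diff: doc.val - lastVal });
    }
    lastVal = doc.val;
  }
  return diffs;
}

// Sample collection from the question.
const items = [{ val: 1 }, { val: 3 }, { val: 6 }, { val: 10 }];
console.log(JSON.stringify(pairwiseDiffs(items)));
// [{"diff":2},{"diff":3},{"diff":4}]
```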

mongo $slice query reverse index out of range

The following query in mongo behaves strangely:
db.items.findOne({},{ "List": { "$slice": [ skip, 3 ] }})
First:
Instead of returning one object with ["_id","List"] keys only, it returns a full object.
Second:
if skip is negative and |skip| is higher than list.length then it returns the first three elements as though skip==0
I would expect for:
{
"_id" : ObjectId("542babf265f5de9a0d5c2928"),
"List" : [
1,
2,
3,
4,
5
],
"other" : "not_important"
}
query:
db.items.findOne({},{ "List": { "$slice": [-10, 3 ] }})
to get:
{
"_id" : ObjectId("542babf265f5de9a0d5c2928"),
"List" : []
}
instead, I get:
{
"_id" : ObjectId("542babf265f5de9a0d5c2928"),
"List" : [
1,
2,
3
],
"other" : "not_important"
}
Why?
I use mongoDB 2.4.10
Second: if skip is negative and |skip| is higher than list.length then it returns the first three elements as though skip==0
Yes. That is how the JavaScript Array.prototype.slice() method works, which is used internally by MongoDB.
According to the ECMAScript® Language Specification,
If relativeStart is negative, let k be max((len + relativeStart),0);
else let k be min(relativeStart, len).
In your case relativeStart is -10,
k = max((-10+5),0), k = 0; (where, 5 is the length of your array).
Hence k or skip will always be 0, in these cases.
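The clamping rule quoted above can be tried directly in JavaScript. This sketch only models the behaviour described here, it is not MongoDB's implementation, and the name sliceSkipLimit is invented for the example:

```javascript
// Models { "$slice": [ skip, limit ] } using the start-index rule quoted
// from the ECMAScript specification.
function sliceSkipLimit(list, skip, limit) {
  const len = list.length;
  // Negative start: count from the end, but never go below index 0.
  const k = skip < 0 ? Math.max(len + skip, 0) : Math.min(skip, len);
  return list.slice(k, k + limit);
}

const list = [1, 2, 3, 4, 5];
console.log(sliceSkipLimit(list, -10, 3)); // [ 1, 2, 3 ] because k = max(-10 + 5, 0) = 0
console.log(sliceSkipLimit(list, -2, 3));  // [ 4, 5 ] because k = max(-2 + 5, 0) = 3
```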
First: Instead of returning one object with ["_id","List"] keys only, it returns a full object.
Yes, the projection operator works that way. Unless an inclusion or exclusion is explicitly specified in the projection parameter, the whole document is retrieved, with projection operators such as $slice and $elemMatch being applied.
db.items.findOne({},{"_id":1,"List": { "$slice": [-10, 3 ] }})
would return:
{ "_id" : ObjectId("542babf265f5de9a0d5c2928"), "List" : [ 1, 2, 3 ] }
The second parameter to the findOne() method is not only for simple projection. Fields are restricted only if at least one of the field names has a value of 0 or 1 against it; if not, the whole document is returned. If any field has a projection operator, that operator is applied and the field is projected.
The projection mechanism seems to work in the following manner whenever the $slice operator is involved.
By default all the fields would be included for projection.
By default all the fields whose values are derived based on the projection operator, $slice, if truthy, are always displayed, irrespective of the below.
Steps taking place for exclusion or inclusion:
The list of fields specified in the projection parameter are accumulated in their specified order.
For only the first field encountered with value '0' or '1':
If the field has a value '0', then it is excluded, and all the remaining fields are marked to be included.
If the field has a value '1', then it is included, and all the remaining fields are marked to be excluded.
For all the subsequent fields, they are excluded or included based on their values.
Whilst this behavior is by design for the $slice operator, it is possible since MongoDB 3.2 to evaluate this and alter the result with the aggregation operator for $slice:
Given the example documents:
{ "_id" : ObjectId("5922846dbcf60428d0f69f6e"), "a" : [ 1, 2, 3, 4 ] }
{ "_id" : ObjectId("5922847cbcf60428d0f69f6f"), "a" : [ 5, 6 ] }
You can use a conditional expression that tests the length of the array with $size and only performs the $slice when that length is greater than or equal to the absolute value of the reverse index, otherwise returning an empty array:
db.collection.aggregate([
{ "$project": {
"a": {
"$cond": {
"if": { "$gte": [ { "$size": "$a" }, 4 ] },
"then": { "$slice": [ "$a", -4, 2 ] },
"else": { "$literal": [] },
}
}
}}
])
Then of course you get:
{ "_id" : ObjectId("5922846dbcf60428d0f69f6e"), "a" : [ 1, 2 ] }
{ "_id" : ObjectId("5922847cbcf60428d0f69f6f"), "a" : [ ] }
So that is how you could get MongoDB to return a "slice" that acts in this way.
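For comparison, the same "empty array when the reverse index is out of range" rule can be sketched in plain JavaScript. The function name strictNegativeSlice is hypothetical, and it assumes skip is negative:

```javascript
// Mirrors the $cond / $size / $slice combination above for a negative skip.
function strictNegativeSlice(list, skip, limit) {
  // Array shorter than the reverse index: return nothing instead of clamping.
  if (list.length < Math.abs(skip)) {
    return [];
  }
  const start = list.length + skip;
  return list.slice(start, start + limit);
}

console.log(strictNegativeSlice([1, 2, 3, 4], -4, 2)); // [ 1, 2 ]
console.log(strictNegativeSlice([5, 6], -4, 2));       // []
```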

Conditional $inc in a nested MongoDB array

My database looks like this:
{
_id: 1,
values: [ 1, 2, 3, 4, 5 ]
},
{
_id: 2,
values: [ 2, 4, 6, 8, 10 ]
}, ...
I'd like to update every value in every document's nested array ("values") that meets some criterion. For instance, I'd like to increment every value that's >= 4 by one, which ought to yield:
{
_id: 1,
values: [ 1, 2, 3, 5, 6 ]
},
{
_id: 2,
values: [ 2, 5, 7, 9, 11 ]
}, ...
I'm used to working with SQL, where the nested array would be a separate table connected with a unique ID. I'm a little lost in this new NoSQL world.
Thank you kindly,
This sort of update is not really possible using nested arrays. The reason is given in the positional $ operator documentation, which states that you can only match the first array element for a given condition in the query.
So a statement like this:
db.collection.update(
{ "values": { "$gte": 4 } },
{ "$inc": { "values.$": 1 } }
)
Will not work in the sense that only the "first" array element that was matched would be incremented. So on your first document you would get this:
{ "_id" : 1, "values" : [ 1, 2, 3, 5, 5 ] }
In order to update the values as you are suggesting you would need to iterate the documents and the array elements to produce the result:
db.collection.find({ "values": { "$gte": 4 } }).forEach(function(doc) {
for ( var i=0; i < doc.values.length; i++ ) {
if ( doc.values[i] >= 4 ) {
doc.values[i]++;
}
}
db.collection.update(
{ "_id": doc._id },
{ "$set": { "values": doc.values } }
);
})
Or whatever code equivalent of that basic concept.
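The per-document part of that loop can be written as a small pure function, which makes it easy to test before pointing it at real data. This is just a sketch; incrementMatching is a made-up name and the threshold is passed in rather than hard-coded:

```javascript
// Returns a new array in which every value >= threshold is incremented by one.
function incrementMatching(values, threshold) {
  return values.map((value) => (value >= threshold ? value + 1 : value));
}

// First sample document from the question.
console.log(incrementMatching([1, 2, 3, 4, 5], 4)); // [ 1, 2, 3, 5, 6 ]
```

The returned array is what you would pass to $set for "values" in the update shown above.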
Generally speaking, this sort of update does not lend itself well to a structure that contains elements in an array. If that is really your need, then the elements are better off listed within a separate collection.
Then again, the presentation of this question is more of a "hypothetical" situation without understanding your actual use case for performing this sort of update. So if you describe what you actually need to do and how your data really looks in another question, that might get a more meaningful response in terms of the best approach for you to use.