I am saving game results in MongoDB and would like to calculate the sum of the 3 best results for every player.
With the aggregation framework I am able to build the following intermediate pipeline result from my database of finished games (each player below has finished 5 games with the given scores):
{
    "_id" : "Player1",
    "points" : [ 324, 300, 287, 287, 227 ]
},
{
    "_id" : "Player2",
    "points" : [ 324, 324, 300, 287, 123 ]
}
Now I need to sum up the three best values for each player. I was already able to sort the array, so it would also be fine to take only the first 3 elements of each array and sum them in the next pipeline step.
$limit would work if I only needed the result for one player. I also tried $slice, but that doesn't seem to work in the aggregation framework.
So how do I get the sum of the three best results for each player?
You mentioned that it would also be OK to take only the first 3 elements of each array and sum them in the next pipeline step, so do that first, then use:
db.test.aggregate([
    { "$unwind": "$points" },
    { "$group": { "_id": "$_id", "result": { "$sum": "$points" } } }
])
to get the result.
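For a quick sanity check of the whole "sort, take the best 3, sum" approach outside the shell, here is a plain-JavaScript sketch (the documents are the intermediate pipeline result shown above; the function and variable names are my own):

```javascript
// Intermediate pipeline output from the question, as plain objects.
const players = [
  { _id: "Player1", points: [324, 300, 287, 287, 227] },
  { _id: "Player2", points: [324, 324, 300, 287, 123] }
];

// Sort descending, keep the top 3, and sum them.
function sumOfBestThree(doc) {
  const best = [...doc.points].sort((a, b) => b - a).slice(0, 3);
  return { _id: doc._id, result: best.reduce((acc, p) => acc + p, 0) };
}

console.log(players.map(sumOfBestThree));
// Player1: 324 + 300 + 287 = 911, Player2: 324 + 324 + 300 = 948
```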
The $slice operator for the aggregation framework was added in MongoDB 3.2. For a more detailed answer, take a look here.
And a couple of examples from the MongoDB documentation:
{ $slice: [ [ 1, 2, 3 ], 1, 1 ] } // [ 2 ]
{ $slice: [ [ 1, 2, 3 ], -2 ] } // [ 2, 3 ]
{ $slice: [ [ 1, 2, 3 ], 15, 2 ] } // [ ]
{ $slice: [ [ 1, 2, 3 ], -15, 2 ] } // [ 1, 2 ]
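Those four results follow from the documented position rules. As a sketch, a small JavaScript helper (my own re-implementation for illustration, not MongoDB's code) reproduces the same behavior:

```javascript
// Mimics the aggregation $slice operator:
//   aggSlice(array, n)            -- two-argument form
//   aggSlice(array, position, n)  -- three-argument form
function aggSlice(arr, position, n) {
  if (n === undefined) {
    // Two-argument form: positive n takes from the front, negative from the back.
    return position >= 0 ? arr.slice(0, position) : arr.slice(position);
  }
  // Negative position counts from the end, clamped at 0; positive position
  // past the end yields an empty result.
  const start = position < 0 ? Math.max(arr.length + position, 0)
                             : Math.min(position, arr.length);
  return arr.slice(start, start + n);
}

console.log(aggSlice([1, 2, 3], 1, 1));   // [ 2 ]
console.log(aggSlice([1, 2, 3], -2));     // [ 2, 3 ]
console.log(aggSlice([1, 2, 3], 15, 2));  // [ ]
console.log(aggSlice([1, 2, 3], -15, 2)); // [ 1, 2 ]
```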
My gut feeling is that the answer is no, but is it possible to perform a search in MongoDB that compares the similarity of arrays where order is important?
E.g.
I have three documents like so
{'_id':1, "my_list": ["A",2,6,8,34,90]},
{'_id':2, "my_list": ["A","F",2,6,19,8,90,55]},
{'_id':3, "my_list": [90,34,8,6,3,"A"]}
1 and 2 are the most similar; 3 is wildly different, irrespective of the fact that it contains all of the same values as 1.
Ideally I would do a search similar to {"my_list" : ["A",2,6,8,34,90] } and the results would be documents 1 and 2.
It's almost like a regex search with wildcards. I know I can do this in Python easily enough, but speed is important and I'm dealing with 1.3 million documents.
Any "comparison" or "selection" is actually more or less subjective to the actual logic applied. But as a general principle you could always consider the product of the matched indices from the array to test against and the array present in the document. For example:
var sample = ["A",2,6,8,34,90];
db.getCollection('source').aggregate([
    { "$match": { "my_list": { "$in": sample } } },
    { "$addFields": {
        "score": {
            "$add": [
                { "$cond": {
                    "if": {
                        "$eq": [
                            { "$size": { "$setIntersection": [ "$my_list", sample ] } },
                            { "$size": { "$literal": sample } }
                        ]
                    },
                    "then": 100,
                    "else": 0
                }},
                { "$sum": {
                    "$map": {
                        "input": "$my_list",
                        "as": "ml",
                        "in": {
                            "$multiply": [
                                { "$indexOfArray": [
                                    { "$reverseArray": "$my_list" },
                                    "$$ml"
                                ]},
                                { "$indexOfArray": [
                                    { "$reverseArray": { "$literal": sample } },
                                    "$$ml"
                                ]}
                            ]
                        }
                    }
                }}
            ]
        }
    }},
    { "$sort": { "score": -1 } }
])
Would return the documents in order like this:
/* 1 */
{
    "_id" : 1.0,
    "my_list" : [ "A", 2, 6, 8, 34, 90 ],
    "score" : 155.0
}

/* 2 */
{
    "_id" : 2.0,
    "my_list" : [ "A", "F", 2, 6, 19, 8, 90, 55 ],
    "score" : 62.0
}

/* 3 */
{
    "_id" : 3.0,
    "my_list" : [ 90, 34, 8, 6, 3, "A" ],
    "score" : 15.0
}
The key is that, with $reverseArray applied, the values from $indexOfArray are "larger" for elements matched near the start of the original array, which gives a larger "weight" to matches at the beginning of the array than to those towards the end.
Of course you should take into consideration that the second document does in fact contain "most" of the matches, and having more array entries places a "larger" weight on its initial matches than in the first document.
From the above, "A" scores more in the second document than in the first because that array is longer, even though both matched "A" in the first position. However, there is also some effect from "F" being a mismatch: it has a greater negative effect than it would if it appeared later in the array. The same applies to "A" in the last document, where a match at the end of the array has little bearing on the overall weight.
The counter to this in consideration is to add some logic to consider the "exact match" case, such as here the $size comparison from the $setIntersection of the sample and the current array. This would adjust the scores to ensure that something that matched all provided elements actually scored higher than a document with less positional matches, but more elements overall.
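If it helps to verify the arithmetic, the same scoring can be replayed in plain JavaScript. This is a sketch only; `includes` stands in for the $setIntersection size check, which is equivalent here because the sample contains no duplicates:

```javascript
const sample = ["A", 2, 6, 8, 34, 90];
const docs = [
  { _id: 1, my_list: ["A", 2, 6, 8, 34, 90] },
  { _id: 2, my_list: ["A", "F", 2, 6, 19, 8, 90, 55] },
  { _id: 3, my_list: [90, 34, 8, 6, 3, "A"] }
];

// score = (100 if every sample element is present in the document array)
//       + sum over my_list of indexInReversedList * indexInReversedSample,
// where indexOf returns -1 for a missing element, just like $indexOfArray.
function score(doc) {
  const revList = [...doc.my_list].reverse();
  const revSample = [...sample].reverse();
  const exact = sample.every(v => doc.my_list.includes(v)) ? 100 : 0;
  const positional = doc.my_list.reduce(
    (acc, v) => acc + revList.indexOf(v) * revSample.indexOf(v), 0);
  return exact + positional;
}

console.log(docs.map(score)); // [ 155, 62, 15 ]
```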
With a "score" in place you can then filter out results (i.e. $limit) or apply whatever other logic you need in order to only return the actual results wanted. But the first step is calculating a "score" to work from.
So it's all generally subjective to what logic actually constitutes a "nearest match", but the $reverseArray and $indexOfArray operations are generally the key to putting "more weight" on earlier index matches rather than later ones.
Overall you are looking for a "calculation" of logic. The aggregation framework has some of the available operators, but which ones actually apply is up to your end implementation. I'm just showing something that "logically works" to put more weight on "earlier matches" in an array comparison rather than "later matches", and of course the "most weight" where the arrays are actually the same.
NOTE: Similar logic could be achieved using the includeArrayIndex option of $unwind for earlier versions of MongoDB without the main operators used above. However the process does require usage of $unwind to deconstruct arrays in the first place, and the performance hit this would incur would probably negate the effectiveness of the operation.
I have a collection of items,
[ a, b, c, d ]
And I want to group them in pairs such as,
[ [ a, b ], [ b, c ], [ c, d ] ]
This will be used in calculating the differences between each item in the original collection, but that part is solved using several techniques such as the one in this question.
I know that this is possible with map reduce, but I want to know if it's possible with aggregation.
Edit: Here's an example,
The collection of items; each item is an actual document.
[
    { val: 1 },
    { val: 3 },
    { val: 6 },
    { val: 10 }
]
Grouped version:
[
[ { val: 1 }, { val: 3 } ],
[ { val: 3 }, { val: 6 } ],
[ { val: 6 }, { val: 10 } ]
]
The resulting collection (or aggregation result):
[
{ diff: 2 },
{ diff: 3 },
{ diff: 4 }
]
This is something that just cannot be done with the aggregation framework, and the only current MongoDB method available for this type of operation is mapReduce.
The reason is that the aggregation framework has no way of referring to any document in the pipeline other than the present one. This applies to "grouping" pipeline stages as well: even though things are grouped on a "key", you can't really deal with individual documents in the way you want to.
MapReduce on the other hand has one feature available that allows you to do what you want here, and it's not even "directly" related to aggregation. It is in fact the ability to have "globally scoped variables" across all stages. And having a "variable" to basically "store the last document" is all you need to achieve your result.
So it's quite simple code, and there is in fact no "reduction" required:
db.collection.mapReduce(
    function () {
        if ( lastVal != null )
            emit( this._id, this.val - lastVal );
        lastVal = this.val;
    },
    function() {}, // reducer is never called, as each key is emitted only once
    {
        "scope": { "lastVal": null },
        "out": { "inline": 1 }
    }
)
Which gives you a result much like this:
{
    "results" : [
        {
            "_id" : ObjectId("54a425a99b8bcd6f73e2d662"),
            "value" : 2
        },
        {
            "_id" : ObjectId("54a425a99b8bcd6f73e2d663"),
            "value" : 3
        },
        {
            "_id" : ObjectId("54a425a99b8bcd6f73e2d664"),
            "value" : 4
        }
    ],
    "timeMillis" : 3,
    "counts" : {
        "input" : 4,
        "emit" : 3,
        "reduce" : 0,
        "output" : 3
    },
    "ok" : 1
}
That's really just picking "something unique" as the emitted _id value rather than anything specific, because all this is really doing is computing the difference between values on different documents.
Global variables are usually the solution to these types of "pairing" aggregations or producing "running totals". Right now the aggregation framework has no access to global variables, even though it might well be a nice thing to have. The mapReduce framework has them, so it is probably fair to say that they should be available to the aggregation framework as well.
Right now they are not though, so stick with mapReduce.
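The "one variable carried across documents" idea is easiest to see outside MongoDB; here is a minimal plain-JavaScript sketch of the same single pass over the example collection:

```javascript
// The documents from the question's edit, in collection order.
const items = [{ val: 1 }, { val: 3 }, { val: 6 }, { val: 10 }];

// lastVal plays the role of the mapReduce "scope" variable: it stores
// the previous document's value so each step can emit a difference.
let lastVal = null;
const diffs = [];
for (const doc of items) {
  if (lastVal !== null) diffs.push({ diff: doc.val - lastVal });
  lastVal = doc.val;
}

console.log(diffs); // [ { diff: 2 }, { diff: 3 }, { diff: 4 } ]
```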
The following query in mongo behaves strangely:
db.items.findOne({},{ "List": { "$slice": [ skip, 3 ] }})
First:
Instead of returning an object with the ["_id","List"] keys only, it returns the full object.
Second:
If skip is negative and |skip| is greater than list.length, it returns the first three elements as though skip == 0.
I would expect for:
{
    "_id" : ObjectId("542babf265f5de9a0d5c2928"),
    "List" : [ 1, 2, 3, 4, 5 ],
    "other" : "not_important"
}
query:
db.items.findOne({},{ "List": { "$slice": [-10, 3 ] }})
to get:
{
    "_id" : ObjectId("542babf265f5de9a0d5c2928"),
    "List" : []
}
instead, I get:
{
    "_id" : ObjectId("542babf265f5de9a0d5c2928"),
    "List" : [ 1, 2, 3 ],
    "other" : "not_important"
}
Why?
I use MongoDB 2.4.10.
Second: if skip is negative and |skip| is higher than list.length then it returns the first three elements as though skip==0
Yes. That is how the JavaScript Array.prototype.slice() method works, which is used internally by MongoDB.
According to the ECMAScript® Language Specification,
If relativeStart is negative, let k be max((len + relativeStart),0);
else let k be min(relativeStart, len).
In your case relativeStart is -10, so:
k = max((-10 + 5), 0) = 0 (where 5 is the length of your array).
Hence k (the effective skip) will always be 0 in these cases.
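As a sketch, the clamping rule can be applied directly in JavaScript to the question's numbers (the helper name is my own):

```javascript
// The clamping rule from the ECMAScript specification:
// negative start values are offset from the end and clamped at 0.
function sliceStart(relativeStart, len) {
  return relativeStart < 0 ? Math.max(len + relativeStart, 0)
                           : Math.min(relativeStart, len);
}

const list = [1, 2, 3, 4, 5];
const k = sliceStart(-10, list.length); // max(5 - 10, 0) = 0
console.log(list.slice(k, k + 3));      // [ 1, 2, 3 ] -- as the server returns
```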
First: Instead of returning one object with ["_id","List"] keys only, it returns a full object.
Yes, that is how the projection operator works. Unless an inclusion or exclusion is explicitly specified in the projection parameter, the whole document is retrieved, with projection operators such as $slice and $elemMatch applied.
db.items.findOne({},{"_id":1,"List": { "$slice": [-10, 3 ] }})
would return:
{ "_id" : ObjectId("542babf265f5de9a0d5c2928"), "List" : [ 1, 2, 3 ] }
The second parameter to the findOne() method is not only for simple projection purposes: fields are left out only if at least one field name has a value of 0 or 1 against it. If not, the whole document is returned. Any field that has a projection operator to be applied gets that operator applied and is projected.
The projection mechanism seems to work in the following manner whenever the $slice operator is involved:
By default, all the fields are included in the projection.
By default, all the fields whose values are derived from a projection operator such as $slice are, if truthy, always displayed, irrespective of the rules below.
Steps taking place for exclusion or inclusion:
The fields specified in the projection parameter are accumulated in their specified order.
For only the first field encountered with value 0 or 1: if the field has value 0, it is excluded and all the remaining fields are marked to be included; if it has value 1, it is included and all the remaining fields are marked to be excluded.
All subsequent fields are excluded or included based on their own values.
Whilst this behavior is by design for the $slice projection operator, since MongoDB 3.2 it is possible to evaluate the condition yourself and alter the result with the aggregation version of $slice:
Given the example documents:
{ "_id" : ObjectId("5922846dbcf60428d0f69f6e"), "a" : [ 1, 2, 3, 4 ] }
{ "_id" : ObjectId("5922847cbcf60428d0f69f6f"), "a" : [ 5, 6 ] }
You can use a conditional expression that tests the length of the array with $size and only performs the $slice when the length is at least as large as the absolute value of the reverse index, otherwise returning an empty array:
db.collection.aggregate([
    { "$project": {
        "a": {
            "$cond": {
                "if": { "$gte": [ { "$size": "$a" }, 4 ] },
                "then": { "$slice": [ "$a", -4, 2 ] },
                "else": { "$literal": [] }
            }
        }
    }}
])
Then of course you get:
{ "_id" : ObjectId("5922846dbcf60428d0f69f6e"), "a" : [ 1, 2 ] }
{ "_id" : ObjectId("5922847cbcf60428d0f69f6f"), "a" : [ ] }
So that is how you could get MongoDB to return a "slice" that acts in this way.
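The guarded behavior is easy to model in plain JavaScript as a sanity check (a sketch; the function name is mine):

```javascript
// Only slice when the array has at least 4 elements, else return [],
// mirroring the $cond/$size/$slice pipeline above.
function guardedSlice(a) {
  if (a.length >= 4) {
    const start = Math.max(a.length - 4, 0); // $slice: [ "$a", -4, 2 ]
    return a.slice(start, start + 2);
  }
  return [];
}

console.log(guardedSlice([1, 2, 3, 4])); // [ 1, 2 ]
console.log(guardedSlice([5, 6]));       // [ ]
```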
My database looks like this:
{
    _id: 1,
    values: [ 1, 2, 3, 4, 5 ]
},
{
    _id: 2,
    values: [ 2, 4, 6, 8, 10 ]
}, ...
I'd like to update every value in every document's nested array ("values") that meets some criterion. For instance, I'd like to increment every value that's >= 4 by one, which ought to yield:
{
    _id: 1,
    values: [ 1, 2, 3, 5, 6 ]
},
{
    _id: 2,
    values: [ 2, 5, 7, 9, 11 ]
}, ...
I'm used to working with SQL, where the nested array would be a separate table connected by a unique ID. I'm a little lost in this new NoSQL world.
Thank you kindly,
This sort of update is not really possible using nested arrays. The reason is given in the positional $ operator documentation, which states that you can only match the first array element for a given condition in the query.
So a statement like this:
db.collection.update(
    { "values": { "$gte": 4 } },
    { "$inc": { "values.$": 1 } }
)
Will not work in the sense that only the "first" array element that was matched would be incremented. So on your first document you would get this:
{ "_id" : 1, "values" : [ 1, 2, 3, 6, 6 ] }
In order to update the values as you are suggesting you would need to iterate the documents and the array elements to produce the result:
db.collection.find({ "values": { "$gte": 4 } }).forEach(function(doc) {
    for ( var i = 0; i < doc.values.length; i++ ) {
        if ( doc.values[i] >= 4 ) {
            doc.values[i]++;
        }
    }
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "values": doc.values } }
    );
})
Or whatever code equivalent of that basic concept.
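The transformation that each iteration performs on a single document's array can be factored out as a pure function; a JavaScript sketch (the names are my own):

```javascript
// Increment every element at or above the threshold, leaving the rest alone --
// the same per-document work the forEach loop above performs.
function bumpValues(values, threshold) {
  return values.map(v => (v >= threshold ? v + 1 : v));
}

console.log(bumpValues([1, 2, 3, 4, 5], 4));  // [ 1, 2, 3, 5, 6 ]
console.log(bumpValues([2, 4, 6, 8, 10], 4)); // [ 2, 5, 7, 9, 11 ]
```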
Generally speaking, this sort of update does not lend itself well to a structure that contains elements in an array. If that is really your need, then the elements are better off listed within a separate collection.
Then again, the presentation of this question is more of a "hypothetical" situation without an understanding of your actual use case for performing this sort of update. So if you described what you actually need to do and how your data really looks in another question, you might get a more meaningful response in terms of the best approach for you to use.
Hi,
I'm trying to update the version field in this object, but I'm not able to make a query with two nested $elemMatch clauses. What I would like to do is get the record with file_id 12 and version 1.
I would also like to ask whether it is good practice to have more than one nested array in MongoDB (like in this object).
Query:
db.collection.find(
    { "my_uuid": "434343" },
    { "item": { "$elemMatch": { "file_id": 12, "changes": { "$elemMatch": { "version": 1 } } } } }
).pretty()
Object:
{
    "my_uuid": "434343",
    "item": [
        {
            "file_id": 12,
            "no_of_versions" : 1,
            "changes": [
                {
                    "version": 1,
                    "commentIds": [ 4, 5, 7 ]
                },
                {
                    "version": 2,
                    "commentIds": [ 10, 11, 15 ]
                }
            ]
        },
        {
            "file_id": 234,
            "unseen_comments": 3,
            "no_of_versions" : 2,
            "changes": [
                {
                    "version": 1,
                    "commentIds": [ 100, 110, 150 ]
                }
            ]
        }
    ]
}
Thank you
If you want the entire documents that satisfy the criteria returned in the result, then what you have is fine. But if you want to limit the array contents of item and changes to just the matching elements, it becomes a problem. That's because you would have to use the positional $ operator in the projection to limit the contents of an array, and only one such operator can appear in a projection. So you will not be able to limit the contents of multiple arrays within the document.
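One workaround, if you do need both arrays narrowed, is to fetch the matching document and do the narrowing in application code. Here is a plain-JavaScript sketch against the example object (the filter logic is my own, not a MongoDB feature):

```javascript
// The example document from the question.
const doc = {
  my_uuid: "434343",
  item: [
    { file_id: 12, no_of_versions: 1, changes: [
      { version: 1, commentIds: [4, 5, 7] },
      { version: 2, commentIds: [10, 11, 15] }
    ]},
    { file_id: 234, unseen_comments: 3, no_of_versions: 2, changes: [
      { version: 1, commentIds: [100, 110, 150] }
    ]}
  ]
};

// Keep only the item with file_id 12, and within it only version 1.
const matches = doc.item
  .filter(i => i.file_id === 12)
  .map(i => ({ ...i, changes: i.changes.filter(c => c.version === 1) }));

console.log(matches[0].changes[0].commentIds); // [ 4, 5, 7 ]
```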