MongoDB min/max aggregation - mongodb

I've got documents with this simplified schema :
{
position: 10,
value: 5,
count: 3
}
What I'd like to do is group those documents by position and find the maximum value where the count is greater than 4, but with a value less than the minimum value where the count is less than 4.
Here is what I've done, but it does not work:
{ $group: {
_id: {
position: "$position",
},
result: {$max: { $cond: [ {$and: [ {$gte: ["$count", 4]},
{$lt: ["$value", {$min: { $cond: [ {$lt: ["$count", 4]},
{ value: "$value" },
10]
}
}]
}]},
{ value: "$value", nb: "$count"},
0]
}
}
}
}
I am told that $min is an invalid operator and I can't figure out how to write the right aggregation function. Would it be better to run a mapReduce?
If, for example, I have these documents:
{Position: 10, value: 1, count: 5}
{Position: 10, value: 3, count: 3}
{Position: 10, value: 4, count: 5}
{Position: 10, value: 7, count: 4}
I'd like the result to be:
{Position: 10, value: 1, count: 5}
This is the maximum 'value' where the count is greater than 4 that is still below 3, the value of the document that has a count of only 3. So the value 4 is not what I'm looking for.

That is a bit of a mouthful to say the least but I'll have another crack at explaining it:
You want:
For each "Position" value, find the document whose own "count" is greater than 4 and whose "value" is the largest one that is still less than the smallest "value" among the documents with a "count" of less than 4.
Which reads like a math exam problem designed to confuse you with the logic. But once you catch that meaning, you perform the aggregation with the following steps:
db.positions.aggregate([
// Separate the values greater than and less than 4 by "Position"
{ "$group": {
"_id": "$Position",
"high": { "$push": {
"$cond": [
{ "$gt": ["$count", 4] },
{ "value": "$value", "count": "$count" },
null
]
}},
"low": { "$push": {
"$cond": [
{ "$lt": ["$count", 4] },
{ "value": "$value", "count": "$count" },
null
]
}}
}},
// Unwind the "low" counts array
{ "$unwind": "$low" },
// Find the "$min" value from the low counts
{ "$group": {
"_id": "$_id",
"high": { "$first": "$high" },
"low": { "$min": "$low.value" }
}},
// Unwind the "high" counts array
{ "$unwind": "$high" },
// Compare the value to the "low" value to see if it is less than
{ "$project": {
"high": 1,
"lower": { "$lt": [ "$high.value", "$low" ] }
}},
// Sorting, $max won't work over multiple values. Want the document.
{ "$sort": { "lower": -1, "high.value": -1 } },
// Group, get the highest order document which was on top
{ "$group": {
"_id": "$_id",
"value": { "$first": "$high.value" },
"count": { "$first": "$high.count" }
}}
])
So from the set of documents:
{ "Position" : 10, "value" : 1, "count" : 5 }
{ "Position" : 10, "value" : 3, "count" : 3 }
{ "Position" : 10, "value" : 4, "count" : 5 }
{ "Position" : 10, "value" : 7, "count" : 4 }
Only the first is returned in this case, as its value is less than that of the "count of three" document while its own count is greater than 4.
{ "_id" : 10, "value" : 1, "count" : 5 }
Which I am sure is what you actually meant.
So the application of $min and $max really only applies when getting discrete values from documents out of a grouping range. If you are interested in more than one value from the document or indeed the whole document, then you are sorting and getting the $first or $last entries on the grouping boundary.
And aggregate is generally much faster than mapReduce, as it runs in native code without invoking a JavaScript interpreter.
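As a cross-check on the rule itself, here is a minimal sketch in plain JavaScript (no MongoDB involved) that applies the same selection to the sample documents. The function name pickPerPosition is hypothetical, and skipping groups that have no qualifying document is an assumption of this sketch rather than something the pipeline above does.

```javascript
// Plain-JavaScript emulation of the selection rule; not a MongoDB API.
function pickPerPosition(docs) {
  // Group documents by "Position".
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.Position)) groups.set(doc.Position, []);
    groups.get(doc.Position).push(doc);
  }

  const results = [];
  for (const [position, group] of groups) {
    // Threshold: smallest "value" among documents with count < 4.
    const lowValues = group.filter(d => d.count < 4).map(d => d.value);
    if (lowValues.length === 0) continue; // assumption: no threshold, no result
    const threshold = Math.min(...lowValues);

    // Candidates: count > 4 and value below the threshold.
    const candidates = group.filter(d => d.count > 4 && d.value < threshold);
    if (candidates.length === 0) continue; // assumption: skip empty groups

    // Keep the whole document with the largest value,
    // which is what $sort followed by $first achieves.
    const best = candidates.reduce((a, b) => (b.value > a.value ? b : a));
    results.push({ _id: position, value: best.value, count: best.count });
  }
  return results;
}

const sample = [
  { Position: 10, value: 1, count: 5 },
  { Position: 10, value: 3, count: 3 },
  { Position: 10, value: 4, count: 5 },
  { Position: 10, value: 7, count: 4 }
];

console.log(pickPerPosition(sample)); // [ { _id: 10, value: 1, count: 5 } ]
```

The point of the reduce step is the same as in the answer: a plain maximum would only give back the number, while keeping the winning document gives back its "count" as well.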


How to sort by a foreign field, the foreign field not using alphabetical/numerical order? [duplicate]

Following this question which @NeilLunn has graciously answered, here is my problem in more detail.
This is the set of documents, some have user_id some don't. The user_id represents the user who created the document:
{ "user_id" : 11, "content" : "black", "date": somedate }
{ "user_id" : 6, "content" : "blue", "date": somedate }
{ "user_id" : 3, "content" : "red", "date": somedate }
{ "user_id" : 4, "content" : "black", "date": somedate }
{ "user_id" : 4, "content" : "blue", "date": somedate }
{ "user_id" : 90, "content" : "red", "date": somedate }
{ "user_id" : 7, "content" : "orange", "date": somedate }
{ "content" : "orange", "date": somedate }
{ "content" : "red", "date": somedate }
...
{ "user_id" : 4, "content" : "orange", "date": somedate }
{ "user_id" : 1, "content" : "orange", "date": somedate }
{ "content" : "red", "date": somedate }
{ "user_id" : 90, "content" : "purple", "date": somedate }
The front end is pulling pages, so each page will have 10 items and I do that with limit and skip and it is working very well.
In case we have a logged in user, I would like to display to that current logged in user documents which he may find more interesting first, based on the users he interacted with.
The list of users which the current user may find interesting is sorted by score and is located outside of mongo. So the first element is the most important user which I would like to show his documents first, and the last user on the list is the least important.
The list is a simple array which looks like this:
[4,7,90,1].
The system which created this user score is not located within mongo, but I can copy the data if that will help. I can also change the array to include a score number.
What I would like to accomplish is the following:
Get the documents sorted by importance of the user_id from the list, so that documents from user_id 4 will be the first to show up, documents from user_id 7 second and so on. When there are no users left on the list I would like to show the rest of the documents. Like this:
all documents with user_id:4
all documents with user_id:7
all documents with user_id:90
all documents with user_id:1
all the rest of the documents
How should I accomplish this? Am I asking too much from mongo?
Given the array [4,7,90,1] what you want in your query is this:
db.collection.aggregate([
{ "$project": {
"user_id": 1,
"content": 1,
"date": 1,
"weight": { "$or": [
{ "$eq": ["$user_id", 4] },
{ "$eq": ["$user_id", 7] },
{ "$eq": ["$user_id", 90] },
{ "$eq": ["$user_id", 1] },
]}
}},
{ "$sort": { "weight": -1, "date": -1 } }
])
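So what that does is, for every item contained in that $or condition, the user_id field is tested against the supplied value with $eq, which returns true or false. $or is then true when any of them match, and sorting on that boolean in descending order puts the matching documents first.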
What you do in your code is for each item you have in the array you build the array condition of $or. So it's just creating a hash structure for each equals condition, passing it to an array and plugging that in as the array value for the $or condition.
I probably should have left the $cond operator out of the previous code so this part would have been clearer.
Here's some code for the Ruby Brain:
userList = [4, 7, 90, 1];
orCond = [];
userList.each do |userId|
orCond.push({ '$eq' => [ '$user_id', userId ] })
end
pipeline = [
{ '$project' => {
'user_id' => 1,
'content' => 1,
'date' => 1,
'weight' => { '$or' => orCond }
}},
{ '$sort' => { 'weight' => -1, 'date' => -1 } }
]
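The same construction can be sketched in JavaScript for shell or Node.js users. This is only an illustration of building the $or array from the external list; the helper name buildWeightPipeline is hypothetical, and the resulting array would be passed to db.collection.aggregate() as in the query above.

```javascript
// Build the $project/$sort pipeline from an external list of user ids.
// buildWeightPipeline is a hypothetical helper name, not a driver API.
function buildWeightPipeline(userIds) {
  // One { $eq: ["$user_id", id] } test per listed user.
  const orCond = userIds.map(id => ({ $eq: ["$user_id", id] }));

  return [
    { $project: {
        user_id: 1,
        content: 1,
        date: 1,
        weight: { $or: orCond } // true when user_id is in the list
    }},
    // true sorts after false, so -1 puts listed users first.
    { $sort: { weight: -1, date: -1 } }
  ];
}

const pipeline = buildWeightPipeline([4, 7, 90, 1]);
console.log(JSON.stringify(pipeline, null, 2));
```

Note that the field reference needs its "$" prefix ("$user_id"); without it the comparison is made against the literal string.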
If you want to have individual weights and we'll assume key value pairs, then you need to nest with $cond :
db.collection.aggregate([
{ "$project": {
"user_id": 1,
"content": 1,
"date": 1,
"weight": { "$cond": [
{ "$eq": ["$user_id", 4] },
10,
{ "$cond": [
{ "$eq": ["$user_id", 7] },
9,
{ "$cond": [
{ "$eq": ["$user_id", 90] },
7,
{ "$cond": [
{ "$eq": ["$user_id", 1] },
8,
0
]}
]}
]}
]}
}},
{ "$sort": { "weight": -1, "date": -1 } }
])
Note that each weight is just a return value, so the weights do not need to be listed in order. You can also generate this structure in code rather than writing it by hand.
For generating this structure see here:
https://stackoverflow.com/a/22213246/2313887
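As a rough sketch of what such generation can look like (this is not the code from the linked answer), the nested $cond can be folded together from a list of key/score pairs. The helper name buildNestedCond and the fallback parameter are assumptions made for this example.

```javascript
// Fold a list of { key, score } pairs into nested $cond expressions.
// buildNestedCond is a hypothetical helper, not part of any driver.
function buildNestedCond(weights, fallback = 0) {
  // reduceRight starts from the last pair, so the innermost $cond
  // (whose "else" branch is the fallback) is built first.
  return weights.reduceRight(
    (elseBranch, { key, score }) => ({
      $cond: [{ $eq: ["$user_id", key] }, score, elseBranch]
    }),
    fallback
  );
}

const weightExpr = buildNestedCond([
  { key: 4, score: 10 },
  { key: 7, score: 9 },
  { key: 90, score: 7 },
  { key: 1, score: 8 }
]);

// weightExpr can be used as the "weight" field of the $project stage.
console.log(JSON.stringify(weightExpr, null, 2));
```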
Since MongoDB 3.2 we can use $filter, which makes this much easier to maintain when there are more than 4 scores (note that the $addFields stage used below requires MongoDB 3.4 and $set requires 4.2):
db.collection.aggregate([
{
$addFields: {
weight: [
{key: 4, score: 10}, {key: 7, score: 9}, {key: 90, score: 8}, {key: 1, score: 7}
]
}
},
{
$addFields: {
weight: {
$filter: {
input: "$weight",
as: "item",
cond: {$eq: ["$$item.key", "$user_id"]}
}
}
}
},
{
$set: {
weight: {
$cond: [{$eq: [{$size: "$weight"}, 1]}, {$arrayElemAt: ["$weight", 0]}, {score: 1}]
}
}
},
{$set: {weight: "$weight.score"}},
{$sort: {weight: -1, date: -1}}
])
See how it works on the playground example
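To make the intended ordering concrete, here is a small plain-JavaScript emulation (no database involved) of sorting by an external list and then by date. The function name sortByExternalOrder, the numeric date values, and the weight of 0 for unlisted users are all assumptions of this sketch.

```javascript
// Emulate "listed users first, in list order, newest first within a user".
// sortByExternalOrder is a hypothetical name used only for illustration.
function sortByExternalOrder(docs, userOrder) {
  // Earlier entries in userOrder get larger weights.
  const weightOf = new Map(userOrder.map((id, i) => [id, userOrder.length - i]));

  return docs
    .map(doc => ({
      ...doc,
      // Unlisted or missing user_id gets weight 0 (an assumption here).
      weight: weightOf.has(doc.user_id) ? weightOf.get(doc.user_id) : 0
    }))
    // Descending weight, then descending date, like { weight: -1, date: -1 }.
    .sort((a, b) => b.weight - a.weight || b.date - a.date);
}

const docs = [
  { user_id: 11, content: "black", date: 1 },
  { user_id: 1, content: "orange", date: 2 },
  { content: "red", date: 3 },
  { user_id: 90, content: "purple", date: 4 },
  { user_id: 4, content: "blue", date: 5 },
  { user_id: 7, content: "orange", date: 6 },
  { user_id: 4, content: "orange", date: 7 }
];

const ordered = sortByExternalOrder(docs, [4, 7, 90, 1]);
console.log(ordered.map(d => d.date)); // [ 7, 5, 6, 4, 2, 3, 1 ]
```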

Count and group on occurrences of keys and their values

I have a MongoDB collection that looks like this:
[{
"installer": "anthony",
"tester": "bob"
}, {
"installer": "chris",
"tester": "anthony"
}, {
"installer": "bob",
"tester": "dave"
}, {
"installer": "anthony",
"tester": "chris"
}, {
"installer": "chris",
"tester": "dave"
}
]
I am trying to use aggregate so I can count how many times each name appears within each field and retrieve the following result:
[{
"name": "anthony",
"installer": 2,
"tester": 1
}, {
"name": "bob",
"installer": 1,
"tester": 1
}, {
"name": "chris",
"installer": 2,
"tester": 1
}, {
"name": "dave",
"installer": 0,
"tester": 2
}
]
This is the query that I have so far. The problem is that it returns only the name and installer count, without the tester count. I could run this query twice (once for installer and once for tester), but I would like to find a way to return both counts at once.
db.data.aggregate([
{
"$group": {
"_id": "$installer",
"installer": { "$sum": 1 }
}
},
{
"$project": {
"name": "$_id",
"installer": 1,
"_id": 0
}
}
])
What changes to my query are needed so I can get both the installer and tester counts of each person?
You basically want $cond to select whether to pass 1 or 0 to the $sum accumulator in the $group stage, after first putting both fields into an "array" and using $unwind to create a copy of the document for each person.
db.data.aggregate([
{ "$addFields": {
"val": ["$installer","$tester"]
}},
{ "$unwind": "$val" },
{ "$group": {
"_id": { "_id": "$_id", "val": "$val" },
"installer": {
"$max": {
"$cond": [
{ "$eq": ["$installer","$val"] },
1,
0
]
}
},
"tester": {
"$max": {
"$cond": [
{ "$eq": ["$tester","$val"] },
1,
0
]
}
}
}},
{ "$group": {
"_id": "$_id.val",
"installer": { "$sum": "$installer" },
"tester": { "$sum": "$tester" }
}}
])
To counter the case where a given document could have both the same "installer" and "tester" values we actually should aggregate on the "document" per the emitted "val" as a first step. Using the $cond inside a $max accumulator makes this case a "single" document instead of "two", being one for each array entry.
The other case of course is to simply return the "set" of values by applying $setUnion against the initial list to avoid the duplication in such an instance:
db.data.aggregate([
{ "$addFields": {
"val": { "$setUnion": [["$installer","$tester"]] }
}},
{ "$unwind": "$val" },
{ "$group": {
"_id": "$val",
"installer": {
"$sum": {
"$cond": [
{ "$eq": ["$installer","$val"] },
1,
0
]
}
},
"tester": {
"$sum": {
"$cond": [
{ "$eq": ["$tester","$val"] },
1,
0
]
}
}
}}
])
I added a document to your source as :
{ "installer": "jack", "tester": "jack" }
In order to illustrate the result.
As for $cond, it is a "ternary" or if..then..else operator. The first argument is the "if" condition, which is evaluated as a Boolean; the second is the "then" value returned when the condition is true; and the third is the "else" value returned when the condition is false.
It can be alternately written like:
"$cond": {
"if": { "$eq": ["$installer","$val"] },
"then": 1,
"else": 0
}
But the original "array" syntax is a bit more brief to write for simple expressions. Most people would still recognize the "ternary" for what it is, but if you think it makes the code clearer then you can use the "named keys" form instead.
The result of course is that 1 is only returned when the name matches that field in the document, giving the correct counts:
/* 1 */
{
"_id" : "jack",
"installer" : 1.0,
"tester" : 1.0
}
/* 2 */
{
"_id" : "dave",
"installer" : 0.0,
"tester" : 2.0
}
/* 3 */
{
"_id" : "bob",
"installer" : 1.0,
"tester" : 1.0
}
/* 4 */
{
"_id" : "chris",
"installer" : 2.0,
"tester" : 1.0
}
/* 5 */
{
"_id" : "anthony",
"installer" : 2.0,
"tester" : 1.0
}
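The counting logic can also be checked outside the database. The following is a minimal plain-JavaScript sketch of the same idea, where a Set plays the role of $setUnion and the two equality tests play the role of the $cond expressions. The function name countRoles is hypothetical.

```javascript
// Count how often each name appears as installer and as tester.
// countRoles is a hypothetical helper, not a MongoDB API.
function countRoles(docs) {
  const counts = new Map();
  for (const { installer, tester } of docs) {
    // The Set mirrors $setUnion: a person holding both roles in one
    // document is visited once, not twice.
    for (const name of new Set([installer, tester])) {
      if (!counts.has(name)) {
        counts.set(name, { _id: name, installer: 0, tester: 0 });
      }
      const entry = counts.get(name);
      if (installer === name) entry.installer += 1; // same test as the $cond
      if (tester === name) entry.tester += 1;
    }
  }
  return [...counts.values()];
}

const data = [
  { installer: "anthony", tester: "bob" },
  { installer: "chris", tester: "anthony" },
  { installer: "bob", tester: "dave" },
  { installer: "anthony", tester: "chris" },
  { installer: "chris", tester: "dave" },
  { installer: "jack", tester: "jack" }
];

console.log(countRoles(data));
```

Without the Set, the "jack" document would be visited twice and both of its counts would come out as 2 instead of 1.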
Adding the initial "array" to the document can alternately be done using $project if your MongoDB version does not support $addFields. The only difference is "explicitly" including the other fields that are required later:
{ "$project": {
"tester": 1,
"installer": 1,
"val": { "$setUnion": [["$installer","$tester"]] }
}}
And if your MongoDB is still actually older than MongoDB 3.2 which allows that notation of an "array", then you can use $map instead from MongoDB 2.6 and upwards:
{ "$project": {
"tester": 1,
"installer": 1,
"val": {
"$setUnion": [
{ "$map": {
"input": ["A","B"],
"as": "a",
"in": {
"$cond": [{ "$eq": ["$$a", "A"] }, "$installer", "$tester"]
}
}}
]
}
}}
Again using $cond to alternately select which value to present as the array elements.
Also, you really should avoid doing things like adding a $project to the end of statements. You can of course do it, but it does mean that all results of the previous pipeline stage are being "run through again" in order to make the additional changes. For something as trivial as changing "_id" to "name", it's generally better practice to simply accept that the "grouping key" is called _id and leave it at that.
As the result of $group, it actually is the "unique identifier" for which _id is the common nomenclature.

MongoDB Aggregation pipeline problems

I'm new to mongoDB and am having difficulty getting my head around aggregation pipelines.
I have created a database that holds information regarding my stock trading. In a cut down version one document from my portfolio collection looks a bit like this
{
"date" : 2015-12-31T15:50:00.000Z,
"time" : 1550,
"aum" : 1000000,
"basket" :[
{
"_id" : "Microsoft",
"shares" : 10,
"price" : 56.53,
"fx" : 1.0
},
.
.
.
{
"_id" : "GOOG.N",
"shares" : 20,
"price" : 759.69,
"fx" : 1.0
}
]
}
So, for each day, I keep track of my assets under management (aum) and a list of all the positions I hold with the current price. What I need to do is to calculate the daily net and gross exposure for the portfolio as a percentage of aum. Net exposure is simply:
sum(shares*price*fx)/aum
over all the stocks. Gross exposure is:
sum(abs(shares*price*fx))/aum
(a negative position means a short position). I need to do this as a single query using the aggregation framework. I have tried a number of queries but none seem to work, so clearly I'm just wandering around in the dark. Can anyone give some guidance?
My query looks like this
db.strategy.aggregate(
// Pipeline
[
// Stage 1
{
$project: {
"_id": 0,
"date":1,
"time":1,
"aum":1,
"strategyName":1,
"gExposure": {$divide: ["$grossExposure","$aum"]}
}
},
// Stage 2
{
$group: {
_id :{ date:"$date",time:"$time",strategyName:"$strategyName"},
grossExposure: { $sum: { $abs: {$multiply: [ "$basket.sysCurShares","$basket.price","$basket.fx" ] } }}
}
},
// Stage 3
{
$sort: {
"_id.date": 1, "_id.time": 1, "_id.strategyName": 1
}
}
]
);
The query runs but the calculated value is zero. My projection isn't working as I'd expect either as I would like all the data flattened to a two dimensional table.
Since the basket field is an array, you need to flatten it using $unwind before running the $group aggregate operation. Also, create a new field in the $project that holds the exposure before the $group pipeline, and keep aum in that projection so that $first can still read it. Gross exposure is the sum of the absolute exposures rather than the absolute value of the sum, so it needs its own accumulator. Continuing from your previous attempt, you could try the following pipeline:
db.strategy.aggregate([
// One document per basket entry
{ "$unwind": "$basket" },
{
"$project": {
"date": 1,
"time": 1,
"strategyName": 1,
"aum": 1,
"exposure": {
"$multiply": ["$basket.sysCurShares", "$basket.price", "$basket.fx"]
}
}
},
{
"$group": {
"_id": {
"date": "$date",
"time": "$time",
"strategyName": "$strategyName"
},
// Signed sum for net exposure, absolute sum for gross exposure
"totalExposure": { "$sum": "$exposure" },
"totalAbsExposure": { "$sum": { "$abs": "$exposure" } },
"aum": { "$first": "$aum" }
}
},
{
"$project": {
"_id": 0,
"date": "$_id.date",
"time": "$_id.time",
"strategyName": "$_id.strategyName",
"netExposure": { "$divide": ["$totalExposure", "$aum"] },
"grossExposure": { "$divide": ["$totalAbsExposure", "$aum"] }
}
},
{ "$sort": { "date": 1, "time": 1, "strategyName": 1 } }
]);
You can do the same in a single $project stage with $reduce, which is available from MongoDB 3.4:
db.strategy.aggregate([
{
$project: {
"date": 1,
"time": 1,
"strategyName": 1,
// Signed sum of shares*price*fx over the basket, divided by aum
"netExposure": { "$divide": [{ "$reduce": { input: "$basket", initialValue: 0, in: { $add: [{ $multiply: ["$$this.fx", "$$this.shares", "$$this.price"] }, "$$value"] } } }, "$aum"] },
// Same sum, but taking the absolute value of each position first
"grossExposure": { "$divide": [{ "$reduce": { input: "$basket", initialValue: 0, in: { $add: [{ $abs: { $multiply: ["$$this.fx", "$$this.shares", "$$this.price"] } }, "$$value"] } } }, "$aum"] }
}
},
{ "$sort": { "date": 1, "time": 1, "strategyName": 1 } }
]);
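The two formulas from the question can be checked in plain JavaScript before trusting any pipeline output. This is a minimal sketch on a made-up two-position basket; the function name exposures and the sample numbers are assumptions for illustration only.

```javascript
// Net and gross exposure for one portfolio document, as fractions of aum.
// exposures is a hypothetical helper; the sample data below is invented.
function exposures(doc) {
  // Signed market value of each position (negative means short).
  const positions = doc.basket.map(p => p.shares * p.price * p.fx);

  const net = positions.reduce((sum, v) => sum + v, 0);
  const gross = positions.reduce((sum, v) => sum + Math.abs(v), 0);

  return { netExposure: net / doc.aum, grossExposure: gross / doc.aum };
}

const sampleDay = {
  aum: 100,
  basket: [
    { _id: "LONG", shares: 10, price: 5, fx: 1 },  // +50
    { _id: "SHORT", shares: -4, price: 5, fx: 1 }  // -20
  ]
};

// Net is (50 - 20) / 100 and gross is (50 + 20) / 100.
console.log(exposures(sampleDay));
```

The short position is what separates the two numbers: taking the absolute value after summing would give the net figure again, not the gross one.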

MongoDB multidimensional array projection

I just started learning MongoDB and can't find a solution for my problem.
Got that document:
> db.test.insert({"name" : "Anika", "arr" : [ [11, 22],[33,44] ] })
Please note the "arr" field which is a multidimensional array.
Now I'm looking for a query that returns only the value of arr[0][1] which is 22. I tried to achieve that by using $slice, however I don't know how to address the second dimension with that.
> db.test.find({},{_id:0,"arr":{$slice: [0,1]}})
{ "name" : "ha", "arr" : [ [ 11, 22 ] ] }
I also tried
> db.test.find({},{_id:0,"arr":{$slice: [0,1][1,1]}})
{ "name" : "ha", "arr" : [ [ 11, 22 ] ] }
The desired output would be either
22
or
{"arr":[[22]]}
Thank you
EDIT:
After reading the comments I think that I've simplified the example data too much and I have to provide more information:
There are many more documents in the collection like the one that I've provided, but they all have the same structure.
There are more array elements than just two.
In the real world the array contains really long texts (500kb-1mb), so it is very expensive to transmit the whole data to the client.
Before the aggregation I will do a query by the 'name' field. I just skipped that in the example for the sake of simplicity.
The target indexes are variable, so sometimes I need to know the value of arr[0][1], the next time it is arr[1][4].
example data:
> db.test.insert({"name" : "Olivia", "arr" : [ [11, 22, 33, 44],[55,66,77,88],[99] ] })
> db.test.insert({"name" : "Walter", "arr" : [ [11], [22, 33, 44],[55,66,77,88],[99] ] })
> db.test.insert({"name" : "Astrid", "arr" : [ [11, 22, 33, 44],[55,66],[77,88],[99] ] })
> db.test.insert({"name" : "Peter", "arr" : [ [11, 22, 33, 44],[55,66,77,88],[99] ] })
example query:
> db.test.find({name:"Olivia"},{"arr:"...})
You can use the aggregation framework:
db.test.aggregate([
{ $unwind: '$arr' },
{ $limit: 1 },
{ $project: { _id: 0, arr: 1 } },
{ $unwind: '$arr' },
{ $skip: 1 },
{ $limit: 1 }
])
Returns:
{ "arr": 22 }
Edit: The original poster has modified my solution to suit his needs and came up with the following:
db.test.aggregate([
{ $match: { name:"Olivia" } },
{ $project: { _id: 0,arr: 1 } },
{ $unwind: '$arr' },
{ $skip: 1 },
{ $limit:1 },
{ $unwind: "$arr" },
{ $skip: 2 },
{ $limit: 1 }
])
This query will result in { arr: 77 } given the extended data provided by the OP. Note that $skip and $limit are needed to select the right elements in the array hierarchy.
The $slice form you ask for does not handle multi-dimensional arrays. Each array is considered individually, so addressing a second dimension is not supported by the current $slice.
As such, when you only need the "first" or "last" elements, it can presently be done a lot shorter with .aggregate() than has been suggested:
db.test.aggregate([
{ "$unwind": "$arr" },
{ "$group": {
"_id": "$_id",
"arr": { "$first": "$arr" }
}},
{ "$unwind": "$arr" },
{ "$group": {
"_id": "$_id",
"arr": { "$last": "$arr" }
}}
])
But in future releases of MongoDB (it currently works in the 3.1.8 development branch as of writing) you have $arrayElemAt as an operator for the aggregation framework, which works like this:
db.test.aggregate([
{ "$project": {
"arr": {
"$arrayElemAt": [
{ "$arrayElemAt": [ "$arr", 0 ] },
1
]
}
}}
])
Both basically come to the same { "arr": 22 } result, though the future available form works quite flexibly on array index values, rather than first and last.
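Since the question says the target indexes vary between calls, the nested $arrayElemAt projection can be generated from the two indexes. The sketch below only builds the pipeline as a plain object; the helper name buildElemAtPipeline is hypothetical, and the result would be passed to db.test.aggregate() on a server that supports $arrayElemAt.

```javascript
// Build a pipeline that returns arr[outer][inner] for one named document.
// buildElemAtPipeline is a hypothetical helper name.
function buildElemAtPipeline(name, outer, inner) {
  return [
    { $match: { name: name } },
    { $project: {
        _id: 0,
        arr: {
          // Inner call picks the sub-array, outer call picks the element.
          $arrayElemAt: [{ $arrayElemAt: ["$arr", outer] }, inner]
        }
    }}
  ];
}

// arr[1][2] of the "Olivia" document from the question's sample data.
const elemPipeline = buildElemAtPipeline("Olivia", 1, 2);
console.log(JSON.stringify(elemPipeline, null, 2));

// The same lookup in plain JavaScript on that sample array:
const olivia = [[11, 22, 33, 44], [55, 66, 77, 88], [99]];
console.log(olivia[1][2]); // 77
```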
