Get all the documents having the max value using aggregation in MongoDB

I want to fetch "all the documents" having the highest value for a specific field, and then group by another field.
Consider below data:
_id:1, country:india, quantity:12, name:xyz
_id:2, country:USA, quantity:5, name:abc
_id:3, country:USA, quantity:6, name:xyz
_id:4, country:india, quantity:8, name:def
_id:5, country:USA, quantity:10, name:jkl
_id:6, country:india, quantity:12, name:jkl
Answer should be
country:india max-quantity:12
name xyz
name jkl
country:USA max-quantity:10
name jkl
I have tried several queries, but I can only get the max value without the name, or I can group by country but then it shows all the values.
db.coll.aggregate([{
$group:{
_id:"$country",
"maxQuantity":{$max:"$quantity"}
}
}])
For example, the above gives the max quantity for every country, but how do I combine it with the other fields so that it shows all the documents having the max quantity?

If you want to keep document information, then you basically need to $push it into an array. Then, once you have your $max values, you need to filter the contents of the array for just the elements that match:
db.coll.aggregate([
{ "$group":{
"_id": "$country",
"maxQuantity": { "$max": "$quantity" },
"docs": { "$push": {
"_id": "$_id",
"name": "$name",
"quantity": "$quantity"
}}
}},
{ "$project": {
"maxQuantity": 1,
"docs": {
"$setDifference": [
{ "$map": {
"input": "$docs",
"as": "doc",
"in": {
"$cond": [
{ "$eq": [ "$maxQuantity", "$$doc.quantity" ] },
"$$doc",
false
]
}
}},
[false]
]
}
}}
])
So you store everything in an array and then test each array member to see if its value matches the one that was recorded as the maximum, discarding any that do not.
I'd keep the _id values in the array documents since that is what makes them "unique" and won't be adversely affected by $setDifference when filtering out values. But of course if "name" is always unique then it won't be required.
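To see why the _id matters, here is a plain JavaScript sketch (not MongoDB code) that mimics the $map / $cond / $setDifference step on an in-memory object. The helper names setDifference and filterMax are made up for illustration, the sample group is invented, and set semantics are only approximated by comparing JSON strings.

```javascript
// Hypothetical in-memory stand-in for one grouped result (not a MongoDB API).
const group = {
  _id: "india",
  maxQuantity: 12,
  docs: [
    { name: "xyz", quantity: 12 },
    { name: "def", quantity: 8 },
    { name: "xyz", quantity: 12 } // same name and quantity, nothing to tell them apart
  ]
};

// Approximates $setDifference: treat the array as a set (duplicates collapse),
// then drop every member that also appears in `remove`.
function setDifference(values, remove) {
  const removeKeys = new Set(remove.map(v => JSON.stringify(v)));
  const seen = new Set();
  const out = [];
  for (const v of values) {
    const key = JSON.stringify(v);
    if (removeKeys.has(key) || seen.has(key)) continue;
    seen.add(key);
    out.push(v);
  }
  return out;
}

// Approximates the $map + $cond step: keep the doc if it holds the max, else false.
function filterMax(g) {
  const mapped = g.docs.map(doc => (doc.quantity === g.maxQuantity ? doc : false));
  return setDifference(mapped, [false]);
}

const withoutIds = filterMax(group);
const withIds = filterMax({
  ...group,
  docs: group.docs.map((doc, i) => ({ _id: i + 1, ...doc }))
});
console.log(withoutIds.length, withIds.length); // 1 2
```

Without a distinguishing field the two identical entries collapse into one, while adding an _id keeps both.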
You can also just return whatever fields you want from $map, but I'm just returning the whole document for example.
Keep in mind that the result must not exceed the BSON size limit of 16MB, so this is okay for small data samples, but anything producing a potentially large list ( since you cannot pre-filter array content ) would be better off processed with a separate query to find the "max" values, and another to fetch the matching documents.
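The two-step alternative can be sketched in plain JavaScript over an in-memory array standing in for the collection. In a real application each pass would be a query against the server. The function names maxByCountry and docsAtMax are hypothetical, and this is only an illustration of the logic.

```javascript
// Sample data from the question, held in memory for illustration only.
const coll = [
  { _id: 1, country: "india", quantity: 12, name: "xyz" },
  { _id: 2, country: "USA", quantity: 5, name: "abc" },
  { _id: 3, country: "USA", quantity: 6, name: "xyz" },
  { _id: 4, country: "india", quantity: 8, name: "def" },
  { _id: 5, country: "USA", quantity: 10, name: "jkl" },
  { _id: 6, country: "india", quantity: 12, name: "jkl" }
];

// Pass 1: find the maximum quantity per country.
function maxByCountry(docs) {
  const max = new Map();
  for (const d of docs) {
    if (!max.has(d.country) || d.quantity > max.get(d.country)) {
      max.set(d.country, d.quantity);
    }
  }
  return max;
}

// Pass 2: keep every document whose quantity equals its country's maximum.
function docsAtMax(docs, max) {
  return docs.filter(d => d.quantity === max.get(d.country));
}

const maxima = maxByCountry(coll);
const result = docsAtMax(coll, maxima);
console.log(result.map(d => d.country + ":" + d.name));
// [ 'india:xyz', 'USA:jkl', 'india:jkl' ]
```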

I only know a simpler way to do a similar task if you restrict the query to a specific set of countries:
[
{"$match":{"country":{"$in":["USA","india"]}}}, // stage one
{ "$sort": { "quantity": -1 }}, // stage two
{"$limit":2 } // stage three - count equal to ["USA","india"] length
]
If you need all countries, try the following, but without guarantees from me:
[
{"$project": {
"country": "$country",
"quantity": "$quantity",
"document": "$$ROOT" // save all fields for future usage
}},
{ "$sort": { "quantity": -1 }},
{"$group":{"_id":{"country":"$country"},"original_doc":{"$first":"$document"} }}
]

Another way can be like:
db.coll.aggregate(
[
{
$sort:{ country: -1, "quantity": -1 }
},
{
"$group":
{
"_id":{ "country": "$country" },
"data":{ "$first": "$$ROOT" }
}
}
])
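Note that $first keeps a single document per group, so when several documents tie for the maximum only one of them comes back. A plain JavaScript sketch of the sort-then-take-first logic on the question's sample data (in memory, not MongoDB code; sortThenFirst is a made-up name) illustrates this:

```javascript
// Sample data from the question, held in memory for illustration only.
const sample = [
  { _id: 1, country: "india", quantity: 12, name: "xyz" },
  { _id: 2, country: "USA", quantity: 5, name: "abc" },
  { _id: 3, country: "USA", quantity: 6, name: "xyz" },
  { _id: 4, country: "india", quantity: 8, name: "def" },
  { _id: 5, country: "USA", quantity: 10, name: "jkl" },
  { _id: 6, country: "india", quantity: 12, name: "jkl" }
];

// Sort by quantity descending, then keep the first document seen per country.
function sortThenFirst(docs) {
  const sorted = [...docs].sort((a, b) => b.quantity - a.quantity);
  const first = new Map();
  for (const d of sorted) {
    if (!first.has(d.country)) first.set(d.country, d);
  }
  return [...first.values()];
}

const top = sortThenFirst(sample);
console.log(top.length); // 2: one document per country, although india has two at 12
```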

Another possibility, close to Blakes Seven's solution, is to simplify the $setDifference + $map part a bit by using $filter (available from MongoDB 3.2) on the array of documents:
db.coll.aggregate([
{ "$group":{
"_id": "$country",
"maxQuantity": { "$max": "$quantity" },
"docs": { "$push": {
"_id": "$_id",
"name": "$name",
"quantity": "$quantity"
}}
}},
{ "$project": {
"maxQuantity": 1,
"docs": {
"$filter": {
"input": "$docs",
"as": "doc",
"cond": { $eq: ["$$doc.quantity", "$maxQuantity"] }
}
}
}}
])

Related

MongoDB find count of all possible 'columns' within a collection

Is there a way to find all possible number of 'columns' or json properties available in a collection? (I know it's not correct to call them columns, but just for the ease of understanding)
For example, all the following documents are in the same collection called 'people':
{"Name": "bob", "Profession": "IT", "Height": 200},
{"Name": "simon", "Weight": 100, "IQ": 120},
{"Name": "james", "Weight": 130, "Glasses": "Yes"}
The possible 'columns' here are: Name, Profession, Height, Weight, IQ and Glasses. A total of 6.
Is there any way I can do an operation which gets this count of 6? (extra useful if there's also a pymongo variant)
I'm wanting to transfer data from MongoDB into a table format, and knowing the overall number of columns the table can have is useful.
You can use this aggregation query to get your desired result:
The trick here is to use $objectToArray to get the keys as values. Then remove the key _id (if exists) and group to get the total.
db.collection.aggregate([
{
"$project": {
"keys": {
"$objectToArray": "$$ROOT"
}
}
},
{
"$unwind": "$keys"
},
{
"$match": {
"keys.k": {
"$ne": "_id"
}
}
},
{
"$group": {
"_id": "$keys.k",
"total": {
"$sum": 1
}
}
},
{
"$group": {
"_id": null,
"total": {
"$sum": 1
}
}
}
])
Edit:
Another way, which avoids $unwind and the double $group, is this query:
The idea is the same as before: use $objectToArray to get the keys as keys.k, then $group all the values and add them to an array.
Then compute the size of the array after some processing: a $reduce to flatten the array and a $filter to exclude the _id field.
Note that if you want to count the _id you can simply remove the $filter operator.
db.collection.aggregate([
{
"$project": {
"keys": {
"$objectToArray": "$$ROOT"
}
}
},
{
"$group": {
"_id": null,
"keys": {
"$addToSet": "$keys.k"
}
}
},
{
"$project": {
"_id": 0,
"keys": {
"$size": {
"$filter": {
"input": {
"$reduce": {
"input": "$keys",
"initialValue": [],
"in": {
"$setUnion": [
"$$value",
"$$this"
]
}
}
},
"cond": {
"$ne": [
"$$this",
"_id"
]
}
}
}
}
}
}
])
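If the documents are already on the client, the same count can be sketched in plain JavaScript with a Set. This is only an illustration on the question's sample data; countDistinctKeys and its ignore parameter are hypothetical names, not part of any MongoDB driver.

```javascript
// Sample documents from the question, held in memory for illustration only.
const people = [
  { Name: "bob", Profession: "IT", Height: 200 },
  { Name: "simon", Weight: 100, IQ: 120 },
  { Name: "james", Weight: 130, Glasses: "Yes" }
];

// Collect every top-level key into a Set, skipping the ignored ones,
// and return how many distinct keys were seen.
function countDistinctKeys(docs, ignore = ["_id"]) {
  const keys = new Set();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      if (!ignore.includes(key)) keys.add(key);
    }
  }
  return keys.size;
}

console.log(countDistinctKeys(people)); // 6
```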

How to convert an array of documents to two dimensions array

I am making a query to MongoDB
db.getCollection('user_actions').aggregate([
{$match: {
type: 'play_started',
entity_id: {$ne: null}
}},
{$group: {
_id: '$entity_id',
view_count: {$sum: 1}
}},
])
and getting a list of docs with two fields:
How can I get a list of lists with two items like
[[entity_id, view_count], [entity_id, view_count], ...]
Actually there are two different ways to do this, depending on your MongoDB server version.
The optimal way, available from MongoDB 3.2, is to use square brackets [] to directly create new array fields in the $project stage. This returns an array for each group. The next stage is another $group stage, where you group your documents and use the $push accumulator operator to return a two-dimensional array.
db.getCollection('user_actions').aggregate([
{ "$match": {
"type": 'play_started',
"entity_id": { "$ne": null }
}},
{ "$group": {
"_id": "$entity_id",
"view_count": { "$sum": 1}
}},
{ "$project": {
"_id": 0,
"result": [ "$_id", "$view_count" ]
}},
{ "$group": {
"_id": null,
"result": { "$push": "$result" }
}}
])
From MongoDB 2.6 and prior to 3.2 you need a different approach. In order to create your array you need to use the $map operator. Because the $map "input" field must resolve to an array, you need to use the $literal operator to set a literal array value as the input. The $cond operator here returns the "entity_id" or "view_count" according to the "boolean-expression".
db.getCollection('user_actions').aggregate([
{ "$match": {
"type": 'play_started',
"entity_id": { "$ne": null }
}},
{ "$group": {
"_id": "$entity_id",
"view_count": { "$sum": 1}
}},
{ "$project": {
"_id": 0,
"result": {
"$map": {
"input": { "$literal": [ "A", "B"] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$_id",
"$view_count"
]
}
}
}
}},
{ "$group": {
"_id": null,
"result": { "$push": "$result" }
}}
])
It is worth noting that this will also work in MongoDB 2.4. If you are running MongoDB 2.2, you can use the undocumented $const operator which does the same thing.
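If reshaping on the client is acceptable, the same two-dimensional array can be built from the $group output with a simple map. A minimal sketch in plain JavaScript follows; the grouped array is invented sample data standing in for the pipeline result, and toPairs is a hypothetical helper name.

```javascript
// Made-up example of what the $match + $group stages might return.
const grouped = [
  { _id: "a1", view_count: 3 },
  { _id: "b2", view_count: 7 }
];

// Turn each { _id, view_count } document into an [entity_id, view_count] pair.
function toPairs(docs) {
  return docs.map(d => [d._id, d.view_count]);
}

console.log(JSON.stringify(toPairs(grouped))); // [["a1",3],["b2",7]]
```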

MongoDB: How to Get the Lowest Value Closer to a given Number and Decrement by 1 Another Field

Given the following document containing 3 nested documents...
{ "_id": ObjectId("56116d8e4a0000c9006b57ac"), "name": "Stock 1", "items": [
{ "price": 1.50, "description": "Item 1", "count": 10 },
{ "price": 1.70, "description": "Item 2", "count": 13 },
{ "price": 1.10, "description": "Item 3", "count": 20 }
]
}
... I need to select the sub-document with the lowest price at or above a given amount (below I assume 1.05):
db.stocks.aggregate([
{$unwind: "$items"},
{$sort: {"items.price":1}},
{$match: {"items.price": {$gte: 1.05}}},
{$group: {
_id:0,
item: {$first:"$items"}
}},
{$project: {
_id: "$item._id",
price: "$item.price",
description: "$item.description"
}}
]);
This works as expected and here is the result:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 20
}
],
"ok" : 1
Alongside returning the item with the lowest price closer to a given amount, I need to decrement count by 1. For instance, here below is the result I'm looking for:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 19
}
],
"ok" : 1
It depends on whether you actually want to "update" the result or simply "return" the result with a decremented value. In the former case you will of course need to go back to the document and "decrement" the value for the returned result.
I also want to note that what you "think" is efficient here is actually not. Doing the "filter" of elements "post sort" or even "post unwind" really makes no difference at all to how the $first accumulator works in terms of performance.
The better approach is to basically "pre filter" the values from the array where possible. This reduces the document size in the aggregation pipeline, and the number of array elements to be processed by $unwind:
db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] },
"$$item",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
Of course that does require MongoDB version 2.6 or greater on the server to have the available operators, and going by your output you may have an earlier version. If that is the case then at least lose the $match as it does not do anything of value and would be detrimental to performance.
Where a $match is useful, is in the document selection before you do anything, as what you always want to avoid is processing documents that do not even possibly meet the conditions you want from within the array or anywhere else. So you should always $match or use a similar query stage first.
At any rate, if all you wanted was a "projected result" then just use $subtract in the output:
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description",
"count": { "$subtract": [ "$item.count", 1 ] }
}}
If you wanted however to "update" the result, then you would be iterating the array ( it's still an array even with one result ) to update the matched item and "decrement" the count via $inc:
var result = db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] },
"$$item",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
result.forEach(function(item) {
db.stocks.update({ "items._id": item._id},{ "$inc": { "items.$.count": -1 }})
})
On a MongoDB 2.4 shell, your same aggregate query applies ( but please make the changes ). However, the returned value contains another field called result holding the array, so add that level:
result.result.forEach(function(item) {
db.stocks.update({ "items._id": item._id},{ "$inc": { "items.$.count": -1 }})
})
So either just $project for display only, or use the returned result to effect an .update() on the data as required.
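For the "display only" case, the selection and the decrement can also be sketched in plain JavaScript on a document that has already been fetched. This is an in-memory illustration, not MongoDB code. pickAndDecrement is a hypothetical helper, and it returns a copy rather than modifying the stored document.

```javascript
// The sample document from the question, without the ObjectId, held in memory.
const stock = {
  name: "Stock 1",
  items: [
    { price: 1.50, description: "Item 1", count: 10 },
    { price: 1.70, description: "Item 2", count: 13 },
    { price: 1.10, description: "Item 3", count: 20 }
  ]
};

// Keep items priced at or above `amount`, pick the cheapest of those,
// and return a copy of it with count reduced by one (null if none qualify).
function pickAndDecrement(doc, amount) {
  const candidates = doc.items.filter(item => item.price >= amount);
  if (candidates.length === 0) return null;
  const best = candidates.reduce((lo, item) => (item.price < lo.price ? item : lo));
  return { ...best, count: best.count - 1 };
}

console.log(pickAndDecrement(stock, 1.05));
// { price: 1.1, description: 'Item 3', count: 19 }
```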

Mongodb: Aggregation : sum up values in an array before $group

I have a collection of documents with the following structure:
{
_id: 1,
array: [
{value: 10 },
{value: 11 },
{value: 12 }
]
}
I want to make an aggregate query on the collection:
get the proportion of each item (for example, the proportion of item 1 would be the value of item 1 divided by the sum of the values of all three items).
Note: I want to do this within a single query.
The basic idea here is to $unwind the array, $group the document to get the total, and then apply that total to each array member. This works better with MongoDB 2.6 or greater due to the $map operator:
db.collection.aggregate([
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"array": { "$push": "$array" },
"total": { "$sum": "$array.value" }
}},
{ "$project": {
"array": {
"$map": {
"input": "$array",
"as": "el",
"in": {
"value": "$$el.value",
"prop": {
"$divide": [ "$$el.value", "$total" ]
}
}
}
}
}}
])
Or with earlier versions:
db.collection.aggregate([
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"array": { "$push": "$array" },
"total": { "$sum": "$array.value" }
}},
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"array": {
"$push": {
"value": "$array.value",
"prop": {
"$divide": [ "$array.value", "$total" ]
}
}
}
}}
])
In either case, if you are not actually "aggregating" anything beyond the document, it is far more efficient to do this calculation in client code. The $unwind here can get very costly due to what it does.
Also if you just stored the "total" as another element, then the simple $project is all that you need, which comes at very little cost by itself. Keeping a total on updates is just simple usage of the $inc operator as you $push new elements to the array.
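As a rough illustration of the client-code route, the proportions can be computed in plain JavaScript once the document has been fetched. The helper name withProportions is made up, and the output shape mirrors the "value" / "prop" fields used in the pipelines above.

```javascript
// The sample document from the question, held in memory.
const doc = { _id: 1, array: [{ value: 10 }, { value: 11 }, { value: 12 }] };

// Sum the values once, then attach each element's share of the total.
function withProportions(d) {
  const total = d.array.reduce((sum, el) => sum + el.value, 0);
  return {
    _id: d._id,
    array: d.array.map(el => ({ value: el.value, prop: el.value / total }))
  };
}

const out = withProportions(doc);
console.log(out.array.map(el => el.prop.toFixed(3))); // [ '0.303', '0.333', '0.364' ]
```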
Here is the aggregation pipeline you need:
[
{$unwind: '$array'},
{
$group: {
_id: '$_id',
array: {$push: '$array'},
sum: {$sum: '$array.value'}
}
},
{$unwind: '$array'},
{
$project: {
_id: 1,
'array.value': 1,
'array.proportion': {
$divide: ['$array.value', '$sum']
}
}
}
]

MongoDB aggregation: Project separate document fields into a single array field

I have a document like this:
{fax: '8135551234', cellphone: '8134441234'}
Is there a way to project (without a group stage) this document into this:
{
phones: [{
type: 'fax',
number: '8135551234'
}, {
type: 'cellphone',
number: '8134441234'
}]
}
I could probably use a group stage operator for this, but I'd rather not if there's any other way, because my query also projects several other fields, all of which would require a $first just for the group stage.
Hope that's clear. Thanks in advance!
MongoDB 2.6 introduces the $map operator, an array transformation operator that can be used to do exactly this:
db.phones.aggregate([
{ "$project": {
"phones": { "$map": {
"input": { "$literal": ["fax","cellphone"] },
"as": "el",
"in": {
"type": "$$el",
"number": { "$cond": [
{ "$eq": [ "$$el", "fax" ] },
"$fax",
"$cellphone"
]}
}
}}
}}
])
So your document now looks exactly like you want. The trick of course is to create a new array with members "fax" and "cellphone", then transform that array with the new document fields by matching those values.
Of course you can also do this in earlier versions using $unwind and $group in a similar fashion, but just not as efficiently:
db.phones.aggregate([
{ "$project": {
"type": { "$const": ["fax","cellphone"] },
"fax": 1,
"cellphone": 1
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": "$_id",
"phones": { "$push": {
"type": "$type",
"number": { "$cond": [
{ "$eq": [ "$type", "fax" ] },
"$fax",
"$cellphone"
]}
}}
}}
])
Of course it can be argued that unless you are doing some sort of aggregation then you may as well just post process the collection results in code. But this is an alternate way to do that.
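For comparison, post-processing in code could look like the plain JavaScript sketch below. It assumes the document has already been fetched. toPhones and its types parameter are hypothetical names, not part of any driver API.

```javascript
// The sample document from the question, held in memory.
const contact = { fax: "8135551234", cellphone: "8134441234" };

// Build one { type, number } entry per listed field name.
function toPhones(doc, types = ["fax", "cellphone"]) {
  return {
    phones: types.map(type => ({ type: type, number: doc[type] }))
  };
}

console.log(JSON.stringify(toPhones(contact)));
// {"phones":[{"type":"fax","number":"8135551234"},{"type":"cellphone","number":"8134441234"}]}
```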