How can I return the minimum values from two subdocuments in a collection using MongoDB's aggregation pipeline?

We have a bunch of products in a database with two types of monetary values attached to each. Each object has a manufacturer, a range and a description, and each object can have a monthly rental amount (for rental agreements), a monthly payment amount (for finance agreements) or both.
An example object would be:
{
"manufacturer": "Manufacturer A",
"range": "Range A",
"description": "Product Description",
"rentals": {
"initialRental": 1111.05,
"monthlyRental": 123.45,
"termMonths": 24
},
"payments": {
"deposit": 592.56,
"monthlyPayment": 98.76,
"finalPayment": 296.28,
"termMonths": 36
}
}
There can often be more than one object for a given manufacturer and range.
I'm looking for an aggregation pipeline that will return a list of the lowest monthly rental and the lowest monthly payment for each distinct manufacturer/range pair, but my limited knowledge of how to use the aggregation framework seems to be catching me out.
My intended result, if there were one distinct manufacturer with two distinct ranges, would be the following:
[
{
"manufacturer": "Manufacturer A",
"range": "Range A",
"minimumRental": 123.45,
"minimumPayment": 98.76
},
{
"manufacturer": "Manufacturer A",
"range": "Range B",
"minimumRental": 234.56,
"minimumPayment": 197.53
}
]
I'm using the following to try and achieve this, but I seem to be tripping up on the grouping and use of $min:
db.products.aggregate(
[
{
"$group": {
"_id": {
"manufacturer": "$manufacturer.name",
"range": "$range.name"
},
"rentals": {
"$addToSet": "$rentals.monthlyrental"
},
"payments": {
"$addToSet": "$payments.monthlypayment"
}
}
},
{
"$group": {
"_id": {
"manufacturer": "$_id.manufacturer",
"range": "$_id.range",
"payments": "$payments"
},
"minimumRental": {
"$min": "$rentals"
}
}
},
{
"$project": {
"_id": {
"manufacturer": "$_id.manufacturer",
"range": "$_id.range",
"minimumRental": "$minimumRental",
"payments": "$_id.payments"
}
}
},
{
"$group": {
"_id": {
"manufacturer": "$_id.manufacturer",
"range": "$_id.range",
"minimumRental": "$_id.minimumRental"
},
"minimumPayment": {
"$min": "$_id.payments"
}
}
},
{
"$project": {
"_id": 0,
"manufacturer": "$_id.manufacturer",
"range": "$_id.range",
"minimumRental": "$_id.minimumRental",
"minimumPayment": "$minimumPayment"
}
}
]
)
It's worth noting, in the case with my test data, that I have deliberately not specified a rental for Range B, as there will be cases where rentals and/or payments are not both specified for a given range.
So, using the query above on my test data gives me the following:
{
"0" : {
"minimumPayment" : [
98.76
],
"manufacturer" : "Manufacturer A",
"range" : "Range A",
"minimumRental" : [
123.45
]
},
"1" : {
"minimumPayment" : [
197.53
],
"manufacturer" : "Manufacturer A",
"range" : "Range B",
"minimumRental" : []
}
}
This is close, but it appears that I'm getting an array instead of a minimum value. I get the impression that what I'm trying to do is possible, but I don't seem to be able to find any resources specific enough to use to find out what I'm doing wrong.
Thanks for reading.

It's a bit complex, and there is a little to understand here. The first step is to simplify the documents, and then just find the smallest amount for each type:
db.collection.aggregate([
// Tag things with an A/B value
{ "$project": {
"_id": {
"manufacturer": "$manufacturer.name",
"range": "$range.name",
},
"rental": "$rentals.monthlyRental",
"payment": "$payments.monthlyPayment",
"type": { "$literal": [ "R","P" ] }
}},
// Unwind that "type"
{ "$unwind": "$type" },
// Group conditionally on the type
{ "$group": {
"_id": {
"_id": "$_id",
"type": "$type"
},
"value": {
"$min": {
"$cond": [
{ "$eq": [ "$type", "R" ] },
"$rental",
"$payment"
]
}
}
}},
// Sort by type and amount
{ "$sort": { "_id.type": 1, "value": 1 } },
// Group by type only and just take the first after sort
{ "$group": {
"_id": "$_id.type",
"manufacturer": { "$first": "$_id._id.manufacturer" },
"range": { "$first": "$_id._id.range" }
}}
])
And that's basically it, just clean up fields as you need with a $project or deal with it in code.
Personally, though, I find that a bit sloppy, and the $unwind used to create the "A/B" values adds overhead. A better approach would be to run each aggregation as a parallel query, then merge the results before sending them to the client.
I could bang on all day about parallel queries, but I gave a basic example in a recent answer, so read "How to Group By Different Fields", which already shows the general technique.
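For comparison, here is a minimal sketch of a single-stage alternative. It is not the answerer's method: it uses one $group with two $min accumulators. It assumes that manufacturer and range are plain strings and that the fields are named monthlyRental and monthlyPayment, exactly as in the sample document, and that the collection is called products as in the question. As a $group accumulator, $min skips missing or null values and compares an array operand as a whole rather than element by element, which would explain why taking $min of the $addToSet arrays returned arrays.

```javascript
// Sketch only: field paths follow the sample document in the question.
const minimumsPipeline = [
  {
    $group: {
      _id: { manufacturer: "$manufacturer", range: "$range" },
      // $min ignores documents where the field is missing or null
      minimumRental: { $min: "$rentals.monthlyRental" },
      minimumPayment: { $min: "$payments.monthlyPayment" }
    }
  },
  {
    // Flatten the grouping key back into top-level fields
    $project: {
      _id: 0,
      manufacturer: "$_id.manufacturer",
      range: "$_id.range",
      minimumRental: 1,
      minimumPayment: 1
    }
  }
];

// In the mongo shell (where `db` exists) this runs the pipeline;
// elsewhere the block only defines it.
if (typeof db !== "undefined") {
  printjson(db.products.aggregate(minimumsPipeline).toArray());
}
```

With this shape, a range that has no rentals at all should come back with minimumRental set to null rather than an empty array.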

Related

How to group and count fields in a MongoDB collection

I'm really new to MongoDB, coming from a SQL background, and I'm struggling to work out how to run a simple report that groups a value from a nested document with a count, sorted with the highest count first.
I've tried so many ways from what I've found online but I'm unable to target the exact field that I need for the grouping.
Here is the collection.
{
"_id": {
"$oid": "6005f95dbad14c0308f9af7e"
},
"title": "test",
"fields": {
"6001bd300b363863606a815e": {
"field": {
"_id": {
"$oid": "6001bd300b363863606a815e"
},
"title": "Title Two",
"datatype": "string"
},
"section": "Section 1",
},
"6001bd300b363863423a815e": {
"field": {
"_id": {
"$oid": "6001bd3032453453606a815e"
},
"title": "Title One",
"datatype": "string"
},
"section": "Section 1",
},
"6001bd30453534863423a815e": {
"field": {
"_id": {
"$oid": "6001bd300dfgdfgdf06a815e"
},
"title": "Title One",
"datatype": "string"
},
"section": "Section 1",
}
},
"sections": ["Section 1"]
}
The result I need to get from the above example would be:
"Title One", 2
"Title Two", 1
Can anyone please point me in the right direction? Thank you so much.
Having dynamic field names is usually a poor design.
Try this one:
db.collection.aggregate([
{ $set: { fields: { $objectToArray: "$fields" } } },
{ $unwind: "$fields" },
{ $group: { _id: "$fields.v.field.title", count: { $count: {} } } },
{ $sort: { count: -1 } }
])
Here's another way to do it. The $project throws away everything except for the deep-dive to "title". Then just $unwind and $sortByCount.
db.collection.aggregate([
{
"$project": {
"titles": {
"$map": {
"input": {
"$objectToArray": "$fields"
},
"in": "$$this.v.field.title"
}
}
}
},
{
"$unwind": "$titles"
},
{
"$sortByCount": "$titles"
}
])
Try it on mongoplayground.net.
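To see what both pipelines compute, here is the same logic as a plain JavaScript sketch run on a trimmed copy of the sample document. The short keys a, b and c stand in for the dynamic ObjectId keys, and countTitles is a made-up name; this is an illustration of the logic, not a replacement for the server-side pipeline.

```javascript
// Trimmed copy of the sample document; only the fields the report needs.
const reportDoc = {
  title: "test",
  fields: {
    a: { field: { title: "Title Two" }, section: "Section 1" },
    b: { field: { title: "Title One" }, section: "Section 1" },
    c: { field: { title: "Title One" }, section: "Section 1" }
  }
};

function countTitles(docs) {
  const counts = new Map();
  for (const d of docs) {
    // Object.values drops the dynamic keys, like reading "v" after $objectToArray
    for (const entry of Object.values(d.fields || {})) {
      const title = entry.field.title;
      counts.set(title, (counts.get(title) || 0) + 1);
    }
  }
  // Highest count first, mirroring { $sort: { count: -1 } }
  return [...counts.entries()]
    .map(([title, count]) => ({ _id: title, count }))
    .sort((x, y) => y.count - x.count);
}

const titleCounts = countTitles([reportDoc]);
console.log(titleCounts);
// [ { _id: 'Title One', count: 2 }, { _id: 'Title Two', count: 1 } ]
```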

How to create an array containing arrays on MongoDB

I'm trying to make a query to mongodb. I want to get an array containing [location, status] of every document.
This is what my collection looks like:
{
"_id": 1,
"status": "OPEN",
"location": "Costa Rica",
"type": "virtual store"
},
{
"_id": 2,
"status": "CLOSED",
"location": "El Salvador",
"type": "virtual store"
},
{
"_id": 3,
"status": "OPEN",
"location": "Mexico",
"type": "physical store"
},
{
"_id": 4,
"status": "CLOSED",
"location": "Nicaragua",
"type": "physical store"
}
I made a query, using the aggregate framework, trying to get all documents that match that specific type of store.
[
{'$match': {
'type': { '$eq': "physical store"}
}
}
]
What I want is something like this:
{
'stores': [
["Mexico", "OPEN"],
["Nicaragua", "CLOSED"]
]
}
I tried using $push but couldn't make it work.
Could someone please guide me on how to do it?
Since { $push: ["$location", "$status"] } would give you the error "The $push accumulator is a unary operator", you have to work around it by passing $push a single expression that outputs your desired array. One way to do it would be:
[
{
"$match": {
"type": {
"$eq": "physical store"
}
}
},
{
"$group": {
"_id": null,
"stores": {
"$push": {
"$slice": [["$location", "$status"], 2]
}
}
}
}
]
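To make the target shape concrete, here is the same filter-and-collect logic as a plain JavaScript sketch on the sample documents. It only illustrates what the $match and $group stages above produce; the variable names storeDocs and stores are made up for the example.

```javascript
// The four sample documents from the question.
const storeDocs = [
  { _id: 1, status: "OPEN", location: "Costa Rica", type: "virtual store" },
  { _id: 2, status: "CLOSED", location: "El Salvador", type: "virtual store" },
  { _id: 3, status: "OPEN", location: "Mexico", type: "physical store" },
  { _id: 4, status: "CLOSED", location: "Nicaragua", type: "physical store" }
];

const stores = storeDocs
  // same condition as the $match stage
  .filter(d => d.type === "physical store")
  // one [location, status] pair per document, like the value given to $push
  .map(d => [d.location, d.status]);

console.log(JSON.stringify({ stores }));
// {"stores":[["Mexico","OPEN"],["Nicaragua","CLOSED"]]}
```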
If the given documents are not sub-documents, then below is the approach:
db.collection.find({
type: {
$eq: "physical store"
}
},
{
location: 1,
status: 1
})
If they are part of a field (meaning they are sub-documents), then the approach is:
db.collection.aggregate([
{
$project: {
stores: {
$filter: {
input: "$stores",
as: "store",
cond: {
$eq: [
"$$store.type",
"physical store"
]
}
}
}
}
},
{
$unwind: "$stores"
},
{
$project: {
location: "$stores.location",
status: "$stores.status",
_id: "$stores._id"
}
}
])

MongoDB nested query using aggregate function

I have a collection "superpack", which has the nested objects. The sample document looks like below.
{
"_id" : ObjectId("56038c8cca689261baca93eb"),
"name": "Test sub",
"packs": [
{
"id": "55fbc7f6b0ce97a309b3cead",
"name": "Classic",
"packDispVal": "PACK",
"billingPts": [
{
"id": "55fbc7f6b0ce97a309b3ceab",
"name": "Classic 1 month",
"expiryVal": 1,
"amount": 20,
"topUps": [
{
"id": "55fbc7f6b0ce97a309b3cea9",
"name": "1 extra",
"amount": 8
},
{
"id": "55fbc7f6b0ce97a309b3ceaa",
"name": "2 extra",
"amount": 12
}
]
},
{
"id": "55fbc7f6b0ce97a309b3ceac",
"name": "Classic 2 month",
"expiryVal": 1,
"amount": 30,
"topUps": [
{
"id": "55fbc7f6b0ce97a309b3cea8",
"name": "3 extra",
"amount": 16
}
]
}
]
}
]
}
I need to query for the nested topUps object by its id field, and the result should contain only the selected topUp object and its associated parents. I expect output like the following when I query on topUp id 55fbc7f6b0ce97a309b3cea9.
{
"_id" : ObjectId("56038c8cca689261baca93eb"),
"name": "Test sub",
"packs": [
{
"id": "55fbc7f6b0ce97a309b3cead",
"name": "Classic",
"packDispVal": "PACK",
"billingPts": [
{
"id": "55fbc7f6b0ce97a309b3ceab",
"name": "Classic 1 month",
"expiryVal": 1,
"amount": 20,
"topUps": [
{
"id": "55fbc7f6b0ce97a309b3cea9",
"name": "1 extra",
"amount": 8
}
]
}
]
}
]
}
I tried the aggregate query below, but it does not return any results. Can you please help me understand what is wrong with the query?
db.superpack.aggregate( [{ $match: { "id": "55fbc7f6b0ce97a309b3cea9" } }, { $redact: {$cond: { if: { $eq: [ "$id", "55fbc7f6b0ce97a309b3cea9" ] }, "then": "$$KEEP", else: "$$PRUNE" }}} ])
Unfortunately $redact is not a viable option here based on the fact that with the recursive $$DESCEND it is basically looking for a field called "id" at all levels of the document. You cannot possibly ask to do this only at a specific level of embedding as it's all or nothing.
This means you need alternate methods of filtering the content rather than $redact. All "id" values are unique, so there is no problem filtering via "set" operations.
So the most efficient way to do this is via the following:
db.superpack.aggregate([
{ "$match": {
"packs.billingPts.topUps.id": "55fbc7f6b0ce97a309b3cea9"
}},
{ "$project": {
"packs": {
"$setDifference": [
{ "$map": {
"input": "$packs",
"as": "pack",
"in": {
"$let": {
"vars": {
"billingPts": {
"$setDifference": [
{ "$map": {
"input": "$$pack.billingPts",
"as": "billing",
"in": {
"$let": {
"vars": {
"topUps": {
"$setDifference": [
{ "$map": {
"input": "$$billing.topUps",
"as": "topUp",
"in": {
"$cond": [
{ "$eq": [ "$$topUp.id", "55fbc7f6b0ce97a309b3cea9" ] },
"$$topUp",
false
]
}
}},
[false]
]
}
},
"in": {
"$cond": [
{ "$ne": [{ "$size": "$$topUps"}, 0] },
{
"id": "$$billing.id",
"name": "$$billing.name",
"expiryVal": "$$billing.expiryVal",
"amount": "$$billing.amount",
"topUps": "$$topUps"
},
false
]
}
}
}
}},
[false]
]
}
},
"in": {
"$cond": [
{ "$ne": [{ "$size": "$$billingPts"}, 0 ] },
{
"id": "$$pack.id",
"name": "$$pack.name",
"packDispVal": "$$pack.packDispVal",
"billingPts": "$$billingPts"
},
false
]
}
}
}
}},
[false]
]
}
}}
])
After digging down to the innermost array being filtered, the size of each resulting array is tested going outwards, and any element whose filtered array is empty is omitted from the results.
It's a long listing but it is the most efficient way since each array is filtered down first and within each document.
A less efficient way is to pull the arrays apart with $unwind and then $group the results back together:
db.superpack.aggregate([
{ "$match": {
"packs.billingPts.topUps.id": "55fbc7f6b0ce97a309b3cea9"
}},
{ "$unwind": "$packs" },
{ "$unwind": "$packs.billingPts" },
{ "$unwind": "$packs.billingPts.topUps"},
{ "$match": {
"packs.billingPts.topUps.id": "55fbc7f6b0ce97a309b3cea9"
}},
{ "$group": {
"_id": {
"_id": "$_id",
"packs": {
"id": "$packs.id",
"name": "$packs.name",
"packDispVal": "$packs.packDispVal",
"billingPts": {
"id": "$packs.billingPts.id",
"name": "$packs.billingPts.name",
"expiryVal": "$packs.billingPts.expiryVal",
"amount": "$packs.billingPts.amount"
}
}
},
"topUps": { "$push": "$packs.billingPts.topUps" }
}},
{ "$group": {
"_id": {
"_id": "$_id._id",
"packs": {
"id": "$_id.packs.id",
"name": "$_id.packs.name",
"packDispVal": "$_id.packs.packDispVal"
}
},
"billingPts": {
"$push": {
"id": "$_id.packs.billingPts.id",
"name": "$_id.packs.billingPts.name",
"expiryVal": "$_id.packs.billingPts.expiryVal",
"amount": "$_id.packs.billingPts.amount",
"topUps": "$topUps"
}
}
}},
{ "$group": {
"_id": "$_id._id",
"packs": {
"$push": {
"id": "$_id.packs.id",
"name": "$_id.packs.name",
"packDispVal": "$_id.packs.packDispVal",
"billingPts": "$billingPts"
}
}
}}
])
The listing looks a lot simpler, but of course $unwind introduces a lot of overhead here. The process of grouping back basically keeps a copy of everything outside the array level currently being reconstructed, then pushes that content back into the array in the next stage, until you get back to the root _id.
Please note that unless you intend such a search to match more than one document, or you stand to gain significantly from reduced network traffic by trimming the response from a very large document, it is advisable to do neither of these and instead follow much the same design as the first pipeline example, but in client code.
Whilst the first example is still okay performance-wise, it is a mouthful to send to the server, and the same operations are typically written more cleanly in client code that processes and filters the resulting structure.
The aggregation pipelines above both return:
{
"_id" : ObjectId("56038c8cca689261baca93eb"),
"packs" : [
{
"id" : "55fbc7f6b0ce97a309b3cead",
"name" : "Classic",
"packDispVal" : "PACK",
"billingPts" : [
{
"id" : "55fbc7f6b0ce97a309b3ceab",
"name" : "Classic 1 month",
"expiryVal" : 1,
"amount" : 20,
"topUps" : [
{
"id" : "55fbc7f6b0ce97a309b3cea9",
"name" : "1 extra",
"amount" : 8
}
]
}
]
}
]
}
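The client-side filtering suggested above could look roughly like this sketch. filterByTopUpId is a hypothetical helper, and it assumes the document has already been fetched, for example with a find() on "packs.billingPts.topUps.id". Unlike the pipelines, it keeps the top-level name field.

```javascript
// Hypothetical helper: keep only the branch of the document that leads to one topUp id.
function filterByTopUpId(doc, topUpId) {
  const packs = (doc.packs || [])
    .map(pack => {
      const billingPts = (pack.billingPts || [])
        .map(billing => ({
          ...billing,
          topUps: (billing.topUps || []).filter(t => t.id === topUpId)
        }))
        // drop billing points left with no matching topUp
        .filter(billing => billing.topUps.length > 0);
      return { ...pack, billingPts };
    })
    // drop packs left with no billing points
    .filter(pack => pack.billingPts.length > 0);
  return { ...doc, packs };
}

// Sample document from the question (without the ObjectId _id).
const superpackDoc = {
  name: "Test sub",
  packs: [{
    id: "55fbc7f6b0ce97a309b3cead",
    name: "Classic",
    packDispVal: "PACK",
    billingPts: [
      {
        id: "55fbc7f6b0ce97a309b3ceab",
        name: "Classic 1 month",
        expiryVal: 1,
        amount: 20,
        topUps: [
          { id: "55fbc7f6b0ce97a309b3cea9", name: "1 extra", amount: 8 },
          { id: "55fbc7f6b0ce97a309b3ceaa", name: "2 extra", amount: 12 }
        ]
      },
      {
        id: "55fbc7f6b0ce97a309b3ceac",
        name: "Classic 2 month",
        expiryVal: 1,
        amount: 30,
        topUps: [{ id: "55fbc7f6b0ce97a309b3cea8", name: "3 extra", amount: 16 }]
      }
    ]
  }]
};

const filteredDoc = filterByTopUpId(superpackDoc, "55fbc7f6b0ce97a309b3cea9");
console.log(JSON.stringify(filteredDoc, null, 2));
```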

How to get sum of child entries for hierarchical documents?

I have a document of the following form:
{
"name": "root1",
"children": [{
"name": "A",
"children": [{
"name": "A1",
"items": 20
}, {
"name": "A2",
"items": 19
}],
"items": 8
}, {
"name": "B",
"items": 12
}],
"items": 1
}
That is, each level has a "name" field, an "items" field, and optionally a children field. I would like to run a query which returns the total number of items for each root. In this example, it should return (since 20+19+8+12+1=60)
{ "_id" : "root1", "items" : 60 }
However, each document can have arbitrarily many levels. That is, this example has two to three children below the root, but other documents may have more. That is, I cannot do something like
db.myCollection.aggregate( { $unwind : "$children" },
{ $group : { _id : "$name", items: { $sum : "$items" } } } )
What sort of query will work?
There really is no way to descend arrays to arbitrary depths using the aggregation framework. For this sort of structure you need to use mapReduce, where you can do this programmatically:
db.collection.mapReduce(
function () {
var items = 0;
var action = function(current) {
items += current.items;
if ( current.hasOwnProperty("children") ) {
current.children.forEach(function(child) {
action( child );
});
}
};
action( this );
emit( this.name, items );
},
function(){},
{ "out": { "inline": 1 } }
)
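The recursive walk inside the map function can be tried outside the database as plain JavaScript. This is a minimal sketch on the sample document; totalItems is a made-up name, and treating a missing items field as 0 is an assumption that the map function above does not make.

```javascript
// Sum "items" for a node and all of its descendants, at any depth.
function totalItems(node) {
  let total = node.items || 0; // assumption: a missing "items" counts as 0
  for (const child of node.children || []) {
    total += totalItems(child);
  }
  return total;
}

// Sample document from the question.
const root1 = {
  name: "root1",
  children: [
    {
      name: "A",
      children: [
        { name: "A1", items: 20 },
        { name: "A2", items: 19 }
      ],
      items: 8
    },
    { name: "B", items: 12 }
  ],
  items: 1
};

console.log({ _id: root1.name, items: totalItems(root1) });
// { _id: 'root1', items: 60 }
```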
If you do not want mapReduce then consider another structure for your data and do things differently:
{ "name": "root1", "items": 1, "path": [], "root": null },
{ "name": "A", "items": 8, "path": ["root1"], "root": "root1" },
{ "name": "A1", "items": 20, "path": ["root1", "A"], "root": "root1" },
{ "name": "A2", "items": 19, "path": ["root1", "A"], "root": "root1" },
{ "name": "B", "items": 12, "path": ["root1"], "root": "root1" }
Then you just have a simple aggregate:
db.collection.aggregate([
{ "$group": {
"_id": {
"$cond": [
"$root",
"$root",
"$name"
]
},
"items": { "$sum": "$items" }
}}
])
So if you take a different approach to mapping a hierarchy then doing things such as aggregating totals for paths is much easier without the recursive inspection that would otherwise be required.
The approach that you need depends on your actual usage requirements.
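As a rough illustration of why the flattened layout is easier to total, the grouping rule from the $cond above (use root when it is set, otherwise the document's own name) can be sketched in plain JavaScript. sumByRoot is a hypothetical name and the rows are the five flattened documents shown above.

```javascript
// The flattened documents from the answer.
const flatNodes = [
  { name: "root1", items: 1, path: [], root: null },
  { name: "A", items: 8, path: ["root1"], root: "root1" },
  { name: "A1", items: 20, path: ["root1", "A"], root: "root1" },
  { name: "A2", items: 19, path: ["root1", "A"], root: "root1" },
  { name: "B", items: 12, path: ["root1"], root: "root1" }
];

function sumByRoot(rows) {
  const totals = {};
  for (const row of rows) {
    // Same rule as the $cond: group on root if set, else on the node's own name
    const key = row.root || row.name;
    totals[key] = (totals[key] || 0) + row.items;
  }
  return totals;
}

const rootTotals = sumByRoot(flatNodes);
console.log(rootTotals);
// { root1: 60 }
```

No recursion is needed because every row already carries the name of its root.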

Server Side Looping

I’ve solved this problem, but I'm looking for a better way to do it on the MongoDB server rather than on the client.
I have one collection of Orders with a placement datetime (iso date) and a product.
{ _id: 1, datetime: "T1", product: "Apple" }
{ _id: 2, datetime: "T2", product: "Orange" }
{ _id: 3, datetime: "T3", product: "Pear" }
{ _id: 4, datetime: "T4", product: "Pear" }
{ _id: 5, datetime: "T5", product: "Apple" }
Goal: For a given time (or set of times) show the last order for EACH product in the set of my products before that time. Products are finite and known.
eg. query for time T6 will return:
{ _id: 2, datetime: "T2", product: "Orange" }
{ _id: 4, datetime: "T4", product: "Pear" }
{ _id: 5, datetime: "T5", product: "Apple" }
T4 will return:
{ _id: 1, datetime: "T1", product: "Apple" }
{ _id: 2, datetime: "T2", product: "Orange" }
{ _id: 4, datetime: "T4", product: "Pear" }
I’ve implemented this by creating a composite index on orders [datetime: descending, product: ascending].
Then on the java client:
findLastOrdersForTimes(times) {
for (time: times) {
for (product: products) {
db.orders.findOne({ product: product, datetime: { $lt: time } })
}
}
}
Now that is pretty fast, since it hits the index and only fetches the data I need. However, I need to query for many time points (100000+), which means a lot of calls over the network. My orders table will also be very large. So how can I do this on the server in one hit, i.e. return a collection of time -> array of products? If it were Oracle, I'd create a stored procedure with a cursor that loops back in time, collects the results for every time point, and breaks when it gets to the last product after the last time point. I’ve looked at the aggregation framework and mapReduce but can’t see how to achieve this kind of loop. Any pointers?
If you truly want the last order for each product, then the aggregation framework comes in:
db.times.aggregate([
{ "$match": {
"product": { "$in": products },
}},
{ "$group": {
"_id": "$product",
"datetime": { "$max": "$datetime" }
}}
])
Example with an array of products:
var products = ['Apple', 'Orange', 'Pear'];
{ "_id" : "Pear", "datetime" : "T4" }
{ "_id" : "Orange", "datetime" : "T2" }
{ "_id" : "Apple", "datetime" : "T5" }
Or if the _id from the original document is important to you, use the $sort with $last instead:
db.times.aggregate([
{ "$match": {
"product": { "$in": products },
}},
{ "$sort": { "datetime": 1 } },
{ "$group": {
"_id": "$product",
"id": { "$last": "$_id" },
"datetime": { "$last": "$datetime" }
}}
])
And that is what you most likely really want to do in either of those last cases. But the index you really want there is on "product":
db.times.createIndex({ "product": 1 })
So even if you need to add a $match condition with $lt to restrict the results to before a certain time point, that is still the better option. Otherwise, you can modify the "grouping" to include the "datetime" as well, while keeping a set of time points in the $match.
It seems better at any rate, so perhaps this helps at least to modify your thinking.
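Putting those pieces together for a single time point, a sketch might look like the following. buildLastOrdersPipeline is a hypothetical helper, the collection name orders comes from the question (the answer's examples use times), and the strict $lt follows the asker's findOne filter; switch to $lte if the boundary itself should be included.

```javascript
// Hypothetical helper: last order per product before a given time point.
function buildLastOrdersPipeline(products, before) {
  return [
    // Restrict to the known products and to orders before the time point
    { $match: { product: { $in: products }, datetime: { $lt: before } } },
    // Oldest first, so that $last picks the most recent order in each group
    { $sort: { datetime: 1 } },
    {
      $group: {
        _id: "$product",
        id: { $last: "$_id" },
        datetime: { $last: "$datetime" }
      }
    }
  ];
}

const lastOrdersPipeline = buildLastOrdersPipeline(["Apple", "Orange", "Pear"], "T6");

// In the mongo shell (where `db` exists) this runs the pipeline;
// elsewhere the block only builds it.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(lastOrdersPipeline).toArray());
}
```

This still means one call per time point, but each call covers every product instead of issuing one findOne per product.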
If I'm reading your notes correctly, you seem to simply be looking to turn this on its head and find the last product for each point in time. The statement is not much different:
db.times.aggregate([
{ "$match": {
"datetime": { "$in": ["T4","T5"] },
}},
{ "$sort": { "product": 1, "datetime": 1 } },
{ "$group": {
"_id": "$datetime",
"id": { "$last": "$_id" },
"product": { "$last": "$product" }
}}
])
In theory, that is how it works based on how you present the question. I have the feeling, though, that you are abstracting this and that "datetime" actually holds real timestamps stored as date objects.
So you might not be aware of the date aggregation operators you can apply, for example to get the boundary of each hour:
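db.times.aggregate([
// $last is only meaningful on ordered input, so sort by time first
{ "$sort": { "datetime": 1 } },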
{ "$group": {
"_id": {
"year": { "$year": "$datetime" },
"dayOfYear": { "$dayOfYear": "$datetime" },
"hour": { "$hour": "$datetime" }
},
"id": { "$last": "$_id" },
"datetime": { "$last": "$datetime" },
"product": { "$last": "$product" }
}}
])
Or even use date math instead of the date operators to produce an epoch-based timestamp for each hour boundary:
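db.times.aggregate([
// $last is only meaningful on ordered input, so sort by time first
{ "$sort": { "datetime": 1 } },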
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$datetime", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$datetime", new Date("1970-01-01") ] },
1000*60*60
]}
]
},
"id": { "$last": "$_id" },
"datetime": { "$last": "$datetime" },
"product": { "$last": "$product" }
}}
])
Of course you can add a range query for dates in the $match with $gt and $lt operators to keep the data within the range you are particularly looking at.
Your overall solution is probably a combination of ideas, but as I said, your question seems to be about matching the last entries on certain time boundaries, so the last examples, possibly combined with filtering on certain products, are what you need rather than looping .findOne() requests.
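The date math used for the hour boundaries above can be checked in plain JavaScript. This is a small sketch with made-up names (HOUR_MS, hourBucket); it mirrors the nested $subtract and $mod expressions, assuming the value is a Date at or after 1970-01-01.

```javascript
const HOUR_MS = 1000 * 60 * 60;

// Round a Date down to the start of its hour, as milliseconds since the epoch.
function hourBucket(date) {
  const ms = date.getTime();  // like { $subtract: [ "$datetime", new Date("1970-01-01") ] }
  return ms - (ms % HOUR_MS); // remove the part of the hour that has already elapsed
}

const exampleTime = new Date("2014-03-01T10:45:30Z");
console.log(new Date(hourBucket(exampleTime)).toISOString());
// 2014-03-01T10:00:00.000Z
```

Every order placed within the same hour maps to the same bucket value, which is why the expression works as a grouping key.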