Does mongodb have a product equivalent of the aggregate $sum - mongodb

I am trying to calculate cumulative returns for a portfolio of stocks in MongoDB, and ideally would be able to use a cumulative $product accumulator.
e.g. if I have three documents, one with the value 0.5, the next 0.6 and the final 0.7,
I can easily calculate the sum using the aggregate accumulator $sum. This will give 0.5+0.6+0.7.
What I would like to do is calculate the cumulative product ($product) of these values, i.e. 0.5*0.6*0.7. Can this be done directly, or do I have to use logs?
The document structure is something like the following
{
    "date" : ISODate("2015-12-31T15:50:00.000Z"),
    "time" : 1550,
    "aum" : 1000000,
    "basket" : [
        {
            "_id" : "Microsoft",
            "return" : 0.03,
            "shares" : 10,
            "price" : 56.53,
            "fx" : 1.0
        },
        .
        .
        .
        {
            "_id" : "GOOG.N",
            "return" : 0.05,
            "shares" : 20,
            "price" : 759.69,
            "fx" : 1.0
        }
    ]
}

You can use $multiply (aggregation):
> db.stocks.aggregate( {$project: { total: { $multiply: [ 0.5, 0.6, 0.7 ] } }} )
UPDATE:
This will calculate the running product across all documents in the shell:
> var total = 1; db.stocks.find().forEach(function(doc){ total = total * doc.stock; })
> total
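Since there is no $product accumulator, the log route the question mentions does work: because log(a*b*c) = log(a) + log(b) + log(c), you can $sum the logarithms in a $group stage and exponentiate afterwards. A minimal in-memory sketch of the arithmetic (plain JavaScript, not a pipeline):

```javascript
// Sum the logs of the per-document values, then exponentiate:
// this is exactly what a $group with {$sum: {$ln: "$value"}} followed
// by an $exp would compute server-side.
const returns = [0.5, 0.6, 0.7];
const sumOfLogs = returns.reduce((acc, r) => acc + Math.log(r), 0);
const product = Math.exp(sumOfLogs); // ~0.21, i.e. 0.5 * 0.6 * 0.7
```

Note the caveat: logs only work while every factor is strictly positive, which holds for (1 + return) style values but not for raw returns that can be zero or negative.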

You need to use $multiply inside a $project stage; below is a sample query:
db.stocks.aggregate( [{ $project: { total: { $multiply: [ "$price", "$quantity" ] } } }] )

Related

How to improve aggregate pipeline

I have this pipeline:
[
{'$match':{templateId:ObjectId('blabla')}},
{
"$sort" : {
"_id" : 1
}
},
{
"$facet" : {
"paginatedResult" : [
{
"$skip" : 0
},
{
"$limit" : 100
}
],
"totalCount" : [
{
"$count" : "count"
}
]
}
}
]
Index:
"key" : {
"templateId" : 1,
"_id" : 1
}
The collection has 10.6M documents; 500k of them have the needed templateId.
The aggregation uses the index:
"planSummary" : "IXSCAN { templateId: 1, _id: 1 }",
But the request takes 16 seconds. What did I do wrong? How can I speed it up?
For a start, you should get rid of the $sort operator. The documents are already guaranteed to be sorted by _id, because they are read through the { templateId: 1, _id: 1 } index. The current pipeline re-sorts 500k documents that are already in order.
Next, you shouldn't use the $skip approach. For high page numbers you will skip large numbers of documents, up to almost 500k (index entries, strictly speaking, but still).
I suggest an alternative approach:
For the first page, calculate an _id you know for sure falls to the left of everything in the index. Say, if you know that you don't have entries backdated to 2019 or earlier, you can build a boundary like this:
var pageStart = ObjectId.fromDate(new Date("2020/01/01"))
Then, your match operator should look like this:
{'$match' : {templateId:ObjectId('blabla'), _id: {$gt: pageStart}}}
For the next pages, keep track of the last document of the previous page: if the rightmost document _id is x in a certain page, then pageStart should be x for the next page.
So your pipeline may look like this:
[
{'$match' : {templateId:ObjectId('blabla'), _id: {$gt: pageStart}}},
{
"$facet" : {
"paginatedResult" : [
{
"$limit" : 100
}
]
}
}
]
Note that the $skip is now missing from the $facet operator as well.
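The keyset idea above can be sketched in memory (numbers stand in for ObjectIds; the comparison logic is the same):

```javascript
// Keyset (cursor) pagination: instead of skipping N entries, filter on
// the last _id of the previous page and take the next batch.
const docs = Array.from({length: 10}, (_, i) => ({_id: i + 1}));

function nextPage(all, pageStart, limit) {
  // $match: {_id: {$gt: pageStart}} followed by $limit
  return all.filter(d => d._id > pageStart).slice(0, limit);
}

const page1 = nextPage(docs, 0, 3);                           // _id 1..3
const page2 = nextPage(docs, page1[page1.length - 1]._id, 3); // _id 4..6
```

Each page costs the same regardless of how deep it is, because the index seek always starts right after pageStart instead of walking over everything that was skipped.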

Mongo aggregation on array elements

I have a mongo document like
{ "_id" : 12, "location" : [ "Kannur","Hyderabad","Chennai","Bengaluru"] }
{ "_id" : 13, "location" : [ "Hyderabad","Chennai","Mysore","Ballary"] }
From this, how can I get the location aggregation (distinct area count)?
Something like:
Hyderabad 2,
Kannur 1,
Chennai 2,
Bengaluru 1,
Mysore 1,
Ballary 1
Using the aggregation framework you cannot get the exact output that you want. One of its limitations is the inability to turn values into keys in the output document.
For example, Kannur is one of the values of the location field in the input document; in your desired output structure it needs to be the key ("Kannur": 1). That is not possible using aggregation. While this can be achieved using map-reduce, you can get a very closely related and equally useful structure using aggregation:
Unwind the location array.
Group by the location field, getting the count of each individual location with the $sum operator.
Group all the documents once again to get a consolidated array of results.
Code:
db.collection.aggregate([
{$unwind:"$location"},
{$group:{"_id":"$location","count":{$sum:1}}},
{$group:{"_id":null,"location_details":{$push:{"location":"$_id",
"count":"$count"}}}},
{$project:{"_id":0,"location_details":1}}
])
Sample o/p:
{
"location_details" : [
{
"location" : "Ballary",
"count" : 1
},
{
"location" : "Mysore",
"count" : 1
},
{
"location" : "Bengaluru",
"count" : 1
},
{
"location" : "Chennai",
"count" : 2
},
{
"location" : "Hyderabad",
"count" : 2
},
{
"location" : "Kannur",
"count" : 1
}
]
}
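The same unwind/group/count logic can be sketched in plain JavaScript, which makes it easy to see what each stage contributes:

```javascript
// Flatten each location array ($unwind), then count occurrences per
// location ($group with {$sum: 1}).
const docs = [
  {_id: 12, location: ["Kannur", "Hyderabad", "Chennai", "Bengaluru"]},
  {_id: 13, location: ["Hyderabad", "Chennai", "Mysore", "Ballary"]}
];

const counts = {};
for (const doc of docs) {
  for (const loc of doc.location) {       // $unwind
    counts[loc] = (counts[loc] || 0) + 1; // $group with $sum: 1
  }
}
// counts is { Kannur: 1, Hyderabad: 2, Chennai: 2, Bengaluru: 1, ... }
```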

Can I group floating point numbers by range in MongoDB?

I have a MongoDB set up with documents like this
{
"_id" : ObjectId("544ced7b9f40841ab8afec4e"),
"Measurement" : {
"Co2" : 38,
"Humidity" : 90
},
"City" : "Antwerp",
"Datum" : ISODate("2014-10-01T23:13:00.000Z"),
"BikeId" : 26,
"GPS" : {
"Latitude" : 51.20711593206187,
"Longitude" : 4.424424413969158
}
}
Now I try to aggregate them by date and location and also add the average of the measurement to the result. So far my code looks like this:
db.stadsfietsen.aggregate([
{$match: {"Measurement.Co2": {$gt: 0}}},
{
$group: {
_id: {
hour: {$hour: "$Datum"},
Location: {
Long: "$GPS.Longitude",
Lat: "$GPS.Latitude"
}
},
Average: {$avg: "$Measurement.Co2"}
}
},
{$sort: {"_id": 1}},
{$out: "Co2"}
]);
which gives me a nice list of all the possible combinations of hour and GPS coordinates, in this form:
{
"_id" : {
"hour" : 0,
"Location" : {
"Long" : 3.424424413969158,
"Lat" : 51.20711593206187
}
},
"Average" : 82
}
The problem is that there are so many unique coordinates, that it's not useful.
Can I group the documents together when there are values that are close together? Say from Longitude 51.207 to Longitude 51.209?
There is no standard support for ranges in $group.
Mathematically
You could calculate a new value that will be the same for several geolocations. For example you could simulate a floor method:
_id: {
    hour: {$hour: "$Datum"},
    Location: {
        Long: {$subtract: ["$GPS.Longitude", {$mod: ["$GPS.Longitude", 0.01]}]},
        Lat: {$subtract: ["$GPS.Latitude", {$mod: ["$GPS.Latitude", 0.01]}]}
    }
}
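The floor idea can be sketched in plain JavaScript: snapping each coordinate to a 0.01-degree grid makes nearby points compare equal, so they land in the same $group bucket.

```javascript
// floor(v / step) * step is equivalent to v - mod(v, step) for positive
// values; both snap a coordinate down to the nearest grid line.
function bucket(value, step) {
  return Math.floor(value / step) * step;
}

const a = bucket(51.20711593206187, 0.01);
const b = bucket(51.20936741, 0.01);
// a === b: both coordinates fall in the same 51.20 bucket
```

The step size (0.01 degrees here, roughly a kilometre at this latitude) is the knob that trades bucket granularity against the number of distinct groups.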
Geospatial Indexing
You could restructure your application to use a geospatial index and search for all locations in a given range. Whether this is applicable depends very much on your use case.
Map-Reduce
Map-Reduce is more powerful than the aggregation framework. You could definitely use it for this calculation, but it is more complex, so I can't present a ready-made solution here.

MongoDB aggregation framework sort by length of array

Given the following data set:
{ "_id" : ObjectId("510458b188ce1d16e616129b"), "codes" : [ "oxtbyr", "xstute" ], "name" : "Ciao Mambo", "permalink" : "ciaomambo", "visits" : 1 }
{ "_id" : ObjectId("510458b188ce1d16e6161296"), "codes" : [ "zpngwh", "odszfy", "vbvlgr" ], "name" : "Anthony's at Spokane Falls", "permalink" : "anthonysatspokanefalls", "visits" : 0 }
How can I convert this python/pymongo sort into something that will work with the MongoDB aggregation framework? I'm sorting results based on the number of codes within the codes array.
z = [(x['name'], len(x['codes'])) for x in restaurants]
sorted_by_second = sorted(z, key=lambda tup: tup[1], reverse=True)
for x in sorted_by_second:
print x[0], x[1]
This works in python, I just want to know how to accomplish the same goal on the MongoDB query end of things.
> db.z.aggregate({ $unwind: '$codes' },
    { $group: { _id: '$_id', name: { $first: '$name' }, count: { $sum: 1 } } },
    { $sort: { count: -1 } })
Note the descending sort ({ count: -1 }), which matches the reverse=True in the Python version.
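The same goal, sketched in plain JavaScript for comparison with the Python snippet: compute each array's length and sort descending.

```javascript
// Mirror of the pymongo post-processing: pair each name with its codes
// count, then sort largest-first (reverse=True in the Python version).
const restaurants = [
  {name: "Ciao Mambo", codes: ["oxtbyr", "xstute"]},
  {name: "Anthony's at Spokane Falls", codes: ["zpngwh", "odszfy", "vbvlgr"]}
];

const sorted = restaurants
  .map(r => ({name: r.name, count: r.codes.length}))
  .sort((a, b) => b.count - a.count);
// sorted[0] is Anthony's (3 codes), sorted[1] is Ciao Mambo (2 codes)
```

On MongoDB 2.6 and later, the $size operator computes an array's length directly in a $project stage, which avoids the $unwind/$group round-trip entirely.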

Map reduce in mongodb

I have mongo documents in this format.
{"_id" : 1,"Summary" : {...},"Examples" : [{"_id" : 353,"CategoryId" : 4},{"_id" : 239,"CategoryId" : 28}, ... ]}
{"_id" : 2,"Summary" : {...},"Examples" : [{"_id" : 312,"CategoryId" : 2},{"_id" : 121,"CategoryId" : 12}, ... ]}
How can I map/reduce them to get a hash like:
{ categoryId: count_of_examples, ... }
i.e. the count of examples in each category.
I have 30 categories in total, all listed in the Categories collection.
If you can use 2.1 (the dev version of the upcoming 2.2 release), then you can use the Aggregation Framework, and it would look something like this:
db.collection.aggregate( [
{$project:{"CatId":"$Examples.CategoryId","_id":0}},
{$unwind:"$CatId"},
{$group:{_id:"$CatId","num":{$sum:1} } },
{$project:{CategoryId:"$_id",NumberOfExamples:"$num",_id:0 }}
] );
The first step projects the subfield of Examples (CategoryId) into a top-level field of the document (not necessary, but it helps readability). Then we unwind the array of examples, which creates a separate document for each array value of CatId. We do a "group by" and count them (I assume each instance of CategoryId is one example, right?), and last we use projection again to relabel the fields and make the result look like this:
"result" : [
{
"CategoryId" : 12,
"NumberOfExamples" : 1
},
{
"CategoryId" : 2,
"NumberOfExamples" : 1
},
{
"CategoryId" : 28,
"NumberOfExamples" : 1
},
{
"CategoryId" : 4,
"NumberOfExamples" : 1
}
],
"ok" : 1