Can I group floating point numbers by range in MongoDB?

I have a MongoDB set up with documents like this:
{
    "_id" : ObjectId("544ced7b9f40841ab8afec4e"),
    "Measurement" : {
        "Co2" : 38,
        "Humidity" : 90
    },
    "City" : "Antwerp",
    "Datum" : ISODate("2014-10-01T23:13:00.000Z"),
    "BikeId" : 26,
    "GPS" : {
        "Latitude" : 51.20711593206187,
        "Longitude" : 4.424424413969158
    }
}
Now I'm trying to aggregate them by date and location, and also add the average of the measurement to the result. So far my code looks like this:
db.stadsfietsen.aggregate([
    {$match: {"Measurement.Co2": {$gt: 0}}},
    {
        $group: {
            _id: {
                hour: {$hour: "$Datum"},
                Location: {
                    Long: "$GPS.Longitude",
                    Lat: "$GPS.Latitude"
                }
            },
            Average: {$avg: "$Measurement.Co2"}
        }
    },
    {$sort: {"_id": 1}},
    {$out: "Co2"}
]);
which gives me a nice list of all the possible combinations of hour and GPS coordinates, in this form:
{
    "_id" : {
        "hour" : 0,
        "Location" : {
            "Long" : 3.424424413969158,
            "Lat" : 51.20711593206187
        }
    },
    "Average" : 82
}
The problem is that there are so many unique coordinates that the result isn't useful.
Can I group the documents together when their values are close? Say from latitude 51.207 to latitude 51.209?

There is no standard support for ranges in $group.
Mathematically
You could calculate a new value that is the same for several nearby geolocations. For example, you could emulate a floor onto a 0.01-degree grid with $subtract and $mod:
_id: {
    hour: {$hour: "$Datum"},
    Location: {
        Long: {$subtract: ["$GPS.Longitude", {$mod: ["$GPS.Longitude", 0.01]}]},
        Lat: {$subtract: ["$GPS.Latitude", {$mod: ["$GPS.Latitude", 0.01]}]}
    }
}
Geospatial Indexing
You could restructure your application to use a geospatial index and search for all locations within a given range. Whether this is applicable depends very much on your use case.
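For example, if you restructured the documents to store a GeoJSON point in a field such as location (a hypothetical name for this sketch; note GeoJSON puts longitude first), you could index it and query by distance:
// Assumes documents restructured with a GeoJSON point, e.g.:
// "location": {type: "Point", coordinates: [4.424424413969158, 51.20711593206187]}
db.stadsfietsen.createIndex({location: "2dsphere"});

// Find all measurements within ~200 m of a point of interest
db.stadsfietsen.find({
    location: {
        $near: {
            $geometry: {type: "Point", coordinates: [4.4244, 51.2071]},
            $maxDistance: 200  // metres
        }
    }
});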
Map-Reduce
Map-Reduce is more powerful than the aggregation framework. You can definitely use it for these calculations, but it's more complex, so I can't offer a ready-made, tested solution here.
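For orientation only, a rough, untested sketch of the shape it might take, using the same 0.01-degree bucketing as above (caveat: getHours() uses the server's local time, whereas $hour uses UTC):
var map = function() {
    // Bucket GPS coordinates into 0.01-degree cells, emit sum/count pairs
    emit({
        hour: this.Datum.getHours(),
        long: this.GPS.Longitude - (this.GPS.Longitude % 0.01),
        lat: this.GPS.Latitude - (this.GPS.Latitude % 0.01)
    }, {sum: this.Measurement.Co2, count: 1});
};

var reduce = function(key, values) {
    var acc = {sum: 0, count: 0};
    values.forEach(function(v) { acc.sum += v.sum; acc.count += v.count; });
    return acc;
};

var finalize = function(key, value) {
    return value.sum / value.count;  // the average Co2 for this cell
};

db.stadsfietsen.mapReduce(map, reduce, {
    out: "Co2",
    finalize: finalize,
    query: {"Measurement.Co2": {$gt: 0}}
});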

Related

Does mongodb have a product equivalent of the aggregate $sum

I am trying to calculate cumulative returns for a portfolio of stocks in MongoDB, and ideally I would be able to use a cumulative $product accumulator.
E.g. if I have three documents, one with the value 0.5, the next 0.6 and the final one 0.7,
I can easily calculate the sum using the aggregate accumulator $sum. This will give 0.5+0.6+0.7.
What I would like to do is calculate the cumulative product ($product) of these values, i.e. 0.5*0.6*0.7. Can this be done directly or do I have to use logs?
The document structure is something like the following:
{
    "date" : 2015-12-31T15:50:00.000Z,
    "time" : 1550,
    "aum" : 1000000,
    "basket" : [
        {
            "_id" : "Microsoft",
            "return" : 0.03,
            "shares" : 10,
            "price" : 56.53,
            "fx" : 1.0
        },
        ...
        {
            "_id" : "GOOG.N",
            "return" : 0.05,
            "shares" : 20,
            "price" : 759.69,
            "fx" : 1.0
        }
    ]
}
You can use $multiply (aggregation):
> db.stocks.aggregate( {$project: { total: { $multiply: [ 0.5, 0.6, 0.7 ] } }} )
UPDATE:
This will calculate the running product across all documents in the shell:
> var total = 1; db.stocks.find().forEach(function(doc){ total = total * doc.stock; })
> total
You need to use $multiply inside a $project stage; below is a sample query:
db.stocks.aggregate([ { $project: { total: { $multiply: [ "$price", "$quantity" ] } } } ])
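On the "do I have to use logs?" part: there is no $product accumulator in the aggregation framework ($multiply combines values within one document, not across documents), but you can emulate one by summing logarithms and exponentiating. A sketch, assuming MongoDB 3.2+ for $ln/$exp and a hypothetical numeric field value that is strictly positive:
// Emulate a $product accumulator: product(x_i) = exp(sum(ln(x_i))).
// $ln fails on values <= 0, so this only works for strictly positive values;
// for returns around zero you would typically multiply (1 + return) instead.
db.stocks.aggregate([
    {$group: {_id: null, logSum: {$sum: {$ln: "$value"}}}},
    {$project: {_id: 0, product: {$exp: "$logSum"}}}
])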

Optimizing MongoDB Query on GEO fence

I am using MongoDB to store a lot of GPS data (2 million documents with about 1000 GPS points per document). The data looks like the following:
{
    "Data": [
        {
            latitude : XXXXXX,
            longitude : XXXXXX,
            speed : XXXXXX
        }
    ],
    "_id": ID,
    "StartDayOfWeek" : X
}
As you can see, Data is an array of GPS points plus additional information.
I query for documents on specific days of the week (sometimes I query for multiple days, in which case I use $or).
Note: lat/lon are stored as integers.
The following is my query (an example):
db.testNew2.aggregate([
    { $sort : { "StartDayOfWeek" : 1 } },
    { $match : { "$and" : [
        { "StartDayOfWeek" : 1 },
        { "Data" : { $elemMatch :
            { "latitude" : { $gt : 48143743, $lt : 48143843 } } } },
        { "Data" : { $elemMatch :
            { "longitude" : { $gt : 11554706, $lt : 11554806 } } } }
    ] } },
    { $unwind : "$Data" },
    { $match : { "$and" : [
        { "Data.latitude" : { $gt : 48143743, $lt : 48143843 } },
        { "Data.longitude" : { $gt : 11554706, $lt : 11554806 } }
    ] } },
    { $group : {
        "_id" : "$_id",
        "Traces" : { $push : "$Data" }
    } }
])
As you can see, I am filtering out the GPS points that are not within the geofence.
This query works, but it seems very slow and I do not know why. There is an index on StartDayOfWeek and the machine is more than capable (24 GB of RAM and two 7200 rpm SATA drives in RAID 0).
The collection size is about 130 GB and the query takes about 3-5 minutes.
In the Java program that uses this query I also set allowDiskUse, since the results can be larger than 16 MB.
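(For reference, in the shell that option is passed as the second argument to aggregate; a minimal sketch, with pipeline standing in for the stages above:)
var pipeline = [ /* the stages shown above */ ];
db.testNew2.aggregate(pipeline, { allowDiskUse: true })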
Is there any way to optimize this?

how to sort before querying in the embedded document

I know how to sort the embedded document after the find returns results, but how do I sort before the find so that the query itself runs on the sorted array? I know this must be possible with aggregate, but I'd really like to know whether it is possible without that, so that I better understand how it works.
This is my embedded document:
"shipping_charges" : [
{
"region" : "region1",
"weight" : 500,
"rate" : 10
},
{
"region" : "Bangalore HQ",
"weight" : 200,
"rate" : 40
},
{
"region" : "region2",
"weight" : 1500,
"rate" : 110
},
{
"region" : "region3",
"weight" : 100,
"rate" : 50
},
{
"region" : "Bangalore HQ",
"weight" : 100,
"rate" : 150
}
]
This is the query I use to match the 'region' and the 'weight' to get the pricing for that match:
db.clients.find( { "shipping_charges.region" : "Bangalore HQ" , "shipping_charges.weight" : { $gte : 99 } }, { "shipping_charges.$" : 1 } ).pretty()
This query currently returns:
{
    "shipping_charges" : [
        {
            "region" : "Bangalore HQ",
            "weight" : 200,
            "rate" : 40
        }
    ]
}
It presumably returns this element because of the order in which it appears (and matches) in the embedded array.
But I want it to return the element that best matches the closest weight slab (100 grams), which in the sample above is the last one, with rate 150.
What changes are required in my existing query so that the embedded array is sorted before the find runs on it, to get the results I want?
If for any reason you are sure this can't be done without map-reduce, let me know, so that I can stay away from this approach and focus only on map-reduce to get the desired results.
You can use an aggregation pipeline instead of map-reduce:
db.clients.aggregate([
    // Filter the docs to what we're looking for.
    {$match: {
        'shipping_charges.region': 'Bangalore HQ',
        'shipping_charges.weight': {$gte: 99}
    }},
    // Duplicate the docs, once per shipping_charges element
    {$unwind: '$shipping_charges'},
    // Filter again to get the candidate shipping_charges.
    {$match: {
        'shipping_charges.region': 'Bangalore HQ',
        'shipping_charges.weight': {$gte: 99}
    }},
    // Sort those by weight, ascending.
    {$sort: {'shipping_charges.weight': 1}},
    // Regroup and take the first shipping_charge, which will be the one
    // closest to 99 because of the sort.
    {$group: {_id: '$_id', shipping_charges: {$first: '$shipping_charges'}}}
])
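Given the sample shipping_charges array above, this should return something like:
{
    "_id" : ObjectId("..."),
    "shipping_charges" : {
        "region" : "Bangalore HQ",
        "weight" : 100,
        "rate" : 150
    }
}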
You could also use find, but you'd need to pre-sort the shipping_charges array by weight in the documents themselves. You can do that by using a $push update with the $sort modifier:
db.clients.update({}, {
    $push: {shipping_charges: {$each: [], $sort: {weight: 1}}}
}, {multi: true})
After doing that, your existing query will return the right element:
db.clients.find({
    "shipping_charges.region" : "Bangalore HQ",
    "shipping_charges.weight" : { $gte : 99 }
}, { "shipping_charges.$" : 1 })
You would, of course, need to consistently include the $sort modifier on any further updates to your docs' shipping_charges array to ensure it stays sorted.

MongoDB Query with sum

I have a simple document setup:
{
    VG: "East",
    Artikellist: {
        Artikel1: "Sprite",
        Amount1: 1,
        Artikel2: "Fanta",
        Amount2: 3
    }
}
Actually I just want to query these documents to get a list of articles sold in each VG (or maybe town, it doesn't matter). In addition, the query should sum the Amount of each product and return it.
I know I'm thinking in SQL terms, but that's actually the case.
My idea was this:
db.collection.aggregate([{
    $group: {
        _id: {
            VG: "$VG",
            Artikel1: "$Artikellist.Artikel1",
            Artikel2: "$Artikellist.Artikel2",
            $sum: "$Artikellist.Amount1",
            $sum: "$Artikellist.Amount2"
        }
    }
}]);
The hardest point here is that I have 5 different values for VG, and there can be at most 5 Artikel entries, with their amounts, in one list.
So hopefully you can help me here. Sorry for my bad English and my even worse Mongo skills.
If Artikel1 is always "Sprite" and Artikel2 always "Fanta", then you can try this one:
db.test.aggregate({$group : {
    _id : {VG : "$VG", Artikel1 : "$Artikellist.Artikel1", Artikel2 : "$Artikellist.Artikel2"},
    Amount1 : {$sum : "$Artikellist.Amount1"},
    Amount2 : {$sum : "$Artikellist.Amount2"}
}});
If the values of Artikel1 and Artikel2 can vary, I suggest changing the document structure, say to:
{
    VG: "East",
    Artikellist: [
        { Artikel: "Sprite",
          Amount: 1 },
        { Artikel: "Fanta",
          Amount: 3 }
    ]
}
and then use the following approach:
db.test.aggregate(
    {$unwind : "$Artikellist"},
    {$group : {
        _id : {VG : "$VG", Artikel : "$Artikellist.Artikel"},
        Amount : {$sum : "$Artikellist.Amount"}
    }}
)
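Given the restructured sample document above, that produces one result per (VG, Artikel) combination, e.g.:
{ "_id" : { "VG" : "East", "Artikel" : "Sprite" }, "Amount" : 1 }
{ "_id" : { "VG" : "East", "Artikel" : "Fanta" }, "Amount" : 3 }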

Can't find a good index for ipintervals in mongodb

I have a collection called Cities, with city info in it. Each city document has an inner IpIntervals array ({StartNum, EndNum}) that contains the IP intervals for the city. Each interval boundary is calculated using the formula 256*256*256*a + 256*256*b + 256*c + d, where "a.b.c.d" is the IP address. To find a location by IP address I'm using the query:
{IpIntervals: {$elemMatch: {"StartNum": {$lte: <<my_ip_num>>}, "EndNum": {$gte: <<my_ip_num>>}}}}
which works great but it takes about 270 ms, so I want to use some index with it. I've tried different indexes like:
{"IpIntervals.StartNum": 1, "IpIntervals.EndNum": 1}, {"IpIntervals.StartNum": -1, "IpIntervals.EndNum": 1}, {"IpIntervals.StartNum": 1, "IpIntervals.EndNum": -1}, {"IpIntervals.StartNum": 1}
But nothing seems to work: it is always BasicCursor and 270 ms, which is not good. Any ideas about what index is appropriate in this situation?
Thanks.
Sample data:
{
    "_id" : { "$oid" : "51015e8bd246e8e455ee027d" },
    "Name" : "SomeCity",
    "Latitude" : 28.755787,
    "Longitude" : 37.617634,
    "IpIntervals" : [
        { "StartNum" : 2457360384, "EndNum" : 2457360639 },
        { "StartNum" : 2457361408, "EndNum" : 2457362431 },
        { "StartNum" : 2457364480, "EndNum" : 2457366527 },
        { "StartNum" : 2461648896, "EndNum" : 2461650943 }
    ]
}
Finally found a workaround with the query
db.Cities.find({"IpIntervals.StartNum": {$lte: <<my_ip_num>>}}).limit(1).sort({"IpIntervals.StartNum": -1})
and the index {"IpIntervals.StartNum": 1}, which takes about 2 ms.
Since the IP intervals are not overlapping, I can order the StartNums and get the closest one to my_ip_num. Still, I haven't found a good index for the query {IpIntervals: {$elemMatch: {"StartNum": {$lte: <<my_ip_num>>}, "EndNum": {$gte: <<my_ip_num>>}}}}
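A sketch of that workaround end to end (the helper name is hypothetical); note the extra membership check, since StartNum <= my_ip_num alone does not guarantee the IP actually falls inside the matched interval:
// Index supporting the descending StartNum scan
db.Cities.createIndex({"IpIntervals.StartNum": 1});

// Hypothetical helper: resolve a numeric IP to a city document, or null.
function findCityByIpNum(ipNum) {
    var cur = db.Cities.find({"IpIntervals.StartNum": {$lte: ipNum}})
                       .sort({"IpIntervals.StartNum": -1})
                       .limit(1);
    if (!cur.hasNext()) return null;
    var city = cur.next();
    // The IP could land in a gap between intervals, so verify membership.
    var inside = city.IpIntervals.some(function(iv) {
        return iv.StartNum <= ipNum && ipNum <= iv.EndNum;
    });
    return inside ? city : null;
}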