MongoDB query for count based on some value in other collection - mongodb

I have a configuration collection with the following fields:
1) Model
2) Threshold
In the above collection, a threshold value is given for every model, as follows:
'model1' 200
'model2' 400
'model3' 600
There is another collection named customer with the following fields:
1) model
2) customerID
3) baseValue
In the above collection, the data is as follows:
'model1' 'BIXPTL098' 300
'model2' 'BIXPTL448' 350
'model3' 'BIXPTL338' 500
Now I need to get the count of customer records whose baseValue is greater than the threshold defined for that customer's model in the configuration collection.
Example: For the above demo data, the query should return 1, as there is only one customer (BIXPTL098) whose baseValue (300) is greater than the threshold (200) for its model (model1) in configuration.
There are thousands of records in the configuration collection. Any help is appreciated.

How often does the threshold change? If it doesn't change very often, I'd store the difference between the model threshold and the customer baseValue on each customer document:
{
    "model" : "model1",
    "customerID" : "BIXPTL098",
    "baseValue" : 300,
    "delta" : 100 // customer baseValue - model1 threshold = 300 - 200 = 100
}
and query for delta > 0
db.customers.find({ "delta" : { "$gt" : 0 } })
If the threshold changes frequently, the easiest option would be to count the customer documents exceeding their model threshold on a model-by-model basis:
> var mt = db.models.findOne({ "model" : "model1" }).threshold
> db.customers.find({ "model" : "model1", "baseValue" : { "$gt" : mt } })

Related

MongoDB: maximum number of documents in a capped collection

I'm using a capped collection and I defined the max size to be 512000000 (512 MB).
stats() says (after 1 insert): size: 55, storageSize: 16384.
Assuming that all documents are the same size, how many documents can I store?
Is it 512000000 / 55 or 512000000 / 16384?
For a capped collection, it's maxSize / avgObjSize. If your documents are about the same size, then it's practically maxSize / size.
You can verify this using a smaller, more manageable number:
// create a capped collection with maxSize of 1024
> db.createCollection('test', {capped: true, size: 1024})
// insert one document to get an initial size
> db.test.insert({a:0})
> db.test.stats().size
33
// with similar documents, the collection should contain 1024/33 ~= 31 documents
// so let's insert 100 to make sure it's full
> for(i=1; i<100; i++) { db.test.insert({a:i}) }
> db.test.stats()
{
"ns" : "test.test",
"size" : 1023,
"count" : 31,
"avgObjSize" : 33,
"storageSize" : 36864,
"capped" : true,
"max" : -1,
"maxSize" : 1024,
....
So, from the experiment above, the count is 31 as expected, even though we inserted 100 documents.
Using your numbers, the max number of documents in your capped collection would be 512000000 / 55 ~= 9,309,090 documents.
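If you'd rather not do the division by hand, the same estimate can be read straight from stats(). A quick sketch against the test collection above, assuming documents of roughly equal size:
var s = db.test.stats();
// maxSize / avgObjSize ~= number of documents the capped collection can hold
print(Math.floor(s.maxSize / s.avgObjSize)); // ~31 for the 1024-byte example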

How to aggregate OHLC 5 min from 1 min nested array data (mongodb, mongoose)

I have a MongoDB data store with 1 minute OHLCV data like below (time, open, high, low, close, volume), stored using mongoose in Node.js.
{
"_id":1,
"__v":0,
"data":[
[
1516597690510,
885000,
885000,
885000,
885000,
121.2982
],
[
1516597739868,
885000,
885000,
885000,
885000,
121.2982
]
...
]
}
I need to extract data in the same format for 5 minute intervals from this data. I could not find how to do that in MongoDB/mongoose, even after several hours of searching, as I am a newbie. Kindly help. It is especially confusing because it is a nested array and there are no field names inside the arrays.
NOTE: Suppose that for 5 min data you will have 4 samples (arrays) of 1 min data from the database; then:
time : time element of last 1 min data array (of that 5 min interval)
open : first element of first 1 min data array (of that 5 min interval)
high : max of 2nd element in all 1 min data arrays (of that 5 min interval)
low : min of 3rd element in all 1 min data arrays (of that 5 min interval)
close : last of 4th element in all 1 min data arrays (of that 5 min interval)
volume : last element of last array in all 1 min data arrays (of that 5 min interval)
Please check the visual representation here
Idea is to be able to extract 5 min, 10 min, 30 min, 1 hour, 4 hours, 1 day intervals also in the same manner from the base 1 min database.
You need to use the aggregation pipeline for this, comparing the first element of each data array, which is stored as an epoch time. Get the epoch times of your $start and $end interval and use those values in the query:
db.col.aggregate([
    { $project : {
        data : { $filter : {
            input : "$data",
            as : "d",
            cond : { $and : [
                { $lt :  [ { $arrayElemAt : [ "$$d", 0 ] }, "$end" ] },
                { $gte : [ { $arrayElemAt : [ "$$d", 0 ] }, "$start" ] }
            ] }
        } }
    } }
]).pretty()
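Once the data array holds only the arrays of the chosen interval, it can be collapsed into a single OHLCV row in a second $project. This is only a sketch, assuming MongoDB 3.2+ (for $filter, $arrayElemAt and array-valued $max/$min), the [time, open, high, low, close, volume] layout shown in the question, and that $start/$end are replaced with real epoch values:
db.col.aggregate([
    // same $filter stage as above: keep only the 1 min arrays inside [$start, $end)
    { $project : {
        data : { $filter : {
            input : "$data", as : "d",
            cond : { $and : [
                { $gte : [ { $arrayElemAt : [ "$$d", 0 ] }, "$start" ] },
                { $lt :  [ { $arrayElemAt : [ "$$d", 0 ] }, "$end" ] }
            ] }
        } }
    } },
    // collapse the remaining 1 min arrays into one 5 min OHLCV bar
    { $project : {
        bar : {
            time :   { $arrayElemAt : [ { $arrayElemAt : [ "$data", -1 ] }, 0 ] }, // time of last array
            open :   { $arrayElemAt : [ { $arrayElemAt : [ "$data", 0 ] }, 1 ] },  // open of first array
            high :   { $max : { $map : { input : "$data", as : "d", in : { $arrayElemAt : [ "$$d", 2 ] } } } },
            low :    { $min : { $map : { input : "$data", as : "d", in : { $arrayElemAt : [ "$$d", 3 ] } } } },
            close :  { $arrayElemAt : [ { $arrayElemAt : [ "$data", -1 ] }, 4 ] }, // close of last array
            volume : { $arrayElemAt : [ { $arrayElemAt : [ "$data", -1 ] }, 5 ] }  // volume of last array
        }
    } }
])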

How to paginate and group in MongoDB?

My objects are of the following structure:
{id: 1234, ownerId: 1, typeId: 3456, date:...}
{id: 1235, ownerId: 1, typeId: 3456, date:...}
{id: 1236, ownerId: 1, typeId: 12, date:...}
I would like to query the database so that it returns all the items that belong to a given ownerId, but only the first item of a given typeId, i.e. the typeId field is unique in the results. I would also like to be able to use skip and limit.
In SQL the query would be something like:
SELECT * FROM table WHERE ownerId=1 GROUP BY typeId ORDER BY date LIMIT 10 OFFSET 300
I currently have the following query (using pymongo), but it is giving me errors for using $sort, $limit and $skip:
search_dict['ownerId'] = 1
search_dict['$sort'] = {'date': -1}
search_dict['$limit'] = 10
search_dict['$skip'] = 200
collectionName.group(['typeId'], search_dict, {'list': []}, 'function(obj, prev) {prev.list.push(obj)}')
I have also tried the aggregation route, but as I understand it, grouping will touch all the items in the collection, group them, and only then limit and skip. This will be too computationally expensive and slow. I need an iterative grouping algorithm.
search_dict = {'ownerId':1}
collectionName.aggregate([
{
'$match': search_dict
},
{
'$sort': {'date': -1}
},
{
'$group': {'_id': "$typeId"}
},
{
'$skip': skip
},
{
'$limit': 10
}
])
Your aggregation looks correct. You need to include the fields you want in the output in the $group stage using $first.
grouping will touch all the items in the collection, group them, and then limit and skip. This will be too computationally expensive and slow.
It won't touch all items in the collection. If the match + sort is indexed ({ "ownerId" : 1, "date" : -1 }), the index will be used for the match + sort, and the group will only process the documents that are the result of the match.
The constraint is hardly ever cpu, except in cases of unindexed sort. It's usually disk I/O.
I need an iterative grouping algorithm.
What precisely do you mean by "iterative grouping"? The grouping is iterative, as it iterates over the result of the previous stage and checks which group each document belongs to!
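For completeness, here is a sketch of that pipeline in shell syntax. It assumes MongoDB 2.6+ for $$ROOT, uses the field names from the question, and hard-codes 300/10 as stand-ins for the skip/limit values:
db.collectionName.aggregate([
    { $match : { ownerId : 1 } },
    { $sort :  { date : -1 } },
    // keep the newest whole document for each typeId
    { $group : { _id : "$typeId", doc : { $first : "$$ROOT" } } },
    // note: add another $sort here if you need a stable page order across requests
    { $skip :  300 },
    { $limit : 10 }
])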
I am not too sure how you got the idea that this operation should be computationally expensive. It isn't really true for most SQL databases, and it surely isn't for MongoDB. All you need to do is create an index over your sort criterion.
Here is how to prove it:
Open up a mongo shell and have this executed.
var bulk = db.speed.initializeOrderedBulkOp()
for ( var i = 1; i <= 100000; i++ ){
bulk.insert({field1:i,field2:i*i,date:new ISODate()});
if((i%100) == 0){print(i)}
}
bulk.execute();
The bulk execution may take some seconds. Next, we create a helper function:
Array.prototype.avg = function() {
var av = 0;
var cnt = 0;
var len = this.length;
for (var i = 0; i < len; i++) {
var e = +this[i];
if(!e && this[i] !== 0 && this[i] !== '0') e--;
if (this[i] == e) {av += e; cnt++;}
}
return av/cnt;
}
The troupe is ready, the stage is set:
var times = new Array();
for( var i = 0; i < 10000; i++){
var start = new Date();
db.speed.find().sort({date:-1}).skip(Math.random()*100000).limit(10);
times.push(new Date() - start);
}
print(times.avg() + " msecs");
The output is in msecs. This is the output of 5 runs for comparison:
0.1697 msecs
0.1441 msecs
0.1397 msecs
0.1682 msecs
0.1843 msecs
The test server runs inside a Docker image which in turn runs inside a VM (boot2docker) on my 2.13 GHz Intel Core 2 Duo with 4 GB of RAM, running OS X 10.10.2, with a lot of Safari windows, iTunes, Mail, Spotify and Eclipse open additionally. Not quite a production system. And that collection does not even have an index on the date field. With the index, the averages of 5 runs look like this:
0.1399 msecs
0.1431 msecs
0.1339 msecs
0.1441 msecs
0.1767 msecs
qed, hth.

How to implement a List as a value in Mongo DB

My requirement is to store a list of (Location ID + BITMAP) for each client. Example row is as follows:
Key: Client ID
Value: < (Location 1, Bitmap 1), (Location 2, Bitmap 2), ... , (Location N, Bitmap N) >
where
'Bitmap k' contains the history of which dates a client visited location k.
The number of elements in Value could vary from client to client; it could be 0 for some and 100 for others. I'd like to know how this data should be stored in MongoDB such that the following operations could be efficient:
Reset a particular BIT in all the Value pairs for all the rows
Update a particular BIT for some of the Value pairs for a row
An example for query 2 is as follows:
ROW KEY: Client A
ROW VALUE: < (Loc 1, BITMASK 1), (Loc 2, BITMASK 2), (Loc 3, BITMASK 3) >
Query: Update Row with Key = 'Client A' set BIT # 8 for Loc IN (Loc 1, Loc 3)
Ultimately, I'd like to run a map-reduce query which should be able to iterate on each of the row value pairs.
What about something like this?
{
"_id" : ObjectId("51da846d9c34549b45432871"),
"client_id" : "client_a",
"values" : [
{
"location" : "loc_1",
"bitmap" : [
"000...",
"111..."
]
}
]
}
db.collection.find( { client_id: 'client_a', 'values.location': 'loc_1' } )
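For query 2 (setting a bit for only some locations of one client), a filtered positional update can target the matching array elements. A rough sketch with hypothetical values, assuming MongoDB 3.6+ for arrayFilters; it replaces the bitmap value wholesale, since $bit only works on integer-typed fields:
db.collection.updateOne(
    { client_id : "client_a" },
    { $set : { "values.$[elem].bitmap" : "000000010..." } },  // hypothetical new bitmap with bit #8 set
    { arrayFilters : [ { "elem.location" : { $in : [ "loc_1", "loc_3" ] } } ] }
)
// if the bitmap were stored as an integer instead, $bit could set bit 8 in place:
// { $bit : { "values.$[elem].bitmap" : { or : NumberInt(1 << 8) } } }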

Get closest data using centerSphere - MongoDB

I'm trying to get the closest data from the following data
> db.points.insert({name:"Skoda" , pos : { lon : 30, lat : 30 } })
> db.points.insert({name:"Honda" , pos : { lon : -10, lat : -20 } })
> db.points.insert({name:"Skode" , pos : { lon : 10, lat : -20 } })
> db.points.insert({name:"Skoda" , pos : { lon : 60, lat : -10 } })
> db.points.insert({name:"Honda" , pos : { lon : 410, lat : 20 } })
> db.points.ensureIndex({ loc : "2d" })
then I tried
> db.points.find({"loc" : {"$within" : {"$centerSphere" : [[0 , 0],5]}}})
and got this error:
error: {
    "$err" : "Spherical MaxDistance > PI. Are you sure you are using radians?",
    "code" : 13461
}
then I tried
> db.points.find({"loc" : {"$within" : {"$centerSphere" : [[10 , 10],2]}}})
error: {
    "$err" : "Spherical distance would require wrapping, which isn't implemented yet",
    "code" : 13462
}
How do I get this done? I just want to get the closest data based on the given radius from a geo point.
Thanks
A few things to note. Firstly, you are storing your coordinates in a field called "pos" but you are doing a query (and have created an index) on a field called "loc."
The $centerSphere takes a set of coordinates and a value that is in radians. So $centerSphere: [[10, 10], 2] searches for items around [10, 10] in a circle that is 2 * (earth's radius) = 12,756 km. The $centerSphere operator is not designed to search for documents in this large of an area (and wrapping around the poles of the Earth is tricky). Try using a smaller value, such as .005.
Finally, it is probably a better idea to store coordinates as elements in an array since some drivers may not preserve the order of fields in a document (and swapping latitude and longitude results in drastically different geo locations).
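Putting that advice together, here is a sketch of what the corrected setup could look like. It assumes coordinates stored as [lon, lat] arrays in a field named loc with a 2d index, and a radius expressed in radians; the last line uses $near (not mentioned above) purely to show the "closest document" part of the question:
db.points.insert({ name : "Skoda", loc : [ 30, 30 ] })
db.points.ensureIndex({ loc : "2d" })
// ~500 km around [0, 0]: divide the distance by the Earth's radius (~6378.137 km)
db.points.find({ loc : { $geoWithin : { $centerSphere : [ [ 0, 0 ], 500 / 6378.137 ] } } })
// for the single closest document, $near returns results sorted by distance
db.points.find({ loc : { $near : [ 0, 0 ] } }).limit(1)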
Hope this helps:
The radius of the Earth is approximately 3963.192 miles or 6378.137 kilometers.
For 1 mile:
db.places.find({ loc: { $geoWithin: { $centerSphere: [ [ -74, 40.74 ], 1 / 3963.192 ] } } })