Translate SQL query to MongoDB

Hi, I'm a newbie to MongoDB and I need to translate this SQL query to MongoDB using two techniques: first with the MapReduce method and then with the Aggregation method. Can someone help?
select
sum(l_extendedprice*l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
and l_discount between 0.06 - 0.01 and 0.06 + 0.01
and l_quantity < 24;

http://www.mongodb.org/display/DOCS/MapReduce
For your sample, using map/reduce
var m = function () { emit(1, this.l_extendedprice * this.l_discount); };
var r = function (k, vals) {
    var sum = 0;
    for (var i = 0; i < vals.length; i++) {
        sum += vals[i];
    }
    return sum;
}
var res = db.stuff.mapReduce(m, r, {
    out: "stuff_aggr",
    query: {
        "l_shipdate": {$gte: ISODate("1994-01-01T00:00:00.000Z"), $lt: ISODate("1995-01-01T00:00:00.000Z")},
        "l_discount": {$gte: 0.05, $lte: 0.07},
        "l_quantity": {$lt: 24}
    }
});
Aggregation is still a beta feature, so MapReduce is still the better option. I'm assuming you wanted to see whether a complex WHERE clause can be handled easily... It's not that different from SQL as long as you restrict yourself to one collection/table.
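You asked for the aggregation method as well; on a version where the aggregation framework is available, a rough sketch of the same query could look like this (field and collection names taken from your SQL, not tested against your data):
db.lineitem.aggregate([
    { $match: {
        l_shipdate: { $gte: ISODate("1994-01-01T00:00:00Z"), $lt: ISODate("1995-01-01T00:00:00Z") },
        l_discount: { $gte: 0.05, $lte: 0.07 },
        l_quantity: { $lt: 24 }
    }},
    // one output document whose revenue field is SUM(l_extendedprice * l_discount)
    { $group: {
        _id: null,
        revenue: { $sum: { $multiply: ["$l_extendedprice", "$l_discount"] } }
    }}
]);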

Related

AGGrid - Aggregation, Filtering

In agGrid, I have an aggregation that is based on two columns. When I apply a filter, the aggregates based on two columns are not updated. There is no problem with the sum, max and min aggregate functions, as those are all based on a single column.
Here are the steps I follow:
Group by a column --> Value aggregation --> Weighted avg (custom aggregation) --> Add a filter.
When I add a filter after the weighted avg aggregation is calculated, the new weighted avg is not refreshed.
I have this aggregate calculated using a valueGetter function (as in the code below).
Is there a way to update the aggregates that come from a valueGetter function when I modify the filter?
Thanks in advance.
Regards,
Vijay.
function weightedAverageGetter(params) {
    var col = params.column.colId.split('.')[1];
    var value = 0;
    if (params.data) {
        value = params.data.qli ? params.data.qli[col] : 0;
        if (col == 'P1prime__c' || col == 'P2Prime__c') {
            value = Math.ceil(value);
        }
    }
    var avgW = 0;
    if (params.node) {
        var sumVal = 0;
        var sumQty = 0;
        for (var node in params.node.allLeafChildren) {
            var nodeValue = params.node.allLeafChildren[node];
            var val = 0;
            var qty = 0;
            if (nodeValue.data) {
                val = nodeValue.data.qli ? nodeValue.data.qli[col] : 0;
                qty = nodeValue.data.qli ? parseFloat(nodeValue.data.qli.Quantity) : 0;
            }
            if (typeof val !== 'undefined' && val != '' && typeof qty !== 'undefined' && qty != '') {
                sumQty += qty;
                sumVal += qty * val;
            }
        }
        avgW = sumVal / sumQty;
    }
    avgW = Math.round(avgW * 100) / 100;
    if (!params.column.aggregationActive || Number.isNaN(avgW)) {
        avgW = '';
    }
    return params.data ? value : avgW;
}

Is there a way to optimize count with mongoDB

I have an index on id_profile and I do db.myCollection.count({"id_profile":xxx}). It's quite fast if the count is low, but if the count is large it starts being slow. For example, if there are 1,000,000 records matching {"id_profile":xxx}, it can take up to 500 ms to return the count. I think that internally the engine is simply loading all the documents matching {"id_profile":xxx} to count them.
Is there a way to quickly retrieve a count when the filter matches an index exactly? I would like to avoid using a counter collection :(
NOTE: I'm on MongoDB 3.6.3 and this is the script I used:
db.createCollection("following");
db.following.createIndex( {"id_profile": 1}, {unique: false} );
function randInt(n) { return parseInt(Math.random()*n); }
for (var j = 0; j < 10; j++) {
    print("Building op " + j);
    var bulkop = db.following.initializeOrderedBulkOp();
    for (var i = 0; i < 1000000; ++i) {
        bulkop.insert(
            {
                id_profile: NumberLong("-4578128619402503089"),
                id_following: NumberLong(randInt(9223372036854775807))
            }
        );
    }
    print("Executing op " + j);
    bulkop.execute();
}
db.following.count({"id_profile":NumberLong("-4578128619402503089")});
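One way to check whether such a count is answered from the index alone or by fetching documents is to explain it; a sketch, the exact plan output varies by version:
// an index-only count shows a COUNT_SCAN stage in the winning plan / executionStats
db.following.explain("executionStats").count({"id_profile": NumberLong("-4578128619402503089")});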

How to paginate and group in MongoDB?

My objects are of the following structure:
{id: 1234, ownerId: 1, typeId: 3456, date:...}
{id: 1235, ownerId: 1, typeId: 3456, date:...}
{id: 1236, ownerId: 1, typeId: 12, date:...}
I would like to query the database so that it returns all the items that belong to a given ownerId, but only the first item of a given typeId, i.e. the typeId field is unique in the results. I would also like to be able to use skip and limit.
In SQL the query would be something like:
SELECT * FROM table WHERE ownerId=1 SORT BY date GROUP BY typeId LIMIT 10 OFFSET 300
I currently have the following query (using pymongo) but it is giving me errors for using $sort, $limit and $skip:
search_dict['ownerId'] = 1
search_dict['$sort'] = {'date': -1}
search_dict['$limit'] = 10
search_dict['$skip'] = 200
collectionName.group(['typeId'], search_dict, {'list': []}, 'function(obj, prev) {prev.list.push(obj)}')
-
I have also tried the aggregation route, but as I understand it, grouping will touch all the items in the collection, group them, and then limit and skip. This will be too computationally expensive and slow. I need an iterative grouping algorithm.
search_dict = {'ownerId': 1}
collectionName.aggregate([
    {'$match': search_dict},
    {'$sort': {'date': -1}},
    {'$group': {'_id': "$typeId"}},
    {'$skip': skip},
    {'$limit': 10}
])
Your aggregation looks correct. You need to include the fields you want in the output in the $group stage using $first.
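For example, something along these lines in the shell (a sketch with a placeholder collection name; keep whichever fields you actually need, and note that you may want another $sort after the $group if the page order matters):
db.collection.aggregate([
    { $match: { ownerId: 1 } },
    { $sort: { date: -1 } },
    // after the sort, $first picks the values from the most recent document per typeId
    { $group: {
        _id: "$typeId",
        ownerId: { $first: "$ownerId" },
        date: { $first: "$date" }
    }},
    { $skip: 300 },
    { $limit: 10 }
]);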
grouping will touch all the items in the collection, group them, and then limit and skip. This will be too computationally expensive and slow.
It won't touch all items in the collection. If the match + sort is indexed ({ "ownerId" : 1, "date" : -1 }), the index will be used for the match + sort, and the group will only process the documents that are the result of the match.
The constraint is hardly ever CPU, except in cases of an unindexed sort. It's usually disk I/O.
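Creating that index could look like this (a sketch, with a placeholder collection name):
db.collection.createIndex({ ownerId: 1, date: -1 });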
I need an iterative grouping algorithm.
What precisely do you mean by "iterative grouping"? The grouping is iterative, as it iterates over the result of the previous stage and checks which group each document belongs to!
I am not too sure how you got the idea that this operation should be computationally expensive. That isn't really true for most SQL databases, and it certainly isn't for MongoDB. All you need is to create an index over your sort criterion.
Here is how to prove it:
Open up a mongo shell and have this executed.
var bulk = db.speed.initializeOrderedBulkOp();
for (var i = 1; i <= 100000; i++) {
    bulk.insert({ field1: i, field2: i * i, date: new ISODate() });
    if ((i % 100) == 0) { print(i); }
}
bulk.execute();
The bulk execution may take some seconds. Next, we create a helper function:
Array.prototype.avg = function() {
    var av = 0;
    var cnt = 0;
    var len = this.length;
    for (var i = 0; i < len; i++) {
        var e = +this[i];
        if (!e && this[i] !== 0 && this[i] !== '0') e--;
        if (this[i] == e) { av += e; cnt++; }
    }
    return av / cnt;
}
The troupe is ready, the stage is set:
var times = new Array();
for (var i = 0; i < 10000; i++) {
    var start = new Date();
    db.speed.find().sort({ date: -1 }).skip(Math.random() * 100000).limit(10);
    times.push(new Date() - start);
}
print(times.avg() + " msecs");
The output is in msecs. This is the output of 5 runs for comparison:
0.1697 msecs
0.1441 msecs
0.1397 msecs
0.1682 msecs
0.1843 msecs
The test server runs inside a Docker image which in turn runs inside a VM (boot2docker) on my 2.13 GHz Intel Core 2 Duo with 4 GB of RAM, running OS X 10.10.2, with a lot of Safari windows, iTunes, Mail, Spotify and Eclipse open as well. Not quite a production system. And that collection does not even have an index on the date field. With the index, the averages of 5 runs look like this:
0.1399 msecs
0.1431 msecs
0.1339 msecs
0.1441 msecs
0.1767 msecs
qed, hth.
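For reference, the index on the date field used for the second set of timings could be created along these lines (a sketch; the exact spec was not shown above):
db.speed.createIndex({ date: -1 });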

In mongoDB can I efficiently group documents into groups of a set size?

I need to group data into subgroups of a set size. Say there are 6 records, ordered by date:
[1,2,3,4,5,6]
With a subgroup size of 2, I would end up with an array (length 3) of arrays (each of length 2):
[[1,2],[3,4],[5,6]]
Nothing about the records factors into the grouping, just how they are ordered overall and the subgroup size.
Does the aggregation framework have something that would help with this?
The best way to currently do this is with mapReduce:
db.collection.mapReduce(
    function() {
        var result = [];
        var x = 0;
        // take the array two elements at a time
        for ( x = 0; x + 2 <= this.array.length; x += 2 ) {
            result.push( this.array.slice( x, x + 2 ) );
        }
        // if the length is odd, one element is left over
        if ( x < this.array.length )
            result.push( this.array.slice( x ) );
        emit( this._id, result );
    },
    function(){},
    {
        "out": { "inline": 1 }
    }
);
Or basically something along those lines.
The aggregation framework does not do slice type operations well, but JavaScript processes do, especially in this case.
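For instance, with a document like { _id: 1, array: [1, 2, 3, 4, 5, 6] }, the inline output above would carry [[1,2],[3,4],[5,6]] as the value for that _id, matching the example in the question.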

how to write a simple aggregation query in mongo db 2.0.6 with json API

I just started with Mongo and I'm already having issues with querying. I have a collection called 'externalTransaction' and I want to write the equivalent of this MySQL query:
select transactionCode,
       sum(amount) as totalSum,
       count(amount) as totalCount
from externalTransaction
where transactionCode in ('aa','bb','cc')
group by transactionCode
below is my attempt:
{
    "collectionName": "externalTransaction",
    sort: {transactionCode: -1},
    query: {this._id: {$in: ['aa','bb','cc']}},
    mapReduce: {
        'map': 'function(){
            emit(this.transactionCode, this.amount);
        }',
        'reduce': 'function(key, values){
            var result = {count: 0, sum: 0.0};
            values.forEach(function(value) {
                result.count++;
                result.sum += value.amount;
            });
            return result;
        }',
        'out': 'sumAmount'
    }
}
the above query give me a result set looking like this:
_id value.count value.sum
ct 2.0 NaN
bb 40.0 NaN
fg 71.0 NaN
fd 36.0 NaN
sd 5.0 NaN
as 4.0 NaN
aa 71.0 NaN
df 4.0 NaN
cc 10.0 NaN
From the documentation, with version 2.0.6 I can't use the aggregation framework just yet, so how do I handle simple queries like mine in Mongo? Thanks for reading, and excuse the triviality of my question.
You have a few errors in your map and reduce functions. First, in map you emit a plain number, but in reduce you try to read the amount property of that number; I bet it doesn't have that property. Second, the outputs of map and reduce must have the same shape, because reduce must be re-runnable over partially reduced results. Try these functions:
var map = function() {
    emit(this.transactionCode, {sum: this.amount, count: 1});
}
var reduce = function(k, vals) {
    var result = {sum: 0, count: 0};
    vals.forEach(function(v) {
        result.sum += v.sum;
        result.count += v.count;
    });
    return result;
}
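A sketch of how these could be run from the shell, with the WHERE clause from your SQL applied as a query filter (the output collection name 'sumAmount' is taken from your attempt):
db.externalTransaction.mapReduce(map, reduce, {
    query: { transactionCode: { $in: ['aa', 'bb', 'cc'] } },  // where transactionCode in ('aa','bb','cc')
    out: 'sumAmount'
});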