How to optimize this mongo query? - mongodb

I have a simple Mongo query here which uses the aggregate function. It takes 1 min and 38 sec to load 10 rows in my PHP script. How can I optimize this query so it takes less time?
$data = $mongo->command(
    array(
        'aggregate' => "my_collection",
        'pipeline' => array(
            array('$match' => $filter_query),
            array('$group' => array('_id' => '$email')),
            array('$skip' => $offset),
            array('$limit' => 10)
        )
    ),
    array( 'timeout' => "-1" )
);
This is my query, and $filter_query is another array which looks like this:
Array ( [event] => spamreport [timestamp] => Array ( [$gt] => 1384236000 [$lt] => 1384840800 ) )
Also I have one more query to find the distinct values like this
$distinct_notifications = $my_collection->distinct('email',$filter_query);
It is taking forever to load the distinct values. My collection is fairly large, with around 20,700,144 documents, and it is growing day by day. It would also be helpful if someone could point out a good map/reduce tutorial, as I am new to this.
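Not part of the original post, but one hedged suggestion worth checking first: both the aggregation's $match stage and the distinct() call can only avoid a full collection scan if an index covers the filtered fields. With the legacy PHP driver and the $my_collection handle from above, a sketch might look like:
// compound index supporting the $match on event + timestamp; appending email may
// also help the $group/distinct on email (an assumption, verify with explain())
$my_collection->ensureIndex(array('event' => 1, 'timestamp' => 1, 'email' => 1));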

Related

MongoDB-PHP: How to sort the result of find query by timestamp in ascending/descending order?

I want to sort the result of a find query in ascending/descending order with respect to a timestamp (time_created).
My query is:
$mongoResult = $mongoDb->find(array('organization_id' => new MongoId($_SESSION['user_id'])));
Try:
$mongoResult = $mongoDb
    ->find(array('organization_id' => new MongoId($_SESSION['user_id'])))
    ->sort(array("time_created" => -1));
You can also use the _id field to sort by creation time, since the default ObjectId encodes the document's creation timestamp in its leading bytes.
The following query returns the latest 100 records:
$mongoResult = $mongoDb->find()->sort( array("_id" => -1 ))->limit(100);

Mongodb aggregation multiple groups per document

I am using the MongoDB aggregation framework for some analyses, and I am stuck on this particular one.
In English, I am trying to run the following query:
"Find the average 5 day count over a given time range."
So given a dataset that looks like this:
{
    :_id   => BSONID,
    :date  => ISO(DATE),
    :count => 3
},
{
    :_id   => BSONID,
    :date  => ISO(DATE),
    :count => 2
},
{
    :_id   => BSONID,
    :date  => ISO(DATE),
    :count => 9
}
...
I want a result:
[
    {:_id => "2014-145", :five_day_average_count => 3}, # this is the average count from 2014-145 through 2014-149
    {:_id => "2014-146", :five_day_average_count => 6}, # this is the average count from 2014-146 through 2014-150
    {:_id => "2014-147", :five_day_average_count => 4}, # this is the average count from 2014-147 through 2014-151
    ...
]
Where the _id is the $year and $dayOfYear (this can be achieved with $concat).
I am perfectly capable of $group'ing the data into 5-day groups. The difficulty is that a single data point falls into 5 groups at once. Now I COULD run 5 aggregate queries, each with a different start day for the grouping, but I would prefer to run a single query to save time.
I sincerely hope that I am being silly and a simple solution will be posted.
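Not from the original thread, but one way to avoid running five separate aggregations is to fan each document out into every 5-day window it belongs to and then group once. A rough sketch in PHP, assuming a $collection handle, a BSON date field named date as in the sample documents, and MongoDB 2.6+ for $literal; the variable names are illustrative:
$pipeline = array(
    // a $match on the desired date range would normally go first
    // attach the five possible window offsets (0..4 days) to every document
    array('$project' => array(
        'date'    => 1,
        'count'   => 1,
        'offsets' => array('$literal' => array(0, 1, 2, 3, 4)),
    )),
    array('$unwind' => '$offsets'),
    // window start = date minus <offset> days (86,400,000 ms per day);
    // subtracting a number from a date yields a date, so $year/$dayOfYear still apply
    array('$project' => array(
        'count' => 1,
        'windowStart' => array('$subtract' => array(
            '$date',
            array('$multiply' => array('$offsets', 86400000)),
        )),
    )),
    // one group per window start day, averaging the counts that land in that window
    array('$group' => array(
        '_id' => array(
            'year'      => array('$year' => '$windowStart'),
            'dayOfYear' => array('$dayOfYear' => '$windowStart'),
        ),
        'five_day_average_count' => array('$avg' => '$count'),
    )),
    array('$sort' => array('_id' => 1)),
);
$result = $collection->aggregate($pipeline);
Note that this averages per document rather than per calendar day; if there can be several documents per day, you would first $group to daily totals and then apply the same fan-out. The "2014-145" style string key from the question can be built from the year/dayOfYear pair in the application.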

Subqueries in DBIx::Class

I have spent too much time on this, and still cannot get the syntax to work.
Is this select statement possible in DBIx::Class?
"SELECT A.id, A.name, count(C.a_id) AS count1,
(SELECT count(B.id FROM A, B WHERE A.id=B.a_id GROUP BY B.a_id, A.id) AS count2
FROM A LEFT OUTER JOIN C on A.id=C.a_id GROUP BY C.a_id, A.id"
This code below works in DBIx::Class to pull the count for table 'C', but multiple efforts of mine to add in the count for table 'B' have repeatedly failed:
my $data = $c->model('DB::Model')->search(
    {},
    {
        join      => 'C',
        join_type => 'LEFT_OUTER',
        distinct  => 1,
        'select'  => [ 'me.id', 'name', { count => 'C.id', -as => 'count1' } ],
        'as'      => [qw/ id name count1 /],
        group_by  => [ 'C.a_id', 'me.id' ],
    }
)->all();
I am trying to get both counts in one query so that the results end up in a single data structure. On another forum it was suggested that I make two separate search calls and then union the results, but the DBIx::Class documentation mentions that 'union' is being deprecated. Using DBIx::Class's 'literal' SQL doesn't work either, because it is only meant to be used in a where clause. I do not want to use a view (another suggestion) because the SQL will eventually be expanded to match on one of the ids. How do I format this query to work in DBIx::Class? Thank you.
DBIx::Class supports subqueries pretty conveniently by using the as_query method on your subquery resultset. There are some examples in the cookbook. For your use case, it'd look something like:
# your subquery goes in its own resultset object
my $subq_rs = $schema->resultset('A')->search(undef, {
    join     => 'B',
    group_by => [qw/ me.id B.id /],   # 'me' is the alias DBIx::Class gives the A resultset
})->count_rs;
# the subquery is attached to the parent query with ->as_query
my $combo_rs = $schema->resultset('A')->search(undef, {
    join     => 'C',                  # needed so the count over C.id has something to count
    select   => [ qw/ me.id me.name /, { count => 'C.id' }, $subq_rs->as_query ],
    as       => [qw/ id name c_count b_count /],
    group_by => [qw/ C.a_id me.id /],
});

how to do multiple where with same field?

How do I convert this SQL query to MongoDB?
select * from table where id > -1 and id in (4,5)
I need to execute this query with two where conditions on the same field.
This doesn't seem to work in PHP:
$db->$col->find(
    array(
        'id' => array( '$gt' => -1 ),
        'id' => array( '$in' => array( 4, 5 ) )
    )
);
Basically, the problem is that you have a duplicate key in your array, so the latter value overwrites the former and your query ends up doing only the $in.
Most people look to $and to solve this problem; however, very few know that:
'id'=>array('$gt'=>-1, '$in'=>array(4,5))
will also solve it; you can simply chain all the constraints for that field in one array.
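For completeness, a minimal sketch of the combined call, assuming the same $db and $col variables from the question:
$cursor = $db->$col->find(
    array(
        // both constraints live under the single 'id' key, so neither overwrites the other
        'id' => array( '$gt' => -1, '$in' => array( 4, 5 ) ),
    )
);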

Proper indexing and sorting with MongoDB

I have a relatively complex query that I'm running. I've included it below, but essentially I have a location query combined with an $or and $in query, all to be sorted by date/time. Whenever I run the full query, though, it returns results in ascending order rather than descending.
If I pull out the location part of the query, it successfully returns the results in the proper order, but once I put the location component of the query back in, it continues to fail.
Here's the query as written in php:
$query = array(
    '$or' => array(
        array( ISPUBLIC => true ),
        array( GROUP_INVITES.".".TO.".".INVITE_VAL => array( '$in' => $user_groups ) )
    ),
    LOC => array( '$near' => array( (float)$lon, (float)$lat ), '$maxDistance' => 1 ) // Using 1 degree for the max distance for now
);
$res = $this->db->find( $query )->sort(array('ts' => -1))->limit($limit)->skip($start);
The fields used are constants that map to fields in the database. My first assumption was that the indexes are not configured properly, but how should I have this indexed when combining geospatial queries with others?
UPDATE
I've been able to replicate the problem by simplifying the query down to:
$query = array(
    HOLLER_GROUP_INVITES.".".HOLLER_TO.".".INVITE_VAL => array( '$in' => $user_groups ),
    HOLLER_LOC => array( '$near' => array( (float)$lon, (float)$lat ), '$maxDistance' => 1 )
);
UPDATE 2
I've run additional tests, and it appears that when the limit is removed the results are sorted in the correct direction. I have no idea what's going on here.
This question appears to have the answer: Sorting MongoDB GeoNear results by something other than distance? ... apparently $near already performs a sort of its own. If you simply want to limit results to a given boundary but sort by something else, you should use $within instead of $near.
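A hedged sketch of that suggestion (not from the original answer), swapping the $near clause for a $within/$center bound so the sort on ts takes effect; it reuses the question's variables and assumes legacy coordinate pairs with a 2d index:
$query = array(
    LOC => array(
        '$within' => array(
            // center point plus radius, both in degrees, matching the 1-degree $maxDistance above
            '$center' => array( array( (float)$lon, (float)$lat ), 1 )
        )
    )
);
$res = $this->db->find( $query )->sort(array('ts' => -1))->limit($limit)->skip($start);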