I am using the MongoDB aggregation framework for some analyses and I am stuck on this particular one.
In English, I am trying to run the following query:
"Find the average 5 day count over a given time range."
So given a dataset that looks like this:
{
  :_id => BSONID,
  :date => ISO(DATE),
  :count => 3
},
{
  :_id => BSONID,
  :date => ISO(DATE),
  :count => 2
},
{
  :_id => BSONID,
  :date => ISO(DATE),
  :count => 9
}
...
I want a result:
[
  {:_id => "2014-145", :five_day_average_count => 3}, # the average count from 2014-145 through 2014-149
  {:_id => "2014-146", :five_day_average_count => 6}, # the average count from 2014-146 through 2014-150
  {:_id => "2014-147", :five_day_average_count => 4}, # the average count from 2014-147 through 2014-151
  ...
]
Where the _id is the $year and $dayOfYear (this can be achieved with $concat).
I am perfectly capable of $group'ing the data into groups of 5 days each. The difficulty is that a single data point needs to fall into 5 overlapping groups at once. I could run 5 aggregate queries, each with a different start day for the grouping, but I would prefer to run a single query to save time.
I sincerely hope that I am being silly and a simple solution will be posted.
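For illustration, here is how the desired result could be computed client-side in plain JavaScript. This is a hypothetical sketch, not the aggregation pipeline itself, and it assumes one document per consecutive day with `date` and `count` fields:

```javascript
// Hypothetical client-side sketch: compute the 5-day rolling average
// from documents shaped like {date, count}, assuming one document per day.
function fiveDayAverages(docs) {
  // Sort chronologically so each window covers days [i .. i+4].
  const sorted = [...docs].sort((a, b) => a.date - b.date);
  const out = [];
  for (let i = 0; i + 5 <= sorted.length; i++) {
    const window = sorted.slice(i, i + 5);
    const sum = window.reduce((acc, d) => acc + d.count, 0);
    const start = window[0].date;
    // Build a "YYYY-dayOfYear" key, like $concat of $year and $dayOfYear would.
    const dayOfYear = Math.floor(
      (start - new Date(Date.UTC(start.getUTCFullYear(), 0, 0))) / 86400000
    );
    out.push({
      _id: `${start.getUTCFullYear()}-${dayOfYear}`,
      five_day_average_count: sum / 5
    });
  }
  return out;
}
```

The key point the sketch makes explicit is the overlap: each document contributes to five different windows, which is exactly what a single $group stage cannot express directly.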
Related
I have many records that have a datetime field:
MyTable(ID, StartDate, ...)
Given a startDate parameter, I would like to get all the records whose StartDate is >= the parameter, and also the record whose ID is one less than the ID of the first matching record (the first record when the results are ordered by StartDate).
Something like that:
dbContext.MyTable.Where(x => x.ID >= dbContext.MyTable.OrderBy(y => y.StartDate).Where(y => y.StartDate >= myDate).First()).ToList();
But I get an error because I can't use First() in this position. Even if I could, First() would execute a query against the database immediately, which I want to avoid: I am constructing a dynamic query and only want one trip to the database.
So I would like to know if it is possible to use the first element of a subquery result as a condition.
Thanks.
You can use Take(1) as a replacement for First():
dbContext.MyTable
.Where(x =>
dbContext.MyTable
.OrderBy(y => y.StartDate)
.Where(y => y.StartDate >= myDate)
.Take(1)
.Any(y => x.ID >= y.ID))
.ToList();
I have a simple Mongo query here which uses the aggregation framework. It takes 1 min 38 sec to load 10 rows in my PHP script. How can I optimize this query so it takes less time?
$data = $mongo->command(
array(
'aggregate' =>"my_collection",
'pipeline' => array(
array('$match' => $filter_query),
array('$group' => array('_id'=>'$email')),
array('$skip'=>$offset),
array('$limit'=>10)
)
),
array( 'timeout' => -1 )
);
This is my query and the filter_query is another array which is
Array ( [event] => spamreport [timestamp] => Array ( [$gt] => 1384236000 [$lt] => 1384840800 ) )
Also, I have one more query to find distinct values:
$distinct_notifications = $my_collection->distinct('email',$filter_query);
It is taking forever to load the distinct values. My collection is fairly large, with around 20,700,144 documents, and it is growing day by day. It would also be helpful if someone could point out a good map/reduce tutorial, as I am new to this.
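For clarity, the pipeline above is essentially a filtered distinct on email. Here is an in-memory sketch of what it computes (hypothetical and for illustration only; the usual fix for the slowness is a compound index such as { event: 1, timestamp: 1 } so the $match stage can avoid a full collection scan):

```javascript
// Illustrative sketch of what the $match + $group-on-email pipeline computes:
// filter by event and timestamp range, then collect the distinct emails.
function distinctEmails(docs, event, tsGt, tsLt) {
  const seen = new Set();
  for (const doc of docs) {
    if (doc.event === event && doc.timestamp > tsGt && doc.timestamp < tsLt) {
      seen.add(doc.email);
    }
  }
  return [...seen];
}
```

On 20 million documents, doing this without an index means the server touches every document, which matches the timings described.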
Hi, I'm using Mongoid (MongoDB) to do a greater-than criteria:
Account.where(:field1.gt => 10)
But I was wondering if it is possible to write a criteria where the sum of two fields is greater than some number. Maybe something like this (but it doesn't seem to work):
Account.where(:'field1 + field2'.gt => 10)
Maybe some embedded javascript is needed? Thanks!
I'd recommend using the Mongoid 3 syntax as suggested by Piotr, but if you want to make this much more performant, at the expense of some storage overhead, you could try something like this:
class Account
# ...
field :field1, :type => Integer
field :field2, :type => Integer
field :field3, :type => Integer, :default => -> { field1 + field2 }
index({ field3: 1 }, { name: "field3" })
before_save :update_field3
private
def update_field3
self.field3 = field1 + field2
end
end
Then your query would look more like:
Account.where(:field3.gte => 10)
Notice the callback that updates field3 when the document changes. I've also added an index for it.
You can use MongoDB's javascript query syntax.
So you can do something like:
Account.collection.find("$where" => '(this.field1 + this.field2) > 10')
Or in Mongoid 3 the following will work as well
Account.where('(this.field1 + this.field2) > 10')
As Sammaye mentioned in the comment, this introduces a performance penalty, since the JavaScript has to be executed for every document individually. If you don't run that query often, it's fine. But if you do, I would recommend adding another field that stores the sum of field1 and field2 and basing the query on that field.
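To see why this is slow, here is roughly what $where does, sketched in plain JavaScript: the predicate is evaluated once per candidate document with `this` bound to the document, so no index can be used:

```javascript
// Conceptual sketch of $where: evaluate a JavaScript predicate against
// every candidate document. This per-document evaluation is why $where
// queries cannot be satisfied from an index.
const predicate = function () {
  return (this.field1 + this.field2) > 10;
};

function whereFilter(docs, pred) {
  // `pred.call(doc)` binds `this` to each document, as the server does.
  return docs.filter(doc => pred.call(doc));
}
```

With a precomputed sum field instead, the query becomes an ordinary indexable range comparison.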
I have a relatively complex query that I'm running. I've included it below, but essentially I have a location query combined with an $or and an $in query, all to be sorted by date/time. Whenever I run the full query, though, it returns results in ascending order rather than descending.
If I pull the location part out of the query, the results come back in the proper order, but once I put the location component back in, it fails again.
Here's the query as written in php:
$query = array(
'$or' => array(
array( ISPUBLIC => true ),
array( GROUP_INVITES.".".TO.".".INVITE_VAL => array( '$in' => $user_groups ) )
),
LOC => array( '$near' => array( (float)$lon , (float)$lat ), '$maxDistance' => 1 ) //Using 1 degree for the max distance for now
);
$res = $this->db->find( $query )->sort(array('ts'=>-1))->limit($limit)->skip($start);
The fields used are constants that map to fields in the database. My first assumption was that the indexes are not configured properly, but how should I have this indexed when combining geospatial queries with others?
UPDATE
I've been able to replicate the problem by simplifying the query down to:
$query = array(
HOLLER_GROUP_INVITES.".".HOLLER_TO.".".INVITE_VAL => array( '$in' => $user_groups ),
HOLLER_LOC => array( '$near' => array( (float)$lon , (float)$lat ), '$maxDistance' => 1 )
);
UPDATE 2
I've run additional tests, and it appears that when the limit is removed, the results are sorted in the correct direction. No idea what's going on here.
This question appears to have the answer: Sorting MongoDB GeoNear results by something other than distance? .... apparently $near already performs a sort of its own. If you simply want to limit results to a given boundary but sort by something else, use $within instead of $near.
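As an illustration of the difference, here is the $within-then-sort behavior sketched in memory (a hypothetical box filter; the real query would use $within with $box server-side). Unlike $near, a box filter imposes no distance ordering of its own, so the timestamp sort is free to take effect:

```javascript
// In-memory sketch of "filter by a bounding box, then sort by timestamp
// descending". Documents are shaped like {loc: [lon, lat], ts: <number>}.
function withinBoxSortedByTs(docs, [minLon, minLat], [maxLon, maxLat]) {
  return docs
    .filter(d =>
      d.loc[0] >= minLon && d.loc[0] <= maxLon &&
      d.loc[1] >= minLat && d.loc[1] <= maxLat)
    .sort((a, b) => b.ts - a.ts); // newest first
}
```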
I've created a MongoDB aggregation query that gives me the sum of data for the current month, but how could I modify it to return an array with the total for each day of the month? (I assume this is possible, but I'm finding it difficult to get it working.) If it isn't possible, is there a better way of doing this than using a loop and running 30 group queries?
I'm using the PHP driver, but an answer in shell is just as useful.
$total_this_month = $db->test->group(
array( ),
array(
'sum' => 0
),
new MongoCode( 'function(doc, out){ out.sum += doc.data; }' ),
array(
'condition' => array(
'time' => array(
'$gte' => new MongoDate(strtotime('first day of this month, 00:00:00')),
'$lte' => new MongoDate(strtotime('last day of this month, 23:59:59'))
)
)
)
);
If you plan on running your group query often you should consider adding a new field or fields that allow you to group by the time period you need. The previous answer of using map-reduce is a great one if this is an ad-hoc query that doesn't need performance tuning. For example, I have a collection that needs to be aggregated sometimes by day, week, month, etc. Here is an example record:
{"_id" : ObjectId("4ddaed3a8b0f766963000003"),
"name": "Sample Data",
"time" : "Mon May 23 2011 17:26:50 GMT-0600 (MDT)",
"period" : {
"m" : 201105,
"w" : 201121,
"d" : 20110523,
"h" : 2011052317
}
}
With these additional fields I can do a lot more with the group function, and I can also index them for faster queries. You can use integers, as I did, or strings; either way works, but remember that your query parameters need to be of the same data type. I like integers because it seems they should perform a little better and use less space (just a hunch).
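A hypothetical helper for precomputing those period keys might look like this in JavaScript. Note that the week key here simply counts weeks from January 1; your data may use a different week-numbering convention, so adjust accordingly:

```javascript
// Hypothetical helper that precomputes the period keys shown in the
// sample record: month, week, day, and hour as sortable integers.
function periodKeys(date) {
  const y = date.getUTCFullYear();
  const m = date.getUTCMonth() + 1;
  const d = date.getUTCDate();
  const h = date.getUTCHours();
  // Day of year, counting from Jan 1 = 1 (UTC).
  const dayOfYear = Math.floor(
    (date - new Date(Date.UTC(y, 0, 0))) / 86400000
  );
  // Simple week-of-year; swap in your own convention if needed.
  const week = Math.ceil(dayOfYear / 7);
  const pad = (n, w) => String(n).padStart(w, '0');
  return {
    m: Number(`${y}${pad(m, 2)}`),                      // e.g. 201105
    w: Number(`${y}${pad(week, 2)}`),                   // e.g. 201121
    d: Number(`${y}${pad(m, 2)}${pad(d, 2)}`),          // e.g. 20110523
    h: Number(`${y}${pad(m, 2)}${pad(d, 2)}${pad(h, 2)}`) // e.g. 2011052317
  };
}
```

You would compute these once at insert time, so grouping by any period becomes a plain equality or range match on an indexed integer.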
The group command doesn't support generating new grouping keys via a function, so use map/reduce instead of group.
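For illustration, here is the shape of that map/reduce sketched in memory. On the server, the map function would call emit(dayKey, this.data) and the reduce function would sum the emitted values; this is a hypothetical sketch of that logic, not the actual mapReduce command:

```javascript
// Minimal in-memory sketch of the map/reduce shape for per-day totals.
// map: emit a per-day key with the document's data value.
// reduce: sum all values sharing a key.
function mapReduceDailyTotals(docs) {
  const totals = new Map();
  for (const doc of docs) {
    const t = doc.time;
    // map step: derive the grouping key from the date.
    const key = `${t.getUTCFullYear()}-${t.getUTCMonth() + 1}-${t.getUTCDate()}`;
    // reduce step: accumulate the sum per key.
    totals.set(key, (totals.get(key) || 0) + doc.data);
  }
  return totals;
}
```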