Aggregate MongoDB results by ObjectId date - mongodb

How can I aggregate my MongoDB results by ObjectId date? Example:
Default cursor results:
cursor = [
{'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
{'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'},
{'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'},
]
Projected results:
projected_cursor = [
    {'2013-09-08': [
        {'_id': ObjectId('5220b974a61ad0000746c0d0'), 'content': 'Foo'},
        {'_id': ObjectId('521f541d4ce02a000752763a'), 'content': 'Bar'}
    ]},
    {'2013-09-07': [
        {'_id': ObjectId('521ef350d24a9b00077090a5'), 'content': 'Baz'}
    ]}
]
This is what I'm currently using in PyMongo to achieve these results, but it's messy and I'd like to see how I can do it using MongoDB's aggregation framework (or even MapReduce):
cursor = db.find({}, limit=10).sort("_id", pymongo.DESCENDING)
messages = [x for x in cursor]
this_date = lambda x: x['_id'].generation_time.date()
dates = set([this_date(message) for message in messages])
dates_dict = {date: [m for m in messages if this_date(m) == date] for date in dates}
And yes, I know that the easiest way would be to simply add a new date field to each record then aggregate by that, but that's not what I want to do right now.
Thanks!

Update: There is a built in way to do this now, see https://stackoverflow.com/a/51766657/295687
There is no way to accomplish what you're asking with MongoDB's aggregation framework, because there is no aggregation operator that can turn ObjectIds into something date-like (there is a JIRA ticket, though). You should be able to accomplish what you want using map-reduce, however:
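// map function
function domap() {
    // turn ObjectId --> ISODate
    var date = this._id.getTimestamp();
    // format the date however you want (getMonth() is zero-based, so add 1)
    var year = date.getFullYear();
    var month = date.getMonth() + 1;
    var day = date.getDate();
    // yields date string as key, the document wrapped in a list as value
    emit(year + "-" + month + "-" + day, {docs: [this]});
}
// reduce function: it must return the same shape that map emits,
// because MongoDB may call reduce again on partial results
function doreduce(datestring, values) {
    var docs = [];
    values.forEach(function (value) {
        docs = docs.concat(value.docs);
    });
    return {docs: docs};
}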
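For comparison, the same date-keyed grouping can be sketched client-side in Python without any driver, relying only on the fact that the first 4 bytes of an ObjectId are a Unix timestamp in seconds. The helper names, and the plain hex strings standing in for ObjectId values, are assumptions made for illustration:

```python
from collections import defaultdict
from datetime import datetime, timezone


def objectid_date(oid_hex):
    """Return the UTC creation date encoded in a 24-character ObjectId hex string."""
    seconds = int(oid_hex[:8], 16)  # first 4 bytes = seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()


def group_by_objectid_date(docs):
    """Group documents (dicts with a hex string '_id') by their creation date."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[objectid_date(doc["_id"]).isoformat()].append(doc)
    return dict(grouped)


docs = [
    {"_id": "5220b974a61ad0000746c0d0", "content": "Foo"},
    {"_id": "521f541d4ce02a000752763a", "content": "Bar"},
    {"_id": "521ef350d24a9b00077090a5", "content": "Baz"},
]
by_date = group_by_objectid_date(docs)
```

The keys are ISO date strings in UTC; grouping in another time zone would need an explicit conversion before calling `.date()`.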

The JIRA ticket pointed out by llovett has been resolved, so you can now use date operators such as $isoWeek and $year to extract this information from an ObjectId.
Your aggregation would look something like this:
{
    "$project": {
        "_id": {
            "$dateFromParts": {
                "year": { "$year": "$_id" },
                "month": { "$month": "$_id" },
                "day": { "$dayOfMonth": "$_id" }
            }
        }
    }
}
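To go from extracting the date to actually grouping by it, a `$group` stage can be used. Below is a minimal PyMongo-style sketch, assuming MongoDB 4.0 or later (for `$toDate`); the collection handle `db.messages` in the usage comment and the field name `docs` are hypothetical:

```python
# Aggregation pipeline expressed as plain Python data (usable with PyMongo).
pipeline = [
    {
        "$group": {
            # Convert the ObjectId to a date, then format it as YYYY-MM-DD.
            "_id": {
                "$dateToString": {
                    "format": "%Y-%m-%d",
                    "date": {"$toDate": "$_id"},
                }
            },
            # Collect every full document that falls on that day.
            "docs": {"$push": "$$ROOT"},
        }
    },
    {"$sort": {"_id": -1}},  # most recent day first
]

# Hypothetical usage, assuming `db.messages` is a PyMongo collection:
# for day in db.messages.aggregate(pipeline):
#     print(day["_id"], len(day["docs"]))
```

Pushing whole documents into one group can get large, so in practice a `$match` or `$limit` stage would probably come first.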

So this doesn't answer my question directly, but I did find a better way to replace all that lambda nonsense above using Python's setdefault:
d = {}
for message in messages:
    key = message['_id'].generation_time.date()
    d.setdefault(key, []).append(message)
Thanks to @raymondh for the hint in his PyCon talk:
Transforming Code into Beautiful, Idiomatic Python

Related

How to update ISODate to two days later

How do I convert a SQL query to a MongoDB query?
This is my query in SQL:
UPDATE user
SET expireIn = DATEADD(DAY, 2, expireIn)
WHERE phone = '123434574'
I want to add some days to the expireIn field.
The expireIn field is an ISODate and also has a time component.
Welcome Mostafa Asadi,
You can do something like this:
db.collection.update(
    { phone: "123434574" },
    [
        {
            $set: {
                "expireIn": {
                    $dateAdd: {
                        startDate: "$expireIn",
                        unit: "day",
                        amount: 2
                    }
                }
            }
        }
    ],
    { multi: true }
)
As you can see on the playground.
The first {} is the filter, which selects the documents you want to update. The second argument is the update itself, written inside [] because it is an aggregation pipeline, using the $dateAdd operator (available from MongoDB 5.0).
Edit: added {multi: true} so that multiple documents are updated.
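The same update can be sketched from Python. This is only an illustration, assuming PyMongo against MongoDB 5.0 or later and a hypothetical collection handle `db.user`; the `add_days` helper is not part of any library and just shows locally what adding two days does to a value that carries a time of day:

```python
from datetime import datetime, timedelta

# Filter and update pipeline as plain Python data (PyMongo style).
filter_doc = {"phone": "123434574"}
update_pipeline = [
    {
        "$set": {
            "expireIn": {
                "$dateAdd": {"startDate": "$expireIn", "unit": "day", "amount": 2}
            }
        }
    }
]

# Hypothetical usage, assuming `db.user` is a PyMongo collection:
# db.user.update_many(filter_doc, update_pipeline)


def add_days(value, days):
    """Local illustration: shift a datetime by whole days, keeping the time of day."""
    return value + timedelta(days=days)


shifted = add_days(datetime(2021, 6, 1, 13, 30), 2)
```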

Compute Simple Moving Average in Mongo Shell

I am developing a financial application with Node.js. Is it possible to compute a simple moving average, i.e. the average price over the last N days, directly in the mongo shell rather than reading the data and computing it in Node.js?
Sample documents:
[{code:'0001',price:0.10,date:'2014-07-04T00:00:00.000Z'},
{code:'0001',price:0.12,date:'2014-07-05T00:00:00.000Z'},
{code:'0001',price:0.13,date:'2014-07-06T00:00:00.000Z'},
{code:'0001',price:0.12,date:'2014-07-07T00:00:00.000Z'}]
If you have more than a trivial number of documents you should use the DB server to do the work rather than JS.
You don't say if you are using mongoose or the node driver directly. I'll assume you are using mongoose as that is the way most people are headed.
So your model would be:
// models/stocks.js
const mongoose = require("mongoose");
const conn = mongoose.createConnection('mongodb://localhost/stocksdb');
const StockSchema = new mongoose.Schema(
{
price: Number,
code: String,
date: Date,
},
{ timestamps: true }
);
module.exports = conn.model("Stock", StockSchema, "stocks");
You rightly suggested that the aggregation framework would be a good way to go here. First, though, since we are filtering by date range, the dates in your database need to be Date objects. In your example documents they appear to be strings. An example of inserting documents with proper dates would be:
db.stocks.insertMany([
    {code:'0001',price:0.10,date:ISODate('2014-07-04T00:00:00.000Z')},
    {code:'0001',price:0.12,date:ISODate('2014-07-05T00:00:00.000Z')},
    {code:'0001',price:0.13,date:ISODate('2014-07-06T00:00:00.000Z')},
    {code:'0001',price:0.12,date:ISODate('2014-07-07T00:00:00.000Z')}
])
The aggregation pipeline function accepts an array with one or more pipeline stages.
The first pipeline stage we should use is $match. It filters the documents down to only the records we are interested in, which is important for performance:
{
    $match: {
        date: {
            $gte: new Date('2014-07-03'),
            $lte: new Date('2014-07-07')
        }
    }
}
This stage passes only the documents dated from 3 July 2014 up to and including midnight UTC on 7 July 2014 to the next stage (in this case, all the example docs).
The next stage is where you compute the average. We need to group the values together based on one field, multiple fields, or all documents.
As you don't specify a field to group by, I'll give an example that averages over all documents. For this we use the $group stage:
{
    $group: {
        _id: null,
        avg: {
            $avg: '$price'
        }
    }
}
This will take all the documents and display an average of all the prices.
In the case of your example documents this results in
{ _id: null, avg: 0.1175 }
Check the answer:
(0.10 + 0.12 + 0.12 + 0.13) / 4 = 0.1175
FYI: I wouldn't rely on calculations done in JavaScript for anything critical, as Numbers are floating-point values. See https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html for more details if you are worried about that.
For completeness here is the full aggregation query
const Stock = require("./models/stocks");
Stock.aggregate([
    {
        $match: {
            date: {
                $gte: new Date('2014-07-03'),
                $lte: new Date('2014-07-07')
            }
        }
    },
    {
        $group: {
            _id: null,
            avg: {
                $avg: '$price'
            }
        }
    }
])
    .then(console.log)
    .catch(error => console.error(error))
Not sure about your moving average formula, but here is how I would do it:
var moving_average = null
db.test.find().forEach(function(doc) {
    if (moving_average == null) {
        moving_average = doc.price;
    }
    else {
        moving_average = (moving_average + doc.price) / 2;
    }
})
output:
> moving_average
0.3
And if you want to define the N days to average over, just modify the argument to find (both bounds go under a single "date" key, since a repeated key would overwrite the first one):
db.test.find({ "date": { $gt: "2014-07-07T00:00:00.000Z", $lt: "2014-07-10T00:00:00.000Z" } })
And if you want to do the above shell code in one-line, you can assume that moving_average is undefined and just check for that before assigning the first value.
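For reference, a simple moving average in the usual sense is the plain mean of the last N prices, which differs from the running halving shown above (that approach gives recent prices more weight). A minimal client-side sketch in Python, assuming the prices have already been fetched and sorted by date; the function name is illustrative:

```python
from collections import deque


def simple_moving_average(prices, window):
    """Return the simple moving average for each position that has `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # holds the last `window` prices
    running_total = 0.0
    averages = []
    for price in prices:
        if len(recent) == window:
            running_total -= recent[0]  # value about to be dropped from the window
        recent.append(price)
        running_total += price
        if len(recent) == window:
            averages.append(running_total / window)
    return averages


prices = [0.10, 0.12, 0.13, 0.12]  # sample prices ordered by date
sma_3 = simple_moving_average(prices, 3)
```

With four prices and a window of 3, the result has two entries: one for the first three days and one for the last three.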

mongodb- Push query results into variable

I'm trying to store query results in a variable. Both cases below give me an error, although each query on its own gives the correct result. The error only occurs when I try to assign the value to a variable. Can you identify what is wrong with the queries?
var aggVal =db.zeroDimFacts.aggregate(
{$group: {_id: '', maxi: {$max: "$_id"}}},
{$project: {_id:0, maxe:"$maxi"}})
printjson(aggVal)
var ss = db.zeroDimFacts.find({},{_id:1}).sort({"_id": -1}).limit(1)
printjson(ss)
Both snippets give similar errors, such as:
DBQuery: Agronomics.zeroDimFacts -> { "query" : { }, "orderby" : { "_id" : 1 } }
Using the aggregation framework, you would need to run the following pipeline.
Note: the pipeline should be an array of aggregation stages piped together.
var cursor = db.zeroDimFacts.aggregate([
{ "$group": { "_id": null, "maximumId": { "$max": "$_id" } } }
]);
var maximumId = cursor.toArray()[0]["maximumId"];
printjson(maximumId);
This may not be the perfect way to get the max value, but it does give the correct result.
If someone can suggest a better way, I will mark it as the answer after two days.
After a lot of trial and error, I have found an answer to this. First convert the results to an array; the zeroth element of the array holds the max value. Then assign that value to another variable.
var ss = db.zeroDimFacts.find({},{_id:1}).limit(1).sort({_id:-1}).toArray()
var vi =Number(ss[0]._id)
print(vi)

Publish all fields in document but just part of an array in the document

I have a mongo collection in which the documents have a field that is an array. I want to be able to publish everything in the documents except for the elements in the array that were created more than a day ago. I suspect the answer will be somewhat similar to this question.
Meteor publication: Hiding certain fields in an array document field?
Instead of limiting fields in the array, I just want to limit the elements in the array being published.
Thanks in advance for any responses!
EDIT
Here is an example document:
{
_id: 123456,
name: "Unit 1",
createdAt: (datetime object),
settings: *some stuff*,
packets: [
{
_id: 32412312,
temperature: 70,
createdAt: *datetime object from today*
},
{
_id: 32412312,
temperature: 70,
createdAt: *datetime from yesterday*
}
]
}
I want to get everything in this document except for the part of the array that was created more than 24 hours ago. I know I can accomplish this by moving the packets into their own collection and tying them together with keys as in a relational database but if what I am asking were possible, this would be simpler with less code.
You could do something like this in your publish method:
Meteor.publish("pubName", function() {
    var collection = Collection.find().fetch(); //change this to return your data
    var deadline = Date.now() - 86400000; // 24 hours ago, in milliseconds
    _.each(collection, function(collectionItem) {
        // filter instead of splicing while iterating, which would skip elements
        collectionItem.packets = _.filter(collectionItem.packets, function(packet) {
            return packet.createdAt >= deadline;
        });
    });
    return collection;
});
Though you might be better off storing the last 24 hours worth of packets as a separate array in your document. Would probably be less taxing on the server, not sure.
Also, code above is untested. Good luck.
you can use the $elemMatch projection
http://docs.mongodb.org/manual/reference/operator/projection/elemMatch/
So in your case, it would be
var today = new Date();
var yesterday = new Date(today);
yesterday.setDate(today.getDate() - 1);
collection.find({}, // find anything or something specific
    {
        fields: {
            'packets': {
                $elemMatch: {'createdAt': {$gt: yesterday /* or some new Date() */}}
            }
        }
    });
However, $elemMatch only returns the FIRST element matching your condition. To return more than 1 element, you need to use the aggregation framework, which will be more efficient than _.each or forEach, particularly if you have a large array to loop through.
collection.rawCollection().aggregate([
    {
        $match: {}
    },
    {
        $redact: {
            $cond: {
                if: {$or: [{$gt: ["$createdAt", yesterday]}, "$packets"]},
                then: "$$DESCEND",
                else: "$$PRUNE"
            }
        }
    }
], function (error, result) {
});
You specify the $match in a way similar to find({}). Then all the documents that match your conditions get piped into the $redact stage, whose behavior is specified by the $cond.
$redact scans the document from top level to bottom. At the top level, you have _id, name, createdAt, settings, packets; hence {$or: [***,"$packets"]}
The presence of $packets in the $or allows the $redact to scan the second level which contain the _id, temperature and createdAt; hence {$gt: ["$createdAt",yesterday]}
This is async; you can use Meteor.wrapAsync to wrap the function.
Hope this helps.
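As an alternative to $redact, the array can be trimmed with the `$filter` expression (MongoDB 3.2+) inside an `$addFields` stage (MongoDB 3.4+), which leaves all other fields of the document in place. This is a different technique from the one described above, sketched here as Python data; the stage documents would look the same in JavaScript. The function name and the `db.units` handle are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def recent_packets_pipeline(cutoff):
    """Build a pipeline that keeps only packets created at or after `cutoff`."""
    return [
        {
            "$addFields": {
                "packets": {
                    "$filter": {
                        "input": "$packets",
                        "as": "packet",
                        "cond": {"$gte": ["$$packet.createdAt", cutoff]},
                    }
                }
            }
        }
    ]


cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
pipeline = recent_packets_pipeline(cutoff)

# Hypothetical usage, assuming `db.units` is a PyMongo collection:
# for unit in db.units.aggregate(pipeline):
#     print(unit["name"], len(unit["packets"]))
```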

How to query mongodb between dates only by ObjectID? [duplicate]

I know that ObjectIds contain the date they were created on. Is there a way to query this aspect of the ObjectId?
Popping Timestamps into ObjectIds covers queries based on dates embedded in the ObjectId in great detail.
Briefly in JavaScript code:
/* This function returns an ObjectId embedded with a given datetime */
/* Accepts both Date object and string input */
function objectIdWithTimestamp(timestamp) {
    /* Convert string date to Date object (otherwise assume timestamp is a date) */
    if (typeof(timestamp) == 'string') {
        timestamp = new Date(timestamp);
    }
    /* Convert date object to hex seconds since Unix epoch */
    var hexSeconds = Math.floor(timestamp/1000).toString(16);
    /* Create an ObjectId with that hex timestamp */
    var constructedObjectId = ObjectId(hexSeconds + "0000000000000000");
    return constructedObjectId;
}
/* Find all documents created after midnight on May 25th, 1980 */
db.mycollection.find({ _id: { $gt: objectIdWithTimestamp('1980/05/25') } });
In pymongo, it can be done this way:
import datetime
from bson.objectid import ObjectId
mins = 15
# from_datetime treats naive datetimes as UTC, so start from the current UTC time
gen_time = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(minutes=mins)
dummy_id = ObjectId.from_datetime(gen_time)
result = list(db.coll.find({"_id": {"$gte": dummy_id}}))
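To make the mechanics explicit, here is a small standard-library sketch of how such a boundary id is laid out: 4 bytes of seconds since the Unix epoch followed by 8 zero bytes. The function name is illustrative, and treating naive datetimes as UTC is an assumption of this sketch:

```python
from datetime import datetime, timezone


def objectid_hex_for(moment):
    """Return the smallest ObjectId hex string whose timestamp equals `moment`."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    seconds = int(moment.timestamp())
    return format(seconds, "08x") + "0" * 16  # 4-byte timestamp + 8 zero bytes


day_start = objectid_hex_for(datetime(2015, 4, 4))
day_end = objectid_hex_for(datetime(2015, 4, 5))

# Hypothetical usage with PyMongo, where bson.ObjectId accepts a 24-char hex string:
# from bson import ObjectId
# db.coll.find({"_id": {"$gte": ObjectId(day_start), "$lt": ObjectId(day_end)}})
```

Because the range is expressed directly on `_id`, a query like the one in the usage comment can use the default `_id` index.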
Using the built-in function provided by the MongoDB driver in Node.js lets you query by any timestamp:
var timestamp = Date.now();
var objectId = ObjectID.createFromTime(Math.floor(timestamp / 1000)); // whole seconds
Alternatively, to search for records before the current time, you can simply do:
var objectId = new ObjectID(); // or ObjectId in the mongo shell
Source: http://mongodb.github.io/node-mongodb-native/api-bson-generated/objectid.html
Starting in MongoDB 4.0, you can use the $convert operator to extract the date from an ObjectId.
Something like
$convert: { input: "$_id", to: "date" }
You can then query by comparing the converted date against a start and end time:
db.collectionname.find({
"$expr":{
"$and":[
{"$gte":[{"$convert":{"input":"$_id","to":"date"}}, ISODate("2018-07-03T00:00:00.000Z")]},
{"$lte":[{"$convert":{"input":"$_id","to":"date"}}, ISODate("2018-07-03T11:59:59.999Z")]}
]
}
})
OR
You can use shorthand $toDate to achieve the same.
db.collectionname.find({
"$expr":{
"$and":[
{"$gte":[{"$toDate":"$_id"}, ISODate("2018-07-03T00:00:00.000Z")]},
{"$lte":[{"$toDate":"$_id"},ISODate("2018-07-03T11:59:59.999Z")]}
]
}
})
To find documents between two dates (2015-1-12 to 2015-1-15):
db.collection.find({
_id: {
$gt: ObjectId(Math.floor((new Date('2015/1/12'))/1000).toString(16) + "0000000000000000"),
$lt: ObjectId(Math.floor((new Date('2015/1/15'))/1000).toString(16) + "0000000000000000")
}
}).pretty()
To count documents between the same dates (2015-1-12 to 2015-1-15):
db.collection.count({
_id: {
$gt: ObjectId(Math.floor((new Date('2015/1/12'))/1000).toString(16) + "0000000000000000"),
$lt: ObjectId(Math.floor((new Date('2015/1/15'))/1000).toString(16) + "0000000000000000")
}
})
To remove documents between the same dates (2015-1-12 to 2015-1-15):
db.collection.remove({
_id: {
$gt: ObjectId(Math.floor((new Date('2015/1/12'))/1000).toString(16) + "0000000000000000"),
$lt: ObjectId(Math.floor((new Date('2015/1/15'))/1000).toString(16) + "0000000000000000")
}
})
Since the first 4 bytes of an ObjectId represent a timestamp, to query your collection chronologically, simply order by id:
# oldest first; use pymongo.DESCENDING for most recent first
items = db.your_collection.find().sort("_id", pymongo.ASCENDING)
After you get the documents, you can get the ObjectId's generation time like so:
id = some_object_id
generation_time = id.generation_time
Yes, you can query documents by date using the ObjectId that MongoDB generates on insert:
db.collectionname.find({_id: {$lt: ObjectId.fromDate( new ISODate("TZformat") ) } });
Suppose users is my collection and I want all users created before 05 January 2018:
db.users.find({_id: {$lt: ObjectId.fromDate( new ISODate("2018-01-05T00:00:00.000Z") ) } });
To query relative to the current time, we can use:
db.users.find({_id: {$lt: ObjectId.fromDate(new Date((new Date().getTime() - (1 * 3 * 60 * 60 * 1000))) ) } })
This returns all users created more than 3 hours before the current time.
To get documents more than 60 days old in a collection, I used the query below in the shell:
db.collection.find({_id: {$lt:new ObjectId( Math.floor(new Date(new Date()-1000*60*60*24*60).getTime()/1000).toString(16) + "0000000000000000" )}})
If you want to make a range query, you can do it like in this post. For example querying for a specific day (i.e. Apr 4th 2015):
> var objIdMin = ObjectId(Math.floor((new Date('2015/4/4'))/1000).toString(16) + "0000000000000000")
> var objIdMax = ObjectId(Math.floor((new Date('2015/4/5'))/1000).toString(16) + "0000000000000000")
> db.collection.find({_id:{$gt: objIdMin, $lt: objIdMax}}).pretty()
From the documentation:
o = new ObjectId()
date = o.getTimestamp()
This way you have a date that is an ISODate.
Look at
http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Extractinsertiontimesfromidratherthanhavingaseparatetimestampfield.
for more information
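Using an ObjectId, you can also find results by comparing _id against the ObjectId itself (comparing it against the number from getTimestamp().getTime() would not match, because _id is not a number):
db.mycollection.find({ _id: { $gt: ObjectId("5217a543dd99a6d9e0f74702") } });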
A Solution Filtering within MongoDB Compass.
Based on versions:
Compass version: 1.25.0
MongoDB version: 4.2.8
Option 1:
@s7vr's answer worked perfectly for me. You can paste this into the Filter field:
{$expr: { $and: [ {$gte: [{$toDate: "$_id"}, ISODate('2021-01-01')]}, {$lt: [{$toDate: "$_id"}, ISODate('2021-02-01')]} ] } }
Option 2:
I also found this to work (remember that the Date's month parameter is 0-based indexing so January is 0):
{_id: {$gte: ObjectId(Date(2021, 0, 1) / 1000), $lt: ObjectId(Date(2021, 1, 1) / 1000) } }
Option 3:
Equivalent with ISODate:
{_id: {$gte: ObjectId(ISODate('2021-01-01') / 1000), $lt: ObjectId(ISODate('2021-02-01') / 1000) } }
After writing this post, I decided to run the Explain on these queries. Here's the skinny on performance:
Option 1: 39 ms, 0 indexes used, 30 ms in COLLSCAN
Option 2: 0 ms, _id index used
Option 3: 1 ms, _id index used, 1 ms in FETCH
Based on my rudimentary analysis, it appears that option 2 is the most efficient. I will use Option 3, personally, as it is a little cleaner to use ISODate rather than remembering 0-based month indexing in the Date object.
In Rails with Mongoid, you can query using:
time = Time.utc(2010, 1, 1)
time_id = BSON::ObjectId.from_time(time)
collection.find({'_id' => {'$lt' => time_id}})