I am experiencing a very weird issue with MongoDB shell version 2.4.6. It has to do with creating ISODate objects from strings; see below for a specific example.
Why does this not work?
collection.aggregate({$project: {created_at: 1, ts: {$add: new Date('created_at')}}}, {$limit: 1})
{
"result" : [
{
"_id" : ObjectId("522ff3b075e90018b2e2dfc4"),
"created_at" : "Wed Sep 11 04:38:08 +0000 2013",
"ts" : ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")
}
],
"ok" : 1
}
But this does.
collection.aggregate({$project: {created_at: 1, ts: {$add: new Date('Wed Sep 11 04:38:08 +0000 2013')}}}, {$limit: 1})
{
"result" : [
{
"_id" : ObjectId("522ff3b075e90018b2e2dfc4"),
"created_at" : "Wed Sep 11 04:38:08 +0000 2013",
"ts" : ISODate("2013-09-11T04:38:08Z")
}
],
"ok" : 1
}
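The short answer is that you're passing the literal string 'created_at' to the Date constructor, not the value of the created_at field. The shell evaluates new Date('created_at') once, before the pipeline is sent to the server, and if you pass a malformed date string to the constructor, you get ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ") in return.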
To properly create a new date you'd have to do so by passing in the contents of 'created_at'. Unfortunately, I don't know of a way to run a date constructor on a string using the aggregation framework at this time. If your collection is small enough, you could do this in the client by iterating over your collection and adding a new date field to each document.
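As a sketch of that client-side approach, the function below parses strings in the created_at format ("Wed Sep 11 04:38:08 +0000 2013") explicitly, instead of relying on the Date constructor's parsing of non-ISO strings, which varies between JavaScript engines. The name parseCreatedAt and the migration loop in the comment are illustrative assumptions, not an existing API, and the code targets a modern JavaScript runtime such as Node.js.

```javascript
// Month abbreviations used in strings like "Wed Sep 11 04:38:08 +0000 2013".
const MONTH_ABBREVIATIONS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Hypothetical helper: parse "Www Mmm DD HH:MM:SS +hhmm YYYY" into a Date.
// Returns null (rather than an Invalid Date) when the string does not match.
function parseCreatedAt(s) {
  const m = /^\w{3} (\w{3}) (\d{1,2}) (\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2}) (\d{4})$/.exec(s);
  if (!m) return null;
  const [, mon, day, hh, mi, ss, sign, offH, offM, year] = m;
  const month = MONTH_ABBREVIATIONS.indexOf(mon);
  if (month < 0) return null;
  // Treat the wall-clock time as UTC, then subtract the UTC offset.
  const offsetMinutes = (sign === "+" ? 1 : -1) * (Number(offH) * 60 + Number(offM));
  const utcMillis = Date.UTC(Number(year), month, Number(day),
                             Number(hh), Number(mi), Number(ss)) - offsetMinutes * 60000;
  return new Date(utcMillis);
}

// A one-off migration in a shell that supports this syntax could look like (not run here):
// db.collection.find({ ts: { $exists: false } }).forEach(function (doc) {
//   db.collection.update({ _id: doc._id }, { $set: { ts: parseCreatedAt(doc.created_at) } });
// });
```

Parsing with an explicit pattern also makes bad input visible: the literal string 'created_at' yields null instead of a NaN date.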
A somewhat similar problem:
I had documents in MongoDB with no dates set at all, and the database is filling up, so I ultimately need to go in and delete items that are older than one year.
I did sort of have a date, in a messy human-readable string format, which I can grab from '$ParentMasterItem.Name'.
Example: '20211109 VendorName ProductType Pricing Update Workflow'.
So here's my attempt to pull out the dates via substring parsing (thankfully, I do happen to know that every one of the 100K documents has it set the same way):
db.MyCollectionName.aggregate(
  {$project: {
    created_at: 1,
    ts: {
      $dateFromString: {
        // Name looks like '20211109 blahblah string':
        // chars 0-3 = year, 4-5 = month, 6-7 = day
        dateString: {
          $concat: [ { $substr: [ "$ParentMasterItem.Name", 0, 4 ]}, "-",
                     { $substr: [ "$ParentMasterItem.Name", 4, 2 ]}, "-",
                     { $substr: [ "$ParentMasterItem.Name", 6, 2 ]},
                     "T01:00:00Z" ]
        }
      }
    }
  }},
  {$limit: 10})
output:
{ _id: 8445390, ts: ISODate("2022-12-19T01:00:00.000Z") },
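Since the end goal is deleting documents older than one year, here is a small sketch of the same substring logic in plain JavaScript plus a one-year cutoff check, which can be handy for spot-checking a few names before deleting anything. The helper names dateFromName and olderThanOneYear are hypothetical, and the server-side delete in the comment assumes MongoDB 3.6 or later (for $expr and $dateFromString) and was not run here.

```javascript
// Hypothetical helper mirroring the $substr/$concat pipeline:
// "20211109 VendorName ..." -> Date for 2021-11-09T01:00:00Z.
function dateFromName(name) {
  const iso = name.slice(0, 4) + "-" + name.slice(4, 6) + "-" + name.slice(6, 8) + "T01:00:00Z";
  return new Date(iso);
}

// True when the date embedded in `name` is more than one year before `now`.
function olderThanOneYear(name, now) {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 1);
  return dateFromName(name) < cutoff;
}

// Possible server-side cleanup (MongoDB 3.6+, untested sketch):
// var cutoff = new Date(); cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 1);
// db.MyCollectionName.deleteMany({ $expr: { $lt: [
//   { $dateFromString: { dateString: { $concat: [
//       { $substr: ["$ParentMasterItem.Name", 0, 4] }, "-",
//       { $substr: ["$ParentMasterItem.Name", 4, 2] }, "-",
//       { $substr: ["$ParentMasterItem.Name", 6, 2] }, "T01:00:00Z"] } } },
//   cutoff ] } });
```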
I have documents with this structure:
{
"_id" : ObjectId("59936474a955fb3f18db07d5"),
"CD_MATRICULA" : 12,
"DT_FROM" : ISODate("2014-04-04T16:21:37.000-03:00"),
"DT_FROM_ANO" : 2012,
"DT_FROM_MES" : 2,
"DT_FROM_DIA" : 2 }
In this sample, DT_FROM_ANO differs from YEAR(DT_FROM). I need to find the documents that have this difference. I tried this, but it doesn't work:
db.getCollection('TABLESAMPLE').aggregate(
{$project: {
CD_MATRICULA: 1,
DT_FROM: 1,
DT_FROM_ANO: 1,
DT_FROM_MES: 1,
DT_FROM_DIA: 1,
DT_TO: 1,
ANO: {$year: '$DT_FROM'},
MES: {$month: '$DT_FROM'},
DIA: {$dayOfMonth: '$DT_FROM'}
}
},
{$match:
{ DT_FROM_ANO: {$ne: '$ANO'} }
}
);
All documents are returned in this case, but only one of them actually has a different year.
What is wrong? I need only the documents whose year differs to be returned.
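One likely explanation: inside $match, '$ANO' is not a field reference but the literal string "$ANO", so DT_FROM_ANO (a number) is never equal to it and every document passes the $ne test. A common workaround is to compute the comparison as a boolean in $project and then $match on that flag. The plain-JavaScript sketch below models the same year comparison client-side; the helper name and the differs field in the commented pipeline are illustrative, and the pipeline was not run here. Since $year works on the UTC value of a date, the sketch uses getUTCFullYear.

```javascript
// Hypothetical client-side equivalent: keep docs whose stored year field
// disagrees with the UTC year of DT_FROM (the value $year returns).
function yearMismatches(docs) {
  return docs.filter(doc => doc.DT_FROM_ANO !== doc.DT_FROM.getUTCFullYear());
}

// Server-side sketch (not run here): compare inside $project, then match the flag.
// db.getCollection('TABLESAMPLE').aggregate([
//   { $project: { CD_MATRICULA: 1, DT_FROM: 1, DT_FROM_ANO: 1,
//                 differs: { $ne: ["$DT_FROM_ANO", { $year: "$DT_FROM" }] } } },
//   { $match: { differs: true } }
// ]);
```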
I have a MongoDB database with a data structure like this:
{
"_id" : "123456",
"Dates" : ["2014-02-05 09:00:15 PM", "2014-02-06 09:00:15 PM", "2014-02-07 09:00:15 PM"]
}
There are other fields too, but what I would like to do is pull the documents where the last element of Dates (Dates{$slice: -1}) is older than 3 days. I have a query like this, but it doesn't work:
db.posts.find({}, {RecoveryPoints:{$slice: -1} $lt: ISODate("2014-02-09 00:00:00 AM")})
I am beginning to think it is not possible, but I figured I would ask here first. My guess is that I'll have to do this logic within my own code.
Edit: I should also mention that my goal is to return the entire document, not just the date that matches.
Your query puts all of its conditions in the projection (the second argument to find) rather than in the query (the first argument). Since you do not want to change the shape of the returned document, there is no need to project at all.
This is actually a simple comparison: if you are always looking at the first element, you can use dot "." notation with the array index to query it directly:
> db.dates.find({ "Dates.0": {$lt: ISODate("2014-02-09 00:00:00 AM") } }).pretty()
{
"_id" : "123456",
"Dates" : [
ISODate("2014-02-05T09:00:15Z"),
ISODate("2014-02-06T09:00:15Z"),
ISODate("2014-02-07T09:00:15Z")
]
}
> db.dates.find({ "Dates.0": {$lt: ISODate("2014-02-05 00:00:00 AM") } }).pretty()
>
So for arrays, dot "." notation takes an index position, and to test other elements you just change the position:
> db.dates.find({ "Dates.1": {$lt: ISODate("2014-02-07 00:00:00 AM") } }).pretty()
{
"_id" : "123456",
"Dates" : [
ISODate("2014-02-05T09:00:15Z"),
ISODate("2014-02-06T09:00:15Z"),
ISODate("2014-02-07T09:00:15Z")
]
}
> db.dates.find({ "Dates.3": {$lt: ISODate("2014-02-07 00:00:00 AM") } }).pretty()
>
> db.dates.find({ "Dates.2": {$lt: ISODate("2014-02-08 00:00:00 AM") } }).pretty()
{
"_id" : "123456",
"Dates" : [
ISODate("2014-02-05T09:00:15Z"),
ISODate("2014-02-06T09:00:15Z"),
ISODate("2014-02-07T09:00:15Z")
]
}
Note that this only makes sense if you keep the array sorted as it changes, or can otherwise be sure that the first element (or whichever position you are querying) is the one you want to compare. If you instead need to test against all elements, use the $elemMatch operator in the query.
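To make the difference between the two query shapes concrete, here is a rough plain-JavaScript model of how they match the sample document. It only mirrors the matching rules described above, not MongoDB's implementation, and matchesAt and matchesAny are hypothetical names.

```javascript
// Models { "Dates.<i>": { $lt: cutoff } }: only the element at index i is compared,
// and an index past the end (like "Dates.3" on a 3-element array) never matches.
function matchesAt(doc, i, cutoff) {
  return Array.isArray(doc.Dates) && i >= 0 && i < doc.Dates.length && doc.Dates[i] < cutoff;
}

// Models { Dates: { $elemMatch: { $lt: cutoff } } }: any single element may satisfy it.
function matchesAny(doc, cutoff) {
  return Array.isArray(doc.Dates) && doc.Dates.some(d => d < cutoff);
}

// The sample document from the shell session, with real Date values.
const sample = {
  _id: "123456",
  Dates: [new Date("2014-02-05T09:00:15Z"), new Date("2014-02-06T09:00:15Z"),
          new Date("2014-02-07T09:00:15Z")],
};
```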
The only reason this query would not return the result you want is if your dates are stored as strings, as they are in your sample document. If so, convert them to real dates, and change whatever code is causing them to be saved that way.
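If the dates do stay as strings for a while, one way to handle the original "last element older than 3 days" requirement is to parse them client-side. The sketch below assumes the "2014-02-05 09:00:15 PM" strings are in UTC (the question does not say) and uses hypothetical helper names. The commented query is a server-side alternative once the values are real dates; it assumes MongoDB 3.6+ ($expr, with $arrayElemAt accepting -1 for the last element) and was not run here.

```javascript
// Parse "YYYY-MM-DD hh:mm:ss AM|PM" (assumed UTC) into a Date; null if it doesn't match.
function parseAmPm(s) {
  const m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) (AM|PM)$/.exec(s);
  if (!m) return null;
  let hour = Number(m[4]) % 12;   // "12 AM" -> 0
  if (m[7] === "PM") hour += 12;  // "12 PM" -> 12, "09 PM" -> 21
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]),
                           hour, Number(m[5]), Number(m[6])));
}

// True when the last entry in doc.Dates is more than `days` days before `now`.
function lastDateOlderThan(doc, days, now) {
  if (!Array.isArray(doc.Dates) || doc.Dates.length === 0) return false;
  const last = parseAmPm(doc.Dates[doc.Dates.length - 1]);
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return last !== null && last < cutoff;
}

// With real Date values, the same test can run on the server (MongoDB 3.6+, untested):
// var cutoff = new Date(Date.now() - 3 * 24 * 60 * 60 * 1000);
// db.posts.find({ $expr: { $lt: [ { $arrayElemAt: ["$Dates", -1] }, cutoff ] } });
```

The server-side form returns whole documents, which matches the goal stated in the edit to the question.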
How do I get MongoDB to calculate the sum of array values when the array field may be missing completely (as is the case for month 10)?
For example:
> db.month.save({MonthNum: 10,
... NumWeekdays: 23});
> db.month.save({MonthNum: 11,
... NumWeekdays: 21,
... Holidays: [ {Description: "Thanksgiving", NumDays: 2} ] });
> db.month.save({MonthNum: 12,
... NumWeekdays: 22,
... Holidays: [ {Description: "Christmas", NumDays: 6},
... {Description: "New Year's Eve", NumDays: 1} ] });
> db.month.aggregate( { $unwind: "$Holidays" },
... { $group: { _id: "$MonthNum",
... total: { $sum: "$Holidays.NumDays" } } });
{
"result" : [
{
"_id" : 12,
"total" : 7
},
{
"_id" : 11,
"total" : 2
}
],
"ok" : 1
}
How do I get month 10 to show up in the above results (showing "total" as 0)?
Bonus: How do I get the above to show the available weekdays (the NumWeekdays minus the sum of the Holidays)?
I've tried $project to get the data into a canonical format first but without success so far... thanks!
$unwind isn't passing along your document with MonthNum 10 because that document has no Holidays array at all (see the note at the bottom of the $unwind docs). Assuming that Holidays is always either an array containing at least one item or completely absent from a document, you can use the $ifNull operator inside $project to substitute a one-element array, [{NumDays: 0}], whenever Holidays is null or absent:
db.month.aggregate([
// Make "Holidays" = [{NumDays:0}] if "Holidays" is null for this document (i.e. absent)
// Note: the field is NumWeekdays (lowercase "d"), matching the saved documents
{$project:{NumWeekdays:1, MonthNum:1, Holidays:{$ifNull:["$Holidays", [{"NumDays":0}]]}}},
// Now you can unwind + group as normal
{$unwind:"$Holidays"},
{$group:{_id:"$MonthNum", NumWeekdays:{$first:"$NumWeekdays"}, "total":{$sum:"$Holidays.NumDays"}}},
// This should take care of "available weekdays"
{$project:{total:1, available:{$subtract:["$NumWeekdays", "$total"]}}}
]);
Note that $ifNull won't work if for some of your documents Holidays is an empty array; it has to be absent completely.
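For reference, the arithmetic the pipeline performs can be modelled in a few lines of plain JavaScript using the three sample months. This sketch treats both a missing and an empty Holidays array as zero holidays, which the $ifNull approach does not; on MongoDB 3.2 and later, $unwind's preserveNullAndEmptyArrays option gives the pipeline that same behavior. availableWeekdays is a hypothetical helper name.

```javascript
// Hypothetical model of the pipeline: total holiday days and available weekdays per month.
// Server-side on MongoDB 3.2+, an alternative to $ifNull is:
//   { $unwind: { path: "$Holidays", preserveNullAndEmptyArrays: true } }
function availableWeekdays(months) {
  return months.map(m => {
    // Missing or empty Holidays both count as zero holiday days.
    const holidays = Array.isArray(m.Holidays) ? m.Holidays : [];
    const total = holidays.reduce((acc, h) => acc + (h.NumDays || 0), 0);
    return { _id: m.MonthNum, total: total, available: m.NumWeekdays - total };
  });
}

const months = [
  { MonthNum: 10, NumWeekdays: 23 },
  { MonthNum: 11, NumWeekdays: 21, Holidays: [{ Description: "Thanksgiving", NumDays: 2 }] },
  { MonthNum: 12, NumWeekdays: 22, Holidays: [{ Description: "Christmas", NumDays: 6 },
                                              { Description: "New Year's Eve", NumDays: 1 }] },
];
// availableWeekdays(months) yields totals 0, 2, 7 and available days 23, 19, 15.
```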
I have a collection of objects with a structure like this:
{
"_id" : ObjectId("5233a700bc7b9f31580a9de0"),
"id" : "3df7ce4cc2586c37607a8266093617da",
"published_at" : ISODate("2013-09-13T23:59:59Z"),
...
"topic_id" : [
284,
9741
],
...
"date" : NumberLong("1379116800055")
}
I'm trying to use the following query:
db.collection.find({"topic_id": { $in: [ 9723, 9953, 9558, 9982, 9833, 301, ... 9356, 9990, 9497, 9724] }, "date": { $gte: 1378944001000, $lte: 1378954799000 }, "_id": { $gt: ObjectId('523104ddbc7b9f023700193c') }}).sort({ "_id": 1 }).limit(1000)
The above query uses the {topic_id, date} index, but then the returned results are not kept in order.
Forcing it with hint({_id:1}) makes the results ordered, but nscanned is 1 million documents even though limit(1000) is specified.
What am I missing?
I was wondering if someone could help me get my aggregation function right. I'm trying to count the number of times a piece of text appears per hour in a specified day. So far I've got:
db.daily_data.aggregate(
[
{ $project : { useragent: 1, datetime: 1, url: 1, hour: {$hour: new Date("$datetime")} } },
{ $match : { datetime: {$gte: 1361318400000, $lt: 1361404800000}, useragent: /.*LinkCheck by Siteimprove.*/i } },
{ $group : { _id : { useragent: "$useragent", hour: "$hour" }, queriesPerUseragent: {$sum: 1} } }
]
);
But I'm obviously getting it wrong as hour is always 0:
{
"result" : [
{
"_id" : {
"useragent" : "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0) LinkCheck by Siteimprove.com",
"hour" : 0
},
"queriesPerUseragent" : 94215
}
],
"ok" : 1
}
Here's a trimmed down example of a record too:
{
"_id" : ObjectId("50fe63c70266a712e8663725"),
"useragent" : "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0) LinkCheck by Siteimprove.com",
"datetime" : NumberLong("1358848954813"),
"url" : "http://www.somewhere.com"
}
I've also tried using new Date("$datetime").getHours() instead of the $hour function to try and get the same result but with no luck. Can someone point me in the direction of where I'm going wrong?
Thanks!
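A likely reason hour is always 0: new Date("$datetime") is evaluated by the shell before the pipeline is sent, so the server receives a single invalid date rather than a per-document value, and datetime is a millisecond number rather than a Date, so $hour cannot use it directly. A common trick is to add the number to the epoch, { $hour: { $add: [ new Date(0), "$datetime" ] } }, because $add with a date and a number returns a date. The plain-JavaScript sketch below models the intended per-hour count; hourOfMillis and countPerHour are hypothetical names, and the commented pipeline was not run here.

```javascript
// UTC hour (0-23) of a millisecond timestamp, matching what $hour returns for a Date.
function hourOfMillis(ms) {
  return new Date(ms).getUTCHours();
}

// Hypothetical client-side model of the intended pipeline:
// keep records in [start, end) whose useragent matches, then count per hour.
function countPerHour(records, start, end, pattern) {
  const counts = {};
  for (const r of records) {
    if (r.datetime < start || r.datetime >= end || !pattern.test(r.useragent)) continue;
    const hour = hourOfMillis(r.datetime);
    counts[hour] = (counts[hour] || 0) + 1;
  }
  return counts;
}

// Server-side sketch (not run here); $match first so an index on datetime can help:
// db.daily_data.aggregate([
//   { $match: { datetime: { $gte: 1361318400000, $lt: 1361404800000 },
//               useragent: /LinkCheck by Siteimprove/i } },
//   { $project: { useragent: 1, hour: { $hour: { $add: [new Date(0), "$datetime"] } } } },
//   { $group: { _id: { useragent: "$useragent", hour: "$hour" }, queriesPerUseragent: { $sum: 1 } } }
// ]);
```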
This is a recommendation rather than an answer to your problem.
For analytics on MongoDB, it's commonly recommended to pre-aggregate your data into buckets (hourly buckets in your use case) for every metric you want to calculate.
So, for your metric you can update a pre-aggregated collection as each event arrives (speeding up your query time):
db.user_agent_hourly.update({url: "your_url", useragent: "your user agent", hour: current_HOUR_of_DAY, date: current_DAY_Date}, {$inc: {count:1}}, {upsert:true})
Take into account that current_DAY_Date has to be a stable date value for the current day, i.e. current_year/current_month/current_day 00:00:00, using the same hour:minute:second for every metric received on that day.
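As a sketch of what a stable day value means here, the helper below truncates a timestamp to midnight UTC and pairs it with the hour, and an in-memory Map stands in for the upserted collection. Both bucketFor and recordHit are hypothetical names, and using UTC for the day boundary is an assumption; the counter field is named count to match the $sum: "$count" in the queries that follow.

```javascript
// Hypothetical bucket key: the UTC day (midnight) plus the hour of day.
function bucketFor(ts) {
  const day = new Date(Date.UTC(ts.getUTCFullYear(), ts.getUTCMonth(), ts.getUTCDate()));
  return { date: day, hour: ts.getUTCHours() };
}

// In-memory stand-in for update({...}, {$inc: {count: 1}}, {upsert: true}).
function recordHit(buckets, url, useragent, ts) {
  const b = bucketFor(ts);
  const key = [url, useragent, b.date.toISOString(), b.hour].join("|");
  const doc = buckets.get(key) ||
    { url: url, useragent: useragent, date: b.date, hour: b.hour, count: 0 };
  doc.count += 1;          // same effect as $inc on an existing or freshly upserted doc
  buckets.set(key, doc);
  return doc;
}
```

Because every event from the same day maps to the identical midnight value, events in the same hour land in one bucket instead of creating a new document per timestamp.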
Then, you can query this collection, extracting aggregated analytics for any given period of time as follows:
db.user_agent_hourly.aggregate(
{$match:{date:{$gte: INITIAL_DATE, $lt: FINAL_DATE}}},
{$group:{ _id : { useragent: "$useragent", hour: "$hour" } ,queriesPerUseragent: {$sum: "$count"} } },
{$sort:{queriesPerUseragent:-1}}
)
If you want to filter the results by a specific user agent, you can use the following query:
db.user_agent_hourly.aggregate(
{$match:{date:{$gte: INITIAL_DATE, $lt: FINAL_DATE}, useragent: "your_user_agent"}},
{$group:{ _id : { useragent: "$useragent", hour: "$hour" }, queriesPerUseragent: {$sum: "$count"} } }
)
PS: We also store every single received metric in another collection, so that we can reprocess it in case of disaster or other needs.