Iterating over a list in MongoDB

I'm a complete novice in MongoDB and I'm trying to delete some entries from a collection day by day. I have to do it day by day because the collection is huge and removing a whole month at once times out. Here's an example of the code I have:
days = ['2018-04-01-day','2018-04-02-day','2018-04-03-day','2018-04-04-day','2018-04-05-day','2018-04-06-day','2018-04-07-day','2018-04-08-day','2018-04-09-day','2018-04-10-day','2018-04-11-day','2018-04-12-day','2018-04-13-day','2018-04-14-day','2018-04-15-day','2018-04-16-day','2018-04-17-day','2018-04-18-day','2018-04-19-day','2018-04-20-day','2018-04-21-day','2018-04-22-day','2018-04-23-day','2018-04-24-day','2018-04-25-day','2018-04-26-day','2018-04-27-day','2018-04-28-day','2018-04-29-day','2018-04-30-day']
var day;
for (day in days)
{
    print(day)
    db.<colln>.remove(
        { 'time_bucket': day },
        { 'URL': /https:\/\/abc.com/ }
    )
}
The above code executes, but only gives me the following:
2018-04-06-day
WriteResult({ "nRemoved" : 0 })
I would have expected to at least see all the dates printed, but even that's not happening.
I tried other methods using UTC date functions, and they didn't seem to work either.
I am able to make the following code work on a smaller collection:
db.<small colln>.remove(
    { 'time_bucket': '2018-04-month' },
    { 'URL': /https:\/\/abc.com/ }
)
But the above code (removing by month) won't work on the larger collection, which is why I'm forced to do it day by day by building an array of days. I know it's not the most efficient method, but I just need to make it work somehow.
Any help would be much appreciated.

for(var day in days) will iterate through the indexes of your array, producing:
for(var day in days){ print (day) }
0
1
2
3
4
5
I believe you meant to use for(var day of days):
for(var day of days){ print (day) }
2018-04-01-day
2018-04-02-day
2018-04-03-day
2018-04-04-day
2018-04-05-day
2018-04-06-day
Just wanted to add a few more details. I have tested the following on my local MongoDB 4.2 on this collection:
{ "_id" : ObjectId("6053e2f7acf8d9b7cc48adf0"), "name" : "test 4", "time_bucket" : "2018-04-03-day" }
{ "_id" : ObjectId("6053e4ccacf8d9b7cc48adf1"), "name" : "test", "time_bucket" : "2018-04-01-day" }
{ "_id" : ObjectId("6053e4d3acf8d9b7cc48adf2"), "name" : "test 1", "time_bucket" : "2018-04-01-day" }
{ "_id" : ObjectId("6053e4ddacf8d9b7cc48adf3"), "name" : "test 34", "time_bucket" : "2018-04-02-day" }
const days = ['2018-04-01-day','2018-04-02-day']
for(let day of days){ db.testcol.remove( { 'time_bucket': day }) }
After executing it, the collection looks like this:
{ "_id" : ObjectId("6053e2f7acf8d9b7cc48adf0"), "name" : "test 4", "time_bucket" : "2018-04-03-day" }
So everything appears to work as intended.
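A side note beyond the tests above: both conditions from the question need to live in a single filter document; remove's second argument is options (such as justOne), not an additional filter. And if looping day by day is only there to keep each delete small, the whole day array can be passed at once with $in. A sketch, reusing the days array and testcol from the test above plus the URL regex from the question:

// Both conditions go in ONE filter document; $in covers the whole day list.
// deleteMany is available in the mongo shell from MongoDB 3.2 onwards.
db.testcol.deleteMany({
    time_bucket: { $in: days },
    URL: /https:\/\/abc.com/
})
// Each matching document is still removed individually on the server,
// so very large deletes can still take a while.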

Related

MongoDB TTL/ExpireAfterSeconds is misbehaving and not deleting all data after given time

We have put expireAfterSeconds=15 on a field of type date; the index looks like this:
[
    {
        "v" : 1,
        "key" : {
            "_ts" : -1
        },
        "name" : "AnjaliIndex",
        "ns" : "test.sessions",
        "expireAfterSeconds" : 15
    }
]
It works on yesterday's date but not on today's: when I change a document's date from the current date to yesterday's, that document is removed, but documents carrying today's date are never deleted, even though the time I set is already in the past, not the future.
Why is this happening? Is there a particular cycle or time at which the MongoDB engine collects documents for expiry?
(I have seen a related question, but its use case is different: there the author was using a future date.)
MongoDB version: 3.2.22
Sample document (not getting deleted):
{
    "_id" : ObjectId("5dde452818c87122389bbc09"),
    "authorization" : "a0ce0b43-194d-4402-99cb-b660b3365757",
    "userNumber" : "gourav#gmail.com",
    "_ts" : ISODate("2019-11-27T13:43:04.776Z")
}
I will try to answer and see if that can help you. First, create a TTL index on the field you want documents to expire by:
db.my_collection.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
After that, every document that you insert into this collection must have its "createdAt" field set to the current date:
db.myCollection.insert( {
    "createdAt": new Date(), // this can be set in UTC
    "dataExample": 2,
    "Message": "#### My data ####"
} )
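One detail that often explains "late" TTL deletes: the TTL monitor is a background thread that runs roughly every 60 seconds, so documents are removed some time after they expire, not at the exact instant. You can check whether the monitor is running at all via serverStatus (output shape abbreviated):

// TTL monitor counters: how many passes it has made and how many
// documents it has deleted since the server started.
db.serverStatus().metrics.ttl
// -> { "deletedDocuments" : NumberLong(...), "passes" : NumberLong(...) }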

Getting data from outside of group

I have a lot of devices non-periodically inserting data into Mongo.
I need to get statistics on this data (value by day/month/year). Currently I do this by adding fields in which I parse the date into day, month, and year using $month, $year, and $dayOfMonth, then grouping by those values. The problem is when I get no data (or only one document) in a day: then I can't get an actual value for that day, because I need two values to subtract.
Is there a way to get the closest document by day to this group, in one query?
Let's say I have this data:
{id : 1, ts : "2017-12-15T10:00:00.000Z", value : 10}
{id : 2, ts : "2017-12-15T17:00:00.000Z", value : 10}
{id : 2, ts : "2017-12-14T12:00:00.000Z", value : 6}
{id : 1, ts : "2017-12-14T15:00:00.000Z", value : 10}
{id : 1, ts : "2017-12-14T10:00:00.000Z", value : 10}
{id : 2, ts : "2017-12-14T09:00:00.000Z", value : 3}
Explanation of the problem:
The value is an actual meter reading, for example consumed energy. If a device consumes 4 W/min, after 1 minute the reading will be 4 and after 2 minutes it will be 8, so the delta between minute 1 and minute 2 is 4. If I have a record from 2017-12-14T23:58:00.000Z of, say, 10 W, and at 23:59 it is 14 W, then dValue should be 4; but at 00:00 the next day I am not able to calculate dValue, because that is the first and only record in its group.
If I group this data by day, I can calculate the value difference only for 2017-12-14.
For now I am using this query:
{
    $addFields : {
        month : {$month : "$ts"},
        year : {$year : "$ts"},
        day : {$dayOfMonth : "$ts"}
    }
},
{
    $group : {
        _id : {
            year : "$year",
            month : "$month",
            day : "$day",
            id : "$id"
        },
        first : {$min : "$$ROOT"},
        last : {$max : "$$ROOT"}
    }
},
{
    $addFields : {
        dValue : {$subtract : ["$last.value", "$first.value"]} // delta value
    }
},
This query works, but only if there is more than one document in a day; if there is only one document I can't get accurate data. I want to do this in one query, because I have a lot of these devices and their number is only going to increase; if I had to run a query per device I would get an insane number of queries to the database. Is there a way to solve this?

Find closest date in one query

I'm currently trying to figure out a way to find the entry in MongoDB whose date is closest to the one I'm looking for.
Currently I solve the problem with two queries: one using $gte and limit(1) to look for the next larger date, then one with $lte and limit(1) to see if there is a closer one that is lower.
I was wondering if there might be a way to find the closest date in just one query, but I was not able to find anything on the matter.
I hope you can help me with this, or at least tell me for sure that this is the only way to do it.
db.collection.find({"time":{$gte: isoDate}}).sort({"time":1}).limit(1)
db.collection.find({"time":{$lte: isoDate}}).sort({"time":-1}).limit(1)
But I am looking for a way to do this in one query, so I don't have to compare the two results to find the closest one.
I solved a similar problem using an aggregation.
Sample data:
{
    "_id" : ObjectId("5e365a1655c3f0bea76632a0"),
    "time" : ISODate("2020-02-01T00:00:00Z"),
    "description" : "record 1"
}
{
    "_id" : ObjectId("5e365a1655c3f0bea76632a1"),
    "time" : ISODate("2020-02-01T00:05:00Z"),
    "description" : "record 2"
}
{
    "_id" : ObjectId("5e365a1655c3f0bea76632a2"),
    "time" : ISODate("2020-02-01T00:10:00Z"),
    "description" : "record 3"
}
{
    "_id" : ObjectId("5e365a1655c3f0bea76632a3"),
    "time" : ISODate("2020-02-01T00:15:00Z"),
    "description" : "record 4"
}
{
    "_id" : ObjectId("5e365a1655c3f0bea76632a4"),
    "time" : ISODate("2020-02-01T00:20:00Z"),
    "description" : "record 5"
}
{
    "_id" : ObjectId("5e365a1655c3f0bea76632a5"),
    "time" : ISODate("2020-02-01T00:25:00Z"),
    "description" : "record 6"
}
And I'm looking for the record nearest to ISODate('2020-02-01T00:18:00.000Z').
db.test_collection.aggregate([
    {
        $match: {
            time: {
                $gte: ISODate('2020-02-01T00:13:00.000Z'),
                $lte: ISODate('2020-02-01T00:23:00.000Z')
            }
        }
    },
    {
        $project: {
            time: 1,
            description: 1,
            time_dist: {$abs: {$subtract: ["$time", ISODate('2020-02-01T00:18:00.000Z')]}}
        }
    },
    {
        $sort: {time_dist: 1}
    },
    {
        $limit: 1
    }
])
The $match stage sets up a "time window". I used 5 minutes for this example.
The $project stage adds a time distance field. This is the time in milliseconds each record is from the query time of ISODate('2020-02-01T00:18:00.000Z').
Then I sorted on the time_dist field and limited the results to 1 to return the record whose time is closest to ISODate('2020-02-01T00:18:00.000Z').
The result of the aggregation:
{
    "_id" : ObjectId("5e365a1655c3f0bea76632a4"),
    "time" : ISODate("2020-02-01T00:20:00Z"),
    "description" : "record 5",
    "time_dist" : NumberLong(120000)
}
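One design note on the pipeline above: the $match window is what keeps it cheap. Without it, $project would have to compute time_dist for every document in the collection, and the $sort on that computed field could not use an index, so pick a window you are confident contains the nearest record.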
Check this one; the two bounds must be different dates, otherwise the range is empty:
db.collection.find({"time": {$gte: lowerDate, $lt: upperDate}}).sort({"time": 1}).limit(1)
Please use the date format that MongoDB supports, like the following:
ISODate("2015-10-26T00:00:00.000Z")
In Pymongo, I used the following function. The idea is to take a datetime object, subtract some days from it and add some days to it, then find a date between those two dates. If there are no such records, increase the date span:
import datetime
import dateutil.relativedelta

def date_query(table, date, variance=1):
    '''Run a date query using the closest available date'''
    try:
        # Search within +/- `variance` days of the requested date.
        date_a = date - dateutil.relativedelta.relativedelta(days=variance)
        date_b = date + dateutil.relativedelta.relativedelta(days=variance)
        result = db[table].find({'date': {'$gte': date_a, '$lt': date_b}}).sort([('date', 1)])
        result = list(result)
        assert len(result) >= 1
        return result[len(result)//2]  # return the result closest to the center
    except AssertionError:
        # Nothing found in this window: double the span and retry.
        return date_query(table, date, variance=variance*2)
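As a usage sketch (assuming db is your PyMongo database handle and a hypothetical 'events' collection): date_query('events', datetime.datetime(2020, 2, 1)) starts with a window of one day on either side of 2020-02-01 and keeps doubling it until at least one document is found. Note that the recursion never terminates if the collection is empty.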
According to https://stackoverflow.com/a/33351918/4885936, you don't need ISODate.
A simple, easy solution: if you want the tasks with one hour left to their due date, just do:
const tasks = await task.find({
    time: {
        $gt: Date.now(),
        $lt: Date.now() + 3600000 // one hour in milliseconds
    }
})
This code gets the tasks due between now and one hour from now.
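One caveat with the snippet above: Date.now() returns a plain number (milliseconds since the epoch), not a Date. This works in Mongoose because the schema casts query values to Date; in the raw shell or driver you would write new Date(Date.now() + 3600000) instead.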

Need some help completing this aggregation pipeline

I have an analytics collection where I store queries as individual documents. I want to count the number of queries taking place over the past day (24 hours). Here's the aggregation command as it is:
db.analytics.aggregate([
    {$group: {_id: {"day": {$dayOfMonth: "$datetime"}, "hour": {$hour: "$datetime"}}, "count": {$sum: 1}}},
    {$sort: {"_id.day": 1, "_id.hour": 1}}
])
The result looks like:
...
{
    "_id" : {
        "day" : 17,
        "hour" : 19
    },
    "count" : 8
},
{
    "_id" : {
        "day" : 17,
        "hour" : 22
    },
    "count" : 1
},
{
    "_id" : {
        "day" : 18,
        "hour" : 0
    },
    "count" : 1
}
...
Originally, my plan was to add a $limit operation and simply take the last 24 results. That's a great plan until you realize that some hours have no queries at all, so the last 24 documents could reach back more than a single day. I thought of using $match, but I'm just not sure how to go about constructing it. Any ideas?
First of all you need to pick the day, either the current date or the day of the most recent document in the collection. Then query for that specific day, like:
db.analytics.aggregate([
    {$project: {datetime: "$datetime", day: {$dayOfMonth: "$datetime"}}},
    {$match: {day: 3}},
    {$group: {_id: {"hour": {$hour: "$datetime"}}, "count": {$sum: 1}}},
    {$sort: {"_id.hour": 1}}
]);
where 3 is the day of the month in {$match:{day:3}}.
The idea is to add a day field so we are able to filter by it, then group that day's documents by hour and sort.
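Since the original goal was "the past 24 hours", another option (a sketch, not taken from the answer above) is to $match on the raw datetime field first and then group; this assumes datetime is a real BSON date, as in the question:

// Count queries per hour, restricted to the last 24 hours.
var since = new Date(Date.now() - 24 * 60 * 60 * 1000);
db.analytics.aggregate([
    {$match: {datetime: {$gte: since}}},
    {$group: {_id: {"day": {$dayOfMonth: "$datetime"}, "hour": {$hour: "$datetime"}}, "count": {$sum: 1}}},
    {$sort: {"_id.day": 1, "_id.hour": 1}}
]);
// Note: sorting by day-of-month is only approximate when the 24-hour
// window crosses a month boundary.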

Incremental MapReduce with time interval in mongoDB

I get records from a server at 10-minute intervals (so in 1 hour I get 6 files).
I want to run a map/reduce every hour; in the next hour I will have to map/reduce the next group of 6 files together with the previous hour's files.
How do I solve this problem?
Help me; I have been confused about this for the last month.
Thank you,
Sushil Kr Singh
In order to summarize your 10-minute log files by the hour, you could round down the timestamp of each logfile to the nearest hour in the map function and group the results by hours in the reduce function.
Here is a little dummy example that illustrates this from the mongo shell:
Create 100 log files, each 10 minutes apart and containing a random number between 0 and 10, and insert them into the logs collection in the database:
for (var i = 0; i < 100; i++) {
    d = new ISODate();
    d.setMinutes(d.getMinutes() + i*10);
    r = Math.floor(Math.random()*11);
    db.logs.insert({timestamp: d, number: r});
}
To check what the logs collection looks like, send a query like db.logs.find().limit(3).pretty(), which results in:
{
    "_id" : ObjectId("50455a3570537f9433c1efb2"),
    "timestamp" : ISODate("2012-09-04T01:32:37.370Z"),
    "number" : 2
}
{
    "_id" : ObjectId("50455a3570537f9433c1efb3"),
    "timestamp" : ISODate("2012-09-04T01:42:37.370Z"),
    "number" : 3
}
{
    "_id" : ObjectId("50455a3570537f9433c1efb4"),
    "timestamp" : ISODate("2012-09-04T01:52:37.370Z"),
    "number" : 8
}
Define a map function (in this example called mapf) that rounds the timestamps to the nearest hour (rounded down), which is used for the emit key. The emit value is the number for that log file.
mapf = function () {
    // round down to nearest hour
    d = this.timestamp;
    d.setMinutes(0);
    d.setSeconds(0);
    d.setMilliseconds(0);
    emit(d, this.number);
}
Define a reduce function that sums over all the emitted values (i.e. the numbers):
reducef = function (key, values) {
    var sum = 0;
    for (var v in values) {
        sum += values[v];
    }
    return sum;
}
Now execute map/reduce on the logs collection. The out parameter here specifies that we want to write the results to the hourly_logs collection and merge existing documents with new results. This ensures that log files submitted later (e.g. after a server failure or other delay) will be included in the results once they appear in the logs.
db.logs.mapReduce(mapf, reducef, {out: { merge : "hourly_logs" }})
Lastly, to see the results, you can run a simple find on hourly_logs:
db.hourly_logs.find()
{ "_id" : ISODate("2012-09-04T02:00:00Z"), "value" : 33 }
{ "_id" : ISODate("2012-09-04T03:00:00Z"), "value" : 31 }
{ "_id" : ISODate("2012-09-04T04:00:00Z"), "value" : 21 }
{ "_id" : ISODate("2012-09-04T05:00:00Z"), "value" : 40 }
{ "_id" : ISODate("2012-09-04T06:00:00Z"), "value" : 26 }
{ "_id" : ISODate("2012-09-04T07:00:00Z"), "value" : 26 }
{ "_id" : ISODate("2012-09-04T08:00:00Z"), "value" : 25 }
{ "_id" : ISODate("2012-09-04T09:00:00Z"), "value" : 46 }
{ "_id" : ISODate("2012-09-04T10:00:00Z"), "value" : 27 }
{ "_id" : ISODate("2012-09-04T11:00:00Z"), "value" : 42 }
{ "_id" : ISODate("2012-09-04T12:00:00Z"), "value" : 43 }
{ "_id" : ISODate("2012-09-04T13:00:00Z"), "value" : 35 }
{ "_id" : ISODate("2012-09-04T14:00:00Z"), "value" : 22 }
{ "_id" : ISODate("2012-09-04T15:00:00Z"), "value" : 34 }
{ "_id" : ISODate("2012-09-04T16:00:00Z"), "value" : 18 }
{ "_id" : ISODate("2012-09-04T01:00:00Z"), "value" : 13 }
{ "_id" : ISODate("2012-09-04T17:00:00Z"), "value" : 25 }
{ "_id" : ISODate("2012-09-04T18:00:00Z"), "value" : 7 }
The result is an hourly summary of your 10-minute logs, with the _id field containing the start of the hour and the value field containing the sum of the random numbers. In your case, you may need different aggregation operators; modify the reduce function according to your needs.
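For example, if you wanted the hourly maximum instead of the sum, only the reduce function changes. A sketch; keep in mind that with merge output a reduce must be safe to re-run on its own results, which holds for max:

// Alternative reduce: hourly maximum instead of sum.
reducef = function (key, values) {
    return Math.max.apply(null, values);
}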
As Sammaye mentioned in the comment, you could automate the map/reduce call with a cron job entry to run every hour.
If you don't want to process the full logs collection every time, you can run incremental updates by limiting the documents to hourly time windows like so:
var q = { $and: [ {timestamp: {$gte: new Date(2012, 8, 4, 12, 0, 0) }},
{timestamp: {$lt: new Date(2012, 8, 4, 13, 0, 0) }} ] }
db.logs.mapReduce(mapf, reducef, {query: q, out: { merge : "hourly_logs" }})
This would only include log files between the hours of 12 and 13. Note that the month value in the Date() object starts at 0 (8 = September). Because of the merge option, it is safe to run the map/reduce on already processed log files.
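If you schedule the incremental run hourly (e.g. via the cron job mentioned above), the window can be computed at run time instead of hard-coding dates. A sketch:

// Process the most recently completed hour.
var end = new Date();
end.setMinutes(0, 0, 0);                        // top of the current hour
var start = new Date(end.getTime() - 3600000);  // one hour earlier
var q = { timestamp: { $gte: start, $lt: end } };
db.logs.mapReduce(mapf, reducef, {query: q, out: { merge : "hourly_logs" }});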