mongodb TTL not removing documents - mongodb

I have a simple schema like:
{
_id: String, // auto generated
key: String, // there is a unique index on this field
timestamp: Date() // set to current time
}
Then I set the TTL index like so:
db.sess.ensureIndex( { "timestamp": 1 }, { expireAfterSeconds: 3600 } )
I expect the record to be removed after 1 hour, but it is never removed.
I flipped on verbose logging and I see the TTLMonitor running:
Tue Sep 10 10:42:37.081 [TTLMonitor] TTL: { timestamp: 1.0 } { timestamp: { $lt: new Date(1378823557081) } }
Tue Sep 10 10:42:37.081 [TTLMonitor] TTL deleted: 0
When I run that query myself I see all my expired records coming back:
db.sess.find({ timestamp: { $lt: new Date(1378823557081) }})
...
Any ideas? I'm stumped.
EDIT - Example document below
{ "_id" : "3971446b45e640fdb30ebb3d58663807", "key" : "6XTHYKG7XBTQE9MJH8", "timestamp" : ISODate("2013-09-09T18:54:28Z") }

Can you show us what the inserted records actually look like?
How long is "never"? Because there's a big warning:
Warning: The TTL index does not guarantee that expired data will be deleted immediately. There may be a delay between the time a document expires and the time that MongoDB removes the document from the database.
Does the timestamp field have an index already?

This was my issue:
I had the index created wrong like this:
{
"v" : 1,
"key" : {
"columnName" : 1,
"expireAfterSeconds" : 172800
},
"name" : "columnName_1_expireAfterSeconds_172800",
"ns" : "dbName.collectionName"
}
When it should have been this (expireAfterSeconds is a top-level property):
{
"v" : 1,
"key" : {
"columnName" : 1
},
"expireAfterSeconds" : 172800,
"name" : "columnName_1_expireAfterSeconds_172800",
"ns" : "dbName.collectionName"
}
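One way to catch this mistake is to inspect each entry returned by getIndexes(). The following is a minimal sketch in plain JavaScript; the function name describeTtlIndex and its return labels are hypothetical, and it assumes only the index document shape shown above.

```javascript
// Hypothetical helper (not a MongoDB API): classifies one index document
// shaped like the entries printed by getIndexes().
function describeTtlIndex(indexSpec) {
  const keyFields = Object.keys(indexSpec.key || {});
  if (keyFields.includes("expireAfterSeconds")) {
    // The option ended up inside the key document, so the index is an
    // ordinary index on a field literally named "expireAfterSeconds".
    return "misplaced";
  }
  if (typeof indexSpec.expireAfterSeconds === "number") {
    // The option sits at the top level, next to "key" and "name".
    return "ttl";
  }
  return "not-ttl";
}

const wrongIndex = {
  v: 1,
  key: { columnName: 1, expireAfterSeconds: 172800 },
  name: "columnName_1_expireAfterSeconds_172800",
  ns: "dbName.collectionName",
};
const rightIndex = {
  v: 1,
  key: { columnName: 1 },
  expireAfterSeconds: 172800,
  name: "columnName_1_expireAfterSeconds_172800",
  ns: "dbName.collectionName",
};

console.log(describeTtlIndex(wrongIndex));
console.log(describeTtlIndex(rightIndex));
```

In the shell, the same function could be applied to every element of the getIndexes() result to list the indexes that need to be dropped and recreated.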

Related

MongoDB TTL/ExpireAfterSeconds is misbehaving and not deleting all data after given time

1) We have put expireAfterSeconds=15 on column of type: date
[
{
"v" : 1,
"key" : {
"_ts" : -1
},
"name" : "AnjaliIndex",
"ns" : "test.sessions",
"expireAfterSeconds" : 15
}
]
It works for yesterday's date but not for today's date, i.e. a document is removed only after I change its date from the current date to yesterday's date, whereas all of the data should be deleted. (The current date I set is not in the future; it is already in the past.)
Why is this happening? Is there a particular cycle or time at which the MongoDB engine collects documents for expiry?
(I have seen a related question, but its use case is different: the asker there was setting a future date.)
Mongo DB Version: 3.2.22
Sample document (not getting deleted):
{
"_id" : ObjectId("5dde452818c87122389bbc09"),
"authorization" : "a0ce0b43-194d-4402-99cb-b660b3365757",
"userNumber" : "gourav#gmail.com",
"_ts" : ISODate("2019-11-27T13:43:04.776Z")
}
I will try to answer and see if that can help you.
db.my_collection.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
After that, every document that you insert in this collection must have the "createdAt" with the current date:
db.my_collection.insert( {
"createdAt": new Date(), // This can be set in UTC
"dataExample": 2,
"Message": "#### My data ####"
} )

Remove redundant data from sensors by using date and value

I'm developing an application that collects data from sensors and I need to reduce the amount of data that is stored in a mongodb database by using a value (temperature) and a date (timestamp).
The documents have the following format:
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:37:50.370Z"),
sensorCode: "SENSOR_A1"
}
The problem is that the sensors send data too frequently, so there are too many documents with redundant data in a short period of time (let's say 10 minutes). In other words, it is not useful to have multiple equal values within a very short period of time.
Example: here there are data from a sensor that is reporting temperature is 10
// collection: datasensors
[
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:37:50.370Z"),
sensorCode: "SENSOR_A1"
},
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:38:50.555Z"),
sensorCode: "SENSOR_A1"
},
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:38:51.654Z"),
sensorCode: "SENSOR_A1"
},
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:50:20.335Z"),
sensorCode: "SENSOR_A1"
}
]
Because minute precision is not required, I would like to remove all documents from 2016-04-29T14:37:50.370Z to 2016-04-29T14:38:51.32Z except one. So the result should be this:
[
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:38:51.654Z"),
sensorCode: "SENSOR_A1"
},
{
temperature: 10,
timestamp: ISODate("2016-04-29T14:50:20.335Z"),
sensorCode: "SENSOR_A1"
}
]
The remove operation I want to perform should "reduce" equal temperatures in time ranges less than 10 minutes to one value.
Is there any technique to achieve this?
I simplified my solution and decided to keep one record for each unique measurement received within a 10-minute time window.
MongoDB 3.2 is required for this.
Adding a time mark separates the measurements into 10-minute groups.
Then we preserve the first record in each group and store all of the group's ids for further processing.
Then we remove the id of the document we want to keep from the array of all ids, leaving only the documents to delete.
Finally, a forEach loop deletes the unneeded ids; that line is commented out :-)
Copy the code below into the mongo console, execute it and verify the ids to delete, then uncomment the line and go!
var addTimeMark = {
$project : {
_id : 1,
temperature : 1,
timestamp : 1,
sensorCode : 1,
yearMonthDay : {
$substr : [{
$dateToString : {
format : "%Y%m%d%H%M",
date : "$timestamp"
}
}, 0, 11]
}
}
}
var getFirstRecordInGroup = {
// take only the first record from each group
$group : {
_id : {
timeMark : "$yearMonthDay",
sensorCode : "$sensorCode",
temperature : "$temperature"
},
id : {
$first : "$_id"
},
allIds : {
$push : "$_id"
},
timestamp : {
$first : "$timestamp"
},
totalEntries : {
$sum : 1
}
}
}
var removeFirstIdFromAllIds = {
$project : {
_id : 1,
id : 1,
timestamp : 1,
totalEntries : 1,
allIds : {
$filter : {
input : "$allIds",
as : "item",
cond : {
$ne : ["$$item", "$id"]
}
}
}
}
}
db.sensor.aggregate([
addTimeMark,
getFirstRecordInGroup,
removeFirstIdFromAllIds,
]).forEach(function (entry) {
printjson(entry.allIds);
// db.sensor.deleteMany({_id:{$in:entry.allIds}})
})
This is how a document looks after each step.
1:
{
"_id" : ObjectId("574b5d8e0ac96f88db507209"),
"temperature" : 10,
"timestamp" : ISODate("2016-04-29T14:37:50.370Z"),
"sensorCode" : "SENSOR_A1",
"yearMonthDay" : "20160429143"
}
2:
{
"_id" : {
"timeMark" : "20160429143",
"sensorCode" : "SENSOR_A1",
"temperature" : 10
},
"id" : ObjectId("574b5d8e0ac96f88db507209"),
"allIds" : [
ObjectId("574b5d8e0ac96f88db507209"),
ObjectId("574b5d8e0ac96f88db50720a"),
ObjectId("574b5d8e0ac96f88db50720b")
],
"timestamp" : ISODate("2016-04-29T14:37:50.370Z"),
"totalEntries" : 3
}
And after the last step:
{
"_id" : {
"timeMark" : "20160429143",
"sensorCode" : "SENSOR_A1",
"temperature" : 10
},
"id" : ObjectId("574b5d8e0ac96f88db507209"),
"allIds" : [
ObjectId("574b5d8e0ac96f88db50720a"),
ObjectId("574b5d8e0ac96f88db50720b")
],
"timestamp" : ISODate("2016-04-29T14:37:50.370Z"),
"totalEntries" : 3
}
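To check which ids the pipeline above would select before deleting anything, the grouping logic can be reproduced outside MongoDB. This is a minimal sketch in plain JavaScript, not part of the original answer: the names timeMark and idsToDelete are hypothetical, the numeric _id values are placeholders, and "first record" here means first in input order.

```javascript
// Build the same 11-character mark as the $dateToString/$substr stage:
// "YYYYMMDDHHm", where m is the tens digit of the minute (UTC).
function timeMark(date) {
  const iso = date.toISOString(); // e.g. "2016-04-29T14:37:50.370Z"
  return iso.slice(0, 4) + iso.slice(5, 7) + iso.slice(8, 10) +
    iso.slice(11, 13) + iso.slice(14, 15);
}

// Keep the first document of every (timeMark, sensorCode, temperature)
// group and return the _id values of all the others.
function idsToDelete(docs) {
  const seenGroups = new Set();
  const toDelete = [];
  for (const doc of docs) {
    const groupKey = [timeMark(doc.timestamp), doc.sensorCode, doc.temperature].join("|");
    if (seenGroups.has(groupKey)) {
      toDelete.push(doc._id);
    } else {
      seenGroups.add(groupKey);
    }
  }
  return toDelete;
}

// The four sample readings from the question, with placeholder ids.
const readings = [
  { _id: 1, temperature: 10, timestamp: new Date("2016-04-29T14:37:50.370Z"), sensorCode: "SENSOR_A1" },
  { _id: 2, temperature: 10, timestamp: new Date("2016-04-29T14:38:50.555Z"), sensorCode: "SENSOR_A1" },
  { _id: 3, temperature: 10, timestamp: new Date("2016-04-29T14:38:51.654Z"), sensorCode: "SENSOR_A1" },
  { _id: 4, temperature: 10, timestamp: new Date("2016-04-29T14:50:20.335Z"), sensorCode: "SENSOR_A1" },
];

console.log(idsToDelete(readings));
```

Note that the buckets are aligned to the clock (14:30 to 14:39, 14:40 to 14:49, and so on), so two readings a minute apart can still land in different groups if they straddle a boundary.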

mongodb TTL not working

I executed this command to set a TTL index on MongoDB:
db.sessions.ensureIndex({'expiration':1},{"expireAfterSeconds" : 30})
but after 4 days I found that the documents had not been removed.
I have confirmed that the command and the documents' field are correct.
I don't know how to fix it.
After executing db.serverStatus(), I got
localTime is 2015-01-16 11:03:05.554+08:00
and the following is some information about my collection:
db.sessions.getIndexes()
{
"0" : {
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "meta.sessions"
},
"1" : {
"v" : 1,
"key" : {
"expiration" : 1
},
"name" : "expiration_1",
"ns" : "meta.sessions",
"expireAfterSeconds" : 30
}
}
db.sessions.find()
/* 0 */
{
"_id" : ObjectId("54b4c2e0f840238ca1436788"),
"data" : ...,
"expiration" : ISODate("2015-01-13T16:02:33.947+08:00"),
"sid" : "..."
}
/* 1 */
{
"_id" : ObjectId("54b4c333f840238ca1436789"),
"data" : ...,
"expiration" : ISODate("2015-01-13T16:06:56.942+08:00"),
"sid" : ".."
}
/* ... */
To expire data from a collection (tested in version 3.2), you must create a TTL index:
db.my_collection.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
After that, every document that you insert in this collection must have the "createdAt" with the current date:
db.my_collection.insert( {
"createdAt": new Date(),
"dataExample": 2,
"Message": "Success!"
} )
The document becomes eligible for removal once the current time passes its createdAt value plus the expireAfterSeconds value.
Note: the background task that removes expired documents runs, by default, once every 60 seconds, so removal is not immediate.
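The expiry rule can be written down as a small predicate, which is handy for checking whether a given document should already have been picked up. This is a sketch under stated assumptions: isExpired is a hypothetical helper, not a MongoDB API, and the strict comparison is an assumption that mirrors the $lt query printed in older TTLMonitor logs.

```javascript
// Hypothetical helper: returns true when the TTL rule would consider
// the document expired at the instant `now`.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  // Only real Date values count; a string or a number that merely looks
  // like a date never expires.
  if (!(value instanceof Date)) {
    return false;
  }
  return value.getTime() + expireAfterSeconds * 1000 < now.getTime();
}

const sessionDoc = { createdAt: new Date("2015-01-13T08:00:00Z") };

// One hour TTL, checked 30 minutes and then 61 minutes after creation.
console.log(isExpired(sessionDoc, "createdAt", 3600, new Date("2015-01-13T08:30:00Z")));
console.log(isExpired(sessionDoc, "createdAt", 3600, new Date("2015-01-13T09:01:00Z")));
```

If the predicate returns true for a document that is still present well after the 60-second cycle, the index definition or the field's type is the next thing to check.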
When you create a TTL index in the foreground (as you did), MongoDB begins removing expired documents as soon as the index finishes building. It is best to run tail -f mongod.log during index creation to track the progress. You may wish to drop and recreate the index if something went wrong.
If the index was created in the background, the TTL thread can begin deleting documents while the index is building.
The TTL thread that removes expired documents runs every 60 seconds.
If you created the index on a replica set member that was taken out of the replica set and is running in standalone mode, the index WILL be created, but documents will NOT be removed until you rejoin the replica set (or remove the replica set configuration). In that case you may see something like this in mongod.log:
** WARNING: mongod started without --replSet yet 1 documents are
** present in local.system.replset
** Restart with --replSet unless you are doing maintenance and no other
** clients are connected.
** The TTL collection monitor will not start because of this.
** For more info see http://dochub.mongodb.org/core/ttlcollections

Find all documents within last n days

My daily collection has documents like:
..
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "ED", "san" : 7046.25, "izm" : 1243.96 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "UA", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "PAL", "san" : 0, "izm" : 169.9 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "PAL", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-03T00:00:00Z"), "vid" : "CTA_TR", "san" : 0, "izm" : 0 }
{ "date" : ISODate("2013-01-04T00:00:00Z"), "vid" : "CAD", "san" : 0, "izm" : 169.9 }
{ "date" : ISODate("2013-01-04T00:00:00Z"), "vid" : "INT", "san" : 0, "izm" : 169.9 }
...
I left out the _id field to save space here.
My task is to "fetch all documents within the last 15 days". As you can see, I need to:
Get 15 unique dates. The newest one should be taken from the newest document in the collection (that is, not necessarily today's date, just the latest one in the collection based on the date field). As for the oldest, it may not be necessary to define the oldest day strictly in the query; what I need is a kind of top 15 starting from the newest day, i.e. 15 unique days.
db.daily.find() all documents that have a date field in that range of 15 days.
In the result, I should see all documents within 15 days, starting from the newest in the collection.
I just tested the following query against your data sample and it worked perfectly:
db.datecol.find(
{
"date":
{
$gte: new Date((new Date().getTime() - (15 * 24 * 60 * 60 * 1000)))
}
}
).sort({ "date": -1 })
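The date arithmetic in that query is easier to read and test when it is pulled out into a helper. A minimal sketch, assuming a hypothetical function name cutoffDate; note that it counts back from the supplied "now" rather than from the newest date in the collection.

```javascript
const MILLIS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: the instant exactly `days` days before `now`.
function cutoffDate(days, now) {
  return new Date(now.getTime() - days * MILLIS_PER_DAY);
}

// With a fixed "now" the result is reproducible.
const fixedNow = new Date("2013-01-16T00:00:00Z");
console.log(cutoffDate(15, fixedNow).toISOString());
```

In the shell, the value returned by cutoffDate(15, new Date()) would be used as the $gte bound.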
Starting in Mongo 5, it's a nice use case for the $dateSubtract operator:
// { date: ISODate("2021-12-05") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-11-28") } <= older than 5 days
db.collection.aggregate([
{ $match: {
$expr: {
$gt: [
"$date",
{ $dateSubtract: { startDate: "$$NOW", unit: "day", amount: 5 } }
]
}
}}
])
// { date: ISODate("2021-12-05") }
// { date: ISODate("2021-12-02") }
// { date: ISODate("2021-12-02") }
With $dateSubtract, we create the oldest date after which we keep documents, by subtracting 5 (amount) "days" (unit) out of the current date $$NOW (startDate).
And you can obviously add a $sort stage to sort documents by date.
You need to run the distinct command to get all the unique dates. Below is an example. The "values" array holds all the unique dates in the collection, from which you need to pick the most recent 15 days on the client side:
db.runCommand ( { distinct: 'datecol', key: 'date' } )
{
"values" : [
ISODate("2013-01-03T00:00:00Z"),
ISODate("2013-01-04T00:00:00Z")
],
"stats" : {
"n" : 2,
"nscanned" : 2,
"nscannedObjects" : 2,
"timems" : 0,
"cursor" : "BasicCursor"
},
"ok" : 1
}
You then use the $in operator with the most recent 15 dates from step 1. Below is an example that finds all documents that belong to one of the mentioned two dates.
db.datecol.find({
"date":{
"$in":[
new ISODate("2013-01-03T00:00:00Z"),
new ISODate("2013-01-04T00:00:00Z")
]
}
})
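The client-side step, picking the most recent 15 unique dates out of the "values" array, is not shown above. One possible sketch in plain JavaScript follows; the function name latestDistinctDates is hypothetical, and it assumes the values arrive as Date objects.

```javascript
// Hypothetical helper: returns the n most recent distinct dates, newest first.
function latestDistinctDates(dates, n) {
  // Compare by millisecond value, because two Date objects holding the
  // same instant are still different objects.
  const uniqueMillis = Array.from(new Set(dates.map((d) => d.getTime())));
  uniqueMillis.sort((a, b) => b - a);
  return uniqueMillis.slice(0, n).map((ms) => new Date(ms));
}

const distinctValues = [
  new Date("2013-01-03T00:00:00Z"),
  new Date("2013-01-04T00:00:00Z"),
  new Date("2013-01-03T00:00:00Z"),
  new Date("2013-01-02T00:00:00Z"),
];

console.log(latestDistinctDates(distinctValues, 2).map((d) => d.toISOString()));
```

The returned array can be passed directly as the $in list. Because the selection is based on the dates actually present, it follows the newest document in the collection rather than today's date.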

Upsert with pymongo and a custom _id field

I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that in this case the 'minute' subdocument only ever holds the last hour: { minute: metric } entry. The subdocument does not gain new entries for other hours; the single entry is always overwritten.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?
In both of the examples listed you are simply setting a field ('minute') to a particular value; the only reason it is an addition the first time you update is that the field itself does not exist and so must be created.
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different to yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
}
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
}
]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet, this is required for a new entry to be added):
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
},
{
"17" : {
"48" : 1.6693391
}
}
]
}
I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
I'm setting the metrics like this:
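import pytz  # provides the UTC tzinfo used below

data = {"$set": {}}
for metric in csv:
    # convert to UTC before deriving the hour and minute keys
    date_utc = metric['date'].astimezone(pytz.utc)
    data["$set"]["minute.%d.%d" % (date_utc.hour,
                                   date_utc.minute)] = float(metric['metric'])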
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
So that when self.collection.update(query, data, upsert=True) is run it updates those fields.
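For readers working in the mongo shell rather than pymongo, the same dotted-path update document can be built in JavaScript. This is a sketch only: buildMinuteUpdate is a hypothetical name, and the input shape (objects with a date and a metric) is assumed from the Python loop above.

```javascript
// Hypothetical helper: builds a $set document whose keys are dotted paths
// such as "minute.16.45", so each metric touches only its own leaf field.
function buildMinuteUpdate(metrics) {
  const update = { $set: {} };
  for (const entry of metrics) {
    const path = "minute." + entry.date.getUTCHours() + "." + entry.date.getUTCMinutes();
    update.$set[path] = Number(entry.metric);
  }
  return update;
}

const sampleMetrics = [
  { date: new Date("2013-03-04T16:45:00Z"), metric: "1.6693091" },
  { date: new Date("2013-03-04T16:46:00Z"), metric: "1.566343" },
  { date: new Date("2013-03-04T16:47:00Z"), metric: "1.22322" },
];

console.log(JSON.stringify(buildMinuteUpdate(sampleMetrics)));
```

The difference from the earlier attempts is that $set with a dotted path changes only the addressed field, whereas setting 'minute' itself replaces the whole subdocument.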