`mongo` Query To Purge Old Entries - mongodb

Does anyone have a handy mongo command to remove all entries from a DB that are older than X date / X days?
Basically I have a dev and a production DB, and I'm looking to prune the dev DB a bit to limit its size.
Thanks for the help!

You could do something like this from the mongo shell.
var older = new Date("2013-03-01"), collection = db.so;
collection.find().forEach(function (doc) {
  var ts = doc._id.getTimestamp(); // creation time embedded in the ObjectId
  if (ts < older) { collection.remove({ _id: doc._id }); } // delete by _id only
});
The code above (which you'd paste into the shell) deletes all documents in the specified collection (collection = db.so) that were created before March 1, 2013. It relies on the fact that each ObjectId embeds a timestamp (the time the ObjectId was generated, which for default _id values is the document's creation time; see the docs), which can be retrieved with getTimestamp(). This only works if your _id values are ObjectIds.
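To see what getTimestamp() reads, here is a minimal plain-JavaScript sketch that works on ObjectIds in their 24-character hex form (as produced by toHexString()). It assumes the standard layout in which the first 4 bytes (8 hex digits) hold the creation time in seconds since the Unix epoch. The helper names objectIdToDate and createdBefore are hypothetical, and the in-memory array only stands in for the collection.

```javascript
// Decode the creation time stored in an ObjectId given as a 24-character hex string.
// The first 8 hex digits are a big-endian count of seconds since the Unix epoch.
function objectIdToDate(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Keep only the documents whose _id was generated before `cutoff` (a Date).
// Stand-in for the shell loop: the documents are plain objects with hex _id strings.
function createdBefore(docs, cutoff) {
  return docs.filter((doc) => objectIdToDate(doc._id) < cutoff);
}

const docs = [
  { _id: "4f6198be0000000000000000" }, // generated 2012-03-15T07:22:38Z
  { _id: "600000000000000000000000" }, // generated in 2021
];
console.log(createdBefore(docs, new Date("2013-03-01")).map((d) => d._id));
// [ '4f6198be0000000000000000' ]
```

Because the embedded value is whole seconds, the cutoff comparison is only accurate to one second.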
You could of course change the check to look at a specific timestamp field in the document instead:
if (doc.timestampField < older) { collection.remove({ _id: doc._id }); }

You can also let MongoDB delete data automatically after a specified amount of time; see "Expire Data from Collections by Setting TTL".
Please refer to the link below for how to do so:
http://docs.mongodb.org/manual/tutorial/expire-data/

Related

Most recently added collection in a Mongo DB

Within a Mongo database, how do I get the most recently added/updated collection?
> show collections
collection_1
collection_2
collection_3
...
collection_n
n can vary from 1 to 1000.
My application adds new collections or updates existing ones. How do I retrieve the most recently updated or newly added collection within a database?
The methods I found online apply within a single collection, for example the answers to "Get the latest record from mongodb collection".
There is no built-in solution for this; you need to implement it yourself, programmatically.
I.e.: log the timestamp of the last operation on each collection; you can store these timestamps in a new collection.
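A minimal sketch of that idea, assuming the application funnels every write through one helper that records a timestamp per collection. Here the log is an in-memory Map; in practice it would be the separate MongoDB collection mentioned above (for example one document per collection name, upserted on each write). The names recordWrite and mostRecentlyUpdated are hypothetical.

```javascript
// Hypothetical write log: collection name -> time of its last insert/update.
const lastWrite = new Map();

// Call this from every code path that writes to a collection.
function recordWrite(collectionName, when = new Date()) {
  lastWrite.set(collectionName, when);
}

// Return the name of the collection with the newest recorded write, or null if none.
function mostRecentlyUpdated() {
  let newestName = null;
  let newestTime = -Infinity;
  for (const [name, when] of lastWrite) {
    if (when.getTime() > newestTime) {
      newestTime = when.getTime();
      newestName = name;
    }
  }
  return newestName;
}

recordWrite("collection_1", new Date("2024-01-01T00:00:00Z"));
recordWrite("collection_3", new Date("2024-03-01T00:00:00Z"));
recordWrite("collection_2", new Date("2024-02-01T00:00:00Z"));
console.log(mostRecentlyUpdated()); // collection_3
```

The design choice is that the timestamp is recorded at write time by the application, because MongoDB itself keeps no per-collection modification date.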
A collection's metadata doesn't contain any creation date, so a collection does not "know" when it was created.
In short, it is not possible!
If your documents have an "updated" date-time field, you can get the latest one by sorting on it:
db.collectionname.find().sort( { UpdatedOn: -1 } ).limit(1)

Remove document if timestamp is too old

I know it is possible to remove documents whose saved date has passed with this command:
db.course.deleteMany({date: {"$lt": ISODate()}})
I'm trying to do the same thing, but checking whether a timestamp has passed.
All saved timestamps look like this one:
1492466400000
Is it possible to write a command with a condition that deletes all documents with a timestamp that is too old?
EDIT
I use millisecond timestamps.
In MongoDB 3.2 you can also use TTL indexes, with which you tell MongoDB "please remove all documents whose {fieldDate} is older than 3600 seconds". This is pretty useful for logging (e.g. remove all logs older than 3 months). Note that a TTL index only expires documents whose indexed field holds a BSON Date (or an array of dates), so numeric millisecond timestamps like the one above are never removed.
Maybe it's not your use case, but I think it's good to know.
TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time. Data expiration is useful for certain types of information like machine generated event data, logs, and session information that only need to persist in a database for a finite amount of time.
db.eventlog.createIndex( { "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 } )
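As a rough mental model (a simplification, not the server's actual implementation), the background TTL monitor removes a document once the indexed date plus expireAfterSeconds lies in the past. It only considers Date values, which is why numeric millisecond timestamps never expire. The function isExpired is hypothetical; also keep in mind that the real monitor runs periodically (about every 60 seconds), so deletion is not instantaneous.

```javascript
// Simplified model of the TTL rule: a document is due for removal once
// indexedField + expireAfterSeconds <= now. Only Date values are eligible;
// numbers, strings and missing fields never expire.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  if (!(value instanceof Date)) {
    return false;
  }
  return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const now = new Date("2017-04-18T12:00:00Z");
const eventlog = [
  { lastModifiedDate: new Date("2017-04-18T10:00:00Z") }, // 2 hours old
  { lastModifiedDate: new Date("2017-04-18T11:30:00Z") }, // 30 minutes old
  { lastModifiedDate: 1492466400000 },                    // a number: ignored by TTL
];
console.log(eventlog.map((d) => isExpired(d, "lastModifiedDate", 3600, now)));
// [ true, false, false ]
```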
https://docs.mongodb.com/v3.2/core/index-ttl/
I found a way to do this:
db.course.deleteMany({date: {"$lt": ISODate()*1}})
ISODate()*1 converts the current Date into a millisecond timestamp,
and
ISODate(timestamp) converts a timestamp into a Date.
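The same conversions in plain JavaScript, which the shell is built on: multiplying a Date by 1 (or calling getTime()) gives milliseconds since the epoch, and new Date(ms) goes back. The olderThan filter below only mirrors what the deleteMany filter selects on a numeric date field; the function name and the 30-day window are assumptions for illustration.

```javascript
// Date -> milliseconds since the epoch: `* 1` coerces via valueOf(), like getTime().
const nowMs = new Date() * 1;

// Milliseconds -> Date.
const saved = new Date(1492466400000);
console.log(saved.toISOString()); // 2017-04-17T22:00:00.000Z

// Mirror of deleteMany({ date: { $lt: cutoffMs } }) on millisecond timestamps:
// selects the documents whose numeric `date` is before the cutoff.
function olderThan(docs, cutoffMs) {
  return docs.filter((doc) => typeof doc.date === "number" && doc.date < cutoffMs);
}

// "Older than X days" is just a cutoff X * 24 hours before now.
const days = 30;
const cutoffMs = nowMs - days * 24 * 60 * 60 * 1000;
console.log(olderThan([{ date: 1492466400000 }, { date: nowMs }], cutoffMs).length); // 1
```

In the shell the same cutoff can be passed directly, e.g. db.course.deleteMany({date: {"$lt": Date.now() - 30 * 24 * 60 * 60 * 1000}}) to remove documents older than 30 days.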

MongoDB: verifying document was not updated

We would like to add some auditing to our current project. For that we designed the scenario below, but I don't see how to make steps 1 and 2 atomic.
Scenario
Every document has to have a timestamp that serves as a version. When saving a document we will:
1. Verify the document was not changed: first compare the timestamp of the latest stored document, docLatest, with that of the document we would like to store, docUpdated. The timestamps must be equal. If not, the save request is refused; if they are, go to the next step.
2. Update the document.
3. Create a diff with the previous doc: the latest document must be our last document. We will create a diff and store it.
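For step 3, a minimal shallow diff could look like the sketch below. It assumes flat documents (nested values are compared through JSON.stringify, which is sensitive to key order), and the stored diff shape { field: { from, to } } is an assumption, not something the scenario prescribes. The function name shallowDiff is hypothetical.

```javascript
// Shallow diff: for every top-level field whose value changed, record old and new values.
// Values are compared through JSON.stringify, so nested objects count as changed
// if their serialized form differs (including key order).
function shallowDiff(prev, next) {
  const diff = {};
  const keys = new Set([...Object.keys(prev), ...Object.keys(next)]);
  for (const key of keys) {
    if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) {
      diff[key] = { from: prev[key], to: next[key] };
    }
  }
  return diff;
}

const docLatest = { _id: 1, title: "Draft", qty: 5, ts: 1000 };
const docUpdated = { _id: 1, title: "Draft", qty: 7, ts: 2000, note: "restocked" };
console.log(shallowDiff(docLatest, docUpdated));
```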
I stumbled upon this idea once; it uses the long-polling technique. I'm not going to tell you how to architect your data, but you can convert the date to a numeric value and compare by it.
For 1 and 2, you can store the date as a number; the schema will look something like this:
var document = {
  updatedAt: { type: Number, default: Date.now } // pass the function so each new document gets its own timestamp
}
Then, for every document submitted by the client, just check whether its timestamp is newer than the stored one:
if(latestDocument.updatedAt - prevDocument.updatedAt > 0) {
//if latest's timestamp is bigger than prev, then store it in mongodb
} else {
//if latest document is the same or even older, just ignore this document
}
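Steps 1 and 2 amount to optimistic concurrency control. The sketch below keeps documents in a hypothetical in-memory Map and performs the compare and the write together. In MongoDB, the same effect is usually obtained by putting the expected timestamp in the update filter (e.g. updateOne({ _id: id, updatedAt: expected }, ...)), since a write to a single document is atomic, and treating zero matched documents as a refused save. The names store and saveIfUnchanged are hypothetical.

```javascript
// Hypothetical in-memory store: _id -> document carrying a numeric updatedAt version.
const store = new Map();

// Save `doc` only if the stored copy still has the updatedAt the client originally read.
// Returns the saved document, or null when the save is refused as stale.
function saveIfUnchanged(doc, now = Date.now()) {
  const latest = store.get(doc._id);
  if (latest && latest.updatedAt !== doc.updatedAt) {
    return null; // someone saved in between: refuse (step 1 fails)
  }
  // Step 2: write with a new version that is strictly greater than the old one.
  const version = latest ? Math.max(now, latest.updatedAt + 1) : now;
  const saved = { ...doc, updatedAt: version };
  store.set(doc._id, saved);
  return saved;
}

store.set(1, { _id: 1, text: "v1", updatedAt: 1000 });
const a = saveIfUnchanged({ _id: 1, text: "edit A", updatedAt: 1000 }, 2000);
const b = saveIfUnchanged({ _id: 1, text: "edit B", updatedAt: 1000 }, 3000);
console.log(a.updatedAt, b); // 2000 null
```

Client B read the document at version 1000, so its save is refused once client A's save has moved the version to 2000.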
For number 3: if the document has changed, do you even need to diff it? I decided to follow React/Flux's approach: if the document has changed, just replace the whole document.

How to find last update/insert/delete operation time on mongodb collection without objectid field

I have some unused collections in a MongoDB database, and I have to find out when CRUD operations were last done against them. We use our own _id field instead of MongoDB's default ObjectId, and we don't have any time field in the collections from which to find the modification time. Is there any way to find the modification time of a collection in MongoDB from metadata? Is there any data dictionary information, like in Oracle, that provides this? Please give some ideas/workarounds.
To make a long story short: MongoDB has a flexible schema. Simply add a date field. Since older entries don't have it, they cannot be the most recently modified entry.
Let's call that field mtime.
So after adding a date field to your schema definition, we generate an index in descending order on the new field:
db.yourCollection.createIndex({mtime:-1})
Finding the last mtime for a collection now is easy:
db.yourCollection.find({"mtime":{"$exists":true}}).sort({"mtime":-1}).limit(1)
Do this for every collection. This assumes your application sets mtime on every insert and update from now on. When the above query does not return a value within the timeframe you defined for purging a collection, simply drop it, since it has not been modified since you introduced the mtime field.
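The per-collection decision can be sketched as a pure function: given the newest mtime found in each collection (or null when the query above returns nothing), list the collections not modified since a cutoff. The input shape and the name staleCollections are hypothetical; in the shell the values would come from running the find/sort/limit query against each name returned by db.getCollectionNames().

```javascript
// latestMtime maps each collection name to the newest mtime found in it,
// or null when no document has an mtime yet (the find/sort/limit query returned nothing).
// Returns, sorted by name, the collections with no write at or after `cutoff`.
function staleCollections(latestMtime, cutoff) {
  return Object.entries(latestMtime)
    .filter(([, mtime]) => mtime === null || mtime < cutoff)
    .map(([name]) => name)
    .sort();
}

const latest = {
  orders: new Date("2024-05-01T00:00:00Z"),
  legacy: null,
  audit: new Date("2024-01-10T00:00:00Z"),
};
console.log(staleCollections(latest, new Date("2024-03-01T00:00:00Z"))); // [ 'audit', 'legacy' ]
```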
After your collections are cleaned up, you may remove the mtime field from your schema definition. To remove it from the documents, you can run a simple query:
db.yourCollection.update(
{ "mtime":{ $exists:true} },
{ "$unset":{ "mtime":""} },
{ multi: true}
)
There is no "data dictionary" to get this information in MongoDB.
If you've enabled the profiling level in advance to log all operations (db.setProfilingLevel(2)) and you haven't had many operations to log, so that the system.profile capped collection hasn't overwritten whatever logs you are interested in, you can get the information you need there—but otherwise it's gone.

MongoDB: range queries on insertion time with _id and ObjectID

I am trying to use mongodb's ObjectID to do a range query on the insertion time of a given collection. I can't really find any documentation that this is possible, except for this blog entry: http://mongotips.com/b/a-few-objectid-tricks/ .
I want to fetch all documents created after a given timestamp. Using the nodejs driver, this is what I have:
var timeId = ObjectId.createFromTime(timestamp);
var query = {
localUser: userId,
_id: {$gte: timeId}
};
var cursor = collection.find(query).sort({_id: 1});
I always get the same number of records (19 in a collection of 27), independent of the timestamp. I noticed that createFromTime only fills the bytes of the ObjectId related to time; the other ones are left at 0 (like this: 4f6198be0000000000000000).
The reason I'm trying to use an ObjectID for this is that I need the timestamp of when the document is inserted on the MongoDB server, not of when it is passed to the MongoDB driver in Node.
Does anyone know how to make this work, or have another idea for generating and querying insertion times on the MongoDB server?
Not sure about the Node.js driver, but in Ruby you can simply apply range queries like this:
jan_id = BSON::ObjectId.from_time(Time.utc(2012, 1, 1))
feb_id = BSON::ObjectId.from_time(Time.utc(2012, 2, 1))
users.find({'_id' => {'$gte' => jan_id, '$lt' => feb_id}})
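The same boundary trick in plain JavaScript, as a sketch on 24-character hex ids: an ObjectId built from a time has the 4-byte seconds value followed by 8 zero bytes, so it is no greater than any ObjectId generated in that second and greater than all earlier ones. Because ObjectIds compare byte by byte, fixed-width lowercase hex strings compare the same way. The helper names objectIdHexFromSeconds and inRange are hypothetical.

```javascript
// Hex form of an ObjectId built from a time in SECONDS since the epoch:
// 8 hex digits of timestamp followed by 16 zeros, like createFromTime / from_time.
function objectIdHexFromSeconds(seconds) {
  return Math.floor(seconds).toString(16).padStart(8, "0") + "0".repeat(16);
}

// Mirrors { _id: { $gte: fromId, $lt: toId } } on 24-char hex ids.
// Fixed-width lowercase hex strings sort in the same order as the raw bytes.
function inRange(idHex, fromSeconds, toSeconds) {
  const id = idHex.toLowerCase();
  return id >= objectIdHexFromSeconds(fromSeconds) && id < objectIdHexFromSeconds(toSeconds);
}

// Date.UTC months are 0-based; divide by 1000 to go from milliseconds to seconds.
const jan = Date.UTC(2012, 0, 1) / 1000;
const feb = Date.UTC(2012, 1, 1) / 1000;
const mar = Date.UTC(2012, 2, 1) / 1000;
const apr = Date.UTC(2012, 3, 1) / 1000;

console.log(objectIdHexFromSeconds(1331796158)); // 4f6198be0000000000000000
console.log(inRange("4f6198be1a2b3c4d5e6f7081", jan, feb)); // false
console.log(inRange("4f6198be1a2b3c4d5e6f7081", mar, apr)); // true
```

The zero-padded boundary matches the question's 4f6198be0000000000000000 example, so the trailing zeros are expected and are not what makes the query return a constant count.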
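Make sure that
var timeId = ObjectId.createFromTime(timestamp) is actually creating an ObjectId; note that createFromTime expects seconds since the epoch, not milliseconds.
Also try the query without localUser. Keep in mind, too, that _id ObjectIds are normally generated by the driver on the client, not by the server, so their embedded time is the client's clock.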