Which is the fastest way to remove MongoDB documents by Date? - mongodb

In our company we have an eight-day data retention policy (approximately one million records), so a daily cron job removes documents older than eight days. We currently filter on the Published field, which is not indexed.
It takes about 15 minutes to remove 100,000 records, which we find too long.
This is the query, where 'docs' is a variable holding an array of documents that we don't want to remove, and 'theDate' is the date eight days ago:
records.remove( { "Published" : { $lte : theDate }, "_id" : { $nin : docs } } )
Would it be better to use the _id field, which is indexed, in order to do this operation? How can we use the _id field to do the same operation?

Discard the cron job entirely: this is a job for TTL indexes. http://docs.mongodb.org/manual/core/index-ttl/
Create a TTL index on the Published field with expireAfterSeconds: 691200 and watch as your documents are automatically removed 8 days after publication.
And if you don't want to indiscriminately delete all documents 8 days after their publication, keep your cron job and just create a plain index on the Published field.
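A quick sketch of the arithmetic behind both options, in plain JavaScript. The createIndex calls are shown only as comments and assume a collection named records (as in the question); they are not executed here.

```javascript
// 8 days expressed in seconds, for the TTL index's expireAfterSeconds option:
const EIGHT_DAYS_SECONDS = 8 * 24 * 60 * 60; // 691200

// For the cron-job alternative: the cutoff "theDate", i.e. 8 days ago.
const theDate = new Date(Date.now() - EIGHT_DAYS_SECONDS * 1000);

// In the mongo shell the two options would look like (not run here):
//   db.records.createIndex({ Published: 1 }, { expireAfterSeconds: 691200 }) // TTL index
//   db.records.createIndex({ Published: 1 })                                 // plain index
console.log(EIGHT_DAYS_SECONDS, theDate.toISOString());
```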

Related

Count for a collection based on its CRUD operations for the last 2 days in mongodb?

I want to know the counts of CRUD operations (UPDATE, DELETE and INSERT) performed on a collection over the last 2 days in mongodb.
The end result I need is how many documents got updated, inserted and deleted in the last 2 days, with their counts.
In case I don't have a creationDate or updationDate key in my collection, how can I figure it out?
MongoDB doesn't store the updated time. You have to store it yourself.
You can get the creation date-time from the ObjectId:
ObjectId("507c7f79bcf86cd7994f6c0e").getTimestamp() // returns an ISO date-time
Suggestion: it is always good practice to store createdDate and updateDate with each document in a collection (or each row in a table).
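The reason getTimestamp() works: the first 4 bytes (8 hex characters) of an ObjectId encode the creation time in seconds since the Unix epoch. A plain-JavaScript sketch using the example id above:

```javascript
// Extract the creation time from an ObjectId's hex string by hand.
// The first 8 hex chars are a big-endian seconds-since-epoch timestamp.
const objectId = "507c7f79bcf86cd7994f6c0e";        // example id from above
const seconds = parseInt(objectId.slice(0, 8), 16); // 0x507c7f79
const createdAt = new Date(seconds * 1000);
console.log(createdAt.toISOString()); // 2012-10-15T21:26:17.000Z
```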

How to write a query in Mongo to remove records where DateTimeOffset is greater than 30 days

A capture from MongoDB showing the structure of a DateTime field.
You can set TTL indexes while creating your records. TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time.
To create a TTL index, use the db.collection.createIndex() method with the expireAfterSeconds option on a field whose value is either a date or an array that contains date values.
For example, to create a TTL index on the createdDate field of the User collection that removes records after 30 days, use the following operation in the mongo shell:
db.User.createIndex( { "createdDate": 1 }, { expireAfterSeconds: 2592000 } )
src: https://docs.mongodb.com/manual/core/index-ttl/
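A quick sanity check of the 2592000 figure used above (plain JavaScript, just arithmetic):

```javascript
// expireAfterSeconds for "30 days": 30 days * 24 hours * 60 minutes * 60 seconds
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;
console.log(THIRTY_DAYS_SECONDS); // 2592000
```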

MongoDB find without an index is fast for certain values of a field

I have a collection named "message" with 3.5 million documents. There are no indices on the collection except the _id index.
3.3 million of the documents have field "name" with value "peter".
200k of the documents have field "name" with value "john".
If I query db.message.find({name: "peter"}) it takes around 1 millisecond.
If I query db.message.find({name: "john"}) it takes around 2 seconds.
My question is: why is one query really fast while the other is slow?

I have a MongoDB collection whose items have create_time and delete_time fields. I want an aggregation

I want an aggregation. (Or can aggregation solve my question?)
For example:
Given a timestamp, if an item's create_time is less than the timestamp and its delete_time is greater than the timestamp, it counts as 1. Looping over all items gives me the count at that timestamp; a series of such counts is what I need.
This process is too slow in my app.
Can MongoDB aggregation help me? Thanks.
If I understand correctly, this is the query:
db.collection.aggregate([
  { $project: { cmp: { $and: [
      { $lt: [ "$create_time", ISODate("2016-04-24T13:10:09.518Z") ] },
      { $gt: [ "$delete_time", ISODate("2016-04-25T13:10:09.518Z") ] }
  ] } } },
  { $match: { cmp: true } },
  { $group: { _id: "$cmp", count: { $sum: 1 } } }
])
This will basically give output like { "_id" : true, "count" : 2 },
where count is the number of documents satisfying the condition you specified. Replace ISODate("xxxx") with your ISO dates.
And if you also want the number of documents that don't satisfy your condition, remove the $match stage from the pipeline.
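To make the pipeline's logic concrete, here is a plain-JavaScript sketch of the same predicate evaluated per document. The items below are hypothetical sample data; the real query runs server-side in MongoDB.

```javascript
// Count items that "exist" at a timestamp:
// create_time before the timestamp AND delete_time after it.
const ts = new Date("2016-04-24T13:10:09.518Z");
const items = [
  { create_time: new Date("2016-04-20T00:00:00Z"), delete_time: new Date("2016-04-30T00:00:00Z") },
  { create_time: new Date("2016-04-23T00:00:00Z"), delete_time: new Date("2016-04-23T12:00:00Z") },
  { create_time: new Date("2016-04-25T00:00:00Z"), delete_time: new Date("2016-04-28T00:00:00Z") },
];
const count = items.filter(d => d.create_time < ts && d.delete_time > ts).length;
console.log(count); // 1: only the first item spans the timestamp
```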

Efficient mongodb query to find the average time in a collection of 10K+ records?

Following is one record from a collection named outputs.
db.outputs.findOne()
{
"_id" : ObjectId("4e4131e8c7908d3eb5000002"),
"company" : "West Edmonton Mall",
"country" : "Canada",
"created_at" : ISODate("2011-08-09T13:11:04Z"),
"started_at" : ISODate("2011-08-09T11:11:04Z"),
"end_at" : ISODate("2011-08-09T13:09:04Z")
}
The above is just one document. There are around 10K docs and the number will keep increasing.
What I need is the average duration in hours (using started_at and end_at) over the past week (based on created_at).
Right now, you're going to need to query the documents you want to average, likely selecting only the fields you need (started_at and end_at), and do the calculation in your app code.
If you wait for the next major version of MongoDB, there will be a new aggregation framework that will let you build an aggregation pipeline for querying documents, selecting fields, and performing calculations on them, finally returning the calculated value(s). It's very cool.
https://www.mongodb.org/display/DOCS/Aggregation+Framework
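A hedged sketch of the app-side calculation described above: average the (end_at - started_at) durations in hours. The documents below are hypothetical samples shaped like the one in the question.

```javascript
// Average duration in hours across a set of documents.
const docs = [
  { started_at: new Date("2011-08-09T11:11:04Z"), end_at: new Date("2011-08-09T13:09:04Z") },
  { started_at: new Date("2011-08-09T10:00:00Z"), end_at: new Date("2011-08-09T12:00:00Z") },
];
const MS_PER_HOUR = 3600 * 1000;
// Date subtraction yields milliseconds; sum them, then convert to hours.
const avgHours =
  docs.reduce((sum, d) => sum + (d.end_at - d.started_at), 0) / docs.length / MS_PER_HOUR;
console.log(avgHours.toFixed(2)); // ~1.98 hours
```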
You can maintain the sum and count in a separate collection using the $inc operator, with an _id value that represents a week. That way, you don't have to query all 10k records: just query the collection maintaining the sum and count, and divide the sum by the count to get the average.
I have explained this in detail in the following post:
http://samarthbhargava.wordpress.com/2012/02/01/real-time-analytics-with-mongodb/
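A plain-JavaScript sketch of that pre-aggregation idea: keep a running sum and count per week, then the average is sum / count. The durations here are hypothetical; in MongoDB the increments would be applied with $inc on a summary document keyed by week.

```javascript
// Running weekly summary; in MongoDB this would be one document per week,
// updated with { $inc: { sum: hours, count: 1 } } on each insert.
const weekSummary = { sum: 0, count: 0 };

function recordDuration(hours) {
  weekSummary.sum += hours;
  weekSummary.count += 1;
}

recordDuration(2);
recordDuration(1.5);
recordDuration(2.5);

// Reading the average is a single cheap lookup, not a 10k-document scan.
const average = weekSummary.sum / weekSummary.count;
console.log(average); // 2
```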