MongoDB TTL index for non-ISO date format - mongodb

Below is my sample JSON message; its Timestamp is in the format YYYY-MM-DDThh:mmTZD (e.g. 2015-08-18T22:43:01-04:00).
I also have a TTL index set up for 30 days, but my data is not getting removed. I know that MongoDB uses an ISODate("2015-09-03T14:21:30.177-04:00") kind of format, but is that absolutely necessary? What modification can I make to my index to get the TTL working?
We have millions of documents across multiple collections and we run out of space every now and then.
JSON:
{
    "_id" : ObjectId("55d3ed35817f4809e14e2"),
    "AuditEnvelope" : {
        "TrackingInformation" : {
            "CorelationId" : "2703-4ce2-af68-47832462",
            "Timestamp" : "2015-08-18T22:43:01-04:00",
            "LogData" : {
                "msgDetailJson" : "[Somedata here]"
            }
        }
    }
}
Index
"1" : {
"v" : 1,
"key" : {
"AuditEnvelope.TrackingInformation.Timestamp" : 1
},
"name" : "TTL",
"ns" : "MyDB.MyColl",
"expireAfterSeconds" : 2592000
},
MongoDB version : 3.0.1

In order for the TTL clean-up process to work with a defined TTL index, the specified field must contain a Date BSON type, as is covered in the documentation for TTL indexes.
If the indexed field in a document is not a date or an array that holds a date value(s), the document will not expire.
You will need to convert these strings to BSON dates. This is also a wise thing to do, since the internal storage of a BSON Date is a numeric timestamp value, which takes up a lot less space than a string.
The transformation requires an update to "cast" the strings to date objects. As a one-off operation this is probably best done through the MongoDB shell, using Bulk Operations to minimize the network overhead when writing the data back.
var bulk = db.MyColl.initializeOrderedBulkOp(),
    count = 0;
db.MyColl.find({
    "AuditEnvelope.TrackingInformation.Timestamp": { "$type": 2 }
}).forEach(function(doc) {
    // Cast the string value to a BSON Date
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": {
            "AuditEnvelope.TrackingInformation.Timestamp":
                new Date(doc.AuditEnvelope.TrackingInformation.Timestamp)
        }
    });
    count++;
    // Send the queued updates in batches of 1000
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.MyColl.initializeOrderedBulkOp();
    }
});
// Flush any remaining queued operations
if ( count % 1000 != 0 )
    bulk.execute();
Also note that the BSON $type operation there is designed to match "strings", so even if you have begun a conversion or changed some code to start producing BSON date objects in the field, the query only picks up the remaining "string" values for conversion.
Ideally you should drop the existing indexes on the "Timestamp" field and re-create them after the update. This removes the overhead of writing to the index with the updated information. You can also run the new index creation as a foreground build, which will keep the space the index itself consumes smaller.
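For example, a rough shell sketch of that drop/re-create cycle, using the index name and 30-day expiry from the listing above:

// Drop the existing "TTL" index before running the bulk conversion
db.MyColl.dropIndex("TTL")

// ...run the bulk conversion above, then re-create the TTL index on the
// now-converted BSON Date field with the same 30-day expiry
db.MyColl.createIndex(
    { "AuditEnvelope.TrackingInformation.Timestamp": 1 },
    { name: "TTL", expireAfterSeconds: 2592000 }
)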

Related

MongoDB sorting returns null instead of data

My MongoDB dataset is like this:
{
    "_id" : ObjectId("5a27cc4783800a0b284c7f62"),
    "action" : "1",
    "silent" : "0",
    "createdate" : ISODate("2017-12-06T10:53:59.664Z"),
    "__v" : 0
}
Now I have to find the data whose action value is 1 and whose silent value is 0. One more thing: all the data should be returned in descending order.
My MongoDB query is:
db.collection.find({'action': 1, 'silent': 0}).sort({createdate: -1}).exec(function(err, post) {
    console.log(post.length);
});
Earlier it worked fine for me, but now I have 121000 entries in this collection and it returns null.
I know there is some confusion around .sort().
If I remove the sort from the query then everything is fine. Example:
db.collection.find({'action': 1, 'silent': 0}).exec(function(err, post) {
    console.log(post.length); // Now it returns data, but not in descending order
});
MongoDB limits the amount of data it will attempt to sort without an index.
This is because Mongo has to sort the data in memory or on disk, both of which can be expensive operations, particularly for queries run frequently.
In most cases, this can be alleviated by creating indexes on the fields you sort on.
You can create the index with:
db.myColl.createIndex( { createdate: 1 })
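Once the index exists, a quick mongo-shell sketch (using the same db.myColl name as above and the question's fields) to confirm the sort is satisfied by the index rather than in memory:

// Inspect the query plan: with the index in place the winning plan
// should walk the createdate index instead of using an in-memory SORT stage
db.myColl.find({ 'action': 1, 'silent': 0 })
    .sort({ createdate: -1 })
    .explain("executionStats")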
Thanks!

Sort before querying

Is it possible to run a sort on a Mongo collection before running the filtering query? I have older code in which I was getting a random result from the database by having a field which was a random float between 0 and 1, then querying with findOne to get the first document with a value greater than a random float generated at that time. The sample set was small, so I didn't notice a problem at the time, but I recently noticed that with one query I was almost always getting the same value. The "first" document had a random value > .9, so nearly every query matched it first.
I realized that, for this solution to work, I need to sort by the random field, then find the first value greater than my random number. As I understand it, this isn't as necessary a solution as it was in the past, since $sample exists as of 3.2, but I figure learning how I could do this would be good. Plus, my understanding is that $sample can return the same document multiple times (where N > 1, obviously, so not directly applicable to my question).
So for example, the following data:
> db.links.find()
{ "_id" : ObjectId("553c072bc87652a80e00002a"), "random" : 0.9162904409691691 }
{ "_id" : ObjectId("553c3332c87652c80700002a"), "random" : 0.00427396921440959 }
{ "_id" : ObjectId("553c3c5cc87652a80e00002b"), "random" : 0.2409569111187011 }
{ "_id" : ObjectId("553c3c66c876521c10000029"), "random" : 0.35101076657883823 }
{ "_id" : ObjectId("553c3c6ec87652200700002e"), "random" : 0.3234482416883111 }
{ "_id" : ObjectId("553c68d5c87652a80e00002c"), "random" : 0.5221220930106938 }
Any attempt to run db.mycollection.findOne({'random': {'$gte': x}}), where x is any value up to .91, always returns the first object (_id 553c072). Anything greater returns nothing. If I could sort by the random value in ascending order and then filter, it would keep searching until it found the correct value.
I would strongly recommend dropping your custom solution and simply switching to the built-in $sample aggregation stage, which returns a random result from your collection.
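For a single random document, a minimal sketch (requires MongoDB 3.2+, where $sample was introduced):

db.links.aggregate([
    { $sample: { size: 1 } } // return 1 randomly selected document
])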
EDIT based on your comment:
Here's how you can do what you originally asked for:
db.links.find({ "random": { $gte: /* put your value here */ } })
    .sort({ "random": 1 /* sort by "random" field in ascending order */ })
    .limit(1)
You can also use the aggregation framework, though you don't need to:
db.links.aggregate({
    $match: {
        "random": {
            $gte: /* put your value here */ // filter the collection
        }
    }
}, {
    $sort: {
        "random": 1 // sort by "random" field in ascending order
    }
}, {
    $limit: 1 // return only the first element
})
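As a usage sketch, the same pipeline written as an array with a hypothetical threshold of 0.5 filled in for the placeholder:

// 0.5 is just an example value; substitute your generated random number
db.links.aggregate([
    { $match: { "random": { $gte: 0.5 } } },
    { $sort: { "random": 1 } },
    { $limit: 1 }
])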

MongoDB Performance with Upsert

We are trying to build a "real time" statistics part for our application, and we want to use MongoDB.
To do this, I basically imagine a DB named storage. In this DB, I create a statistics collection.
And I store my data like this:
{
    "_id" : ObjectId("55642d270528055b171fedf5"),
    "cat" : "module",
    "name" : "Injector",
    "ts_min" : ISODate("2015-05-22T13:16:00Z"),
    "nb_action" : {
        "0" : 156
    },
    "tps_action" : {
        "0" : 45016
    },
    "min_tps" : 10,
    "max_tps" : 879
}
So, I have a category, a name and a date to determine a unique object. In this object, I store:
Number of uses per second (nb_action.[0..59])
Total time per second (tps_action.[0..59])
Min time
Max time
Now, to inject my data I use an upsert:
db.statistics.update({
    ts_min: ISODate("2015-05-22T13:16:00.000Z"),
    name: "Injector",
    cat: "module"
},
{
    $inc: { "nb_action.0": 1, "tps_action.0": 250 },
    $min: { min_tps: 250 },
    $max: { max_tps: 250 }
},
{ upsert: true })
So, I perform two $inc operations to manage my counters, and I use $min and $max to manage my stats.
All of this works...
With 1 thread injecting 50,000 documents (for 10 modules) on a single machine (no sharding), I observe 3,000-3,500 ops per second.
And my problem is... I can't say whether that's good or not.
Any suggestions?
PS: I use long field names for the example, and I add a set part to initialize each second in case of an insert.

Overflow sort stage buffered data usage

We have a MongoDB 2.6.4 replica set running and are trying to diagnose this behavior. We are getting Runner error: Overflow sort stage buffered data usage of 33598393 bytes exceeds internal limit of 33554432 bytes when we would expect not to. The collection has millions of records and has a compound index that includes the key being sorted on.
As an example, the index looks like this:
{ from: 1, time : -1, otherA : 1, otherB : 1}
Our find is:
db.collection.find({ from : { $in : ["a", "b"] }, time : { $gte : timestamp },
        otherA : { $in : [...] }, otherB : { $in : [...] } })
    .sort({ time : -1 })
MongoDB parallelizes this query into clauses like this:
{ from : a }, { time : { $gte : timestamp }, ... }
{ from : b }, { time : { $gte : timestamp }, ... }
In the explain, each stage reports scanAndOrder : false, which implies that the index was used to return the results in order. This all seems fine; however, the MongoDB client gets the Runner error: Overflow sort stage buffered data usage error, which seems to imply that the sort was done in memory. Is this because it is doing an in-memory merge sort of the clauses? Or is there some other reason this error could occur?
I was also facing the same memory-overflow problem.
I am using PHP with MongoDB to manipulate documents.
When I accessed a collection holding a large set of documents, it threw this error.
As per the following link, MongoDB can sort only up to 32 MB of data at a time without an index:
http://docs.mongodb.org/manual/reference/limits/#Sorted-Documents
So, following the description given in the MongoDB docs about sorting, I converted the MongoCursor object to an array and sorted it in PHP rather than using Mongo's sort() method.
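The same idea expressed in mongo-shell JavaScript, as a rough sketch (the filter is simplified, and timestamp is the same placeholder as in the question):

// Fetch matching documents without a server-side sort,
// then sort them client-side to avoid the in-memory sort limit
var docs = db.collection.find({
    from: { $in: ["a", "b"] },
    time: { $gte: timestamp }
}).toArray();

docs.sort(function(a, b) {
    return b.time - a.time; // descending by time
});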
Hope it'll help you.
Thanks.

MongoDB: phantom records in $unwind results under heavy load

I have a simple collection of elements like this
{_id: n, xs: [...]}
I'm trying to count the total number of elements across all the arrays:
db.testRace.aggregate([{ $unwind : "$xs" }, { $group : { _id : null, count : { $sum : 1 } } }])
And it works great unless I start doing massive updates to this collection. Under a heavy load of update operations I get a wrong total - slightly bigger than it should be.
It can be easily reproduced.
First generate some test data
for (var i = 1; i <= 1000000; i++) {
    db.testRace.insert({ _id: i, xs: [i] });
}
Then simulate a lot of updates
while (true) {
    var id = Math.floor((Math.random() * 1000000) + 1);
    var obj = db.testRace.find({ _id: id }).next();
    obj.some = "change";
    db.testRace.update({ _id: id }, obj);
}
And while it is running, I run the aggregate/$unwind query.
Without load I get the right result - 1000000. But when there are a lot of updates I get bigger numbers, like 1001456.
And if I run a query like this:
db.testRace.aggregate([{ $unwind : "$xs" }, {$group: {_id:"$xs", count:{$sum: 1}}}, { $sort : { count : -1 } }, { $limit : 2 }]);
I get
"result" : [
{
"_id" : 996972,
"count" : 2
},
{
"_id" : 997789,
"count" : 2
}
],
So it seems the aggregation counts some records twice.
Is this expected behaviour, or maybe I'm doing the aggregation wrong?
I tested on a local MongoDB instance, version 2.4.9.
It's expected behavior, due to the way MongoDB handles read isolation. When you have a long-running query (and an aggregation that reads every single document is a long-running query) with updates to that data during the query, the updates may affect whether or not the updated data is returned by the query - depending on what happens when, you could miss a document, receive it once, or receive it twice.
From the source code:
Any data inserted, deleted, or modified during a yield that should be
returned by a query may or may not be returned by that query. The
query could return: nothing; the data before; the data after; or both
the data before and the data after.
In short, there is no isolation between a query and an
insert/delete/update. AKA, READ_UNCOMMITTED.
https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/plan_stage.h
Your aggregation query yields mid-query, during which some of the data is updated. This impacts the results of the query.