Mongo TTL not removing documents - mongodb

I'm toying with auto-expiring documents from a collection. The Java application creates an index per the Mongo TTL docs.
coll.createIndex(new Document("Expires", 1).append("expireAfterSeconds", 0));
When inserting my document, I set the Expires field to a future Date. For this testing I've been setting it 1 minute in the future.
I've verified the date exists properly, the index appears to be correct, and I've waited 10+ minutes (even though the ttl runner operates every sixty seconds) but the document remains.
{
"_id" : ObjectId("569847baf7794c44b8f2f17b"),
// my data
"Created" : ISODate("2016-01-15T02:02:30.116Z"),
"Expires" : ISODate("2016-01-15T02:03:30.922Z")
}
What else could I have missed? Here are the indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "prism.prismEventRecord"
},
{
"v" : 1,
"key" : {
"Location.X" : 1,
"Location.Z" : 1,
"Location.Y" : 1,
"Created" : -1
},
"name" : "Location.X_1_Location.Z_1_Location.Y_1_Created_-1",
"ns" : "prism.prismEventRecord"
},
{
"v" : 1,
"key" : {
"Created" : -1,
"EventName" : 1
},
"name" : "Created_-1_EventName_1",
"ns" : "prism.prismEventRecord"
},
{
"v" : 1,
"key" : {
"Expires" : 1,
"expireAfterSeconds" : 0
},
"name" : "Expires_1_expireAfterSeconds_0",
"ns" : "prism.prismEventRecord"
}
]

I wonder if it makes sense to take the Java mongo client out of the picture for a minute.
I have created a similar collection, and made the following call in the shell.
db.weblog.createIndex({"expireAt":1},{expireAfterSeconds:0})
When I do, and then I call db.weblog.getIndexes(), this is what the expiring index looks like:
{
"v" : 1,
"key" : {
"expireAt" : 1
},
"name" : "expireAt_1",
"ns" : "logs.weblog",
"expireAfterSeconds" : 0
}
I think your Java call may be "appending" a second field to your index key (not setting the option you were hoping to set). Take a look... your index def looks like this:
{
"v" : 1,
"key" : {
"Expires" : 1,
"expireAfterSeconds" : 0
},
"name" : "Expires_1_expireAfterSeconds_0",
"ns" : "prism.prismEventRecord"
}
See what I mean? "expireAfterSeconds" is part of the key, not an index option. Now -- how do you do THAT with the Java driver? Ummm ... don't yell at me, but I'm a C# guy ... I found a post or two that punt on the question of TTL indexes from the Java client, but they're old-ish.
Maybe the Java client has gotten better and now supports options? Hopefully, knowing what the problem is gives a guy with your stellar coding skills enough to take it from here ;-)
EDIT: Java driver code (untested):
IndexOptions options = new IndexOptions()
        .name("whocareswhatwecallthisindex")
        .expireAfter(1L, TimeUnit.DAYS);
coll.createIndex(new Document("Expires", 1), options);
EDIT2: C# driver code to create the same index:
var optionsIdx = new CreateIndexOptions() { ExpireAfter = new TimeSpan(0)};
await coll.Indexes.CreateOneAsync(Builders<MyObject>.IndexKeys.Ascending("expiresAt"), optionsIdx);
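Either way, once expireAfterSeconds lands as an index option rather than a key, a quick sanity check from the shell (run against the prism database from the question; the testDoc field below is just a throwaway marker, not from the original code) should confirm both the index shape and the actual expiry:
db.prismEventRecord.getIndexes()
// the TTL index should now list expireAfterSeconds as a top-level option,
// e.g. { "key" : { "Expires" : 1 }, ..., "expireAfterSeconds" : 0 }

// with expireAfterSeconds at 0, a document whose Expires date has already
// passed should disappear within a minute or two of the next TTL pass
db.prismEventRecord.insert({ testDoc: true, Expires: new Date() })
db.prismEventRecord.find({ testDoc: true }).count()   // eventually 0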

In case you run into this question as a Golang user, you have two choices:
1. Use structs: this works when you know the payload structure, and is documented extensively.
2. Introduce an actual date object into your JSON payload: only use this if your payload structure absolutely can't be known ahead of time.
In my case, the source data comes from a system whose structure is a black box. I first tried to introduce an ISO-formatted date string matching Mongo's format, but it was still stored as text. This led me to deduce that the driver was not being instructed to format it properly.
I believe a deep dive into Mongo's Golang driver internals to manipulate this process would only give us a short-lived solution (since the implementation details are subject to change). So instead, I suggest introducing a real date property into your payload and letting the driver adapt it for Mongo (adapting the principle in this snippet to your own code structure):
err = json.Unmarshal([]byte(objectJsonString.String()), &objectJson)
if err == nil && objectJson != nil {
    // Must introduce as native or they are considered text
    objectJson["createdAt"] = time.Now()
    // Your other code
    insertResult, err := collection.InsertOne(context.TODO(), objectJson)
}
So basically, create your JSON or BSON object normally with the rest of the data, and then set the TTL field to a real date value rather than attempting to have your JSON parser do that work for you; the TTL index itself goes on that field, as in the sketch below.
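For completeness, the createdAt field written above still needs a TTL index on it. A minimal shell sketch (the collection name logs and the one-hour expiry are placeholders, not taken from the original code):
// documents are removed roughly 3600 seconds after their createdAt value
db.logs.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })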
I look forward to corrections, just please be civil and I'll make sure to update and improve this answer with any observations made.

Related

Why does MongoDB take so long to sort a result with only one object in it?

I am optimizing a fairly complicated query of the form:
db.foo.find({
"$or":[
{"bar1":ObjectID("123123"), baz:false},
{"bar2":ObjectID("123123"), baz:false}
],
"deleted":false}
).sort("modified_on")
And the sort appears to be killing my performance. I have indexes on the modified_on field like so:
{
"v" : 1,
"key" : {
"modified_on" : 1
},
"ns" : "realtalk.customer_profile",
"name" : "modified_on_1"
},
{
"v" : 1,
"key" : {
"modified_on" : -1
},
"ns" : "realtalk.customer_profile",
"name" : "modified_on_-1"
}
These do not seem to speed up the query at all. What is really driving me crazy though is the size of my result. There is only ever a handful of objects returned by this query, often only one. How is it taking Mongo so long to sort one thing, and how can I speed it up?

How can one detect "useless" indexes?

I have a MongoDB collection with a lot of indexes.
Would it bring any benefits to delete indexes that are barely used?
Is there any way or tool which can tell me (in numbers) how often an index is used?
EDIT: I'm using version 2.6.4
EDIT2: I'm now using version 3.0.3
Right, so this is how I would do it.
First you need a list of all your indexes for a certain collection (this will be done collection by collection). Let's say we are monitoring the user collection to see which indexes are useless.
So I run a db.user.getIndexes() and this results in a parsable output of JSON (you can run this via command() from the client side as well to integrate with a script).
So you now have a list of your indexes. It is merely a case of understanding which queries use which indexes. If that index is not hit at all you know it is useless.
Now you need to run every query with explain(); from that output you can judge which index is used and match it to an index returned by getIndexes().
So here is a sample output:
> db.user.find({religion:1}).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "meetapp.user",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "religion" : {
                "$eq" : 1
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "religion" : NumberLong(1)
                },
                "indexName" : "religion_1",
                "isMultiKey" : false,
                "direction" : "forward",
                "indexBounds" : {
                    "religion" : [
                        "[1.0, 1.0]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "ip-172-30-0-35",
        "port" : 27017,
        "version" : "3.0.0",
        "gitVersion" : "a841fd6394365954886924a35076691b4d149168"
    },
    "ok" : 1
}
There is a set of rules that the query planner uses, and you will need to discover and code around them, but this first one is simple enough.
As you can see, the winning plan (in winningPlan) is a single IXSCAN (index scan) stage (remember there could be multiple stages, which you will need to code around), and the key pattern for the index used is:
"keyPattern" : {
"religion" : NumberLong(1)
},
Great, now we can match that against the key output of getIndexes():
{
"v" : 1,
"key" : {
"religion" : NumberLong(1)
},
"name" : "religion_1",
"ns" : "meetapp.user"
},
which tells us that the religion index is not useless and is in fact used.
Unfortunately this is the best way I can see. It used to be that MongoDB had an index stat for number of times the index was hit but it seems that data has been removed.
So you would just rinse and repeat this process for every collection you have until you have removed the indexes that are useless.
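Pulling those steps together, a rough shell sketch of the matching loop (the query list is a placeholder you would fill in from your own application; only simple single-IXSCAN plans are handled here):
// queries your application actually runs against this collection
var queries = [ { religion: 1 } ];

// collect the name of the index each winning plan scans
var used = {};
queries.forEach(function (q) {
    var stage = db.user.find(q).explain().queryPlanner.winningPlan;
    // walk down inputStage until we hit the IXSCAN, if any
    // (OR plans use inputStages, plural, and need extra handling)
    while (stage) {
        if (stage.stage === "IXSCAN") { used[stage.indexName] = true; }
        stage = stage.inputStage;
    }
});

// anything in getIndexes() that never shows up above is a candidate for removal
db.user.getIndexes().forEach(function (idx) {
    if (!used[idx.name]) { print("possibly unused: " + idx.name); }
});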
One other way of doing this, of course, is to remove all indexes and then re-add indexes as you test your queries. Though that might be bad if you do need to do this in production.
On a side note: the best way to fix this problem is to not have it at all.
I make this easier for myself by using an indexing function within my active record. Once every so often I run (from PHP) something of the sort: ./yii index/rebuild, which essentially goes through my active record models, detects which indexes I no longer use and have removed from my app, and removes them in turn. It will, of course, create new indexes.

MongoDB Number Field will not insert or update with the number that I input

this is my problem:
db.Group6391102Bounds.insert({ bound:"latest",id:138548488276343678,complete:false})
db.Group6391102Bounds.find()
{ "_id" : ObjectId("5297d9e5ef9f659b82271617"), "bound" : "earliest", "id" : 138548488276343680, "complete" : false }
{ "_id" : ObjectId("5297dc28b2d702ea878b540d"), "bound" : "latest", "id" : 138548488276343680, "complete" : false }
db.Group6391102Bounds.insert({ bound:"middle",id:138548488276343678,complete:false})
db.Group6391102Bounds.find()
{ "_id" : ObjectId("5297d9e5ef9f659b82271617"), "bound" : "earliest", "id" : 138548488276343680, "complete" : false }
{ "_id" : ObjectId("5297dc28b2d702ea878b540d"), "bound" : "latest", "id" : 138548488276343680, "complete" : false }
{ "_id" : ObjectId("5297dc3cb2d702ea878b540e"), "bound" : "middle", "id" : 138548488276343680, "complete" : false }
db.Group6391102Bounds.insert({ bound:"middle",name:138548488276343678,complete:false})
db.Group6391102Bounds.find()
{ "_id" : ObjectId("5297d9e5ef9f659b82271617"), "bound" : "earliest", "id" : 138548488276343680, "complete" : false }
{ "_id" : ObjectId("5297dc28b2d702ea878b540d"), "bound" : "latest", "id" : 138548488276343680, "complete" : false }
{ "_id" : ObjectId("5297dc3cb2d702ea878b540e"), "bound" : "middle", "id" : 138548488276343680, "complete" : false }
{ "_id" : ObjectId("5297dc91b2d702ea878b540f"), "bound" : "middle", "name" : 138548488276343680, "complete" : false }
As you can see, even though I insert a specific id value, MongoDB stores a different one.
I have no Idea why this is happening. Any help would be greatly appreciated. Happy Thanksgiving!
Sorry, I did not understand your question in the beginning and therefore provided a wrong answer (thanks cababunga for pointing this out). So here is the correct one.
The mongo shell supports different data types, and it tries to guess the data type when you enter a value. Plain numeric literals are treated as 64-bit floating point numbers (doubles). Your number, 138548488276343678, is bigger than 2^53, the largest integer a double can represent exactly, so it gets rounded slightly when stored. This is why your stored number is almost the same, but differs by a little bit (no more than 8 in this range). But you want to store this number precisely, and MongoDB supports 64-bit integers (which fit your value).
So you need to specify that you want to store it as 64bit integer. You can do this in the following way:
db.a.insert({
bound:"latest",
id: NumberLong("138548488276343678"), // Note these "". I was not using them and the number was not stored correctly
complete:false
})
After this you can retrieve your document with db.a.find() and it will be correct. Note that a lot of drivers have similar problems, so you may have to explicitly state that you are saving a 64-bit integer there as well.
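A quick way to convince yourself in the shell (reusing the throwaway collection a from above) is to insert the value both ways and compare what comes back:
// plain literal: goes through a double and gets rounded
db.a.insert({ bound: "plain", id: 138548488276343678 })
// NumberLong with quotes: kept exact as a 64-bit integer
db.a.insert({ bound: "long", id: NumberLong("138548488276343678") })
db.a.find({}, { _id: 0, bound: 1, id: 1 })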
And here is my wrong first attempt. If you think that it should not be here, please edit my answer:
If you are not specifying _id for the document you are creating, MongoDB creates the _id field by itself. You can read a little bit more about _id here and in the official documentation.
If you have your own field, which you would like to be used as _id, instead of writing id:138548488276343678 you should write _id : 138548488276343678.
P.S. Also, because I see that you are using quite big numbers, keep in mind that integers in MongoDB are stored as 64-bit integers at most (which means the range is from -2^63 to 2^63 - 1).

indexing vs normalization when optimizing for read speed

Having an architecture discussion with a coworker and we need to find an answer for this. Given a set of millions of data points that look like:
data =
[{
    "v" : 1.44,
    "tags" : {
        "account" : {
            "v" : "1055",
            "name" : "Circle K"
        },
        "region" : "IL-East"
    }
}, {
    "v" : 2.25,
    "tags" : {
        "account" : {
            "v" : "1055",
            "name" : "Circle K"
        },
        "region" : "IL-West"
    }
}]
and that we need to query on the fields in the tags subdocument (e.g. where account.name == "Circle K"), would there be any speed benefit to normalizing the account field to this:
accounts =
[{
    _id : 507f1f77bcf86cd799439011,
    v : "1055",
    name : "Circle K"
}]
data =
[{
    "v" : 1.44,
    "tags" : {
        "account" : 507f1f77bcf86cd799439011,
        "region" : "IL-East"
    }
}, {
    "v" : 2.25,
    "tags" : {
        "account" : 507f1f77bcf86cd799439011,
        "region" : "IL-West"
    }
}]
I suspect I'll have to build 2 db's for this and just see what the speed looks like. The question is, is mongo better at querying on BSON IDs vs. strings? The db in question will be about 1:10 write vs. read.
The most important thing here is to make sure that you have enough RAM for your working set. That includes the space for the "tags.account.name" index and the expected query result set.
As for the key size: you use ObjectID-as-string above, which you should not do. Leave them as real ObjectIDs, as their size is quite a bit smaller. If you really have a lot of small documents, then you might even want to think about shortening your field names as well.
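For reference, the index the answer is talking about, plus the equivalent lookup in the normalized layout, would look roughly like this in the shell (collection names are taken from the variable names in the question, so adjust to your own):
// denormalized: index and query the embedded account name directly
db.data.ensureIndex({ "tags.account.name": 1 })
db.data.find({ "tags.account.name": "Circle K" })

// normalized: resolve the account _id first, then query data by it
var acct = db.accounts.findOne({ name: "Circle K" })
db.data.ensureIndex({ "tags.account": 1 })
db.data.find({ "tags.account": acct._id })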

Subdocument index in mongo

What exactly happens when I call ensureIndex(data) when typical data looks like data: {name: "A", age: "B", job: "C"}? Will it create a compound index over these three fields, or will it create only one index that applies when anything from data is requested, or something altogether different?
You can do either:
> db.collection.ensureIndex({"data.name": 1,"data.age":1, "data.job" : 1})
> db.collection.ensureIndex({"data": 1})
This is discussed in the documentation under indexes-on-embedded-fields and indexes on subdocuments.
The important part of the subdocument section is: 'When performing equality matches on subdocuments, field order matters and the subdocuments must match exactly.'
This means that the two indexes are the same for simple queries.
However, as the subdocument example shows, you can get some interesting results (that you might not expect) if you index the whole subdocument, as opposed to a specific field, and then use a comparison operator (like $gte); if you index a specific sub-field you get a less flexible, but potentially more useful, index.
It really all depends on your use case.
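To make the 'field order matters and the subdocuments must match exactly' point concrete, here is a small sketch using the same data shape as the question (which queries match is standard subdocument-equality behaviour, not something specific to the index):
db.collection.ensureIndex({ "data": 1 })
db.collection.insert({ data: { name: "A", age: "B", job: "C" } })

// matches: same fields, same order, nothing missing
db.collection.find({ data: { name: "A", age: "B", job: "C" } })

// does NOT match: field order differs
db.collection.find({ data: { age: "B", name: "A", job: "C" } })

// does NOT match: only a subset of the fields
db.collection.find({ data: { name: "A" } })

// matches again: dot notation compares the individual field instead
db.collection.find({ "data.name": "A" })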
Anyway, once you have created the index you can check what's created with:
> db.collection.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.collection",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"data.name" : 1,
"data.age" : 1,
"data.job" : 1
},
"ns" : "test.collection",
"name" : "data.name_1_data.age_1_data.job_1"
}
]
As you can see from the output, it created a new index called data.name_1_data.age_1_data.job_1 (the _id_ index is always created).
If you want to test your new index then you can do:
> db.collection.insert({data:{name: "A",age:"B", job : "C"}})
> db.collection.insert({data:{name: "A1",age:"B", job : "C"}})
> db.collection.find({"data.name" : "A"}).explain()
{
"cursor" : "BtreeCursor data.name_1_data.age_1_data.job_1",
.... more stuff
The main thing is that you can see that your new index was used (BtreeCursor data.name_1_data.age_1_data.job_1 in the cursor field is what indicates this is the case). If you see "cursor" : "BasicCursor", then your index was not used.
For more detailed information look here.
You can try this:
db.collection.ensureIndex({"data.name": 1, "data.age": 1, "data.job": 1})