Getting the physical location of extents of MongoDB indexes

I've gone through the MongoDB docs trying to find a way to get the physical (on-disk) location of MongoDB extents.
presentation1 and presentation2 helped me clear up the fundamentals of the storage internals, and I got an idea of how virtual address mapping works, but there was not much in these presentations about how the virtual-to-physical mapping is done.
The diskLoc() method gives us the data file number and offset for each index entry, but there is no information about the extent number or its physical location.
db.system.indexes.find().showDiskLoc()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "id", "ns" : "mydb1.abc", "$diskLoc" : { "file" : 0, "offset" : 28848 } }
{ "v" : 1, "key" : { "x" : 1 }, "name" : "x_1", "ns" : "mydb1.abc", "$diskLoc" : { "file" : 0, "offset" : 28976 } }
{ "v" : 1, "key" : { "x" : 14 }, "name" : "x_14", "ns" : "mydb1.abc", "$diskLoc" : { "file" : 0, "offset" : 29104 } }
Can anyone help me get the physical address of each index extent within the data file?
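In case it is useful to anyone digging into the same thing: under MMAPv1 the "file"/"offset" pair from showDiskLoc() appears to refer to the numbered data files on disk (file 0 = mydb1.0, file 1 = mydb1.1, ...), with the offset being a byte offset inside that file. I am not aware of a shell helper that lists the extents backing each individual index, but a full validate() at least enumerates a collection's extents as file:offset pairs. A sketch, reusing the collection from the question (the exact result fields vary by server version):
// Full validation on MMAPv1 reports the collection's extents
db.getSiblingDB("mydb1").abc.validate(true)
// Fields of interest in the result (shape varies by version):
//   "firstExtent", "lastExtent", "extentCount"
//   "extents" : [ { "loc" : "<file>:<offset>", "size" : ..., ... }, ... ]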

Related

Slow query when sorting and filtering

I'm using MongoDB version 3.6.5. I would like to query a collection and then sort the results by date. I'm working on what I think is a pretty large dataset: currently 195064301 documents in this collection, and it's growing.
Doing the filter or the sort in separate queries works perfectly:
db.getCollection('logs').find({session: ObjectId("5af3baa173faa851f8b0090c")})
db.getCollection('logs').find({}).sort({date: 1})
The result is returned in less than 1 second, but if I try to do both in a single query, like so:
db.getCollection('logs').find({session: ObjectId("5af3baa173faa851f8b0090c")}).sort({date: 1})
Now it takes about 5 minutes to return the data. I was thinking it was an index problem, but as far as I can tell, the indexes seem fine:
> db.logs.getIndexes();
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "client1.logs"
    },
    {
        "v" : 2,
        "key" : {
            "session" : 1
        },
        "name" : "session_1",
        "ns" : "client1.logs",
        "background" : true
    },
    {
        "v" : 2,
        "key" : {
            "date" : 1
        },
        "name" : "date_1",
        "ns" : "client1.logs",
        "background" : true
    },
    {
        "v" : 2,
        "key" : {
            "user" : 1
        },
        "name" : "user_1",
        "ns" : "client1.logs",
        "background" : true
    }
]
I'm still new to Mongo. I tried these requests directly in the console, and I also tried the reIndex() method, but nothing really helped.
So I'm hoping there is a solution for this.
Thanks.
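For what it's worth (this is not from the original post), the usual fix for this filter-on-one-field, sort-on-another pattern is a single compound index covering both fields, since MongoDB generally won't combine the separate session_1 and date_1 indexes for a filter-plus-sort query. A sketch using the field names from the question:
// Compound index: equality field first, then the sort field
db.getCollection('logs').createIndex({ session: 1, date: 1 }, { background: true })
// Re-check the plan; it should walk the new index in date order with no in-memory sort
db.getCollection('logs').find({ session: ObjectId("5af3baa173faa851f8b0090c") })
                        .sort({ date: 1 })
                        .explain("executionStats")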

MongoDB PartialFilterExpression filter issue

I have a collection of items where a document looks something like this:
{
    "source" : "rest",
    "serviceCode" : "fluff",
    "fluff" : "puff",
    "systemEntryTime" : ISODate("2018-05-16T09:04:00.585Z")
}
I have an index with a TTL option set to two weeks:
{
    "v" : 1,
    "key" : {
        "systemEntryTime" : 1
    },
    "name" : "systemEntryTime_1",
    "ns" : "storage.item",
    "expireAfterSeconds" : NumberLong(1209600)
}
Now I want certain documents where source = "ftp" to have a different TTL. For this purpose I created the following index with a partialFilterExpression:
{
    "v" : 1,
    "key" : {
        "systemEntryTime" : 1,
        "source" : 1
    },
    "name" : "systemEntryTime_1_source_1",
    "ns" : "storage.item",
    "expireAfterSeconds" : NumberLong(1),
    "partialFilterExpression" : {
        "source" : {
            "$eq" : "ftp"
        }
    }
}
Unfortunately this is not working. What am I doing wrong here? I have experimented with dropping the old index and using only this one, but no documents are dropped according to the TTL (or any documents at all, for that matter).
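One thing worth noting, hedged since I can't reproduce the setup: as far as I know, expireAfterSeconds is only honoured on single-field indexes, so a compound key like { systemEntryTime: 1, source: 1 } will not expire anything. A sketch of a single-field partial TTL index instead, assuming the original two-week index is dropped first (older servers reject two indexes with the same key pattern), and bearing in mind that the TTL monitor only runs roughly once a minute, so nothing is removed instantly:
// Hypothetical replacement index: single-field TTL, limited to source = "ftp"
db.item.createIndex(
    { systemEntryTime: 1 },
    {
        name: "systemEntryTime_ftp_ttl",   // illustrative name, not from the question
        expireAfterSeconds: 1,
        partialFilterExpression: { source: "ftp" }
    }
)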

mongodb insert really slow

I use MongoDB to manage device log data. Right now the collection has over one million documents. Each document contains more than 30 fields, including embedded fields. Inserting new documents has become really slow: a single insert costs more than 1000 ms. From the slow query ops, I get logs like this:
{
    "op" : "insert",
    "ns" : "xxx.LogDeviceReport",
    "query" : {
        "_id" : ObjectId("xxxx"),
        "deviceId" : ObjectId("xxxx"),
        "en" : "xxxxxx",
        "status" : 1,
        'other fields, more than 30 fields...'
        ...
        ...
    },
    "ninserted" : 1,
    "keyUpdates" : 0,
    "writeConflicts" : 0,
    "numYield" : 0,
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "w" : NumberLong(2)
            }
        },
        "MMAPV1Journal" : {
            "acquireCount" : {
                "w" : NumberLong(3)
            }
        },
        "Database" : {
            "acquireCount" : {
                "w" : NumberLong(2)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "W" : NumberLong(1)
            },
            "acquireWaitCount" : {
                "W" : NumberLong(1)
            },
            "timeAcquiringMicros" : {
                "W" : NumberLong(1477481)
            }
        },
        "oplog" : {
            "acquireCount" : {
                "w" : NumberLong(1)
            }
        }
    },
    "millis" : 977,
    "execStats" : {
    },
    "ts" : ISODate("2016-08-02T22:01:01.270Z"),
    "client" : "xxx.xxx.xxxx",
    "allUsers" : [
        {
            "user" : "xxx",
            "db" : "xxx"
        }
    ],
    "user" : "xx#xx"
}
I checked the indexes, like this:
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "xxx.LogDeviceReport"
    },
    {
        "v" : 1,
        "key" : {
            "time" : 1
        },
        "name" : "time_1",
        "ns" : "xxx.LogDeviceReport",
        "expireAfterSeconds" : 604800,
        "background" : true
    }
]
Only an _id index and a TTL index on time; there are no other indexes.
I guess the 'query' part is what slows the operation down. The MongoDB docs say that only _id is checked for uniqueness, but in the logs all the fields appear under 'query'; does that matter?
If that is not the reason, what makes it so slow? Can anyone help me?
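As an aside, the slow-op document quoted above comes from the database profiler / slow-operation logging; if it helps while reproducing the timings, the current capture settings can be checked and adjusted from the shell (a small sketch):
// Current profiling level and slowms threshold
db.getProfilingStatus()        // e.g. { "was" : 1, "slowms" : 100 }
// Profile operations slower than 100 ms
db.setProfilingLevel(1, 100)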
If you are using MongoDB 3+, you can consider using WiredTiger as the storage engine instead of the MMAPv1 engine being used in your case.
I have personally seen a 4x improvement when inserting up to 156000 documents in a single go.
MMAPv1 took around 40 min, and when I switched to WiredTiger the same task completed in 10 min.
Please check this link from the MongoDB blog for more information.
Note: this applies only to MongoDB 3.0+.
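As a quick check before (and after) switching, you can confirm which engine a mongod instance is actually running. Note that changing engines is not an in-place switch: the data has to be dumped and restored, or resynced from a replica-set member, because the on-disk formats differ. A sketch (the dbpath below is illustrative):
// Which storage engine is this mongod using?
db.serverStatus().storageEngine    // e.g. { "name" : "mmapv1", ... } or { "name" : "wiredTiger", ... }
// Starting a fresh instance on WiredTiger (run from the command line, empty dbpath)
// mongod --storageEngine wiredTiger --dbpath /data/db-wt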

MongoDB index not covering trivial query

I am working on a MongoDB application, and am having trouble with covered queries. After reworking many of my queries to perform better when paginating I found that my previously covered queries were no longer being covered by the index. I tried to distill the working set down as far as possible to isolate the issue but I'm still confused.
First, on a fresh (empty) collection, I inserted the following documents:
devdb> db.test.find()
{ "_id" : ObjectId("53157aa0dd2cab043ab92c14"), "metadata" : { "created_by" : "bcheng" } }
{ "_id" : ObjectId("53157aa6dd2cab043ab92c15"), "metadata" : { "created_by" : "albert" } }
{ "_id" : ObjectId("53157aaadd2cab043ab92c16"), "metadata" : { "created_by" : "zzzzzz" } }
{ "_id" : ObjectId("53157aaedd2cab043ab92c17"), "metadata" : { "created_by" : "thomas" } }
{ "_id" : ObjectId("53157ab9dd2cab043ab92c18"), "metadata" : { "created_by" : "bbbbbb" } }
Then, I created an index for the 'metadata.created_by' field:
devdb> db.test.getIndices()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "devdb.test",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "metadata.created_by" : 1
        },
        "ns" : "devdb.test",
        "name" : "metadata.created_by_1"
    }
]
Now, I tried to lookup a document by the field:
devdb> db.test.find({'metadata.created_by':'bcheng'},{'_id':0,'metadata.created_by':1}).sort({'metadata.created_by':1}).explain()
{
    "cursor" : "BtreeCursor metadata.created_by_1",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1,
    "nscannedObjectsAllPlans" : 1,
    "nscannedAllPlans" : 1,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "metadata.created_by" : [
            [
                "bcheng",
                "bcheng"
            ]
        ]
    },
    "server" : "localhost:27017"
}
The correct index is being used and no extraneous documents are being scanned. Regardless of the presence of .hint(), limit(), or sort(), indexOnly remains false.
Digging through the documentation, I've seen that indexes will fail to cover queries on array elements, but that isn't the case here (and isMultiKey is false).
What am I missing? Are there other reasons for this behavior (e.g. insufficient RAM, disk space, etc.)? And if so, how can I best diagnose these issues in the future?
Covering queries on fields inside embedded documents (dotted paths such as metadata.created_by) is currently not supported. See this Jira issue.
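To illustrate the limitation (my own sketch, not part of the original answer): the same query and projection against a top-level field, rather than one nested inside the metadata subdocument, can be covered. The collection and field names below are hypothetical:
// Hypothetical collection where created_by is a top-level field
db.test2.insert({ created_by: "bcheng" })
db.test2.createIndex({ created_by: 1 })
// Same shape of query and projection as before, now on a top-level field
db.test2.find({ created_by: "bcheng" }, { _id: 0, created_by: 1 }).explain()
// In this (2.4/2.6-era) explain format, "indexOnly" should now be true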

can't shard a collection on mongodb

I have a db ("mydb") on mongo that contains 2 collections (c1 and c2). c1 is already hash-sharded. I want to shard the second collection the same way. I get the following error:
use mydb
sh.shardCollection("mydb.c2", {"LOG_DATE": "hashed"})
{
    "proposedKey" : {
        "LOG_DATE" : "hashed"
    },
    "curIndexes" : [
        {
            "v" : 1,
            "key" : {
                "_id" : 1
            },
            "ns" : "mydb.c1",
            "name" : "_id_"
        }
    ],
    "ok" : 0,
    "errmsg" : "please create an index that starts with the shard key before sharding."
}
So I did
db.c2.ensureIndex({LOG_DATE: 1})
sh.shardCollection("mydb.c2", {"LOG_DATE": "hashed"})
Same error but it shows the new index.
"proposedKey" : {
"LOG_DATE" : "hashed"
},
"curIndexes" : [
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "mydb.c2",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"LOG_DATE" : 1
},
"ns" : "mydb.c2",
"name" : "LOG_DATE_1"
}
],
"ok" : 0,
"errmsg" : "please create an index that starts with the shard key before sharding."
Just to be sure, I run:
db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "mydb.c1", "name" : "_id_" }
{ "v" : 1, "key" : { "timestamp" : "hashed" }, "ns" : "mydb.c1", "name" : "timestamp_hashed" }
{ "v" : 1, "key" : { "_id" : 1 }, "ns": "mydb.c2", "name" : "_id_" }
{ "v" : 1, "key" : { "LOG_DATE" : 1 }, "ns" : "mydb.c2", "name" : "LOG_DATE_1" }
I tried the same commands again on admin and they failed with the same error.
Then I tried on admin without "hashed" and it worked:
db.runCommand({shardCollection: "mydb.c2", key: {"LOG_DATE": 1}})
Problem: now my collection is sharded on a key that is not hashed, and I can't change it (error: "already sharded").
What was wrong with what I did?
How can I fix this?
Thanks in advance
Thomas
The problem initially was that you did not have a hashed index on the field you proposed to use for sharding; that is what the error message is about. After the first error message, you created this index:
{
    "v" : 1,
    "key" : {
        "LOG_DATE" : 1
    },
    "ns" : "mydb.c2",
    "name" : "LOG_DATE_1"
}
That is still just an ordinary index, not a hashed one. If you had run this:
db.c2.ensureIndex({LOG_DATE: "hashed"})
instead of this:
db.c2.ensureIndex({LOG_DATE: 1})
then you would have had a hashed index. As you can see in the db.system.indexes.find() output, the other collection has a hashed index on timestamp; I assume that is the shard key for that collection.
So if you have no data in the c2 collection:
db.c2.drop()
db.createCollection('c2')
db.c2.ensureIndex({LOG_DATE: "hashed"})
sh.shardCollection("mydb.c2", {"LOG_DATE": "hashed"})
This will work properly.
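If it helps, the result can be double-checked afterwards (a brief sketch):
// The hashed index should now be listed for c2
db.c2.getIndexes()    // expect a key of { "LOG_DATE" : "hashed" }
// And mydb.c2 should appear with a hashed shard key
sh.status()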