When I call ensureIndex from the mongo shell on a collection for a compound index an _id field of type ObjectId is auto-generated in the index object.
> db.system.indexes.find();
{ "name" : "_id_", "ns" : "database.coll", "key" : { "_id" : 1 } }
{ "_id" : ObjectId("4ea78d66413e9b6a64c3e941"), "ns" : "database.coll", "key" : { "a.b" : 1, "a.c" : 1 }, "name" : "a.b_1_a.c_1" }
This makes intuitive sense as all documents in a collection need an _id field (even system.indexes, right?), but when I check the indexes generated by morphia's ensureIndex call for the same collection *there is no _id property*.
Looking at morphia's source code, it's clear that it's calling the same code that the shell uses, but for some reason (whether it's the fact that I'm creating a compound index or indexing an Embedded document or both) they produce different results. Can anyone explain this behavior to me?
Not exactly sure how you managed to get an _id field in the indexes collection but both shell and Morphia originated ensureIndex calls for compound indexes do not put an _id field in the index object :
> db.test.ensureIndex({'a.b':1, 'a.c':1})
> db.system.indexes.find({})
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.test", "name" : "_id_" }
{ "v" : 1, "key" : { "a.b" : 1, "a.c" : 1 }, "ns" : "test.test", "name" : "a.b_1_a.c_1" }
>
Upgrade to 2.x if you're running an older version to avoid running into now resolved issues. And judging from your output you are running 1.8 or earlier.
Related
I have a collection User in mongo. When I do a count on this collection I got 13204951 documents
> db.User.count()
13204951
But when I tried to find the count of non-stale documents like this I got a count of 13208778
> db.User.find({"_id": {$exists: true, $ne: null}}).count()
13208778
> db.User.find({"UserId": {$exists: true, $ne: null}}).count()
13208778
I even tried to get the count of this collection using MongoEngine
user_list = set(User.objects().values_list('UserId'))
len(resume_list)
13208778
Here are the indexes of this User collection
>db.User.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "user_db.User"
},
{
"v" : 1,
"unique" : true,
"key" : {
"UserId" : 1
},
"name" : "UserId_1",
"ns" : "user_db.User",
"sparse" : false,
"background" : true
}
]
Any pointers on how to debug the mismatch in counts from different queries.
refer to this document
On a sharded cluster, db.collection.count() can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.
Also, refer to this question
If you are not using sharding cluster, you can refer to this question
The basic idea is db.{collection}.count() might do some tricks to make it fast to return a count, and it might be not accurate, use a count() with query should be accurate.
I have the following DB:
{
"_id" : ObjectId("556da79a77f9f7465943ff93"),
"guid" : "a12345",
"update_day" : "12:05:10 02.06.15"
}
{
"_id" : ObjectId("556dc4509a0a6a002f97e972"),
"guid" : "bbbb",
"update_day" : "15:03:10 02.06.15"
"__v" : 0
}
{
"_id" : ObjectId("556dc470e57836242f5519eb"),
"guid" : "bbbb",
"update_day" : "15:03:10 02.06.15"
"__v" : 0
}
{
"_id" : ObjectId("556dc47c7e882d3c2fe9e0fd"),
"guid" : "bbbb",
"update_day" : "15:03:10 02.06.15"
"__v" : 0
}
I want to set the guid to be unique, so no to duplicate is possible (Like primary key in MYSQL). So the DB will look like this:
{
"_id" : ObjectId("556da79a77f9f7465943ff93"),
"guid" : "a12345",
"update_day" : "12:05:10 02.06.15"
}
{
"_id" : ObjectId("556dc4509a0a6a002f97e972"),
"guid" : "bbbb",
"update_day" : "15:03:10 02.06.15"
"__v" : 0
}
and when I will insert another "guid":"bbbb" (with the save command), it will fails.
While declaring schema in mongoose, do this
guid : { type : String, unique : true}
AND if you want mongodb to create the guid on its own (like _id) then do this
guid : { type : String, index : { unique : true} }
First, you have to deal with the current state of your MongoDB collection and delete all the duplicated documents.
One thing is sure : you won't be able to create the unique index with duplicates in your collection and dropDupes is now deprecated since the version 2.7.5 so you can't use it. By the way, it was removed because it was almost impossible to predict which document would be deleted in the process.
Two possible solutions :
Create a new collection. Create the unique index on this new collection and run a batch to copy all the documents from the old collection to the new one and make sure you ignore duplicated key error during the process.
Deal with it in your own collection manually :
make sure you won't insert more duplicated documents in your code,
run a batch on your collection to delete the duplicates (and make sure you keep the good one if they are not completely identical),
then add the unique index.
I would declare my guid like so in mongoose :
guid : { type : String, unique : true}
I have a collection named App and need to query those active (active: true) apps that belong to a particular user (user_id) or are available to all users (by their _id). I use query like this
{
"active" : true,
"$or" : [
{
"user_id" : "111111111111111111111111"
},
{
"_id" : {
"$in" : [
ObjectId("222222222222222222222222"),
ObjectId("333333333333333333333333"),
ObjectId("444444444444444444444444")
]
}
}
]
}
However in db.currentOp(true) I see that this query is running very slowly: lockStats.timeLockedMicros.r is about 3000.
How can I optimize performance of this query? I already have the following indexes on App:
> db.App.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mydb.App"
},
{
"v" : 1,
"key" : {
"active" : 1,
"created_at" : -1
},
"name" : "active_1_created_at_-1",
"ns" : "mydb.App",
"background" : true
},
{
"v" : 1,
"key" : {
"active" : 1,
"user_id" : 1
},
"name" : "active_1_user_id_1",
"ns" : "mydb.App",
"background" : true
}
]
Two issues I see here:
1) You would not need index on the boolean field active as it would have low selectivity and not benefiting query performance.
"If overall selectivity is low, and if MongoDB must read a number of documents to return results, then some queries may perform faster without indexes." source
2) You need an index for user_id because user_id cannot use the compound index you created for active_1_user_id_1
Edit: You can always check index efficiency by doing a explain(true) and look at which indexes are used for that query.
I would try to do the following:
remove all your indexes, your active field has a low cardinality (boolean) and does not help you at all, you are not using created_at, so there is no reason for it.
add an index only on user_id key
change your strings as numbers to numbers.
What exactly happens when I call ensureIndex(data) when typical data looks like data:{name: "A",age:"B", job : "C"} ? Will it create a compound index over these three fields or will it create only one index applicable when anything from data is requested or something altogether different ?
You can do either :
> db.collection.ensureIndex({"data.name": 1,"data.age":1, "data.job" : 1})
> db.collection.ensureIndex({"data": 1})
This is discussed in the documentation under indexes-on-embedded-fields and indexes on sub documents
The important section of the sub document section is 'When performing equality matches on subdocuments, field order matters and the subdocuments must match exactly.'
This means that the 2 indexes are the same for simple queries .
However, as the sub-document example shows, you can get some interesting results (that you might not expect) if you just index the whole sub-document as opposed to a specific field and then do a comparison operator (like $gte) - if you index a specific sub field you get a less flexible, but potentially more useful index.
It really all depends on your use case.
Anyway, once you have created the index you can check what's created with :
> db.collection.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.collection",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"data.name" : 1,
"data.age" : 1,
"data.job" : 1
},
"ns" : "test.collection",
"name" : "data.name_1_data.age_1_data.job_1"
}
]
As you can see from the output it created a new key called data.name_1_data.age_1_data.job_1 (the _id_ index is always created).
If you want to test your new index then you can do :
> db.collection.insert({data:{name: "A",age:"B", job : "C"}})
> db.collection.insert({data:{name: "A1",age:"B", job : "C"}})
> db.collection.find({"data.name" : "A"}).explain()
{
"cursor" : "BtreeCursor data.name_1_data.age_1_data.job_1",
.... more stuff
The main thing is that you can see that your new index was used (BtreeCursor data.name_1_data.age_1_data.job_1 in the cursor field is what indicates this is the case). If you see "cursor" : "BasicCursor", then your index was not used.
For more detailed information look here.
you can try this :
db.collection.ensureIndex({"data.name": 1,"data.age":1, "data.job" : 1})
I can't remove the _id index, why?
When I try running the dropIndexes command, it removes all indexes but not the _id index.
Doing 'db.runCommand' doesn't work either:
> db.runCommand({dropIndexes:'fs_files',index:{_id:1}})
{ "nIndexesWas" : 2, "errmsg" : "may not delete _id index", "ok" : 0 }
not ok.
Can i use a field including _id in a composite index?
I couldn't find anything online, the ensureindex command can't do it.
db.fs_files.ensureIndex({'_id':1, 'created':1});
the above command just created a new composite index. i haven't found some similar 'create Index' command.
the default _id index is a unique index?
the getIndexes returns it's not a unique index.
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "gridfs.fs_files",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"created" : 1
},
"unique" : true,
"ns" : "gridfs.fs_files",
"name" : "created_1"
}
There is a createIndex command in addition to ensureIndex also.
E.g.
db.<coll>.createIndex({foo:1})
You cannot delete the index on "_id" in mongodb.
Please see the documentation here