How can I find all unique indexes in MongoDB?
The db.collection.getIndexes() function doesn't give any information about uniqueness.
getIndexes() should work:
db.collection.createIndex({key: 1}, {unique: true})
db.collection.getIndexes()
[
  {
    "v" : 2,
    "key" : { "_id" : 1 },
    "name" : "_id_"
  },
  {
    "v" : 2,
    "key" : { "key" : 1 },
    "name" : "key_1",
    "unique" : true
  }
]
If an index is not unique, the "unique": true field is simply missing from its document.
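To list only the unique ones, or to sweep a whole database, you can filter that output with plain shell JavaScript. A minimal sketch, using only getCollectionNames() and getIndexes():

// Print every unique index in the current database.
db.getCollectionNames().forEach(function (coll) {
  db.getCollection(coll).getIndexes()
    .filter(function (ix) { return ix.unique; })
    .forEach(function (ix) {
      print(coll + "." + ix.name + " -> " + JSON.stringify(ix.key));
    });
});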
Related
I have a unique index, but some documents are missing from it, resulting in duplicate data. I want to find those documents, i.e. query for the data that is not covered by this index.
Like this:
MongoDB shell version v3.6.8
MongoDB server version: 4.0.12
// there is no not_hint function
db.col.find().not_hint("md5_1_domain_1_ip_1_uri_1")
// hint does not allow $ne
db.col.find()._addSpecial("$hint", {"$ne": {"md5" : 1, "domain" : 1, "ip" : 1, "uri" : 1}})
The indexes (the second one is the unique index):
{
"v" : 2,
"key" : {
"md5" : "hashed"
},
"name" : "md5_hashed",
"ns" : "mdm.col"
},
{
"v" : 2,
"unique" : true,
"key" : {
"md5" : 1,
"domain" : 1,
"ip" : 1,
"uri" : 1
},
"name" : "md5_1_domain_1_ip_1_uri_1",
"background" : true,
"ns" : "mdm.col"
}
Here is the data. I have modified some sensitive information, but I am sure the two documents are the same. They cannot be found via the unique index; they can only be queried by _id or by another index.
mongos> db.col.find({ "_id" : ObjectId("5fb2df3b32b0f42dced04ea7")})
{ "_id" : ObjectId("5fb2df3b32b0f42dced04ea7"), "domain" : null, "ip" : 1, "md5" : BinData(5,"anQTYWNGHKoj4xx+KTjNxQ=="), "uri" : "x * 1025", "count" : 6, "fseen" : ISODate("2019-08-03T13:56:38Z"), "lseen" : ISODate("2019-08-03T13:56:38Z"), "sha1" : null, "sha256" : null, "src" : [ "xx2", "xx3" ] }
mongos> db.col.find({'_id': ObjectId('5fb2df3d32b0f42dced0721d')})
{ "_id" : ObjectId("5fb2df3d32b0f42dced0721d"), "domain" : null, "ip" : 1, "md5" : BinData(5,"anQTYWNGHKoj4xx+KTjNxQ=="), "uri" : "x * 1025", "count" : 6, "fseen" : ISODate("2019-08-03T13:56:38Z"), "lseen" : ISODate("2019-08-03T13:56:38Z"), "sha1" : null, "sha256" : null, "src" : [ "xx2", "xx3" ] }
mongos> db.col.find({"md5": BinData(5,"anQTYWNGHKoj4xx+KTjNxQ=="), "uri": "x * 1025", "ip": 1})
mongos> // it returns nothing
And this info:
mongos> db.col.find().count()
5549020886
mongos> db.col.find().hint("md5_1_domain_1_ip_1_uri_1").count()
5521037206
The uri length is over 1024 bytes, so that data is not indexed. I want to find those 27,983,680 documents and repair them.
Thanks
Strange how that could happen. Anyway, you can find the duplicate data with this aggregation pipeline:
db.col.aggregate([
  {
    $group: {
      _id: {
        md5: "$md5",
        domain: "$domain",
        ip: "$ip",
        uri: "$uri"
      },
      count: { $sum: 1 },
      ids: { $push: "$_id" }
    }
  },
  { $match: { count: { $gt: 1 } } }
], { allowDiskUse: true })
Each result document has an ids field: the array of _id values of the duplicated documents.
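If the goal is then to repair the duplicates, here is a hedged sketch that keeps the first _id of each group and removes the rest; with 5.5 billion documents, verify it on a sample first:

db.col.aggregate([
  {
    $group: {
      _id: { md5: "$md5", domain: "$domain", ip: "$ip", uri: "$uri" },
      count: { $sum: 1 },
      ids: { $push: "$_id" }
    }
  },
  { $match: { count: { $gt: 1 } } }
], { allowDiskUse: true }).forEach(function (group) {
  // Keep the first duplicate in each group, delete the others by _id.
  group.ids.slice(1).forEach(function (dupId) {
    db.col.remove({ _id: dupId });
  });
});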
I found the reason: the uri length is over 1024 bytes, so those documents are not indexed. The DBA had turned off failIndexKeyTooLong. But I still can't find this part of the data.
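Since the server is 4.0, one way to locate the over-long values is to measure the field server-side with $expr and $strLenBytes. A sketch, assuming uri is present and a string on every document; the 1024-byte limit actually applies to the whole index entry, so treat the exact threshold as approximate:

// Find documents whose uri alone already exceeds 1024 bytes.
db.col.find({ $expr: { $gt: [ { $strLenBytes: "$uri" }, 1024 ] } })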
I probably would not ask if I had not seen this:
> db.requests.getIndexes()
[
  {
    "v" : 2,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_"
  },
  {
    "v" : 2,
    "unique" : true,
    "key" : {
      "name" : 1
    },
    "name" : "name_1"
  }
]
Here the _id index does not have unique: true. Does that mean the _id index is somehow not truly unique? Could it behave differently (non-unique) if _id is populated with non-ObjectId values, i.e. other fundamental types?
After some operations on my database, I lost an index. I had these indexes:
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "collection.statement"
},
{
"v" : 1,
"unique" : true,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "collection.statement"
}
and now I have only the first one.
I entered this command:
db.collection.createIndex({
"v" : 1,
"unique" : true,
"key" :{ "name" : 1 },
"name" : "name_1",
"ns" : "collection.statement"
})
and I only get an error message saying that I have a bad index key pattern.
Please help me: how can I get this index back? What did I do wrong?
Use this:
db.collection.createIndex( { "name": 1 }, { unique: true } )
Your attempt includes internal aspects of the index ("v" : 1); you just need to supply the field(s), a sort order for each, and the unique option.
More details in the docs.
One MongoDB collection:
{
"_id" : ObjectId("574bbae4d009b5364abaebe5"),
"cityid" : 406,
"location" : {
"type" : "Point",
"coordinates" : [
118.602355,
24.89083
]
},
"shopid" : "a"
}
with about 50,000 documents;
and indexes:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "pingan-test.shop_actinfo_collection_0530"
},
{
"v" : 1,
"key" : {
"location" : "2dsphere"
},
"name" : "location_2dsphere",
"ns" : "pingan-test.shop_actinfo_collection_0530",
"2dsphereIndexVersion" : 3
},
{
"v" : 1,
"key" : {
"shopid" : 1,
"cityid" : 1
},
"name" : "shopid_1_cityid_1",
"ns" : "pingan-test.shop_actinfo_collection_0530"
}
]
I query this collection like this:
body = {'cityid': 2, 'location': {'$near': {'$geometry': {'type': 'Point', 'coordinates': [122.0, 31.0]}}}, 'shopid': {'$in': ['a','b']}}
results = collection.find(body, {'shopid': 1, '_id': 0}).batch_size(20).limit(20)
shops = list(results)
The problem is that it takes about 400 ms, but only about 30 ms if we leave out the location condition.
Why is that, and how can I fix it, please?
You have an index on shopid and cityid, but you search by cityid. Since the index is ordered by shopid first, it cannot be used to search by cityid. If you change the index to cityid: 1, shopid: 1, you will see a performance improvement, because your query will then be able to use the index.
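If in doubt, explain() shows which index the planner actually picks; for example, with the query from the question, something like:

db.shop_actinfo_collection_0530.find({
  cityid: 2,
  shopid: { $in: ["a", "b"] },
  location: { $near: { $geometry: { type: "Point", coordinates: [122.0, 31.0] } } }
}).explain("executionStats")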
After all, I got it.
I just created an index on cityid: 1, shopid: 1, location: "2dsphere", and then, world peace.
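For reference, a sketch of the corresponding create command (collection name taken from the getIndexes() output above):

db.shop_actinfo_collection_0530.createIndex({ cityid: 1, shopid: 1, location: "2dsphere" })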
And thanks #tiramisu again.
I'm using MongoDB 2.6.1. However, I'm not able to create a unique index with the sparse option. Currently, I have the following indexes:
> db.products.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "snapyshop_production.products"
},
{
"v" : 1,
"key" : {
"pickup_location" : "2dsphere"
},
"name" : "pickup_location_2dsphere",
"background" : true,
"ns" : "snapyshop_production.products",
"2dsphereIndexVersion" : 2
},
{
"v" : 1,
"key" : {
"category_id" : 1
},
"name" : "category_id_1",
"background" : true,
"ns" : "snapyshop_production.products"
},
{
"v" : 1,
"key" : {
"_keywords" : 1
},
"name" : "_keywords_1",
"background" : true,
"ns" : "snapyshop_production.products"
}
]
But when I run this command, it prints an error:
> db.products.ensureIndex( { source_url: 1 }, { background: true, sparse: true, unique: true } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 4,
"ok" : 0,
"errmsg" : "E11000 duplicate key error index: snapyshop_production.products.$source_url_1 dup key: { : null }",
"code" : 11000
}
I really have no idea how to fix it.
The sparse index you're creating will allow multiple documents to exist without a source_url field, but will still only allow one document where the field is present with a value of null. In other words, the sparse index doesn't treat the null value case any differently; it only special-cases the missing field.
So the typical way to handle your problem would be to update your collection to remove the source_url field from your existing docs where its value is null before creating the index:
// Unset source_url on every document where it is null
db.products.update({source_url: null}, {$unset: {source_url: true}}, {multi: true})
And then use the absence of the field as your null indicator in your program logic.
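After that cleanup, the original command should go through unchanged:

db.products.ensureIndex( { source_url: 1 }, { background: true, sparse: true, unique: true } )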