Is the _id field truly unique in MongoDB?

I probably would not ask if I had not seen this:
> db.requests.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_"
    },
    {
        "v" : 2,
        "unique" : true,
        "key" : {
            "name" : 1
        },
        "name" : "name_1"
    }
]
Note that the _id index does not have unique: true. Does this mean the _id index is somehow not truly unique? Can it behave differently (non-unique) if _id is populated with non-ObjectId values, i.e. other fundamental types?

Related

Finding Unique Indexes in MongoDB

How can I find all unique indexes in MongoDB?
The db.collection.getIndexes() function doesn't seem to give any information about uniqueness.
getIndexes() should work:
db.collection.createIndex({key: 1}, {unique: true})
db.collection.getIndexes()
[
    {
        "v" : 2,
        "key" : { "_id" : 1 },
        "name" : "_id_"
    },
    {
        "v" : 2,
        "key" : { "key" : 1 },
        "name" : "key_1",
        "unique" : true
    }
]
If an index is not unique, then "unique": true is simply missing from its entry.
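Building on that, a quick way to list only the unique indexes is to filter the output of getIndexes() on the unique flag, e.g. db.collection.getIndexes().filter(ix => ix.unique). A minimal sketch of that filtering logic in plain JavaScript (the sample index documents are hypothetical, mirroring the output above):

```javascript
// Keep only index documents that are marked unique,
// mirroring: db.collection.getIndexes().filter(ix => ix.unique)
function uniqueIndexes(indexes) {
  return indexes.filter(ix => ix.unique === true);
}

// Hypothetical sample mirroring the getIndexes() output above
const indexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { key: 1 }, name: "key_1", unique: true }
];

console.log(uniqueIndexes(indexes).map(ix => ix.name)); // [ 'key_1' ]
```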

Mongo: find data not covered by a unique index

I have a unique index, but some documents are not covered by it, resulting in duplicate data. I want to find those documents, i.e. query for data that is not in this index.
Something like this:
MongoDB shell version v3.6.8
MongoDB server version: 4.0.12
# there is no not_hint function
db.col.find().not_hint("md5_1_domain_1_ip_1_uri_1")
# hint does not allow $ne
db.col.find()._addSpecial("$hint", {"$ne": {"md5" : 1, "domain" : 1, "ip" : 1, "uri" : 1}})
The indexes (the second is the unique one):
{
    "v" : 2,
    "key" : {
        "md5" : "hashed"
    },
    "name" : "md5_hashed",
    "ns" : "mdm.col"
},
{
    "v" : 2,
    "unique" : true,
    "key" : {
        "md5" : 1,
        "domain" : 1,
        "ip" : 1,
        "uri" : 1
    },
    "name" : "md5_1_domain_1_ip_1_uri_1",
    "background" : true,
    "ns" : "mdm.col"
}
Here is the data (I have modified some sensitive information, but I am sure the two documents are the same). It cannot be found via the unique index; only _id or the other index works for querying it.
mongos> db.col.find({ "_id" : ObjectId("5fb2df3b32b0f42dced04ea7")})
{ "_id" : ObjectId("5fb2df3b32b0f42dced04ea7"), "domain" : null, "ip" : 1, "md5" : BinData(5,"anQTYWNGHKoj4xx+KTjNxQ=="), "uri" : "x * 1025", "count" : 6, "fseen" : ISODate("2019-08-03T13:56:38Z"), "lseen" : ISODate("2019-08-03T13:56:38Z"), "sha1" : null, "sha256" : null, "src" : [ "xx2", "xx3" ] }
mongos> db.col.find({'_id': ObjectId('5fb2df3d32b0f42dced0721d')})
{ "_id" : ObjectId("5fb2df3d32b0f42dced0721d"), "domain" : null, "ip" : 1, "md5" : BinData(5,"anQTYWNGHKoj4xx+KTjNxQ=="), "uri" : "x * 1025", "count" : 6, "fseen" : ISODate("2019-08-03T13:56:38Z"), "lseen" : ISODate("2019-08-03T13:56:38Z"), "sha1" : null, "sha256" : null, "src" : [ "xx2", "xx3" ] }
mongos> db.col.find({"md5": BinData(5,"anQTYWNGHKoj4xx+KTjNxQ=="), "uri": "x * 1025", "ip": 1})
mongos> # no documents returned
And this info:
mongos> db.col.find().count()
5549020886
mongos> db.col.find().hint("md5_1_domain_1_ip_1_uri_1").count()
5521037206
The uri length is over 1024, so those documents were not indexed. I want to find those 27983680 documents and repair them.
Thanks
It's strange how that could happen. Anyway, you can find the duplicate data with this aggregation pipeline:
db.col.aggregate([
    {
        $group: {
            _id: {
                md5: "$md5",
                domain: "$domain",
                ip: "$ip",
                uri: "$uri"
            },
            count: { $sum: 1 },
            ids: { $push: "$_id" }
        }
    },
    { $match: { count: { $gt: 1 } } }
], { allowDiskUse: true })
Each result document has an ids field containing the array of _id values of the duplicates.
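As a sanity check, the grouping logic of that pipeline can be sketched in plain JavaScript on in-memory documents (the sample docs below are hypothetical):

```javascript
// Group documents by a compound key and keep groups with more than one
// member, mirroring the $group / $match stages of the pipeline above.
function findDuplicates(docs, keyFields) {
  const groups = new Map();
  for (const doc of docs) {
    // Serialize the compound key so it can be used as a Map key
    const key = JSON.stringify(keyFields.map(f => doc[f]));
    if (!groups.has(key)) groups.set(key, { count: 0, ids: [] });
    const g = groups.get(key);
    g.count += 1;
    g.ids.push(doc._id);
  }
  return [...groups.values()].filter(g => g.count > 1);
}

// Hypothetical sample: two docs share the same compound key
const docs = [
  { _id: "a", md5: "m1", domain: null, ip: 1, uri: "/x" },
  { _id: "b", md5: "m1", domain: null, ip: 1, uri: "/x" },
  { _id: "c", md5: "m2", domain: null, ip: 2, uri: "/y" }
];

console.log(findDuplicates(docs, ["md5", "domain", "ip", "uri"]));
// [ { count: 2, ids: [ 'a', 'b' ] } ]
```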
I found the reason: the uri length is over 1024, so those documents were not indexed. The DBA had disabled failIndexKeyTooLong. But I still can't find this part of the data.
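Since the root cause is uris longer than 1024 bytes, one way to locate the affected documents directly (on MongoDB 3.6+, where $expr and $strLenBytes are available, and assuming uri is always a string) would be db.col.find({ $expr: { $gt: [ { $strLenBytes: "$uri" }, 1024 ] } }). The byte-length check itself can be sketched in plain JavaScript (the sample docs are hypothetical):

```javascript
// Select documents whose uri exceeds 1024 bytes, mirroring the
// $strLenBytes comparison in the $expr query mentioned above.
function overlongUris(docs, maxBytes = 1024) {
  return docs.filter(d =>
    typeof d.uri === "string" && Buffer.byteLength(d.uri, "utf8") > maxBytes
  );
}

// Hypothetical samples: the first uri is 1025 ASCII chars (1025 bytes)
const docs = [
  { _id: 1, uri: "x".repeat(1025) },
  { _id: 2, uri: "/short" }
];

console.log(overlongUris(docs).map(d => d._id)); // [ 1 ]
```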

MongoDB sorting issue

My mongodb collection:
[{
    "_id" : ObjectId("5dd6598d55396f36052e347d"),
    "isActive" : true,
    "myarray" : [
        {
            "my_id" : "5d967d08821b4031a197b002",
            "name" : "jack"
        },
        {
            "my_id" : "5d967d2c821b4031a197b003",
            "name" : "manison"
        }
    ]
},
{
    "_id" : ObjectId("5dd6598d55396f36052e347d"),
    "isActive" : true,
    "myarray" : [
        {
            "my_id" : "5d967d08821b4031a197b002",
            "name" : "penelope"
        },
        {
            "my_id" : "5d967d2c821b4031a197b003",
            "name" : "cruz"
        }
    ]
}]
Here I am trying to sort based on the name.
I am not expecting to sort inside the array, but to sort the documents themselves.
Expecting result be like
[{
    "_id" : ObjectId("5dd6598d55396f36052e347d"),
    "isActive" : true,
    "myarray" : [
        {
            "my_id" : "5d967d08821b4031a197b002",
            "name" : "penelope"
        },
        {
            "my_id" : "5d967d2c821b4031a197b003",
            "name" : "cruz"
        }
    ]
},
{
    "_id" : ObjectId("5dd6598d55396f36052e347d"),
    "isActive" : true,
    "myarray" : [
        {
            "my_id" : "5d967d08821b4031a197b002",
            "name" : "jack"
        },
        {
            "my_id" : "5d967d2c821b4031a197b003",
            "name" : "manison"
        }
    ]
}]
"name" : "cruz" comes first because alphabetically C comes before J and M (which are in the second document).
Penelope and cruz did not switch places inside the array; only the top-level documents switched order according to the names.
The query I am using:
db.traffic.aggregate([
    { $unwind: "$myarray" },
    { $sort: { "myarray.name": 1 } },
    { $group: { _id: "$_id", myarray: { $push: "$myarray" } } }
]);
But it sorts inside the array, swapping cruz and penelope, while the main documents stay in place.
Please have a look
You can do this with a simple find query and a sort on the cursor:
db.traffic.find({}).sort({ "myarray.name": -1 })
From the docs
With arrays, a less-than comparison or an ascending sort compares the
smallest element of arrays, and a greater-than comparison or a
descending sort compares the largest element of the arrays.
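The quoted rule can be illustrated with a small plain-JavaScript sketch (the documents are hypothetical, mirroring the question): for a descending sort, each document is compared by the largest name in its array, which is why the penelope/cruz document comes first.

```javascript
// Sort documents descending by the largest "name" inside each document's
// array, mirroring MongoDB's array-sort rule quoted from the docs above.
function sortByMaxArrayName(docs) {
  const maxName = d =>
    d.myarray.map(e => e.name).sort().slice(-1)[0]; // largest element
  return [...docs].sort((a, b) => (maxName(a) < maxName(b) ? 1 : -1));
}

// Hypothetical docs mirroring the question
const docs = [
  { _id: 1, myarray: [{ name: "jack" }, { name: "manison" }] },
  { _id: 2, myarray: [{ name: "penelope" }, { name: "cruz" }] }
];

console.log(sortByMaxArrayName(docs).map(d => d._id)); // [ 2, 1 ]
```

Note that the arrays themselves are left untouched; only the order of the top-level documents changes, which matches the expected result in the question.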

Cannot create index - bad index key pattern

After some activity on my database, I lost one of my indexes. I had these indexes:
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "collection.statement"
},
{
    "v" : 1,
    "unique" : true,
    "key" : {
        "name" : 1
    },
    "name" : "name_1",
    "ns" : "collection.statement"
}
and now I have only the first one.
I entered this command:
db.collection.createIndex({
    "v" : 1,
    "unique" : true,
    "key" : { "name" : 1 },
    "name" : "name_1",
    "ns" : "collection.statement"
})
and I only get an error message saying that I have a bad index key pattern.
Please help me: how can I restore this index? What am I doing wrong?
Use this:
db.collection.createIndex( { "name": 1 }, { unique: true } )
Your attempt includes internal aspects of the index ("v" : 1); you just need to supply the field(s), a sort order for each, and the unique option.
More details in the docs.

Unable to create a unique index with sparse in MongoDB

I'm using MongoDB 2.6.1. However, I'm not able to create a unique index with the sparse option. Currently, I have the following indexes:
> db.products.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "snapyshop_production.products"
    },
    {
        "v" : 1,
        "key" : {
            "pickup_location" : "2dsphere"
        },
        "name" : "pickup_location_2dsphere",
        "background" : true,
        "ns" : "snapyshop_production.products",
        "2dsphereIndexVersion" : 2
    },
    {
        "v" : 1,
        "key" : {
            "category_id" : 1
        },
        "name" : "category_id_1",
        "background" : true,
        "ns" : "snapyshop_production.products"
    },
    {
        "v" : 1,
        "key" : {
            "_keywords" : 1
        },
        "name" : "_keywords_1",
        "background" : true,
        "ns" : "snapyshop_production.products"
    }
]
But when I run this command, it prints out an error:
> db.products.ensureIndex( { source_url: 1 }, { background: true, sparse: true, unique: true } )
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 4,
    "ok" : 0,
    "errmsg" : "E11000 duplicate key error index: snapyshop_production.products.$source_url_1 dup key: { : null }",
    "code" : 11000
}
I really have no idea how to fix it.
The sparse index you're creating will allow multiple documents to exist without a source_url field, but will still only allow one document where the field is present with a value of null. In other words, the sparse index doesn't treat the null-value case any differently; it only exempts the missing-field case.
So the typical way to handle your problem would be to update your collection to remove the source_url field from your existing docs where its value is null before creating the index:
db.products.update({source_url: null}, {$unset: {source_url: true}}, {multi: true})
And then use the absence of the field as your null indicator in your program logic.
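To illustrate why the explicit null collides while missing fields don't, here is a minimal plain-JavaScript sketch of a sparse unique index's insert rule (a deliberate simplification, not MongoDB's actual implementation):

```javascript
// Model a sparse unique index: documents missing the field are not
// indexed at all, but an explicit null IS indexed and must be unique.
function makeSparseUniqueIndex(field) {
  const seen = new Set();
  return function insert(doc) {
    if (!(field in doc)) return true;        // sparse: skip missing field
    const key = JSON.stringify(doc[field]);  // null is a real index key
    if (seen.has(key)) return false;         // E11000 duplicate key
    seen.add(key);
    return true;
  };
}

const insert = makeSparseUniqueIndex("source_url");
console.log(insert({ name: "a" }));                   // true  (field missing)
console.log(insert({ name: "b" }));                   // true  (field missing)
console.log(insert({ name: "c", source_url: null }));  // true  (first null)
console.log(insert({ name: "d", source_url: null }));  // false (duplicate null)
```

This is why the $unset update above, followed by treating the absent field as "no value" in application logic, is the usual workaround. (On MongoDB 3.2+, a partial index with partialFilterExpression would be another option, but it is not available on 2.6.)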