I'd like to shard my existing users collection. The collection already has the default single ascending index {"_id" : 1}. I want to convert this index to "hashed" and shard on that hashed key, according to the documentation:
I've tried the "brute-force" solution of deleting the default index and then recreating it with the "hashed" parameter, but MongoDB doesn't allow that.
UPDATE: I've also tried db.users.ensureIndex({_id: "hashed"}), but after I run this command nothing really happens.
switched to db bg_shard_single
mongos> db.users.ensureIndex({_id:"hashed"});
mongos> db.users.getIndexes();
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "bg_shard_single.users",
"name" : "_id_"
}
]
It does not let you do so because the default index on the _id field cannot be dropped or replaced. Instead, you can create an additional hashed index on the same field with db.collection.ensureIndex( { _id: "hashed" } ).
You will then see "name" : "_id_hashed" listed as your hashed index, which you can use for sharding later.
I've found what the problem was. Apparently, I was using an old version of MongoDB, which is why mongos wouldn't let me create the "hashed" index on '_id'. After I updated to 2.4.8, as @Salvador-Dali mentions, the index shows up with "name" : "_id_hashed".
I recently saw this error in a Mongo 2.6 replicaset:
WARNING: the collection 'mydatabase.somecollection' lacks a unique index on _id. This index is needed for replication to function properly.
I assumed the _id index would be unique by default. But I am trying to check / set it. getIndexes shows there is no unique option set.
> db.somecollection.getIndexes()[0]
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "mydatabase.somecollection",
"name" : "_id_"
}
> db.somecollection.ensureIndex({"_id":1},{unique:true})
{ "numIndexesBefore" : 3, "note" : "all indexes already exist", "ok" : 1 }
> db.somecollection.getIndexes()[0]
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "mydatabase.somecollection",
"name" : "_id_"
}
I have tried .validate(true):
...
"valid" : true,
"errors" : [ ],
"ok" : 1
}
and also .reIndex(), which runs without error. I am unable to remove the _id index to recreate it. How can I set the index to unique, or what should I do to ensure data consistency in the RS? Note that the RS was upgraded as per the upgrade instructions from 2.2 --> 2.4 --> 2.6. I have found the question "MongoDB - Collection lacks a unique index on _id", but nothing in there resolves my issue.
I have seen this in the past when a new member was added to the replica set with a different compatibility version. Run db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } ) on every node in the replica set; if one differs, stop replication on that node, change its featureCompatibilityVersion, and then re-add it to the replica set.
So it turns out that the error came up when a new member was added to the existing replica set, and it was only shown on that member. If I connect to the database and try to add a duplicate _id, I get the usual E11000 duplicate key error index: ... even though getIndexes() doesn't show the unique constraint on the index (I assume it is implicit).
My query is failing to find all matching results. If I add an _id condition for one specific document that should match, I do get a result:
> db.reviews.count({"contentProvider":"GLORP", "responses.0": {$exists: true}})
0
> db.reviews.count({_id: "1234", "contentProvider":"GLORP", "responses.0": {$exists: true}})
1
The first query uses this index:
"indexName" : "contentProvider_1_reviewDetail_1_reviewerUserName_1_providerReviewId_1",
and the query with the _id is, of course, using the _id_ index:
"indexName" : "_id_"
Here is the index in question:
{
"v" : 1,
"key" : {
"contentProvider" : 1,
"reviewDetail" : 1,
"reviewerUserName" : 1,
"providerReviewId" : 1
},
"name" : "contentProvider_1_reviewDetail_1_reviewerUserName_1_providerReviewId_1",
"ns" : "test.reviews",
"background" : true
}
I'm using MongoDB version 3.2.3.
Is the index corrupted? Is dropping it and re-adding it likely to fix the problem?
It's possible and you could certainly try it, however without knowing what version of MongoDB you are using and without seeing the index definition I cannot say for certain.
There are several different types of indexes, as well as index properties such as sparse or partial, that can change behavior and may explain why the index doesn't return the results you expect.
I'd recommend checking the index definition first to see whether it has any properties that would cause the document to be excluded.
If not, you can always drop the index and recreate it.
I created an index on a MongoDB collection using Java code:
dbCollection.createIndex("accountNumber");
When I list the indexes using
db.accounts.getIndexes()
I get the index name "accountNumber_1".
How can I make the index name the same as the document field? Or how can I specify the index name?
Is naming indexes important, or can I ignore it?
When we create an index on the users collection
> db.users.createIndex({name: 1})
{
"ok" : 0,
"errmsg" : "Index with name: name_1 already exists with different options",
"code" : 85
}
the name name_1 is returned; we can then list the index through getIndexes()
> db.users.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.users"
},
{
"v" : 1,
"unique" : true,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "test.users",
"background" : true,
"safe" : null
}
]
As shown, name_1 is just the value of the index's name field, while the key document ({name: 1}) says which field of the users collection is indexed. The name is generated automatically from the key and its sort direction, so we can ignore it...
You can create the index with the name you want by using the other variant of the createIndex method; refer to the Java API here.
public void createIndex(DBObject keys, DBObject options)
Creates an index on the field specified, if that index does not already exist.
Prior to MongoDB 3.0 the dropDups option could be used with unique indexes allowing documents with duplicate values to be dropped when building the index. Later versions of MongoDB will silently ignore this setting.
Parameters:
keys - a document that contains pairs with the name of the field or fields to index and order of the index
options - a document that controls the creation of the index.
You can find the corresponding MongoDB documentation here.
Basically, the second parameter, 'options', contains an option to supply the index name explicitly.
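As a rough illustration of where a name such as accountNumber_1 comes from, the sketch below joins each key with its direction, which reproduces the default names seen in the getIndexes() output above. defaultIndexName is a hypothetical helper written for this example, not a driver function, and the automatically created _id index is a special case that shows up as _id_.

```javascript
// Hypothetical helper (not part of any MongoDB driver): rebuilds the default
// index name by joining every key and its direction/type with underscores.
function defaultIndexName(keys) {
  return Object.entries(keys)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ accountNumber: 1 })); // accountNumber_1
console.log(defaultIndexName({ _id: "hashed" }));    // _id_hashed

// In the mongo shell, an explicit name goes in the options document:
//   db.accounts.createIndex({ accountNumber: 1 }, { name: "accountNumber" })
```

With the Java variant quoted above, the same "name" entry would go into the options DBObject passed as the second argument.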
I have a collection in MongoDB which has following documents.
/* 0 */
{
"T" : [
374135056604448742
],
"_id" : {
"#" : 7778532275691,
"ts" : ISODate("2013-07-26T02:25:00Z")
}
}
/* 1 */
{
"T" : [
1056188940167152853
],
"_id" : {
"#" : 34103385525388,
"ts" : ISODate("2013-07-30T03:00:00Z")
}
}
/* 2 */
{
"T" : [
1056188940167152853
],
"_id" : {
"#" : 34103385525388,
"ts" : ISODate("2013-07-30T03:18:00Z")
}
}
Now, I'm trying to query some documents with the following query.
db.entries.find({
'_id.ts': {'$gte': beginTS, '$lte': endTS},
'_id.#' : 884327843395156951
}).hint([('_id', 1)]).explain()
As I understand it, since _id is a compound field and Mongo always maintains an index on _id, Mongo should have used the index on '_id' to answer the above query. However, explain() returns the following:
{u'allPlans': [{u'cursor': u'BtreeCursor _id_',
u'indexBounds': {u'_id': [[{u'$minElement': 1}, {u'$maxElement': 1}]]},
u'n': 2803,
u'nscanned': 4869528,
u'nscannedObjects': 4869528}],
u'cursor': u'BtreeCursor _id_',
u'indexBounds': {u'_id': [[{u'$minElement': 1}, {u'$maxElement': 1}]]},
u'indexOnly': False,
u'isMultiKey': False,
u'millis': 128415,
u'n': 2803,
u'nChunkSkips': 0,
u'nYields': 132,
u'nscanned': 4869528,
u'nscannedAllPlans': 4869528,
u'nscannedObjects': 4869528,
u'nscannedObjectsAllPlans': 4869528,
u'scanAndOrder': False,
As can be seen, MongoDB is scanning the entire collection to find just a handful of documents. I don't know what the hell is wrong here.
I tried changing the order of the query fields, but got the same result. I have no idea what is happening here. Any help is deeply appreciated.
UPDATE
I understood the nuance here. The _id index is not a compound index; it is a plain exact-match index. This means that if _id is a document, then irrespective of its structure and of how many nested attributes or sub-documents it has, the _id index contains only one entry per document for the _id field. That entry covers the _id document as a whole, as a single value, and is kept unique.
You are using an object as a key, but you're not using a compound index here.
The _id index is a bit special, because it is created automatically and is always unique. Normally, the _id value is an ObjectId, a UUID, or maybe an integer or a string that contains some kind of hash. MongoDB also supports complex objects as keys. However, to MongoDB, this is still just a document. It can be compared to other documents, and documents that have the same fields and values will be equal. But since you didn't define the index keys (and you can't create that index manually), MongoDB has no idea that it contains a field # and a field ts.
A compound index, on the other hand, refers to the fields of a document explicitly, e.g. {"product.quantity" : 1, "product.created" : -1}. This must be specified when the index is created.
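To make the difference concrete, here is a minimal plain-JavaScript model (not MongoDB code) of an index keyed by the whole _id value, using the three documents from the question with the T field left out. The Map, findByWholeId and findByIdFields are assumptions made for this sketch; serialising the key with JSON.stringify also makes field order matter, which is how exact matches on embedded documents behave.

```javascript
// Model of the _id index: one entry per document, keyed by the whole _id value.
const entries = [
  { _id: { "#": 7778532275691, ts: new Date("2013-07-26T02:25:00Z") } },
  { _id: { "#": 34103385525388, ts: new Date("2013-07-30T03:00:00Z") } },
  { _id: { "#": 34103385525388, ts: new Date("2013-07-30T03:18:00Z") } },
];
const idIndex = new Map(entries.map((doc) => [JSON.stringify(doc._id), doc]));

// Exact match on the whole _id: a single index lookup.
function findByWholeId(id) {
  return idIndex.get(JSON.stringify(id));
}

// Match on fields inside _id: the index knows nothing about "#" or ts,
// so every entry has to be examined.
function findByIdFields(number, beginTs, endTs) {
  let examined = 0;
  const hits = [];
  for (const doc of idIndex.values()) {
    examined += 1;
    const id = doc._id;
    if (id["#"] === number && id.ts >= beginTs && id.ts <= endTs) hits.push(doc);
  }
  return { hits, examined };
}

const scan = findByIdFields(
  34103385525388,
  new Date("2013-07-30T00:00:00Z"),
  new Date("2013-07-31T00:00:00Z")
);
console.log(scan.hits.length, scan.examined); // 2 3
```

A compound index on two separate top-level fields would let the range on the timestamp be answered from index bounds instead of a full pass like the one in findByIdFields.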
It seems you're trying to basically store a timestamp in your primary key. MongoDB's ObjectId already contains a timestamp, so you can do date-based range queries on ObjectIds directly.
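A small sketch of that idea, assuming the standard ObjectId layout in which the first 4 bytes (8 hex characters) hold the creation time in seconds since the Unix epoch. Both function names are hypothetical helpers for this example, and the approach only applies if _id is a real ObjectId rather than the embedded document used in the question.

```javascript
// Reads the creation time (in seconds) from the first 8 hex characters.
function objectIdTimestampSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// Builds the smallest ObjectId hex string for a given date: timestamp + zeros.
// Two such values could serve as range bounds, for example
//   db.entries.find({ _id: { $gte: ObjectId(lowerHex), $lt: ObjectId(upperHex) } })
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const createdAt = objectIdTimestampSeconds("5192b6072fda974610000005");
console.log(new Date(createdAt * 1000).toISOString()); // 2013-05-14T22:09:11.000Z
console.log(objectIdHexFromDate(new Date("2013-07-26T02:25:00Z")));
```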
I remember reading somewhere that the Mongo engine is more comfortable when the entire structure of a document is already in place before an update, so here is the question.
When dealing with "empty" data, for example when inserting an empty string, should I default it to null, "" or not insert it at all ?
{
_id: ObjectId("5192b6072fda974610000005"),
description: ""
}
or
{
_id: ObjectId("5192b6072fda974610000005"),
description: null
}
or
{
_id: ObjectId("5192b6072fda974610000005")
}
You have to remember that the description field may or may not be filled in every document (based on user input).
Introduction
If a document doesn't have a field, the DB treats that field's value as null when querying. Suppose a database with the following documents:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
If you create a query to find documents with the field desc different than null, you will get just one document:
db.test.find({desc: {$ne: null}})
// Output:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
The query doesn't distinguish between documents without a desc field and documents whose desc field has the value null. One more test:
db.test.find({desc: null})
// Output:
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
But the differences are only ignored in queries: as the last example above shows, the fields are still saved on disk, and you'll receive documents with the same structure as the documents that were sent to MongoDB.
Question
When dealing with "empty" data, for example when inserting an empty string, should I default it to null, "" or not insert it at all ?
There isn't much difference between {desc: null} and {}, because most of the operators will give the same result. You should only pay special attention to these two operators:
$exists
$type
I'd save documents without the desc field, because the operators will continue to work as expected and I'd save some space.
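The matching rules described above can be modelled in plain JavaScript. This is only a sketch of the semantics, not MongoDB code: the predicate names are made up for the example, and the ObjectIds are written as plain strings.

```javascript
const sampleDocs = [
  { _id: "5192d23b1698aa96f0690d96", a: 1, desc: "" },
  { _id: "5192d23f1698aa96f0690d97", a: 1, desc: null },
  { _id: "5192d2441698aa96f0690d98", a: 1 },
];

// {desc: null}: matches an explicit null and a missing field alike.
const equalsNull = (doc) => doc.desc === null || doc.desc === undefined;
// {desc: {$ne: null}}: the complement of the query above.
const notNull = (doc) => !equalsNull(doc);
// {desc: {$exists: true}}: the field is present, whatever its value.
const fieldExists = (doc) => Object.prototype.hasOwnProperty.call(doc, "desc");
// {desc: {$type: 10}}: the field is present and holds an explicit null.
const isExplicitNull = (doc) => fieldExists(doc) && doc.desc === null;

const ids = (pred) => sampleDocs.filter(pred).map((d) => d._id.slice(-2)).join(", ");
console.log(ids(equalsNull));     // 97, 98
console.log(ids(notNull));        // 96
console.log(ids(fieldExists));    // 96, 97
console.log(ids(isExplicitNull)); // 97
```

Only the last two predicates can tell the null document apart from the one without the field, which is why $exists and $type are the operators to watch.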
Padding factor
If you know the documents in your database grow frequently, then MongoDB might need to move a document during an update, because there isn't enough room where the document was previously stored. To avoid moving documents around, MongoDB allocates extra space for each document.
The amount of extra space allocated by MongoDB per document is controlled by the padding factor. You cannot (and don't need to) choose the padding factor, because MongoDB learns it adaptively, but you can help MongoDB preallocate internal space for each document by filling the possible future fields with null values. The difference is very small (depending on your application) and might be even smaller after MongoDB learns the best padding factor.
Sparse indexes
This section isn't too important to your specific problem right now, but may help you when you face similar problems.
If you create a unique index on the field desc, you can't save more than one document with the same value, and the previous database already has more than one document with the same value for desc (a missing field is indexed as null, just like an explicit null). Let's try to create a unique index on it and see what error we get:
db.test.ensureIndex({desc: 1}, {unique: true})
// Output:
{
"err" : "E11000 duplicate key error index: test.test.$desc_1 dup key: { : null }",
"code" : 11000,
"n" : 0,
"connectionId" : 3,
"ok" : 1
}
If we want to be able to create a unique index on some field and still let some documents leave this field out, we should create a sparse index. Let's try to create the unique index again:
// No errors this time:
db.test.ensureIndex({desc: 1}, {unique: true, sparse: true})
So far, so good, but why am I explaining all this? Because sparse indexes have an obscure behaviour. In the following query, we expect to get ALL documents sorted by desc.
db.test.find().sort({desc: 1})
// Output:
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
The result seems weird. What happened to the missing document? Let's try the query without sorting it:
db.test.find()
// Output:
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
All documents were returned this time. What's happening? It's simple, but not so obvious. When we sort the result by desc, MongoDB uses the sparse index created previously, and that index has no entries for documents that lack the desc field. The following query shows the index being used to sort the result:
db.test.find().sort({desc: 1}).explain().cursor
// Output:
"BtreeCursor desc_1"
We can skip the index using a hint:
db.test.find().sort({desc: 1}).hint({$natural: 1})
// Output:
{ "_id" : ObjectId("5192d23f1698aa96f0690d97"), "a" : 1, "desc" : null }
{ "_id" : ObjectId("5192d2441698aa96f0690d98"), "a" : 1 }
{ "_id" : ObjectId("5192d23b1698aa96f0690d96"), "a" : 1, "desc" : "" }
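The two outputs above can be reproduced with a small plain-JavaScript model of a sparse index. This is a sketch under simplifying assumptions, not MongoDB code: compareDesc only handles null and strings, and the ObjectIds are written as plain strings.

```javascript
const storedDocs = [
  { _id: "5192d23b1698aa96f0690d96", a: 1, desc: "" },
  { _id: "5192d23f1698aa96f0690d97", a: 1, desc: null },
  { _id: "5192d2441698aa96f0690d98", a: 1 },
];

// Simplified ordering: null (or a missing field) sorts before any string.
function compareDesc(x, y) {
  const a = x.desc === undefined ? null : x.desc;
  const b = y.desc === undefined ? null : y.desc;
  if (a === null || b === null) return (a === null ? 0 : 1) - (b === null ? 0 : 1);
  return a < b ? -1 : a > b ? 1 : 0;
}

// A sparse index only has entries for documents that contain the field.
const sparseIndex = storedDocs
  .filter((doc) => Object.prototype.hasOwnProperty.call(doc, "desc"))
  .sort(compareDesc);

// Sorting without the index (as with the $natural hint) keeps every document.
const fullScanSorted = [...storedDocs].sort(compareDesc);

const shortIds = (list) => list.map((d) => d._id.slice(-2)).join(", ");
console.log(shortIds(sparseIndex));    // 97, 96
console.log(shortIds(fullScanSorted)); // 97, 98, 96
```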
Summary
A sparse unique index still rejects a second document with {desc: null}, because an explicit null is indexed
A sparse unique index still rejects a second document with {desc: ""}
Sparse indexes might change the result of a query
There is little difference between a field with a null value and a document without the field. The main difference is that the former consumes a little disk space, while the latter consumes none at all. They can be distinguished by using the $exists operator.
A field with an empty string is quite different from both. Although it depends on the purpose, I don't recommend using it as a replacement for null. To be precise, the two should be used to mean different things. For instance, think about voting. A person who cast a blank ballot is different from a person who wasn't permitted to vote. The former vote is an empty string, while the latter vote is null.
There is already a similar question here.