Is MongoDB _id unique by default, or do I have to set it to unique?
For the most part, _id in mongodb is unique enough. There is one edge case where it's possible to generate a duplicate id though:
If you generate more than 16,777,215 object ids in the same single second, while running in one process while on the same machine then you will get a duplicate id due to how object ids are generated. If you change any one of those things like generating an id in a different process or on a different machine, or at a different unix time, then you won't get a duplicate.
Let me know if you ever manage to pull this off with a realistic use case. Apparently google only gets around 70,000 searches a second and those are all processed on different machines.
All documents contain an _id field. All collections (except for capped ones) automatically create unique index on _id.
Try this:
db.system.indexes.find()
ok .. short version
YES YES YES
_id uniqid by default , mongoDB creates index on _id by default and you do not need any settings
According to MongoDB's manual the answer is yes, it's unique by default:
MongoDB creates the _id index, which is an ascending unique index on the _id field, for all collections when the collection is created. You cannot remove the index on the _id field.
Share what I learned about this topic here:
Different from other RDBMS, Mongodb document's Id is generated on the client side. This functionality is usually implemented in the drivers of various programming languages.
The Id is string with 12 bytes length, which consists of several parts as follows:
TimeStamp(4 bytes) + MachineId(3 bytes) + ProcessId(2 bytes) + Counter(3 bytes)
Based on this pattern, it's extremely unlikely to have two Ids duplicated.
Related
The chat app schema that I have is something like below.
1. conversations {participants[user_1, user_2], convsersation_id}
2. messages {sender: user_1, sonversation_id, timestamps}
I want to map this relationship using existing _id:ObjectId which is already indexed.
But if I want to get all conversation of user_1 I have to first search in which conversation that user is involed and get that conversation's _id and again search for the messages in messages using that conversation _id.
So my questions are -
Does length of indexed field (here _id) matters while searching?
Should I create another shorter indexed fields?.
Also if there is any better alternative schema please suggest.
I would suggest you to maintain the data as sub documents instead of array. The advantage you have is you can build another index (only) on conversation_id field, which you want to query to know the user's involvement
When you maintain it as array, you cannot index the converstaion_id field separately, instead you will have to build a multi key index, which indexes all the elements of the array (sender and timestamps fields) which you are never going to use for querying and it also increases the index size
Answering you questions:
Does length of indexed field (here _id) matters while searching? - Not really
Should I create another shorter indexed fields? - Create sub-document and index converstaion_id
Also if there is any better alternative schema please suggest. - Maintain the array fields as sub-documents
I have a situation in which I have a User schema that contains a unique field called "username." At the same time, mongo automatically creates its own unique key, "_id."
I've noticed that for a lot of my schemas I need both an array of "usernames" as well as "ids". This is quite redundant sometimes so my question is:
Is a lookup via "_id" faster than a lookup for a field "username" (let's say a 10 character string)? If they are the same, is it viable to use my unique identifier username for the value of _id?
If your data naturally has a required, unique field, then it's perfectly fine to use that value as your _id.
As long as the field's data is comparable in size to an ObjectId (which is 12 bytes), then performance should be the same. A 10 character string is 20 bytes, so the index for username will take a bit more memory, but probably not enough to make a difference performance-wise.
Since you're using Mongoose, you could also create a virtual field (named username) that exposes the _id field using that more descriptive name, as well.
I think this is fine, UNLESS you will be changing the structure of your usernames in the future. Thus I think it's better to just stick with ObjectId() for the ID and then stick an extra field username if you need it.
In MongoDb ObjectId is a 12-byte BSON type.
Is there any way to reduce the size of objectID?
No. It's a BSON data type. It's like asking a 32-bit integer to shrink itself.
Every object must have _id property, but you are not restricted to ObjectId.
Every document in a MongoDB collection needs to have a unique _id but the value does not have to be an ObjectId. Therefore, if you are looking to reduce the size of documents in your collection you have two choices:
Pick one of the unique properties of your documents and use it as the _id field. For example, if you have an accounts collection where the account ID--provided externally--is part of your data model, you could store the account ID in the _id field.
Manage primary keys for the collection yourself. Many drivers support custom primary key factories. As #assylias suggests, going with an int will give you good space savings but, still, you will use more space than if you can use one of the fields in your model as the _id.
BTW, the value of an _id field can be composite: you can use an Object/hash/map/dictionary. See, for example, this SO question.
If you are using some type of object/model framework on top of Mongo, I'd be careful with (1). Some frameworks have a hard time with developers overriding id generation. For example, I've had bad experience with Mongoid in Ruby. In that case, (2) may be the safer way to go as the generation happens at the driver layer.
What's the algorithm for MongoDB to calculate the "_id" field. It looks it is incremental.
I'm wondering if it is safe to sort by "_id" field as sort by time the document inserted.
The way ids are generated is described here. Turns out leading bytes are given to the timestamp, so probably the order of ids corresponds to the order of insertion (if we don't consider deviations in time between different machines).
If you need to sort by order of insertion then you need to add your own field for timestamp or incremental counter. In a sharded set-up sorting by _id might not work.
Considering I want to create mongoDB documents for a bunch of distinct URLs: what would be the pros and cons (if any) of using the actual URL as the documents _id value instead of the default BSON ObjectId. Thanks in advance!
Cheers,
Greg
An overview of the subject here: http://www.mongodb.org/display/DOCS/Object+IDs
It has to be unique, you could potentially put yourself in the position of having to resolve collisions yourself. Better to leave the default _id alone and simply query against a field you're storing in the document, just how God (10gen) intended.
From http://www.mongodb.org/display/DOCS/BSON
The element name "_id" is reserved for use as a primary key id, but
you can store anything that is unique in that field. The database
expects that drivers will prevent users from creating documents that
violate these constraints.
From #mongodb
stupid _id values will probably make querying slow, but that's about it
And another user from #mongodb
Tell him the collisions will result in garbage data