Change size of Objectid - mongodb

In MongoDb ObjectId is a 12-byte BSON type.
Is there any way to reduce the size of objectID?

No. It's a BSON data type. It's like asking a 32-bit integer to shrink itself.
Every object must have _id property, but you are not restricted to ObjectId.

Every document in a MongoDB collection needs to have a unique _id but the value does not have to be an ObjectId. Therefore, if you are looking to reduce the size of documents in your collection you have two choices:
Pick one of the unique properties of your documents and use it as the _id field. For example, if you have an accounts collection where the account ID--provided externally--is part of your data model, you could store the account ID in the _id field.
Manage primary keys for the collection yourself. Many drivers support custom primary key factories. As #assylias suggests, going with an int will give you good space savings but, still, you will use more space than if you can use one of the fields in your model as the _id.
BTW, the value of an _id field can be composite: you can use an Object/hash/map/dictionary. See, for example, this SO question.
If you are using some type of object/model framework on top of Mongo, I'd be careful with (1). Some frameworks have a hard time with developers overriding id generation. For example, I've had bad experience with Mongoid in Ruby. In that case, (2) may be the safer way to go as the generation happens at the driver layer.

Related

NoSQL/MongoDB: When do I need _id in nested objects?

I don't understand the purpose of the _id field when it refers to a simple nested object, eg it is not used as a root entity in a dedicated collection.
Let's say, we have a simple value-object like Money({currency: String, amount: Number }). Obviously, you are never going to store such a value object by itself in a dedicated collection. It will rather be attached to a Transaction or a Product, which will have their own collection.
But you still might query for specific currencies or amounts that are higher/lower than a given value. And you might programmatically (not on DB level) set the value of a transaction to the same as of a purchased product.
Would you need an _id in this case? What are the cases, where a nested object needs an _id?
Assuming the _id you are asking about is an ObjectId, the only purpose of such field is to uniquely identify an object. Either a document in a collection or a subdocument in an array.
From the description of your money case it doesn't seem necessary to have such _id field for your value object. A valid use case might be an array of otherwise non-unique/non-identifiable subdocuments e.g. log entries, encrypted data, etc.
ODMs might add such field by default to have a unified way to address subdocuments regardless of application needs.

Can mongo document ID format be customised?

I would like to encode some meaning behinds first N characters of every document ID i.e. make first three characters determine a document type sensible to the system being used in.
You can have a custom _id when you insert the document. If the document to be inserted doesn't contain _id, then MongoDB will insert a ObejctId for you.
The _id can be of any type but keeping it uniform for all the documents makes sense if you are accessing from application layer.
You can refer one of the old questions at SO - How to generate unique object id in mongodb

Mongodb id on bulk insert performance

I have a class/object that have a guid and i want to use that field as the _id object when it is saved to Mongodb. Is it possible to use other value instead of the ObjectId?
Is there any performance consideration when doing bulk insert when there is an _id field? Is _id an index? If i set the _id to different field, would it slow down the bulk insert? I'm inserting about 10 million records.
1) Yes you can use that field as the id. There is no mention of what API (if any) you are using for inserting the documents. So if you would do the insertion at the command line, the command would be:
db.collection.insert({_id : <BSONString_version_of_your_guid_value>, field1 : value1, ...});
It doesn't have to be BsonString. Change it to whatever Bson value is closest matching to your guid's original type (except the array type. Arrays aren't allowed as the value of _id field).
2) As far as i know, there IS effect on performance when db.collection.insert when you provide your own ids, especially in bulk, BUT if the id's are sorted etc., there shouldn't be a performance loss. The reason, i am quoting:
The structure of index is a B-tree. ObjectIds have an excellent
insertion order as far as the index tree is concerned: they are always
increasing, meaning they are always inserted at the right edge of
B-tree. This, in turn, means that MongoDB only has to keep the right
edge of the B-Tree in memory.
Conversely, a random value in the _id field means that _ids will be
inserted all over the tree. Then the machine must move a page of the
index into memory, update a tiny piece of it, then probably ignore it
until it slides out of memory again. This is less efficient.
:from the book `50 Tips and Tricks for MongoDB Developers`
The tip's title says - "Override _id when you have your own simple, unique id." Clearly it is better to use your id if you have one and you don't need the properties of an ObjectId. And it is best if your ids are increasing for the reason stated above.
3) There is a default index on _id field by MongoDB.
So...
Yes. It is possible to use other types than ObjectId, including GUID that will be saved as BinData.
Yes, there are considerations. It's better if your _id is always increasing (like a growing number, or ObjectId) otherwise the index needs to rebuild itself more often. If you plan on using sharding, the _id should also be hashed evenly.
_id indeed has an index automatically.
It depends on the type you choose. See section 2.
Conclusion: It's better to keep using ObjectId unless you have a good reason not to.

Does replacing the "_id" field in mongodb with a custom unique key decrease performance?

I have a situation in which I have a User schema that contains a unique field called "username." At the same time, mongo automatically creates its own unique key, "_id."
I've noticed that for a lot of my schemas I need both an array of "usernames" as well as "ids". This is quite redundant sometimes so my question is:
Is a lookup via "_id" faster than a lookup for a field "username" (let's say a 10 character string)? If they are the same, is it viable to use my unique identifier username for the value of _id?
If your data naturally has a required, unique field, then it's perfectly fine to use that value as your _id.
As long as the field's data is comparable in size to an ObjectId (which is 12 bytes), then performance should be the same. A 10 character string is 20 bytes, so the index for username will take a bit more memory, but probably not enough to make a difference performance-wise.
Since you're using Mongoose, you could also create a virtual field (named username) that exposes the _id field using that more descriptive name, as well.
I think this is fine, UNLESS you will be changing the structure of your usernames in the future. Thus I think it's better to just stick with ObjectId() for the ID and then stick an extra field username if you need it.

mongoDB - URL as document ID

Considering I want to create mongoDB documents for a bunch of distinct URLs: what would be the pros and cons (if any) of using the actual URL as the documents _id value instead of the default BSON ObjectId. Thanks in advance!
Cheers,
Greg
An overview of the subject here: http://www.mongodb.org/display/DOCS/Object+IDs
It has to be unique, you could potentially put yourself in the position of having to resolve collisions yourself. Better to leave the default _id alone and simply query against a field you're storing in the document, just how God (10gen) intended.
From http://www.mongodb.org/display/DOCS/BSON
The element name "_id" is reserved for use as a primary key id, but
you can store anything that is unique in that field. The database
expects that drivers will prevent users from creating documents that
violate these constraints.
From #mongodb
stupid _id values will probably make querying slow, but that's about it
And another user from #mongodb
Tell him the collisions will result in garbage data