How to reduce the length of an existing mongoID? [duplicate] - mongodb

This question already has answers here:
Shorten MongoDB ID in javascript
(2 answers)
Mongo ids leads to scary URLs
(4 answers)
Closed 4 years ago.
An ObjectId in MongoDB is a 12-byte BSON type. In the 12-byte
structure, the first 4 bytes of the ObjectId represent the time in
seconds since the UNIX epoch. The next 3 bytes of the ObjectId
represent the machine identifier. The next 2 bytes of the ObjectId
represent the process ID. And the last 3 bytes of the ObjectId
represent a random counter value.
These 12 bytes altogether uniquely identifies a document within the
MongoDB collection and serves as a primary key for that document.
ObjectId is the default value of _id field of each document and its
complexity helps to fetch a unique _id field for a particular document
in the MongoDB.
A client want to use the same identifier but for some reason they can't use the 5813eed6e6893b80c9ae5bba 24 long identifier, so in a moment of desperation the tec lead sugest cut off the 4 first digit
5813 -- eed6e6893b80c9ae5bba, but I dont know how a good idea is this and what is the probability of collision if this idea is apply, we are assuming the 4 first digit are corresponding to "the random counter value part" as expressed in the docs.
So my question is which is the probability of collision of this shorted id eed6e6893b80c9ae5bba ? if it is a bad idea, how to convert the id into a equivalent of 20 characters long ?.

Related

Why does mongodb has ObjectID as string?

As a general understanding, numeric operations are faster than string operations. It is faster to compute 2 < 5 than "hello" < "world".
Also, since the length of the _id is 12. If it is stored as a string where a character can take 2 bytes and thus the entire string would take 24 bytes. However a number of 8 bytes can very easily store this value of 12 digits. Basically, a number type would take lesser space.
So with these logics, why does mongodb store _id as string type ObjectIDs and not numeric ObjectID?
Your understanding is incorrect.
It is much better than Strings. It just takes 12 bytes in memory as mentioned in the documentation
Java implementation for the same.
Refer toHexString in the github link, that's what returned to you when you ask for String. It actually takes 24bytes whereas ObjectId takes only 12bytes.

inserting into a mongodb collection with manually specified _id

Suppose I have a collection in mongodb and the objects in the collection have an _id that is an ObjectID that I selected in some random manner completely external to mongoDB, such as starting with ObjectID 0000 ... 0000 and incrementing by 10000, or maybe just used a random number generator to make the ObjectID's.
Suppose I then go to add another item to the collection, but I don't have an ObjectID in mind for the new object, and am satisfied with letting the system pick one. Would the system ever select a ObjectID that was already a part of the collection?
If it is relevant, I am using the java API and the python API to do this.
_id is always unique. There's an implicit unique primary index on _id field.
So rest assured that MongoDB will not choose an _id field that has been taken.
Also the _id field is a 12 byte hexadecimal string in which :
4 bytes is for the date timestamp.
3 bytes is the MAC Address.
2 bytes is the process id.
3 bytes is the counter.
So if MongoDB chooses an _id for your document in a collection, it's definitely going to be unique from the other _id fields in other documents in your collection.
Hope this helps.

Is MongoDB _id (ObjectId) generated in an ascending order?

I know how the _id column contains a representation of timestamp when the document has been inserted into the collection. here is an online utility to convert it to timestamp: http://steveridout.github.io/mongo-object-time/
What I'm wondering is if the object id string itself is guaranteed maintain the ascending order or not? i.e. does this comparison always return true?
"newest object id" > "second newest object id"
No, there is no guarantee whatsoever. From the official documentation (at the time of the original answer):
The relationship between the order of ObjectId values and generation time is not strict within a single second. If multiple systems, or multiple processes or threads on a single system generate values, within a single second; ObjectId values do not represent a strict insertion order. Clock skew between clients can also result in non-strict ordering even for values, because client drivers generate ObjectId values, not the mongod process.
And from the latest docs
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
For mongo version >= 3.4, the Objectid generation is changed a little.
Its structs are:
a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value, and
a 3-byte counter, starting with a random value.
So the first 4 bytes are still the seconds since the Unix epoch, it is still almost ascending but not strictly.
https://docs.mongodb.com/manual/reference/bson-types/#objectid
_id: ObjectId(4 bytes timestamp, 3 bytes machine id, 2 bytes process id, 3 bytes incrementer)
This is the id structure. So only last 3 bytes will increment uniquely. So the answer of your question is yes.

Collection ID length in MongoDB

i am new to mongodb and stack overflow.
I want to know why on mongodb collection ID is of 24 hex characters?
what is importance of that?
Why is the default _id a 24 character hex string?
The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12 byte binary value which is often represented as a 24 character hex string, and one of the standard field types supported by the MongoDB BSON specification.
The 12 bytes of an ObjectId are constructed using:
a 4 byte value representing the seconds since the Unix epoch
a 3 byte machine identifier
a 2 byte process id
a 3 byte counter (starting with a random value)
What is the importance of an ObjectId?
ObjectIds (or similar identifiers generated according to a GUID formula) allow unique identifiers to be independently generated in a distributed system.
The ability to independently generate a unique ID becomes very important as you scale up to multiple application servers (or perhaps multiple database nodes in a sharded cluster). You do not want to have a central coordination bottleneck like a sequence counter (eg. as you might have for an auto-incrementing primary key), and you will want to insert new documents without risk that a new identifier will turn out to be a duplicate.
An ObjectId is typically generated by your MongoDB client driver, but can also be generated on the MongoDB server if your client driver or application code or haven't already added an _id field.
Do I have to use the default ObjectId?
No. If you have a more suitable unique identifier to use, you can always provide your own value for _id. This can either be a single value or a composite value using multiple fields.
The main constraints on _id values are that they have to be unique for a collection and you cannot update or remove the _id for an existing document.
Now mongoDB current version is 4.2. ObjectId size is still 12 bytes but consist of 3 parts.
ObjectIds are small, likely unique, fast to generate, and ordered.
ObjectId values are 12 bytes in length, consisting of:
a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
Create ObjectId and get timestamp from it
> x = ObjectId()
ObjectId("5fdedb7c25ab1352eef88f60")
> x.getTimestamp()
ISODate("2020-12-20T05:05:00Z")
Reference
Read MongoDB official doc

Difference between "id" and "_id" fields in MongoDB

Is there any difference between using the field ID or _ID from a MongoDB document?
I am asking this, because I usually use "_id", however I saw this sort({id:-1}) in the documentation: http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Sortbyidtosortbyinsertiontime
EDIT
Turns out the docs were wrong.
I expect it's just a typo in the documentation. The _id field is primary key for every document. It's called _id and is also accessible via id. Attempting to use an id key may result in a illegal ObjectId format error.
That section is just indicating that the automatically generated ObjectIDs start with a timestamp so it's possible to sort your documents automatically. This is pretty cool since the _id is automatically indexed in every collection. See http://www.mongodb.org/display/DOCS/Object+IDs for more information. Specifically under "BSON ObjectID Specification".
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON.
The _id field is the default field for Bson ObjectId's and it is,by default, indexed.
_id and id are not the same. You may also choose to add a field called id if you want, but it will not be index unless you add an index.
It is just a typo in the docs.
id is an alias for _id in mongoid.id would return the _id of the document.
https://github.com/mongodb/mongoid/blob/master/lib/mongoid/fields.rb#L47
if the _id field is not specified an ObjectedId is generated automatically.
My two cents:
The _id field
MongoDB assigns an _id field to each document and assigns primary index on it. There're ways by which we can apply secondary indices as well. By default, MongoDB creates values for the _id field of type ObjectID. This value is defined in BSON spec and it's structured this way:
ObjectID (12 bytes HEX string) = Date (4 bytes, a timestamp value representing number of seconds since the Unix epoch) + MAC address (3 bytes) + PID (2 bytes) + Counter (3 bytes)