Collection ID length in MongoDB - mongodb

I am new to MongoDB and Stack Overflow.
I want to know why the ID of a document in a MongoDB collection is 24 hex characters long.
What is the importance of that?

Why is the default _id a 24 character hex string?
The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12 byte binary value which is often represented as a 24 character hex string, and one of the standard field types supported by the MongoDB BSON specification.
The 12 bytes of an ObjectId are constructed using:
a 4 byte value representing the seconds since the Unix epoch
a 3 byte machine identifier
a 2 byte process id
a 3 byte counter (starting with a random value)
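As a small illustration of why a 12 byte value is displayed as 24 characters, here is a sketch in JavaScript. The helper name toHex is made up for this example and the byte values are just a sample id; the point is only that each byte becomes exactly two hex digits.

```javascript
// Hex-encode raw bytes: every byte becomes exactly two hex digits.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Twelve sample bytes standing in for the binary form of an ObjectId.
const bytes = Uint8Array.from([
  0x5f, 0xde, 0xdb, 0x7c, 0x25, 0xab, 0x13, 0x52, 0xee, 0xf8, 0x8f, 0x60,
]);
const hex = toHex(bytes);

console.log(bytes.length, hex.length, hex);
// prints: 12 24 5fdedb7c25ab1352eef88f60
```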
What is the importance of an ObjectId?
ObjectIds (or similar identifiers generated according to a GUID formula) allow unique identifiers to be independently generated in a distributed system.
The ability to independently generate a unique ID becomes very important as you scale up to multiple application servers (or perhaps multiple database nodes in a sharded cluster). You do not want to have a central coordination bottleneck like a sequence counter (eg. as you might have for an auto-incrementing primary key), and you will want to insert new documents without risk that a new identifier will turn out to be a duplicate.
An ObjectId is typically generated by your MongoDB client driver, but can also be generated on the MongoDB server if your client driver or application code hasn't already added an _id field.
Do I have to use the default ObjectId?
No. If you have a more suitable unique identifier to use, you can always provide your own value for _id. This can either be a single value or a composite value using multiple fields.
The main constraints on _id values are that they have to be unique for a collection and you cannot update or remove the _id for an existing document.

The current MongoDB version is now 4.2. An ObjectId is still 12 bytes, but it now consists of 3 parts.
ObjectIds are small, likely unique, fast to generate, and ordered.
ObjectId values are 12 bytes in length, consisting of:
a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
Create ObjectId and get timestamp from it
> x = ObjectId()
ObjectId("5fdedb7c25ab1352eef88f60")
> x.getTimestamp()
ISODate("2020-12-20T05:05:00Z")
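The same timestamp can be recovered without a driver, since it is just the first 4 bytes (8 hex characters) read as a big-endian number of seconds. A minimal sketch in plain JavaScript, where objectIdToDate is a hypothetical helper name and not part of any MongoDB API:

```javascript
// Recover the creation time from the first 8 hex characters (4 bytes).
function objectIdToDate(hexId) {
  // Big-endian seconds since the Unix epoch.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("5fdedb7c25ab1352eef88f60");
console.log(created.toISOString());
// prints: 2020-12-20T05:05:00.000Z
```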
Reference
Read MongoDB official doc

Related

inserting into a mongodb collection with manually specified _id

Suppose I have a collection in mongodb and the objects in the collection have an _id that is an ObjectID that I selected in some random manner completely external to mongoDB, such as starting with ObjectID 0000 ... 0000 and incrementing by 10000, or maybe just used a random number generator to make the ObjectIDs.
Suppose I then go to add another item to the collection, but I don't have an ObjectID in mind for the new object, and am satisfied with letting the system pick one. Would the system ever select an ObjectID that was already a part of the collection?
If it is relevant, I am using the java API and the python API to do this.
_id is always unique within a collection. There's an implicit unique index on the _id field, so MongoDB will never store two documents with the same _id in one collection. If a generated value ever did match an existing one, the insert would fail with a duplicate key error rather than overwrite anything.
Also, an ObjectId is a 12 byte value (usually shown as a 24 character hexadecimal string) in which:
4 bytes are the timestamp.
3 bytes are a machine identifier (for example, derived from the MAC address).
2 bytes are the process id.
3 bytes are the counter.
So if an _id is generated for your document, it is extremely unlikely to match an _id you chose yourself, and the unique index guards against the rare collision.
Hope this helps.

Is MongoDB _id (ObjectId) generated in an ascending order?

I know that the _id column contains a representation of the timestamp of when the document was inserted into the collection. Here is an online utility to convert it to a timestamp: http://steveridout.github.io/mongo-object-time/
What I'm wondering is whether the object id string itself is guaranteed to maintain ascending order. I.e., does this comparison always return true?
"newest object id" > "second newest object id"
No, there is no guarantee whatsoever. From the official documentation (at the time of the original answer):
The relationship between the order of ObjectId values and generation time is not strict within a single second. If multiple systems, or multiple processes or threads on a single system generate values, within a single second; ObjectId values do not represent a strict insertion order. Clock skew between clients can also result in non-strict ordering even for values, because client drivers generate ObjectId values, not the mongod process.
And from the latest docs
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
For MongoDB versions >= 3.4, ObjectId generation changed a little.
Its structure is:
a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value, and
a 3-byte counter, starting with a random value.
So the first 4 bytes are still the seconds since the Unix epoch; the ids are still almost ascending, but not strictly.
https://docs.mongodb.com/manual/reference/bson-types/#objectid
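To make the "almost ascending" point concrete, here is a rough sketch that assembles ids by hand in the 4 + 5 + 3 byte layout and compares them as strings. makeId and the two random values are invented for the illustration; real drivers pick the 5-byte random value themselves.

```javascript
// Build a 24-character id from its three parts (4 + 5 + 3 byte layout).
function makeId(seconds, randomHex, counter) {
  return (
    seconds.toString(16).padStart(8, "0") + // 4-byte timestamp
    randomHex +                             // 5 bytes = 10 hex characters
    counter.toString(16).padStart(6, "0")   // 3-byte counter
  );
}

const t = 1608440700; // two processes generate ids within the same second
// Hypothetical per-process random values.
const firstFromB = makeId(t, "f0f0f0f0f0", 1); // generated first
const thenFromA = makeId(t, "0a0a0a0a0a", 2);  // generated slightly later
const nextSecond = makeId(t + 1, "0a0a0a0a0a", 0);

console.log(thenFromA > firstFromB);  // false: the newer id sorts first
console.log(nextSecond > firstFromB); // true: a later timestamp sorts after
```

Within one second the comparison is decided by the random bytes, not by generation order; across seconds the timestamp decides, as long as the clocks involved agree.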
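_id: ObjectId(4 bytes timestamp, 3 bytes machine id, 2 bytes process id, 3 bytes incrementer)
This is the (older) id structure. The timestamp comes first and only the last 3 bytes increment, so ids generated by a single process come out in ascending order until the 3-byte counter wraps around. Across several processes or machines, the order within the same second is not guaranteed.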

Difference between "id" and "_id" fields in MongoDB

Is there any difference between using the field ID or _ID from a MongoDB document?
I am asking this, because I usually use "_id", however I saw this sort({id:-1}) in the documentation: http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Sortbyidtosortbyinsertiontime
EDIT
Turns out the docs were wrong.
I expect it's just a typo in the documentation. The _id field is the primary key for every document. It's called _id and is also accessible via id. Attempting to use an id key may result in an illegal ObjectId format error.
That section is just indicating that the automatically generated ObjectIDs start with a timestamp so it's possible to sort your documents automatically. This is pretty cool since the _id is automatically indexed in every collection. See http://www.mongodb.org/display/DOCS/Object+IDs for more information. Specifically under "BSON ObjectID Specification".
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON.
The _id field is the default field for BSON ObjectIds, and it is indexed by default.
_id and id are not the same. You may also add a field called id if you want, but it will not be indexed unless you add an index.
It is just a typo in the docs.
id is an alias for _id in Mongoid. id would return the _id of the document.
https://github.com/mongodb/mongoid/blob/master/lib/mongoid/fields.rb#L47
If the _id field is not specified, an ObjectId is generated automatically.
My two cents:
The _id field
MongoDB assigns an _id field to each document and creates a primary index on it. There are ways to add secondary indexes as well. By default, MongoDB creates values for the _id field of type ObjectID. This value is defined in the BSON spec and it's structured this way:
ObjectID (12 bytes, shown as a 24-character hex string) = Date (4 bytes, a timestamp value representing number of seconds since the Unix epoch) + MAC address (3 bytes) + PID (2 bytes) + Counter (3 bytes)

Why mongoDB uses objectID?

{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
What exactly is the purpose of the ObjectId? It's a big number that is generated using a timestamp.
In any NoSQL store that is key-value, I query the value by its key.
Here we use both the key and the value as the data and use the find() function.
So, I am trying to understand when we really need the ObjectId.
What is the reason for letting the user see the value of the ObjectId?
After reading the docs, one basic question: is MongoDB a hash-table-type implementation?
After reading the docs, one basic question: is MongoDB a hash-table-type implementation?
MongoDB uses BSON, a binary form of JSON. A JSON object is basically just a "hashtable" or a set of key / value pairs.
What exactly is the use of the ObjectId? It is a big number that is generated with time.
In MongoDB, each document you store must have an _id. If you do not set a value for _id, then MongoDB will automatically generate one for you. If you have a unique key when you are inserting the object, you can use that instead. For details, see the ObjectId documentation.
In any NoSQL store that is key-value, I query the value by its key.
MongoDB is not just key-value. MongoDB supports multiple indexes on a single collection, you can query on many different fields, not just the "key" or "id".
An ObjectId is similar to a primary key in an RDBMS.
Whenever you insert a new document without an _id, MongoDB will generate an ObjectId.
ObjectId is a 12 byte BSON type:
the first 4 bytes represent the timestamp
the next 3 bytes are a unique machine identifier
the next 2 bytes are the process id
the next 3 bytes are an incrementing counter that starts at a random value
It is displayed as the equivalent 24 digit hex string.

How are MongoDB's ObjectIds generated?

Are they somewhat random?
I mean....would people be able to break them apart?
They are not random and can be easily predicted :
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter
http://www.mongodb.org/display/DOCS/Object+IDs
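Here's a JavaScript function that breaks a MongoDB ObjectID apart (http://jsfiddle.net/icodeforlove/rN3zb/):
function ObjectIdDetails (id) {
  // id is the 24-character hex string; each byte is two hex characters
  return {
    seconds: parseInt(id.slice(0, 8), 16),            // 4-byte timestamp
    machineIdentifier: parseInt(id.slice(8, 14), 16), // 3-byte machine id
    processId: parseInt(id.slice(14, 18), 16),        // 2-byte process id
    counter: parseInt(id.slice(18, 24), 16)           // 3-byte counter
  };
}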
So if you have enough of them, they leak quite a bit of information about your infrastructure, i.e. how many servers you have and how many processes each server is running. You also know the object creation dates for everything.
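For ids in the newer layout described elsewhere on this page (4-byte timestamp, 5-byte random value, 3-byte counter), the same idea could look roughly like the sketch below. objectIdParts is a made-up name, and the function assumes the id really was generated with that layout, which the hex string alone cannot confirm.

```javascript
// Split a 24-character hex id according to the 4 + 5 + 3 byte layout.
function objectIdParts(id) {
  return {
    seconds: parseInt(id.slice(0, 8), 16),  // 4-byte timestamp
    random: id.slice(8, 18),                // 5-byte random value, kept as hex
    counter: parseInt(id.slice(18, 24), 16) // 3-byte counter
  };
}

const parts = objectIdParts("5fdedb7c25ab1352eef88f60");
console.log(parts);
// prints: { seconds: 1608440700, random: '25ab1352ee', counter: 16289632 }
```

Here the middle bytes are random rather than a machine id and process id, so they say less about the infrastructure, but the creation time is still readable.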
Generation
They are usually generated on the client side by the driver itself. For example, in ruby, BSON::ObjectID can be used:
https://github.com/mongodb/bson-ruby/blob/master/lib/bson/object_id.rb#L369
You can also generate your own ObjectIds. This is particularly useful if you want to use business identifiers.
Breakability
With driver-generated ObjectIds, it is low.
With your own business ids, it is slightly higher, depending on their predictability (logins, consecutive identifiers...).
MongoDB database drivers by default generate an ObjectID identifier that is assigned to the _id field of each document. In many cases the ObjectID may be used as a unique identifier in an application.
ObjectID is a 96-bit number which is composed as follows:
a 4-byte value representing the seconds since the Unix epoch (which will not run out of seconds until the year 2106)
a 3-byte machine identifier (usually derived from the MAC address),
a 2-byte process id, and
a 3-byte counter, starting with a random value.
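The limits implied by these field sizes are easy to check. The sketch below assumes the 4-byte timestamp is read as an unsigned number, which is what the year 2106 figure implies; the variable names are only for illustration.

```javascript
// Largest value a 4-byte unsigned seconds field can hold.
const maxSeconds = 2 ** 32 - 1;
const lastTimestamp = new Date(maxSeconds * 1000).toISOString();
console.log(lastTimestamp);
// prints: 2106-02-07T06:28:15.000Z

// Number of distinct values of a 3-byte counter before it wraps around.
const counterValues = 2 ** 24;
console.log(counterValues);
// prints: 16777216
```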
The official MongoDB documentation says:
ObjectId
ObjectIds are small, likely unique, fast to generate, and ordered. ObjectId values consist of 12 bytes, where the first four bytes are a timestamp that reflect the ObjectId’s creation. Specifically:
a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value, and
a 3-byte counter, starting with a random value.
In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field.
Total 12 bytes:
4-byte timestamp value representing the seconds since the Unix epoch (which will not run out of seconds until the year 2106)
5-byte random value, and
3-byte incrementing counter, starting with a random value.
Example from mongo-go-driver:
var objectId [12]byte
// 4 bytes unix time-stamp second (big endian)
binary.BigEndian.PutUint32(objectId[0:4], uint32(timestamp.Unix()))
// global random number generated by driver
copy(objectId[4:9], processUnique[:])
// global counter by driver
putUint24(objectId[9:12], atomic.AddUint32(&objectIDCounter, 1))
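For comparison, roughly the same steps in Node.js might look like the sketch below. This is not taken from any official driver: newObjectIdHex, processUnique and counter are names chosen for this example, and it only mirrors the three steps of the Go excerpt above.

```javascript
const crypto = require("crypto");

// Chosen once per process, like processUnique in the Go excerpt.
const processUnique = crypto.randomBytes(5);
// The counter starts at a random 24-bit value.
let counter = crypto.randomBytes(3).readUIntBE(0, 3);

function newObjectIdHex(now = new Date()) {
  const buf = Buffer.alloc(12);
  // 4-byte big-endian seconds since the Unix epoch
  buf.writeUInt32BE(Math.floor(now.getTime() / 1000), 0);
  // 5-byte per-process random value
  processUnique.copy(buf, 4);
  // 3-byte big-endian counter, wrapping at 2^24
  counter = (counter + 1) % 0x1000000;
  buf.writeUIntBE(counter, 9, 3);
  return buf.toString("hex");
}

const a = newObjectIdHex();
const b = newObjectIdHex();
console.log(a, b);
```

Two ids generated back to back by the same process share the middle 10 hex characters and differ by one in the last 6.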