I know how the _id column contains a representation of timestamp when the document has been inserted into the collection. here is an online utility to convert it to timestamp: http://steveridout.github.io/mongo-object-time/
What I'm wondering is if the object id string itself is guaranteed maintain the ascending order or not? i.e. does this comparison always return true?
"newest object id" > "second newest object id"
No, there is no guarantee whatsoever. From the official documentation (at the time of the original answer):
The relationship between the order of ObjectId values and generation time is not strict within a single second. If multiple systems, or multiple processes or threads on a single system generate values, within a single second; ObjectId values do not represent a strict insertion order. Clock skew between clients can also result in non-strict ordering even for values, because client drivers generate ObjectId values, not the mongod process.
And from the latest docs
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
Are generated by clients, which may have differing system clocks.
For mongo version >= 3.4, the Objectid generation is changed a little.
Its structs are:
a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value, and
a 3-byte counter, starting with a random value.
So the first 4 bytes are still the seconds since the Unix epoch, it is still almost ascending but not strictly.
https://docs.mongodb.com/manual/reference/bson-types/#objectid
_id: ObjectId(4 bytes timestamp, 3 bytes machine id, 2 bytes process id, 3 bytes incrementer)
This is the id structure. So only last 3 bytes will increment uniquely. So the answer of your question is yes.
Related
I've been reading about MongoDB using timestamps of object's creation to create ids. Is it valid to simply compare these and find out which object's been created earlier?
You can compare ObjectIDs with the .equals(). See the documentation.
ObjectId is a hexadecimal string which represents a 12-byte number.
a 4-byte timestamp value, representing the ObjectId's creation,
measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
Since the time stamp is the most significant part of an ObjectId, yes you can.
Selecting the most significant four bytes of the ObjectId as the time stamp.
Also see ObjectId.getTimestamp() documentation.
generally, it is possible to compare Objects' creation by ObjectId: for more info, refer this link.
-- citing this link: https://steveridout.github.io/
Why generate an ObjectId from a timestamp?
To query documents by creation date.
e.g. to find all comments created after 2013-11-01:
db.comments.find({_id: {$gt: ObjectId("5272e0f00000000000000000")}})
-- another helpful and explanatory link:
uses for mongodb ObjectId creation time
best regards
When performing update/find only by _id should I specify $limit 1 or mongo already implicitly know that there will be only one record with specified id?
yes there will always be unique _id in every document of a collection. An _id is made from following and therefore it is always unique and you will only find 1 document corresponding to one _id
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
i am new to mongodb and stack overflow.
I want to know why on mongodb collection ID is of 24 hex characters?
what is importance of that?
Why is the default _id a 24 character hex string?
The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12 byte binary value which is often represented as a 24 character hex string, and one of the standard field types supported by the MongoDB BSON specification.
The 12 bytes of an ObjectId are constructed using:
a 4 byte value representing the seconds since the Unix epoch
a 3 byte machine identifier
a 2 byte process id
a 3 byte counter (starting with a random value)
What is the importance of an ObjectId?
ObjectIds (or similar identifiers generated according to a GUID formula) allow unique identifiers to be independently generated in a distributed system.
The ability to independently generate a unique ID becomes very important as you scale up to multiple application servers (or perhaps multiple database nodes in a sharded cluster). You do not want to have a central coordination bottleneck like a sequence counter (eg. as you might have for an auto-incrementing primary key), and you will want to insert new documents without risk that a new identifier will turn out to be a duplicate.
An ObjectId is typically generated by your MongoDB client driver, but can also be generated on the MongoDB server if your client driver or application code or haven't already added an _id field.
Do I have to use the default ObjectId?
No. If you have a more suitable unique identifier to use, you can always provide your own value for _id. This can either be a single value or a composite value using multiple fields.
The main constraints on _id values are that they have to be unique for a collection and you cannot update or remove the _id for an existing document.
Now mongoDB current version is 4.2. ObjectId size is still 12 bytes but consist of 3 parts.
ObjectIds are small, likely unique, fast to generate, and ordered.
ObjectId values are 12 bytes in length, consisting of:
a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
Create ObjectId and get timestamp from it
> x = ObjectId()
ObjectId("5fdedb7c25ab1352eef88f60")
> x.getTimestamp()
ISODate("2020-12-20T05:05:00Z")
Reference
Read MongoDB official doc
I have a mongo collection with about 600k documents. I'm enumerating the collection, sorted by _id. However, documents are not returned in that sorted order. They seem to be sorted correctly according to the timestamp portion of the ObjectId, but not according to the pid field.
This is the c# code I use to repro this:
var cursor = m_collection.FindAll().SetSortOrder(SortBy.Ascending("_id"));
ObjectId previous = ObjectId.Empty;
foreach (var document in cursor)
{
var id = document[IdField].AsObjectId;
Throw.Assert(id > previous, "Sort order is invalid!");
previous = id;
}
At some point, the assert is triggered. I can see that the new id has the same timestamp as the previous one, but a lower pid.
I would have expected that sorting using {"_id":1} would sort using ALL the components of the ObjectIds, not just timestamp.
Does the server use a different comparison algorithm for ObjectIds than the C# client's ObjectId.CompareTo?
The source for the MongoDB C# CompareTo is currently here. The code compares each element of the ObjectId all the way to 3 byte counter. Given the nature of the ObjectId containing:
a 4-byte value representing the seconds since the Unix epoch
a 3-byte machine identifier
a 2-byte process id
and a 3-byte counter, starting with a random value
it doesn't make sense to sort beyond the time stamp. The CompareTo while it is accurate and will produce consistent results, may not sort in a manner that matches with your expectations.
Given that there will be instances where two objects were created at the same timestamp (4 byte value), you'll have some variance then in the results given the way that the CompareTo in C# works. So, performing assertion you made will result in some confusing results and should not be used as a way of detecting an out of sequence result.
Most drivers create the _id/ObjectId value when it's not present (including the C# driver). There's really not much you could sort on besides the timestamp.
You could do:
Throw.Assert(id.Timestamp > previous.Timestamp, "Sort order is invalid!");
Are they somewhat random?
I mean....would people be able to break them apart?
They are not random and can be easily predicted :
A BSON ObjectID is a 12-byte value
consisting of a 4-byte timestamp
(seconds since epoch), a 3-byte
machine id, a 2-byte process id, and a
3-byte counter
http://www.mongodb.org/display/DOCS/Object+IDs
Heres a javascript implementation of the MongoDB ObjectID (http://jsfiddle.net/icodeforlove/rN3zb/)
function ObjectIdDetails (id) {
return {
seconds: parseInt(id.slice(0, 8), 16),
machineIdentifier: parseInt(id.slice(8, 14), 16),
processId: parseInt(id.slice(14, 18), 16),
counter: parseInt(id.slice(18, 24), 16)
};
}
So if you have enough of them they leak quite a bit of information about your infrastructure. And you also know the object creation dates for everything.
IE: how many servers do you have, and how many processes each server is running.
Generation
They are usually generated on the client side by the driver itself. For example, in ruby, BSON::ObjectID can be used:
https://github.com/mongodb/bson-ruby/blob/master/lib/bson/object_id.rb#L369
You can also generate your own ObjectIds. This is particularly useful if you want to use business identifiers.
Breakability
When using driver generated ObjectIds, is low
When using own business Id, is slightly higher depending on their predictability (login, consecutives identifiers...)
MongoDB database drivers by default generate an ObjectID identifier that is assigned to the _id field of each document. In many cases the ObjectID may be used as a unique identifier in an application.
ObjectID is a 96-bit number which is composed as follows:
a 4-byte value representing the seconds since the Unix epoch (which will not run out of seconds until the year 2106)
a 3-byte machine identifier (usually derived from the MAC address),
a 2-byte process id, and
a 3-byte counter, starting with a random value.
From the MongoDB Official Document links
it shows :
ObjectId
ObjectIds are small, likely unique, fast to generate, and
ordered. ObjectId values consist of 12 bytes, where the first four
bytes are a timestamp that reflect the ObjectId’s creation.
Specifically:
a 4-byte value representing the seconds since the Unix epoch,
a
5-byte random value, and
a 3-byte counter, starting with a random
value.
In MongoDB, each document stored in a collection requires a
unique _id field that acts as a primary key. If an inserted document
omits the _id field, the MongoDB driver automatically generates an
ObjectId for the _id field.
MongoDB database drivers by default generate an ObjectID identifier that is assigned to the _id field of each document. In many cases the ObjectID may be used as a unique identifier in an application.
Total 12 bytes:
4-byte timestamp value representing the seconds since the Unix epoch (which will not run out of seconds until the year 2106)
5-byte random value, and
3-byte incrementing counter, starting with a random value.
Example from mongo-go-driver:
var objectId [12]byte
// 4 bytes unix time-stamp second (big endian)
binary.BigEndian.PutUint32(objectId[0:4], uint32(timestamp.Unix()))
// global random number generated by driver
copy(objectId[4:9], processUnique[:])
// global counter by driver
putUint24(objectId[9:12], atomic.AddUint32(&objectIDCounter, 1))