I have read through the MongoDB manual but still couldn't find what I need.
Are only alphabet letters and the digits 0123456789 used in an autogenerated ObjectId or "id"? Is there a chance that it will generate something like "jkfdfak-123kjsd?", and exactly which symbols are never used?
By default, ObjectId is a 12-byte BSON type, constructed using this data:
4-byte value representing the seconds since the Unix epoch
3-byte machine identifier
2-byte process id
3-byte counter, starting with a random value.
And the string representation is in hexadecimal.
If you want to create your own ObjectId, you must provide a unique 24-character hexadecimal string ([0-9a-fA-F]{24}).
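To see why only 0-9 and a-f can ever show up, here is a minimal Node.js sketch. The helper name bytesToObjectIdHex is made up for illustration, and random bytes stand in for a real ObjectId; it only shows how 12 bytes turn into 24 hexadecimal characters.

```javascript
const crypto = require("crypto");

// Hypothetical helper: render 12 raw bytes as a hex string.
function bytesToObjectIdHex(bytes) {
  if (bytes.length !== 12) {
    throw new Error("an ObjectId is exactly 12 bytes");
  }
  // Each byte becomes two hex digits, so 12 bytes give 24 characters.
  return Buffer.from(bytes).toString("hex");
}

// Random bytes stand in for a real ObjectId here (an assumption for the demo).
const sampleHex = bytesToObjectIdHex(crypto.randomBytes(12));
console.log(sampleHex, sampleHex.length);
```

Whatever the bytes are, the output never contains anything outside 0-9 and a-f, so a value like "jkfdfak-123kjsd?" cannot come out of this conversion.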
The MongoDB _id field is defined as:
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
What would be the most efficient representation of this field in PostgreSQL?
I've used char(24) with the constraint CHECK (decode(mongo_id::text, 'hex'::text) > '\x30'::bytea). While this constraint doesn't check the sanity of the ObjectId, it allows only validly formatted (hexadecimal) values to be stored. This stores the ObjectId as plain text, which keeps the values easily readable.
Another option is to use the bytea type for the column and input the data as "\xOBJECT_ID", where the \x prefix makes PostgreSQL read the hex text of OBJECT_ID as a byte array. This consumes less space than char(24) (which might be relevant if you have millions of rows), but accessing the values in a non-binary format requires using e.g. encode(mongo_id::bytea, 'hex') (which might be burdensome).
Also, some platforms such as Redshift might have problems with the bytea data type.
If you need easy access to the metadata in the ObjectId, you could parse and store it separately (e.g. in a jsonb column or a separate column for each relevant attribute). Possibly the "created at" part of the metadata is the only interesting attribute.
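To make the size trade-off between the text and binary forms concrete, here is a small Node.js sketch of converting between them on the application side. The function names are hypothetical, and how the resulting Buffer is bound to a bytea parameter depends on your PostgreSQL client library, which is not shown here.

```javascript
// Convert the 24-character text form into the 12-byte binary form.
function objectIdHexToBytes(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  return Buffer.from(hex, "hex");
}

// Convert the 12-byte binary form back into lowercase hex text.
function objectIdBytesToHex(bytes) {
  return Buffer.from(bytes).toString("hex");
}

const raw = objectIdHexToBytes("51241740511517a25b000017");
console.log(raw.length);              // prints 12
console.log(objectIdBytesToHex(raw)); // prints "51241740511517a25b000017"
```

The binary form is half the size of the text form (12 bytes versus 24 characters), at the cost of an extra conversion whenever a human needs to read the value.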
I am new to MongoDB and Stack Overflow.
I want to know why the ID in a MongoDB collection is 24 hex characters long.
What is the importance of that?
Why is the default _id a 24 character hex string?
The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12 byte binary value which is often represented as a 24 character hex string, and one of the standard field types supported by the MongoDB BSON specification.
The 12 bytes of an ObjectId are constructed using:
a 4 byte value representing the seconds since the Unix epoch
a 3 byte machine identifier
a 2 byte process id
a 3 byte counter (starting with a random value)
What is the importance of an ObjectId?
ObjectIds (or similar identifiers generated according to a GUID formula) allow unique identifiers to be independently generated in a distributed system.
The ability to independently generate a unique ID becomes very important as you scale up to multiple application servers (or perhaps multiple database nodes in a sharded cluster). You do not want to have a central coordination bottleneck like a sequence counter (eg. as you might have for an auto-incrementing primary key), and you will want to insert new documents without risk that a new identifier will turn out to be a duplicate.
An ObjectId is typically generated by your MongoDB client driver, but can also be generated on the MongoDB server if your client driver or application code hasn't already added an _id field.
Do I have to use the default ObjectId?
No. If you have a more suitable unique identifier to use, you can always provide your own value for _id. This can either be a single value or a composite value using multiple fields.
The main constraints on _id values are that they have to be unique for a collection and you cannot update or remove the _id for an existing document.
The current MongoDB version is now 4.2. The ObjectId size is still 12 bytes, but it now consists of 3 parts.
ObjectIds are small, likely unique, fast to generate, and ordered.
ObjectId values are 12 bytes in length, consisting of:
a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
Create an ObjectId and get the timestamp from it:
> x = ObjectId()
ObjectId("5fdedb7c25ab1352eef88f60")
> x.getTimestamp()
ISODate("2020-12-20T05:05:00Z")
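Outside the shell, the same timestamp can be recovered from the hex string alone. This is a minimal sketch rather than the shell's own implementation; the function name objectIdTimestamp is hypothetical.

```javascript
// Decode the leading 4-byte timestamp of an ObjectId hex string.
function objectIdTimestamp(hex) {
  // The first 8 hex characters are the seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("5fdedb7c25ab1352eef88f60").toISOString());
// prints "2020-12-20T05:05:00.000Z"
```

The result matches the ISODate that getTimestamp() returned for the same ObjectId in the shell session above.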
Can a MongoDB objectid contain only numbers?
There is a piece of code in a library that I'm using that does the following:
if (is_int($mixed) || ctype_digit($mixed)) {
    return;
}
And as a result it is throwing away a record in my DB with an ObjectId of '512417805115179054000022' because it only contains numbers.
Every other record has an ObjectId containing at least one alpha char such as '51241740511517a25b000017'
Is this a problem with the function that it assumes an ObjectId can never contain only numbers when it is in fact possible, or is it some lower level issue with a driver or something that has created an ObjectId with only numeric chars in error?
The regular expression used inside MongoDB to validate object IDs is /^[0-9a-fA-F]{24}$/
That means an object ID can contain both digits from 0-9 and uppercase or lowercase letters from a-f, and the ID will always have 24 characters. So an object ID with only digits is just as valid as an object ID with only letters.
The ObjectId is a BSON type, and you are seeing its hexadecimal representation. So yes, it is correct to assume that some IDs will not have letters.
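As a quick check, the pattern quoted above can be run against the IDs from the question. This is a sketch in JavaScript rather than PHP; isObjectIdHex and isAllDigits are hypothetical helpers, and isAllDigits only roughly mimics what ctype_digit does for a non-empty string.

```javascript
// Pattern quoted above: exactly 24 hex digits, letters optional.
const OBJECT_ID_PATTERN = /^[0-9a-fA-F]{24}$/;

function isObjectIdHex(value) {
  return typeof value === "string" && OBJECT_ID_PATTERN.test(value);
}

// Rough stand-in for PHP's ctype_digit on a non-empty string.
function isAllDigits(value) {
  return /^[0-9]+$/.test(value);
}

const digitsOnly = "512417805115179054000022";
console.log(isObjectIdHex(digitsOnly), isAllDigits(digitsOnly)); // prints true true
console.log(isObjectIdHex("51241740511517a25b000017"));           // prints true
console.log(isObjectIdHex("jkfdfak-123kjsd?"));                   // prints false
```

The digits-only ID passes both checks, which is why a guard that rejects anything made only of digits will also reject some perfectly valid ObjectIds.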
Are they somewhat random?
I mean....would people be able to break them apart?
They are not random and can be easily predicted:
A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter
http://www.mongodb.org/display/DOCS/Object+IDs
Here's a JavaScript function that breaks a MongoDB ObjectID into those parts (http://jsfiddle.net/icodeforlove/rN3zb/)
function ObjectIdDetails (id) {
    return {
        seconds: parseInt(id.slice(0, 8), 16),
        machineIdentifier: parseInt(id.slice(8, 14), 16),
        processId: parseInt(id.slice(14, 18), 16),
        counter: parseInt(id.slice(18, 24), 16)
    };
}
So if you have enough of them, they leak quite a bit of information about your infrastructure, for example how many servers you have and how many processes each server is running. You also know the object creation dates for everything.
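To illustrate the kind of inference this allows, here is a rough sketch that counts distinct machine and process fields across a handful of IDs. It assumes the legacy layout described above (4-byte time, 3-byte machine id, 2-byte process id, 3-byte counter); summarizeIds is a hypothetical helper, and the two IDs are just sample values.

```javascript
// Assumes the legacy layout: 4-byte time, 3-byte machine, 2-byte pid, 3-byte counter.
function summarizeIds(ids) {
  const machines = new Set();
  const processes = new Set();
  for (const id of ids) {
    const machine = id.slice(8, 14);
    machines.add(machine);
    // A process id is only unique together with the machine it runs on.
    processes.add(machine + ":" + id.slice(14, 18));
  }
  return { machineCount: machines.size, processCount: processes.size };
}

const summary = summarizeIds([
  "512417805115179054000022",
  "51241740511517a25b000017",
]);
console.log(summary); // prints { machineCount: 1, processCount: 2 }
```

With IDs in the newer layout (a 5-byte random value instead of machine and process ids), the same fields would not map to hosts and processes, so this inference would not apply.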
Generation
They are usually generated on the client side by the driver itself. For example, in Ruby, BSON::ObjectID can be used:
https://github.com/mongodb/bson-ruby/blob/master/lib/bson/object_id.rb#L369
You can also generate your own ObjectIds. This is particularly useful if you want to use business identifiers.
Breakability
When using driver-generated ObjectIds, it is low.
When using your own business IDs, it is slightly higher, depending on their predictability (logins, consecutive identifiers...).
MongoDB database drivers by default generate an ObjectID identifier that is assigned to the _id field of each document. In many cases the ObjectID may be used as a unique identifier in an application.
ObjectID is a 96-bit number which is composed as follows:
a 4-byte value representing the seconds since the Unix epoch (which will not run out of seconds until the year 2106)
a 3-byte machine identifier (usually derived from the MAC address),
a 2-byte process id, and
a 3-byte counter, starting with a random value.
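The year 2106 figure follows directly from the size of the timestamp field. A short check of the arithmetic, treating the 4 bytes as an unsigned integer:

```javascript
// Largest value an unsigned 4-byte seconds field can hold.
const maxSeconds = 0xffffffff; // 2^32 - 1
const rollover = new Date(maxSeconds * 1000);
console.log(rollover.toISOString()); // prints "2106-02-07T06:28:15.000Z"
```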
The official MongoDB documentation states:
ObjectId
ObjectIds are small, likely unique, fast to generate, and ordered. ObjectId values consist of 12 bytes, where the first four bytes are a timestamp that reflect the ObjectId’s creation. Specifically:
a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value, and
a 3-byte counter, starting with a random value.
In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field.
MongoDB database drivers by default generate an ObjectID identifier that is assigned to the _id field of each document. In many cases the ObjectID may be used as a unique identifier in an application.
Total 12 bytes:
4-byte timestamp value representing the seconds since the Unix epoch (which will not run out of seconds until the year 2106)
5-byte random value, and
3-byte incrementing counter, starting with a random value.
Example from mongo-go-driver:
var objectId [12]byte
// 4 bytes unix time-stamp second (big endian)
binary.BigEndian.PutUint32(objectId[0:4], uint32(timestamp.Unix()))
// global random number generated by driver
copy(objectId[4:9], processUnique[:])
// global counter by driver
putUint24(objectId[9:12], atomic.AddUint32(&objectIDCounter, 1))
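For comparison, the same three steps can be sketched in Node.js. This is not any driver's actual implementation: newObjectIdHex is a hypothetical name, and the module-level processUnique and counter variables are assumptions that mirror the names in the Go excerpt above.

```javascript
const crypto = require("crypto");

// Assumed per-process state, mirroring processUnique and objectIDCounter above.
const processUnique = crypto.randomBytes(5);
let counter = crypto.randomBytes(3).readUIntBE(0, 3);

function newObjectIdHex(now = new Date()) {
  const id = Buffer.alloc(12);
  // Bytes 0-3: Unix time in seconds, big endian.
  id.writeUInt32BE(Math.floor(now.getTime() / 1000), 0);
  // Bytes 4-8: random value chosen once per process.
  processUnique.copy(id, 4);
  // Bytes 9-11: incrementing counter, wrapping at 2^24.
  counter = (counter + 1) % 0x1000000;
  id.writeUIntBE(counter, 9, 3);
  return id.toString("hex");
}

console.log(newObjectIdHex());
```

Two IDs produced by the same process share the middle 5 bytes and differ in the trailing counter, which is what keeps them unique even when they are created within the same second.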
I need to know what the size of the MongoDB hash is. I can't find it on Wikipedia or the official site.
MongoDB uses a 12-byte binary value (an ObjectId); it can be converted to a 24-character hex string.
An ObjectId, the default value for the _id field, is a 12-byte value; it is neither a hash nor a string, and it is stored as a binary value. Many drivers will show it as a hex string so it can be easily printed.
It is composed of a timestamp (in seconds), a host id, a process id, and a counter; this means that it increases with creation time and encodes the time of creation (insertion).
http://www.mongodb.org/display/DOCS/Object+IDs
Most drivers have helper methods to convert to and from the hex string representation, as well as for creating one based on just the parts you are interested in, e.g. a timestamp you might use for a range query. You can also easily extract the timestamp portion.
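If your driver lacks such a helper, a boundary ID for a timestamp range query can be built by hand. This is a minimal sketch with a hypothetical function name; it fills everything after the timestamp with zeros, which gives the smallest possible ObjectId for that second.

```javascript
// Build the smallest ObjectId hex string for a given creation time.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  // 8 hex digits of timestamp, then 16 zero digits for the remaining 8 bytes.
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

console.log(objectIdHexFromDate(new Date("2020-12-20T05:05:00Z")));
// prints "5fdedb7c0000000000000000"
```

In the mongo shell, a value like this could then serve as a lower bound, for example { _id: { $gte: ObjectId("5fdedb7c0000000000000000") } }, to match documents created at or after that time.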