Can a MongoDB ObjectId contain only numbers?

There is a piece of code in a library that I'm using that does the following:
if (is_int($mixed) || ctype_digit($mixed)) {
    return;
}
As a result, it is discarding a record in my DB with the ObjectId '512417805115179054000022', because that ID contains only digits.
Every other record has an ObjectId containing at least one letter, such as '51241740511517a25b000017'.
Is this a problem with the function, which assumes an ObjectId can never contain only digits when that is in fact possible? Or is it some lower-level issue, such as a driver creating an ObjectId with only numeric characters in error?

The regular expression used to validate the hex string form of an ObjectId is /^[0-9a-fA-F]{24}$/
That means an ObjectId can contain the digits 0-9 and the uppercase or lowercase letters a-f, and it always has exactly 24 characters. So an ObjectId made up only of digits is just as possible as one made up only of letters.
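To see the mismatch concretely, here is a minimal Python sketch (an illustration, not the library's code) comparing the 24-hex-character pattern above with a digits-only test roughly equivalent to PHP's ctype_digit(). The helper names looks_like_objectid and digits_only are hypothetical.

```python
import re

# The 24-hex-character pattern quoted above.
OBJECTID_HEX = re.compile(r"^[0-9a-fA-F]{24}$")

def looks_like_objectid(s: str) -> bool:
    """Return True if s is a syntactically valid ObjectId hex string."""
    return OBJECTID_HEX.fullmatch(s) is not None

def digits_only(s: str) -> bool:
    """Rough stand-in for PHP's ctype_digit(): non-empty and ASCII digits only."""
    return s != "" and all("0" <= c <= "9" for c in s)

for oid in ("512417805115179054000022", "51241740511517a25b000017"):
    # Both are valid ObjectIds, but only the first passes the digits-only test.
    print(oid, looks_like_objectid(oid), digits_only(oid))
```

Both IDs from the question are valid ObjectIds; the digits-only test simply cannot tell a numeric string apart from a hex ID that happens to contain no letters a-f.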

An ObjectId is a 12-byte BSON value, and what you are seeing is its hexadecimal representation. So yes, it is correct to assume that some IDs will contain no letters.
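A short Python sketch of that relationship, assuming only the standard library: each of the 12 bytes becomes two hex characters, and the string contains no letters exactly when every 4-bit nibble happens to be below 10. The function name objectid_bytes is hypothetical.

```python
def objectid_bytes(oid_hex: str) -> bytes:
    """Decode a 24-character hex string into the 12 raw bytes that BSON stores."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return raw

raw = objectid_bytes("512417805115179054000022")
print(len(raw))   # 12 raw bytes
print(raw.hex())  # encodes back to the same 24 hex characters

# The hex form has no letters a-f exactly when every nibble (half-byte) is below 10.
print(all((b >> 4) < 10 and (b & 0x0F) < 10 for b in raw))
```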

Related

MongoDB: Handling integers too large for Int64

I'm having a problem handling integers in MongoDB that are too large for the standard Int64 format. To give an example of the sort of numbers we're talking about, I mean something like 10683171315666225459389678813960790592
Now, I've considered turning this into a Decimal128 and that seems to work. The problem I run into is that I think MongoDB is truncating these values when I convert them. I'm using:
db.NuAIS.find().forEach( function (x) { x.HilbertVal = parseFloat(x.HilbertVal); db.NuAIS.save(x); });
So I read the database, convert each of those string values to Decimal and get 1.0683171315666225e+37. My assumption was that MongoDB was storing the actual value despite what it showed.
Here's the thing, though: these values seem to be truncated, because when I search using the original string in MongoDB Compass, I get multiple results for a supposedly unique value. I assume MongoDB has truncated the value after X digits and is just returning the documents that match the first X digits of the search string, which means I get multiple matches instead of one.
Is my assumption here right? Is there a way to avoid this with decimals or just to skip the whole problem altogether by having ints larger than 64 bits? I'd leave it as a string but I need to work with them as digits, find things that fall within a range and such.
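A minimal Python sketch (standard library only, not MongoDB code) that checks the assumption. Note that parseFloat() returns an IEEE 754 double, not a Decimal128, so the shell code above stores doubles, which keep only about 15 to 17 significant digits. The sketch also emulates Decimal128's 34-digit precision with Python's decimal module (an emulation, not MongoDB's own implementation), which shows that even Decimal128 cannot hold this 38-digit value exactly.

```python
from decimal import Context

value = "10683171315666225459389678813960790592"  # the 38-digit example from the question
exact = int(value)

# What parseFloat() produces: an IEEE 754 double with a 53-bit significand.
as_double = float(value)
print(int(as_double) == exact)  # False: digits beyond double precision are lost

# Emulate IEEE 754 decimal128 (34 significant digits) with Python's decimal module.
d128 = Context(prec=34, Emax=6144, Emin=-6143)
as_d128 = d128.create_decimal(value)  # rounded to 34 significant digits
print(as_d128)
print(int(as_d128) == exact)  # False: 38 digits do not fit in 34
```

Any two values that agree in their first 34 digits would round to the same Decimal128, which is consistent with seeing several matches for a supposedly unique value.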
Thank you

What is best representation for mongo _id field in postgresql?

The MongoDB _id field is defined as:
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
What would be the most efficient representation of this field in PostgreSQL?
I've used char(24) with a constraint CHECK decode(mongo_id::text, 'hex'::text) > '\x30'::bytea. While this constraint doesn't check the sanity of the ObjectId, it only allows values in a valid format to be stored. This stores the ObjectId as plain text, which keeps the values easily readable.
Another option would be to use the bytea type for the column and input the data as "\xOBJECT_ID", where the \x prefix makes PostgreSQL read the hex text of OBJECT_ID as a byte array. This consumes less space than char(24) (which might be relevant if you have millions of rows), but reading the values in a non-binary format requires e.g. encode(mongo_id::bytea, 'hex'), which might be burdensome.
Also, some platforms such as Redshift might have problems with the bytea data type.
If you need easy access to the metadata in the ObjectId, you could parse it and store it separately (e.g. in a jsonb column or in a separate column for each relevant attribute). Possibly the "created at" part of the metadata is the only interesting attribute.
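A small Python sketch of the two storage forms, assuming an application that builds the SQL literal itself (in practice a parameterized query is preferable). It compares the raw sizes (24 characters vs 12 bytes, ignoring PostgreSQL's per-value header overhead) and produces a literal in PostgreSQL's bytea hex input format. The example ObjectId is one that appears elsewhere on this page.

```python
oid_hex = "523c7df5c30cc960b235ddee"

as_char24 = oid_hex.encode("ascii")  # char(24): 24 ASCII characters
as_bytea = bytes.fromhex(oid_hex)    # bytea: the 12 raw bytes of the ObjectId

# bytea hex input format: a backslash-x prefix followed by the hex digits.
bytea_literal = "'\\x" + oid_hex + "'::bytea"

print(len(as_char24), len(as_bytea))
print(bytea_literal)
```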
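As a sketch of the "parse it and store it separately" idea: the first 4 bytes of an ObjectId are a big-endian count of seconds since the Unix epoch, so the "created at" value can be computed before insertion and written to a hypothetical jsonb metadata column. The function names and the created_at key are assumptions for illustration.

```python
import json
from datetime import datetime, timezone

def objectid_created_at(oid_hex: str) -> datetime:
    """Read the leading 4-byte big-endian timestamp (seconds since the Unix epoch)."""
    seconds = int.from_bytes(bytes.fromhex(oid_hex)[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def objectid_metadata_json(oid_hex: str) -> str:
    """Build a JSON document for a hypothetical jsonb metadata column."""
    return json.dumps({"created_at": objectid_created_at(oid_hex).isoformat()})

print(objectid_metadata_json("5fdedb7c25ab1352eef88f60"))
```

For this ObjectId the result is {"created_at": "2020-12-20T05:05:00+00:00"}, the same instant the mongo shell's getTimestamp() reports for it. The timestamp sits in the first 4 bytes in both the older and the newer ObjectId layouts.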

Collection ID length in MongoDB

I am new to MongoDB and Stack Overflow.
I want to know why a MongoDB collection ID is 24 hex characters long.
What is the importance of that?
Why is the default _id a 24 character hex string?
The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12 byte binary value which is often represented as a 24 character hex string, and one of the standard field types supported by the MongoDB BSON specification.
The 12 bytes of an ObjectId are constructed using:
a 4 byte value representing the seconds since the Unix epoch
a 3 byte machine identifier
a 2 byte process id
a 3 byte counter (starting with a random value)
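The 4/3/2/3-byte layout above can be split apart with plain byte slicing. This is a minimal Python sketch for IDs generated under this older scheme; LegacyObjectIdParts and split_legacy_objectid are hypothetical names, not a driver API.

```python
from typing import NamedTuple

class LegacyObjectIdParts(NamedTuple):
    timestamp: int  # 4 bytes: seconds since the Unix epoch
    machine: int    # 3 bytes: machine identifier
    pid: int        # 2 bytes: process id
    counter: int    # 3 bytes: counter

def split_legacy_objectid(oid_hex: str) -> LegacyObjectIdParts:
    """Split a 12-byte ObjectId into the fields of the older layout (all big-endian)."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return LegacyObjectIdParts(
        timestamp=int.from_bytes(raw[0:4], "big"),
        machine=int.from_bytes(raw[4:7], "big"),
        pid=int.from_bytes(raw[7:9], "big"),
        counter=int.from_bytes(raw[9:12], "big"),
    )

print(split_legacy_objectid("51241740511517a25b000017"))
```

Applied to the two IDs from the first question, both share the machine bytes 511517 but have different process ids (9054 and a25b in hex) and timestamps 64 seconds apart. The all-digit ID is therefore an ordinary ObjectId whose bytes happen to contain no nibbles above 9.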
What is the importance of an ObjectId?
ObjectIds (or similar identifiers generated according to a GUID formula) allow unique identifiers to be independently generated in a distributed system.
The ability to independently generate a unique ID becomes very important as you scale up to multiple application servers (or perhaps multiple database nodes in a sharded cluster). You do not want to have a central coordination bottleneck like a sequence counter (eg. as you might have for an auto-incrementing primary key), and you will want to insert new documents without risk that a new identifier will turn out to be a duplicate.
An ObjectId is typically generated by your MongoDB client driver, but it can also be generated on the MongoDB server if your client driver or application code hasn't already added an _id field.
Do I have to use the default ObjectId?
No. If you have a more suitable unique identifier to use, you can always provide your own value for _id. This can either be a single value or a composite value using multiple fields.
The main constraints on _id values are that they have to be unique for a collection and you cannot update or remove the _id for an existing document.
As of MongoDB 4.2 (the current version at the time of writing), an ObjectId is still 12 bytes but consists of 3 parts.
ObjectIds are small, likely unique, fast to generate, and ordered.
ObjectId values are 12 bytes in length, consisting of:
a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
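A minimal Python sketch of a generator following this 3-part layout. It is not a driver implementation: the class name is hypothetical, it is not thread-safe, and it assumes (as the MongoDB documentation describes) that the 5-byte random value is chosen once per process, modeled here as once per generator instance.

```python
import os
import time
from typing import Optional

class ObjectIdGenerator:
    """Hypothetical generator: 4-byte timestamp + 5-byte random value + 3-byte counter."""

    def __init__(self, counter_start: Optional[int] = None):
        self._random = os.urandom(5)  # assumption: fixed for the life of the generator
        if counter_start is None:
            counter_start = int.from_bytes(os.urandom(3), "big")  # random initial counter
        self._counter = counter_start % 0x1000000

    def next_hex(self, now: Optional[float] = None) -> str:
        seconds = int(time.time() if now is None else now)
        value = (
            seconds.to_bytes(4, "big")           # big-endian, so IDs sort by creation second
            + self._random
            + self._counter.to_bytes(3, "big")
        )
        self._counter = (self._counter + 1) % 0x1000000  # wrap within 3 bytes
        return value.hex()

gen = ObjectIdGenerator()
print(gen.next_hex(), gen.next_hex())
```

Because the timestamp comes first and is big-endian, IDs from one generator sort roughly by creation time, and within one second by the counter (unless it wraps). No central coordination is needed, which is the property the previous answer describes.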
Create an ObjectId and get its timestamp:
> x = ObjectId()
ObjectId("5fdedb7c25ab1352eef88f60")
> x.getTimestamp()
ISODate("2020-12-20T05:05:00Z")
Reference
Read the official MongoDB documentation

Referencing Other Documents by String rather than ObjectId

Let's say I have two collections:
Products and Categories.
The former collection's documents have 2 fields:
_id (BSON ObjectId)
Name (String)
The latter collection's documents have 3 fields:
_id (BSON ObjectId)
Name (String)
Products (Array of Strings)
Assume I have the following Product document:
{ "_id" : ObjectId("AAA"), "name" : "Shovel" }
Let's say I have the following Category document:
{ "_id" : ObjectId("BBB"), "Name" : "Gardening", "Products" : ["AAA"] }
For the purposes of this example, assume that AAA and BBB are legitimate ObjectIds, e.g. ObjectId("523c7df5c30cc960b235ddee"), where the string stored in Products would equal the ObjectId's inner hex string.
Should the Products field be stored as ObjectId(...)'s rather than as Strings?
I don't think it really matters that much.
I'm pretty sure that the ObjectId format encodes a hex number, so it is probably slightly more efficient with memory and bandwidth. I have done it both ways. As long as you decide, for each field, how you are going to encode it, either will work just fine.
As long as you consistently use the same type (so that comparisons happen correctly), the difference is:
An ObjectId cannot be compared to a String representation of the same ObjectId value. Thus, ObjectId("523c7df5c30cc960b235ddee") is not equal to "523c7df5c30cc960b235ddee".
ObjectIds, when stored natively, will be stored as 12 bytes, plus field name
An ObjectId, when stored as a string, will commonly take 24 bytes for its hexadecimal characters (plus a few bytes of BSON string overhead), plus the field name
Comparisons can be made slightly more efficiently with the 12-byte value, as fewer bytes are compared. It won't matter in most types of usage, so it's a micro-optimization (but something you should know)
Bonus -- if you don't use short abbreviated field names, the size benefit of using an ObjectId natively as 12 bytes really won't matter, as the field names will far outweigh the size of bytes when stored as a string.
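The size difference can be checked by hand-encoding just these two element types following the BSON specification's layout (type byte, NUL-terminated field name, value). This is a minimal Python sketch, not a real BSON library; objectid_element and string_element are hypothetical helpers.

```python
import struct

def objectid_element(name: str, oid_hex: str) -> bytes:
    """BSON element of type 0x07 (ObjectId): type byte, cstring name, 12 raw bytes."""
    return b"\x07" + name.encode("utf-8") + b"\x00" + bytes.fromhex(oid_hex)

def string_element(name: str, value: str) -> bytes:
    """BSON element of type 0x02 (string): type byte, cstring name, int32 length, bytes, NUL."""
    data = value.encode("utf-8") + b"\x00"  # the int32 length counts the trailing NUL
    return b"\x02" + name.encode("utf-8") + b"\x00" + struct.pack("<i", len(data)) + data

oid = "523c7df5c30cc960b235ddee"
for name in ("p", "productId"):
    print(name, len(objectid_element(name, oid)), len(string_element(name, oid)))
```

The native form is a constant 17 bytes smaller per value (15 vs 32 bytes with the one-letter name), so the relative saving shrinks as field names get longer, which is the point of the bonus note.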
I'd recommend storing them as native ObjectIds. Some drivers can optionally and transparently translate an ObjectId to a String and back so that the client code can more easily manipulate it. The C# driver, for example, can do this, and I've used it so that when serializing to JSON, the ObjectId is in a simple format that is easily consumed in JavaScript.
This will matter most when you try to find the details of a product starting from the Categories collection.
Since there are no server-side JOINs in Mongo (at least before the $lookup aggregation stage was added in MongoDB 3.2), your code will have to match documents together. ObjectIds are encoded as 12 bytes, which you can easily compare in any language. Using either strings or ObjectIds does not really matter.
The real issue you are facing is one of data normalization (or lack thereof). If you store the product Name in your Categories documents instead of the ObjectId, you will be able to return the product names in a single call (instead of multiple calls, one for each product in the category).
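Matching documents together in application code can look like the following minimal Python sketch. Plain dictionaries stand in for documents a driver would return, and the product documents and resolve_products helper are hypothetical. The second call shows why the reference type has to match the _id type: raw bytes never equal the hex string of the same ObjectId.

```python
def resolve_products(category: dict, products: list) -> list:
    """Application-side "join": resolve a category's product references in memory."""
    by_id = {p["_id"]: p for p in products}  # build the lookup table once
    return [by_id[ref] for ref in category["Products"] if ref in by_id]

# Hypothetical documents; the _id values are stored as hex strings here.
products = [
    {"_id": "523c7df5c30cc960b235ddee", "name": "Shovel"},
    {"_id": "523c7df5c30cc960b235ddef", "name": "Rake"},
]
gardening = {"Name": "Gardening", "Products": ["523c7df5c30cc960b235ddee"]}
print([p["name"] for p in resolve_products(gardening, products)])

# References stored as raw bytes while _id is a string: nothing matches.
mixed = {"Name": "Gardening", "Products": [bytes.fromhex("523c7df5c30cc960b235ddee")]}
print(resolve_products(mixed, products))
```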
It feels wrong the first time you do it. After all, you will have to update many documents if you ever change the name of a product, which might or might not be frequent. You have to model your data by thinking of the way your application will use it.
Finally, index the Name attribute in the Products collection. Getting the details of a product, starting from the string you found in a Categories document, will then be fast.
Another way to do it is not to have a Categories collection at all, but to add a Category attribute to your Products documents. You can then find documents matching {'Category':'Gardening'}. Indexing the Category field will probably be a good idea.
Again, ObjectID or String does not matter much. It is about modeling your data thinking of how your application will use it.

What's MongoDB hash's size?

I need to know the size of MongoDB's hash. I can't find it on Wikipedia or the official site.
MongoDB uses a 12-byte binary value (an ObjectId), which can be converted to a 24-character hex string.
An ObjectId, the default value for the _id field, is a 12-byte value; it is not a hash nor a string -- it is stored as binary value. Many drivers will show it as a hex string, so it can be easily printed.
It is composed of a timestamp (in seconds), a host id, a process id and a counter; this means that values increase over time in creation order and encode the time of creation (insertion).
http://www.mongodb.org/display/DOCS/Object+IDs
Most drivers have helper methods to convert to and from the hex string representation, as well as to create an ObjectId from just the parts you are interested in, e.g. a timestamp you might use for a range query. You can also easily extract the timestamp portion.
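Driver helpers do this for you (PyMongo's bson package, for instance, offers ObjectId.from_datetime). The underlying idea fits in a few lines: put the timestamp in the leading 4 bytes and zero the remaining 8, giving the smallest ObjectId for that second. This is a hand-written Python sketch with a hypothetical function name, not a driver API.

```python
from datetime import datetime, timezone

def objectid_hex_from_datetime(dt: datetime) -> str:
    """Smallest ObjectId hex for a given second: 4-byte big-endian timestamp, then 8 zero bytes."""
    seconds = int(dt.timestamp())  # dt should be timezone-aware
    return seconds.to_bytes(4, "big").hex() + "00" * 8

start = objectid_hex_from_datetime(datetime(2020, 12, 20, 5, 5, tzinfo=timezone.utc))
print(start)
```

Wrapped in the driver's ObjectId type, such a value can serve as a lower bound in a query like {"_id": {"$gte": ...}} to select documents created at or after that moment. Reading the first 8 hex characters back as an integer recovers the same timestamp that getTimestamp() reports.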