I want to shard a collection by its "foreign key" (userID) and not by its _id field. I only need the combination of userID and _id to be unique, but I am not sure whether that is OK with MongoDB.
Warning: In any sharded collection where you are not sharding by the _id field, you must ensure uniqueness of the _id field. The best way to ensure _id is always unique is to use ObjectId, or another universally unique identifier (UUID.)
This is taken from: http://docs.mongodb.org/manual/tutorial/enforce-unique-keys-for-sharded-collections/#enforce-unique-keys-for-sharded-collections
Do I have to ensure that _id is unique? Or is it good enough if I always query by both userID and _id?
Unless you manually replace them, the auto-generated _ids are ObjectIds which, according to the documentation, consist of "a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter".
As you can see, a unique machine ID is part of the ObjectId. That ensures that no two machines in the cluster ever create the same ObjectId independently (unless they have the same machine ID; the likelihood of that is 1 in 16,777,216 and, when it happens, it can be easily verified). The only situation where you could theoretically get a duplicate ObjectId is when a single process creates more than 2^24 (over 16 million) ObjectIds in a single second.
tl;dr: You don't have to worry about duplicate ObjectIds. They are, as the documentation puts it, "designed to have a reasonably high probability of being unique when allocated".
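To make the arithmetic above concrete, here is a minimal sketch that packs the four parts quoted from the documentation into a 24-character hex string. The helper name legacyObjectIdHex and the sample machine and process values are made up for illustration; this is not how any driver actually generates ids, it only shows when two ids can or cannot collide.

```javascript
// Hypothetical helper: packs the four legacy ObjectId parts into 24 hex characters.
function legacyObjectIdHex(seconds, machineId, processId, counter) {
  const hex = (value, bytes) =>
    value.toString(16).padStart(bytes * 2, "0").slice(-bytes * 2);
  return (
    hex(seconds, 4) +          // 4-byte timestamp
    hex(machineId, 3) +        // 3-byte machine id
    hex(processId, 2) +        // 2-byte process id
    hex(counter % 2 ** 24, 3)  // 3-byte counter wraps after 2^24 values
  );
}

const COUNTER_VALUES = 2 ** 24; // 16777216 distinct counter values

// Sample values below are arbitrary and only serve the demonstration.
const idA = legacyObjectIdHex(1608440700, 0x25ab13, 0x52ee, 0);
const idOtherMachine = legacyObjectIdHex(1608440700, 0x25ab14, 0x52ee, 0);
const idWrapped = legacyObjectIdHex(1608440700, 0x25ab13, 0x52ee, COUNTER_VALUES);

console.log(idA !== idOtherMachine); // true: a different machine id gives a different id
console.log(idA === idWrapped);      // true: same second, machine, process, wrapped counter
```

So a collision needs the same second, the same machine id, the same process id, and a counter that has gone all the way around.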
Is it somehow possible to define one compound key consisting of two MongoDB ObjectIds or numeric types, so that they form a single key?
This is necessary because I have lots of participants creating documents which they save into one big collection together, so I cannot be sure that the MongoDB ObjectId for each document is distinct. So I wanted to add some additional key, maybe a user ID number or email or something similar...
Maybe two ObjectIds.
An ObjectId in MongoDB is a 12-byte value that is usually displayed as a 24-character hexadecimal string.
ObjectId() Returns a new ObjectId value. The 12-byte
ObjectId value consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
https://docs.mongodb.com/manual/reference/method/ObjectId/
Hence, the ObjectId will be uniquely auto-generated when you insert a document.
However, you can also supply your own 24-character hexadecimal value when you insert a document.
For example,
1DCD6500 -- this can be custom hex identifier
A98AC7 -- another custom hex identifier
2B67 -- another custom hex identifier
A981CE -- Incremental custom hex identifier
Now if you try to insert a document with 1DCD6500A98AC72B67A981CE as its _id, the document will be saved.
e.g. { "_id" : ObjectId("1DCD6500A98AC72B67A981CE"), "name" : "sample", "personid" : 39 }
So, based on the definition of ObjectId, you can build a custom ObjectId.
But in that case you are responsible for making sure the ObjectId is unique; otherwise MongoDB will throw the error
"E11000 duplicate key error collection:
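As a quick sanity check before inserting, the custom id can be assembled and validated in plain JavaScript. This is only a sketch: the function name isObjectIdHex is made up here, and it checks the format only (24 hex characters), not uniqueness.

```javascript
// The four example parts from above, concatenated into one candidate _id.
const parts = ["1DCD6500", "A98AC7", "2B67", "A981CE"];
const customId = parts.join("");

// An ObjectId string must be exactly 24 hexadecimal characters (12 bytes).
function isObjectIdHex(value) {
  return /^[0-9a-fA-F]{24}$/.test(value);
}

console.log(customId);                  // 1DCD6500A98AC72B67A981CE
console.log(isObjectIdHex(customId));   // true
console.log(isObjectIdHex("1DCD6500")); // false, too short
```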
You can use anything for your _id field. So this is possible:
db.collection.insertOne({
    _id: {
        "first": new ObjectId(),
        "second": new ObjectId()
    }
})
The default unique index on the _id field also guarantees uniqueness on this kind of field.
However, I would doubt that this is a good solution to your problem as it would probably just defer the underlying problem (which really doesn't exist - kindly see this answer, too: How to generate unique object id in mongodb). Instead, I would suggest you have your clients create documents without specifying an _id explicitly and let MongoDB create the _id (on the server side or on the client side depending on your driver and your settings where client-side generation should be preferred). This will guarantee uniqueness (even when you do sharding).
There always is a unique index on your _id field anyway so to be on the super safe side with respect to run-time behaviour you could put a retrying exception handler in place on the client side for the (pretty much impossible) case that you end up with two identical _ids and hence an exception.
Also see this answer: Mongodb - must _id be globally unique when sharding
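The retrying exception handler mentioned above could look roughly like this. It is a sketch under assumptions: insertFn and buildDoc are hypothetical callbacks (insertFn stands in for something like a shell-style insertOne call), and the code is synchronous for brevity; with the Node.js driver the insert returns a promise, so you would await it instead. The only MongoDB-specific fact used is that duplicate key errors carry error code 11000.

```javascript
const DUPLICATE_KEY = 11000; // error code behind "E11000 duplicate key error"

// insertFn and buildDoc are hypothetical callbacks supplied by the caller.
function insertWithRetry(insertFn, buildDoc, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = buildDoc(); // build a fresh document (and fresh _id) per attempt
    try {
      insertFn(doc);
      return doc;
    } catch (err) {
      // Rethrow anything that is not a duplicate key, and the final failure.
      if (err.code !== DUPLICATE_KEY || attempt === maxAttempts) throw err;
    }
  }
}

// Stand-in for a collection with a unique _id index, for demonstration only.
const seenIds = new Set();
function fakeInsert(doc) {
  if (seenIds.has(doc._id)) {
    const err = new Error("E11000 duplicate key error");
    err.code = DUPLICATE_KEY;
    throw err;
  }
  seenIds.add(doc._id);
}

const candidateIds = ["a", "a", "b"]; // the second "a" collides once
let nextCandidate = 0;
const buildDoc = () => ({ _id: candidateIds[nextCandidate++] });

insertWithRetry(fakeInsert, buildDoc);                // stores "a"
const stored = insertWithRetry(fakeInsert, buildDoc); // "a" collides, retry stores "b"
console.log(stored._id); // b
```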
Say I have a collection that records correlation values between brands (never mind how such a correlation would be generated or interpreted). Then the fields in this collection would include 'brand1', 'brand2', and 'correlation'.
For the sake of an example, let's say that brands can take on string values such as "google", "microsoft", etc., so that each document records the correlation between various brand names.
I would want to create a unique index on the 'brand1' and 'brand2' fields so that each document records the correlation between a pair of brands only once in the collection. In order to do this, the order of the two values in the key must be ignored when determining uniqueness in the collection. A key of ['google', 'microsoft'] should be considered the same as a key of ['microsoft', 'google'], so that if a document already exists with the former key, an insertion of a document with the latter key would be prohibited.
Is this kind of index possible?
There is no way to enforce that kind of constraint on a MongoDB collection.
What you can do, however, is enforce a constraint in your software that the two components of this index are always stored in sorted order. (For instance, always store ["a","b"], not ["b","a"].) This makes it so that there's only one "canonical" version of any pair in the collection.
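A minimal sketch of that application-side rule: sort the pair before storing it, so an ordinary unique compound index on (brand1, brand2) can do the rest. The names canonicalPair and insertCorrelation are made up for this example, the Set only stands in for the unique index, and the correlation numbers are arbitrary.

```javascript
// Returns the two brands in sorted order so each pair has one stored form.
function canonicalPair(brandA, brandB) {
  return brandA <= brandB ? [brandA, brandB] : [brandB, brandA];
}

// Stand-in for a unique index on (brand1, brand2), for demonstration only.
const pairIndex = new Set();
function insertCorrelation(brandA, brandB, correlation) {
  const [brand1, brand2] = canonicalPair(brandA, brandB);
  const key = JSON.stringify([brand1, brand2]);
  if (pairIndex.has(key)) return null; // a real unique index would reject this
  pairIndex.add(key);
  return { brand1, brand2, correlation };
}

const firstInsert = insertCorrelation("microsoft", "google", 0.4);
const secondInsert = insertCorrelation("google", "microsoft", 0.9);
console.log(firstInsert);  // { brand1: 'google', brand2: 'microsoft', correlation: 0.4 }
console.log(secondInsert); // null
```

Queries have to apply the same sorting to their arguments, otherwise a lookup for ("microsoft", "google") would miss the stored document.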
I am new to MongoDB and Stack Overflow.
I want to know why the _id in a MongoDB collection is 24 hex characters long.
What is the importance of that?
Why is the default _id a 24 character hex string?
The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12 byte binary value which is often represented as a 24 character hex string, and one of the standard field types supported by the MongoDB BSON specification.
The 12 bytes of an ObjectId are constructed using:
a 4 byte value representing the seconds since the Unix epoch
a 3 byte machine identifier
a 2 byte process id
a 3 byte counter (starting with a random value)
What is the importance of an ObjectId?
ObjectIds (or similar identifiers generated according to a GUID formula) allow unique identifiers to be independently generated in a distributed system.
The ability to independently generate a unique ID becomes very important as you scale up to multiple application servers (or perhaps multiple database nodes in a sharded cluster). You do not want to have a central coordination bottleneck like a sequence counter (eg. as you might have for an auto-incrementing primary key), and you will want to insert new documents without risk that a new identifier will turn out to be a duplicate.
An ObjectId is typically generated by your MongoDB client driver, but can also be generated on the MongoDB server if your client driver or application code hasn't already added an _id field.
Do I have to use the default ObjectId?
No. If you have a more suitable unique identifier to use, you can always provide your own value for _id. This can either be a single value or a composite value using multiple fields.
The main constraints on _id values are that they have to be unique for a collection and you cannot update or remove the _id for an existing document.
The current MongoDB version is now 4.2. An ObjectId is still 12 bytes, but it now consists of 3 parts.
ObjectIds are small, likely unique, fast to generate, and ordered.
ObjectId values are 12 bytes in length, consisting of:
a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
Create ObjectId and get timestamp from it
> x = ObjectId()
ObjectId("5fdedb7c25ab1352eef88f60")
> x.getTimestamp()
ISODate("2020-12-20T05:05:00Z")
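Outside the shell, the same timestamp can be recovered by hand, since it is just the first 4 bytes of the id. A small sketch in plain JavaScript (the function name objectIdTimestamp is made up; official drivers ship their own getTimestamp-style helpers):

```javascript
// Reads the first 4 bytes (8 hex characters) of an ObjectId as Unix seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("5fdedb7c25ab1352eef88f60");
console.log(created.toISOString()); // 2020-12-20T05:05:00.000Z
```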
Reference
Read MongoDB official doc
I have a situation in which I have a User schema that contains a unique field called "username." At the same time, mongo automatically creates its own unique key, "_id."
I've noticed that for a lot of my schemas I need both an array of "usernames" as well as "ids". This is quite redundant sometimes so my question is:
Is a lookup via "_id" faster than a lookup for a field "username" (let's say a 10 character string)? If they are the same, is it viable to use my unique identifier username for the value of _id?
If your data naturally has a required, unique field, then it's perfectly fine to use that value as your _id.
As long as the field's data is comparable in size to an ObjectId (which is 12 bytes), then performance should be the same. A 10-character ASCII string is 10 bytes of UTF-8 data plus a few bytes of BSON overhead, so the index for username will be about the same size or slightly larger, but probably not enough to make a difference performance-wise.
Since you're using Mongoose, you could also create a virtual field (named username) that exposes the _id field using that more descriptive name, as well.
I think this is fine, UNLESS you will be changing the structure of your usernames in the future. Thus I think it's better to just stick with ObjectId() for the ID and then stick an extra field username if you need it.
Is MongoDB _id unique by default, or do I have to set it to unique?
For the most part, _id in MongoDB is unique enough. There is one edge case where it's possible to generate a duplicate id, though:
If you generate more than 16,777,216 ObjectIds within the same second, in one process on the same machine, then you will get a duplicate id because of how ObjectIds are generated. If you change any one of those things, such as generating the id in a different process, on a different machine, or at a different Unix time, then you won't get a duplicate.
Let me know if you ever manage to pull this off with a realistic use case. Apparently Google only gets around 70,000 searches a second and those are all processed on different machines.
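All documents contain an _id field. All collections (except for capped ones) automatically create a unique index on _id.
Try this (on newer MongoDB versions, where system.indexes is no longer available, db.collection.getIndexes() shows the same information):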
db.system.indexes.find()
OK, short version:
YES YES YES
_id is unique by default. MongoDB creates an index on _id by default and you do not need any settings.
According to MongoDB's manual the answer is yes, it's unique by default:
MongoDB creates the _id index, which is an ascending unique index on the _id field, for all collections when the collection is created. You cannot remove the index on the _id field.
Sharing what I learned about this topic:
Unlike in relational databases, a MongoDB document's id is usually generated on the client side. This functionality is implemented in the drivers for the various programming languages.
The id is a 12-byte value which consists of the following parts:
TimeStamp(4 bytes) + MachineId(3 bytes) + ProcessId(2 bytes) + Counter(3 bytes)
Based on this pattern, it's extremely unlikely that two ids will be duplicates.