Define MongoDB compound keys as one key - mongodb

Is it possible to define a single compound key consisting of two MongoDB ObjectIds or numeric values, so that together they act as one key?
This is necessary because I have lots of participants creating documents which they save together into one big collection, so I cannot be sure that the MongoDB ObjectId of each document is distinct. So I wanted to add an additional key, maybe the user's ID number, email, or something similar...
Maybe two ObjectIds?

An ObjectId in MongoDB is a 12-byte value, usually shown as a 24-digit hexadecimal string.
ObjectId() returns a new ObjectId value. The 12-byte ObjectId value consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
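To make the byte layout above concrete, here is a minimal JavaScript sketch (plain Node.js, no driver) that splits a 24-digit hex ObjectId into the four fields listed. It assumes the legacy layout quoted above; recent versions of the MongoDB documentation describe bytes 4-8 as a single 5-byte per-process random value instead, so the machine/pid split is only meaningful for ids generated under the legacy scheme. `decodeObjectId` is a hypothetical helper name, not part of any MongoDB API.

```javascript
// Split a 24-digit hex ObjectId string into the fields of the legacy layout:
// 4-byte timestamp | 3-byte machine id | 2-byte process id | 3-byte counter.
// decodeObjectId is a hypothetical helper, not part of any MongoDB API.
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("An ObjectId must be exactly 24 hexadecimal digits");
  }
  // Byte n occupies hex digits 2n and 2n+1.
  const field = (start, end) => parseInt(hex.slice(start, end), 16);
  const seconds = field(0, 8); // bytes 0-3: seconds since the Unix epoch
  return {
    seconds,
    date: new Date(seconds * 1000),
    machine: field(8, 14), // bytes 4-6
    pid: field(14, 18), // bytes 7-8
    counter: field(18, 24), // bytes 9-11
  };
}

// The hand-made id used in this answer decodes like any other ObjectId.
const parts = decodeObjectId("1DCD6500A98AC72B67A981CE");
console.log(parts);
```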
https://docs.mongodb.com/manual/reference/method/ObjectId/
Hence, when you insert a document without an _id, a unique ObjectId is generated for it automatically.
However, you can also supply your own 24-digit hexadecimal value, built from custom parts, when you insert a document.
For example,
1DCD6500 -- this can be custom hex identifier
A98AC7 -- another custom hex identifier
2B67 -- another custom hex identifier
A981CE -- Incremental custom hex identifier
If you now insert a document whose _id is 1DCD6500A98AC72B67A981CE, the document will be saved, e.g.:
{ "_id" : ObjectId("1DCD6500A98AC72B67A981CE"), "name" : "sample", "personid" : 39 }
So, based on the definition of the ObjectId, you can build a custom ObjectId.
But in that case you are responsible for making sure the ObjectId is unique; otherwise MongoDB will throw an error beginning with "E11000 duplicate key error collection:".

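A minimal JavaScript sketch of that approach, assuming you keep the 4/3/2/3-byte split from the example: `toHex` and `buildCustomObjectId` are hypothetical helpers that only build and validate the hex string. They do nothing to make the result unique across clients; that remains your responsibility, and the database will still reject a duplicate with an E11000 error.

```javascript
// Hypothetical helpers for assembling a custom 24-digit hex _id from parts,
// following the 4/3/2/3-byte split used in the example above.
const WIDTHS = [8, 6, 4, 6]; // hex digits per component

// Render a non-negative integer as fixed-width uppercase hex.
function toHex(n, width) {
  if (!Number.isInteger(n) || n < 0 || n >= 16 ** width) {
    throw new RangeError(`${n} does not fit in ${width} hex digits`);
  }
  return n.toString(16).toUpperCase().padStart(width, "0");
}

// Validate each component's width and join them into one 24-digit string.
function buildCustomObjectId(components) {
  if (components.length !== WIDTHS.length) {
    throw new Error(`Expected ${WIDTHS.length} components`);
  }
  components.forEach((c, i) => {
    if (!/^[0-9a-fA-F]+$/.test(c) || c.length !== WIDTHS[i]) {
      throw new Error(`Component ${i} must be ${WIDTHS[i]} hex digits`);
    }
  });
  return components.join("");
}

// The incremental part comes from a counter you manage yourself.
let nextCounter = 0xa981ce;
const id = buildCustomObjectId(["1DCD6500", "A98AC7", "2B67", toHex(nextCounter++, 6)]);
console.log(id); // pass it as ObjectId(id) in the mongo shell
```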
You can use almost anything for your _id field (arrays are not allowed). So this is possible:
db.collection.insertOne({
  _id: {
    "first": new ObjectId(),
    "second": new ObjectId(),
  }
})
The default unique index on the _id field also guarantees uniqueness on this kind of field.
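One detail worth knowing with this kind of _id: MongoDB matches embedded documents exactly, including field order, so { first, second } and { second, first } holding the same values are two different _id values. The JavaScript sketch below only simulates that behaviour with a Map keyed on JSON.stringify (which preserves property insertion order); it is not how the server implements the index, it ignores BSON type subtleties, and `UniqueIdIndex` is a made-up name.

```javascript
// Simulates (only!) a unique index on _id where _id is an embedded document.
// JSON.stringify keeps property insertion order, mirroring the fact that
// MongoDB matches embedded documents including their field order.
class UniqueIdIndex {
  constructor() {
    this.docs = new Map();
  }

  insert(doc) {
    const key = JSON.stringify(doc._id);
    if (this.docs.has(key)) {
      const err = new Error(`E11000 duplicate key error: _id ${key}`);
      err.code = 11000;
      throw err;
    }
    this.docs.set(key, doc);
  }
}

const index = new UniqueIdIndex();
index.insert({ _id: { first: "a1", second: "b1" }, name: "x" });
// Accepted: same values, but a different field order is a different _id.
index.insert({ _id: { second: "b1", first: "a1" }, name: "y" });
```

If different clients might build the compound _id with fields in a different order, constructing it in one shared helper keeps equal pairs from slipping past the unique index.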
However, I doubt that this is a good solution to your problem, as it would probably just defer the underlying problem (which really doesn't exist; kindly see this answer, too: How to generate unique object id in mongodb). Instead, I would suggest that your clients create documents without specifying an _id explicitly and let MongoDB create the _id (on the server side or on the client side, depending on your driver and settings; client-side generation should be preferred). This will, for all practical purposes, guarantee uniqueness (even when you do sharding).
There is always a unique index on your _id field anyway, so to be on the super-safe side with respect to run-time behaviour you could put a retrying exception handler in place on the client side for the (practically impossible) case that you end up with two identical _ids and hence a duplicate key exception.
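A retrying handler like that can be sketched as follows. It is deliberately driver-agnostic: `insertFn` stands for whatever insert call your driver provides, and `makeId` generates a fresh _id; both are hypothetical parameters. The only assumption about the driver is that a duplicate-key failure surfaces as an error whose `code` is 11000, the server's duplicate key error code.

```javascript
// Retry an insert with a freshly generated _id when the server reports a
// duplicate key (error code 11000). Any other error is re-thrown unchanged.
async function insertWithRetry(insertFn, doc, makeId, maxAttempts = 3) {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be >= 1");
  for (let attempt = 1; ; attempt++) {
    const candidate = { ...doc, _id: makeId() };
    try {
      await insertFn(candidate);
      return candidate._id;
    } catch (err) {
      const isDuplicate = Boolean(err) && err.code === 11000;
      if (!isDuplicate || attempt >= maxAttempts) throw err;
    }
  }
}

// Usage with a stand-in insert function; "dup" is already taken.
const taken = new Set(["dup"]);
const fakeInsert = async (d) => {
  if (taken.has(d._id)) throw Object.assign(new Error("E11000 duplicate key"), { code: 11000 });
  taken.add(d._id);
};
const candidates = ["dup", "fresh"];
insertWithRetry(fakeInsert, { name: "sample" }, () => candidates.shift()).then(console.log);
```

With the Node.js driver, `insertFn` could be `(d) => collection.insertOne(d)` and `makeId` could be `() => new ObjectId()`.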
Also see this answer: Mongodb - must _id be globally unique when sharding

Related

Can mongo document ID format be customised?

I would like to encode some meaning in the first N characters of every document ID, i.e. make the first three characters determine a document type that is meaningful to the system it is used in.
You can have a custom _id when you insert the document. If the document to be inserted doesn't contain an _id, MongoDB will insert an ObjectId for you.
The _id can be of any type, but keeping it uniform across all documents makes sense if you access them from the application layer.
You can refer to one of the old questions on SO: How to generate unique object id in mongodb
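For the prefix idea in the question, a minimal sketch: build string _id values from a three-character type code followed by 24 random hex digits from Node's crypto module. The type codes and the `makeTypedId`/`typeOf` helpers are hypothetical. Note that such string ids lose the creation timestamp an ObjectId carries, and uniqueness now rests on the 96 random bits you generate.

```javascript
const crypto = require("crypto");

// Hypothetical document-type codes; each must be exactly three characters.
const TYPE_CODES = new Set(["INV", "USR", "ORD"]);

// Build a string _id whose first three characters encode the document type,
// followed by 12 random bytes rendered as 24 hex digits.
function makeTypedId(typeCode) {
  if (!TYPE_CODES.has(typeCode)) {
    throw new Error(`Unknown type code: ${typeCode}`);
  }
  return typeCode + crypto.randomBytes(12).toString("hex");
}

// Recover the document type from an _id produced by makeTypedId.
function typeOf(id) {
  return id.slice(0, 3);
}

const id = makeTypedId("INV"); // use as { _id: id, ... } when inserting
console.log(id, typeOf(id));
```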

How can I set multiple fields as primary key in MongoDB?

I am trying to create a collection with 50+ fields. I understand that the purpose of the primary key is to uniquely identify a record. Since the primary key in MongoDB is the _id that gets created automatically, isn't it obvious that all my records, including duplicates, would go into my DB with a unique _id for every record? Tell me where I'm going wrong. Other articles and discussions are more confusing.
How can I set one or more of the other fields as a primary key? I don't want the default _id as the primary key.
In what way are compound indexes different from compound/primary keys?
There is no such notion as a primary key in MongoDB. Terminology matters. Not knowing the terminology is a sure sign that someone hasn't read the docs, or at least not carefully.
A document in a collection must have an _id field, which may be (and by default is) an ObjectId. This field has an index on it which enforces a unique constraint, so there cannot be any two documents with the same value or combination of values in the _id field. Which, by what you describe, is presumably what you want.
My suggestion is to reuse the default _id as often as you can. Additional indices are expensive (RAM-wise). You have two options here: either use a different single value as _id or use multiple values, if the cardinality of the single field isn't enough.
Let us assume you want to record a clickstream per user. Obviously, you need the unique user. But that alone would not be enough, since each user could then only have one entry. But since you need a timestamp for each click anyway, you move it into the _id field:
{
  _id: {
    user: "some user",
    ts: new ISODate()
  },
  ...
}
Unless your Mongo installation is sharded, you can create a unique compound index on multiple fields and use it as a surrogate composite primary key.
db.collection.createIndex( { a: 1, b: 1 }, { unique: true } )
Alternatively you could create your own _id values. However, as the default ObjectId is also a timestamp, personally I find it useful for auditing purposes.
Regarding the difference between a compound index and a composite primary key: by definition, primary keys cannot be defined on missing (null) fields, and there can only be one primary key per document. In MongoDB only the _id field can act as a primary key, as it is added by default when missing. In contrast, a compound index can be applied to missing fields by defining it as sparse, and you can define multiple compound indexes on the same collection.
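The unique compound index from this answer can be mimicked in plain JavaScript to show which inserts it rejects. This is a simulation, not the server's implementation, and `UniqueCompoundIndex` is a made-up name. It encodes one MongoDB rule worth knowing: in a unique index that is neither sparse nor partial, a missing field is indexed as null, so two documents that lack the same indexed fields collide.

```javascript
// Simulates a unique index created with
//   db.collection.createIndex({ a: 1, b: 1 }, { unique: true })
// Missing fields are treated as null, as MongoDB does for non-sparse indexes.
class UniqueCompoundIndex {
  constructor(fields) {
    this.fields = fields;
    this.keys = new Set();
  }

  keyFor(doc) {
    return JSON.stringify(this.fields.map((f) => (doc[f] === undefined ? null : doc[f])));
  }

  insert(doc) {
    const key = this.keyFor(doc);
    if (this.keys.has(key)) {
      throw Object.assign(new Error(`E11000 duplicate key: ${key}`), { code: 11000 });
    }
    this.keys.add(key);
  }
}

const idx = new UniqueCompoundIndex(["a", "b"]);
idx.insert({ a: 1, b: 2 });
idx.insert({ a: 1, b: 3 }); // accepted: the pair (a, b) differs
idx.insert({ a: 1 }); // indexed as (1, null)
```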

Does replacing the "_id" field in mongodb with a custom unique key decrease performance?

I have a situation in which I have a User schema that contains a unique field called "username." At the same time, mongo automatically creates its own unique key, "_id."
I've noticed that for a lot of my schemas I need both an array of "usernames" as well as "ids". This is quite redundant sometimes so my question is:
Is a lookup via "_id" faster than a lookup for a field "username" (let's say a 10 character string)? If they are the same, is it viable to use my unique identifier username for the value of _id?
If your data naturally has a required, unique field, then it's perfectly fine to use that value as your _id.
As long as the field's data is comparable in size to an ObjectId (which is 12 bytes), performance should be the same. A 10-character ASCII string takes about 15 bytes in BSON (a 4-byte length prefix, 10 UTF-8 bytes and a trailing null byte), so the index for username will take a bit more memory, but probably not enough to make a difference performance-wise.
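The byte counts can be checked with a small Node.js sketch. It applies the BSON encoding rule for string values (a 4-byte length prefix, the UTF-8 bytes, and a terminating null byte) and compares against the fixed 12-byte ObjectId. It ignores the type byte and field name, which are the same for both, and the storage engine's own index key format, so treat the numbers as rough; `bsonStringValueBytes` is a hypothetical helper.

```javascript
// Approximate BSON value sizes (excluding the type byte and field name).
const OBJECT_ID_BYTES = 12;

function bsonStringValueBytes(s) {
  // int32 length prefix + UTF-8 payload + trailing 0x00
  return 4 + Buffer.byteLength(s, "utf8") + 1;
}

// Non-ASCII characters take more than one UTF-8 byte each.
for (const name of ["jdoe123456", "müller"]) {
  console.log(name, bsonStringValueBytes(name), "bytes vs", OBJECT_ID_BYTES);
}
```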
Since you're using Mongoose, you could also create a virtual field (named username) that exposes the _id field using that more descriptive name, as well.
I think this is fine, UNLESS you will be changing the structure of your usernames in the future. So I think it's better to just stick with ObjectId() for the _id and add an extra username field if you need it.

Mongodb - must _id be globally unique when sharding

I want to shard a collection by their "foreign key" (userID) and not by their id field. I only need that the combination of userID and id is unique. But I am not sure if that is ok with mongodb.
Warning: In any sharded collection where you are not sharding by the _id field, you must ensure uniqueness of the _id field. The best way to ensure _id is always unique is to use ObjectId, or another universally unique identifier (UUID).
This is taken from: http://docs.mongodb.org/manual/tutorial/enforce-unique-keys-for-sharded-collections/#enforce-unique-keys-for-sharded-collections
Do I have to ensure that _id is unique? Or is it good enough if I always query by both userID and _id?
Unless you manually replace them, the auto-generated _id values are ObjectIds (not UUIDs in the strict sense), which, according to the documentation, consist of "a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter".
As you can see, a unique machine ID is part of the ObjectId. That ensures that no two machines in the shard ever create the same ObjectId independently (unless they have the same machine ID; the likelihood of that is 1 in 16,777,216 (2^24), and when it happens it can easily be verified). The only situation in which you could theoretically get a duplicate ObjectId is when a single process creates more than 2^24 (over 16 million) ObjectIds in a single second.
tl;dr: You don't have to worry about duplicate ObjectIds; they are, as the documentation puts it, "designed to have a reasonably high probability of being unique when allocated".
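The counter argument above can be illustrated with a sketch that assembles an ObjectId-style hex value from the legacy fields. It is a simplified model, not a driver's generator: `machine`, `pid` and the counter are plain inputs here, whereas real drivers derive them from the host, the process and a random seed (and newer versions use a per-process random value in place of machine id plus pid). `legacyObjectIdHex` is a hypothetical helper.

```javascript
const COUNTER_SPACE = 2 ** 24; // 3-byte counter: 16,777,216 values

// Assemble a 24-digit hex string from the legacy ObjectId fields.
function legacyObjectIdHex(seconds, machine, pid, counter) {
  const hex = (n, digits) => n.toString(16).padStart(digits, "0");
  return (
    hex(seconds >>> 0, 8) +
    hex(machine & 0xffffff, 6) +
    hex(pid & 0xffff, 4) +
    hex(counter % COUNTER_SPACE, 6)
  );
}

// Two ids from the same process in the same second differ only by counter.
const t = 1700000000;
const a = legacyObjectIdHex(t, 0xabcdef, 1234, 41);
const b = legacyObjectIdHex(t, 0xabcdef, 1234, 42);
console.log(a, b, a !== b);

// After 2^24 ids in one second the counter wraps and would repeat a value.
console.log(legacyObjectIdHex(t, 0xabcdef, 1234, 41 + COUNTER_SPACE) === a);
```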

Why does MongoDB use ObjectId?

{ "_id" : ObjectId("4c2209f9f3924d31102bd84a"), "name" : "mongo" }
What exactly is the purpose of the ObjectId? It's a big number that is generated using a timestamp.
In any key-value NoSQL store I have seen, I query the value by its key.
Here we store keys and values as the data and query them with the find() function.
So, I am trying to understand: when do we really need the ObjectId?
What is the reason for letting the user see the value of the ObjectId?
After reading the docs, one basic question: is MongoDB a hash-table type implementation?
After reading the docs... one basic question: is MongoDB a hash-table type implementation?
MongoDB uses BSON, a binary form of JSON. A JSON object is basically just a "hashtable", or a set of key/value pairs.
What exactly is the use of the ObjectId? It is a big number that is generated with time.
In MongoDB, each document you store must have an _id. If you do not set a value for _id, then MongoDB will automatically generate one for you. If you have a unique key when you are inserting the object, you can use that instead. For details on the ObjectId see here.
In any key-value NoSQL store I have seen, I query the value by its key.
MongoDB is not just key-value. MongoDB supports multiple indexes on a single collection, you can query on many different fields, not just the "key" or "id".
The ObjectId is similar to a primary key in an RDBMS.
Whenever you insert a new document without an _id, MongoDB will generate an ObjectId.
An ObjectId is a 12-byte BSON type:
first 4 bytes represent the timestamp,
next 3 bytes are a unique machine identifier,
next 2 bytes are the process id,
next 3 bytes are an incrementing counter, starting from a random value.
It is displayed as the equivalent 24-digit hex string.