How can I set multiple fields as primary key in MongoDB? - mongodb

I am trying to create a collection with 50+ fields. I understand that the purpose of the primary key is to uniquely identify a record. Since the primary key in MongoDB is the _id that gets created automatically, isn't it obvious that all my records, including duplicates, would go into my DB with a unique _id for every record? Tell me where I'm going wrong. Other articles and discussions are more confusing.
How do I set one or more of the other fields as a primary key? I don't want the default _id as the primary key.
In what way are compound indexes different from a compound primary key?

There is no such notion as a primary key in MongoDB. Terminology matters. Not knowing the terminology is a sure sign someone hasn't read the docs or at least not carefully.
A document in a collection must have an _id field, which by default is an ObjectId but may hold other values. This field has an index on it which enforces a unique constraint, so no two documents can have the same value, or combination of values, in the _id field. Which, by what you describe, presumably is what you want.
My suggestion is to reuse the default _id as often as you can. Additional indices are expensive (RAM-wise). You have two options here: either use a different single value as _id or use multiple values, if the cardinality of the single field isn't enough.
Let us assume you want to record a clickstream per user. Obviously, you need the unique user. But that alone would not be enough, since a user could then have only one entry. Since you need a timestamp for each click anyway, you move it into the _id field:
{
  _id: {
    user: "some user",
    ts: new ISODate()
  },
  ...
}
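One thing to watch with a compound _id like this: MongoDB compares embedded documents field by field, in order, so { user, ts } and { ts, user } count as different keys. A minimal sketch in plain JavaScript of a helper that always builds the key in the same order. The name makeClickId and the page field are made up for illustration and are not part of any driver:

```javascript
// Hypothetical helper: always build the compound _id with the same field order,
// because embedded documents with the same fields in a different order
// are treated as different values.
function makeClickId(user, ts) {
  return { user: user, ts: ts };
}

const click = {
  _id: makeClickId("some user", new Date("2020-01-01T00:00:00Z")),
  page: "/home", // illustrative payload field, not from the original answer
};

console.log(JSON.stringify(click._id));
```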

Unless your Mongo installation is sharded, you can create a unique compound index on multiple fields and use this as a surrogate composite primary key.
db.collection.createIndex( { a: 1, b: 1 }, { unique: true } )
Alternatively, you could create your own _id values. However, as the default ObjectId also embeds a creation timestamp, personally I find it useful for auditing purposes.
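The timestamp sits in the first 4 bytes of the ObjectId, i.e. the first 8 hex characters. In the shell, ObjectId("...").getTimestamp() returns it directly; the plain JavaScript sketch below only illustrates the arithmetic. The helper name objectIdToDate and the sample hex value are invented for this example:

```javascript
// Hypothetical helper: decode the creation time from a 24-character ObjectId hex string.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  // The first 8 hex characters are the seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Arbitrary sample value, used only to show the call.
console.log(objectIdToDate("5f0000000000000000000000").toISOString());
```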
Regarding the difference between a compound index and a composite primary key: by definition, a primary key cannot be defined on missing (null) fields, and there can only be one primary key per document. In MongoDB only the _id field can be used as a primary key, as it is added by default when missing. In contrast, a compound index can be applied to missing fields by defining it as sparse, and you can define multiple compound indexes on the same collection.

Related

Validate uniqueness of a relationship model

I have an application where users can follow each other. Once this relationship is made a document is added into the collection. That document has two fields follower and followee. I want to prevent insertions of duplicate relationships. I do not want to query the db, wait for a promise, then insert as this seems like an inefficient approach. I'd rather stop it from saving a new document if the new document's follower and followee matches an existing document.
Look into creating a unique compound index:
db.members.createIndex( { follower: 1, followee: 1 }, { unique: true } )
The created index enforces uniqueness for the combination of follower and followee values.
A unique index ensures that the indexed fields do not store duplicate
values; i.e. enforces uniqueness for the indexed fields. By default,
MongoDB creates a unique index on the _id field during the creation of
a collection.
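With the unique index in place, the application can insert without querying first and treat a duplicate key error (code 11000) as "already following". A rough sketch under these assumptions: follow is a hypothetical helper, and fakeCollection is an in-memory stand-in so the example runs without a database; a real driver collection would be passed in its place.

```javascript
// Insert the relationship and let the unique index reject duplicates.
// `collection` is anything with an async insertOne(doc), e.g. a driver collection.
async function follow(collection, follower, followee) {
  try {
    await collection.insertOne({ follower, followee });
    return true; // new relationship stored
  } catch (err) {
    if (err && err.code === 11000) return false; // duplicate key: already following
    throw err; // anything else is a real failure
  }
}

// In-memory stand-in that mimics a unique index on (follower, followee).
// It exists only for this sketch and is not a MongoDB API.
function fakeCollection() {
  const seen = new Set();
  return {
    async insertOne(doc) {
      const key = JSON.stringify([doc.follower, doc.followee]);
      if (seen.has(key)) {
        const err = new Error("E11000 duplicate key error");
        err.code = 11000;
        throw err;
      }
      seen.add(key);
      return { acknowledged: true };
    },
  };
}
```

The point of the design is that the database enforces the rule, so there is no gap between a check and the insert in which a second request could slip in.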

Define MongoDB compound keys as one key

Is it somehow possible to define one compound key, consisting of two MongoDB ObjectIds or numeric types, so as to make one key out of them?
This is necessary because I have lots of participants creating documents which they save into one big collection together, so I cannot be sure that the MongoDB ObjectId for each document is distinct. So I wanted to add some additional key, maybe a user ID number or email or something similar...
maybe two ObjectIds
An ObjectId in MongoDB is a 12-byte value, usually written as a 24-character hexadecimal string.
ObjectId() Returns a new ObjectId value. The 12-byte
ObjectId value consists of:
4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.
https://docs.mongodb.com/manual/reference/method/ObjectId/
Hence, the ObjectId will be auto-generated and unique when you insert a document.
However, you can also supply your own 24-character hexadecimal value when you insert a document.
For example,
1DCD6500 -- this can be custom hex identifier
A98AC7 -- another custom hex identifier
2B67 -- another custom hex identifier
A981CE -- Incremental custom hex identifier
Now if you insert a document with _id set to ObjectId("1DCD6500A98AC72B67A981CE"), the document will be saved.
e.g. { "_id" : ObjectId("1DCD6500A98AC72B67A981CE"), "name" : "sample", "personid" : 39 }
So based on the definition of the ObjectId you can make a custom ObjectId.
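As a small illustration of the assembly step, the plain JavaScript sketch below joins the four segments from the example and checks that the result is exactly 24 hex characters before it would be handed to ObjectId(...). The function name buildCustomIdHex is made up for this example:

```javascript
// Hypothetical helper: join hex segments into a 24-character string
// suitable for passing to ObjectId("...").
function buildCustomIdHex(parts) {
  const hex = parts.join("");
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("custom id must be exactly 24 hex characters, got: " + hex);
  }
  return hex.toLowerCase(); // lowercase is how ObjectIds are normally printed
}

const customHex = buildCustomIdHex(["1DCD6500", "A98AC7", "2B67", "A981CE"]);
console.log(customHex);
```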
But in that case you are responsible for making sure the ObjectId is unique; otherwise MongoDB will throw an error:
"E11000 duplicate key error collection:
You can use anything for your _id field. So this is possible:
db.collection.insertOne({
  _id: {
    "first": new ObjectId(),
    "second": new ObjectId(),
  }
})
The default unique index on the _id field also guarantees uniqueness on this kind of field.
However, I would doubt that this is a good solution to your problem as it would probably just defer the underlying problem (which really doesn't exist - kindly see this answer, too: How to generate unique object id in mongodb). Instead, I would suggest you have your clients create documents without specifying an _id explicitly and let MongoDB create the _id (on the server side or on the client side depending on your driver and your settings where client-side generation should be preferred). This will guarantee uniqueness (even when you do sharding).
There always is a unique index on your _id field anyway so to be on the super safe side with respect to run-time behaviour you could put a retrying exception handler in place on the client side for the (pretty much impossible) case that you end up with two identical _ids and hence an exception.
Also see this answer: Mongodb - must _id be globally unique when sharding

Can I make a unique compound index in MongoDB that enforces uniqueness on key permutation?

Say I have a collection that records correlation values between brands (never mind how such a correlation would be generated or interpreted). Then the fields in this collection would include: 'brand1', 'brand2', and 'correlation'.
For the sake of an example, let's say that brands can take on string values such as "google", "microsoft", etc., so that each document records the correlation between various brand names.
I would want to create a unique index on the 'brand1' and 'brand2' fields so that each document records the correlation between a pair of brands only once in the collection. To do this, uniqueness would have to be determined regardless of the order of the values in the key. A key of ['google', 'microsoft'] should be considered the same as a key of ['microsoft', 'google'], so that if a document already exists with the former key, an insertion of a document with the latter key would be prohibited.
Is this kind of index possible?
There is no way to enforce that kind of constraint on a MongoDB collection.
What you can do, however, is enforce a constraint in your software that the two components of this index are always stored in sorted order. (For instance, always store ["a","b"], not ["b","a"].) This makes it so that there's only one "canonical" version of any pair in the collection.
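A minimal sketch of that canonical ordering in plain JavaScript, assuming brands are plain strings; canonicalPair and correlationDoc are hypothetical names. Once every document is built this way, an ordinary unique compound index on brand1 and brand2 is enough to reject the reversed pair:

```javascript
// Return the two brands in a fixed (sorted) order so each pair has one canonical form.
function canonicalPair(brandA, brandB) {
  return brandA <= brandB ? [brandA, brandB] : [brandB, brandA];
}

// Build the document to store, always with brand1 <= brand2.
function correlationDoc(brandA, brandB, correlation) {
  const [brand1, brand2] = canonicalPair(brandA, brandB);
  return { brand1, brand2, correlation };
}

console.log(correlationDoc("microsoft", "google", 0.5));
```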

Does MongoDB automatically update indexed items?

Let's say you have a collection with a field called "primary_key",
{"primary_key":"1234", "name":"jimmy", "lastname":"page"}
and I have an index on "primary_key".
This collection has millions of rows, and I want to see how expensive it is to change primary_key for one of the records. Does it trigger a reindex of the entire table, or does it just reindex the changed record? In either case, is that expensive to do?
Updating an indexed field in MongoDB causes an update of the index (or indexes, if more than one uses that field). It does not trigger a "reindex". It shouldn't be all that expensive: effectively the old entry is deleted and a new one is inserted.
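To make the "delete the old entry, insert a new one" point concrete, here is a toy in-memory index in plain JavaScript. It is only an illustration: real MongoDB indexes are B-tree structures, and the ToyIndex class and its writes counter are invented for this sketch.

```javascript
// Toy single-field index: maps an indexed value to the set of document ids holding it.
class ToyIndex {
  constructor() {
    this.entries = new Map();
    this.writes = 0; // counts index entries touched
  }
  add(value, docId) {
    if (!this.entries.has(value)) this.entries.set(value, new Set());
    this.entries.get(value).add(docId);
    this.writes++;
  }
  remove(value, docId) {
    const ids = this.entries.get(value);
    if (!ids) return;
    ids.delete(docId);
    if (ids.size === 0) this.entries.delete(value);
    this.writes++;
  }
  // Changing the indexed value of one document touches only that document's entry.
  update(oldValue, newValue, docId) {
    this.remove(oldValue, docId);
    this.add(newValue, docId);
  }
}

const index = new ToyIndex();
for (let i = 0; i < 1000; i++) index.add(String(i), "doc" + i);

const before = index.writes;
index.update("123", "new-key", "doc123");
console.log("entries touched by the update:", index.writes - before);
```

However many documents are indexed, the update touches two entries, not the whole index.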
This document has a fair amount of detail on mongodb indexes:
http://docs.mongodb.org/master/MongoDB-indexes-guide.pdf
BTW, keep in mind that there is one special field, _id, that MongoDB uses as its primary key:
_id
A field required in every MongoDB document. The _id field must have a unique value. You can think of the _id field as the document’s
primary key. If you create a new document without an _id field,
MongoDB automatically creates the field and assigns a unique BSON
ObjectId.
You cannot update the _id field.
