Can I make a unique compound index in MongoDB that enforces uniqueness on key permutation?

Say I have a collection that records correlation values between brands (never mind how such a correlation would be generated or interpreted). Then the fields in this collection would include 'brand1', 'brand2', and 'correlation'.
For the sake of an example, let's say that brands can take on string values such as "google", "microsoft", etc., so that each document records the correlation between various brand names.
I would want to create a unique index on the 'brand1' and 'brand2' fields so that each document records the correlation between a pair of brands only once in the collection. In order to do this, the ordering of the key in the index must be taken into account when determining uniqueness in the collection. A key of ['google', 'microsoft'] should be considered the same as a key of ['microsoft', 'google'], so that if a document already exists with the former key, an insertion of a document with the latter key would be prohibited.
Is this kind of index possible?

There is no way to enforce that kind of constraint on a MongoDB collection.
What you can do, however, is enforce a constraint in your software that the two components of this index are always stored in sorted order. (For instance, always store ["a","b"], not ["b","a"].) This makes it so that there's only one "canonical" version of any pair in the collection.
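A minimal sketch of that approach in the mongo shell, assuming a collection named correlations (the name is illustrative):

db.correlations.createIndex( { brand1: 1, brand2: 1 }, { unique: true } )

function insertCorrelation(a, b, correlation) {
  // Canonicalize: always store the lexicographically smaller brand first,
  // so ("microsoft", "google") is stored as ("google", "microsoft").
  var pair = [a, b].sort();
  db.correlations.insertOne( { brand1: pair[0], brand2: pair[1], correlation: correlation } );
}

insertCorrelation("google", "microsoft", 0.42)   // stored as google / microsoft
insertCorrelation("microsoft", "google", 0.42)   // rejected with an E11000 duplicate key error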

Related

Sharding with array in Cloud Firestore with composite index

I have read in the documentation that writes can be limited to 500 per second if a collection contains sequential indexed values.
I can add a shard field to avoid this.
Therefore I should add the shard field before the sequential field in a composite index.
But what if my sequential field is an array?
An array must always be the first field in a composite index.
For example:
I have a Collection "users" with an array field "reminders".
The field reminders contains time strings like ["12:15", "17:45", "20:00", ...].
I think these values could result in hot spotting but maybe I am wrong.
I don't know how Firestore handles arrays in composite indexes.
Could my reminders array slow down the writes per second? And if so, how could I implement a shard field? Or is there a completely different solution?
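For reference, the shard-field idea from the Firestore docs looks roughly like this. This is a sketch only, assuming the firebase-admin Node SDK; the shard count and the shard field name are invented:

const admin = require("firebase-admin");
admin.initializeApp();
const db = admin.firestore();

async function addUser(userId, reminders) {
  await db.collection("users").add({
    shard: Math.floor(Math.random() * 10),  // spreads writes across 10 index ranges
    userId: userId,
    reminders: reminders                    // e.g. ["12:15", "17:45", "20:00"]
  });
}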

Validate uniqueness of a relationship model

I have an application where users can follow each other. Once this relationship is made, a document is added to the collection. That document has two fields, follower and followee. I want to prevent insertion of duplicate relationships. I do not want to query the db, wait for a promise, then insert, as this seems like an inefficient approach. I'd rather stop a new document from saving if its follower and followee match an existing document.
Look into creating a unique compound index:
db.members.createIndex( { follower: 1, followee: 1 }, { unique: true } )
The created index enforces uniqueness for the combination of follower and followee values.
A unique index ensures that the indexed fields do not store duplicate values; i.e. it enforces uniqueness for the indexed fields. By default, MongoDB creates a unique index on the _id field during the creation of a collection.
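To see the constraint in action, using the members collection from the example above:

db.members.insertOne( { follower: "alice", followee: "bob" } )   // succeeds
db.members.insertOne( { follower: "alice", followee: "bob" } )   // fails with an E11000 duplicate key error

Note that the reversed pair { follower: "bob", followee: "alice" } is still allowed, which is what you want here, since following is directional.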

DB Compound indexing best practices Mongo DB

How costly is it to index some fields in MongoDB?
I have a collection where I want uniqueness across a combination of two fields. Everywhere I search, the suggestion is a compound index with unique set to true. But what I was doing instead was concatenating the two fields into a single key, field1_field2, so that field2 is always unique for a given field1 (plus application logic), because I thought indexing was costly.
Also, since the MongoDB documentation advises against custom object IDs such as auto-incrementing numbers, I ended up giving big numbers to models like Classes, Students, etc. (where I could easily have used 1, 2, 3 in SQLite), and I didn't think to add a new field for numbering and index that field for querying.
What is the best-practice advice for production?
The advantage of compound indexes over your own concatenated-field scheme is that a compound index allows faster sorting than a regular indexed field. It also lowers the size of every document.
In your case, if you want the documents sorted with field1 ascending and field2 descending, a compound index is better. If you only want the documents whose field1_field2 contains some specific value, it does not really matter whether you use a compound index or a regular indexed field.
However, if you already have field1 and field2 as separate fields in the documents, and you also have a field containing field1_field2, it is better to use a compound index on field1 and field2 and simply delete the field containing field1_field2. This lowers the size of every document and ultimately reduces the size of your database.
Regarding the cost of indexing: if you go down the concatenated-field route, you almost have to index field1_field2 anyway, because queries on unindexed fields in MongoDB are really slow. And adding a document with an indexed field costs very little extra time (about a millisecond). Note that building an index over many existing documents can take a few minutes, which is why you usually plan the indexing strategy before adding any documents.
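Here is the contrast as a quick sketch; the collection and field names are illustrative:

// Option A: compound unique index on the two real fields.
db.things.createIndex( { field1: 1, field2: -1 }, { unique: true } )
db.things.find().sort( { field1: 1, field2: -1 } )   // index-backed sort

// Option B: concatenated key, which must itself be indexed to be queried quickly.
db.things.createIndex( { field1_field2: 1 }, { unique: true } )
db.things.find( { field1_field2: "a_b" } )           // exact-match lookups only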
TL;DR:
If you have limited disk space or need to sort the results, go with a compound index and delete field1_field2. Otherwise, use field1_field2, but it has to be indexed!

How can I set multiple fields as primary key in MongoDB?

I am trying to create a collection with 50+ fields. I understand that the purpose of the primary key is to uniquely identify a record. Since the primary key in MongoDB is the automatically created _id, wouldn't all my records, including duplicates, go into my DB, each with a unique _id? Tell me where I'm going wrong; other articles and discussions are more confusing.
How do I set one or more of the other fields as a primary key? I don't want the default _id as the primary key.
In what way are compound indexes different from a compound/primary key?
There is no such notion as a primary key in MongoDB. Terminology matters: not knowing it is a sure sign someone hasn't read the docs, or at least not carefully.
A document in a collection must have an _id field, which may be, and by default is, an ObjectId. This field has an index on it which enforces a unique constraint, so no two documents can have the same value or combination of values in the _id field. Which, by what you describe, is presumably what you want.
My suggestion is to reuse the default _id as often as you can, since additional indices are expensive (RAM-wise). You have two options here: either use a different single value as _id, or use multiple values if the cardinality of a single field isn't enough.
Let us assume you want a clickstream recorded per user. Obviously you need the user, which must be unique. But that alone would not be enough, since each user could then have only one entry. Since you need a timestamp for each click anyway, you move it into the _id field:
{
  _id: {
    user: "some user",
    ts: new ISODate()
  },
  ...
}
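With that document shape, the default unique index on _id does all the work. A quick sketch, with a fixed timestamp and an illustrative collection name:

db.clicks.insertOne( { _id: { user: "some user", ts: ISODate("2024-01-01T12:00:00Z") } } )   // succeeds
db.clicks.insertOne( { _id: { user: "some user", ts: ISODate("2024-01-01T12:00:00Z") } } )   // E11000 duplicate key error

One caveat: MongoDB compares embedded documents as whole values, so the field order inside the _id subdocument matters.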
Unless your MongoDB installation is sharded, you can create a unique compound index on multiple fields and use this as a surrogate composite primary key.
db.collection.createIndex( { a: 1, b: 1 }, { unique: true } )
Alternatively, you could create your own _id values. However, as the default ObjectId embeds a timestamp, I personally find it useful for auditing purposes.
Regarding the difference between a compound index and a composite primary key: by definition, a primary key cannot be defined on missing (null) fields, and there can only be one primary key per document. In MongoDB, only the _id field can be used as a primary key, as it is added by default when missing. In contrast, a compound index can be applied to missing fields by defining it as sparse, and you can define multiple compound indices on the same collection.
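The sparse option mentioned above, as a sketch; note that a sparse compound index only skips documents that are missing all of the indexed fields:

db.collection.createIndex( { a: 1, b: 1 }, { unique: true, sparse: true } )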

How does mongodb index lists

For example: if I had a collection called Stores, and each store document has a list of the items it sells, and stores generally share items, how would MongoDB build an index on that?
Would it build a B-tree index over all possible items, where each leaf of that tree (each item) references the documents that contain it?
Background:
I'm trying to perform queries like this using an index:
db.store.find({merchandise:{$exists:true}}) // where 'merchandise' is a list
db.store.find()[merchandise].count()
would an index on 'merchandise' help me?
If not, is my only option creating a separate meta field on 'merchandise' size, and index that?
Schema:
{
  _id: 123456,
  name: "Macys",
  merchandise: [ 248651234564, 54862101248, 12450184, 1256001456 ]
}
From your document sample, if you build an index on merchandise it will be a multikey index, and that index will have an entry for every item in the array. See the Multikey Indexes section here.
If merchandise were an array of subdocuments, indexing merchandise would index every field of the subdocuments in the array. With the index you can run queries like
db.store.find( { merchandise: 248651234564 } )
and retrieve all documents whose merchandise contains 248651234564.
For counting merchandise, you can only get the size of the merchandise field of one document at a time, e.g. db.store.find()[index].merchandise.length. So creating a separate field holding the merchandise size, and indexing it, is a feasible option if you want to run queries based on merchandise size.
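Both pieces as a sketch; merchandiseCount is an invented name for a precomputed size field that your application would keep in sync on every update:

db.store.createIndex( { merchandise: 1 } )             // multikey index over the array values
db.store.find( { merchandise: 248651234564 } )         // index-backed membership lookup

db.store.createIndex( { merchandiseCount: 1 } )        // index on the precomputed size
db.store.find( { merchandiseCount: { $gte: 100 } } )   // size-based queries become possible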
Hope this helps
If you index a field that contains an array, MongoDB indexes each value in the array separately, in a multikey index. When an array holds 4 values, each acts as a key in the index and points to the containing document(s).
You can use multikey indexes to index fields within objects embedded in arrays. That means that in your array you can index a specific field of each subdocument, for example { "stuffs.thing": 1 }.
Read more about Multikey Indexes
Whether you need these indexes would depend on:
How many queries rely on that specific field?
How many updates, inserts hit that specific field (array)?
How many items will that array contain?
...
Remember that indexes slow down writes, as they need to be updated as well. I'd run explain() on my queries to measure performance.
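For example, to check whether a query actually uses the multikey index:

db.store.find( { merchandise: 248651234564 } ).explain("executionStats")
// look for an IXSCAN stage with "isMultiKey": true in the winning plan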