MongoDB - Weird difference in _id Index size

I have two sharded collections on 12 shards with the same number of documents. The shard key of Collection1 is compound (two fields are used), and its documents consist of 4 fields. The shard key of Collection2 is a single field, and its documents consist of 5 fields.
Via the db.collection.stats() command, I get information about the indexes.
What seems strange to me is that for Collection1, the total size of the _id index is 1342MB.
In contrast, the total size of the _id index for Collection2 is 2224MB. Is this difference reasonable? I was expecting the total sizes to be more or less the same because of the identical number of documents. Note that the shard key for both collections does not include the _id field.

MongoDB uses prefix compression for indexes.
This means that if sequential values in the index begin with the same series of bytes, the bytes are stored for the first value, and subsequent values contain a tag indicating the length of the prefix.
Depending on the datatype of the _id values, the savings from this compression can be substantial; for example, ObjectIds generated close together in time share the same leading timestamp bytes.
There may also be orphaned documents causing one node to have more entries in its _id index.
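For what it's worth, the per-index sizes are reported directly by stats(), so the two collections can be compared index by index; a minimal sketch using the collection names from the question:

// total size of every index on the collection, in bytes
db.getCollection("Collection1").stats().indexSizes
db.getCollection("Collection2").stats().indexSizes
// the _id index specifically:
db.getCollection("Collection1").stats().indexSizes["_id_"]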

Related

Sharding with array in Cloud Firestore with composite index

I have read in the documentation that write throughput can be limited to 500 writes per second if a collection contains an indexed field with sequential values.
I can add a shard field to avoid this.
Therefore I should add the shard field before the sequential field in a composite index.
But what if my sequential field is an array?
An array must always be the first field in a composite index.
For example:
I have a Collection "users" with an array field "reminders".
The field reminders contains time strings like ["12:15", "17:45", "20:00", ...].
I think these values could result in hot spotting but maybe I am wrong.
I don't know how Firestore handles arrays in composite indexes.
Could my reminders array slow down writes per second? And if so, how could I implement a shard field? Or is there a completely different solution?
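For reference, the shard-field pattern from the Firestore docs looks roughly like this with the Node.js Admin SDK (a sketch only; NUM_SHARDS, addUser, and the document layout are assumptions, and whether this helps when the indexed field is an array is exactly the open question above):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

const NUM_SHARDS = 10; // assumption: 10 shard values is enough to spread the write load

async function addUser(userId, reminders) {
  // A random shard value makes consecutive writes land in different index
  // ranges instead of all hitting the same "hot" range of sequential values.
  await db.collection('users').doc(userId).set({
    shard: Math.floor(Math.random() * NUM_SHARDS),
    reminders: reminders, // e.g. ["12:15", "17:45", "20:00"]
  });
}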

Get Number of Documents in MongoDB Index

As per the title, I would like to know if there is a way to get the number of documents in a MongoDB index.
To be clear, I am not looking for either of the following:
How to get the number of documents in a collection -- .count().
How to get the size of an index -- .stats().
An index references all of the documents in its collection unless the index is a sparse index or a partial index. From the docs:
Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is “sparse” because it does not include all documents of a collection. By contrast, non-sparse indexes contain all documents in a collection, storing null values for those documents that do not contain the indexed field.
Partial indexes only index the documents in a collection that meet a specified filter expression.
So ...
For non-sparse and non-partial indexes, the answer is db.collection.count().
For sparse and partial indexes, the count can be inferred by running a query with no criteria, hinting on that index, and then counting the results. For example:
db.collection.find().hint('index_name_here').count()
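As a concrete illustration with a sparse index, assuming a hypothetical users collection with an optional email field:

// only documents that actually have "email" get an entry in this index
db.users.createIndex({ email: 1 }, { sparse: true, name: "email_sparse" })
// force the query through the sparse index, then count what it returns
db.users.find().hint("email_sparse").count()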

MongoDB add fields of low cardinality to compound indexes?

I have read that putting indexes on low-cardinality fields is pointless.
Would this hold true for a compound index such as:
db.perms.createIndex({"owner": 1, "object_type": 1, "target": 1});
With queries as such:
db.perms.find({"owner": "me", "object_type": "square"});
db.perms.find({"owner": "me", "object_type": "circle", "target": "you"});
The number of distinct object_types would grow over time (probably to no more than 10 or 20 max) but would start out at only about 2 or 3.
Similarly, would a hashed index be worth looking into?
UPDATE:
owner and target would grow immensely. Think of this like a file system wherein an owner would "own" a target (i.e. a file). But, like Unix systems, a file could be a folder, a symlink, or a regular file (hence the type). So although there are only 3 object_types, an owner and target combination could have thousands of entries with an even distribution of types.
I may not be able to answer your question, but here are my two cents on index cardinality:
Index cardinality refers to the number of index points for each of the different index types that MongoDB supports.
Regular - for every key that we put in the index, there is an index point; in addition, if a document is missing the key, there is an index point under the null entry. So we get a 1:1 ratio relative to the number of documents in the collection, which makes the index a certain size: it is proportional to the collection size in terms of its pointers to documents.
Sparse - when a document is missing the key being indexed, it is not in the index, because nulls are not kept in a sparse index. So the number of index points can be less than or equal to the number of documents.
Multikey - this is an index on array values. There will be multiple index points (one per array element) for each document, so the count will be greater than the number of documents.
As an example: say you update a document with a key called tags, and that update causes the document to be moved on disk (assume the MMAPv1 storage engine). If the document has 100 tags in it, and the tags array is indexed with a multikey index, then 100 index points need to be updated in the index to accommodate the move.
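A small sketch of the three cases (the collection and field names here are hypothetical):

// Regular: one index entry per document (1:1)
db.things.createIndex({ a: 1 })
// Sparse: entries only for documents that actually have the field (<= number of docs)
db.things.createIndex({ b: 1 }, { sparse: true })
// Multikey: one entry per array element (can exceed the number of docs)
db.things.createIndex({ tags: 1 })
db.things.insertOne({ a: 1, tags: ["x", "y", "z"] })
// -> 1 entry in the index on a, 0 in the sparse index on b, 3 in the multikey index on tags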

MongoDB Sharded collection shard key confusion

Suppose I have a DB called 'mydb' and a collection in that DB called 'people', and documents in mydb.people all have a 5-digit US zip code field, e.g. 90210. If I set up a sharded collection by splitting this collection across 2 shards using the zip code as the shard key, how would document insertion be handled?
So if I insert a document with zipcode = 00000, would that go to the first shard because this zip code value is less than the center zip code value of 50000? And if I insert a document with zipcode = 99999, would it be inserted into the second shard?
I set up a sharded cluster according to http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/ with a collection sharded across 2 shards on a common key of zipcode, and I am not finding this even distribution of documents.
Also, what do they mean by chunk size? A chunk is basically a range of the shard key, right? So why do they talk about chunk sizes in terms of MB and not in terms of ranges of the shard key?
Confusing
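One way to see how the chunks were actually split, assuming a zipcode shard key on mydb.people as in the question (a sketch; the config.chunks field layout varies by MongoDB version):

sh.shardCollection("mydb.people", { zipcode: 1 })
// print chunk ranges and which shard holds each one
sh.status()
// or inspect the metadata directly
db.getSiblingDB("config").chunks.find({ ns: "mydb.people" }).pretty()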

Can I increase the maximum number of indexes per collection in MongoDB?

I have a collection that has to support many different query params, so I create an index for every key. I just added a few more and got an error:
pymongo.errors.OperationFailure: add index fails, too many indexes for collection key:{ foo: 1 }
and then I noticed that the maximum number of indexes per collection in MongoDB is just 64. Can I change this number?
The max is built into MongoDB:
http://docs.mongodb.org/manual/reference/limits/
Number of Indexes per Collection
A single collection can have no more than 64 indexes.
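The 64-index limit is hard-coded and cannot be raised. A quick way to check how many indexes a collection currently has (a minimal sketch):

// number of indexes currently defined on the collection
db.collection.getIndexes().length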