What is the difference between partial indexes and sparse indexes in MongoDB?

I've read the official MongoDB docs but really can't understand the difference between sparse and partial indexes. I'd like an explanation with examples.

A sparse index is an optimized index that only contains entries for documents that have a value in the indexed field(s).
For example, let's say you would like to add an index on the lastname field:
{ _id: 1, firstname: 'John', lastname: 'Black', age: 20 }
{ _id: 2, firstname: 'Stive', lastname: 'White', age: 17 }
{ _id: 3, firstname: 'Tom', age: 22 }
If you run
db.users.createIndex({ lastname: 1 });
it will create index entries for all 3 documents. But you don't need an index entry for a document that has no lastname value (_id: 3); it's a waste of space and memory.
To avoid indexing documents where the field is missing, MongoDB has the sparse index, which simply checks whether the field exists (note that a document with an explicit null value is still indexed).
So when you pass sparse: true as an option
db.users.createIndex({ lastname: 1 }, { sparse: true });
MongoDB will add index entries for only 2 documents (_id: 1 and _id: 2). That's great, but what if you want to index only the documents of users who are older than 18?
You can't use a sparse index for that, because it only checks documents for the existence of a value.
This is why partial indexes were created.
db.users.createIndex(
  { age: 1 },
  { partialFilterExpression: { age: { $gte: 18 }, lastname: { $exists: true } } }
);
This example will create an index entry for only 1 document (_id: 1). A partial index is a more general version of a sparse index: it filters documents not only by checking field existence, but by whatever conditions you provide in the partialFilterExpression option.
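One caveat worth knowing: MongoDB can only use a partial index when the query predicate is guaranteed to be a subset of the partialFilterExpression, since otherwise the index would return incomplete results. A minimal sketch against the index above:
// Can use the partial index: age >= 21 implies age >= 18,
// and the equality on lastname implies the field exists.
db.users.find({ age: { $gte: 21 }, lastname: 'Black' });
// Cannot use it: documents with age < 18 are not in the index at all.
db.users.find({ age: { $lt: 25 } });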

Related

MongoDB: index enums and nullable fields for search?

I have a "log" type of collection, where depending on the source, there might be some id-fields. I want to search those fields with queries, but not sort by them. should I index them to improve search performance? To visualize the problem:
[{
  _id: ObjectID("..."), // unique
  userId: ObjectID("..."), // not unique
  createdAt: ...,
  type: 'USER_CREATED'
},
{
  _id: ObjectID("..."), // unique
  basketId: ObjectID("..."), // not unique
  createdAt: ...,
  type: 'BASKET_CREATED'
},
...]
I want to filter by the (nullable) userId or basketId as well as by the type enum. I am not sure whether those fields need an index. createdAt certainly does, since it is sortable. But sparse fields containing enums or null (and simply non-unique) values: how should those be treated as a rule of thumb?
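One hedged option, building on the partial indexes discussed above (the collection name log and the exact key layout are assumptions, not from the question): give each optional id field its own partial index so documents missing the field cost nothing, and let the low-cardinality type enum ride along as a secondary key:
db.log.createIndex(
  { userId: 1, type: 1 },
  { partialFilterExpression: { userId: { $exists: true } } }
);
db.log.createIndex(
  { basketId: 1, type: 1 },
  { partialFilterExpression: { basketId: { $exists: true } } }
);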

MongoDB: Must every index be prefixed with the shard key?

Imagine we have documents like this:
{
_id: ObjectId(""),
accountId: ObjectId(""),
userId: ObjectId(""),
someOtherFieldA: ["some", "array", "values"],
someOtherFieldB: ["other", "array", "values"],
...
}
Furthermore, there are multiple compound indexes, e.g.:
{ userId: 1, someOtherFieldA: 1, ... }
{ userId: 1, someOtherFieldB: 1, ... }
We want to shard by accountId.
Would it be enough to add a single-field index on accountId, so that the existing indexes still work? Or would every index need accountId as its prefix (first part)?
When you run the sh.shardCollection() command, MongoDB automatically creates an index on the shard key field (unless such an index already exists), so you don't need to worry about this.
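For illustration, a minimal sketch (the database and collection names are assumptions):
// Shard the collection on accountId; MongoDB creates the
// { accountId: 1 } index automatically if it does not exist yet.
sh.enableSharding("mydb");
sh.shardCollection("mydb.documents", { accountId: 1 });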

MongoDB compound index order of the fields

I have a collection with this schema:
1) user
2) age
3) role
I have created a compound index ( { age: 1, user: 1 } ). When I find documents with criteria { age: { $gt: 21, $lt: 50 }, user: 'user124' }, the index is properly used (I am watching in explain()), but when I change the order to { user: 'user124', age: { $gt: 21, $lt: 50 } } the results and index usage are identical. When I have a compound index on two fields, does the order in the criteria not matter?
This is correct; the order does not matter.
In fact, only arrays in a query are ordered; documents (dictionaries) are not.
http://json.org/
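You can verify this yourself (a sketch, assuming the collection is named users):
// Both filters produce the same plan and use the { age: 1, user: 1 } index:
db.users.find({ age: { $gt: 21, $lt: 50 }, user: 'user124' }).explain();
db.users.find({ user: 'user124', age: { $gt: 21, $lt: 50 } }).explain();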

MongoDB: Upsert document in array field

Suppose I have the following documents:
{
  _id: 1,
  name: 'Alice',
  courses: [
    { _id: 'DB103', credits: 6 },
    { _id: 'ML203', credits: 4 }
  ]
},
{
  _id: 2,
  name: 'Bob',
  courses: []
}
I now want to 'upsert' the course with _id 'DB103' in both documents. The _id field should remain the same, but the credits value should change (to 4). In the first document, the respective embedded document should be updated; in the second, {_id: 'DB103', credits: 4} should be inserted into the courses array.
Is there any way in MongoDB to handle both cases?
Sure, I could search courses with $elemMatch for 'DB103' and insert if I don't find it, otherwise update the value. But those are two steps, and I would like to do both in just one.
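One hedged way to do it in a single statement (MongoDB 4.2+; the collection name students is an assumption): use an aggregation-pipeline update that replaces the matching element if present and appends it otherwise:
db.students.updateMany({}, [
  { $set: {
      courses: {
        $cond: [
          // Is a course with _id 'DB103' already in the array?
          { $in: ['DB103', '$courses._id'] },
          // Yes: rewrite the array, replacing the matching element.
          { $map: {
              input: '$courses',
              in: { $cond: [
                { $eq: ['$$this._id', 'DB103'] },
                { _id: 'DB103', credits: 4 },
                '$$this'
              ] }
          } },
          // No: append the new course document.
          { $concatArrays: ['$courses', [{ _id: 'DB103', credits: 4 }]] }
        ]
      }
  } }
]);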

Matching on compound _id fields in MongoDB aggregate

I'm a MongoDB novice so please forgive me if this question has an obvious answer...
Context:
I've followed the example in the MongoDB docs to implement hierarchical aggregation using map-reduce. The example uses a "compound" _id field as a map-reduce key, producing aggregate documents like this...
{
_id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z") },
value: {
ts: ISODate('2010-10-10T15:01:00Z'),
total: 254,
count: 10,
mean: 25.4 }
}
This is all well and good. My particular use case requires that values for several similar keys be emitted at each map step. For example...
{
_id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z"), hobby: "wizardry" },
value: {
ts: ISODate('2010-10-10T15:01:00Z'),
total: 254,
count: 10,
mean: 25.4 }
}
{
_id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z"), gender: "male" },
value: {
ts: ISODate('2010-10-10T15:01:00Z'),
total: 254,
count: 10,
mean: 25.4 }
}
(The values are the same, but the _id keys are slightly different.)
This is also well and good.
Question:
Now I'd like to aggregate over my hierarchical collections (views), which contain documents having several different compound _id fields, but only over documents with $matching _id fields. For example, I'd like to aggregate over just the documents possessing the {u: String, d: Date, hobby: String} type _id or just the documents with an _id of type {u: String, d: Date}.
I'm aware that I can use the $exists operator to restrict which _id fields should and shouldn't be permitted, but I don't want to have to create a separate aggregation for each _id (potentially many).
Is there a simple way of programmatically restricting $matching documents to those containing (or not containing) particular fields in an aggregate?
I think the best way to address this issue is by storing your data differently. Your _id uses arbitrary field names (hobby, gender) as keys, and that is something you should avoid. I would probably store the documents as:
{
  _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z"), type: "hobby", value: "wizardry" }
}
{
  _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z"), type: "gender", value: "male" }
}
And then your $match becomes simple, even without having to create a different match for each type.
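For example (a sketch; the collection name views and the aggregation itself are assumptions):
// A single $match on _id.type now selects every document of one kind:
db.views.aggregate([
  { $match: { '_id.type': 'hobby' } },
  { $group: { _id: '$_id.u', total: { $sum: '$value.total' } } }
]);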