Multi-key Indexing on an Entire Array - mongodb

MongoDB's docs explain multi-key indexing. Consider this comment document.
{
    "_id": ObjectId(...),
    "title": "Grocery Quality",
    "comments": [
        { author_id: ObjectId(...),
          date: Date(...),
          text: "Please expand the cheddar selection." },
        { author_id: ObjectId(...),
          date: Date(...),
          text: "Please expand the mustard selection." },
        { author_id: ObjectId(...),
          date: Date(...),
          text: "Please expand the olive selection." }
    ]
}
The docs explain that it's possible to index on comments.text, or on any other field within comments. However, is it possible to index on the comments key itself?
This post demonstrates indexing on an array of strings; however, the comments field above is an array of JSON objects.
Based on Antoine Girbal's article, it appears possible to index on an array of JSON objects where each object has a different key name. However, it doesn't appear possible where every object in the array shares the same key names.
Example - https://gist.github.com/kman007us/6797422

Yes, you can index subdocuments, and they can be part of a multikey index. When you index whole subdocuments, the index only matches when you search against an entire subdocument, e.g.:
db.test.find({records: {hair: "brown"}})
This searches for documents whose records array contains an element that is exactly {hair: "brown"}, and it can use the index to find them.
If you want to find any subdocuments that have hair: "brown" alongside any other fields, dot notation is needed, e.g.:
db.test.find({"records.hair": "brown"})
However, the whole-subdocument index cannot be used for that, so it's a full collection scan.
Please note: there are limits on the size of indexed values, and whole subdocuments could easily exceed that size.
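A minimal sketch of that behaviour, using the collection and field names from the example above (the inserted documents are illustrative):
db.test.createIndex({ records: 1 })                        // multikey index over whole array elements
db.test.insert({ records: [ { hair: "brown" } ] })
db.test.insert({ records: [ { hair: "brown", eyes: "blue" } ] })
db.test.find({ records: { hair: "brown" } })               // exact element match: returns only the first doc, can use the index
db.test.find({ "records.hair": "brown" })                  // dot notation: returns both docs, but cannot use this index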

Related

Difference between {id: 1, title: "text"} and {title: "text", id: 1} MongoDB Indexes

I am designing a schema where a User document stores the Skill ids. A Skill document looks like this:
Skill = {
    id: ObjectId,
    title: String,
    description: String
}
User = {
    id: ObjectId,
    skills: [ObjectId]
}
On the frontend, a user can add Skills by searching for their titles. Therefore, I indexed the skills collection with {id: 1, title: "text"}. I want to know whether the ordering matters when we combine a text index key with a regular one.
Yes, it makes a big difference; the documentation on compound text indexes says:
If the compound text index includes keys preceding the text index key, to perform a $text search, the query predicate must include equality match conditions on the preceding keys.
That means if you do {id: 1, title: 'text'} you can only use the text index on title if you also constrain the search to within a single id.
If you do {title: 'text', id: 1} you will be able to text-search title by itself, or further constrained by id, or retrieve the matching id for the text search results.
Are you sure you need that id column in the index? Why not just the text index?
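A minimal sketch of the two orderings, assuming a skills collection (the id values and search term are illustrative):
db.skills.createIndex({ title: "text", id: 1 })
db.skills.find({ $text: { $search: "javascript" } })              // works: the text key comes first
db.skills.find({ id: 42, $text: { $search: "javascript" } })      // also works, further constrained by id

// Only one text index is allowed per collection, so drop the index above before trying the other order:
db.skills.createIndex({ id: 1, title: "text" })
db.skills.find({ id: 42, $text: { $search: "javascript" } })      // works: equality match on the preceding id key
db.skills.find({ $text: { $search: "javascript" } })              // fails: no equality match on id, so the text index cannot be used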
Yes, it does matter. Straight from the docs:
The order of the fields listed in a compound index is important. The index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by values of the stock field. See Sort Order for more information.
In your current structure, if you only query the title field without any restriction on the id field, you will not be able to utilize the index.

MongoDB schema design: reference by ID vs. reference by name?

Take this simple example (short ObjectIds are used for readability):
Tag documents:
{
_id: ObjectId('0001'),
name: 'JavaScript',
// other data
},
{
_id: ObjectId('0002'),
name: 'MongoDB',
// other data
},
...
Assume that we need an individual tag collection, e.g. we need to store some information about each tag.
If reference by ID:
// a book document
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: [ObjectId('0001'), ObjectId('0002'), ...]
}
If reference by name:
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: ['JavaScript', 'MongoDB', ...]
}
It's known that "reference by ID" is feasible.
I'm thinking that with "reference by name", a query for a book's info only needs to look in the book collection: we would know the tags' names without a join ($lookup) operation, which should be faster.
If the app checks that tags exist before creating or modifying a book, this should also be feasible, and faster.
I'm still not very sure:
Is there any hidden drawback to "reference by name"?
Will "reference by name" be slower for "finding all books with a given tag"? Maybe ObjectId is somehow special?
Thanks.
I would say it depends on what your use case is for tags. As you say, it will be more expensive to do a $lookup to retrieve tag names if you reference by id. On the other hand, if you expect tag names to change frequently, every document in the book collection containing that tag will need to be updated on each change.
An ObjectId is simply a 12-byte value, which is autogenerated by the driver if no _id is present in an inserted document. See the MongoDB docs for more info. The only "special behavior" is the fact that _id has an index by default. An index will speed up lookups in general, but indexes can be created on any field, not just _id.
In fact, the _id does not need to be an ObjectID. It is perfectly legal to have documents with integer _id values for instance:
{
_id: 1,
name: 'Javascript'
},
{
_id: 2,
name: 'MongoDB'
}
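A minimal sketch, assuming a books collection and following the question's shortened ObjectId notation: a multikey index on the tags array supports "finding all books with a given tag" the same way whether the array holds names or ids:
db.books.createIndex({ tags: 1 })                // multikey index on the tags array
db.books.find({ tags: 'MongoDB' })               // reference by name
db.books.find({ tags: ObjectId('0002') })        // reference by id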

Get text words from query

I've read the MongoDB documentation on getting the indexes within a collection, and have also searched SO and Google for my question. I want to get the actual indexed values.
Or maybe my understanding of how MongoDB indexes is incorrect. If I've been indexing a field called text that contains paragraphs, am I right in thinking that what gets indexed is each word in the paragraph?
In either case, I want to retrieve the values that were indexed, which db.collection.getIndexes() doesn't seem to return.
Well yes and no, in summary.
Indexes work on the "values" of the fields they are supplied to index, and are much like a "card index" in that there is a point of reference to look at to find the location of something that matches that term.
What "you" seem to be asking about here is "text indexes". This is a special index format in MongoDB and other databases as well that looks at the "text" content of a field and breaks down every "word" in that content into a value in that "index".
Typically we do:
db.collection.createIndex({ "text": "text" })
Where the "field name" here is "text" as you asked, but more importantly the type of index here is "text".
This allows you to then insert data like this:
db.collection.insert({ "text": "The quick brown fox jumped over the lazy dog" })
And then search like this, using the $text operator:
db.collection.find({ "$text": { "$search": "brown fox" } })
This returns the documents whose indexed "text" matches the terms you gave in your query; each match is assigned a relevance score that can be used to "rank" the results.
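If you want to see and sort by that relevance score, a minimal sketch is to project it with $meta (same collection and search as above):
db.collection.find(
    { "$text": { "$search": "brown fox" } },
    { score: { "$meta": "textScore" } }
).sort({ score: { "$meta": "textScore" } })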
Note that a "text" index and it's query does not interact on a specific field. But the index itself can be made over multiple fields. The query and the constraints on the "index" itself are that there can "only be one" text index present on any given collection otherwise errors will occur.
As per MongoDB's docs:
"db.collection.getIndexes() returns an array of documents that hold index information for the collection. Index information includes the keys and options used to create the index. For information on the keys and index options, see db.collection.createIndex()."
You first have to create the index on the collection, using the createIndex() method:
db.records.createIndex( { userid: 1 } )
Queries on the userid field are supported by the index:
Example:
db.records.find( { userid: 2 } )
db.records.find( { userid: { $gt: 10 } } )
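A minimal sketch to check that such a query actually uses the index: explain() should report an IXSCAN stage rather than a COLLSCAN:
db.records.find( { userid: { $gt: 10 } } ).explain("executionStats")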
Indexes help you avoid scanning the whole collection. They are basically references, or pointers, to specific documents in your collection.
The docs explain it better:
http://docs.mongodb.org/manual/tutorial/create-an-index/

Matching an array field which contains any combination of the provided array in MongoDB

I would like to query with a specified list of array elements such that documents returned can only contain the elements I pass, but need not contain all of them.
Given documents like:
{
name: "Article 1",
tags: ["Funny", "Rad"]
}
{
name: "Article 2",
tags: ["Cool", "Rad"]
}
{
name: "Article 3",
tags: ["Rad"]
}
Here are some example arrays and their respective results.
["Rad"] should return Article 3
["Rad", "Cool"] should return Article 2 and Article 3
["Funny", "Cool"] should return nothing, since there are no articles with only one of those tags or both
I'm sure I can pull this off with $where but I'd like to avoid that for obvious reasons.
You can do this by combining multiple operators:
db.test.find({tags: {$not: {$elemMatch: {$nin: ['Rad', 'Cool']}}}})
The $elemMatch with $nin finds the docs where at least one tags element is neither 'Rad' nor 'Cool', and the parent $not then inverts that match to return all the docs where no such element was found.
However, this will also return docs where tags is either missing or has no elements. To exclude those you need to add a qualifier that ensures tags has at least one element:
db.test.find({
tags: {$not: {$elemMatch: {$nin: ['Rad', 'Cool']}}},
'tags.0': {$exists: true}
})
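A minimal sketch running this against the three example articles (collection name articles is assumed):
db.articles.insertMany([
    { name: "Article 1", tags: ["Funny", "Rad"] },
    { name: "Article 2", tags: ["Cool", "Rad"] },
    { name: "Article 3", tags: ["Rad"] }
])
db.articles.find({
    tags: {$not: {$elemMatch: {$nin: ['Rad', 'Cool']}}},
    'tags.0': {$exists: true}
})
// returns Article 2 and Article 3; Article 1 is excluded because "Funny" is outside the allowed set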
The accepted answer works, but isn't optimised. Since this is the top result on Google, here's a better solution.
I went all the way back to version 2.2 in the docs, which is the oldest version available, and all of them state:
If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (e.g. <value1>, <value2>, etc.)
Source
So you can just do
db.test.find({tags: {$in: ['Rad', 'Cool']}})
which will return any entries where tags contains 'Rad', 'Cool', or both, and it will use an index if one is available.

MongoDB "filtered" index: is it possible?

Is it possible to index some documents of the collection "only if" one of the fields to be indexed has a particular value?
Let me explain with an example:
The collection "posts" has millions of documents, ALL defined as follows:
{
    "network": "network_1",
    "blogname": "blogname_1",
    "post_id": 1234,
    "post_slug": "abcdefg"
}
Let's assume that the posts are split equally between network_1 and network_2.
My application OFTEN selects the type of query based on the value of "network" (although sometimes I need data from both networks):
For example:
www.test.it/network_1/blog_1/**postid**/1234/
-> db.posts.find({ network: "network_1", blogname: "blog_1", post_id: 1234 })
www.test.it/network_2/blog_4/**slug**/aaaa/
-> db.posts.find({ network: "network_2", blogname: "blog_4", post_slug: "aaaa" })
I could create two separate indexes (network / blogname / post_id and network / blogname / post_slug), but that would waste a lot of RAM, since 50% of the data in each index would never be used.
Is there a way to create an index "filtered"?
Example:
(Note the WHERE parameter)
db.posts.ensureIndex({ network: 1, blogname: 1, post_id: 1 }, { where: { network: "network_1" } })
db.posts.ensureIndex({ network: 1, blogname: 1, post_slug: 1 }, { where: { network: "network_2" } })
Indeed, it's possible in MongoDB 3.2+. It's called a partial index: the partialFilterExpression option lets you set a condition, and only documents that satisfy it are included in the index.
Example
db.users.createIndex(
    { "userId": 1, "project": 1 },
    { unique: true,
      partialFilterExpression: { userId: { $exists: true, $gt: { $type: 10 } } } }
)
Please see Partial Index documentation
As of MongoDB v3.2, partial indexes are supported. Documentation: https://docs.mongodb.org/manual/core/index-partial/
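Applied to the question's two query shapes, a minimal sketch would look like this (createIndex is the current name for ensureIndex):
db.posts.createIndex(
    { network: 1, blogname: 1, post_id: 1 },
    { partialFilterExpression: { network: "network_1" } }
)
db.posts.createIndex(
    { network: 1, blogname: 1, post_slug: 1 },
    { partialFilterExpression: { network: "network_2" } }
)
A query must include the filter condition (e.g. network: "network_1") for the planner to consider the corresponding partial index.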
It's possible, but it requires a workaround which creates redundancy in your documents, requires you to rewrite your find-queries and limits find-queries to exact matches.
MongoDB supports sparse indexes which only index the documents where the given field exists. You can use this feature to only index a part of the collection by adding this field only to those documents you want to index.
The bad news is that a sparse index can only include a single field. The good news is that this field can contain an object with multiple fields, so you can still store all the data you want to search for in it.
To do this, add a new field to the documents you want indexed, containing an object with the fields you search for:
{
    "network": "network_1",
    "blogname": "blogname_1",
    "post_id": 1234,
    "post_slug": "abcdefg",
    "network_1_index_key": {
        "blogname": "blogname_1",
        "post_id": 1234
    }
}
Your ensureIndex command would index the field network_1_index_key:
db.posts.ensureIndex( { network_1_index_key: 1 }, { sparse: true } )
A find query that is supposed to use this index must now query for the exact object stored in network_1_index_key:
db.posts.find({
    network_1_index_key: {
        blogname: "blogname_1",
        post_id: 1234
    }
})
Doing this likely only makes sense when the documents you want to index are a very small part of the collection. When it's about half, I would just create a regular index and live with it, because the larger document size could cancel out the gains from the reduced index size.
You can try creating an index on all fields (network / blogname / post_id / post_slug).