Mongo indexing on object arrays vs objects - mongodb

I'm implementing a contact database that handles quite a few fields. Most of them are predefined and can be considered bound, but there are a couple that aren't. We'll call one of these fields 'groups'. The way we currently implement it is (each document/contact has 'groups' field):
'groups' : {
152 : 'hi',
111 : 'group2'
}
but after some reading, it seems I should be doing it like this:
'groups' : [
{ 'id' : 152, 'name' : 'hi' },
{ 'id' : 111, 'name' : 'group2' }
...
]
and then apply the index db.contact.ensureIndex({'groups.id':1});
My question is in regard to functionality. What are the differences between the 2 structures and how is the index actually built (is it simply indexing within each document/contact or is it building a full-scale index that has all the groups from all the documents/contacts?).
I'm kind of going in under the assumption that this is structurally the best way, but if I'm incorrect, let me know.

Querying will certainly be a lot easier in the second case, where 'groups' is an array of sub-documents, each with an 'id' and a 'name'.
Mongo does not support "wildcard" queries, so if your documents were structured the first way and you wanted to find a sub-document with the value "hi", but did not know that the key was 152, you would not be able to do it. With the second document structure, you can easily query for {"groups.name":"hi"}.
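If you already have documents in the first (keyed) form, the conversion to the second form is mechanical. The following is only a minimal sketch in plain JavaScript; groupsMapToArray is a hypothetical helper name, not something MongoDB provides, and it assumes every key is a numeric group id.

```javascript
// Hypothetical helper (not part of MongoDB): converts the keyed form
// {152: 'hi', 111: 'group2'} into an array of {id, name} sub-documents.
function groupsMapToArray(groupsMap) {
  return Object.keys(groupsMap).map(function (key) {
    // Object keys are always strings in JavaScript, so convert back to a number.
    return { id: Number(key), name: groupsMap[key] };
  });
}

var converted = groupsMapToArray({ 152: 'hi', 111: 'group2' });
console.log(JSON.stringify(converted));
```

The order of the resulting array follows JavaScript's key enumeration order, which may differ from the order in which the keys were written.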
For more information on querying embedded objects, please see the documentation titled "Dot Notation (Reaching into Objects)" http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
The "Value in an Array" and "Value in an Embedded Object" sections of the "Advanced Queries" documentation are also useful:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-ValueinanArray
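For an index on {'groups.id':1}, an index entry will be created for every "id" key in every "groups" array in every document, so it is a single collection-wide index rather than one index per document. With an index on "groups" itself, the indexed values are the whole sub-documents (one entry per array element), not their individual fields.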
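As a rough mental model of the 'groups.id' index, the entries can be listed in plain JavaScript. This is a conceptual sketch only: indexEntriesForGroupsId and lookupDocIds are made-up names, and the real index is a B-tree maintained by the server, not an array.

```javascript
// Conceptual model only: list the (key, docId) pairs that an index on
// "groups.id" would cover across the whole collection.
function indexEntriesForGroupsId(docs) {
  var entries = [];
  docs.forEach(function (doc) {
    (doc.groups || []).forEach(function (group) {
      entries.push({ key: group.id, docId: doc._id });
    });
  });
  // Keys are kept sorted so that a lookup does not have to visit every document.
  entries.sort(function (a, b) { return a.key - b.key; });
  return entries;
}

// Find the ids of all documents that contain a group with the given id.
function lookupDocIds(entries, groupId) {
  return entries
    .filter(function (entry) { return entry.key === groupId; })
    .map(function (entry) { return entry.docId; });
}

var sampleDocs = [
  { _id: 1, groups: [{ id: 152, name: 'hi' }, { id: 111, name: 'group2' }] },
  { _id: 2, groups: [{ id: 152, name: 'hi' }] }
];
var sampleEntries = indexEntriesForGroupsId(sampleDocs);
console.log(lookupDocIds(sampleEntries, 152));
```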
If you have documents of the second type, and an index on groups, your queries will have to match entire sub-documents in order to make use of the index. For example, given the document:
{ "_id" : 1, "groups" : [ { "id" : 152, "name" : "hi" }, { "id" : 111, "name" : "group2" } ] }
The query
db.<collectionName>.find({groups:{ "id" : 152, "name" : "hi" }})
will make use of the index, but the queries
db.<collectionName>.find({"groups":{$elemMatch:{name:"hi"}}})
or
db.<collectionName>.find({"groups.name":"hi"})
will not.
The index(es) that you create should depend on which queries you will most commonly be performing.
You can experiment with which (if any) indexes your queries are using with the .explain() command. http://www.mongodb.org/display/DOCS/Explain The first line, "cursor" will tell you which index is being used. "cursor" : "BasicCursor" indicates that a full collection scan is being performed.
There is more information on indexing in the documentation:
http://www.mongodb.org/display/DOCS/Indexes
The "Indexing Array Elements" section of the above links to the document titled "Multikeys":
http://www.mongodb.org/display/DOCS/Multikeys
Hopefully this will improve your understanding of how to query on embedded documents, and how indexes are used. Please let us know if you have any follow-up questions!

Related

MongoDB match a subdocument inside array (not positional reference)

My MongoDB has a key-value pair structure, inside my document has a data field which is an array that contains many subdocuments of two fields: name and value.
How do I search for a subdocument, e.g. {"name":"position", "value":"manager"}, and also for multiple subdocuments (e.g. additionally {"name":"age", "value" : {$gte: 30}})?
EDIT: I am not looking for a specific subdocument as I mentioned in title (not positional reference), rather, I want to retrieve the entire document but I need it to match the two subdocuments exactly.
Here are three queries that find the following record:
{
"_id" : ObjectId("sometobjectID"),
"data" : [
{
"name" : "position",
"value" : "manager"
}
]
}
// Both value and name (in the same subdocument):
db.demo.find({"data": {$elemMatch: {"value": "manager", "name":"position"}}})
// Both value and name (not necessarily in the same subdocument):
db.demo.find({"data.value": "manager", "data.name":"position"})
// Just value:
db.demo.find({"data.value": "manager"})
Note how the . (dot notation) is used; this works for all subdocuments, even if they are in an array.
You can use any operator you like here, including $gte.
edit
$elemMatch was added to the answer because of @Veeram's response.
This answer explains the difference between $elemMatch and . (dot notation).
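The difference between the two matching styles can be imitated in plain JavaScript. This is only an illustration of the semantics for simple equality conditions; matchesElemMatch and matchesDotNotation are hypothetical names, and no MongoDB code is involved.

```javascript
// $elemMatch style: one single array element must satisfy both conditions.
function matchesElemMatch(doc, name, value) {
  return doc.data.some(function (sub) {
    return sub.name === name && sub.value === value;
  });
}

// Dot-notation style: each condition may be satisfied by a different element.
function matchesDotNotation(doc, name, value) {
  var nameFound = doc.data.some(function (sub) { return sub.name === name; });
  var valueFound = doc.data.some(function (sub) { return sub.value === value; });
  return nameFound && valueFound;
}

// "position" and "manager" occur in this document, but in different elements.
var mixedDoc = {
  data: [
    { name: 'position', value: 'clerk' },
    { name: 'title', value: 'manager' }
  ]
};

console.log(matchesElemMatch(mixedDoc, 'position', 'manager'));   // false
console.log(matchesDotNotation(mixedDoc, 'position', 'manager')); // true
```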

MongoDB finding an item in an array and how to index properly

I have documents stored like so:
{
...
"users" : [
{
"id" : "123456",
"username" : "John",
"address" : "fake st",
}
],
...
}
What is the best way to retrieve all the documents with the username "john"? Also, what is the proper way of indexing this for performance, considering it's inside an array? Do I want to index "users", or is there a better way? This is inside a database with 50+ million documents.
To find all the documents which contain at least one "john" in the users array, use db.collection.find({"users.username":"john"});
To make this faster, create an index with db.collection.createIndex({"users.username":1});. Indexes in MongoDB can reach inside arrays and index individual array entries (this is called a multikey index).

Search full document in mongodb for a match

Is there a way to match a value with every array and sub document inside the document in mongodb collection and return the document
{
"_id" : "2000001956",
"trimline1" : "abc",
"trimline2" : "xyz",
"subtitle" : "www",
"image" : {
"large" : 0,
"small" : 0,
"tiled" : 0,
"cropped" : false
},
"Kytrr" : {
"count" : 0,
"assigned" : 0
}
}
For example, if I search the above document for "xyz", "ab", "xy", "z" or "0", this document should be returned.
I actually have to achieve this at the back end using C# driver but a mongo query would also help greatly.
Please advise.
Thanks
You could probably do this using $where:
db.mycollection.find({$where:"JSON.stringify(this).indexOf('xyz')!=-1"})
I'm converting the whole record to one big string and then checking whether your search term appears in it. Note that this also matches when xyz only appears in a field name.
To avoid that, you can iterate through the fields, build one big string from the values only, and search that.
This isn't the most elegant way and will involve a full collection scan. It will be faster if you query the individual fields!
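The idea of searching only the values, rather than the whole serialized record, could look roughly like this in plain JavaScript. It is a sketch that assumes plain JSON-like values (strings, numbers, booleans, nested objects and arrays); collectValues and containsText are hypothetical names, and special types such as ObjectId or dates would need their own handling.

```javascript
// Recursively collect every leaf value of a document as a string,
// ignoring the field names.
function collectValues(value, out) {
  if (value !== null && typeof value === 'object') {
    Object.keys(value).forEach(function (key) {
      collectValues(value[key], out);
    });
  } else {
    out.push(String(value));
  }
  return out;
}

// True if any leaf value contains the search term as a substring.
function containsText(doc, needle) {
  return collectValues(doc, []).some(function (text) {
    return text.indexOf(needle) !== -1;
  });
}

var sampleDoc = {
  _id: '2000001956',
  trimline1: 'abc',
  trimline2: 'xyz',
  subtitle: 'www',
  image: { large: 0, small: 0, tiled: 0, cropped: false },
  Kytrr: { count: 0, assigned: 0 }
};

console.log(containsText(sampleDoc, 'xy'));       // true, found in a value
console.log(containsText(sampleDoc, 'trimline')); // false, only a field name
```

Unlike the JSON.stringify variant, a search for 'trimline' does not match here, because field names are never collected.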
While Malcolm's answer above would work, when your collection gets large or you have high traffic, you'll see this fall over pretty quickly, for two reasons. First, dropping down to JavaScript is expensive, and second, this will always be a full collection scan because $where can't use an index.
MongoDB 2.4 introduced text indexing as a beta feature, and it is enabled by default since 2.6. With it, you can have a full text index on all the string fields in the document. Keep in mind that a text index matches whole words, not arbitrary substrings such as "ab". The documentation gives the following example, which creates a text index over every field and names it "TextIndex".
db.collection.ensureIndex(
{ "$**": "text" },
{ name: "TextIndex" }
)

Querying sub array with $where

I have a collection with following document:
{
"_id" : ObjectId("51f1fd2b8188d3117c6da352"),
"cust_id" : "abc1234",
"ord_date" : ISODate("2012-10-03T18:30:00Z"),
"status" : "A",
"price" : 27,
"items" : [{
"sku" : "mmm",
"qty" : 5,
"price" : 2.5
}, {
"sku" : "nnn",
"qty" : 5,
"price" : 2.5
}]
}
I want to use "$where" in the fields of "items", so something like this:
{$where:"this.items.sku==mmm"}
How can I do it? It works when the field is not of array type.
You don't need a $where operator to do this; just use a query object of:
{ "items.sku": "mmm" }
As for why your $where isn't working, the value of that operator is executed as JavaScript, so that's not going to check each element of the items array, it's just going to treat items as a normal object and compare its sku property (which is undefined) to mmm.
You are comparing this.items.sku to a variable mmm, which isn't defined anywhere. What you want to do is iterate the array and compare each entry to the string 'mmm'. This example does so using the array method some, which returns true when the passed function returns true for at least one of the entries:
{$where:"return this.items.some(function(entry){return entry.sku =='mmm'})"}
But really, don't do this. In a comment to the answer by JohnnyHK you said "my service is just a interface between user and mongodb, totally unaware what the field client want's to store". You aren't really explaining your use-case, but I am sure you can solve this better.
The $where operator invokes the JavaScript engine even though this trivial expression could be done with a normal query. This means unnecessary performance overhead.
Every single document in the collection is passed to the function, so even when you have an index, it cannot be used.
When the JavaScript function is generated from something provided by the client, you must be careful to sanitize and escape it properly, or your application becomes vulnerable to code injection.
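The array behaviour behind the failing $where expression can be checked in plain JavaScript, outside MongoDB. This small sketch uses the items array from the question; the variable names are only for illustration.

```javascript
var order = {
  items: [
    { sku: 'mmm', qty: 5, price: 2.5 },
    { sku: 'nnn', qty: 5, price: 2.5 }
  ]
};

// An array has no "sku" property of its own, so this is undefined.
var directAccess = order.items.sku;

// Array.prototype.some checks the elements one by one instead.
var hasMmm = order.items.some(function (entry) {
  return entry.sku === 'mmm';
});

console.log(directAccess); // undefined
console.log(hasMmm);       // true
```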
I've been reading through your comments in addition to the question. It sounds like your users can generically add some attributes, which you are storing in an array within a document. Your client needs to be able to query an arbitrary pair from the document in a generic manner. The pattern to achieve this is typically as follows:
{
.
.
attributes:[
{k:"some user defined key",
v:"the value"},
{k: ,v:}
.
.
]
}
Note that in your case, items plays the role of attributes. To get the document, your query will be something like:
db.collection.find({attributes:{$elemMatch:{k:"sku",v:"mmm"}}});
(with an index on attributes.k and attributes.v)
This allows your service to provide a way to query the data while letting the client specify what the k,v pairs are. The one caveat with this design is that documents have a 16MB limit (unless you have a use case that makes GridFS appropriate). Operators like $slice may help with controlling this.
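On the service side, the k/v pattern needs two small pieces: turning whatever the client sends into the attributes array, and building the query object from a client-supplied pair. The following plain JavaScript sketch shows one possible shape; toAttributes and buildAttributeQuery are hypothetical helper names, and the resulting object is what you would pass to find().

```javascript
// Turn a flat object of client-defined fields into the k/v array form.
function toAttributes(fields) {
  return Object.keys(fields).map(function (key) {
    return { k: key, v: fields[key] };
  });
}

// Build the query object for one key/value pair. Because the pair is passed
// as data, no client-supplied JavaScript is ever evaluated.
function buildAttributeQuery(key, value) {
  return { attributes: { $elemMatch: { k: key, v: value } } };
}

var stored = { attributes: toAttributes({ sku: 'mmm', qty: 5 }) };
var query = buildAttributeQuery('sku', 'mmm');

console.log(JSON.stringify(stored));
console.log(JSON.stringify(query));
```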

How do you get around missing values in a unique index using mongo db?

The mongo documentation states that "When a document is saved to a collection with unique indexes, any missing indexed keys will be inserted with null values. Thus, it won't be possible to insert multiple documents missing the same indexed key."
So is it impossible to create a unique index on an optional field? Should I create a compound index with say a userId as well to solve this? In my specific case I have a user collection that has an optional embedded oauth object.
e.g.
>db.users.ensureIndex( { "name":1, "oauthConnections.provider" : 1, "oauthConnections.providerId" : 1 } );
My sample user
{ name: "Bob"
,pwd: "myPwd"
,oauthConnections: [
{
"provider":"Facebook",
"providerId" : "12345",
"key":"blah"
}
,{
"provider":"Twitter",
"providerId" : "67890",
"key":"foo"
}
]
}
I believe that this is possible: you can have an index that is both sparse and unique. This way, non-existent values never make it into the index, hence they can't be duplicates.
Caveat: This is not possible with compound indexes. I'm not quite sure about your question. You're citing a part of the documentation that concerns compound indexes (there, missing values will be inserted), but from your question I guess you're not looking for a solution with compound indexes?
Here's a sample:
> db.Test.insert({"myId" : "1234", "string": "foo"});
> show collections
Test
system.indexes
>
> db.Test.find();
{ "_id" : ObjectId("4e56e5260c191958ad9c7cb1"), "myId" : "1234", "string" : "foo" }
>
> db.Test.ensureIndex({"myId" : 1}, {sparse: true, unique: true});
>
> db.Test.insert({"myId" : "1234", "string": "Bla"});
E11000 duplicate key error index: test.Test.$myId_1 dup key: { : "1234" }
>
> db.Test.insert({"string": "Foo"});
> db.Test.insert({"string": "Bar"});
> db.Test.find();
{ "_id" : ObjectId("4e56e5260c191958ad9c7cb1"), "myId" : "1234", "string" : "foo" }
{ "_id" : ObjectId("4e56e5c30c191958ad9c7cb4"), "string" : "Foo" }
{ "_id" : ObjectId("4e56e5c70c191958ad9c7cb5"), "string" : "Bar" }
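Also note that compound indexes can't be sparse (this was the case when this answer was written; newer MongoDB releases document sparse compound indexes).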
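The behaviour seen in the shell session above can be modelled in a few lines of plain JavaScript. This is a simplified sketch, not MongoDB's implementation: SparseUniqueIndex is a made-up class that only tracks which values of one field have been seen.

```javascript
// Simplified model of a unique, sparse index on a single field.
function SparseUniqueIndex(field) {
  this.field = field;
  this.keys = new Set();
}

// Returns true if the document was indexed, false if it was skipped.
// Throws if the value is already present (unique constraint).
SparseUniqueIndex.prototype.insert = function (doc) {
  // Sparse: documents that lack the field never enter the index.
  if (!Object.prototype.hasOwnProperty.call(doc, this.field)) {
    return false;
  }
  var key = doc[this.field];
  if (this.keys.has(key)) {
    throw new Error('duplicate key: ' + key);
  }
  this.keys.add(key);
  return true;
};

var myIdIndex = new SparseUniqueIndex('myId');
myIdIndex.insert({ myId: '1234', string: 'foo' }); // indexed
myIdIndex.insert({ string: 'Foo' });               // skipped, no myId
myIdIndex.insert({ string: 'Bar' });               // skipped as well
try {
  myIdIndex.insert({ myId: '1234', string: 'Bla' });
} catch (err) {
  console.log(err.message); // duplicate key: 1234
}
```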
It is not impossible to index an optional field. The docs are talking about a unique index. Once you've specified a unique index, you can only insert one document per value for that field, even if that value is null.
If you want a unique index on an optional field but still allow multiple nulls, you could try making the index both unique and sparse, although I have no idea if that's possible. I couldn't find an answer in the documentation.
There's no good way to uniquely index an optional field. You can either fill it with a default (the _id on the user would work), let your access layer enforce uniqueness, or change your "schema" a bit.
We have a separate collection for oauth login tokens, partially for this reason. We never really need to access those in a context where having them as embedded docs is an obvious win. If this is a relatively easy change to make, it's probably your best bet.
----edit----
As the other answers point out, you can achieve this with a sparse index. It's even a documented use. You should probably accept one of those answers instead of mine.
http://www.mongodb.org/display/DOCS/Indexes#Indexes-SparseIndexes