MongoDB finding an item in an array and how to index properly - mongodb

I have documents stored like so:
{
...
"users" : [
{
"id" : "123456",
"username" : "John",
"address" : "fake st",
}
],
...
}
What is the best way to retrieve all the documents with the username "john"? Also, what is the proper way to index this for performance, considering it's inside an array? Do I want to index "users", or is there a better way? This is inside a database with 50+ million documents.

To find all the documents which contain at least one "john" in the users array, use db.collection.find({"users.username":"john"});. The match is case-sensitive, so this query will not match a stored value of "John".
To make this faster, create an index with db.collection.createIndex({"users.username":1});. Indexes in MongoDB can reach inside arrays and index individual array entries (this is called a multikey index).
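To make the multikey idea concrete, here is a minimal in-memory sketch in plain JavaScript (runnable under Node.js, no database needed). It is not how MongoDB implements indexes internally; buildMultikeyIndex, findByUsername and the sample documents are hypothetical names made up for illustration. It only shows that every array element contributes its own entry, so a lookup by username does not need to scan each document's array.

```javascript
// Hypothetical sample documents shaped like the one in the question.
const docs = [
  { _id: 1, users: [{ id: "123456", username: "john" }, { id: "234567", username: "mary" }] },
  { _id: 2, users: [{ id: "345678", username: "mary" }] },
  { _id: 3, users: [] },
];

// Map each users[].username value to the set of _ids of the documents containing it.
function buildMultikeyIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    for (const user of doc.users) {
      if (!index.has(user.username)) index.set(user.username, new Set());
      index.get(user.username).add(doc._id);
    }
  }
  return index;
}

const usernameIndex = buildMultikeyIndex(docs);

// Similar in spirit to find({"users.username": name}) answered from the index.
function findByUsername(name) {
  return [...(usernameIndex.get(name) || [])];
}

console.log(findByUsername("john")); // ids of documents with at least one "john"
console.log(findByUsername("mary"));
```

Because the keys are compared exactly, looking up "John" in this sketch finds nothing, which mirrors the case-sensitive matching of the real query.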

Related

Return MongoDB documents that don't contain specific inner array items

How can I return a set of documents, each not containing a specific item in an inner array?
My data scheme is:
Posts:
{
"_id" : ObjectId("57f91ec96241783dac1e16fe"),
"votedBy" : [
{
"userId" : "101",
"vote": 1
},
{
"userId" : "202",
"vote": 2
}
],
"__v" : NumberInt(0)
}
I want to return a set of posts, none of which contain a given userId in any of the votedBy array items.
The official documentation implies that this is possible:
MongoDB documentation: Field with no specific array index
Though it returns an empty set (for the more simple case of finding a document with a specific array item).
It seems like I have to know the index for a correct set of results, like:
votedBy.0.userId.
This Question is the closest I found, with this solution (Applied on my scheme):
db.collection.find({"votedBy": { $not: {$elemMatch: {userId: 101 } } } })
It works fine if the only inner document in the array matches the one I wish not to return, but in the example case I specified above, the document returns, because it finds the userId=202 inner document.
Just to clarify: I want to return all the documents, that NONE of their votedBy array items have the given userId.
I also tried a simpler array, containing only the userId's as an array of Strings, but still, each of them receives an Id and the search process is just the same.
Another solution I tried is using a different collection for uservotes, and applying a lookup to perform a SQL-similar join, but it seems like there is an easier way.
I am using mongoose (node.js).
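Use $ne on the embedded userId:
db.collection.find({'votedBy.userId': {$ne: '101'}})
This filters out every document that has at least one votedBy element with userId "101". The value must be the string '101', as stored; the number 101 would not match any element.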
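The semantics of that $ne query can be sketched in plain JavaScript, without a database. This is an illustration under assumptions, not MongoDB's implementation: the posts array and the notVotedBy helper are hypothetical. A post is kept only when no element of its votedBy array has the given userId.

```javascript
// Hypothetical posts shaped like the one in the question.
const posts = [
  { _id: "a", votedBy: [{ userId: "101", vote: 1 }, { userId: "202", vote: 2 }] },
  { _id: "b", votedBy: [{ userId: "202", vote: 1 }] },
  { _id: "c", votedBy: [] },
];

// Mirrors {'votedBy.userId': {$ne: userId}}: keep a post only if
// none of its votedBy elements carries that userId.
function notVotedBy(allPosts, userId) {
  return allPosts.filter((post) => post.votedBy.every((v) => v.userId !== userId));
}

console.log(notVotedBy(posts, "101").map((p) => p._id));
// Passing the number 101 excludes nothing, because the stored ids are strings.
console.log(notVotedBy(posts, 101).map((p) => p._id));
```

The strict comparison in the sketch stands in for the type-sensitive comparison of the real query, which is why the string and the number behave differently.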

Is it possible to make a "do not modify" constraint on MongoDB subdocuments at creation?

I'd like to make a specific subdocument value in a MongoDB document fixed, so that it cannot be modified by a later update or by any other MongoDB operation that can modify documents.
For example, if a document like the one below is inserted, I would like the "eyesColor" value to be unchangeable.
{
"id" : "someId",
"name": "Jane",
"eyesColor" : "blue"
}
A possible update can be:
{
"id" : "someId",
"name": "Amy",
"eyesColor" : "green"
}
And the result I need after this update is :
{
"id" : "someId",
"name": "Amy",
"eyesColor" : "blue"
}
I'd like to do this because using the $set and $unset operators is not possible in the project I'm creating. Reading the existing document before the update, in order to get the value of the subdocument ("eyesColor"), would decrease the performance of the application I work on.
Actually, the constraint I need is similar to the fixed size of capped collections. The difference is that it applies to a subdocument instead of a collection, and to the value contained in the subdocument instead of the size.
Is there any solution to this type of constraint?
Apart from unique indexes, MongoDB has no constraints of this kind. (Newer versions add document validation, but a validator only sees the resulting document, so it cannot compare against the previous value.) There is no way to make fields "read-only" at the database layer.
If you use upserts (db.collection.update with upsert: true) and want certain fields to be written when a new document is inserted but left untouched when an existing document is updated, you can place those fields under the $setOnInsert operator.
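The effect of $setOnInsert can be sketched with an in-memory stand-in for a collection. This is a hedged illustration only: the upsert helper below is hypothetical, handles just the $set and $setOnInsert operators, and is not the driver API. The update documents passed to it have the same shape you would pass to db.collection.update with upsert: true.

```javascript
// In-memory stand-in for a collection, keyed by the document's "id" field.
const collection = new Map();

// Hypothetical helper mimicking an upsert for $set and $setOnInsert only.
function upsert(id, update) {
  const existing = collection.get(id);
  if (existing === undefined) {
    // Insert path: fields from both $setOnInsert and $set are written.
    collection.set(id, { id, ...update.$setOnInsert, ...update.$set });
  } else {
    // Update path: $setOnInsert is ignored, so those fields keep their first value.
    Object.assign(existing, update.$set);
  }
  return collection.get(id);
}

upsert("someId", { $set: { name: "Jane" }, $setOnInsert: { eyesColor: "blue" } });
const result = upsert("someId", { $set: { name: "Amy" }, $setOnInsert: { eyesColor: "green" } });
console.log(result); // name is updated, eyesColor keeps the value from the insert
```

This only protects the field as long as every writer goes through $setOnInsert; any other update can still change it, which is why it is a convention rather than a constraint.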

Mongo indexing on object arrays vs objects

I'm implementing a contact database that handles quite a few fields. Most of them are predefined and can be considered bound, but there are a couple that aren't. We'll call one of these fields 'groups'. The way we currently implement it is (each document/contact has 'groups' field):
'groups' : {
152 : 'hi',
111 : 'group2'
}
but after some reading it would seem I should be doing it like this:
'groups' : [
{ 'id' : 152, 'name' : 'hi' },
{ 'id' : 111, 'name' : 'group2' }
...
]
and then apply the index db.contact.ensureIndex({'groups.id':1});
My question is in regard to functionality. What are the differences between the 2 structures and how is the index actually built (is it simply indexing within each document/contact or is it building a full-scale index that has all the groups from all the documents/contacts?).
I'm kind of going in under the assumption that this is structurally the best way, but if I'm incorrect, let me know.
Querying will certainly be a lot easier in the second case, where 'groups' is an array of sub-documents, each with an 'id' and a 'name'.
Mongo does not support "wildcard" queries, so if your documents were structured the first way and you wanted to find a sub-document with the value "hi", but did not know that the key was 152, you would not be able to do it. With the second document structure, you can easily query for {"groups.name":"hi"}.
For more information on querying embedded objects, please see the documentation titled "Dot Notation (Reaching into Objects)" http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
The "Value in an Array" and "Value in an Embedded Object" sections of the "Advanced Queries" documentation are also useful:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-ValueinanArray
For an index on {'groups.id':1}, an index entry will be created for every "id" key in every "groups" array in every document. With an index on "groups", an index entry is likewise created for each array element, but it is keyed on the entire sub-document rather than on a single field.
If you have documents of the second type, and an index on groups, your queries will have to match entire sub-documents in order to make use of the index. For example, given the document:
{ "_id" : 1, "groups" : [ { "id" : 152, "name" : "hi" }, { "id" : 111, "name" : "group2" } ] }
The query
db.<collectionName>.find({groups:{ "id" : 152, "name" : "hi" }})
will make use of the index, but the queries
db.<collectionName>.find({"groups":{$elemMatch:{name:"hi"}}})
or
db.<collectionName>.find({"groups.name":"hi"})
will not.
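Why the whole-sub-document query can use an index on "groups" while the partial queries cannot may be easier to see with a small in-memory sketch. It is a simplification under assumptions, not MongoDB's index format: buildIndex is a hypothetical helper, and JSON.stringify stands in for the way a whole sub-document becomes a single key.

```javascript
// Hypothetical contacts using the array-of-sub-documents layout.
const contacts = [
  { _id: 1, groups: [{ id: 152, name: "hi" }, { id: 111, name: "group2" }] },
  { _id: 2, groups: [{ id: 152, name: "hi" }] },
];

// Index the elements of each groups array under keyOf(element).
function buildIndex(documents, keyOf) {
  const index = new Map();
  for (const doc of documents) {
    for (const group of doc.groups) {
      const key = keyOf(group);
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// Like an index on {"groups.id": 1}: keyed by the id field alone.
const byGroupId = buildIndex(contacts, (g) => g.id);
// Like an index on {"groups": 1}: keyed by the whole sub-document.
const byWholeGroup = buildIndex(contacts, (g) => JSON.stringify(g));

console.log(byGroupId.get(152));
console.log(byWholeGroup.get(JSON.stringify({ id: 152, name: "hi" })));
// A partial sub-document is not a key, so this lookup finds nothing.
console.log(byWholeGroup.get(JSON.stringify({ name: "hi" })));
```

In the sketch, as with exact sub-document matches in MongoDB, field order matters too: { name: "hi", id: 152 } serializes to a different key than { id: 152, name: "hi" }.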
The index(es) that you create should depend on which queries you will most commonly be performing.
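You can check which index (if any) a query uses with the .explain() command. http://www.mongodb.org/display/DOCS/Explain In the MongoDB versions this answer was written for, the first line, "cursor", tells you which index is being used, and "cursor" : "BasicCursor" indicates that a full collection scan is being performed. In MongoDB 3.0 and later the output contains a "winningPlan" instead, where an IXSCAN stage means an index is used and COLLSCAN means a full collection scan.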
There is more information on indexing in the documentation:
http://www.mongodb.org/display/DOCS/Indexes
The "Indexing Array Elements" section of the above links to the document titled "Multikeys":
http://www.mongodb.org/display/DOCS/Multikeys
Hopefully this will improve your understanding of how to query on embedded documents, and how indexes are used. Please let us know if you have any follow-up questions!

How to Retrieve any element value from mongoDB?

Suppose I have following collection :
{ "_id" : ObjectId("4f1d8132595bb0e4830d15cc"),
"Data" : "[
{ "id1": "100002997235643", "from": {"name": "Joannah" ,"id": "100002997235643"} , "label" : "test" } ,
{ "id1": "100002997235644", "from": {"name": "Jon" ,"id": "100002997235644"} , "label" : "test1" }
]" ,
"stat" : "true"
}
How can I retrieve id1 , name , id ,label or any other element?
I am able to get _id field , DATA (complete array) but not the inner elements in DATA.
A query always returns top-level documents; you cannot have it return individual embedded elements as results of their own. If you want to query for individual elements from your array, you will have to make those elements top-level documents (so, put them in their own collection) and maintain an array of _ids in this document.
That said, unless the array becomes very large it's almost always more efficient to simply grab your entire document and find the appropriate element in your app.
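Picking the element out in application code could look like the following sketch. It assumes "Data" comes back as a real array rather than as a string, and it uses a hard-coded doc object in place of the result of an actual query; findElement is a hypothetical helper name.

```javascript
// Hypothetical document as it might come back from a findOne() call,
// assuming "Data" is stored as an array of sub-documents.
const doc = {
  _id: "4f1d8132595bb0e4830d15cc",
  Data: [
    { id1: "100002997235643", from: { name: "Joannah", id: "100002997235643" }, label: "test" },
    { id1: "100002997235644", from: { name: "Jon", id: "100002997235644" }, label: "test1" },
  ],
  stat: "true",
};

// Pick one element out of the array by its id1 value (undefined if absent).
function findElement(document, id1) {
  return document.Data.find((el) => el.id1 === id1);
}

const el = findElement(doc, "100002997235644");
console.log(el.id1, el.from.name, el.from.id, el.label);
```

If "Data" is in fact stored as a JSON string, as the quoting in the question suggests it might be, it would first have to be parsed (for example with JSON.parse) before the array methods apply.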
I don't think you can do that. It is explained here.
If you want to access specific fields, then following MongoDB Documentation,
you could add a flag parameter to your query, but you should redesign your documents for this to be useful:
Field Selection
In addition to the query expression, MongoDB queries can take some additional arguments. For example, it's possible to request only certain fields be returned. If we just wanted the social security numbers of users with the last name of 'Smith,' then from the shell we could issue this query:
// retrieve ssn field for documents where last_name == 'Smith':
db.users.find({last_name: 'Smith'}, {'ssn': 1});
// retrieve all fields *except* the thumbnail field, for all documents:
db.users.find({}, {thumbnail:0});

Can a MongoDB collection have inside it another collection?

I need to store a recursive tree structure, similar to a linked list.
All the objects are the same: each has a pointer to a parent object and an array of child objects.
Can I store such a structure in Mongo?
I.e., a Mongo collection of parent objects, where each object holds within it a Mongo collection of child objects.
$a = $MyCollection->findOne(**some conditions)->Childs->find(...)
You can't store collections in collections, but you can store ids that reference objects in other collections. You would have to resolve each id to its document or element, and if that element stores more ids, resolve those in turn, and so on. Documents are meant to be rich and may duplicate data, but the docs do explain that you can use ids instead of embedding.
MongoDB can store subdocuments:
Node
{
"value" : "root",
"children" : [ { "value" : "child1", "children" : [ ... ] },
{ "value" : "child2", "children" : [ ... ] } ]
}
However, I don't recommend using subdocuments for tree structures or anything else that is rather complex. Subdocuments are not first-class citizens; they are not collection items.
For instance, suppose you wanted to quickly find the nodes with a given value. With an index on value, that lookup would be fast. However, if the value sits in a subdocument nested at an arbitrary depth, that index won't cover it, because it only applies to the value field of the collection's top-level documents.
Therefore, it's usually better to do the serialization manually and store ids instead:
Node
{
"_id" : ObjectId("..."),
"parentId" : ObjectId("..."), // or null, for root
}
You'll have to do some of the serialization manually to fetch the respective element's ids.
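As a rough sketch of what that manual resolution involves, the following plain JavaScript works on an in-memory list of nodes that store only a parentId. The nodes array and the helper names childrenOf and pathToRoot are hypothetical; against a real collection, childrenOf would correspond to a query on parentId, and each step of pathToRoot to a separate lookup by _id.

```javascript
// Hypothetical flat node list: each node stores only its parent's id (null for the root).
const nodes = [
  { _id: "root", parentId: null },
  { _id: "a", parentId: "root" },
  { _id: "b", parentId: "root" },
  { _id: "a1", parentId: "a" },
];

// Direct children: the in-memory counterpart of find({parentId: nodeId}).
function childrenOf(allNodes, nodeId) {
  return allNodes.filter((n) => n.parentId === nodeId).map((n) => n._id);
}

// Path from a node up to the root, resolving one parentId at a time.
function pathToRoot(allNodes, nodeId) {
  const byId = new Map(allNodes.map((n) => [n._id, n]));
  const path = [];
  let current = byId.get(nodeId);
  while (current) {
    path.push(current._id);
    current = current.parentId === null ? undefined : byId.get(current.parentId);
  }
  return path;
}

console.log(childrenOf(nodes, "root"));
console.log(pathToRoot(nodes, "a1"));
```

The loop in pathToRoot makes the cost visible: with only a parentId per node, walking a deep tree takes one resolution step per level.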
Hint
Suppose you want to fetch an entire branch of the tree. Instead of storing only the direct parent id, you can store all ancestor ids instead:
"ancestorIds": [id1, id2, id3]