Optimizing mongo queries - _id or traverse whole collection

I'm using MongoDB for a project and need to know which would be the better implementation for queries.
Say I have to find 10 documents out of a total of 1000 based on a condition (not the id).
Would it be better to query by document _id (after storing the required ids in another collection beforehand, by checking the condition whenever a document is inserted)
OR
would it be better to traverse all the documents and pick out the required ones using the condition?
The main aim here is to split documents into different categories and display the documents belonging to a particular category. So should I store the ids of the documents belonging to each category, or search for the documents in that category by traversing all the documents?
I have heard that MongoDB uses hashed indexing (so I feel option 1 would be faster), but I couldn't find anything about that. A short description of document storage and queries would also be helpful.

The optimum way to query for the cuisine type example would be to store what the restaurant serves in an array of strings or objects, and index that field.
For example:
{
  name: "International House",
  cuisine: [
    { name: "Chinese", subtype: "Kowloon" },
    { name: "Japanese", subtype: "Yakitori" },
    { name: "American", subtype: "TexMex" }
  ]
}
Then create an index on { "cuisine.name": 1 }.
When you need to find all restaurants that serve Chinese food, the query:
db.collection.find({"cuisine.name": "Chinese"})
will use that index, and only scan the documents that match.
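To see why the indexed query touches only the matching documents, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB stores indexes internally (its ordinary indexes are B-tree based rather than hash tables); the `restaurants` data and the `buildMultikeyIndex` helper are hypothetical and exist only for illustration.

```javascript
// Hypothetical sample data, shaped like the document above.
const restaurants = [
  { name: "International House", cuisine: [{ name: "Chinese" }, { name: "Japanese" }] },
  { name: "Noodle Bar", cuisine: [{ name: "Chinese" }] },
  { name: "Taco Stand", cuisine: [{ name: "American" }] },
];

// Build a lookup from each cuisine name to the documents that contain it.
// A document is registered once per distinct array value, which is the
// idea behind an index on an array field.
function buildMultikeyIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const c of doc.cuisine) {
      if (!index.has(c.name)) index.set(c.name, new Set());
      index.get(c.name).add(doc);
    }
  }
  return index;
}

const cuisineIndex = buildMultikeyIndex(restaurants);

// A lookup goes straight to the matching documents instead of scanning all of them.
const chinese = [...(cuisineIndex.get("Chinese") || [])].map((d) => d.name);
console.log(chinese);
```

The cost of the lookup depends on the number of matches, not on the size of the collection, which is the property the asker wanted from a separate collection of ids.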

Related

How to create a collection in a document in MongoDB?

I am using MongoDB for the first time, and I have some experience with NoSQL databases.
I am attempting to replicate behaviour that I have managed to achieve on Google's Cloud Firestore:
I want to create a collection within a document. I have not been able to replicate this behaviour using MongoDB as I cannot find code in the documentation. Is this behaviour even possible please?
Thanks in advance.
Edit:
Here is a screenshot of a sample document in biometric_data :
MongoDB has embedded documents which can be used to store the same data. You can try creating an array of sub-documents (each having name and data property):
{
  name: "",
  email: "",
  ...otherFields,
  biometric_data: [
    {
      name: "glucose",
      data: {
        preferred_unit: "mg/dL"
        // Add new properties as required
      }
    },
    {
      name: "weight",
      data: {
        preferred_unit: "KG"
      }
    }
  ],
  ...templateData
}
However, a document's size in MongoDB cannot exceed 16 MB. If the number of entries in biometric_data is limited, you can use sub-documents; otherwise you might have to create another collection and store them there as documents (generally preferred for chat apps, or wherever the number of sub-documents can be really high).
Sub-collections (in Firestore) allow you to structure data hierarchically, making data easier to access. For example, users and posts collections can be structured in either of the ways below:
With sub-collection
users -> {userId} -> posts -> {postId}
Root level collections
users -> {userId}
posts -> {postId}
If you use root-level collections, though, you must add a userId to each posts document to identify the owner of the post.
If you nest the documents in MongoDB, you are likely to hit the 16 MB document limit if any user decides to add many posts. Similarly, if the biometric_data array can hold many documents, it is best to create another collection.
Firestore's sub-collections and their documents do not count towards the 1 MB maximum size of the parent document, but nested documents in MongoDB do count towards the parent's limit.
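The root-level layout can be sketched in plain JavaScript, with two arrays standing in for the two collections. The sample `users` and `posts` data and the `postsByUser` helper are hypothetical; in MongoDB the lookup would be a query on an indexed `userId` field rather than an array filter.

```javascript
// Two arrays stand in for two root-level collections (hypothetical data).
const users = [
  { _id: "u1", name: "Ann" },
  { _id: "u2", name: "Bob" },
];
const posts = [
  { _id: "p1", userId: "u1", title: "First post" },
  { _id: "p2", userId: "u2", title: "Hello" },
  { _id: "p3", userId: "u1", title: "Second post" },
];

// Each post carries its owner's id, so the user document never grows,
// no matter how many posts that user writes.
function postsByUser(allPosts, userId) {
  return allPosts.filter((p) => p.userId === userId);
}

const ann = users.find((u) => u.name === "Ann");
const annsPosts = postsByUser(posts, ann._id).map((p) => p._id);
console.log(annsPosts);
```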
Also check out:
Firestore - proper NoSQL structure for user-specific data
Is mongodb sub documents equivalent to Firestore subcollections?

Mongodb: searching embedded documents by the '_id' field

If I have a data with a structure like this as a single document in a collection:
{
  _id: ObjectId("firstid"),
  "name": "sublimetest",
  "child": {
    _id: ObjectId("childid"),
    "name": "materialtheme"
  }
}
Is there a way to search for the embedded document by the id "childid"?
As far as I can tell, mongo doesn't index the _id fields of embedded documents (correct me if I am wrong here),
because this query doesn't work:
db.collection.find({_id:"childid"});
Also, please suggest any other document database that would be suitable for retrieving data structured as a tree like this, where the requirement is to:
query children without having to issue joins
find any node in the tree as fast as you would find the root node, as if all these nodes were stored as separate documents in a collection.
Why this is not a duplicate of question(s) suggested :
The potential duplicate question queries the document using dot notation. But what if the document is nested 7 levels deep? In that case it would not be practical to write a query using dot notation. What I want is for all documents, whether top level or nested, to be in the bucket of _id indexes if they have an _id field, so that a search like db.collection.find({_id: "asdf"}) also takes into account nested documents whose _id field matches "asdf". In short, it should be as if the inner document weren't nested, but present parallel to the outer one.
You can use dot notation. The value must have the same type as the stored _id, which is an ObjectId in your example:
db.posts.find({"child._id": ObjectId("childid")})
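Dot notation needs the full path to the field, so it does not help when the depth is unknown. One common alternative is to store every node as its own top-level document with a reference to its parent, so that the regular _id index covers all of them. Below is a minimal sketch of that flattening in plain JavaScript; the `flattenTree` helper, the `parent` field name, and the assumption that each node has at most one child under a `child` key are hypothetical choices made for this example.

```javascript
// Turn a nested tree into a flat list of documents, one per node.
// Each node keeps its own fields plus a `parent` reference.
function flattenTree(node, parentId = null, out = []) {
  const { child, ...fields } = node;
  out.push({ ...fields, parent: parentId });
  if (child) flattenTree(child, node._id, out);
  return out;
}

const nested = {
  _id: "firstid",
  name: "sublimetest",
  child: { _id: "childid", name: "materialtheme" },
};

const flat = flattenTree(nested);
console.log(flat);
```

Once the flattened nodes are inserted into one collection, any node can be found with a plain _id query, whatever its depth was in the original tree, and its children can be found by querying on `parent`.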

How to query all documents, filter for a specific field and return the value for each document in Elasticsearch?

I'm currently running an Elasticsearch instance which is synchronized from a MongoDB via a river. The MongoDB contains entries like this:
{field1: "value1", field2: "value2", cars: ["BMW", "Ford", "Porsche"]}
Not every entry in Mongo has a cars field.
Now I want to create an Elasticsearch query which searches over every document and returns just the cars field from every single document indexed in Elasticsearch.
Is it even possible? Elasticsearch must touch every single document to return the cars field. Maybe querying with Mongo is just easier and as fast as Elasticsearch. What do you think?
The following query POSTed to hostname:9200/_search should get you started:
{
  "filter": {
    "exists": {
      "field": "cars"
    }
  },
  "fields": ["cars"]
}
The filter clause limits the results to documents with a cars field.
The fields clause says to only return the cars field. If you wanted the entire document returned, you would leave this section out.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#_response_filtering
Make elasticsearch only return certain fields?
Elasticsearch (from my understanding) is not intended to be a single source of truth (SSoT) database. It is very good at text search and analytics aggregations, but it isn't necessarily intended to be your primary database.
That said, your use case won't necessarily perform badly in Elasticsearch. It sounds like you just want to filter on your cars field, which you can do as documented here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fields.html
Lastly, I would venture that Elasticsearch is actually faster than Mongo in this case (assuming that the cars field is NOT indexed in Mongo and is indexed in Elasticsearch, which are their respective defaults), since you probably want to filter out the documents in which the cars field is not set.
tl;dr Elasticsearch isn't intended for your particular use case, but it is probably faster than Mongo, assuming you filter out the documents where the cars field is missing.

Mongo _id for subdocument array

I wish to add an _id as property for objects in a mongo array.
Is this good practice ?
Are there any problems with indexing ?
I wish to add an _id as property for objects in a mongo array.
I assume:
{
  g: [
    { _id: ObjectId(), property: '' },
    // next
  ]
}
Type of structure for this question.
Is this good practice ?
Not normally. _ids are unique identifiers for entities. As such if you are looking to add _id within a sub-document object then you might not have normalised your data very well and it could be a sign of a fundamental flaw within your schema design.
Sub-documents are designed to contain repeating data for that document, e.g. the addresses of a user.
That being said _id is not always a bad thing to add. Take the example I just stated with addresses. Imagine you were to have a shopping cart system and (for some reason) you didn't replicate the address to the order document then you would use an _id or some other identifier to get that sub-document out.
Also you have to take into consideration linking documents. If that _id describes another document and the properties are custom attributes for that document in relation to that linked document then that's okay too.
Are there any problems with indexing ?
An ObjectId is still quite sizeable so that is something to take into consideration over a smaller, less unique id or not using an _id at all for sub-documents.
For indexes, it doesn't really work any differently from the standard _id field on the document itself, and a unique index on the field should work across the collection (this is scenario dependent, so test your queries).
NB: MongoDB will not add an _id to sub-documents for you.
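Since MongoDB does not generate these ids, the application has to. The sketch below shows the idea in plain JavaScript. The `withIds` helper and its counter-based ids are hypothetical stand-ins; with a MongoDB driver you would more likely create an ObjectId for each element.

```javascript
// Give every array element an _id before the document is saved.
// Elements that already have an _id are left untouched.
// The counter is a stand-in for a real id generator.
function withIds(items, nextId = 1) {
  return items.map((item) => ("_id" in item ? item : { _id: nextId++, ...item }));
}

const doc = { g: withIds([{ property: "a" }, { property: "b" }]) };

// With ids in place, one element can be picked out unambiguously.
const second = doc.g.find((item) => item._id === 2);
console.log(second);
```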

MongoDB - Query embedded documents

I have a collection named Events. Each Event document has a collection of Participants as embedded documents.
Now my question: is there a way to query an Event and get only the Participants that match a condition, e.g. Age > 18?
When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside the event document called "Over18" or something similar. Insert them into that subdocument (and possibly into the other one too, if you want), and then when you query the collection you can instruct the database to return only the "Over18" subdocument. The downside is that you store your participants in two different subdocuments and have to work out their age before inserting. This may or may not be feasible depending on your application. If you need to be able to check arbitrary ages (i.e. sometimes it's 18 but sometimes it's 21 or 25, etc.) then this will not work.
Query the collection, retrieve the Participants subdocument, and then filter it in your application code. Despite what some people may believe, this isn't terrible, because you don't want your database to be doing too much work all the time. Offloading the computation to your application could actually benefit your database, because it can then spend more time querying and less time filtering. That leads to better scalability in the long run.
Short answer: no. I tried to do the same a couple of months back, but MongoDB does not support it (at least in versions <= 1.8). The same question has been asked in their Google Group for sure. You can either store the participants in a separate collection or fetch the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.
For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
db.events.aggregate([
  { $unwind: '$participants' },
  { $match: { 'participants.age': { $gte: 18 } } },
  { $project: { participants: 1 } }
])
This returns a list of n documents, where n is the number of participants aged 18 or over, and each entry looks like this (note that the "participants" field now holds a single entry instead of an array):
{
  _id: objectIdOfTheEvent,
  participants: { firstName: 'only one', lastName: 'participant' }
}
It could probably even be flattened on the server to return a list of participants. See the official documentation for more information.
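To make the unwind and match stages concrete, here is a plain JavaScript sketch that mimics them on an in-memory array. It only illustrates the semantics and is not MongoDB code; the sample `events` data and the `adultParticipants` helper are hypothetical.

```javascript
// Hypothetical events with embedded participants.
const events = [
  { _id: "e1", participants: [{ firstName: "Ann", age: 17 }, { firstName: "Bob", age: 30 }] },
  { _id: "e2", participants: [{ firstName: "Cy", age: 18 }] },
];

// Unwind: emit one { _id, participants } entry per embedded participant.
// Match: keep only the entries whose participant meets the age limit.
function adultParticipants(docs, minAge) {
  return docs
    .flatMap((e) => e.participants.map((p) => ({ _id: e._id, participants: p })))
    .filter((entry) => entry.participants.age >= minAge);
}

const result = adultParticipants(events, 18);
console.log(result.map((r) => r.participants.firstName));
```

The same function is also roughly what filtering in application code amounts to, for setups where the aggregation framework is not available.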