How to annotate the result documents with a count of child documents? - mongodb

I have a collection of categories, each category document containing a link to its parent (except the root categories). Pretty simple so far.
I want to list the categories, and add a subcategory_count field to every document with the count of direct descendants.
How should I go about doing this? Could Map/Reduce be of use?

There are no "calculated columns" in MongoDB, so you can't select data and count subdocuments at the same time.
This is also the reason why most people store array length along with the array.
{friends_list: [1, 3, 234, 555],
friends_count: 4}
This makes retrieval, filtering, and sorting easier, but it requires a bit more manual work to keep the count current.
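One way to keep the stored length current is to change the array and the counter in the same update. This is a minimal sketch, assuming the field names from the example above; the helper name buildAddFriendUpdate and the users collection are hypothetical. $push and $inc are standard MongoDB update operators.

```javascript
// Hypothetical helper: builds an update document that appends a friend id
// and bumps the cached counter in the same single-document update.
function buildAddFriendUpdate(friendId) {
  return {
    $push: { friends_list: friendId }, // append to the array
    $inc: { friends_count: 1 }         // keep the cached length in step
  };
}

// Mongo shell usage (assumes a `users` collection, not named in the original):
// db.users.update({ _id: userId }, buildAddFriendUpdate(777));
```

Note that $push always appends, so the counter stays accurate. With $addToSet the element might already be present, and a blind increment would drift.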
So, you are basically limited to these options:
Store everything in one document.
Store subcategory count in the category.
Count subcategories on the client-side.
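The third option can be sketched in plain JavaScript: fetch all categories once, tally children per parent, and attach the tally to each document. This assumes each category stores its parent's _id in a field called parent (the question does not name the field) and that root categories have no parent.

```javascript
// Assumption: each category document stores its parent's _id in `parent`;
// root categories have no `parent` field.
function annotateWithSubcategoryCount(categories) {
  var counts = {};
  // First pass: count direct children per parent id.
  categories.forEach(function (cat) {
    if (cat.parent !== undefined && cat.parent !== null) {
      var key = String(cat.parent);
      counts[key] = (counts[key] || 0) + 1;
    }
  });
  // Second pass: return copies annotated with the count (0 if no children).
  return categories.map(function (cat) {
    var copy = Object.assign({}, cat);
    copy.subcategory_count = counts[String(cat._id)] || 0;
    return copy;
  });
}

// In the mongo shell the input could come from db.categories.find().toArray()
// (collection name assumed). Made-up sample data:
var sample = [
  { _id: 1, name: "Books" },
  { _id: 2, name: "Fiction", parent: 1 },
  { _id: 3, name: "Poetry", parent: 1 },
  { _id: 4, name: "Sonnets", parent: 3 }
];
var annotated = annotateWithSubcategoryCount(sample);
```

This costs one query and one pass over the results, but the count exists only on the client, so it cannot be used for server-side sorting or filtering.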

find() to get all of them, count the number of subcategories on each level and then update().
But it seems like your domain object should be doing this for you, so that you end up with one category object that can contain categories (which can in turn contain categories...), and hence one MongoDB document. You're listing all of them anyway, so it makes sense to retrieve the whole thing in one query.

Related

Mongoose, how to handle document category/type fields?

Let's say I have an item model that has the following fields: _id, title, category, sub_category. What would be a more efficient or better way to handle the category and sub_category fields?
The most obvious one would be to make them String types and add an enum validation. The other way would be to make two separate collections, Categories and a child collection SubCategories, and then reference the sub category in the item, which would also lead to the actual category.
From my understanding, querying by ObjectId is more efficient than querying by String, and it is also better for indexing. However, this method requires the use of .populate(), which would slow down the querying.
So in the end, which would be more efficient? Or is one approach better even if it is less efficient?
Best way to implement it in a Graph DB,
Every element is category
now relationship is required between two categories
category1 <--- [sub-category-of] --- category2
https://www.mongodb.com/presentations/mongodb-days-silicon-valley-implementing-graph-databases-with-mongodb

View MongoDB array in order of indices with Compass

I am working on a database of Polish verbs and I'd like to find out how to display my results such that each verb conjugation appears in the following order: 1ps (1st person singular), 2ps, 3ps, 1ppl (1st person plural, etc.), 2ppl, 3ppl. It displays fine when I insert documents:
verb "żyć/przeżyć" conjugation as array and nested document
But when I go to perform queries it jumbles all the array elements up, in the first case (I want to see them in order of array indices), and sorts the nested document elements into alphabetical order (whereas I want to see them in the order in which they were inserted).
verb "żyć/przeżyć" conjugation array/document query
This should be an easy one to solve, I hope this comes across as a reasonable beginner's question. I have searched for answers but couldn't find much info on this topic. Any and all help is greatly appreciated!
Cheers,
LC.
Your screenshots highlight two different views in MongoDB Compass.
The Schema view is based on a sampling of multiple documents and the order of the fields displayed cannot be specified. The schema analysis (as at Compass 1.7) lists fields in case-insensitive alphabetical order with the _id field at the top. Since this is an aggregate schema view based on multiple documents, the ordering of fields is not expected to reflect individual document order.
If you want to work with individual documents and field ordering you need to use the Documents view, as per your second screenshot. In addition to displaying the actual documents, this view allows you to include sort and skip options for queries:

Mongo/No SQL solution to second tier data query?

I have an existing PostgreSQL database, which contains roughly 500,000 entries each of which is essentially a category in a huge tree of categories (each category has different schemas of elements).
I also have a MySQL database, which contains roughly 100,000 documents, each of which can be categorized in one or more categories.
I need to be able to search for documents, which match attribute filters which are set in the categories the document is linked to.
As I understand it, I'd have to store all the data relating to all the categories a document links to, in each document, in mongo, and that just seems insane. How can I make this work?
As an example, imagine a category, which represents a red car, made in 1964, and a document which was written in 1990 about that red car. I need to be able to search for 1964 and find the document about the car, as well as the car itself.
n:m relations in MongoDB can be expressed with arrays of database references (DBRef) or arrays of object IDs.
So each document would have a field "categories" which has an array with the IDs or database references of the categories it belongs to.
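With that layout, the search from the question can run in two steps: find the categories that match the attribute filter, then find the documents whose categories array contains any of those IDs. Below is a minimal in-memory sketch of that logic; the field names (year, categories) and the sample data are assumptions. Against MongoDB, the second step would be a query such as { categories: { $in: matchingIds } }.

```javascript
function findDocumentsByCategoryFilter(categories, documents, matches) {
  // Step 1: collect the ids of categories that satisfy the attribute filter.
  var matchingIds = categories.filter(matches).map(function (c) {
    return c._id;
  });
  // Step 2: keep documents that reference at least one matching category.
  return documents.filter(function (doc) {
    return doc.categories.some(function (id) {
      return matchingIds.indexOf(id) !== -1;
    });
  });
}

// Made-up sample data following the red car example.
var categories = [
  { _id: "c1", name: "red car", year: 1964 },
  { _id: "c2", name: "blue bike", year: 1980 }
];
var documents = [
  { _id: "d1", title: "Article about the red car", written: 1990, categories: ["c1"] },
  { _id: "d2", title: "Bike review", written: 1985, categories: ["c2"] }
];
var found = findDocumentsByCategoryFilter(categories, documents, function (c) {
  return c.year === 1964;
});
```

The category attributes stay in one place, so nothing has to be copied into every document; the price is one extra query per search.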
See this article for further information:
http://docs.mongodb.org/manual/applications/database-references/
An alternative that avoids performing multiple database queries just to show the category names is to put the category names in that array instead of the IDs. In that case you should also add an index (with the ensureIndex function) on the name field of your category collection for faster lookup (you might want to create a unique index on this field anyway to avoid duplicate category names).
About the data an object has because it belongs to a category, like cars having a manufacturer and a document having a list of other objects mentioned in the document: this data should be put directly into the document of the object. The advantage of a document-oriented database is that not every entity must have the same fields.

Mongo query for number of items in a sub collection

This seems like it should be very simple but I can't get it to work. I want to select all documents A where there are one or more B elements in a sub collection.
Like if a Store document had a collection of Employees. I just want to find Stores with 1 or more Employees in it.
I tried something like:
{Store.Employees:{$size:{$ne:0}}}
or
{Store.Employees:{$size:{$gt:0}}}
Just can't get it to work.
This isn't supported. The $size operator only matches documents whose array length is exactly equal to the given number; it cannot do range searches.
What people normally do is that they cache array length in a separate field in the same document. Then they index that field and make very efficient queries.
Of course, this requires a little bit more work from you (not forgetting to keep that length field current).
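As a sketch, the two filter documents below show the cached-length query described above and a commonly used alternative that tests whether the array has a first element. The field name employee_count and the collection name stores are assumptions, and the exact path to the array depends on your schema.

```javascript
// Assumption: each Store document keeps an `employee_count` field that the
// application updates whenever `Employees` changes (field name is hypothetical).
var storesWithEmployees = { employee_count: { $gt: 0 } };

// Alternative that needs no extra field: match documents where the array
// has an element at index 0, i.e. it is not empty.
var storesWithFirstEmployee = { "Employees.0": { $exists: true } };

// Mongo shell usage (collection name `stores` is an assumption):
// db.stores.find(storesWithEmployees);
// db.stores.find(storesWithFirstEmployee);
```

The cached field can be indexed and supports any range (for example, stores with more than ten employees); the index-0 check only answers "empty or not".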

How to deal with Many-to-Many relations in MongoDB when Embedding is not the answer?

Here's the deal. Let's suppose we have the following data schema in MongoDB:
items: a collection with large documents that hold some data (it's absolutely irrelevant what it actually is).
item_groups: a collection with documents that contain a list of items._id called item_groups.items plus some extra data.
So, these two are tied together with a Many-to-Many relationship. But there's one tricky thing: for a certain reason I cannot store items within item groups, so -- just as the title says -- embedding is not the answer.
The query I'm really worried about is intended to find some particular groups that contain some particular items (i.e. I've got a set of criteria for each collection). In fact, it also has to report how many items within each found group matched the criteria (no matching items means the group is not found).
The only viable solution I have come up with so far is to use a Map/Reduce approach with a dummy reduce function:
function map () {
    // imagine that item_criteria came from the scope.
    // it's a mongodb query object.
    item_criteria._id = {$in: this.items};
    var group_size = db.items.count(item_criteria);
    // this group holds no relevant items, skip it
    if (group_size == 0) return;
    var key = this._id.str;
    var value = {size: group_size, ...};
    emit(key, value);
}
function reduce (key, values) {
    // since the map function emits each group just once,
    // values will always be a list with length=1
    return values[0];
}
db.runCommand({
    mapreduce: item_groups,
    map: map,
    reduce: reduce,
    query: item_groups_criteria,
    scope: {item_criteria: item_criteria},
});
The problem line is:
item_criteria._id = {$in: this.items};
What if this.items.length == 5000 or even more? My RDBMS background cries out loud:
SELECT ... FROM ... WHERE whatever_id IN (over 9000 comma-separated IDs)
is definitely not a good way to go.
Thank you sooo much for your time, guys!
I hope the best answer will be something like "you're stupid, stop thinking in RDBMS style, use $its_a_kind_of_magicSphere from the latest release of MongoDB" :)
I think you are struggling with the separation of domain/object modeling from database schema modeling. I too struggled with this when trying out MongoDb.
For the sake of semantics and clarity, I'm going to substitute Groups with the word Categories
Essentially your theoretical model is a "many to many" relationship in that each Item can belong to many Categories, and each Category can then possess many Items.
This is best handled in your domain object modeling, not in DB schema, especially when implementing a document database (NoSQL). In your MongoDb schema you "fake" a "many to many" relationship, by using a combination of top-level document models, and embedding.
Embedding is hard to swallow for folks coming from SQL persistence back-ends, but it is an essential part of the answer. The trick is deciding whether or not it is shallow or deep, one-way or two-way, etc.
Top Level Document Models
Because your Category documents contain some data of their own and are heavily referenced by a vast number of Items, I agree with you that fully embedding them inside each Item is unwise.
Instead, treat both Item and Category objects as top-level documents. Ensure that your MongoDb schema allots a collection for each one so that each document has its own ObjectId.
The next step is to decide where and how much to embed... there is no right answer as it all depends on how you use it and what your scaling ambitions are...
Embedding Decisions
1. Items
At minimum, your Item objects should have a collection property for its categories. At the very least this collection should contain the ObjectId for each Category.
My suggestion would be to add to this collection, the data you use when interacting with the Item most often...
For example, if I want to list a bunch of items on my web page in a grid, and show the names of the categories they are part of. It is obvious that I don't need to know everything about the Category, but if I only have the ObjectId embedded, a second query would be necessary to get any detail about it at all.
Instead what would make most sense is to embed the Category's Name property in the collection along with the ObjectId, so that pulling back an Item can now display its category names without another query.
The biggest thing to remember is that the key/value objects embedded in your Item that "represent" a Category do not have to match the real Category document model... It is not OOP or relational database modeling.
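A minimal sketch of that shallow embedding, with made-up data: the Item carries only the ObjectId and Name of each Category, not the full Category document. The helper name toCategoryRef and all field names are hypothetical.

```javascript
// Hypothetical helper: reduces a full Category document to the small
// reference that gets embedded in an Item (id plus display name).
function toCategoryRef(category) {
  return { _id: category._id, name: category.name };
}

// Made-up sample data; string ids stand in for ObjectIds.
var category = {
  _id: "cat1",
  name: "Tools",
  description: "Long text...",
  createdBy: "admin"
};
var item = {
  _id: "item1",
  title: "Hammer",
  categories: [toCategoryRef(category)]
};

// Listing category names now needs no second query.
var names = item.categories.map(function (c) { return c.name; });
```

The trade-off is duplication: if a Category is renamed, every Item that embeds its name has to be updated as well.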
2. Categories
In reverse you might choose to leave embedding one-way, and not have any Item info in your Category documents... or you might choose to add a collection for Item data much like above (ObjectId, or ObjectId + Name)...
In this direction, I would personally lean toward having nothing embedded... more than likely if I want Item information for my category, I want lots of it, more than just a name... and deep-embedding a top-level document (Item) makes no sense. I would simply resign myself to querying the database for the Items that each possess the ObjectId of my Category in their collection of Categories.
Phew... confusing for sure. The point is, you will have some data duplication and you will have to tweak your models to your usage for best performance. The good news is that that is what MongoDb and other document databases are good at...
Why not use the opposite design?
You are storing items and item_groups. If your first idea was to store items in item_group entries, then maybe the opposite is not a bad idea :-)
Let me explain:
in each item you store the groups it belongs to. (You are in NOSql, data duplication is ok!)
for example, let's say you store in item entries a list called groups and your items look like :
{ _id : ....
, name : ....
, groups : [ ObjectId(...), ObjectId(...),ObjectId(...)]
}
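Then the map/reduce idea becomes much more powerful:
map = function() {
    // capture the document: inside the forEach callback, `this` is no longer the item
    var item = this;
    this.groups.forEach(function(groupKey) {
        // wrap the list in an object so map and reduce produce the same shape
        emit(groupKey, {items: [item]});
    });
}
reduce = function(key, values) {
    // merge the item lists emitted for the same group
    var merged = [];
    values.forEach(function(value) {
        merged = merged.concat(value.items);
    });
    return {items: merged};
}
db.runCommand({
    mapreduce : "items",
    map : map,
    reduce : reduce,
    query : {_id : {$in : [...,....,.....] }} // put your item ids here
})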
You can add some parameters (finalize for instance to modify the output of the map reduce) but this might help you.
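To see what this map/reduce is meant to compute, here is a plain JavaScript emulation that groups items by every group id they list. It is only an illustration of the result shape, not the server-side job, and the sample data is made up.

```javascript
// Plain JavaScript emulation of the intended result: group items by each
// group id listed in their `groups` array.
function groupItemsByGroup(items) {
  var byGroup = {};
  items.forEach(function (item) {
    item.groups.forEach(function (groupKey) {
      var key = String(groupKey);
      (byGroup[key] = byGroup[key] || []).push(item);
    });
  });
  return byGroup;
}

// Made-up sample data; string ids stand in for ObjectIds.
var items = [
  { _id: 1, name: "a", groups: ["g1", "g2"] },
  { _id: 2, name: "b", groups: ["g2"] }
];
var grouped = groupItemsByGroup(items);

// The number of matching items per group is the length of each list.
var sizes = {};
Object.keys(grouped).forEach(function (k) {
  sizes[k] = grouped[k].length;
});
```

Because only items that passed the query are fed in, a group with no matching items never appears in the output, which matches the "no items means group is not found" requirement.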
Of course, you need another collection to store the details of item_groups if you need them, but in some cases (if this information about item_groups does not exist, or doesn't change, or you don't care that you don't have the most up-to-date version of it) you don't need it at all!
Does that give you a hint about a solution to your problem ?