MongoDB many-to-many with one query?

In MySQL I use JOIN, and getting this in one query is no problem. What about Mongo?
Imagine categories and products.
A product may have several categories.
A category may have several products.
(a many-to-many structure)
An administrator may also edit categories in the administration area (so categories must be stored separately).
Is it possible to output a product together with its category names in one query?
I used this structure:
categories {
name:"categoryName",
product_id:["4b5783300334000000000aa9","5783300334000000000aa943","6c6793300334001000000006"]
}
products {
name:"productName",
category_id:["4b5783300334000000000bb9","5783300334000000000bb943","6c6793300334001000000116"]
}
Now I can simply get all of a product's categories, the products in a given category, and the categories alone for editing. But if I want to output a product with its category names I need two queries: one to get the product's category ids and a second to get the category names from categories by those ids.
Is this the right way, or is this structure unsuitable? I would like to have only one query, but I don't know if it's possible.

Yep, MongoDB is specifically bad at this particular type of operation. However, it's also a matter of scope. If you have 30 million products and you want to join the 3 million Products to their Category, you'll find that it's not very quick in many Databases (even though it's one line of code).
What MongoDB requires here is some de-normalization.
Your collections will probably look like this:
categories {
_id:"mongoid",
name:"categoryName",
...
}
products {
_id:"mongoid",
name:"productName",
categories:[
{"4b5783300334000000000bb9":'categoryName'},
{"5783300334000000000bb943":'CategoryName2'},
{"6c6793300334001000000116":'categoryName3'}
]
}
Here's how it will work (rough estimate, don't have my Mongo instance handy):
Grab products and their categories
db.products.find({"_id": {$in:[1,2,3]}}, {name:1,categories:1})
Grab products with a certain category:
db.products.find({"categories.0": {$in:[x,y,z]}},{categories:1,name:1})
Yes, it's not quite the same. And you will have to update all of the product records when you change the name of a category. I know this seems weird, but this is standard for denormalization.
BTW, I'm assuming here that you're going to have lots of products and significantly fewer categories. This method is probably best if you have, say, 100 items in each category. If you have 5 items in each category, then you'll probably want to store more of the category information in each item.
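The update cost mentioned above can be sketched in plain JavaScript on in-memory arrays (no MongoDB connection involved). The ids, names and the renameCategory helper are made up for illustration, and embedding each category as { _id, name } is an assumption of this sketch rather than the exact shape shown in the answer.

```javascript
// In-memory stand-ins for the two collections (hypothetical sample data).
const categories = [
  { _id: "c1", name: "Tools" },
  { _id: "c2", name: "Garden" },
];
const products = [
  { _id: "p1", name: "Spade", categories: [{ _id: "c1", name: "Tools" }, { _id: "c2", name: "Garden" }] },
  { _id: "p2", name: "Hammer", categories: [{ _id: "c1", name: "Tools" }] },
];

// Rename a category and copy the new name into every product that embeds it.
// Returns how many products had to be touched.
function renameCategory(categoryId, newName) {
  const category = categories.find((c) => c._id === categoryId);
  if (!category) return 0;
  category.name = newName;
  let touched = 0;
  for (const product of products) {
    let changed = false;
    for (const embedded of product.categories) {
      if (embedded._id === categoryId) {
        embedded.name = newName;
        changed = true;
      }
    }
    if (changed) touched += 1;
  }
  return touched;
}

const touchedCount = renameCategory("c1", "Hand tools");
console.log(touchedCount, products.map((p) => p.categories.map((c) => c.name)));
```

Against a real database the inner loop would probably become a single multi-document update using the positional operator (something like updateMany on "categories._id" with $set on "categories.$.name"), but the amount of work is the same: one write per product that embeds the category.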

Yes, it's the only way you have. There is no join capability.
But you can add a lot of information to your category or product documents, like:
categories {
name:"categoryName",
product_id:[ {"4b5783300334000000000aa9": 'productName'},{"5783300334000000000aa943":'productName2'},{"6c6793300334001000000006":'productName3'}]
}
products {
name:"productName",
category_id:[{"4b5783300334000000000bb9":'categoryName'},{"5783300334000000000bb943":'CategoryName2'},{"6c6793300334001000000116":'categoryName3'}]
}
But you need to write some callback that updates all the affected documents when one of them changes.

Related

MongoDB Querying Large Datasets

Let's say I have a simple document structure like:
{
"item": {
"name": "Skittles",
"category": "Candies & Snacks"
}
}
On my search page, whenever a user searches for a product name, I want to offer filter options by category.
Since there can be many categories (like 50 types), I cannot display all of the checkboxes on the sidebar beside the search results. I want to show only those that have products associated with them in the results. So if none of the products in the search result belong to a category, that category option should not be shown.
Now, the item search by name itself is paginated. I only show 30 items per page. And we have tens of thousands of items in our database.
I can search and retrieve all items from all pages, then parse the categories. But if I retrieve tens of thousands of items at once, it would be really slow.
Is there a way to optimize this query?
You can try different approaches depending on your workflow and see which works best in your situation. Some good candidates are:
Use distinct before running the query on the large dataset
Use the Aggregation Pipeline, as @Lucia suggested
[{$group: { _id: "$item.category" }}]
Use another datastore (either Redis or Mongo itself) to store precomputed information about categories
Finally, depending on the approach you choose and the volume of filter requests, you may want to consider indexing some fields.
P.S. You're right about how aggregation works: unless you have a $match filter as the first stage, it will fetch all the documents and then apply the next stage.
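To make the "match first, then group" idea concrete, here is a small sketch in plain JavaScript over an in-memory array. The sample documents and the categoriesForSearch helper are hypothetical; on the server the same shape would roughly be a $match stage on the item name followed by the $group stage shown above, or a distinct on "item.category" with the search filter as its query.

```javascript
// Hypothetical sample data in the shape used in the question.
const docs = [
  { item: { name: "Skittles", category: "Candies & Snacks" } },
  { item: { name: "Skittles Sour", category: "Candies & Snacks" } },
  { item: { name: "Skim Milk", category: "Dairy" } },
  { item: { name: "Bread", category: "Bakery" } },
];

// Return the distinct categories among items whose name contains the search
// term (case-insensitive), regardless of which page of results is displayed.
function categoriesForSearch(allDocs, term) {
  const needle = term.toLowerCase();
  const seen = new Set();
  for (const doc of allDocs) {
    if (doc.item.name.toLowerCase().includes(needle)) {
      seen.add(doc.item.category);
    }
  }
  return [...seen].sort();
}

const facets = categoriesForSearch(docs, "ski");
console.log(facets);
```

The point of the sketch is that the filter list depends only on the search term, not on the current page, so it can be computed once per search and cached while the user pages through results.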

How to design product category and products models in mongodb?

I am new to MongoDB database design.
I am currently designing a restaurant products system, and my design is similar to a simple eCommerce database design where each productcategory has products, so in a relational database system this would be a one (productcategory) to many (products) relationship.
I have done some research, and I understand that in document databases
denormalization is acceptable and results in faster database reads.
Therefore, in a NoSQL document-based database I could design my models this way:
//product
{
name:'xxxx',
price:xxxxx,
productcategory:
{
productcategoryName:'xxxxx'
}
}
My question is this: instead of embedding the category inside each product, why don't we embed the products inside productcategory? Then we would have all the products as soon as we query the category, resulting in this model:
//ProductCategory
{
name:'categoryName',
// array of products
products:[
{
name:'xxxx',
price:xxxxx
},
{
name:'xxxx',
price:xxxxx
}
]
}
I have researched this issue on this page http://www.slideshare.net/VishwasBhagath/product-catalog-using-mongodb and here http://www.stackoverflow.com/questions/20090643/product-category-management-in-mongodb-and-mysql. Both examples I have found use the first model I described (i.e. they embed the productCategory inside product rather than embedding an array of products inside productCategory). I do not understand why; please explain. Thanks.
For the DB design, you have to consider the cardinality (one-to-few, one-to-many & one-to-gazillions) and also your data access patterns (your frequent queries, updates) while designing your DB schema to make sure that you get optimum performance for your operations.
From your scenario, it looks like each Product has a category, however it also looks like you need to query to find out Products for each category.
So, in this case, you could do with something like :
Product = { _id : "productId1", name : "SomeProduct", price : "10", category : ObjectId("111") }
ProductCategory = { _id : ObjectId("111"), name : "productCat1", products : ["productId1", "productId2", "productId3"] }
As I said about data access patterns, if you always read the category name and the category name is very infrequently updated, then you can denormalize this two-way referencing by adding the product category name to the product:
Product = { _id : "productId1", name : "SomeProduct", price : "10", category : { _id : ObjectId("111"), name : "productCat1" } }
So, with embedded documents, specific queries are faster because no joins are required; however, other queries that need to access the embedded details as stand-alone entities become difficult.
This is a link from MongoDB that explains DB design for a one-to-many scenario like yours, with very nice examples. It shows that there are many more ways of doing it and many other things to think about.
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3 (also has links for parts 1 & 2)
The link also describes the pros & cons of each scenario, helping you narrow down a DB schema design.
It all depends on what queries you have in mind.
Suppose a product can only belong to one category, and one category applies to many products. Then, if you expect to retrieve the category together with the product, it makes sense to store it directly:
// products
{_id:"aabbcc", name:"foo", category:"bar"}
and if you expect to query all the products in a given category then it makes sense to create a separate collection
// categories
{_id:"bar", products:["aabbcc"]}
Remember that a single write cannot atomically update both the products and categories collections (atomicity is per document unless you use multi-document transactions, which are available from MongoDB 4.0), but you can occasionally run a batch job which will make sure all categories are up-to-date.
I would recommend to think in terms of what kind of information will I often need, as opposed to how to normalize/denormalize this data, and make your collections reflect what you actually want.
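The batch job mentioned above could look roughly like the following, written as plain JavaScript over in-memory arrays. The sample products and the rebuildCategories helper are hypothetical; the sketch only shows the regrouping step, not how the result would be written back.

```javascript
// Hypothetical products, each storing its single category directly.
const productDocs = [
  { _id: "aabbcc", name: "foo", category: "bar" },
  { _id: "ddeeff", name: "baz", category: "bar" },
  { _id: "112233", name: "qux", category: "quux" },
];

// Rebuild the categories collection from scratch, the way a periodic batch
// job might: group product ids by the category stored on each product.
function rebuildCategories(allProducts) {
  const byCategory = new Map();
  for (const product of allProducts) {
    if (!byCategory.has(product.category)) byCategory.set(product.category, []);
    byCategory.get(product.category).push(product._id);
  }
  return [...byCategory].map(([id, ids]) => ({ _id: id, products: ids }));
}

const rebuilt = rebuildCategories(productDocs);
console.log(JSON.stringify(rebuilt));
```

Treating the products collection as the source of truth and regenerating categories from it keeps the job idempotent: running it twice gives the same result, so a failed or partial run is harmless.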

mongodb - one document in many collections

I'm looking for a good solution in MongoDB for this problem:
There are some categories, and every category has X items.
But some items can be in many categories!
I was looking for something like a symbolic link on Unix systems, but I could not find it.
What I thought would be a good idea is:
"Category1/item1" is the object and "category2/item44232" is only a reference to "item1", so when I change "item1" it also changes "item44232".
I looked into the MongoDB data models documentation, but there is no real solution for this.
Thank you for your response!
In RDBMSs, you use a join table to represent many-to-many relationships; in MongoDB, you use array keys. For example, each product contains an array of category IDs, and both products and categories get their own collections. If you have two simple category documents
{ _id: ObjectId("4d6574baa6b804ea563c132a"),
title: "Epiphytes"
}
{ _id: ObjectId("4d6574baa6b804ea563c459d"),
title: "Greenhouse flowers"
}
then a product belonging to both categories will look like this:
{ _id: ObjectId("4d6574baa6b804ea563ca982"),
name: "Dragon Orchid",
category_ids: [ ObjectId("4d6574baa6b804ea563c132a"),
ObjectId("4d6574baa6b804ea563c459d") ]
}
for more: http://docs.mongodb.org/manual/reference/database-references/
Try looking at the problem inside-out: Instead of having items inside categories, have the items list the categories they belong into.
You'll be able to easily find all items that belong to a category (or even multiple categories), and there is no duplication nor any need to keep many instances of the same item updated.
This can be very fast and efficient, especially if you index the list of categories. Check out multikey indexes.
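As a rough illustration of the "items list their categories" layout, here is a plain JavaScript sketch over an in-memory array. The plant data, the string ids (used instead of ObjectIds to keep it self-contained) and both helper functions are assumptions of the sketch.

```javascript
// Products list the ids of the categories they belong to.
const plants = [
  { _id: "p1", name: "Dragon Orchid", category_ids: ["epiphytes", "greenhouse"] },
  { _id: "p2", name: "Staghorn Fern", category_ids: ["epiphytes"] },
  { _id: "p3", name: "Tomato", category_ids: ["greenhouse"] },
];

// Items that belong to the given category.
function inCategory(items, categoryId) {
  return items.filter((item) => item.category_ids.includes(categoryId));
}

// Items that belong to every one of the given categories.
function inAllCategories(items, categoryIds) {
  return items.filter((item) =>
    categoryIds.every((id) => item.category_ids.includes(id))
  );
}

console.log(inCategory(plants, "epiphytes").map((p) => p.name));
console.log(inAllCategories(plants, ["epiphytes", "greenhouse"]).map((p) => p.name));
```

In the mongo shell the first helper should correspond to a query like find({ category_ids: id }) and the second to one using $all; an index created on category_ids is what makes it a multikey index.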

What is a good way to handle "complex" mongodb documents in meteor

I want to store "carpool_debts" which is basically going to hold the number of days owed to other users. It looks like this:
carpool_debts{
_id,
owner,
owner_id,
creditors:[{name,
id,
amount},
{name,
id,
amount}
]}
Does that data structure look reasonable for what I want to store? Also implementing that data structure seemed cumbersome to maintain. I found it cumbersome mainly because there isn't an upsert type of function available in meteor yet. Instead of creditors being a list of sub documents would I be better off storing the creditors as a delimited string? I would like to know if I am on the right path or if I am missing something? Thanks.
You can structure mongo documents just like you would in a relational database, for example, having separate collections for creditors and owners and using carpool_debts as a link table with the amount attached:
carpool_debts{
_id,
owner_id,
creditor_id,
amount}
creditors{
_id,
name}
owners{
_id,
name}
However, this is not using mongodb to its full potential. Especially if this is a database with masses of data, you may want to optimise it for the most used queries, otherwise it'll be slow. For example, to optimise for looking up an owner's debt, you can add the data needed right there in the owners collection, using sub documents for creditors, and sub documents again for individual debts, similar to what you've already done:
owners{
_id,
name,
creditors: {id,
name,
debts: {
amount,
due_date}
}
}
and similarly, add the debt information on the creditors collection if you often look up the outstanding debt of creditors:
creditors{
_id,
name,
debtors: {
owner_id,
owner_name,
debts: {
amount,
due_date
}
}
}
This way, you only need to look up one record to get all the information you need. Of course, there are catches. First of all, this is not very DRY, but that's intentional. But you have to remember to update the other table(s) when something changes. If you change the name of a creditor for example, you'll need to update every owner document that has debts with this creditor (make sure you index that). This of course makes updates much slower (and the database bigger), but if you don't update very often, and look up much more often, this is not going to be a problem.
Also if for example creditors can have thousands of individual outstanding debts, you may have to separate that into a link table, or rather, link collection, like this, so you don't exceed mongodb's maximum document size:
creditors{
_id,
name,
}
debtors: {
owner_id,
creditor_id,
debts: {
amount,
due_date
}
}
Then you have one document for each creditor-owner connection. This means more documents to look up when looking at a creditor, but still just one for looking up an owner.
This looks fine, but you could also consider separating creditors into its own collection and just storing an array of creditor_id's in the debts collection. That would reduce complexity and make finding and filtering information easier. And it would be more DRY since if there are multiple debts with the same creditor, you only have the creditor stored in a single place.
You could also consider just having each document in the debts collection be a single debt by an owner to a single creditor. Then you'd just have id, owner_id and creditor_id - like a link table in a relational database.
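The "one document per debt" layout can be sketched like this in plain JavaScript over an in-memory array. The user ids, the amounts and the owedBy helper are made up for illustration; Meteor's collection API is deliberately not used here.

```javascript
// One document per debt: owner_id owes creditor_id some number of days.
const carpoolDebts = [
  { _id: 1, owner_id: "alice", creditor_id: "bob", amount: 2 },
  { _id: 2, owner_id: "alice", creditor_id: "carol", amount: 1 },
  { _id: 3, owner_id: "alice", creditor_id: "bob", amount: 3 },
  { _id: 4, owner_id: "dave", creditor_id: "bob", amount: 4 },
];

// Total days an owner owes to each creditor.
function owedBy(debts, ownerId) {
  const totals = {};
  for (const debt of debts) {
    if (debt.owner_id !== ownerId) continue;
    totals[debt.creditor_id] = (totals[debt.creditor_id] || 0) + debt.amount;
  }
  return totals;
}

const aliceOwes = owedBy(carpoolDebts, "alice");
console.log(aliceOwes);
```

With this layout, recording a new debt is a plain insert, so the missing upsert is less of a problem; the price is that totals are computed at read time instead of being stored.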

How to deal with Many-to-Many relations in MongoDB when Embedding is not the answer?

Here's the deal. Let's suppose we have the following data schema in MongoDB:
items: a collection with large documents that hold some data (it's absolutely irrelevant what it actually is).
item_groups: a collection with documents that contain a list of items._id called item_groups.items plus some extra data.
So, these two are tied together with a Many-to-Many relationship. But there's one tricky thing: for a certain reason I cannot store items within item groups, so -- just as the title says -- embedding is not the answer.
The query I'm really worried about is intended to find some particular groups that contain some particular items (i.e. I've got a set of criteria for each collection). In fact it also has to say how many items within each found group fit the criteria (no matching items means the group is not found).
The only viable solution I have come up with so far is to use a Map/Reduce approach with a dummy reduce function:
function map () {
// imagine that item_criteria came from the scope.
// it's a mongodb query object.
item_criteria._id = {$in: this.items};
var group_size = db.items.count(item_criteria);
// this group holds no relevant items, skip it
if (group_size == 0) return;
var key = this._id.str;
var value = {size: group_size, ...};
emit(key, value);
}
function reduce (key, values) {
// since the map function emits each group just once,
// values will always be a list with length=1
return values[0];
}
db.runCommand({
mapreduce: "item_groups",
map: map,
reduce: reduce,
query: item_groups_criteria,
scope: {item_criteria: item_criteria},
});
The problem line is:
item_criteria._id = {$in: this.items};
What if this.items.length == 5000 or even more? My RDBMS background cries out loud:
SELECT ... FROM ... WHERE whatever_id IN (over 9000 comma-separated IDs)
is definitely not a good way to go.
Thank you sooo much for your time, guys!
I hope the best answer will be something like "you're stupid, stop thinking in RDBMS style, use $its_a_kind_of_magicSphere from the latest release of MongoDB" :)
I think you are struggling with the separation of domain/object modeling from database schema modeling. I too struggled with this when trying out MongoDb.
For the sake of semantics and clarity, I'm going to substitute Groups with the word Categories
Essentially your theoretical model is a "many to many" relationship in that each Item can belong to many Categories, and each Category can then possess many Items.
This is best handled in your domain object modeling, not in DB schema, especially when implementing a document database (NoSQL). In your MongoDb schema you "fake" a "many to many" relationship, by using a combination of top-level document models, and embedding.
Embedding is hard to swallow for folks coming from SQL persistence back-ends, but it is an essential part of the answer. The trick is deciding whether or not it is shallow or deep, one-way or two-way, etc.
Top Level Document Models
Because your Category documents contain some data of their own and are heavily referenced by a vast number of Items, I agree with you that fully embedding them inside each Item is unwise.
Instead, treat both Item and Category objects as top-level documents. Ensure that your MongoDb schema allots a collection for each one so that each document has its own ObjectId.
The next step is to decide where and how much to embed... there is no right answer as it all depends on how you use it and what your scaling ambitions are...
Embedding Decisions
1. Items
At minimum, your Item objects should have a collection property for its categories. At the very least this collection should contain the ObjectId for each Category.
My suggestion would be to add to this collection, the data you use when interacting with the Item most often...
For example, if I want to list a bunch of items on my web page in a grid, and show the names of the categories they are part of. It is obvious that I don't need to know everything about the Category, but if I only have the ObjectId embedded, a second query would be necessary to get any detail about it at all.
Instead what would make most sense is to embed the Category's Name property in the collection along with the ObjectId, so that pulling back an Item can now display its category names without another query.
The biggest thing to remember is that the key/value objects embedded in your Item that "represent" a Category do not have to match the real Category document model... It is not OOP or relational database modeling.
2. Categories
In reverse you might choose to leave embedding one-way, and not have any Item info in your Category documents... or you might choose to add a collection for Item data much like above (ObjectId, or ObjectId + Name)...
In this direction, I would personally lean toward having nothing embedded... more than likely, if I want Item information for my category, I want lots of it, more than just a name... and deep-embedding a top-level document (Item) makes no sense. I would simply resign myself to querying the database for the Items that each possess the ObjectId of my Category in their collection of Categories.
Phew... confusing for sure. The point is, you will have some data duplication and you will have to tweak your models to your usage for best performance. The good news is that that is what MongoDb and other document databases are good at...
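Here is a minimal sketch of the shallow, one-way embedding described above, in plain JavaScript over an in-memory array. The item data and both helpers (gridRows, itemsInCategory) are hypothetical names introduced only for this example.

```javascript
// Each item embeds a small summary of its categories: { _id, name } only.
const items = [
  { _id: "i1", name: "Widget", categories: [{ _id: "c1", name: "Hardware" }, { _id: "c2", name: "Sale" }] },
  { _id: "i2", name: "Gadget", categories: [{ _id: "c2", name: "Sale" }] },
];

// Rows for a grid view: item name plus category names, no second lookup.
function gridRows(allItems) {
  return allItems.map((item) => ({
    name: item.name,
    categoryNames: item.categories.map((c) => c.name),
  }));
}

// The reverse direction: all items whose embedded list mentions a category id.
function itemsInCategory(allItems, categoryId) {
  return allItems.filter((item) =>
    item.categories.some((c) => c._id === categoryId)
  );
}

console.log(gridRows(items));
console.log(itemsInCategory(items, "c2").map((i) => i.name));
```

Nothing is embedded on the Category side here, which matches the one-way choice: the category page is served by a query over items, while the item grid needs no extra query at all.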
Why not use the opposite design?
You are storing items and item_groups. If your first idea was to store items in item_group entries, then maybe the opposite is not a bad idea :-)
Let me explain:
In each item you store the groups it belongs to. (You are in NoSQL, data duplication is OK!)
For example, let's say you store in each item entry a list called groups, and your items look like:
{ _id : ....
, name : ....
, groups : [ ObjectId(...), ObjectId(...),ObjectId(...)]
}
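Then map-reduce becomes a lot more powerful:
map = function() {
  // keep a reference to the item; inside forEach, "this" is no longer the document
  var item = this;
  this.groups.forEach(function(groupKey) {
    emit(groupKey, {items: [item]});
  });
}
reduce = function(key, values) {
  // merge the partial item lists; the result has the same shape as the emitted values
  var items = [];
  values.forEach(function(value) { items = items.concat(value.items); });
  return {items: items};
}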
db.runCommand({
mapreduce : "items",
map : map,
reduce : reduce,
query : {_id : {$in : [...,....,.....] }} // put your item ids here
})
You can add some parameters (finalize, for instance, to modify the output of the map-reduce), but this might help you.
Of course, you need another collection to store the details of item_groups if you need them, but in some cases (if this information about item_groups does not exist, or doesn't change, or you don't care about having the most up-to-date version of it) you don't need it at all!
Does that give you a hint about a solution to your problem?
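To show what the reversed design buys for the original query (how many matching items each group holds), here is a plain JavaScript sketch over an in-memory array. The sample items, the color field used as the item criterion, and the matchingItemsPerGroup helper are all assumptions made for illustration.

```javascript
// Items store the ids of the groups they belong to (hypothetical data).
const groupedItems = [
  { _id: "i1", color: "red", groups: ["g1", "g2"] },
  { _id: "i2", color: "red", groups: ["g1"] },
  { _id: "i3", color: "blue", groups: ["g2", "g3"] },
];

// Count, per group id, how many items satisfy the predicate.
// Groups with no matching item never appear in the result.
function matchingItemsPerGroup(allItems, itemMatches) {
  const sizes = {};
  for (const item of allItems) {
    if (!itemMatches(item)) continue;
    for (const groupId of item.groups) {
      sizes[groupId] = (sizes[groupId] || 0) + 1;
    }
  }
  return sizes;
}

const sizes = matchingItemsPerGroup(groupedItems, (item) => item.color === "red");
console.log(sizes);
```

The item criteria are applied once, in a single pass, so no huge $in list is needed; the group criteria can then be applied to the (usually much smaller) set of group ids that came back. On the server, an aggregation with $match, $unwind on groups and a $group with $sum would probably express the same pass.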