How to design a many-to-many relation in MongoDB?

The current scenario is this:
There is a category table [category] with only a little over 50 records; it almost never grows and is rarely modified.
There is a product table [product] that currently holds millions of records and will keep growing.
The two have a many-to-many relationship: one category contains many products, and each product belongs to multiple categories.
The category list almost never changes, each category holds about 1000 products, and a category's product list changes infrequently.
Query requirements:
1. Query all categories (without the list of products under each category).
2. Query the category list by product_id.
3. Query the product list by category_id.
Operational requirements:
- Modify the product list of a category: add a product to or remove a product from a category, and reorder the product list within a category (so each category's product list must keep an order).
What is the better many-to-many design for this kind of scenario? Here are the options I have considered:
1. Follow the SQL database design and add a Category<-->Product relation table.
[Question] The order of the products in each category is hard to maintain. For example, if the front end performs a large-scale reordering of a category's products and then submits it, the Category<-->Product relation table needs an extra index field to hold the order, and a lot of records have to be updated. This is not very friendly to the operational requirements. Is there anything that can be optimized?
2. The NoSQL way: add a products:[] array directly to the category to hold the list of products in that category.
[Evaluation] One of the query requirements is to query all categories without their product lists, and this design would pull out a lot of unnecessary data (the products) at once. Not applicable.
3. Add a products:[] array to the Category<-->Product association table.
[Question] This meets the operational requirements, but how would I satisfy query requirement 2 [Query the category list by product_id], and will there be performance problems?
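A minimal in-memory sketch of option 3, assuming one association document per category shaped like {"category_id": ..., "products": [...]} with the array kept in display order (the field names and helper functions are hypothetical). In MongoDB an equality filter such as {"products": product_id} matches every document whose array contains that value, and an index on the products field (a multikey index) can serve that lookup; the Python below only imitates it with a linear scan over plain dicts.

```python
from typing import List, Optional

# Hypothetical association documents: one per category, "products" is ordered.
# e.g. {"category_id": 1, "products": [10, 11, 12]}


def categories_for_product(docs: List[dict], product_id: int) -> List[int]:
    """Category list by product_id; imitates the filter {"products": product_id}."""
    return [d["category_id"] for d in docs if product_id in d["products"]]


def products_for_category(docs: List[dict], category_id: int) -> List[int]:
    """Ordered product list by category_id (a single document read)."""
    for d in docs:
        if d["category_id"] == category_id:
            return list(d["products"])
    return []


def add_product(doc: dict, product_id: int, index: Optional[int] = None) -> None:
    """Insert product_id at index (append when index is None); ignore duplicates."""
    if product_id in doc["products"]:
        return
    if index is None:
        doc["products"].append(product_id)
    else:
        doc["products"].insert(index, product_id)


def remove_product(doc: dict, product_id: int) -> None:
    """Remove product_id from this category if it is present."""
    if product_id in doc["products"]:
        doc["products"].remove(product_id)


def reorder(doc: dict, new_order: List[int]) -> None:
    """A bulk reorder rewrites one array in one document, not one row per product."""
    if sorted(new_order) != sorted(doc["products"]):
        raise ValueError("new_order must be a permutation of the current products")
    doc["products"] = list(new_order)
```

With roughly 50 categories of about 1000 ids each, every association document stays well below MongoDB's 16 MB document limit, and a large reorder from the front end becomes a single document update.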

You need a third table (a junction table) to complete the relationship. Its two key columns together form a composite primary key, and each of them is also a foreign key to its parent table.
tblProductCategories
PK product_id FK
PK category_id FK
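A minimal sketch of that junction table using Python's built-in sqlite3 module, with one assumption added: a hypothetical position column that carries the per-category order the question asks for. The reorder helper updates only the rows whose position actually changed, which illustrates the cost raised in option 1: swapping two neighbours touches two rows, while moving a product to the front rewrites every row ahead of its old position.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE product  (product_id  INTEGER PRIMARY KEY);
        CREATE TABLE category (category_id INTEGER PRIMARY KEY);
        CREATE TABLE tblProductCategories (
            product_id  INTEGER NOT NULL REFERENCES product(product_id),
            category_id INTEGER NOT NULL REFERENCES category(category_id),
            position    INTEGER NOT NULL,  -- hypothetical ordering column
            PRIMARY KEY (product_id, category_id)
        );
        """
    )
    return conn


def link(conn: sqlite3.Connection, category_id: int, product_ids: List[int]) -> None:
    """Attach product_ids to category_id in the given order (positions 0..n-1)."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO category VALUES (?)", (category_id,))
        for pos, pid in enumerate(product_ids):
            conn.execute("INSERT OR IGNORE INTO product VALUES (?)", (pid,))
            conn.execute(
                "INSERT INTO tblProductCategories VALUES (?, ?, ?)",
                (pid, category_id, pos),
            )


def products_in_category(conn: sqlite3.Connection, category_id: int) -> List[int]:
    rows = conn.execute(
        "SELECT product_id FROM tblProductCategories "
        "WHERE category_id = ? ORDER BY position",
        (category_id,),
    )
    return [r[0] for r in rows]


def categories_of_product(conn: sqlite3.Connection, product_id: int) -> List[int]:
    rows = conn.execute(
        "SELECT category_id FROM tblProductCategories "
        "WHERE product_id = ? ORDER BY category_id",
        (product_id,),
    )
    return [r[0] for r in rows]


def reorder(conn: sqlite3.Connection, category_id: int, new_order: List[int]) -> int:
    """Apply a new order, UPDATE only rows whose position changed, return that count."""
    current = dict(
        conn.execute(
            "SELECT product_id, position FROM tblProductCategories WHERE category_id = ?",
            (category_id,),
        ).fetchall()
    )
    if sorted(new_order) != sorted(current):
        raise ValueError("new_order must be a permutation of the category's products")
    changed = [
        (pos, category_id, pid)
        for pos, pid in enumerate(new_order)
        if current[pid] != pos
    ]
    with conn:
        conn.executemany(
            "UPDATE tblProductCategories SET position = ? "
            "WHERE category_id = ? AND product_id = ?",
            changed,
        )
    return len(changed)
```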

Related

DynamoDB modeling relational data (restaurant menus example)

I'm creating a web platform on AWS to enable restaurant owners in my community to create menus. I'm moving from a relational database model to a NoSQL solution and wondering the best way to organize this data. My current relational model is as follows:
Table 'restaurants': id (int / primary key), name, owner (int)
Table 'categories': id (int / primary key), restaurant (int), parent (int)
Table 'items': id (primary key), name, category (int)
Only the owner should be allowed to create/update/delete places, categories, and items.
What would you recommend as a de-normalized solution given the ownership constraint? I was thinking of doing the following:
Table 'restaurants': id (primary key), owner (sort key), categories (list of ids)
Table 'categories': id (primary key), restaurant (id), items (list of item objects), subcategories (list of category ids)
Wondering if it'd be better to have all category data contained within the restaurant table. As an example, a user should only be able to add an item to a category if they are the owner of the associated restaurant, which would take an additional query, per above.
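The ownership concern can be made concrete with a toy model. This is a sketch only: Table is a hypothetical key-value stand-in that counts single-item reads (it is not a DynamoDB client), and the attribute names follow the proposed layout. With categories in their own table, adding an item needs a second read to find the restaurant's owner; with categories embedded in the restaurant item, one read covers both, at the price that the caller must already know the restaurant id.

```python
from typing import Any, Dict


class Table:
    """Toy key-value table that counts single-item reads (not a real DynamoDB client)."""

    def __init__(self) -> None:
        self.items: Dict[Any, dict] = {}
        self.reads = 0

    def get(self, key: Any) -> dict:
        self.reads += 1
        return self.items[key]

    def put(self, key: Any, value: dict) -> None:
        self.items[key] = value


def add_item_split(restaurants: Table, categories: Table, user: str,
                   category_id: Any, item: dict) -> None:
    """Proposed layout: categories live in their own table."""
    category = categories.get(category_id)
    restaurant = restaurants.get(category["restaurant"])  # the additional read
    if restaurant["owner"] != user:
        raise PermissionError("only the restaurant owner may add items")
    category["items"].append(item)
    categories.put(category_id, category)


def add_item_embedded(restaurants: Table, user: str, restaurant_id: Any,
                      category_id: Any, item: dict) -> None:
    """Alternative: categories embedded in the restaurant item; one read checks the owner."""
    restaurant = restaurants.get(restaurant_id)
    if restaurant["owner"] != user:
        raise PermissionError("only the restaurant owner may add items")
    restaurant["categories"][category_id]["items"].append(item)
    restaurants.put(restaurant_id, restaurant)
```

One trade-off to weigh with the embedded variant: DynamoDB caps a single item at 400 KB, so the whole menu has to fit inside the restaurant item.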
It depends mostly on how you use your data. If a restaurant is usually read in full, it is fine to keep everything in the restaurants table.
If many operations touch only one category (for example, many users are interested only in food and not in drinks), it would be better to store the data per category.
For some restaurants, I think it would be better to split the data into categories and keep the common data (address, phone, opening hours and so on) at the restaurant level.
I don't think writes matter much here; it seems to be a read-heavy website (over 90% reads).
Perhaps a cache solution? Redis? Memcached? That would speed things up even more.
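For a read-heavy site like this, the cache idea can be sketched as a small read-through cache. This is an in-process illustration under stated assumptions: the loader function and the TTL value are hypothetical, and in production a shared store such as Redis or Memcached would replace the dictionary. Invalidating on every owner edit keeps the rare writes visible quickly.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """In-process sketch; a shared store like Redis or Memcached would replace the dict."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # hypothetical: reads one restaurant/menu from the DB
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                    # fresh hit: no database read
        value = self._loader(key)              # miss or expired: read through
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after the owner edits the menu so readers see the change."""
        self._entries.pop(key, None)
```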

Database design with a single entity with many different units

I'm new to database design, and I am working on a project that requires a single entity (medication) that can be tied to any number of patients, where each patient can have a different dosage. What would be the best way to lay out a table for this situation? I could use a single table, store each individual medication and dosage, and tie that to the unique patient. But that would give me duplicate entries in the medication table (the same med with just a different dosage).
What I would like is to have a single entry for each medication name and have each patient have a unique dosage for that particular med. Of course a single patient could also have many different medications so I would have to be able to have a unique dosage for each med for different patients.
I am using the Entity Framework Model First approach. Would I use a single table T_Patient_Medication, use the two tables' IDs as a composite primary key, and then add a dosage field for that combination? If so, how would I create the associations that tie this table to the other two? Any suggestions?
Off the top of my head:
- a medication table (MedicineId, MedicineName, etc.)
- a patient table (PatientId, PatientName, etc.)
- a patient-medicine table (MedicineId, PatientId, Dosage, date, notes, etc.)
In other words, the medication table contains one row per unique med, and the patient table contains one row for each unique patient.
The patient-medicine table is where these two things meet: it contains a PatientId, a MedicineId, and anything else unique about that patient getting that medicine (e.g. doctor's name, dosage, start date). Personally, I would also give each row in the patient-medicine table its own unique ID, separate from the combination of PatientId and MedicineId: what are you going to do when the same patient goes back on the same medicine at a different time, if your primary key is PatientId+MedicineId? To my way of thinking, each record should have its own unique ID.
There would be foreign keys between the tables to enforce this relationship: you can't add a row to the patient-medicine table unless the PatientId exists in the patient table and the MedicineId exists in the medication table, and, equally important, you are prevented from deleting rows that other tables still have dependent records on. If you take the time to set up all those foreign keys (relationships), navigating the related records in EF will be a breeze.
It is without a doubt more complicated than this, but that is the basic idea of a relational table.
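The three-table layout can be tried out with Python's built-in sqlite3 module as a stand-in for the EF model (the snake_case column names are adaptations of the answer's names, and the started column is an assumption). It shows the two points made above: the surrogate id lets the same patient go back on the same medicine, and the foreign keys reject orphaned rows and block deleting a patient who still has prescriptions.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE medication (medicine_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE patient    (patient_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE patient_medication (
            id          INTEGER PRIMARY KEY,  -- own surrogate key, as the answer suggests
            patient_id  INTEGER NOT NULL REFERENCES patient(patient_id),
            medicine_id INTEGER NOT NULL REFERENCES medication(medicine_id),
            dosage      TEXT NOT NULL,
            started     TEXT NOT NULL         -- ISO date string (assumption)
        );
        """
    )
    return conn


def prescribe(conn: sqlite3.Connection, patient_id: int, medicine_id: int,
              dosage: str, started: str) -> int:
    """Insert one patient-medicine row and return its surrogate id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO patient_medication (patient_id, medicine_id, dosage, started) "
            "VALUES (?, ?, ?, ?)",
            (patient_id, medicine_id, dosage, started),
        )
    return cur.lastrowid


def medications_for(conn: sqlite3.Connection, patient_id: int) -> List[Tuple[str, str]]:
    """(medicine name, dosage) pairs for one patient, oldest first."""
    rows = conn.execute(
        "SELECT m.name, pm.dosage FROM patient_medication pm "
        "JOIN medication m ON m.medicine_id = pm.medicine_id "
        "WHERE pm.patient_id = ? ORDER BY pm.started",
        (patient_id,),
    )
    return rows.fetchall()
```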

Mongo Schema Design

I'm pretty new to Mongo and have just started a project using MongoDB as the database.
I'm not sure how I should model the following use case in a document-based database.
Use case
1. A vendor/distributor has a list of products in our system.
2. There is a standard price list for each product that applies to any customer.
3. The vendor/distributor also has a custom price list for each product for each customer.
For example, CustA gets productA at a price that differs from the standard one, and that price is available only to him.
4. Some products are available only through a custom price; I mark those products with the attribute public = false.
How should I work this out in a document-based database?
My current design is:
1. [Product Document] with an embedded document holding the standard price list.
2. [Product_Price Document] with a oneToMany link to [Product Document] and a oneToMany link to [Customer Document].
3. [Customer Document].
With this model, I'm having trouble with paged queries.
For example, I query the first 30 products sorted by name, then query [Product_Price Document] with the 30 matching ProductIds to get the custom prices for the customer who is logged in.
The problem is that this way I can't retrieve items that are customized for that user and not available to everyone.
Is there a better way to design the schema, or what should I do with the query?
I'm using PHP, Doctrine2, Symfony2
When you query the Product_Price document, query it using both the ProductID and the current CustomerID. Or am I missing something?
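A minimal in-memory sketch of that idea, assuming simplified documents: each product carries _id, name, a standard price and the public flag, and each Product_Price entry carries product_id, customer_id and price (these field names are assumptions). The key step is filtering by the logged-in customer before sorting and paging, so that public = false products with a custom price land on the right page. In MongoDB terms this would be a product query along the lines of {"$or": [{"public": true}, {"_id": {"$in": customIds}}]} with sort, skip and limit, where customIds comes from that customer's Product_Price entries; the Python only mimics it with lists.

```python
from typing import List


def page_for_customer(products: List[dict], product_prices: List[dict],
                      customer_id: str, skip: int, limit: int) -> List[dict]:
    """One page of the products a customer may see, sorted by name, with their price."""
    # Step 1: this customer's custom prices (Product_Price filtered by CustomerID).
    custom = {p["product_id"]: p["price"]
              for p in product_prices if p["customer_id"] == customer_id}
    # Step 2: decide visibility BEFORE paging: public products plus customized ones.
    visible = [p for p in products if p["public"] or p["_id"] in custom]
    visible.sort(key=lambda p: p["name"])
    page = visible[skip:skip + limit]
    # Step 3: attach the price that applies to this customer.
    return [{"_id": p["_id"], "name": p["name"],
             "price": custom.get(p["_id"], p["price"])} for p in page]
```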
Here's how I would structure it.
Have two collections:
- Products
- Vendors
Your products collection would hold all your products and their standard prices. Your vendors collection would have an array of product IDs, along with an override price in case the vendor charges a different price for that particular product.
If you are also tracking customers, you could make that a collection too and give it a belongs-to relationship with the vendors.
so in short:
collection.vendor:
{"name": "foo", "products": [{"_id": mongoId, "priceOverride": 15.50}, ...]}
collection.products:
{"name":"bar","price":15.40}
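A small sketch of how a price would be resolved under that layout, assuming the override field is spelled priceOverride and is simply omitted when the vendor uses the standard price; product ids are plain strings here in place of real ObjectIds, and the function name is hypothetical.

```python
from typing import Dict, Optional


def effective_price(vendor: dict, products_by_id: Dict[str, dict],
                    product_id: str) -> Optional[float]:
    """Price this vendor charges for product_id, or None if the vendor doesn't list it.

    The vendor's "products" array lists the products it carries; an entry may carry
    a priceOverride, otherwise the product's standard price applies.
    """
    for entry in vendor["products"]:
        if entry["_id"] == product_id:
            override = entry.get("priceOverride")
            if override is not None:
                return override
            return products_by_id[product_id]["price"]
    return None
```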
An excellent resource for reading more about the relationships you can use:
Learn Mongo Interactively

Entity Framework many-to-many question

Please help an EF n00b design his database.
I have several companies that produce several products, so there's a many-to-many relationship between companies and products. I have an intermediate table, Company_Product, that relates them.
Each company/product combination has a unique SKU. For example Acme widgets have SKU 123, but Omega widgets have SKU 456. I added the SKU as a field in the Company_Product intermediate table.
EF generated a model with a 1:* relationship between the company and Company_Product tables, and a 1:* relationship between the product and Company_Product tables. I really want a *:* relationship between company and product. But, most importantly, there's no way to access the SKU directly from the model.
Do I need to put the SKU in its own table and write a join, or is there a better way?
I just tested this in a new VS2010 project (EFv4) to be sure, and here's what I found:
When your associative table in the middle (Company_Product) has ONLY the 2 foreign keys to the other tables (CompanyID and ProductID), then adding all 3 tables to the designer ends up modeling the many to many relationship. It doesn't even generate a class for the Company_Product table. Each Company has a Products collection, and each Product has a Companies collection.
However, if your associative table (Company_Product) has other fields (such as SKU, its own primary key, or other descriptive fields like dates, descriptions, etc.), then the EF modeler will create a separate class, and it does what you've already seen.
Having the class in the middle with 1:* relationships out to Company and Product is not a bad thing, and you can still get the data you want with some easy queries.
// Get all products for Company with ID = 1
var q =
from compProd in context.Company_Product
where compProd.CompanyID == 1
select compProd.Product;
True, it's not as easy to just navigate the relationships of the model, when you already have your entity objects loaded, for instance, but that's what a data layer is for. Encapsulate the queries that get the data you want. If you really want to get rid of that middle Company_Product class, and have the many-to-many directly represented in the class model, then you'll have to strip down the Company_Product table to contain only the 2 foreign keys, and get rid of the SKU.
Actually, I shouldn't say you HAVE to do that...you might be able to do some edits in the designer and set it up this way anyway. I'll give it a try and report back.
UPDATE
Keeping the SKU in the Company_Product table (meaning my EF model had 3 classes, not 2; it created the Company_Payload class, with a 1:* to the other 2 tables), I tried to add an association directly between Company and Product. The steps I followed were:
Right click on the Company class in the designer
Add > Association
Set "End" on the left to be Company (it should be already)
Set "End" on the right to Product
Change both multiplicities to "* (Many)"
The navigation properties should be named "Products" and "Companies"
Hit OK.
Right Click on the association in the model > click "Table Mapping"
Under "Add a table or view" select "Company_Product"
Map Company -> ID (on left) to CompanyID (on right)
Map Product -> ID (on left) to ProductID (on right)
But, it doesn't work. It gives this error:
Error 3025: Problem in mapping fragments starting at line 175:Must specify mapping for all key properties (Company_Product.SKU) of table Company_Product.
So that particular association is invalid, because it uses Company_Product as the table, but doesn't map the SKU field to anything.
Also, while I was researching this, I came across this "Best Practice" tidbit from the book Entity Framework 4.0 Recipes (note that for an association table with extra fields besides the 2 FKs, they refer to the extra fields as the "payload". In your case, SKU is the payload in Company_Product).
Best Practice
Unfortunately, a project that starts out with several, payload-free, many-to-many relationships often ends up with several, payload-rich, many-to-many relationships. Refactoring a model, especially late in the development cycle, to accommodate payloads in the many-to-many relationships can be tedious. Not only are additional entities introduced, but the queries and navigation patterns through the relationships change as well. Some developers argue that every many-to-many relationship should start off with some payload, typically a synthetic key, so the inevitable addition of more payload has significantly less impact on the project.
So here's the best practice.
If you have a payload-free, many-to-many relationship and you think there is some chance that it may change over time to include a payload, start with an extra identity column in the link table. When you import the tables into your model, you will get two one-to-many relationships, which means the code you write and the model you have will be ready for any number of additional payload columns that come along as the project matures. The cost of an additional integer identity column is usually a pretty small price to pay to keep the model more flexible.
(From Chapter 2. Entity Data Modeling Fundamentals, 2.4. Modeling a Many-to-Many Relationship with a Payload)
Sounds like good advice. Especially since you already have a payload (SKU).
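Outside EF, the link-table shape that tip recommends can be sketched with Python's built-in sqlite3 module (table and column names follow the question; everything else is an assumption). Company_Product gets its own identity key, keeps SKU as payload, and a UNIQUE constraint still allows only one row per company/product pair. The join plays the same role as the productsWithSKU query shown further down, and any later payload column only changes the link table.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE Company (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Company_Product (
            ID        INTEGER PRIMARY KEY,                     -- synthetic key, per the tip
            CompanyID INTEGER NOT NULL REFERENCES Company(ID),
            ProductID INTEGER NOT NULL REFERENCES Product(ID),
            SKU       TEXT NOT NULL,                           -- the payload
            UNIQUE (CompanyID, ProductID)                      -- still one row per pair
        );
        """
    )
    return conn


def products_with_sku(conn: sqlite3.Connection, company_id: int) -> List[Tuple[str, str]]:
    """(product name, SKU) for one company, walking through the link table."""
    rows = conn.execute(
        "SELECT p.Name, cp.SKU FROM Company_Product cp "
        "JOIN Product p ON p.ID = cp.ProductID "
        "WHERE cp.CompanyID = ? ORDER BY p.Name",
        (company_id,),
    )
    return rows.fetchall()
```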
I would just like to add the following to Samuel's answer:
If you want to directly query from one side of a many-to-many relationship (with payload) to the other, you can use the following code (using the same example):
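Company c = context.Companies.First();
// Company_Products is a navigation collection, so this Select is LINQ to Objects
// and yields IEnumerable<Product>, not IQueryable<Product>.
IEnumerable<Product> products = c.Company_Products.Select(cp => cp.Product);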
The products variable would then be all Product records associated with the Company c record. If you would like to include the SKU for each of the products, you could use an anonymous class like so:
var productsWithSKU = c.Company_Products.Select(cp => new {
ProductID = cp.Product.ID,
Name = cp.Product.Name,
Price = cp.Product.Price,
SKU = cp.SKU
});
You can encapsulate the first query in a read-only property for simplicity like so:
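using System.Collections.Generic;
using System.Linq;

public partial class Company
{
    // Read-only convenience property that hides the Company_Product link entities
    public IEnumerable<Product> Products
    {
        get { return Company_Products.Select(cp => cp.Product); }
    }
}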
You can't do that with the query that includes the SKU because you can't return anonymous types. You would have to have a definite class, which would typically be done by either adding a non-mapped property to the Product class or creating another class that inherits from Product that would add an SKU property. If you use an inherited class though, you will not be able to make changes to it and have it managed by EF - it would only be useful for display purposes.
Cheers. :)

No-sql relations question

I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.
Having worked for years with relational DBs, I still don't get what is the best way to get some things done with non relational databases.
For example, if I have 1000 car shops and 1000 car types, I want to specify which kinds of cars each shop sells. Each car has 100 features. In a relational database I'd make a middle table to link each car shop with the car types it sells via IDs. What is the NoSQL approach? If every car shop sells 50 car types and I have to store within each car shop all the features of all the car types it sells, that means replicating a huge amount of data!
Any help appreciated.
I can only speak to CouchDB.
The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.
You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].
Map/reduce seems a little scary at first, but you don't need to understand all the complicated stuff or even B-trees; all you need to understand is how the key sorting works.
http://wiki.apache.org/couchdb/View_collation
With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.
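A small Python imitation of such a view, assuming car documents with hypothetical type, shop and car_type fields. Real CouchDB map functions are normally JavaScript, and CouchDB compares strings with Unicode collation rather than by code point, so Python's list ordering only approximates the View_collation rules for simple lowercase keys like these. A prefix query stands in for the startkey/endkey range CouchDB would use.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[list, str]


def map_car(doc: dict) -> Iterator[Row]:
    """Python stand-in for the CouchDB map function
    function (doc) { if (doc.type === "car") emit([doc.shop, doc.car_type], doc.name); }"""
    if doc.get("type") == "car":
        yield [doc["shop"], doc["car_type"]], doc["name"]


def build_view(docs: List[dict], map_fn: Callable[[dict], Iterator[Row]]) -> List[Row]:
    """Run the map over every document and keep the rows sorted by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    rows.sort(key=lambda row: row[0])
    return rows


def query_prefix(rows: List[Row], prefix: list) -> List[Row]:
    """Rows whose key starts with prefix, e.g. every car of one shop (linear scan here)."""
    n = len(prefix)
    return [row for row in rows if row[0][:n] == prefix]
```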
In MongoDB, a commonly used approach would be to store a list of _ids of car types in each car shop. So there is no separate join table, but you are still basically doing a client-side join.
Embedded documents become more relevant for cases that aren't many-to-many like this.
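A minimal in-memory sketch of that approach, with a hypothetical car_type_ids field on each shop. Each shop stores only ids, so the 100 features of a car type live in one document no matter how many shops sell it, and the "join" is a second lookup done by the client. The comments name the MongoDB filters each function imitates; note that a real $in query does not promise to return documents in the order of the id list, while this sketch does.

```python
from typing import Dict, List


def car_types_for_shop(shop: dict, car_types_by_id: Dict[int, dict]) -> List[dict]:
    """Client-side join, like a second query {"_id": {"$in": shop["car_type_ids"]}}."""
    return [car_types_by_id[i] for i in shop["car_type_ids"] if i in car_types_by_id]


def shops_selling(shops: List[dict], car_type_id: int) -> List[str]:
    """Reverse direction, like the filter {"car_type_ids": car_type_id} on shops."""
    return [s["name"] for s in shops if car_type_id in s["car_type_ids"]]
```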
Coming from an HBase/BigTable point of view, you would typically denormalize your data completely and use a "list" field or a multidimensional map column (see this link for a better description).
The word "column" is another loaded word like "table" and "base" which carries the emotional baggage of years of RDBMS experience.
Instead, I find it easier to think about this like a multidimensional map - a map of maps if you will.
For your example of a many-to-many relationship, you can still create two tables and use the multidimensional map column to hold the relationship between the tables.
See the FAQ question 20 in the Hadoop/HBase FAQ:
Q:[Michael Dagaev] How would you design an Hbase table for many-to-many association between two entities, for example Student and Course?
I would define two tables:
Student: student id, student data (name, address, ...), courses (use course ids as column qualifiers here)
Course: course id, course data (name, syllabus, ...), students (use student ids as column qualifiers here)
Does it make sense?
A[Jonathan Gray] : Your design does make sense. As you said, you'd probably have two column-families in each of the Student and Course tables. One for the data, another with a column per student or course. For example, a student row might look like:
Student :
id/row/key = 1001
data:name = Student Name
data:address = 123 ABC St
courses:2001 = (If you need more information about this association, for example, if they are on the waiting list)
courses:2002 = ...
This schema gives you fast access to the queries, show all classes for a student (student table, courses family), or all students for a class (courses table, students family).
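The two-table design from the FAQ can be modeled with nested Python dicts standing in for HBase tables (row key, then column family, then qualifier, then value). This is only a model of the layout, not the HBase client API, and the function names are hypothetical. Each enrollment is written to both tables, which is what turns both queries into single-row reads; HBase keeps qualifiers in sorted order, which sorted() imitates here.

```python
from collections import defaultdict
from typing import Dict, List

# row key -> column family -> qualifier -> value
Table = Dict[str, Dict[str, Dict[str, str]]]


def new_table() -> Table:
    return defaultdict(lambda: defaultdict(dict))


def enroll(students: Table, courses: Table, student_id: str, course_id: str,
           info: str = "") -> None:
    """Write the association on both sides (info could be e.g. a waiting-list note)."""
    students[student_id]["courses"][course_id] = info   # qualifier = course id
    courses[course_id]["students"][student_id] = info   # qualifier = student id


def courses_of(students: Table, student_id: str) -> List[str]:
    """All classes for a student: one row, the courses family."""
    return sorted(students[student_id]["courses"])


def students_in(courses: Table, course_id: str) -> List[str]:
    """All students for a class: one row, the students family."""
    return sorted(courses[course_id]["students"])
```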
In a relational database, the concept is very clear: one table for cars with columns like "car_id, car_type, car_name, car_price", and another table for shops with columns "shop_id, car_id, shop_name, sale_count"; the "car_id" column links the two tables together for data operations. All the columns must be defined when the database is created.
NoSQL database systems do not require you to pre-define these columns and tables. You just construct your records in a certain format, say JSON, like:
{"car": {"id": 1, "type": "auto", "name": "ford"}, "shop": {"id": 100, "name": "some_shop"}}
{"car": {"id": 2, "type": "auto", "name": "benz"}, "shop": {"id": 105, "name": "my_shop"}}
...
After your system is online and serving your business, you may find some flaws in your database design; say you want to add an "employee" field to "shop" for your future records. Then your new records look like:
{"car": {"id": 3, "type": "auto", "name": "RR"}, "shop": {"id": 108, "name": "other_shop", "employee": "Bill"}}
NoSQL systems let you do this without any schema change, whereas a relational database requires altering the table first (for example with ALTER TABLE ... ADD COLUMN).
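That flexibility moves some work to the read side: older records simply lack the new field, so code reading them has to treat it as optional. A minimal sketch with Python's json module, using records shaped like the examples above (the function name is hypothetical):

```python
import json
from typing import List, Optional

# Records written before and after the "employee" field was introduced.
RECORDS = [
    '{"car": {"id": 1, "type": "auto", "name": "ford"}, "shop": {"id": 100, "name": "some_shop"}}',
    '{"car": {"id": 2, "type": "auto", "name": "benz"}, "shop": {"id": 105, "name": "my_shop"}}',
    '{"car": {"id": 3, "type": "auto", "name": "RR"}, '
    '"shop": {"id": 108, "name": "other_shop", "employee": "Bill"}}',
]


def shop_employees(raw_records: List[str]) -> List[Optional[str]]:
    """Employee per record, or None for records written before the field existed."""
    return [json.loads(r)["shop"].get("employee") for r in raw_records]
```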