Unsure how to model many-to-many relationship in MongoDB - mongodb

I'm new to MongoDB and I'm trying to model a Many to Many relationship into at least 2 collections (I need two collections for the project). What I'm having is a collection of universities, faculties and specializations, and another collection for students and their gradebook (this was the middle entity between specializations and students in SQL, not sure if it's needed in Mongo anymore). I tried to use This as an inspiration but it limits me as I can only search students by university id (I want for example to search students from a certain specialization or a certain faculty). I could put every row from university, faculty and specialization in the student collection and vice versa but I really don't think it's ideal. Here's what I have so far:
db.students.insertOne({_id:1, firstname: 'John', lastname: 'Silas', ethnicity:'english', civilstatus:'single', residence:'London', email:'johnSilas#gmail.com', gradebook:[{ year:2018, registrationyear:2017, formofeducation:'traditional'}], universities:[1]})
db.universities.insertOne({_id:1, name:'University of London', city:'London', adress:'whatever', phone: 'whatever', email: 'whatever#gmail.com', faculty:[{name: 'Law', adress:'whatever', phone: 'whatever', email: 'whatever#gmail.com'}], specialization:[{name:'criminal rights', yearlytax:5000, duration: 3, level:'bachelordegree', language:'english'}], students: [1,2]})
I'm sorry if I don't understand basic noSQL concepts, I am new to it. Thanks in advance.

Basic patterns for many to many association between A and B:
Inline references
On A, store the list of B ids in a field like b_ids
On B, store the list of A ids in a field like a_ids
This requires two writes whenever an association is created or destroyed, but requires either zero or one joins at query time to traverse the association (if you just want the id of Bs for a given A and you have the A already no further queries are needed).
Join model
Create a model C which has two fields: a_id and b_id. Each association is represented by a single instance of C.
This requires one write whenever an association is created or destroyed, but requires joins on all queries involving association (potentially two joins per query).

Related

MongoDb conditional relationship

Suppose I have following 4 collections:
1- posts
2- companies
3- groups
4- users
Bellow is my current structure in post:
and their relation is:
A company has an owner and many other members (user collection).
A group has many members (users).
A user has many posts.
A group has many posts that published by one of its members.
A company has many posts that published by its owner or members.
Now i have a problem on storing relation of users, company, and group with posts collection.
Bellow is my current structure:
I have decided to have a field postable inside my post document, and has a type field that will be 'user', or 'group', or 'company', and two other fields name, and id that will be company/group id and company/group name in cases that post is belonged to company or group but not user means type="group" || type="company".
Now how i can handle this to map id as FK of group and company collection (one field FK of two collection) ?
Is it the right structure ?
What you have here is a polymorphic association. In relational databases, it is commonly implemented with two fields, postable_id and postable_type. The type column defines which table to query and id column determines the record.
You can do the same in mongodb (in fact, that is what you came up with, minus the naming convention). But mongodb has a special field type precisely for this type of situations: DBRef. Basically, it's an upgraded id field. It carries not only the id, but also collection name (and database name).
how i can handle this to map id as FK of group and company collection (one field FK of two collection)?
Considering that mongodb doesn't have joins and you have to load all references manually, I don't see how this is any different from a regular FK field. Just the collection name is stored in the type field now, instead of being hardcoded.

Understanding Mongoose Schema better

I am relatively new to the MongoDb world, coming from a MS Sql / Entity framework environment.
I am excited about Mongo, because of:
MongoDb's ability to dynamically change the shape of the class/table/collection at run time.
Entity framework does not offer me that.
Why is that so important?
Because I would like to create a generic inventory app and have the product class/collection/table be dynamic for clients to add fields pertinent to their business that cannot be used by everyone, eg. Vin Number, ISBN number, etc.
Now I have come to learn about Mongoose and how it offers a schema, which to me detracts from the flexibility of MongoDb described above.
I have read in a few sections that there is such an animal as mixed-schema, but that appears to be dynamic relative to the data type and not the collection of properties for the given class/collection/table.
So this is my question:
If I am looking at developing a generic class/collection /table that affords clients to shape it to include whatever fields/properties they want that pertain to their business, dynamically, should I abandon the whole notion of mongoose?
I found a benefit today as to where a Schema may be warranted:
Allow me to preface though and say I still thoroughly am excited about the whole idea that Mongo allows a collection to be reshaped at run time in circumstances where I may need ti to be. As mentioned above, a perfect example would be an Inventory app where I would want each client to add respective fields that pertain to their business as opposed to other clients, such as a Car dealership needing a VIN Number field, or a Book store needing a ISBN Number field.
Mongo will allow me to create one generic table and let the client shape it according to his own wishes for his own database at run time - SWEET!
But I discovered today where a schema would be appropo:
If in another table that will not be 're-shapeable', say a user table, I can create a Schema for pre-determined fields and make them required, as such:
var dbUserSchema = mongoose.Schema({
title: {type:String, required:'{PATH} is required!'},
FullName: {
FirstName: {type: String, required: '{PATH} is required!'},
LastName: {type: String, required: '{PATH} is required!'}
}
});
By having the respective first-name and last-name required from the schema, the database will not add any records for a user if they are not both included in the insert.
So, I guess one gets the best of both worlds: Tables that can be re-shaped and thru a schema, tables that can be rigid.

Mongodb Schema considering sub-document to avoid multiple reads

I am trying to come up with a MongoDB document model and would like others opinions. I want to have a Document that represents an Employee. This table will contain all attributes of an employee (I.e. firstName, LastName). Now where I am stuck coming from the relational realm, is the need to store a list of employees an employee can access. In other words lets say Employee A is a Manager. I need to store the direct reports that he manages, in order to use this in various applications. In relational I would have a mapping table that tied an employee to many employees. In mongo not being able join documents, do you think I should utilize an embeded (sub-document) to store the list of accessible employees as part of the Employee document? Any other ideas ?
Unless your using employee groups (Accounting, HR, etc) You'll probably be fine adding the employee name, mongo Object ID, and any other information unique to that manager / employee relationship as a sub document to the managers document.
With that in place you could probably do your reporting on these relationships through a simple aggregation.
This is all IMHO, and begs the question; Is simple aggregation another oxymoron like military intelligence?

Entity Framework many-to-many question

Please help an EF n00b design his database.
I have several companies that produce several products, so there's a many-to-many relationship between companies and products. I have an intermediate table, Company_Product, that relates them.
Each company/product combination has a unique SKU. For example Acme widgets have SKU 123, but Omega widgets have SKU 456. I added the SKU as a field in the Company_Product intermediate table.
EF generated a model with a 1:* relationship between the company and Company_Product tables, and a 1:* relationship between the product and Company_Product tables. I really want a : relationship between company and product. But, most importantly, there's no way to access the SKU directly from the model.
Do I need to put the SKU in its own table and write a join, or is there a better way?
I just tested this in a new VS2010 project (EFv4) to be sure, and here's what I found:
When your associative table in the middle (Company_Product) has ONLY the 2 foreign keys to the other tables (CompanyID and ProductID), then adding all 3 tables to the designer ends up modeling the many to many relationship. It doesn't even generate a class for the Company_Product table. Each Company has a Products collection, and each Product has a Companies collection.
However, if your associative table (Company_Product) has other fields (such as SKU, it's own Primary Key, or other descriptive fields like dates, descriptions, etc), then the EF modeler will create a separate class, and it does what you've already seen.
Having the class in the middle with 1:* relationships out to Company and Product is not a bad thing, and you can still get the data you want with some easy queries.
// Get all products for Company with ID = 1
var q =
from compProd in context.Company_Product
where compProd.CompanyID == 1
select compProd.Product;
True, it's not as easy to just navigate the relationships of the model, when you already have your entity objects loaded, for instance, but that's what a data layer is for. Encapsulate the queries that get the data you want. If you really want to get rid of that middle Company_Product class, and have the many-to-many directly represented in the class model, then you'll have to strip down the Company_Product table to contain only the 2 foreign keys, and get rid of the SKU.
Actually, I shouldn't say you HAVE to do that...you might be able to do some edits in the designer and set it up this way anyway. I'll give it a try and report back.
UPDATE
Keeping the SKU in the Company_Product table (meaning my EF model had 3 classes, not 2; it created the Company_Payload class, with a 1:* to the other 2 tables), I tried to add an association directly between Company and Product. The steps I followed were:
Right click on the Company class in the designer
Add > Association
Set "End" on the left to be Company (it should be already)
Set "End" on the right to Product
Change both multiplicities to "* (Many)"
The navigation properties should be named "Products" and "Companies"
Hit OK.
Right Click on the association in the model > click "Table Mapping"
Under "Add a table or view" select "Company_Product"
Map Company -> ID (on left) to CompanyID (on right)
Map Product -> ID (on left) to ProductID (on right)
But, it doesn't work. It gives this error:
Error 3025: Problem in mapping fragments starting at line 175:Must specify mapping for all key properties (Company_Product.SKU) of table Company_Product.
So that particular association is invalid, because it uses Company_Product as the table, but doesn't map the SKU field to anything.
Also, while I was researching this, I came across this "Best Practice" tidbit from the book Entity Framework 4.0 Recipies (note that for an association table with extra fields, besides to 2 FKs, they refer to the extra fields as the "payload". In your case, SKU is the payload in Company_Product).
Best Practice
Unfortunately, a project
that starts out with several,
payload-free, many-to-many
relationships often ends up with
several, payload-rich, many-to-many
relationships. Refactoring a model,
especially late in the development
cycle, to accommodate payloads in the
many-to-many relationships can be
tedious. Not only are additional
entities introduced, but the queries
and navigation patterns through the
relationships change as well. Some
developers argue that every
many-to-many relationship should start
off with some payload, typically a
synthetic key, so the inevitable
addition of more payload has
significantly less impact on the
project.
So here's the best practice.
If you have a payload-free,
many-to-many relationship and you
think there is some chance that it may
change over time to include a payload,
start with an extra identity column in
the link table. When you import the
tables into your model, you will get
two one-to-many relationships, which
means the code you write and the model
you have will be ready for any number
of additional payload columns that
come along as the project matures. The
cost of an additional integer identity
column is usually a pretty small price
to pay to keep the model more
flexible.
(From Chapter 2. Entity Data Modeling Fundamentals, 2.4. Modeling a Many-to-Many Relationship with a Payload)
Sounds like good advice. Especially since you already have a payload (SKU).
I would just like to add the following to Samuel's answer:
If you want to directly query from one side of a many-to-many relationship (with payload) to the other, you can use the following code (using the same example):
Company c = context.Companies.First();
IQueryable<Product> products = c.Company_Products.Select(cp => cp.Product);
The products variable would then be all Product records associated with the Company c record. If you would like to include the SKU for each of the products, you could use an anonymous class like so:
var productsWithSKU = c.Company_Products.Select(cp => new {
ProductID = cp.Product.ID,
Name = cp.Product.Name,
Price = cp.Product.Price,
SKU = cp.SKU
});
foreach (var
You can encapsulate the first query in a read-only property for simplicity like so:
public partial class Company
{
public property IQueryable<Product> Products
{
get { return Company_Products.Select(cp => cp.Product); }
}
}
You can't do that with the query that includes the SKU because you can't return anonymous types. You would have to have a definite class, which would typically be done by either adding a non-mapped property to the Product class or creating another class that inherits from Product that would add an SKU property. If you use an inherited class though, you will not be able to make changes to it and have it managed by EF - it would only be useful for display purposes.
Cheers. :)

No-sql relations question

I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.
Having worked for years with relational DBs, I still don't get what is the best way to get some things done with non relational databases.
For example, if I have 1000 car shops and 1000 car types, I want to specify what kind of cars each shop sells. Each car has 100 features. Within a relational database i'd make a middle table to link each car shop with the car types it sells via IDs. What is the approach of No-sql? If every car shop sells 50 car types, it means replicating a huge amount of data, if I have to store within the car shop all the features of all the car types it sells!
Any help appreciated.
I can only speak to CouchDB.
The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.
You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].
Map reduce seems a little scary at first, but you don't need to understand all the complicated stuff or even btrees, all you need to understand is how the key sorting works.
http://wiki.apache.org/couchdb/View_collation
With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.
In MongoDB an often used approach would be store a list of _ids of car types in each car shop. So no separate join table but still basically doing a client-side join.
Embedded documents become more relevant for cases that aren't many-to-many like this.
Coming from a HBase/BigTable point of view, typically you would completely denormalize your data, and use a "list" field, or multidimensional map column (see this link for a better description).
The word "column" is another loaded
word like "table" and "base" which
carries the emotional baggage of years
of RDBMS experience.
Instead, I find it easier to think
about this like a multidimensional map
- a map of maps if you will.
For your example for a many-to-many relationship, you can still create two tables, and use your multidimenstional map column to hold the relationship between the tables.
See the FAQ question 20 in the Hadoop/HBase FAQ:
Q:[Michael Dagaev] How would you
design an Hbase table for many-to-many
association between two entities, for
example Student and Course?
I would
define two tables: Student: student
id student data (name, address, ...)
courses (use course ids as column
qualifiers here) Course: course id
course data (name, syllabus, ...)
students (use student ids as column
qualifiers here) Does it make sense?
A[Jonathan Gray] : Your design does
make sense. As you said, you'd
probably have two column-families in
each of the Student and Course tables.
One for the data, another with a
column per student or course. For
example, a student row might look
like: Student : id/row/key = 1001
data:name = Student Name data:address
= 123 ABC St courses:2001 = (If you need more information about this
association, for example, if they are
on the waiting list) courses:2002 =
... This schema gives you fast access
to the queries, show all classes for a
student (student table, courses
family), or all students for a class
(courses table, students family).
In relational database, the concept is very clear: one table for cars with columns like "car_id, car_type, car_name, car_price", and another table for shops with columns "shop_id, car_id, shop_name, sale_count", the "car_id" links the two table together for data Ops. All the columns must well defined in creating the database.
No SQL database systems do not require you pre-define these columns and tables. You just construct your records in a certain format, say JSon, like:
"{car:[id:1, type:auto, name:ford], shop:[id:100, name:some_shop]}",
"{car:[id:2, type:auto, name:benz], shop:[id:105, name:my_shop]}",
.....
After your system is on-line providing service for your management, you may find there are some flaws in your design of db structure, you hope to add one column "employee" of "shop" for your future records. Then your new records coming is as:
"{car:[id:3, type:auto, name:RR], shop:[id:108, name:other_shop, employee:Bill]}",
No SQL systems allow you to do so, but relational database is impossible for this job.