MongoDB: embedding vs. reference

I have the following collections:
Client contains Product contains Project contains Task and
Company contains Subsidiary contains Department contains Users (and user contains custom properties)
What is the best practice? How can I use MongoDB more efficiently?
In my view, Users and Projects will change more often than the rest and should be defined as separate collections.
What is your advice?

There is no single answer to your question, as MongoDB allows you to do both embedding and referencing.
But here are some tips: don't bloat a collection, because querying elements nested deep inside a document can be hard, especially if you're going to use lists.
If you are able to prototype, try embedding the collections first and then write a unit test that adds, removes, and queries projects and users. If that works, there's no need to reference them.
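
As a rough illustration of the two shapes being compared, here is a minimal in-memory sketch in plain JavaScript. The field names (`tasks`, `projectId`, `done`) and the sample data are assumptions made up for the example, not anything from the question, and no database is involved.

```javascript
// Hypothetical data; all field names are assumptions for illustration.

// Embedded: tasks live inside the project document.
const embeddedProject = {
  _id: "p1",
  name: "Website",
  tasks: [
    { title: "Design", done: true },
    { title: "Build", done: false },
  ],
};

// Referenced: tasks are separate documents that point at their project.
const projects = [{ _id: "p1", name: "Website" }];
const tasks = [
  { _id: "t1", projectId: "p1", title: "Design", done: true },
  { _id: "t2", projectId: "p1", title: "Build", done: false },
];

// Embedded read: the single document already holds everything.
function openTasksEmbedded(project) {
  return project.tasks.filter((t) => !t.done).map((t) => t.title);
}

// Referenced read: a second lookup by projectId is needed.
function openTasksReferenced(projectId, allTasks) {
  return allTasks
    .filter((t) => t.projectId === projectId && !t.done)
    .map((t) => t.title);
}
```

Both functions answer the same question; the difference is whether the tasks arrive with the project or have to be fetched separately, which is the trade-off the prototype test would exercise.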

Related

In a NoSQL database like MongoDB or Couchbase, how do you model a many-to-many relationship?

Consider an application where I have users and projects, and the requirement is that users are assigned to projects. One user can be assigned to multiple projects, and a project can have multiple users, so this is a many-to-many relationship. What is the best way to model such a requirement?
I would like to discuss a few approaches:
- Embedded data model
In this approach I embed the user documents inside the project documents.
Advantages: you get all the required data in one API call, by fetching a single document.
Disadvantages: data duplication, which is acceptable.
The real problem is that if you update user information (e.g. the user's mobile number or name) from the users screen, the change must also be reflected in every embedded copy of that user. That requires some kind of bulk update query.
But is this the right way?
- Embedding object references instead of objects (which is normalised)
In this case, if we embed user IDs instead of user objects, the problem mentioned above goes away, but then we have to make multiple network calls to get the required data, or create a separate relation-style document as we would in SQL.
Is this the best way?
We have the same scenario, so I embed the ObjectId and, to fill in the data for clients, populate the user data in the find call:
contract.find({}).populate('user').then(function (contracts) { /* each contract.user now holds the full user document */ });
There are few hard and fast rules, but usually with many-to-many relationships you would prefer references over embedding. This doesn't mean your data is totally flat/normalized.
For example, you could have a user document with an array of project ids. You could have the reverse for projects.
Think about your queries and how you will structure them. That can give you other hints about how to structure your documents.
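
A minimal sketch of the "array of ids on both sides" idea, using plain JavaScript Maps as stand-ins for the two collections. The names `userDocs`, `projectDocs`, `projectIds`, `userIds`, and the helper functions are all hypothetical; a real application would issue database queries instead of Map lookups.

```javascript
// In-memory stand-ins for two collections (assumed field names).
const userDocs = new Map([
  ["u1", { _id: "u1", name: "Ann", projectIds: [] }],
  ["u2", { _id: "u2", name: "Bob", projectIds: [] }],
]);
const projectDocs = new Map([
  ["p1", { _id: "p1", title: "Website", userIds: [] }],
  ["p2", { _id: "p2", title: "Mobile app", userIds: [] }],
]);

// Record the assignment on both sides, without duplicating ids.
function assign(userId, projectId) {
  const user = userDocs.get(userId);
  const project = projectDocs.get(projectId);
  if (!user.projectIds.includes(projectId)) user.projectIds.push(projectId);
  if (!project.userIds.includes(userId)) project.userIds.push(userId);
}

// Resolve the stored ids back to user documents (the "second query").
function usersOnProject(projectId) {
  return projectDocs.get(projectId).userIds.map((id) => userDocs.get(id));
}

assign("u1", "p1");
assign("u1", "p2");
assign("u2", "p1");

// The user is stored once, so a rename needs no bulk update.
userDocs.get("u1").name = "Ann Lee";
```

Because each project stores only ids, the rename is visible through every project that references the user, at the cost of the extra lookup in `usersOnProject`.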

Mutual dependency MongoDB

Let's say I'm making a social app with a MongoDB database, and I want users to be able to befriend each other. Of course friendship is a mutual relation, and user ids are integers. What would be the best approach?
1. Every user has a list of friend ids. Every time a bond is created or severed, both users' lists have to be updated.
2. Create a join table 'friendship' containing the IDs of 2 users. Every time a bond is created I have to create two entries: 1->2 and 2->1.
3. As no. 2, but always create only 1 bond, with the rule lower_usr_id -> higher_usr_id. Assuming there are a lot of people and friendships, wouldn't it save a lot of space and time?
It sounds like you're rather unclear about how MongoDB works. MongoDB has no join tables in the relational sense (the $lookup aggregation stage in newer versions is the closest equivalent), and if you're trying to use MongoDB like a relational database you're doing it wrong.
I'm no expert on MongoDB, but I believe there are two common methods of modelling a one-to-many relationship:
- Embedding one document inside another
- Using references
Embedding a document inside another makes sense where the parent document in some sense "owns" the child document. For instance, in the context of a blogging application, a comment is owned by a post, so it might make sense to embed the comment inside the post.
For your use case, I don't believe that would be appropriate since the relationship is between objects of the same type. It would therefore make sense to record friendships as a reference to another object in the same collection.
Check out this link for further details.
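
For what it's worth, option 3 from the question (one record per friendship, always stored as lower id -> higher id) can be sketched in plain JavaScript like this. The Set stands in for a collection, and `pairKey`, `befriend`, `unfriend`, `areFriends`, and `friendsOf` are hypothetical helper names, not part of any MongoDB API.

```javascript
// One entry per friendship, keyed by the ordered pair "lower:higher".
const friendships = new Set();

// Normalise the pair so (1, 2) and (2, 1) map to the same key.
function pairKey(a, b) {
  const [lo, hi] = a < b ? [a, b] : [b, a];
  return `${lo}:${hi}`;
}

function befriend(a, b) {
  if (a !== b) friendships.add(pairKey(a, b));
}

function unfriend(a, b) {
  friendships.delete(pairKey(a, b));
}

function areFriends(a, b) {
  return friendships.has(pairKey(a, b));
}

// Listing friends has to look at both positions of every pair.
function friendsOf(id) {
  const result = [];
  for (const key of friendships) {
    const [lo, hi] = key.split(":").map(Number);
    if (lo === id) result.push(hi);
    else if (hi === id) result.push(lo);
  }
  return result;
}
```

This halves the number of stored records compared with option 2, but `friendsOf` must match on either side of the pair. In a real collection that would probably mean querying and indexing both fields, so the saving in space is paid for with a slightly more complex read.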

mongodb: how to express "schema"?

Once many teams work with the same MongoDB database, there needs to be some way to express what each document may contain. Otherwise documents will end up with "email", "mail", and "email_addr" fields, each added by a different team. What's the best way to represent this for the purpose of communication across teams?
Obviously, the best way is what the team is most comfortable with. It can be UML, whiteboard drawings, XML mappings, model code files, maybe even haiku poems :)
I personally prefer using an ODM (mongoid). It encourages you to specify all fields in the model class. Then you just need one glance at it to understand the schema.
What you can do is create your objects first in a shared "commons" project that all team members import into their own projects. If you change the schema design, you update the commons project and all team members import the latest version.
It's more about process and project management and less about technology, given Mongo's schema-less design. One thing we find helpful is to design your tests first; lately, SoapUI and LoadUI have been excellent tools for this. Once you define your tests, they can stub the returns for you and produce HTML documentation you can distribute to the team.
Check out: http://www.soapui.org/REST-Testing/working-with-rest-services.html
When you create a collection, add to it a first "reference" object that has all the fields and sub-objects that an object in this collection can possibly have, and use it as the "schema". You can even write a validator that checks that new objects conform to this reference object.
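
A minimal sketch of such a validator in plain JavaScript, assuming the rule is simply "a new object may not contain a field that the reference object lacks". The reference document `referenceUser` and the function names are made up for the example; type checking and required fields are deliberately left out.

```javascript
// Hypothetical "reference" document listing every allowed field.
const referenceUser = {
  name: "",
  email: "",
  address: { city: "", zip: "" },
};

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Returns dotted paths of fields in doc that the reference does not have.
function unknownFields(doc, reference, prefix = "") {
  const problems = [];
  for (const key of Object.keys(doc)) {
    const path = prefix + key;
    if (!(key in reference)) {
      problems.push(path);
    } else if (isPlainObject(doc[key]) && isPlainObject(reference[key])) {
      // Recurse into sub-objects that exist on both sides.
      problems.push(...unknownFields(doc[key], reference[key], path + "."));
    }
  }
  return problems;
}
```

For example, checking `{ name: "Ann", mail: "a@example.com" }` against `referenceUser` reports `mail` as unknown, which is exactly the "email" vs. "mail" drift the question is worried about.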

Best practices to design classes to represent database tables

This may be a dumb question, but I've always wondered what's the best way to do this.
Suppose we have a database with two tables: Users and Orders (one user can have many orders), and in any OOP language you have two classes to represent those tables User and Order. In the database it's evident that the 'order' will have the 'user' ID because it's a one to many relationship (because one user can have many orders) and the user won't have any order ID. But in code what's the best practice out of the following three?
a) Should the user have an array of Orders?
b) Should the order have the user ID?
c) Should the order have a reference to the user object?
Or are there more efficient ways to tackle this? I've always done it in different ways, they all have both pros and cons, but I've never asked an expert's opinion.
Thanks in advance!
In this instance, the User could have an array of orders if you're performing operations on the User that also involve the orders they own.
Whenever I design my classes, objects that are related contain pointers to each other, so I can access the Orders from the User and the User from an Order.
I don't believe there is a best practice as it really depends on what you're trying to accomplish. With Users and Orders, I could see you starting with an Order and needing to access the User and vice versa; therefore, in your situation it sounds like you should map the objects both ways.
One word of warning: be careful with circular references. In a language that relies on plain reference counting, two objects that still point at each other are never freed, so discarding both without first breaking the reference can create a memory leak.
You are asking about what is known as "object relational mapping" (ORM). I think the best way to learn what you want to learn is to look at some well established ORM libraries [such as ActiveRecord(Ruby) or Hibernate (Java)] and see how they do it.
With that in mind:
a) If the application requires it, there should be access to an array (or similar enumeration) of objects representing the user's orders through the user object. However, this will usually best involve lazy loading (i.e. the orders will usually not be pulled from the database when the user is pulled from the database; they will be queried later, when the application needs access to them). After objects are lazy loaded they can be cached by the ORM to eliminate the need for further queries on that instantiation.
b) Unless for performance reasons you only pull specific columns you're usually going to pull all columns when pulling an order. So it would include the user id.
c) Answer a applies to this as well.
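
The lazy-loading-plus-caching behaviour described in (a) can be sketched without any ORM. In this plain JavaScript example, `orderRows` and `fetchOrdersByUserId` are hypothetical stand-ins for the Orders table and a database query, and `queryCount` exists only to show how often the "database" is hit.

```javascript
// Stand-in for the Orders table (assumed columns).
const orderRows = [
  { id: 1, userId: 7, total: 20 },
  { id: 2, userId: 7, total: 35 },
  { id: 3, userId: 9, total: 5 },
];

let queryCount = 0;

// Stand-in for a database query; counts how often it runs.
function fetchOrdersByUserId(userId) {
  queryCount += 1;
  return orderRows.filter((row) => row.userId === userId);
}

class User {
  constructor(id, name) {
    this.id = id;
    this.name = name;
    this._orders = null; // not loaded yet
  }

  // Orders are fetched on first access, then served from the cache.
  get orders() {
    if (this._orders === null) {
      this._orders = fetchOrdersByUserId(this.id);
    }
    return this._orders;
  }
}
```

Creating a `User` runs no query at all; the first read of `user.orders` runs one, and later reads on the same instance reuse the cached array.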

How do I use entity framework with hierarchical data?

I'm working with a large hierarchical data set in SQL Server, modelled using the standard "EntityID, ParentID" kind of approach. There are about 25,000 nodes in the whole tree.
I often need to access subtrees of the tree, and then access related data that hangs off the nodes of the subtree. I built a data access layer a few years ago based on table-valued functions, using recursive queries to fetch an arbitrary subtree, given the root node of the subtree.
I'm thinking of using Entity Framework, but I can't see how to query hierarchical data like this. AFAIK there is no recursive querying in Linq, and I can't expose a TVF in my entity data model.
Is the only solution to keep using stored procs? Has anyone else solved this?
Clarification: By 25,000 nodes in the tree I'm referring to the size of the hierarchical dataset, not to anything to do with objects or the Entity Framework.
It may be best to use a pattern called "Nested Set", which lets you fetch an arbitrary subtree in one query. This is especially useful if the nodes aren't modified very often. See: Managing hierarchical data in MySQL.
In a perfect world the entity framework would provide possibilities to save and query data using this data pattern.
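
To show why a nested set needs no recursion, here is a small in-memory sketch in plain JavaScript. Each node carries `lft` and `rgt` bounds (names borrowed from the common nested-set convention), and the five-node tree is invented for the example. In SQL the same filter would be a single range condition on the two columns.

```javascript
// Tree: root -> (A -> (A1, A2), B), numbered by a depth-first walk.
const nodes = [
  { id: 1, name: "root", lft: 1, rgt: 10 },
  { id: 2, name: "A", lft: 2, rgt: 7 },
  { id: 3, name: "A1", lft: 3, rgt: 4 },
  { id: 4, name: "A2", lft: 5, rgt: 6 },
  { id: 5, name: "B", lft: 8, rgt: 9 },
];

// A node is inside the subtree if its bounds lie within the root's bounds.
function subtree(rootId) {
  const root = nodes.find((n) => n.id === rootId);
  return nodes
    .filter((n) => n.lft >= root.lft && n.rgt <= root.rgt)
    .map((n) => n.name);
}
```

The price is paid on writes: inserting or moving a node means renumbering the bounds of other nodes, which is why the pattern suits trees that change rarely.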
Everything IS possible with Entity Framework, but you have to hack and slash your way into it. The database I am currently working against has too many "holder tables", since Points, for instance, is shared by both teams and users. Both users and teams can also have a blog.
When you say 25 000 nodes do you mean navigational properties? If so I think it could be tricky to get the data access in place. It's not hard to navigate, search etc with entity framework but I tend to model on paper then create the database based on how I want to navigate while using entity framework. Sounds like you don't have that option.
Thanks for these suggestions.
I'm beginning to realise that the answer is to remodel the data in the database - either along the lines of nested sets as Georg suggests, or maybe a transitive closure table, which I've just come across.
That way, I'm hoping to get two key benefits:
a) faster querying against arbitrary subtrees
b) a data model which no longer requires recursive querying - so perhaps bringing it within easy reach of the Entity Framework!
It's always amazing how so often the right answer to a difficult problem is not to answer it, but to do something else instead!
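
As a rough sketch of the transitive closure table idea, the plain JavaScript below derives one row per (ancestor, descendant) pair from ordinary parent pointers. The `parentOf` map, the row shape, and the function names are assumptions for illustration; in a database the rows would live in their own table and be maintained on insert and delete.

```javascript
// child id -> parent id; node 1 is the root (assumed sample tree).
const parentOf = new Map([
  [2, 1],
  [3, 2],
  [4, 2],
  [5, 1],
]);

// Build one row per (ancestor, descendant) pair, including depth 0 self rows.
function buildClosure(parents) {
  const rows = [];
  const ids = new Set([...parents.keys(), ...parents.values()]);
  for (const id of ids) {
    let ancestor = id;
    let depth = 0;
    while (ancestor !== undefined) {
      rows.push({ ancestor, descendant: id, depth });
      ancestor = parents.get(ancestor); // undefined once past the root
      depth += 1;
    }
  }
  return rows;
}

const closure = buildClosure(parentOf);

// Fetching a subtree is now a flat filter with no recursion.
function descendantsOf(id) {
  return closure
    .filter((row) => row.ancestor === id && row.depth > 0)
    .map((row) => row.descendant)
    .sort((a, b) => a - b);
}
```

Once the closure rows exist, a subtree lookup is an ordinary filter (or join) on `ancestor`, which is the kind of query a LINQ provider can express without recursive SQL.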