I have 3 collections interconnected through many-to-many relationships. Therefore, I have 2 concerns:
Should I have 2 arrays with 2 ids in each of the 3 collections, or one join collection with 3 ids?
How to perform reads, inserts, updates and deletes, so everything is in sync and integrity is ensured?
For most scenarios, I'd probably have the referencing IDs on each of the objects, like this:
{
"_id": "123",
"firstReferenceCollectionId": "abc",
"secondReferenceCollectionId": "def"
}
If your application is going to have massive scale, I'd probably further denormalize the data based on how it's actually being used.
To answer your second question, you really shouldn't need to worry about that with the above approach, since the internal id of the reference won't change when other properties on those objects change. If you go the route of additional denormalization, use meteor add matb33:collection-hooks to sync up data on upserts. Here's the documentation link: https://github.com/matb33/meteor-collection-hooks
Related
Every user in our system (Like Facebook and twitter) has an option to add other users to his predefined lists like: *"Favorites", "Follow", "Blocked", "Closed Friends". Than, we want to allow him to search the list, filter and see commutative data from all the above list. for example:
UserA {
IsFollow: 1,
IsFavorite: 0
...
IsBlocked: 0
}
We also want to keep some additional information when user adding another user to one of the above list such addingDate.
Option One is to manage different collections like "Favorites", "Follow", "Blocked", "Closed Friends"
Option Two - to manage one collection like "Relations" and keep all the data on that collection without the needs of using lookup...
Option Three - Use option One but create a flat collection with all the relevant data from each table (RabbitMQ, transaction update, etc).
Since I'm new in MongoDB (I'm migrating the system from MS SQL), I'm wondering what is the best approach for high scale system.
I would suggest you go with option 2, where all the keys will be present in one document.
MongoDB recommends a schema design where all the data are embedded into a single document. They claim that this will lead to less read-write operations to DB and faster CRUD operations compared to the relational mapping approach.
But, there is a catch here. The data should be embedded in a single document only if the relations are One-to-One, One-to-Few, or One-to-Many.
DO NOT GO WITH DOCUMENT EMBEDDING APPROACH IF YOUR DATA MAPPING RELATION IS One-to-Squillions. I recommend you to read this article
The reason why I am not recommending Option-1 to have a separate collection is you will have to make more requests to a DB for each and every collection linkage. Although the $lookup stage is fast, it is not as efficient compared to the embedding approach.
As far as option 3 goes, it's a viable approach (If you use transactions properly and effectively), but it adds up complexity in the coding side.
I have personally used both Option-1 & Option-2 approaches, and option-1 has always left the AWS-EC2 instance running MongoDB to higher CPU and RAM usage. As far as option-2 goes, I have a collection that has almost 1000 array elements (With key indexed) and 15K keys in each records (I am not joking) and MongoDB had no issues processing it. Just make sure that you use the Projection of return documents everywhere.
So, go for Option-2 as a standard approach and Option-3 for One-to-Squillions relation mapping.
For referencing two or more collections, make sure that you use MongoDB generated ObjectId instead of your own custom referencing since have seen a minor performance impact on using multi-document relation-mapping other than ObjectId (Even if that particular key is indexed)
Hope this helps. Reach me out if you have additional queries
I have a collection in mongodb by the name Order. It has many fields some of which are mentioned below:
Order
{
"id": "1",
"name": "Hello1",
"orderType": "Type1",
"date": "2016-09-23T15:07:38.000Z"
...... //11 more fields
}
In the above collection, I have mentioned only 4 fields but in actual collection, I have around 15 fields. Order can be of 2 types: Type1 or Type2. It is mentioned in orderType field.
Now, for the Type2 order, I have all the 15 fields but it also have additional 5-7 fields. I have a single Order collection and I was wondering should I create 2 different collections for each type of orders; or whether I could keep this collection only and add additional fields here only. I have already written most of the logic considering only 1 collection Will it be worth the effort making it 2 different collections? If I keep a single collection, am I losing anything in terms of performance?
Keeping data on single collection is usually better in terms of performance since you can get required data mostly in a single query.
Since from the question I can mostly think it is a one-to-one relationship, embedded documents would be beneficial. Secondly, you are yourself saying that you have already written most of the logic considering only one collection then you should go forward with one collection only. Unless you have one-to-many relationships.
I would recommend you to go through Data Model Design and understand when and which model is better.
Embedded document would be best as you have one-to-one relationship. For more info please check here.
Assuming the following "schema/relationship" design what is the recommended practice for handling deletion with cascade delete like operation?
Relational Schema:
+---------+ +--------+
| Student |-*--------1-[Enrollment]-1--------*-| Course |
+---------+ +--------+
MongoDB:
+---------+ +--------+
| Student |-*----------------*-| Course |
+---------+ +--------+
Given this classic design of enrollment of students to courses, having a collection of courses in students and vice versa seems to be an appropriate data model when using MongoDB (that is nothing for the relationship/enrollment table). But coming from a relational world how should I handle the semantics of deleting a course? That is, when a course is deleted, all the "enrollment" records should be deleted too. That is, I should delete the course from the collection of each student record. It looks like I have to fire 2 queries: one for deleting the course and then to delete it from each student's collection. Is there a way to have a single query to perform this "cascade delete" like semantic without the additional query? Does the data model need to change?
NOTE: For all other use cases the above data model works just fine:
Deleting a student => just delete that student and associated collection of courses deleted along with it.
Student willing to drop a course => just delete it from the student collection of courses
Add student/course => just add it to corresponding 'table' in essence.
The only tricky thing is handling the deletion of a course. How should I handle this scenario in MongoDB, since I hail from a relational background and am unable to figure this one out.
What you are doing is the best and most optimal way of doing it in Mongo. I am in a similar situation and after going all possible implementations of the N:M design pattern, have also arrived to this same solution.
Apparently, This is not a mongodb thing, but more of a concept of NoSQL, wherein, the less changing data (Courses) can be kept separately. And since deleting a Course is not going to be a very frequent operation, its feasible enough to go through all the records to remove it.
On the other hand, you could let it be as it is.
In your application logic, just ignore the values of Courses in the Student document that don't have a reference_id in the Course document at all. But in that case, you must make sure that old deleted Course_id's are not being reused.
OR just use the deleted flags on the Course document and handle everything else in your application logic.
I'm going to answer based on Mongo team recommendations. I also came from the relational database and I had some issues at the beginning understanding the concepts. Mongo team recommends to design with the idea of "Application-Driven" schema, so you have to figure out first what pieces of data go together. Remember there's not such a transaction concept in any possible way in Mongo, even if we invent a driver that handles transactions we should implement our own solution for this. It means if I have two business objects that requires to be updated at the same time always and I cannot tolerate a failure in this operation, I have to join them into a single document (atomic).
In your case you have two documents, Student and Courses, and a relation between then (A student enrolls to N courses). I assume courses are not required to be altered all the time, so they can be stored in a different collection.
But the point is the relation between them, in this case you need to atomically delete a Student and all the courses he enrolled in.
So the best suitable solution for this is to embed the relation into Student, and keep a separated Course collection. When you delete the student, the relation is dropped at the same time:
Student Json:
{ _id: ObjectId('...'), name:"John", lastname:"Smith",
courses: [ 1, 100, 50, 67 ], ...
}
Courses can be a separated collection between them.
This is the way to handle it in Mongo. Atomic operations must be embedded into a single document. I assumed Courses is a list of courses that don't change so much, in case they're designed by Student we could change a bit the solution.
I have users table like:
{ _id: kshjfhsf098767, email: email#something name: John joshua }
{ _id: dleoireofd9888, email: email#hhh name: Terry Holdman }
And I have other collection "game"
{_id: gsgrfsdgf8898, home_user_id: kshjfhsf098767, guest_user_id: dleoireofd9888, result: "0:1"}
Then what I want is to join (like it was in mysql), game two times with users with because I know home_user_id and guest_user_id and take name email etc.
I could place all of that in table game but that will be duplicated content. and if they change name or email I need to update whole game table....
Any help on design and query to call that game with two users that are playing game would be great...Tnx
There are two ways to manage this, manually or using a DBRef. From the preceeding documentation link:
MongoDB does not support joins. In MongoDB some data is “denormalized,” or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases.
So it is a case of mange the link yourself or use the built-in DBRef. For the DBRef case see How to query mongodb with DBRef
Alternatively, it may be easier to manage with a different schema design. For example the game collection could just store the result and game_id and instead add the game_id reference to each of the relevant users. Of course you will still need to query both collections and the linked SO question has an example of how to do this.
MongoDB has no JOINs (NoSQL).
Just do a lazy join here where by you query your user row and then query all games that user is a part of. It will be ultra fast with the right indexes and since the commands would be two small ones MongoDB would barely notice them.
I would not recommend embedding here. Taking the reason you state, for example, that will make the data a pain to update across the 100's of users that could be in a single game "room". In this case it is better to do a single atomic update even if it means you have to put a little overhead on querying another collection.
I have to choose a database for implementing a sharing system.
My system will have users and documents. I have to share a document with a few users.
Example:
There are 2 users, and there is one document.
So if I have to share that one document with both the users, I could do these possible solutions:
The current method I'm using is with MySQL (I don't want to use this):
Relational Databases (MySQL)
Users Table = user1, user2
Docs Table = doc1
Docs-User Relation Table = doc1, user1
doc1, user2
And I would like to use something like this:
NoSQL Document Stores (MongoDB)
Users Documents:
{
_id: user1,
docs_i_have_access_to: {doc1}
}
{
_id: user2,
docs_i_have_access_to: {doc1}
}
Document's Document:
{
_id: doc1
members_of_this_doc: {user1, user2}
}
And I don't yet know how I would implement in a key-value store like Redis.
So I just wanted to know, would the MongoDB way I have given above, the best solution?
And is there any other way I could implement this? Maybe with another database solution?
Should I try to implement it with Redis or not?
Which database and which method should I choose and will be the best to share the data and why?
Note: I want something highly scalable and persistent. :D
Thanks. :D
Actually, you need to represent a many-to-many relationship. One user can have several documents. One document can be shared among several users.
See my previous answer to this question: how to have relations many to many in redis
With Redis, representing relationship with the set datatype is a pretty common pattern. You can expect to get better performance than with MongoDB for this kind of data model. And as a bonus, you can easily and efficiently find which users have a given list of documents in common, or which documents are shared by a given set of users.
Considering only this simple example (you just need to keep who owns what) SQL seems to be the most appropriate, as it will give additional options for free, such as reporting who has how many docs, the most popular documents, most active user etc with almost zero cost + the data will be more consistent (no duplication, possibly foreign keys). This is valid unless you have millions of documents of course.
If I chose between document-oriented and relational DB, I'd make a decision based mostly on the structure of the document itself. Whether they're all uniform or may have different fields for different types, do you nested sub-documents or arrays with the ability to search by their contents.