MongoDB: ways of implementing many-to-many?

I started reading up on MongoDB (which got me very excited). As I understand it, one of its flaws is the self-explanatory lack of relations, especially when it comes to many-to-many relationships that are large or ever-growing on both sides.
From what I've read, the best way to avoid ever-growing arrays inside a document is either to create buckets of documents and reference the buckets (which does not guarantee total prevention of overgrowth), or to have both documents reference a third, many-to-many collection.
Since I could not find a definitive answer to this dilemma, or at least one that isn't a few years old, could someone explain whether this is a dead end (in case the project uses a few big, ever-growing many-to-many relationships) and I should switch to an RDBMS?

It depends on your use case.
The main question is: do you actually know why you want to use MongoDB in the first place? Hopefully the reason is not just the trend. RDBMSs are still relevant and have their own use cases; for some applications an RDBMS is the way to go, for others it isn't.
Now back to your original question about many-to-many relations. As you have already researched, there are ways to model those relationships in MongoDB, so that alone doesn't disqualify MongoDB as a database. For example, do you need transactionality or referential-integrity checks when you insert or delete records for those many-to-many relationships? If the answer is yes, then MongoDB may not be the perfect fit for your case.

When I first started working with MongoDB this exact question crossed my mind, and while searching for the answer I read something very interesting (I wish I still had the link for you, but unfortunately I don't).
Think of a real-world problem where you have a many-to-many relation that just keeps on growing; such cases are very exceptional.
Let's say many students are registered for many courses. A course may have 100 registered students, but a student surely won't register for 100 courses, so you can simply keep an array field of registered course IDs in the student collection.
Let's dig deeper and say there is a bunch of super-brilliant students who actually registered for 100 courses; in that scenario an array field may not be a viable solution. Then what? How about a collection that just holds student_id and course_id pairs? This exists in the RDBMS world too.
So the available workarounds should be enough to design an optimized solution for even the most complex scenarios.
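A minimal mongo-shell sketch of the two approaches (collection and field names are illustrative, not from the original answer):

```js
// Approach 1: embed an array of course IDs in each student document
db.students.insertOne({
  _id: "s1",
  name: "Alice",
  courseIds: ["c101", "c205"] // fine while the array stays small
});

// Registering for another course just grows the array
db.students.updateOne({ _id: "s1" }, { $addToSet: { courseIds: "c301" } });

// Approach 2: a separate registrations collection, like an RDBMS join table
db.registrations.insertOne({ student_id: "s1", course_id: "c301" });

// Query either side of the relationship without unbounded array growth
db.registrations.find({ student_id: "s1" });
db.registrations.find({ course_id: "c301" });
```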

Related

MongoDB Extended Reference Pattern - Managing Data Duplication

I'm new to MongoDB and trying to wrap my head around managing duplicate data. The Extended Reference Pattern (link) is a good example. When you have two related collections (e.g., Customers and Orders), it can make sense for performance reasons to duplicate some information that would otherwise just live in the referenced collection. So for instance, the Order collection might duplicate the customer's name to avoid unnecessary joins with some queries.
I totally get that. And I totally get that you should be careful about what data you duplicate ("it works best if [duplicated fields] don't frequently change"), as updating those records can be expensive. What I don't understand is how you're supposed to keep track of where all that data is housed. Suppose you do need to update a customer's name. If that information is duplicated in multiple orders within the Order collection, plus maybe one or two other collections, tracking down everywhere the customer's name lives (and the mechanics of changing it) sounds like a logistical nightmare!
Is there some sort of Mongo voodoo magic that can help with these sorts of updates, or is that just a necessarily messy process?
You have to manage all those changes in your app, so you have to take care when selecting one pattern or another; they are not silver bullets.
And remember that not all the data needs to be updated; it depends on the situation, the data, and the context of your app.
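There's no voodoo, but the fan-out itself is usually a couple of statements. A minimal sketch, assuming orders embed a customer sub-document (names are illustrative):

```js
// 1) Update the source of truth
db.customers.updateOne({ _id: custId }, { $set: { name: "New Name" } });

// 2) Fan the change out to every order that duplicated the name
db.orders.updateMany(
  { "customer._id": custId },
  { $set: { "customer.name": "New Name" } }
);
```

The hard part is remembering step 2 for every collection that holds a copy, which is why the pattern works best with fields that rarely change.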

Single big collection for all products vs Separate collections for each Product category

I'm new to NoSQL and I'm trying to figure out the best way to model my database. I'll be using ArangoDB in the project but I think this question also stands if using MongoDB.
The database will store 12 categories of products. Each category is expected to hold hundreds or thousands of products. Products will also be added / removed constantly.
There will be a number of common fields across all products, but each category will also have unique fields / different restrictions to data.
Keep in mind that there are instances where I'd need to query all the categories at the same time, for example to search a product across all categories, and other instances where I'll only need to query one category.
Should I create one single collection "Product" and use a field to indicate the category, or create a separate collection for each category?
I've read many questions related to this idea (one collection vs. many) but I haven't been able to reach a conclusion other than "it depends".
So my question is: in this specific use case, which option would be more optimal in terms of performance and speed, multiple collections or a single collection plus sharding?
Any help would be appreciated.
As you mentioned, you need to experiment with your data and use case; then you will have a better picture.
Some decisions are required, as below.
Decide the number of documents you will have in the near future. If you will have 1M documents within a year, then test with at least 3M documents.
Decide the number of indices required.
Decide the number of writes, reads per second.
Decide the size of documents per category.
Decide the query pattern.
Some inputs based on those requirements:
If you have many writes with many indexes, then a single monolithic collection will be slower, as multiple indexes need to be updated.
As you have a different set of fields per category, you could try multiple collections.
There is $unionWith to combine data from multiple collections (see the sketch after this list), but do check the performance; it depends entirely on the decisions above. Note this open issue also.
If you decide to go with a monolithic collection, defer the sharding; implement it once you find that queries are slower.
If you have many writes to the same document, the writes will be executed sequentially, which will slow down your reads as well.
Think about reclaiming disk space when a lot of data is cleared from the collections; multiple collections do well here.
The point that forces me to suggest a monolithic collection is that you'd need to query all the categories at the same time. You may need to add more categories later, and combining all of them into a single response from separate collections would not be better in terms of performance.
As you don't really have a join use case like in an RDBMS, you can go with a single monolithic collection from a model point of view; I doubt you could have a join key.
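A minimal sketch of the $unionWith route across per-category collections (collection names and the search predicate are illustrative; requires MongoDB 4.4+):

```js
// Search for a product across several category collections in one call
db.books.aggregate([
  { $match: { name: /widget/i } },
  { $unionWith: { coll: "electronics", pipeline: [{ $match: { name: /widget/i } }] } },
  { $unionWith: { coll: "toys",        pipeline: [{ $match: { name: /widget/i } }] } }
  // ...one $unionWith stage per remaining category collection
]);
```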
If any of my points are incorrect, please let me know.
To SQL or to NoSQL?
I think that before you implement this in NoSQL, you should ask yourself why you are doing that. I quite like NoSQL but some data is definitely a better fit to that model than others.
The data you are describing is a classic case for a relational SQL DB. That's fine if it's a hobby project and you want to try NoSQL, but if this is for a production environment or client, you are likely making the situation more difficult for them.
Relational or non-relational?
You mention common fields across all products. If you wish to update these fields and have those updates reflected in all products, then you have relational data.
Background
It may be worth reading Sarah Mei's 2013 article about this. Skip to the section "How MongoDB Stores Data" and read from there. Warning: the article is called "Why You Should Never Use MongoDB" and is (perhaps intentionally) somewhat biased against Mongo, so it's important to read it through the correct lens. The message you should take from the article is that MongoDB is not a good fit for every data type.
Two strategies for handling relational data in Mongo:
every time you update one of these common fields, update every product's document with the new common field data. This is generally only OK if you have few updates or few documents; it breaks down when you have many of both.
use references and do joins.
In Mongo, joins typically happen code-side (multiple db calls); see the sketch after this list
In Arango (and in other graph dbs, as well as some key-value stores), the joins happen db-side (single db call)
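A minimal mongo-shell sketch of the code-side approach (collection and field names are my assumptions); for completeness, newer MongoDB versions also offer a db-side join via $lookup:

```js
// Code-side join: two round trips to the database
const product = db.products.findOne({ _id: productId }); // productId is illustrative
const category = db.categories.findOne({ _id: product.categoryId });

// db-side alternative in a single call ($lookup, MongoDB 3.2+)
db.products.aggregate([
  { $match: { _id: productId } },
  { $lookup: {
      from: "categories",
      localField: "categoryId",
      foreignField: "_id",
      as: "category"
  } }
]);
```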
Decisions
These are important factors to consider when deciding which DB to use and how to model your data
I've used MongoDB, ArangoDB and Neo4j.
Mongo definitely has the best tooling and it's easy to find help, but I don't believe it's a good fit in this case
Arango is quite pleasant to work with, but doesn't yet have the adoption that it deserves
I wouldn't recommend Neo4j to anyone looking for a NoSQL solution, as its nodes and relations only support flat properties (no nesting, so not real documents)
It may also be worth considering MariaDB or Postgres

MongoDB : where is the limit between "few" and "many"?

I am coming from the relational database world (Rails / PostgreSQL) and transitioning to the NoSQL world (Meteor / MongoDB), so I am learning about denormalization, embedding and true links.
It seems that, in many cases, choosing between various database schemas comes down to the number of documents that will be "related" to each other.
In this video series, the author distinguishes:
one-to-many relationships from one-to-few relationships
many-to-many relationships from few-to-few relationships
So, I am wondering: where is the limit between few and many?
I guess there may not be a hard number, but are we in the dozens, the hundreds, the thousands or the millions?
It's all relative and is really kind of a dangerous question to make assumptions about when you're designing an architecture. It's worth investing time to make the right choices for your schema and your setup. I would advise a few steps:
Do the math. Multiply your relationships out based on what you expect your application to need to do. If you have a few nested arrays or embedded documents, a couple of "one-to-few" relations can expand out to many documents pretty easily when you start $unwinding them (see the sketch after these steps).
Write a prototype. Do some basic testing on your expected hardware/environment to see if it can easily handle that load when you do queries for all the data.
Based on your testing, create the limitations. This is where you need to draw the line on how many relations you can create per document, for each relationship type, before the system breaks down.
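For instance, a quick sketch of how the math multiplies out under $unwind (the collection shape and counts are illustrative):

```js
// Suppose 1,000 order documents, each embedding ~10 line items and ~5 shipments
db.orders.aggregate([
  { $unwind: "$items" },     // pipeline now holds ~10,000 documents
  { $unwind: "$shipments" }  // ...and ~50,000 after the second unwind
]);
```

Two modest "one-to-few" arrays already produce a 50x blow-up, which is why the limits need to be measured rather than guessed.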
If it were me, I would say one-to-few is less than a dozen, and one-to-many is theoretically unlimited, but practically in the millions. Maybe there should be a middle ground of "one-to-some" to indicate possibly hundreds.
Taken from 6 rules of thumb for MongoDB schema design:
one-to-few: two to a few hundred
one-to-many: a couple of hundred to a few thousand
one-to-squillions: thousands and up
I totally agree with womp about the need to choose the right schema for your use case. The article I posted above has some good guidelines and examples of which schema design to use.

How one approaches for defining containers for MongoDB?

First of all, I have extensive experience with relational DBs but very beginner-level knowledge of document DBs. I'm exploring MongoDB, but my question applies to document DBs in general.
As far as I know (I may be wrong), a document DB consists of containers, and containers contain objects of the same or different structures. These object structures are defined in such a way that filters and information can be applied in the most optimal way. For example, a book is written by authors, so the Book object will also contain the list of authors. This way searching can be made faster and performance can be gained.
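For example, a minimal illustration of that embedding idea (the document shape is my assumption, not from the question):

```js
// A Book document embedding its authors, so a single read returns everything
db.books.insertOne({
  title: "An Example Book",
  authors: [
    { name: "First Author" },
    { name: "Second Author" }
  ]
});

// Find every book by a given author without touching a second collection
db.books.find({ "authors.name": "First Author" });
```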
What is my problem?
I'm creating an application (I haven't actually started yet, as I'm stuck here). Its relational DB schema is something like this...
The problem is I'm not able to design the Document DB structure for this requirement.
Could somebody please help me design such a database, or give me an idea of what approach one should take when designing such a database?
This comes down to answering the following questions:
What are your most common access patterns? It is helpful to think of your API methods, or your top 5-10 queries, to decide how to organize the data.
What are your transactional needs? Which of these entity types occur together in transactions and queries?
How often do they change? Should you embed or reference?
If you could include these details, we can help with more targeted suggestions.
http://azure.microsoft.com/documentation/articles/documentdb-modeling-data/ is also worth a read if you haven't already.
The main difference between DocumentDB and relational databases/MongoDB is that collections are more like shards/partitions and not tables.

MongoDB - Denormalization / model opinion

I've been getting into Mongo, but coming from an RDBMS background I'm facing the probably obvious questions with regard to denormalisation and general data modelling.
Suppose I have a document type with an array of sub-docs, and each sub-doc has a status code.
In the relational world I would add a foreign key to the record, StatusId; simple.
In MongoDB, would you denormalise the key pieces of data from the "status" (e.g. code and desc) and hold an ObjectId referencing another collection of proper statuses? I guess the next question is one of design: if the status doc is modified, I'd then need to modify the denormalised data?
Another question on the same theme is how you would model a transaction table. Say I have events and people; the events could be quite granular, say time sheets, which over time may lead to many records. Based on what I've seen, this would seem like a good candidate for a child sub-array of docs, which of course could be indexed for speed.
Is it therefore possible to query/find just the sub-array, or part of it? And given the 4mb limit for doc size, should I just limit the transaction history of the person? Or should the transaction history be a separate collection with an ObjectId referencing the person?
Thanks for any input
Sam
Or should the transaction history be a separate collection with an ObjectId referencing the person?
Probably, I think this S/O question may help you understand why.
if the status doc is modified, I'd then need to modify the denormalised data?
Yes, this is a standard trade-off in MongoDB; you will encounter this question a lot. You may need to leverage a queue structure to ensure that data remains consistent across multiple collections.
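For instance, the sub-doc shape being discussed might look like this (a sketch; collection and field names are my assumptions):

```js
// Each sub-doc keeps a denormalised copy of the status fields
// alongside an ObjectId reference to the proper statuses collection
db.documents.insertOne({
  title: "Parent document",
  items: [
    {
      status_id: ObjectId("5f1f7a2b9d3e4c0012345678"), // reference
      statusCode: "ACT",                               // denormalised copy
      statusDesc: "Active"                             // denormalised copy
    }
  ]
});
```

If the status document changes, every denormalised copy like this has to be updated, which is the trade-off described above.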
Is it therefore possible to query/find just the sub-array, or part of it?
This is a tough one, specific to MongoDB. With the basic query syntax you have only limited support for dealing with arrays of objects. The new "Aggregation Framework" is actually much better here, but it's not yet available in a stable build.
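For illustration, here is how both routes look in modern MongoDB (collection and field names are my assumptions):

```js
// Basic query syntax: an $elemMatch projection returns only the first
// matching sub-doc from the array
db.people.find(
  { "timesheets.status": "open" },
  { timesheets: { $elemMatch: { status: "open" } } }
);

// Aggregation framework: unwind the array, then keep only matching entries
db.people.aggregate([
  { $unwind: "$timesheets" },
  { $match: { "timesheets.status": "open" } }
]);
```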
All your "how to model this or that" questions can't really be answered, because good schema design depends on so many factors (access patterns, hardware characteristics, whether a cluster is used, etc.).
if the status doc is modified, I'd then need to modify the denormalised data?
Usually yes; that's the drawback of denormalisation. But sometimes you don't have to (one social network site stores the user's name with a photo tag and doesn't update it when the user changes their name).
is it possible to query/find just the sub-array, or part of it?
It is not currently possible to fetch only part of an array (unless you use map/reduce, of course).
And given the 4mb limit
Where did you get this from? It's 16mb at the moment.
While it's true that schema design takes many factors into account, the need to denormalize data usually comes up somewhere. I tend to take advantage of denormalization in my apps that use MongoDB, because I feel the database lends itself well to storing denormalized data:
no additional column maintenance
support for hashes and arrays as field types (perfect for storing denormalized fields)
speedy, non-blocking writes make syncing data less expensive
document size growth only marginally affects performance up to limits (for the most part)
There are a few gems that help you manage denormalized data, including setting it up and keeping it in sync. If you're using Mongoid, you can try mongoid_alize. DISCLAIMER: I am the author and maintainer of mongoid_alize.