I have an MSSQL database which I am considering porting to CouchDB or MongoDB. I have a many-to-many relationship within the SQL db which has hundreds of thousands of rows in the xref table, corresponding to tens of thousands of rows in the tables on each side of the relationship. Will CouchDB and/or MongoDB be able to handle this data, and what would be the best way of formatting the relevant documents for performant querying? Thanks very much.
For CouchDB, I would highly recommend reading this article about Entity Relationships.
One thing I would note in CouchDB is to be careful of attempting to "normalize" a non-relational data model. The document-based storage offers you a great deal of flexibility, and it's seldom the best idea to abstract everything into as many "document types" as you can think of. Many times, it's best to leave much of your data within the same document unless you have clear cases where separate entities exist.
One common use-case of many-to-many relationships is implementing tagging. There are articles about different methods you can use to accomplish this in CouchDB. It may apply to your requirements, it may not, but it's probably worth a read.
Since the 'collection' model of MongoDB is similar to tables, you can of course maintain the m:n relationship inside a dedicated mapping collection (using the _id values of the referenced documents from the other collections).
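A minimal sketch of what such a mapping collection could look like, using plain Python lists of dicts as stand-ins for collections. The entity names (students, courses, enrolments) and field names are hypothetical and not taken from the question.

```python
# Hypothetical data: three "collections" modelled as plain lists of dicts.
students = [{"_id": 1, "name": "Ann"}, {"_id": 2, "name": "Bob"}]
courses = [{"_id": 10, "title": "Algebra"}, {"_id": 20, "title": "Biology"}]

# The mapping collection plays the role of the SQL xref table:
# one small document per (student, course) pair.
enrolments = [
    {"student_id": 1, "course_id": 10},
    {"student_id": 1, "course_id": 20},
    {"student_id": 2, "course_id": 10},
]


def courses_of(student_id):
    """Titles of all courses the given student is enrolled in."""
    course_ids = {e["course_id"] for e in enrolments if e["student_id"] == student_id}
    return sorted(c["title"] for c in courses if c["_id"] in course_ids)


def students_of(course_id):
    """Names of all students enrolled in the given course."""
    student_ids = {e["student_id"] for e in enrolments if e["course_id"] == course_id}
    return sorted(s["name"] for s in students if s["_id"] in student_ids)


print(courses_of(1))    # ['Algebra', 'Biology']
print(students_of(10))  # ['Ann', 'Bob']
```

In a real database you would probably index both reference fields of the mapping collection so that lookups in either direction stay fast.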
If you can, consider redesigning your application to use embedded documents.
http://www.mongodb.org/display/DOCS/Schema+Design
In general: try to set aside your RDBMS habits when working with MongoDB.
Blindly copying the database design from an RDBMS to MongoDB is neither helpful nor advisable, nor will it work in general.
My startup project requires a complex data structure, which must be ready for geolocation and full-text search. The information in the database is added by two different types of users, who together build a complex relationship network.
For a better understanding of my questions, here is a simple diagram that shows this relationship:
My first choice was MongoDB and Elasticsearch, but I noticed the problem of the same data being duplicated. After further planning, we concluded that certain parts of the application need ACID transactions.
What NoSQL database is good for complex many to many relationships?
What would be a good choice for us?
A graph database like Neo4j is perfect for that.
If you need features of a document database like MongoDB, but with Neo4j under the hood, try Structr: http://de.slideshare.net/AxelMorgner/neo4j-as-document-database http://structr.org
(Disclaimer: I'm the project initiator of Structr)
I just read a lecture, primarily about MongoDB, which states that NoSQL is not appropriate for data which is naturally joined. I can't really understand why this is. Could someone please explain?
MongoDB does not support JOINs. The reason is that MongoDB is built for clustering, which means that the data is distributed over multiple independent servers. When the data required for a JOIN is distributed over multiple machines on a network, it gets hard to implement it in a performant way. So the MongoDB developers decided to do without joins so they can prioritize scalability.
For that reason it is usually better to model 1:n relations by embedding the sub-documents in the parent document. This works well to model compositions where the child documents are inseparable from the parent (one invoice consists of multiple positions) but not so much for aggregations where the child documents can switch parents or even exist independently of the parent document (one department has multiple employees). And it doesn't work at all for n:m relations (one usergroup has multiple members, and each member can be in multiple usergroups).
When you have such a situation where embedding isn't reasonably possible, you need to mimic joins on the application layer, which is slow because each query requires multiple network-roundtrips.
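To illustrate what "mimicking a join on the application layer" involves, here is a small sketch. `FakeCollection` is a hypothetical in-memory stand-in that merely counts queries; it is not a MongoDB driver API, and the group and user data are made up.

```python
class FakeCollection:
    """In-memory stand-in for a collection; counts each query as one round trip."""

    def __init__(self, docs):
        self.docs = docs
        self.round_trips = 0

    def find(self, predicate):
        self.round_trips += 1
        return [d for d in self.docs if predicate(d)]


groups = FakeCollection([
    {"_id": "admins", "member_ids": [1, 2]},
    {"_id": "editors", "member_ids": [2, 3]},
])
users = FakeCollection([
    {"_id": 1, "name": "Ann"},
    {"_id": 2, "name": "Bob"},
    {"_id": 3, "name": "Cy"},
])


def member_names(group_id):
    # Round trip 1: fetch the group to learn which user ids it references.
    group = groups.find(lambda d: d["_id"] == group_id)[0]
    # Round trip 2: fetch all referenced users in one batched query.
    wanted = set(group["member_ids"])
    members = users.find(lambda d: d["_id"] in wanted)
    return sorted(u["name"] for u in members)


names = member_names("editors")
total_round_trips = groups.round_trips + users.round_trips
print(names, total_round_trips)  # ['Bob', 'Cy'] 2
```

Batching the second step (the equivalent of an `$in` query) keeps the cost at two round trips; fetching each member individually would cost one round trip per member.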
There are, however, NoSQL databases which work much better at handling relations between data. One subset of NoSQL is graph databases like Neo4j. These handle complex relations between entities even better than relational databases in many cases.
What exactly is demoralization in NoSQL databases?
I have read it means modelling different object types as different documents. My first guess was it means Aggregation without storing related data, i.e storing all rows of an entity in a single document with related data being referred by different documents for each row.
But I'm not sure if this is correct or not?
An example would be helpful.
Thanks in advance
I do mean demoralization and not denormalization. I came across this term in the following links:
1. Couchbase documentation
2. Blog on Nosql
In the context of NoSQL (and databases in general), demoralization is synonymous with denormalization. You can find mixed usage of demoralization and denormalization in many documents, or mentions of demoralization being the opposite of normalization (so again, the same as denormalization):
What Is Meant By Denormalization In SQL?
Database Denormalization
What is demoralization?
Normalization & Demoralization
Designing databases - OLTP and OLAP
There is even a reference which mentions that many spell checkers suggest "demoralization" instead of "denormalization". This could explain why some people use demoralization: The effect of denormalization
NoSQL is a very, very wide field. It covers a lot of entirely different database systems with entirely different concepts of how data should be structured.
The dogma of database normalization applies mostly to classic relational databases. The further a NoSQL database is from the relational philosophy, the more you have to question this dogma.
The philosophy of normalization assumes that database JOINs are cheap. So any data which can be split over multiple tables to remove redundancies should be split. But that doesn't apply to all NoSQL databases. Some of them don't support JOIN operations, so getting data stored in many different database entries can be a very expensive operation which either requires multiple consecutive queries to the database or expensive database-sided code execution. When you use one of those databases, you should store your data in a way that every performance-critical use-case can be fulfilled by looking up as few entries as possible, even when this means that you will have redundant data.
Those non-relational NoSQL databases which don't support JOINs frequently support arrays in database entries instead. These are usually the preferred way to model 1:n relations. So when 1 person has n telephone numbers, you wouldn't store the telephone numbers in a separate table/document/collection/whateveryoucallit; you would store them in an array in the person entry. There would usually be no reason to handle telephone numbers as self-sustained entities were it not for the inability of SQL to work properly with multiple values in a single field.
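As a rough illustration of the telephone-number case, the person entry might look like the dict below. The field names and numbers are invented for the example.

```python
# One person document; the 1:n relation is an array inside the document
# rather than rows in a separate table.
person = {
    "_id": 1,
    "name": "Ann",
    "phone_numbers": ["555-0100", "555-0199"],
}


def add_phone(doc, number):
    """Append a number unless the person already has it."""
    if number not in doc["phone_numbers"]:
        doc["phone_numbers"].append(number)


add_phone(person, "555-0142")
add_phone(person, "555-0100")  # already present, so ignored
print(person["phone_numbers"])  # ['555-0100', '555-0199', '555-0142']
```

Reading the person returns all their numbers in a single lookup, which is the point of embedding.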
Denormalization in a NoSQL world means the same as in an RDBMS world: duplicating data for read performance.
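A small sketch of that trade-off, with made-up blog data: the author's name is copied into every post so that rendering a post needs only one document, at the price of having to update every copy when the name changes.

```python
# Source of truth for authors (hypothetical data).
authors = {1: {"name": "Ann"}}

# Denormalized posts: each one carries a redundant copy of the author's name.
posts = [
    {"_id": 100, "title": "Hello", "author_id": 1, "author_name": "Ann"},
    {"_id": 101, "title": "Again", "author_id": 1, "author_name": "Ann"},
]


def render(post):
    # Read path: everything needed is in the post itself, no second lookup.
    return f'{post["title"]} by {post["author_name"]}'


def rename_author(author_id, new_name):
    # Write path: the source of truth and every redundant copy must change.
    authors[author_id]["name"] = new_name
    touched = 0
    for post in posts:
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            touched += 1
    return touched


touched = rename_author(1, "Anna")
print(render(posts[0]), touched)  # Hello by Anna 2
```

Whether this pays off depends on how often the duplicated value is read compared to how often it changes.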
What would be the equivalent of ERD for a NoSQL database such as MongoDB?
It looks like you asked a similar question on Quora.
As mentioned there, the ERD is simply a mapping of the data you intend to store and the relations amongst that data.
You can still make an ERD with MongoDB as you still want to track the data and the relations. The big difference is that MongoDB has no joins, so when you translate the ERD into an actual schema you'll have to make some specific decisions about how to implement the relationships.
In particular, you'll need to make the "embed vs. reference" decision when deciding how this data will actually be stored. Relations are still allowed, just not enforced. Many of the wrappers for MongoDB actually provide lookups across collections to abstract some of this complexity.
Even though MongoDB does not enforce a schema, it's not recommended to proceed completely at random. Modeling the data you expect to have in the system is still a really good idea and that's what the ERD provides you.
So I guess the equivalent to the ERD is the ERD?
You could just use a UML class diagram instead too.
Moon Modeler supports schema design for MongoDB. It allows users to define diagrams with nested structures.
I know of no standard means of diagramming document-oriented "schema".
I'm sure you could use an ERD to map out your schemata but since document databases do not truly support--or more importantly enforce--relationships between data, it would only be as useful as your code was disciplined to internally enforce such relationships.
I have been thinking about the same issue for quite some time.
And I came to the following conclusion:
If NoSQL databases are generally schemaless, you don't actually have a 'schema' to illustrate in a diagram.
Thus, I think you should take a "by example" approach.
You could draw some mindmaps exemplifying what your data would look like when stored in a NoSQL DB such as MongoDB.
And since these databases are very dynamic you could also create some derived mindmaps to show how the data from today could evolve in time.
Take a look at this topic too.
Confusion about NoSQL Design
MongoDB does support 'joins', just not in the SQL sense of INNER JOIN (the default SQL join). While the concept of 'join' is typically associated with SQL, MongoDB does have the aggregation framework with its data processing pipeline stages. The $lookup pipeline stage is used to create the equivalent of a LEFT JOIN in SQL. That is, all documents on the left of a relationship will pass through the pipeline, as well as any related documents on the right side of the relationship. The documents are modified to include the relationship as part of the new documents.
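The left-join behaviour described above can be imitated in a few lines of plain Python. This is only a simplified emulation of the idea, not MongoDB's implementation; the `lookup` helper and the order/customer data are hypothetical.

```python
def lookup(left_docs, right_docs, local_field, foreign_field, as_field):
    """Attach to each left document the list of matching right documents.

    Left documents without a match are kept and get an empty list,
    which is what makes this behave like a LEFT (outer) join.
    """
    result = []
    for left in left_docs:
        matches = [
            r for r in right_docs
            if r.get(foreign_field) == left.get(local_field)
        ]
        joined = dict(left)  # copy so the input documents stay unchanged
        joined[as_field] = matches
        result.append(joined)
    return result


orders = [
    {"_id": 1, "customer_id": "c1"},
    {"_id": 2, "customer_id": "c9"},  # refers to a customer that does not exist
]
customers = [{"_id": "c1", "name": "Ann"}]

joined = lookup(orders, customers, "customer_id", "_id", "customer")
print(joined[0]["customer"])  # [{'_id': 'c1', 'name': 'Ann'}]
print(joined[1]["customer"])  # []
```

Note that the second order still appears in the output; an INNER JOIN would have dropped it.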
Consequently, I postulate that Entity Relationship Diagrams do have a role in MongoDB. Documents are certainly related to each other in the db, and we should have a visualization of these relationships, including the cardinality relationship, e.g. full participation, partial participation, weak/strong entities, etc.
Of course, MongoDB also introduces the concept of embedded documents and referenced documents, and so I argue it adds additional flavor to the model of the ERD. And I certainly would want to see embedded and referenced relationships mapped out in a visual diagram.
The remaining question is so what is out there? What is out there for Mongoose for NodeJS? Mongoid for Ruby? etc. If you check the respective repositories for their corresponding ORMs (Object Relational Mappers), then you will see there are ERDs for them. But in terms of their completeness, perhaps there is a lot to be desired and the open source community is welcome to make contributions.
https://www.npmjs.com/package/mongoose-erd
https://rubygems.org/gems/railroady
Before I dive really deep into MongoDB for days, I thought I'd ask a pretty basic question as to whether I should dive into it at all or not. I have basically no experience with nosql.
I did read a little about some of the benefits of document databases, and I think for this new application, they will be really great. It is always a hassle to do favourites, comments, etc. for many types of objects (lots of m-to-m relationships) and subclasses - it's kind of a pain to deal with.
I also have a structure that will be a pain to define in SQL because it's extremely nested and translates to a document a lot better than 15 different tables.
But I am confused about a few things.
Is it desirable to keep your database normalized still? I really don't want to be updating multiple records. Is that still how people approach the design of the database in MongoDB?
What happens when a user favourites a book and this selection is still stored in a user document, but then the book is deleted? How does the relationship get detached without foreign keys? Am I manually responsible for deleting all of the links myself?
What happens if a user favourited a book that no longer exists and I query it (some kind of join)? Do I have to do any fault-tolerance here?
MongoDB doesn't support server-side foreign key relationships, and normalization is also discouraged. You should embed your child objects within parent objects if possible; this will increase performance and make foreign keys totally unnecessary. That said, it is not always possible, so there is a special construct called DBRef which allows you to reference objects in a different collection. This may not be as fast, because the DB has to make additional queries to read the referenced objects, but it allows for a kind of foreign key reference.
Still, you will have to handle your references manually. Only when you look up a DBRef will you see whether its target exists; the DB will not go through all the documents to look for references and remove them if the target of the reference doesn't exist any more. But I think removing all the references after deleting the book would require a single query per collection, no more, so it is not that difficult really.
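The two chores described here (tolerating a dangling reference on read, and cleaning up references after a delete) might be sketched as follows. The sketch uses plain arrays of ids rather than DBRefs, and in-memory data instead of a live database; all names are illustrative.

```python
books = {"b1": {"title": "Dune"}, "b2": {"title": "Emma"}}
users = [
    {"_id": 1, "favourites": ["b1", "b2"]},
    {"_id": 2, "favourites": ["b2"]},
]


def favourite_titles(user):
    # Fault tolerance on read: skip ids whose book no longer exists.
    return [books[b]["title"] for b in user["favourites"] if b in books]


def remove_references(book_id):
    # Cleanup after a delete: one pass over the referencing collection.
    # In MongoDB this could plausibly be a single multi-document update, e.g.
    # with pymongo (assumed collection name "users"):
    #   db.users.update_many({"favourites": book_id},
    #                        {"$pull": {"favourites": book_id}})
    for user in users:
        user["favourites"] = [b for b in user["favourites"] if b != book_id]


# The book is deleted, but the references are still in place.
del books["b2"]
titles_with_dangling = favourite_titles(users[0])
print(titles_with_dangling)  # ['Dune']

# Now clean up the dangling references.
remove_references("b2")
print([u["favourites"] for u in users])  # [['b1'], []]
```

Doing both is a reasonable default: the cleanup keeps documents tidy, and the tolerant read covers the window before the cleanup has run.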
If your schema is more complex, then you should probably choose a relational database rather than NoSQL.
There is also a book about designing MongoDB databases: Document Design for MongoDB
UPDATE The book above is not available anymore, but because of the popularity of MongoDB there are quite a lot of others. I won't link them all, since such links are likely to change. A simple search on Amazon shows multiple pages of results, so it shouldn't be a problem to find some.
See the MongoDB manual page for 'Manual references' and DBRefs for further specifics and examples
Above, @TomaszStanczak states:
"MongoDB doesn't support server side foreign key relationships, normalization is also discouraged. You should embed your child object within parent objects if possible, this will increase performance and make foreign keys totally unnecessary. That said it is not always possible ..."
Normalization is not discouraged by Mongo. To be clear, we are talking about two fundamentally different types of relationships two data entities can have. In one, one child entity is owned exclusively by a parent object. In this type of relationship the Mongo way is to embed.
In the other class of relationship, two entities exist independently: they have independent lifetimes and relationships. Mongo wishes that this type of relationship did not exist, and is frustratingly silent on precisely how to deal with it. Embedding is just not a solution. Normalization is neither discouraged nor encouraged. Mongo just gives you two mechanisms to deal with it: manual references (analogous to a foreign key binding two tables, though MongoDB does not enforce it as a constraint) and DBRef (a different, slightly more structured way of doing the same). In this use case SQL databases win.
The answers of both Tomasz and Francis contain good advice: that "normalization" is not discouraged by Mongo, but that you should first consider optimizing your database document design before creating "document references". DBRefs were mentioned by Tomasz, but, as he alluded, they are not a "magic bullet" and require additional processing to be useful.
What is now possible, as of MongoDB version 3.2, is to produce results equivalent to an SQL JOIN by using the $lookup aggregation pipeline stage operator. In this manner you can have a "normalized" document structure, but still be able to produce consolidated results. In order for this to work, you need a key field in the target collection that is unique and, ideally, meaningful. You can enforce uniqueness by creating a unique index on this field.
$lookup usage is pretty straightforward. Have a look at the documentation here: https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#lookup-aggregation. Run the aggregate() method on the source collection (i.e. the "left" table). The from parameter is the target collection (i.e. the "right" table). The localField parameter would be the field in the source collection (i.e. the "foreign key"). The foreignField parameter would be the matching field in the target collection.
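As a sketch of how the parameters named above fit together, the stage can be written as a plain Python dict. The collection and field names (favourites, books, book_isbn, isbn) are assumptions made for the example, and the `as` field, which names the output array, is part of the `$lookup` syntax even though it is not discussed above.

```python
# A $lookup stage joining a hypothetical "favourites" collection (the "left"
# side) to a hypothetical "books" collection (the "right" side).
lookup_stage = {
    "$lookup": {
        "from": "books",            # target ("right") collection
        "localField": "book_isbn",  # field in the source documents
        "foreignField": "isbn",     # matching field in the target collection
        "as": "book",               # output array holding the matched documents
    }
}

pipeline = [lookup_stage]

# With a driver such as pymongo, and assuming a database handle named `db`,
# the pipeline would be run on the source collection, roughly:
#   results = db.favourites.aggregate(pipeline)
# and uniqueness of the target key could be enforced with:
#   db.books.create_index("isbn", unique=True)
print(sorted(pipeline[0]["$lookup"]))  # ['as', 'foreignField', 'from', 'localField']
```

Nothing here talks to a database; the dict only shows the shape of the stage that would be passed to `aggregate()`.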
As for orphaned documents, from your question I presume you are thinking about a traditional RDBMS set of constraints, cascading deletes, etc. Again, as of MongoDB version 3.2, there is native support for document validation. Have a look at this Stack Overflow article: How to apply constraints in MongoDB? Look at the second answer, from JohnnyHK.
Packt Publishers have a bunch of good books on MongoDB. (Full Disclosure: I wrote a couple of them.)