My startup project require complex data structure, which must be ready for geolocation and fulltext search. The information in the database are added by two different types of users, who together build a complex relationship network.
For a better understanding of my questions, here is a simple diagram that shows this relationship:
My first choice was MongoDB and Elasticsearch, but I noticed the problem of multiplication of the same data. Further planning, we concluded that for certain parts of the application need ACID transactions possibilities.
What NoSQL database is good for complex many to many relationships?
What would be a good choice for us?
A graph database like Neo4j is perfect for that.
If you need features of a document database like MongoDB, but with Neo4j under the hood, try Structr: http://de.slideshare.net/AxelMorgner/neo4j-as-document-database http://structr.org
(Disclaimer: I'm the project inititator of Structr)
Related
First of all, I have extensive experience on Relational DBs but very beginner level knowledge of Document DB. I'm exploring MongoDB but my question is in general to Document DB.
AFA I know (I may be wrong), A Document DB is consisting of containers and containers contain same of different object structures. These object structures are defined such a way that filters and information can be applied in most optimum way. For ex. A is written by Authors. So object of Book will contain list of authors also. This way searching can be made faster and performance can be gained.
What is my problem?
I'm creating an application (yet haven't started as I'm confused here). It's relational DB is something like this....
The problem is I'm not able to design the Document DB structure for this requirement.
Please somebody help my in designing such database or can give me idea on "What approach one should select while designing such database?"
This comes down to answering the following questions:
What are your most common access patterns? It is helpful to think of your API methods, or top 5-10 queries to decide how to organize.
What are your transactional needs? Which of these entity types occur together in transactions and queries?
How often do they change? Should you embed or reference?
If you could include these details, we can help with more targeted suggestions.
http://azure.microsoft.com/documentation/articles/documentdb-modeling-data/ is also worth a read if you haven't already.
The main difference between DocumentDB and relational databases/MongoDB is that collections are more like shards/partitions and not tables.
I'm hearing more about NoSQL, but have yet had someone give me a clear explanation of how it is to be used instead of relational databases.
I've read that it can't do left joins, so I was trying to figure out how you'd be able to use such a data storage. From reading: Preserve Joins by code in MongoDB it seems like a suggestion is to just make a large table, as if you already did the joins on it.
If the above statement is true, then I can see how it can be used. However I'm curious on how you'd handle repeat data. As the concept of normalizing, helps you remove the redundancy and ensure consistency in the data (e.g. Slight modifications like capitalization, white space, etc)...
Are we simply sacrificing the consistency of the data for scalable speed, or am I missing something?
Edit
I've been doing some more digging and found the answers the following questions useful for clarifying my understanding:
Why Google's BigTable referred as a NoSQL database?
How do you track record relations in NoSQL?
My understanding of consistency seems to be correct from those answers. And it looks like NoSQL is suppose to be used for specific problems types and that if you need relations that you should use a relational database.
But this raises more questions like:
It makes me wonder about real life examples of when to use NoSQL versus when not to?
By denormalizing the data, you should be able to solve all of the same problems that relational databases do... But there are rules on how to normalize data with relational databases. Are there rules that one can use to help them denormalize the data to use a NoSQL solution?
Any examples on when you might want to consider using both a NoSQL solution in parallel with a relational database?
MongoDB has the ability to have documents which include arrays of other documents. This solves many cases where you would have relations in reational databases.
When an invoice has multiple positions, you wouldn't put these positions into a separate collection. You would embed them as an array.
It makes me wonder about real life examples of when to use NoSQL versus when not to?
There are many different NoSQL databases, each one designed with different use-cases in mind. But you tagged this question as MongoDB, so I assume that you mean MongoDB in particular.
MongoDB has two main advantages over relational databases.
First, it scales well.
When the database is too slow or too big, you can easily add more servers by creating a cluster or replica-set of multiple shards. This doesn't work nearly as well with most relational databases.
Second, it allows heterogeneous data.
Imagine, for example, the product database of a computer hardware store. What properties do products have? All products have a price and a vendor. But CPUs have a clock rate, hard drives and RAM chips have a capacity (and these capacities aren't comparable), monitors have a resolution and so on. How would you design this in a relational database? You would either create a very long productID-property-value table or you would create a very wide and sparse product table with every property you can imagine, but most of them being NULL for most products. Both solutions aren't really elegant. But MongoDB can solve this much better because it allows each document in a collection to have a different set of properties.
What can't it do?
As a rather new technology, there isn't that much literature about it. The software ecosystem around it isn't that well either. The tools you can get for relational databases are often much more shiny.
There are also some use-cases MongoDB isn't well-suited for.
MongoDB doesn't do JOINs. When your data is very relational and denormalizing it would be counter-productive, it might be a poor choice for your product. But you might want to take a look at graph databases like Neo4j, which focus even more on relations than relational databases. Update 2016: MongoDB 3.2 now has rudimentary JOIN support with the $lookup aggregation stage, but it's still very limited in functionality compared to relational and graph databases.
MongoDB doesn't do transactions. At least not complex transactions. Certain actions which only affect a single document are guaranteed to be atomic, but as soon as you affect more than one document, you can't guarantee that no other query will happen in-between and find an inconsistent state.
MongoDB is bad for ad-hoc reporting. Its options for data-mining are severely limited. The rather new aggregation functions help and MapReduce can also solve some surprisingly complex problems when you learn to use it smart, but SQL has usually the better tools for things like that.
By denormalizing the data, you should be able to solve all of the same problems that relational databases do... But there are rules on how to normalize data with relational databases. Are there rules that one can use to help them denormalize the data to use a NoSQL solution?
Relational databases are around for about 40 years. Their theory is a well-researched topic in computer science. There are whole libraries of books written about the theory behind them. There is a by-the-book solution for every imaginable corner-case by now.
But NoSQL databases, on the other hand, are a rather new technology. We are still figuring out the best practices. The most frequent advise is: "Use your own head. Think about what queries are performed most often, and optimize your data schema for them."
Any examples on when you might want to consider using both a NoSQL solution in parallel with a relational database?
When possible I would advise against using two different database technologies in the same product:
Anyone who maintains and supports the product must be familiar with both technologies
Troubleshooting gets a lot harder
The sysadmins need to keep an additional database running and updated
You have an additional point of failure which can lead to downtime
I would only recommend to mix database technologies when fulfilling your requirements without it doesn't just become hard but physically impossible. Otherwise, make your pick and stay with it.
When it comes to NoSQL, there are bewildering number of choices to select a specific NoSQL database as is clear in the NoSQL wiki.
In my application I want to replace mysql with NOSQL alternative. In my application I have user table which has one to many relation with large number of other tables. Some of these tables are in turn related to yet other tables. Also I have a user connected to another user if they are friends.
I do not have documents to store, so this eliminates document oriented NoSQL databases.
I want very high performance.
The NOSQL database should work very well with Play Framework and scala language.
It should be open source and free.
So given above, what NoSQL database I should use?
I think you may be misunderstanding the nature of "document databases". As such, I would recommend MongoDB, which is a document database, but I think you'll like it.
MongoDB stores "documents" which are basically JSON records. The cool part is it understands the internals of the documents it stores. So given a document like this:
{
"name": "Gregg",
"fave-lang": "Scala",
"fave-colors": ["red", "blue"]
}
You can query on "fave-lang" or "fave-colors". You can even index on either of those fields, even the array "fave-colors", which would necessitate a many-to-many in relational land.
Play offers a MongoDB plugin which I have not used. You can also use the Casbah driver for MongoDB, which I have used a great deal and is excellent. The Rogue query DSL for MongoDB, written by FourSquare is also worth looking at if you like MongoDB.
MongoDB is extremely fast. In addition you will save yourself the hassle of writing schemas because any record can have any fields you want, and they are still searchable and indexable. Your data model will probably look much like it does now, with a users "collection" (like a table) and other collections with records referencing a user ID as needed. But if you need to add a field to one of your collections, you can do so at any time without worrying about the older records or data migration. There is technically no schema to MongoDB records, but you do end up organizing similar records into collections.
MongoDB is one of the most fun technologies I have happened to come across in the past few years. In that one happy Saturday I decided to check it out and within 15 minutes was productive and felt like I "got it". I routinely give a demo at work where I show people how to get started with MongoDB and Scala in 15 minutes and that includes installing MongoDB. Shameless plug if you're into web services, here's my blog post on getting started with MongoDB and Scalatra using Casbah: http://janxspirit.blogspot.com/2011/01/quick-webb-app-with-scala-mongodb.html
You should at the very least go to http://try.mongodb.org
That's what got me started.
Good luck!
At this point the answer is none, I'm afraid.
You can't just convert your relational model with joins to a key-value store design and expect it to be a 1:1 mapping. From what you said it seems that you do have joins, some of them recursive, i.e. referencing another row from the same table.
You might start by denormalizing your existing relational schema to move it closer to a design you wish to achieve. Then, you could see more easily if what you are trying to do can be done in a practical way, and which technology to choose. You may even choose to continue using MySQL. Just because you can have joins doesn't mean that you have to, which makes it possible to have a non-relational design in a relational DBMS like MySQL.
Also, keep in mind - non-relational databases were designed for scalability - not performance! If you don't have thousands of users and a server farm a traditional relational database may actually work better for you.
Hmm, You want very high performance of traversal and you use the word "friends". The first thing that comes to mind is Graph Databases. They are specifically made for this exact case.
Try Neo4j http://neo4j.org/
It's is free, open source, but also has commercial support and commercial licensing, has excellent documentation and can be accessed from many languages (REST interface).
It is written in java, so you have native libraries or you can embedd it into your java/scala app.
Regarding MongoDB or Cassendra, you now (Dec. 2016, 5 years late) try longevityframework.org.
Build your domain model using standard Scala idioms such as case classes, companion objects, options, and immutable collections. Tell us about the types in your model, and we provide the persistence.
See "More Longevity Awesomeness with Macro Annotations! " from John Sullivan.
He provides an example on GitHub.
If you've looked at longevity before, you will be amazed at how easy it has become to start persisting your domain objects. And the best part is that everything persistence related is tucked away in the annotations. Your domain classes are completely free of persistence concerns, expressing your domain model perfectly, and ready for use in all portions of your application.
What would be the equivalent of ERD for a NoSQL database such as MongoDB?
It looks like you asked a similar question on Quora.
As mentioned there, the ERD is simply a mapping of the data you intend to store and the relations amongst that data.
You can still make an ERD with MongoDB as you still want to track the data and the relations. The big difference is that MongoDB has no joins, so when you translate the ERD into an actual schema you'll have to make some specific decisions about implement the relationships.
In particular, you'll need to make the "embed vs. reference" decision when deciding how this data will actually be stored. Relations are still allowed, just not enforced. Many of the wrappers for MongoDB actually provide lookups across collections to abstract some of this complexity.
Even though MongoDB does not enforce a schema, it's not recommended to proceed completely at random. Modeling the data you expect to have in the system is still a really good idea and that's what the ERD provides you.
So I guess the equivalent to the ERD is the ERD?
You could just use a UML class diagram instead too.
Moon Modeler supports schema design for MongoDB. It allows users to define diagrams with nested structures.
I know of no standard means of diagramming document-oriented "schema".
I'm sure you could use an ERD to map out your schemata but since document databases do not truly support--or more importantly enforce--relationships between data, it would only be as useful as your code was disciplined to internally enforce such relationships.
I have been thinking about the same issue for quite some time.
And I came to the following conclusion:
If NoSQL databases are generally schemaless, you don't actually have a 'schema' to illustrate in a diagram.
Thus, I think you should take a "by example" approach.
You could draw some mindmaps exemplifying how your data would look like when stored in a NoSQL DB such as MongoDB.
And since these databases are very dynamic you could also create some derived mindmaps to show how the data from today could evolve in time.
Take a look at this topic too.
Confusion about NoSQL Design
MongoDB does support 'joins', just not in the SQL sense of INNER JOIN (the default SQL join). While the concept of 'join' is typically associated with SQL, MongoDB does have the aggregation framework with its data processing pipeline stages. The $lookup pipeline stage is used to create the equivalent of a LEFT JOIN in SQL. That is, all documents on the left of a relationship will be pass through the pipeline, as well as any relating documents on the right side of the relationship. The documents are modified to include the relationship as part of the new documents.
Consequently, I postulate that Entity Relationship Diagrams do have a role in MongoDB. Documents are certainly related to each other in the db, and we should have a visualization of these relationships, including the cardinality relationship, e.g. full participation, partial participation, weak/strong entities, etc.
Of course, MongoDB also introduces the concept of embedded documents and referenced documents, and so I argue it adds additional flavor to the model of the ERD. And I certainly would want to see embedded and referenced relationships mapped out in a visual diagram.
The remaining question is so what is out there? What is out there for Mongoose for NodeJS? Mongoid for Ruby? etc. If you check the respective repositories for their corresponding ORMs (Object Relational Mappers), then you will see there are ERDs for them. But in terms of their completeness, perhaps there is a lot to be desired and the open source community is welcome to make contributions.
https://www.npmjs.com/package/mongoose-erd
https://rubygems.org/gems/railroady
I have an MSSQL database which I am considering porting to CouchDB or MongoDB. I have a many-to-many relationship within the SQL db which has hundreds of thousands rows in the xref table, corresponding to tens of thousands of rows in the tables on each side of the relationship. Will CouchDB and/or MongoDB be able to handle this data, and what would be the best way of formatting the relevant documents for performant querying? Thanks very much.
For CouchDB, I would highly recommend reading this article about Entity Relationships.
One thing I would note in CouchDB is to be careful of attempting to "normalize" a non-relational data model. The document-based storage offers you a great deal of flexibility, and it's seldom the best idea to abstract everything into as many "document types" as you can think of. Many times, it's best to leave much of your data within the same document unless you have clear cases where separate entities exist.
One common use-case of many-to-many relationships is implementing tagging. There are articles about different methods you can use to accomplish this in CouchDB. It may apply to your requirements, it may not, but it's probably worth a read.
Since the 'collection' model of MongoDB is similar to tables you can of course maintain
the m:n relationship inside a dedicated mapping collection (using the _id of the related documents of the referenced documents from other collections).
If you can: consider redesign your application using embedded documents.
http://www.mongodb.org/display/DOCS/Schema+Design
In general: try to turn off your memories to a RDBMS when working with MongoDB.
Blindly copying the database design from RDBMS to MongoDB is neither helpful nor adviceable nor will it work in general.