I am designing an enterprise application, and a big question for me is whether it is OK to use one database per aggregate in Domain-Driven Design and apply CQRS to them.
For example, I have one domain that contains several Bounded Contexts, and each BC has two or more aggregates. Can I use a relational database like MSSQL for some aggregates and a NoSQL database like MongoDB for others?
Domain-Driven Design as a concept does not prescribe exact implementations, so it does not limit your choice of databases. Go ahead with what you are trying if it suits your use case. The only thing is to do some design planning ahead, which can still change later if required. I would also recommend having event sourcing in the blend; it really helps with denormalization when mixed with CQRS.
The main concern is making sure commands reflect state consistently across all databases. For example, if one aggregate root has an entity and some value objects spread over multiple databases, make sure all the adapters behave alike, so that the domain has no knowledge of how the data is stored (or separated) across databases. If that is achieved neatly, the domain is free to contain only domain logic. I mean this in terms of how the interfaces for the multiple databases are designed: if the NoSQL interface exposes methods that deal in documents while the SQL interface shows it works on tables, the domain will definitely take a hit switching between documents and tables. Abstract that logic away (maybe using Hexagonal architecture) and you're in a good position with multiple DBs.
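To make that concrete, here is a minimal sketch of what such an abstraction could look like, assuming Python with pymongo on one side and a DB-API connection on the other; the repository names and the customers schema are made up for illustration, and aggregates are shown as plain dicts for brevity:

    from abc import ABC, abstractmethod

    # The port: the domain sees only this interface and speaks in
    # aggregates, never in documents or tables.
    class CustomerRepository(ABC):
        @abstractmethod
        def get(self, customer_id: str) -> dict: ...

        @abstractmethod
        def save(self, customer: dict) -> None: ...

    # One adapter per database; both behave alike from the domain's view.
    class MongoCustomerRepository(CustomerRepository):
        def __init__(self, collection):   # a pymongo collection
            self._collection = collection

        def get(self, customer_id):
            return self._collection.find_one({"_id": customer_id})

        def save(self, customer):
            self._collection.replace_one(
                {"_id": customer["_id"]}, customer, upsert=True)

    class SqlCustomerRepository(CustomerRepository):
        def __init__(self, connection):   # a sqlite3/DB-API connection
            self._connection = connection

        def get(self, customer_id):
            row = self._connection.execute(
                "SELECT id, name FROM customers WHERE id = ?",
                (customer_id,)).fetchone()
            return {"_id": row[0], "name": row[1]} if row else None

        def save(self, customer):
            # INSERT OR REPLACE is SQLite dialect; MSSQL would use MERGE.
            self._connection.execute(
                "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
                (customer["_id"], customer["name"]))
            self._connection.commit()

The domain depends only on CustomerRepository, so moving an aggregate from MSSQL to MongoDB never touches domain logic.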
I want to design a product which allows customers to create their own websites. A customer will be able to maintain his website's data model on the fly, run queries on it, and display the output on an HTML page. I doubt a traditional RDBMS is the right choice, for two reasons: with every customer the amount of data will grow, and the RDBMS might reach its limits even if scaled; and as the data model is highly dynamic, running many DDL queries will slow down the whole system.
I'm trying to figure out which database/data storage system might be the best option for such a system. Recently I read a lot about NoSQL solutions like Cassandra and MongoDB, and they look promising in terms of performance, but come with a flaw: the data is not relational, so it has to be denormalized.
I don't know what the impact of denormalizing a dynamic, customer-defined data model will be, because the customer models and inserts data first (in a relational way) and then runs the queries afterwards. The denormalization has to happen automatically, which leads to another problem: can I create one table for each query, even if some queries might be similar? There could be high data redundancy after a while.
Does creating/updating tables on the fly have any impact?
Every time the customer changes data, the same data has to be changed in all tables which hold a copy of that entity (e.g. the name of an employee has to be changed in "team member" and also in "project task"). Are those updates costly?
Is it possible to nest data with unlimited depth like {"team": {"members": [{"name": "Ben"}]}}?
There might be even better/other approaches, I'm happy for any hints.
Adding clarification to the requirements
My question actually is: how can I use a NoSQL DB like Cassandra to maintain relational data, and will that solution still perform better than an RDBMS?
The customer thinks relationally (because, in my opinion, data is always relational) no matter what DBMS is used. And this service is not about letting the customer choose the underlying data storage. There can only be one.
A customer can define his own relational data model using a management frontend provided by the application. The data model may be changed by the customer at any time. In an RDBMS, running DDL on a production system is not a good idea. On top of the data schema, the customer can add named queries and use them as a data source on any web page he creates.
An example would be a query for news, given the name "news"; in a web page it would be used like <ul><li query="news"><h1>[news.title]</h1></li></ul>, which would execute the query, iterate through the data, and repeat the <li> on each iteration. That is the simplest example, though.
In more complex examples, SQL might require extensive use of subqueries, which perform badly. In NoSQL there seems to be the option of first denormalizing, preparing a table with exactly the data a query needs, and then just querying that table. Any change to the data involved would trigger an update of that table. That means for every query the customer creates, the system will automatically create and maintain a table and its data, so there will be a lot of data redundancy. Benchmarks say Cassandra is fast at writing, so that might be an option.
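Here is a minimal sketch of that query-table idea, assuming Python with the DataStax cassandra-driver; the keyspace (sitebuilder) and table (q_news) names are made up for illustration:

    from datetime import datetime

    from cassandra.cluster import Cluster

    # Connect to a local node; "sitebuilder" is a hypothetical keyspace.
    session = Cluster(["127.0.0.1"]).connect("sitebuilder")

    # One denormalized table per customer-defined query, keyed for
    # exactly that query: here the "news" query, newest first.
    session.execute("""
        CREATE TABLE IF NOT EXISTS q_news (
            site_id   text,
            published timestamp,
            title     text,
            body      text,
            PRIMARY KEY (site_id, published)
        ) WITH CLUSTERING ORDER BY (published DESC)
    """)

    # Every write to the source data also refreshes the query table...
    session.execute(
        "INSERT INTO q_news (site_id, published, title, body) "
        "VALUES (%s, %s, %s, %s)",
        ("site-42", datetime.utcnow(), "Hello", "First post"))

    # ...so the page render is a single-partition read, no joins or subqueries.
    rows = session.execute(
        "SELECT title, body FROM q_news WHERE site_id = %s LIMIT 20",
        ("site-42",))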
Let me put my 2 cents in.
Giving users the ability to have their own data models is not really SaaS.
In the pure SaaS paradigm, each user has the same functionality and data model. He can add his own objects, but not new classes of objects.
So scaling in this paradigm is a rather obvious (though, frankly, not always trivial) exercise. You can get a cloud DB with built-in multi-tenant support (Azure, for example), you can use Amazon's RDS and add more instances as the user count grows, you can use sharding (for instance, partitioning by user) if the database supports it, etc.
But when we're talking about a custom data model for each user, that is more like IaaS (infrastructure). It is a lower-level thing, where you just say: "OK, guys, you may build any data model you want, whatever".
And I believe that if you move the responsibility for data model creation to the user, you should also move the responsibility for database selection, as IaaS does. So the user would say: "OK, I need a key-value database here" and you provide him a Cassandra table, for example. If he wants an RDBMS, you provide him one as well.
Otherwise, you have to consider not only the data model itself, but also the data strategy your customer needs. One customer may need key-value storage (backed by some NoSQL DB), another may need an RDBMS. How would you know?
For instance, consider the entity from your example: {"team": {"members": [{"name": "Ben"}]}}. One user would use this model for a single type of query, something like "get the members of the team" and "add a member to the team". Another user may need to query frequently for stats (average player age, games played). These two scenarios could demand different database types: the first suits a key-value store, the second an RDBMS. How would you guess the database type and structure, given that key-value storages are modeled around their queries?
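A minimal sketch of that contrast, assuming Python with redis-py and the standard sqlite3 module standing in for an RDBMS; the key and table names are invented:

    import json
    import sqlite3

    import redis

    # Access pattern 1: a key-value lookup, one round trip, no ad-hoc queries.
    r = redis.Redis()
    r.set("team:42", json.dumps({"team": {"members": [{"name": "Ben"}]}}))
    team = json.loads(r.get("team:42"))  # "get the members of the team"

    # Access pattern 2: ad-hoc stats, which an RDBMS answers with one query.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE members (team_id INT, name TEXT, age INT, games INT)")
    db.execute("INSERT INTO members VALUES (42, 'Ben', 27, 31)")
    avg_age, games = db.execute(
        "SELECT AVG(age), SUM(games) FROM members WHERE team_id = 42").fetchone()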
Technically, you may even try to guess the database type from a user's data model and queries, but you would need to put some restrictions on the users' creativity. Otherwise, it becomes a highly non-trivial task.
As for scaling: since each model is unique, you need to add database instances as the user base grows. Of course, you can host multiple users in a single database instance in different schemas, and you will need to determine the number of users per instance by experiment or performance testing.
You may also look at document-oriented databases, but I think you need to review your concept and make some changes. Maybe you already have some obvious restrictions in mind, but I just didn't get that from your post.
I am trying to create a Redis-based datastore with multiple fields that can be used to fetch an entity by value. The data would be something like:
Person<Entity>
Name
Address
Purchases<Another Entity>
Reviews<list of another Entity>
The same will also exist in other entities, as this is a many-to-many relationship between the different entities.
I am not considering traditional databases, as I am looking for scalability and fault tolerance in this example.
What I am creating is the following:
Hash of Entity id mapped to each entity object
Sets containing the associations, say Person to Purchases and another for Purchases to Person, and so on: one for each side of a many-to-many relationship.
Since this design will involve a lot of overhead, I suspect there is some flaw in keeping this unnormalized. As for the choice of a memory store over a database, I consider query response time to be critical. I am looking for suggestions about my design, as I am implementing this example to learn how to handle big data challenges.
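A minimal sketch of that layout, assuming Python with redis-py; the key naming convention (person:1, person:1:purchases, etc.) is my own, for illustration:

    import redis

    r = redis.Redis()

    # A hash per entity, mapping the entity id to its fields.
    r.hset("person:1", mapping={"name": "Ben", "address": "12 Main St"})
    r.hset("purchase:9", mapping={"item": "book", "price": "12.50"})

    # One set per side of the many-to-many association.
    r.sadd("person:1:purchases", "purchase:9")
    r.sadd("purchase:9:persons", "person:1")

    # Fetch an entity and walk the association from either side.
    person = r.hgetall("person:1")
    purchase_ids = r.smembers("person:1:purchases")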
"I am looking for suggestions about my design, as I am implementing this example to learn how to handle big data challenges."
On what basis do you believe your challenges are Big Data? How much data are we talking about? You need to ask yourself that question first, before discounting relational databases as a solution that may well meet your needs.
"I am not considering traditional databases, as I am looking for scalability and fault tolerance in this example."
Redis and relational databases have the same scalability issue: they don't scale well horizontally unless you implement or adopt a custom sharding technique. Redis Cluster is meant to address this, but it's a work in progress and not yet production ready; in the meantime you can use twemproxy. Developed by Twitter, it's a proxying solution that distributes keys across a cluster of Redis servers.
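For illustration, here is a minimal client-side sharding sketch in Python with redis-py, hashing each key onto one of several nodes (the node list is hypothetical); twemproxy does a more robust version of this for you:

    import hashlib

    import redis

    # Hypothetical pool of independent Redis nodes.
    NODES = [redis.Redis(port=p) for p in (6379, 6380, 6381)]

    def node_for(key: str) -> redis.Redis:
        # A stable hash of the key picks the shard; every client using
        # the same scheme agrees on where a key lives.
        digest = hashlib.md5(key.encode()).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

    node_for("person:1").hset("person:1", mapping={"name": "Ben"})
    person = node_for("person:1").hgetall("person:1")

Note that naive modulo hashing reshuffles most keys when a node is added, which is why production setups prefer consistent hashing.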
"I am trying to create a Redis-based datastore with multiple fields that can be used to fetch an entity by value."
Redis is not designed to query based on values, period; read up on this and this to better understand why.
It seems to me that, at the end of the day, most NoSQL databases are at their core key/value stores, which means one should be able to build a layer that is NoSQL-database agnostic.
That layer would only use CRUD operations (put, get, delete) but would expose more advanced features, and you'd be able to switch the underlying DB with minimal effort, whether it's Mongo, Redis, Cassandra, etc.
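For concreteness, a rough sketch of the kind of layer I mean, in Python; the class names are made up, and the two bindings are reduced to plain put/get/delete:

    from abc import ABC, abstractmethod
    from typing import Optional

    class KeyValueStore(ABC):
        """The agnostic layer: nothing beyond CRUD leaks through it."""

        @abstractmethod
        def put(self, key: str, value: bytes) -> None: ...

        @abstractmethod
        def get(self, key: str) -> Optional[bytes]: ...

        @abstractmethod
        def delete(self, key: str) -> None: ...

    class RedisStore(KeyValueStore):
        def __init__(self, client):        # a redis.Redis instance
            self._c = client

        def put(self, key, value):
            self._c.set(key, value)

        def get(self, key):
            return self._c.get(key)

        def delete(self, key):
            self._c.delete(key)

    class MongoStore(KeyValueStore):
        def __init__(self, collection):    # a pymongo collection
            self._c = collection

        def put(self, key, value):
            self._c.replace_one({"_id": key}, {"_id": key, "v": value},
                                upsert=True)

        def get(self, key):
            doc = self._c.find_one({"_id": key})
            return doc["v"] if doc else None

        def delete(self, key):
            self._c.delete_one({"_id": key})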
Would building something like this have value to many people, and does it already exist?
Thanks
NuoDB is an elastically scalable SQL/ACID database that uses a key/value model for storage. It runs on top of Amazon S3 today (as well as standard file systems) and could support any KV store in principle. For the moment its access method is SQL, but the system could readily support other data access languages and methods if that were a common requirement.
Barry Morris, NuoDB Inc.
There's Kundera and DataNucleus.
UnQL means Unstructured Query Language. It's an open query language for JSON, semi-structured and document databases.
It's next to impossible to build such a thing.
As a thought experiment, I suggest that you take, for example, Redis, MongoDB and Cassandra, and design an API of such layer.
These NoSQL solutions have drastically different characteristics and they serve different purposes. Trying to build a common API for them is like building a common API for SQL database, spreadsheet document, plain text file and gmail.
While you can certainly come up with something, it will be completely pointless.
Different needs call for different tools.
PlayOrm is another solution that is built on Cassandra but has a pluggable interface for HBase, MongoDB, etc. Twenty or thirty years ago they said the same thing about RDBMSes, but more and more the feature sets converged. I suspect you will see a lot of that in NoSQL databases as well, as they adopt each other's feature sets.
Currently they have vastly different feature sets, but at the core there is a set of operations that is very similar.
PlayOrm actually builds its own query language that works on any NoSQL provider as well, so its S-SQL (Scalable SQL) can work with Cassandra, Hadoop, etc.
later,
Dean
Does it make sense to split the data model of an application across different database systems? For example, the application stores all user data and relationships in a graph database (ideal for storing relationships), while storing other data in a document database such as CouchDB or MongoDB. This would require the graph database to reference unique IDs in the document database and vice versa.
Is this overcomplicating the data model and the application? Or is this making the best use of both types of database systems to scale your application?
It definitely can make sense; it depends fully on the requirements of your application. Use other database systems for the things they are really good at.
Take for example full-text search. Of course you can do more or less complex full-text searches with a relational database like MySQL. But there are systems such as Lucene/Solr which are optimized for this and can search fast across millions of documents. So you could use such a system for its special task (here: a nifty full-text search), have it return the identifiers, and then load the relationally structured data from the RDBMS.
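A minimal sketch of that pattern, assuming Python with pysolr and sqlite3 standing in for MySQL; the Solr core name and the documents schema are invented:

    import sqlite3

    import pysolr

    # The full-text search goes to Solr, which is optimized for it...
    solr = pysolr.Solr("http://localhost:8983/solr/documents")
    ids = [hit["id"] for hit in solr.search("body:elephant", rows=10)]

    # ...and only the matching rows are loaded from the RDBMS.
    db = sqlite3.connect("app.db")
    rows = []
    if ids:
        placeholders = ",".join("?" * len(ids))
        rows = db.execute(
            "SELECT id, title, author FROM documents WHERE id IN (%s)"
            % placeholders, ids).fetchall()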
Or CouchDB. I use CouchDB in some projects as a caching system, in combination with a relational database. Of course I have to take care of consistency, but it's definitely worth the effort. It boosted performance in those projects a lot and, for example, brought the load on the server down from 2 to 0.2. :)
Something like this is called, for instance, cross-store persistence. As you mentioned, you would store certain data in your relational database, social relationships in a graph DB, user-generated data (documents) in a document DB, and user-provided multimedia files (pictures, audio, video) in a blob store like S3.
It is mainly about looking at the use cases and making sure that, wherever you need it, you can access the "primary" or index key of each store (back and forth). You can encapsulate the actual lookup in your domain or DAO layer.
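A minimal sketch of such a DAO in Python; the two injected clients and the id-mapping convention (a shared user id across stores) are assumptions for illustration:

    class UserProfileDao:
        """Hides the fact that one logical user profile spans two stores."""

        def __init__(self, mongo_users, graph_session):
            self._users = mongo_users    # a pymongo collection
            self._graph = graph_session  # e.g. a neo4j session

        def load(self, user_id: str) -> dict:
            # The document store holds the profile document...
            profile = self._users.find_one({"_id": user_id})
            # ...the graph store holds relationships, joined on the shared id.
            result = self._graph.run(
                "MATCH (u:User {id: $id})-[:FRIEND]->(f:User) "
                "RETURN f.id AS fid", id=user_id)
            profile["friends"] = [record["fid"] for record in result]
            return profile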
Some frameworks, like the Spring Data projects, provide an initial kind of cross-store persistence out of the box, mostly integrating JPA with a different NoSQL datastore. For instance, Spring Data Graph allows you to store your entities in JPA, add social graphs or other highly interconnected data as a secondary concern, and leverage a graph DB for the typical traversals and other graph operations (e.g. ranking, suggestions, etc.).
Another term for this is polyglot persistence.
Here are two contrary positions on the question:
Pro:
"Contrary to that, I’m a big fan of polyglot persistence. This simply means using the right storage backend for each of your usecases. For example file storages, SQL, graph databases, data ware houses, in-memory databases, network caches, NoSQL. Today there are mostly two storages used, files and SQL databases. Both are not optimal for every usecase."
http://codemonkeyism.com/nosql-polyglott-persistence/
Con:
"I don’t think I need to say that I’m a proponent of polyglot persistence. And that I believe in Unix tools philosophy. But while adding more components to your system, you should realize that such a system complexity is “exploding” and so will operational costs grow too (nb: do you remember why Twitter started to into using Cassandra?) . Not to mention that the more components your system has the more attention and care must be invested figuring out critical aspects like overall system availability, latency, throughput, and consistency."
http://nosql.mypopescu.com/post/1529816758/why-redis-and-memcached-cassandra-lucene
The more I read about and use non-SQL databases, the more I love them.
They fit the OOP world so well, and they're easy to use, like Rails among frameworks.
I know the disadvantages. The major concern seems to be the lack of transactions and concurrency control. Am I correct?
Are these the only features making it hard for developers to choose non-SQL databases for everything, even transactions?
If these features were fixed, would it be more OK to only use document-based databases for an application?
Because right now it seems like you still have to use an RDBMS for customer billing data, while your content can live in document-based databases like MongoDB/CouchDB/Cassandra.
Could someone shed some light on this?
Yes, of course you can build entire applications on non-relational data models. As a general rule, though, most people don't want to. The problem is that hierarchical/graph-based data models (i.e. any model that depends on navigational data structures) significantly increase complexity and reduce the effectiveness of queries and data integrity in the database. The relational model was invented 40 years ago precisely to overcome the disadvantages inherent in navigation-based databases.
No.
They do not seem to be appropriate for fixed-schema, high-volume, mostly numerical data. Think data warehousing. Think ad-hoc analytical queries. They could take over all (or some of the) areas where RDBMS have not been a good fit in the first place (areas where people came up with XML databases, and object-oriented databases, and graph databases, and so on).
This is just like Excel not being able to replace Word (also admittedly, most Excel files I see these days are more presentation than spreadsheet). Different tools for different tasks.
In short, many NoSQL solutions don't have cascading updates, so if your application's data schema requires them, you will either update multiple documents (or columns, or whatever the unit is) programmatically, or stick with a SQL-based solution to handle this.
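A minimal sketch of that programmatic fan-out, assuming Python with pymongo and reusing the earlier employee-name example; the database and collection names are invented:

    from pymongo import MongoClient

    db = MongoClient()["app"]

    def rename_employee(employee_id: str, new_name: str) -> None:
        # No cascading updates: every collection holding a denormalized
        # copy of the name has to be touched by application code.
        db.team_members.update_many(
            {"employee_id": employee_id}, {"$set": {"name": new_name}})
        db.project_tasks.update_many(
            {"assignee_id": employee_id},
            {"$set": {"assignee_name": new_name}})

    rename_employee("emp-7", "Ben Smith")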
Concurrency is handled differently for different solutions.
I think this blog post does a good job of explaining some of the trade-offs of using a NoSQL solution:
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
It also depends on the development of new, faster hardware.
You can distribute your database over multiple cheap computers (scaling out) if you use Cassandra or MongoDB. There will always be data sets that are too large for one computer, because people collect and keep more data whenever it is possible to do so.
However, most data sets fit on one computer and can be stored in a SQL database. It is also possible to scale out a SQL database, but foreign keys and complex transactions become slow when you distribute your data over multiple machines.
You have to make some tough choices when you distribute your data over multiple machines (see http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html); you don't have to worry about the CAP theorem if all your data is on one machine.