I'm not sure if stack-overflow is a right platform to ask the title question. I'm in dilemma as to which front-end and back-end stack should i consider for developing a health related web application?
I heartily appreciate any suggestions or recommendations. Thanks.
You will need to have a look at your data, if it is relational, I would personally go for a SQL server such Microsoft SQL Server, MySQL or Postgres. If your data is non-relational you can go for something like Mongo.
Here is an image that explains how relational data and non-relational data work:
I'm not saying that MongoDB is bad, it all depends on your data and how you would like to structure your data. Obviously when you're working with healthcare data such as patient data there are certain laws you need to adhere to, especially in the United States with HIPPA, but I am sure almost every country has one of those.
The implications might be that you need to encrypt any data stored in the database, and that's one of the benefits of a relational database as most of them have either TDE (Transparent Data Encryption) or Encryption at Rest. Which means that your data is secured when in use and when not in use, respectively.
When it comes to the front-end you can look at Javascript frameworks such as Angular, Vue, React and then for your backend you can choose pretty much anything that you know well such as NodeJS or .NET Core or Go, pick your poison, each of them have their advantages and drawbacks so you will need to investigate your options before committing to one or the other.
It depends on your data structures. You can use MongoDB with dynamic schemas that eliminates the need for a predefined data structure. So you can use MongoDB when you have a dynamic dataset which is less relational. In the other hand, MongoDB is natively scalable. So you can store a large amount of data without much trouble.
Use a relational DB system when you have highly relational entities. SQL enables you to have complex transactions between entities with high reliability.
MongoDB/NoSQL
High write load
Unstable Schemas
Can handle big amount of data
High availability
SQL
Data structure fits for tables and row
Strict relationships among entities
Complex queries
Frequent updates in a large number of records
Related
It seems to me that, at the end of the day, most NoSQL databases are at their core key/value stores, which means one should be able to build a layer which could be NoSQL database agnostic.
That layer would only use CRUD operations (put, set, delete), but would expose more advanced features, and you'd be able to switch with minimal effort the underlying DB whether it's Mongo, Redis, Cassandra, etc.
Would building something like this have value to many people, and does it already exist?
Thanks
NuoDB is an elastically scalable SQL/ACID database that uses a Key/Value model for storage. It runs on top of Amazon S3 today (as well as standard file systems) and could support any KV store in principle. For the moment it's access method is SQL, but the system could readily support other data access languages and methods if that is a common requirement.
Barry Morris, NuoDB Inc.
There's kundera and DataNucleus
UnQL means Unstructured Query Language. It's an open query language for JSON, semi-structured and document databases.
It's next to impossible to build such thing.
As a thought experiment, I suggest that you take, for example, Redis, MongoDB and Cassandra, and design an API of such layer.
These NoSQL solutions have drastically different characteristics and they serve different purposes. Trying to build a common API for them is like building a common API for SQL database, spreadsheet document, plain text file and gmail.
While you can certainly come up with something, it will completely pointless.
Different needs call for different tools.
PlayOrm is another solution that is built on cassandra but has a pluggable interface for hbase, mongodb, etc. etc. 20/30 years ago they said the same thing about RDBMS, but more and more the featuresets converged. I suspect you will see alot of that in nosql database's as well as they adopt each other's feature sets.
currently, they have vastly different feature sets but at the core there is a set of operations that is very very similar.
PlayOrm actually builds it's query language which works on any noSQL provider as well, so it's S-SQL scalable SQL can work with cassandra, hadoop, etc. etc.
later,
Dean
Does it make sense to break up the data model of an application into different database systems? For example, the application stores all user data and relationships in a graph database (ideal for storing relationships), while storing other data in a document database, such as CouchDB or MongoDB? This would require the user graph database to reference unique ids in the document databases and vice versa.
Is this over complicating the data model and application? Or is this using the best uses of both types of database systems for scaling your application?
It definitely can make sense and depends fully on the requirements of your application. If you can use other database systems for things in which they are really good at.
Take for example full text search. Of course you can do more or less complex full text searches with a relational database like MySql. But there are systems like e.g. Lucene/Solr which are optimized for such things and can search fast in millions of documents. So you could use these systems for their special task (here: make a nifty full text search), then you return the identifiers and maybe load the relational structured data from the RDBMS.
Or CouchDB. I use couchDB in some projects as a caching systems. In combination with a relational database. Of course I need to care about consistency, but it it's definitely worth the effort. It pushed performance in the projects a lot and decreases for example load on the server from 2 to 0.2. :)
Something like this is for instance called cross-store persistence. As you mentioned you would store certain data in your relational database, social relationships in a graphdb, user-generated data (documents) in a document-db and user provided multimedia files (pictures, audio, video) in a blob-store like S3.
It is mainly about looking at the use-cases and making sure that from wherever you need it you might access the "primary" or index key of each store (back and forth). You can encapsulate the actual lookup in your domain or dao layer.
Some frameworks like the Spring Data projects provide some initial kind of cross-store persistence out of the box, mostly integrating JPA with a different NOSQL datastore. For instance Spring Data Graph allows it to store your entities in JPA and add social graphs or other highly interconnected data as a secondary concern and leverage a graphdb for the typical traversal and other graph operations (e.g. ranking, suggestions etc.)
Another term for this is polyglot persistence.
Here are two contrary positions on the question:
Pro:
"Contrary to that, I’m a big fan of polyglot persistence. This simply means using the right storage backend for each of your usecases. For example file storages, SQL, graph databases, data ware houses, in-memory databases, network caches, NoSQL. Today there are mostly two storages used, files and SQL databases. Both are not optimal for every usecase."
http://codemonkeyism.com/nosql-polyglott-persistence/
Con:
"I don’t think I need to say that I’m a proponent of polyglot persistence. And that I believe in Unix tools philosophy. But while adding more components to your system, you should realize that such a system complexity is “exploding” and so will operational costs grow too (nb: do you remember why Twitter started to into using Cassandra?) . Not to mention that the more components your system has the more attention and care must be invested figuring out critical aspects like overall system availability, latency, throughput, and consistency."
http://nosql.mypopescu.com/post/1529816758/why-redis-and-memcached-cassandra-lucene
I'm developing a restaurant application and there will be daily orders for particular guests. because of that daily base of data I thought to use a NoSQL Database e.g. MongoDB to avoid a lot of joins in a relational database (e.g. meal of an order for a particular day of a particular guest ). Other data like guest data (pre name, last name ....) would be stored in a relational database.
What do you think? Is a NoSQL Database a good solution for that type of problem?
thanks
I would stick with a traditional RDBMS - unless this is a project to learn/understand MongoDB/other, a normal RDBMS is going to help you achieve what you want much more easily.
Databases in the style of Mongo offer a number of advantages over traditional RDBMSs, but these advantages are only really in areas such as:
handling/processing immense (web-scale?) quantities of not-particularly-structured data
providing very very quick performance on cheaper hardware
providing easy clustering for maximum uptime
The application you describe on the other hand is unlikely to need near-bulletproof uptime, and is also unlikely to need to process/store massive quantities of data quickly.
Your data sounds very structured with clearly-defined relationships, and even a very busy restaurant is not going to produce the amounts of data that would justify sharding/clustering in the MongoDB style of things.
So, unless you are looking for a project to help you learn MongoDB, I would recommend sticking with a traditional database.
This is more than a weak description of your requirements in order to give any hint.
If your data model fits the options of MongoDB (no JOIN, embedded documents, database references) then give it a go.
http://www.mongodb.org/display/DOCS/Schema+Design
In addition google for "Mongodb Schema Design"...lots of useful slides and blogs coming up.
I'm using both mysql and MongoDB for my app. Let's face it, as hard as I tried, I still needed some type of join queries. In Mongo, it meant making two calls to the DB but because it was so quick, I didn't take a performance hit. I stored my user session information inside MySQL and use Mongo to store user information. What I love about it is the geo-spatial feature.
The more I read/use non-sql databases, the more I love it.
It's so for the OOP world and it's easy to use, like Rails for Frameworks.
I know the disadvantages. The major concern seems to be the no-transaction and no-concurrency part. Am I correct?
Are these the only features making it hard for developers to choose to use non-sql databases entirely, even for transactions?
If these features were fixed, would it be more OK to only use document-based databases for an application?
Cause now it seems like you still have to use a RDBMS for customer billing data while your content could be in document-based databases like MongoDB/CouchDB/Cassandra.
Could someone shed a light on this.
Yes of course you can build entire applications on non-relational data models. As a general rule though most people don't want to do that. The problem is that hierarchical/graph based data models (ie. any model that depends on navigational data structures) significantly increase the complexity and reduce the effectiveness of queries and data integrity in the database. The relational model was invented 40 years ago precisely to overcome those disadvantages inherent in navigation-based databases.
No.
They do not seem to be appropriate for fixed-schema, high-volume, mostly numerical data. Think data warehousing. Think ad-hoc analytical queries. They could take over all (or some of the) areas where RDBMS have not been a good fit in the first place (areas where people came up with XML databases, and object-oriented databases, and graph databases, and so on).
This is just like Excel not being able to replace Word (also admittedly, most Excel files I see these days are more presentation than spreadsheet). Different tools for different tasks.
In short, many NoSql solutions don't have cascading updates so if your application's data schema requires this, you will either update multiple documents (ie columns whatever) programatically, or stick with a sql based solution to handle this.
Concurrency is handled differently for different solutions.
I think this blog does a good job at explaining some of the trade-offs using a NoSQL solution
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
It depends also on the development of new, faster hardware.
You can distribute your database over multiple cheap computers (scaling out) if you use Cassandra and MongoDB. There will always be data sets that are too large for one computer because people collect and keep more data when it is possible to collect and keep more data.
However most data sets fit on one computer and can be stored in a SQL database. It is also possible to scale out a SQL database but foreign keys and complex transactions become slow when you distribute your data over multiple machines.
You have to make some tough choices when you distribute your data over multiple machines: http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html, you don't have to worry about the CAP theorem if all your data is on one machine.
What are the advantages of using NoSQL databases? I've read a lot about them lately, but I'm still unsure why I would want to implement one, and under what circumstances I would want to use one.
Relational databases enforces ACID. So, you will have schema based transaction oriented data stores. It's proven and suitable for 99% of the real world applications. You can practically do anything with relational databases.
But, there are limitations on speed and scaling when it comes to massive high availability data stores. For example, Google and Amazon have terabytes of data stored in big data centers. Querying and inserting is not performant in these scenarios because of the blocking/schema/transaction nature of the RDBMs. That's the reason they have implemented their own databases (actually, key-value stores) for massive performance gain and scalability.
NoSQL databases have been around for a long time - just the term is new. Some examples are graph, object, column, XML and document databases.
For your 2nd question: Is it okay to use both on the same site?
Why not? Both serves different purposes right?
NoSQL solutions are usually meant to solve a problem that relational databases are either not well suited for, too expensive to use (like Oracle) or require you to implement something that breaks the relational nature of your db anyway.
Advantages are usually specific to your usage, but unless you have some sort of problem modeling your data in a RDBMS I see no reason why you would choose NoSQL.
I myself use MongoDB and Riak for specific problems where a RDBMS is not a viable solution, for all other things I use MySQL (or SQLite for testing).
If you need a NoSQL db you usually know about it, possible reasons are:
client wants 99.999% availability on
a high traffic site.
your data makes
no sense in SQL, you find yourself
doing multiple JOIN queries for
accessing some piece of information.
you are breaking the relational
model, you have CLOBs that store
denormalized data and you generate
external indexes to search that data.
If you don't need a NoSQL solution keep in mind that these solutions weren't meant as replacements for an RDBMS but rather as alternatives where the former fails and more importantly that they are relatively new as such they still have a lot of bugs and missing features.
Oh, and regarding the second question it is perfectly fine to use any technology in conjunction with another, so just to be complete from my experience MongoDB and MySQL work fine together as long as they aren't on the same machine
Martin Fowler has an excellent video which gives a good explanation of NoSQL databases. The link goes straight to his reasons to use them, but the whole video contains good information.
You have large amounts of data - especially if you cannot fit it all on one physical server as NoSQL was designed to scale well.
Object-relational impedance mismatch - Your domain objects do not fit well in a relaitional database schema. NoSQL allows you to persist your data as documents (or graphs) which may map much more closely to your data model.
NoSQL is a database system where data is organized into the document (MongoDB), key-value pair (MemCache, Redis), and graph structure form(Neo4J).
Maybe there are possible questions and answer for "When to go for NoSQL":
Require flexible schema or deal with tree-like data?
Generally, in agile development we start designing systems without knowing all requirements upfront, whereas later on throughout the development database system may need to accommodate frequent design changes, showcasing MVP (Minimal Viable product).
Or you are dealing with a data schema that is dynamic in nature.
e.g. System logs, very precise example is AWS cloudtrail logs.
Data set is vast/big?
Yes NoSQL databases are the better candidate for applications where the database needs to manage millions or even billions of records without compromising performance and availability while may be trading for inconsistency(though modern databases are exception here where it allows tunable consistency over availability e.g. Casandra, Cloud provider databases CosmosDB, DynamoDB).
Trade-off between scaling over consistency
Unlike RDMS, NoSQL databases may make the dataset consistent across other nodes eventually which is the default behavior, but it's easy to scale in terms of performance and availability.
Example: This may be good for storing people who are online in the instant messaging app, API tokens in DB, and logging website traffic stats.
Performing Geolocation Operations:
MongoDB hash rich support for doing GeoQuerying & Geolocation operations. I really loved this feature of MongoDB. So does the PostresSQL but ease of implementation is something that depends on the use case
In nutshell, MongoDB is a great fit for applications where you can store dynamic structured data on a large scale.
Edits:
Updated the answer about the consistency of the database.
Some essential information is missing to answer the question: Which use cases must the database be able to cover? Do complex analyses have to be performed from existing data (OLAP) or does the application have to be able to process many transactions (OLTP)? What is the data structure? That is far from the end of question time.
In my view, it is wrong to make technology decisions on the basis of bold buzzwords without knowing exactly what is behind them. NoSQL is often praised for its scalability. But you also have to know that horizontal scaling (over several nodes) also has its price and is not free. Then you have to deal with issues like eventual consistency and define how to resolve data conflicts if they cannot be resolved at the database level. However, this applies to all distributed database systems.
The joy of the developers with the word "schema less" at NoSQL is at the beginning also very big. This buzzword is quickly disenchanted after technical analysis, because it correctly does not require a schema when writing, but comes into play when reading. That is why it should correctly be "schema on read". It may be tempting to be able to write data at one's own discretion. But how do I deal with the situation if there is existing data but the new version of the application expects a different schema?
The document model (as in MongoDB, for example) is not suitable for data models where there are many relationships between the data. Joins have to be done on application level, which is additional effort and why should I program things that the database should do.
If you make the argument that Google and Amazon have developed their own databases because conventional RDBMS can no longer handle the flood of data, you can only say: You are not Google and Amazon. These companies are the spearhead, some 0.01% of scenarios where traditional databases are no longer suitable, but for the rest of the world they are.
What's not insignificant: SQL has been around for over 40 years and millions of hours of development have gone into large systems such as Oracle or Microsoft SQL. This has to be achieved by some new databases. Sometimes it is also easier to find an SQL admin than someone for MongoDB. Which brings us to the question of maintenance and management. A subject that is not exactly sexy, but that is a part of the technology decision.
Handling A Large Number Of Read Write Operations
Look towards NoSQL databases when you need to scale fast. And when do you generally need to scale fast?
When there are a large number of read-write operations on your website & when dealing with a large amount of data, NoSQL databases fit best in these scenarios. Since they have the ability to add nodes on the fly, they can handle more concurrent traffic & big amount of data with minimal latency.
Flexibility With Data Modeling
The second cue is during the initial phases of development when you are not sure about the data model, the database design, things are expected to change at a rapid pace. NoSQL databases offer us more flexibility.
Eventual Consistency Over Strong Consistency
It’s preferable to pick NoSQL databases when it’s OK for us to give up on Strong consistency and when we do not require transactions.
A good example of this is a social networking website like Twitter. When a tweet of a celebrity blows up and everyone is liking and re-tweeting it from around the world. Does it matter if the count of likes goes up or down a bit for a short while?
The celebrity would definitely not care if instead of the actual 5 million 500 likes, the system shows the like count as 5 million 250 for a short while.
When a large application is deployed on hundreds of servers spread across the globe, the geographically distributed nodes take some time to reach a global consensus.
Until they reach a consensus, the value of the entity is inconsistent. The value of the entity eventually gets consistent after a short while. This is what Eventual Consistency is.
Though the inconsistency does not mean that there is any sort of data loss. It just means that the data takes a short while to travel across the globe via the internet cables under the ocean to reach a global consensus and become consistent.
We experience this behaviour all the time. Especially on YouTube. Often you would see a video with 10 views and 15 likes. How is this even possible?
It’s not. The actual views are already more than the likes. It’s just the count of views is inconsistent and takes a short while to get updated.
Running Data Analytics
NoSQL databases also fit best for data analytics use cases, where we have to deal with an influx of massive amounts of data.
I came across this question while looking for convincing grounds to deviate from RDBMS design.
There is a great post by Julian Brown which sheds lights on constraints of distributed systems. The concept is called Brewer's CAP Theorem which in summary goes:
The three requirements of distributed systems are : Consistency, Availability and Partition tolerance (CAP in short). But you can only have two of them at a time.
And this is how I summarised it for myself:
You better go for NoSQL if Consistency is what you are sacrificing.
I designed and implemented solutions with NoSQL databases and here is my checkpoint list to make the decision to go with SQL or document-oriented NoSQL.
DON'Ts
SQL is not obsolete and remains a better tool in some cases. It's hard to justify use of a document-oriented NoSQL when
Need OLAP/OLTP
It's a small project / simple DB structure
Need ad hoc queries
Can't avoid immediate consistency
Unclear requirements
Lack of experienced developers
DOs
If you don't have those conditions or can mitigate them, then here are 2 reasons where you may benefit from NoSQL:
Need to run at scale
Convenience of development (better integration with your tech stack, no need in ORM, etc.)
More info
In my blog posts I explain the reasons in more details:
7 reasons NOT to NoSQL
2 reasons to NoSQL
Note: the above is applicable to document-oriented NoSQL only. There are other types of NoSQL, which require other considerations.
Ran into this thread and wanted to add my experience.. Many SQL databases support json data in columns and support querying of this json. So what I have used is a hybrid using a relational database with columns containing json..