What does a Graph Database solve that is difficult in a Relational Database? - nosql

I have a client who understands that his data model is a directed acyclic graph. We've been working with collections of nodes and an intermediate table of edges, and the performance has been pretty good. We have less than 100,000 data nodes in the current implementation, although that may grow by one or two orders of magnitude. He's recently become convinced that, since we have a graph, a graph database (like Neo4J or Titan) would be "better."
What problems does a graph-oriented database actually solve that cannot be solved with SQL, or that requires much more heavy lifting from the SQL client? From what I can see, path discovery appears to be it, but that can't be the whole story.

In a relational database, nodes and edges will be related by some value they have in common. Searching for a node or edge will generally involve querying an index for this value.
In a graph database, nodes and edges are related directly by the same sort of internal database structures a relational database uses to maintain the internal structure of an index. So finding an edge from a node or a node from an edge is more like going one level deep in a relational index regardless of the number of nodes; while if you have millions of nodes in a relational database the index would be several levels deep.

actually its true, just Like mongo has "geo spacial" database features built in, so it doesnt have to process as much as if you wanted to do it with mysql. The two databases you mentioned (ive worked with titan) are just better for graphing and it wont be so harsh on your php or db statements.

In short, don't fix it if it's not broken. If the advantage is not clear to you or your client, just weight the cost of migrating against this unclear benefit of a graph database.
Having no experience with the aforementioned graph databases, I can only assume that what you could gain from databases like that is faster development as your database will be a better fit for the type of data that you have. I have been working with MongoDB a lot and the cherry on the cake for me has been development speed due to the simplicity of querying/writing to the database, followed by much richer document structures without having to define any schemas. You also get some amazing features like super simple replication, automatic failover, automatic sharding etc but in your case with just 100k documents you are not likely to think about this kind of problems any time soon. All the mainstream relational databases can run on a small server and perform well with that amount of docs.

Related

NoSQL in a single machine

As part of my university curriculum I ended up with a real project which consists in helping a company shifting from their relational data warehouse into a NoSQL data warehouse. The thing is that what they are looking for is better performance in large jobs but so far they have used a single machine and if they indeed migrate to NoSQL they still wish to keep using a single machine for cost reasons.
As far as I know the whole point of NoSQL is to run it in a large distributed system of several machines. So I don't see the point of this migration, specially since I am pretty sure (but not entirely) that if they do install NoSQL, they will probably end having even worst performance.
But still I am not comfortable telling them this since I am still new to this area (less than a month), so I wonder, is there are any situation where using NoSQL in a single machine for a datawarehouse would be justifiable performance wise? Or is it just a plain bad idea?
The answer to your question, like the answer to so many questions, is "it depends."
Ignoring the commentary on the question, I think there may be legitimacy to your client's question. Both relational and non-relational databases ultimately hold data in key-value tuples, with indexes and such to ensure quick and speedy access to the data. The difference is that SQL/relational databases contain an incredible amount of overhead to attempt the optimal way to retrieve results given an unknown set of queries, as well as ensure stable concurrency. This overhead is both computationally expensive and rarely results in the optimal solution. As a result, SQL databases often perform significantly slower for simple repetitive queries.
No-sql databases, on the other hand, are more of a "bare-bones" database, relying on programmers and intelligent design to achieve success. They are optimized to retrieve a value for a given key very quickly, often sub-millisecond. As a result, increased up front investment in the design results in superior and near-optimal performance. It will be necessary to determine the cost-benefit of doing this up-front design, but it is all but guaranteed that the no-sql approach will perform better regardless of the number of machines involved (in fact, SQL databases are very difficult or impossible to cluster together and is one of the main reasons why NoSql was developed).
Eventually we will see relational-like solutions implemented on a no-sql platform. In fact, Mongo, Elasticsearch, and Couchbase (probably others) already have SQL-like query functionality. But right now, you are faced with this dilemma.
For a single machine if the load is write heavy e.g. your logging a lot of events you could do for cassandra. Also a good alternative is hbase but its heavy and not suggested for single node. If they expose api in json you could look into document based dbs such as couchbase, mongo db. If you have an idea about the load then selecting a nosql data store is much easier
If you're in a position where you need to pick one, I think you should look first at MongoDB. If you've never tried it, I really recommend you visit their live demo with tutorial and give it a try. If you like, download and follow the installation guide on their site. It's free, runs well on a single machine, and is incredibly easy to use.
In addition to MongoDB, I've used Oracle, SQL Server, MySQL, SQLite, and HBase. I understand Cassandra should be in the list but I've not tried it. With MongoDB, I was fully deployed and executing reads and writes from an application in like two hours. I attribute most of that to their website's clear and concise instructional content. The biggest learning curve was figuring out how the queries work for things like updating a record or deleting a record without deleting the entire set of similar records.
Regarding NoSQL vs RDBMS, some points to consider:
Adding a new column to RDBMS table can lock the database in or degrade performance in another
MongoDB is schema-less so adding a new field, does not effect old documents and will be instant (think how flexible that really is - throw any dimension of data into this system without maintenance overhead)
You're less likely to require a DBA to solve your schema problems when an application changes
I think problems related to table size are irrelevant, so you won't run into a scaling problem - just a disk space problem on single machine

Neo4j instead of relational database

I am implementing a sinatra/rails based web portal that might eventually have few many:many relationships between tables/models. This is a one man team and part time but real world app.
I discussed my entity with someone and was advised to try neo4j. Coming from real 'non-sexy' enterprise world, my inclination is to use relational db until it stops scaling or becomes a nightmare because of sharding etc and then think about anything else.
HOWEVER,
I am using postgres for the first time in this project along with datamapper and its taking me time to get started very fast
I am just trying out few things and building more use cases so I consitently have to update my schema (prototyping idea and feedback from beta) . I wont have to do this in neo4j (except changing my queries)
Seems like its very easy to setup search using neo4j . But Postgres can do full text search as well.
Postgres recently announced support for json and javascript. Wondering if I should just stick with PG and invest more time learning PG (which has a good community) instead neo4j.
Looking for usecases where neo4j is better, especially at protyping/initial phase of a project. I understand if the website grows I might end up having multiple persistent technologies like s3, relational (PG), mongo etc.
Also it would be good to know how it plays out with Rails/Ruby ecosystem.
Update1:
I got a lot of good answers and seems like the right thing to do is stick with Postgres for now (especially since I deploy to heroku)
However the idea of being schema-less is tempting. Basically I am thinking of a approach where you don't define a datamodel until you have say 100-150 users and you have yourself figured out a good schema (business use cases) for your product , while you are just demoing the concept and getting feedback with limited signups. Then one can decide a schema and start with relational.
Would be nice to know if there are easy to use schema/less persistence option (based on ease to use/setup for new user) that might give up say scaling etc.
Graph databases should be considered if you have a really chaotic data model. They were needed to express highly complex relationships between entities. To do that, they store relationships at the data level whereas RDBMS use a declarative approach. Storing relationships only makes sense if these relationships are very different, otherwise you'll just end up duplicating data over and over, taking a lot of space for nothing.
To require such variety in relationships you'd have to handle huge amount of data. This is where graph databases shines because instand of doing tons of joins, they just pick a record and follow his relationships. To support my statement : you'll notice that every use cases on Neo4j's website are dealing with very complex data.
In brief, if you don't feel concerned with what I said above, I think you should use another technology. If this is just about scaling, schemalessness or starting fast a project, then look at other NoSQL solutions (more specifically, either column or document oriented databases). Otherwise you should stick with PostgreSQL. You could also, like you said, consider polyglot persistence,
About your update, you might consider hStore. I think it fits your requirements. It's a PostgreSQL module which also works on Heroku.
I don't think I agree that you should only use a graph database when your data model is very complex. I'm sure they could handle a simple data model/relationships as well.
If you have no prior experience with Neo4j or Postgres, then most likely both with take quite a bit of time to learn well.
Some things to keep in mind when picking:
It's not just about development against a database technology. You should consider deployment as well. How easy is it to deploy and scale Postgres/Neo4j?
Consider the community and tools around each technology. Is there a data mapper for Neo4j like there is for Postgres?
Consider that the data models are considerably different between the two. If you can already think relationally, then I'd probably stick with Postgres. If you go with Neo4j you're going to be making a lot of mistakes for several months with your data models.
Over time I've learned to keep it simple when I can. Postgres might be the boring choice compared to Neo4j, but boring doesn't keep you up at night. =)
Also I never see anyone mention it, but you should look at Riak (http://basho.com/riak/) too. It's a document database that also provides relationships (links) between objects. Not as mature as a graph database, but it can connect a few entities quickly.
The most appropriate choice depends on what problem you are trying to solve.
If you just have a few many to many tables, a relational database can be fine. In general, there is better OR-mapper support for relational databases, as they are much older and have a standardized interface and row-column structure. They also have been improved on for a long time, so they are stable and optimized for what they are doing.
A graph database is better if e.g. your problem is more about the connections between entities, especially if you need higher distance connections, like "detect cycles (of unspecified length)", some "what do friends-of-a-friend like". Things like that get unwieldy when restricted to SQL joins. A problem specific language like cypher in case of Neo4j makes that much more concise. On the downside, there are mappers between graph dbs and objects, but not for every framework and language under the sun.
I recently implemented a system prototype using neo4j and it was very useful to be able to talk about the structure and connections of our data and be able to model that one to one in the data storage. Also, adding other connections between data points was easy, neo4j being a schemaless storage. We ended up switching to mongodb due to troubles with write performance, but I don't think we could have finished the prototype with that in the same time.
Other NoSQL datastores like document based, column, key-value also cover specific usecases. Polyglot persistence is definitively something to look at, so keep your choice of backend reasonably separated from your business logic, to allow you to change your technology later if you learned something new.

MongoDB vs. Cassandra vs. MySQL for real-time advertising platform

I'm working on a real-time advertising platform with a heavy emphasis on performance. I've always developed with MySQL, but I'm open to trying something new like MongoDB or Cassandra if significant speed gains can be achieved. I've been reading about both all day, but since both are being rapidly developed, a lot of the information appears somewhat dated.
The main data stored would be entries for each click, incremented rows for views, and information for each campaign (just some basic settings, etc). The speed gains need to be found in inserting clicks, updating view totals, and generating real-time statistic reports. The platform is developed with PHP.
Or maybe none of these?
There are several ways to achieve this with all of the technologies listed. It is more a question of how you use them. Your ideal solution may use a combination of these, with some consideration for usage patterns. I don't feel that the information out there is that dated because the concepts at play are very fundamental. There may be new NoSQL databases and fixes to existing ones, but your question is primarily architectural.
NoSQL solutions like MongoDB and Cassandra get a lot of attention for their insert performance. People tend to complain about the update/insert performance of relational databases but there are ways to mitigate these issues.
Starting with MySQL you could review O'Reilly's High Performance MySQL, optimise the schema, add more memory perhaps run this on different hardware from the rest of your app (assuming you used MySQL for that), or partition/shard data. Another area to consider is your application. Can you queue inserts and updates at the application level before insertion into the database? This will give you some flexibility and is probably useful in all cases. Depending on how your final schema looks, MySQL will give you some help with extracting the data as long as you are comfortable with SQL. This is a benefit if you need to use 3rd party reporting tools etc.
MongoDB and Cassandra are different beasts. My understanding is that it was easier to add nodes to the latter but this has changed since MongoDB has replication etc built-in. Inserts for both of these platforms are not constrained in the same manner as a relational database. Pulling data out is pretty quick too, and you have a lot of flexibility with data format changes. The tradeoff is that you can't use SQL (a benefit for some) so getting reports out may be trickier. There is nothing to stop you from collecting data in one of these platforms and then importing it into a MySQL database for further analysis.
Based on your requirements there are tools other than NoSQL databases which you should look at such as Flume. These make use of the Hadoop platform which is used extensively for analytics. These may have more flexibility than a database for what you are doing. There is some content from Hadoop World that you might be interested in.
Characteristics of MySQL:
Database locking (MUCH easier for financial transactions)
Consistency/security (as above, you can guarantee that, for instance, no changes happen between the time you read a bank account balance and you update it).
Data organization/refactoring (you can have disorganized data anywhere, but MySQL is better with tables that represent "types" or "components" and then combining them into queries -- this is called normalization).
MySQL (and relational databases) are more well suited for arbitrary datasets and requirements common in AGILE software projects.
Characteristics of Cassandra:
Speed: For simple retrieval of large documents. However, it will require multiple queries for highly relational data – and "by default" these queries may not be consistent (and the dataset can change between these queries).
Availability: The opposite of "consistency". Data is always available, regardless of being 100% "correct".[1]
Optional fields (wide columns): This CAN be done in MySQL with meta tables etc., but it's for-free and by-default in Cassandra.
Cassandra is key-value or document-based storage. Think about what that means. TYPICALLY I give Cassandra ONE KEY and I get back ONE DATASET. It can branch out from there, but that's basically what's going on. It's more like accessing a static file. Sure, you can have multiple indexes, counter fields etc. but I'm making a generalization. That's where Cassandra is coming from.
MySQL and SQL is based on group/set theory -- it has a way to combine ANY relationship between data sets. It's pretty easy to take a MySQL query, make the query a "key" and the response a "value" and store it into Cassandra (e.g. make Cassandra a cache). That might help explain the trade-off too, MySQL allows you to always rearrange your data tables and the relationships between datasets simply by writing a different query. Cassandra not so much. And know that while Cassandra might PROVIDE features to do some of this stuff, it's not what it was built for.
MongoDB and CouchDB fit somewhere in the middle of those two extremes. I think MySQL can be a bit verbose[2] and annoying to deal with especially when dealing with optional fields, and migrations if you don't have a good model or tools. Also with scalability, I'm sure there are great technologies for scaling a MySQL database, but Cassandra will always scale, and easily, due to limitations on its feature set. MySQL is a bit more unbounded. However, NoSQL and Cassandra do not do joins, one of the critical features of SQL that allows one to combine multiple tables in a single query. So, complex relational queries will not scale in Cassandra.
[1] Consistency vs. availability is a trade-off within large distributed dataset. It takes a while to make all nodes aware of new data, and eg. Cassandra opts to answer quickly and not to check with every single node before replying. This can causes weird edge cases when you base you writes off previously read data and overwriting data. For more information look into the CAP Theorem, ACID database (in particular Atomicity) as well as Idempotent database operations. MySQL has this issue too, but the idea of high availability over correctness is very baked into Cassandra and gives it many of its scalability and speed advantages.
[2] SQL being "verbose" isn't a great reason to not use it – plus most of us aren't going to (and shouldn't) write plain-text SQL statements.
Nosql solutions are better than Mysql, postgresql and other rdbms techs for this task. Don't waste your time with Hbase/Hadoop, you've to be an astronaut to use it. I recommend MongoDB and Cassandra. Mongo is better for small datasets (if your data is maximum 10 times bigger than your ram, otherwise you have to shard, need more machines and use replica sets). For big data; cassandra is the best. Mongodb has more query options and other functionalities than cassandra but you need 64 bit machines for mongo. There are some works around for analytics in both sides. There is atomic counters in both sides. Both can scale well but cassandra is much better in scaling and high availability. Both have php clients, both have good support and community (mongo community is bigger).
Cassandra analytics project sample:Rainbird http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
mongo sample: http://www.slideshare.net/jrosoff/scalable-event-analytics-with-mongodb-ruby-on-rails
http://axonflux.com/how-superfeedr-built-analytics-using-mongodb
doubleclick developers developed mongo http://www.informationweek.com/news/software/info_management/224200878
Cassandra vs. MongoDB
Are you considering Cassandra or MongoDB as the data store for your next project? Would you like to compare the two databases? Cassandra and MongoDB are both “NoSQL” databases, but the reality is that they are very different. They have very different strengths and value propositions – so any comparison has to be a nuanced one. Let’s start with initial requirements… Neither of these databases replaces RDBMS, nor are they “ACID” databases. So If you have a transactional workload where normalization and consistency are the primary requirements, neither of these databases will work for you. You are better off sticking with traditional relational databases like MySQL, PostGres, Oracle etc. Now that we have relational databases out of the way, let’s consider the major differences between Cassandra and MongoDB that will help you make the decision. In this post, I am not going to discuss specific features but will point out some high-level strategic differences to help you make your choice.
Expressive Object Model
MongoDB supports a rich and expressive object model. Objects can have properties and objects can be nested in one another (for multiple levels). This model is very “object-oriented” and can easily represent any object structure in your domain. You can also index the property of any object at any level of the hierarchy – this is strikingly powerful! Cassandra, on the other hand, offers a fairly traditional table structure with rows and columns. Data is more structured and each column has a specific type which can be specified during creation.
Verdict: If your problem domain needs a rich data model then MongoDB is a better fit for you.
Secondary Indexes
Secondary indexes are a first-class construct in MongoDB. This makes it easy to index any property of an object stored in MongoDB even if it is nested. This makes it really easy to query based on these secondary indexes. Cassandra has only cursory support for secondary indexes. Secondary indexes are also limited to single columns and equality comparisons. If you are mostly going to be querying by the primary key then Cassandra will work well for you.
Verdict: If your application needs secondary indexes and needs flexibility in the query model then MongoDB is a better fit for you.
High Availability
MongoDB supports a “single master” model. This means you have a master node and a number of slave nodes. In case the master goes down, one of the slaves is elected as master. This process happens automatically but it takes time, usually 10-40 seconds. During this time of new leader election, your replica set is down and cannot take writes. This works for most applications but ultimately depends on your needs. Cassandra supports a “multiple master” model. The loss of a single node does not affect the ability of the cluster to take writes – so you can achieve 100% uptime for writes.
Verdict: If you need 100% uptime Cassandra is a better fit for you.
Write Scalability
MongoDB with its “single master” model can take writes only on the primary. The secondary servers can only be used for reads. So essentially if you have three node replica set, only the master is taking writes and the other two nodes are only used for reads. This greatly limits write scalability. You can deploy multiple shards but essentially only 1/3 of your data nodes can take writes. Cassandra with its “multiple master” model can take writes on any server. Essentially your write scalability is limited by the number of servers you have in the cluster. The more servers you have in the cluster, the better it will scale.
Verdict: If write scalability is your thing, Cassandra is a better fit for you.
Query Language Support
Cassandra supports the CQL query language which is very similar to SQL. If you already have a team of data analysts they will be able to port over a majority of their SQL skills which is very important to large organizations. However CQL is not full blown ANSI SQL – It has several limitations (No join support, no OR clauses) etc. MongoDB at this point has no support for a query language. The queries are structured as JSON fragments.
Verdict: If you need query language support, Cassandra is the better fit for you.
Performance Benchmarks
Let’s talk performance. At this point, you are probably expecting a performance benchmark comparison of the databases. I have deliberately not included performance benchmarks in the comparison. In any comparison, we have to make sure we are making an apples-to-apples comparison.
Database model - The database model/schema of the application being tested makes a big difference. Some schemas are well suited for MongoDB and some are well suited for Cassandra. So when comparing databases it is important to use a model that works reasonably well for both databases.
Load characteristics – The characteristics of the benchmark load are very important. E.g. In write-heavy benchmarks, I would expect Cassandra to smoke MongoDB. However, in read-heavy benchmarks, MongoDB and Cassandra should be similar in performance.
Consistency requirements - This is a tricky one. You need to make sure that the read/write consistency requirements specified are identical in both databases and not biased towards one participant. Very often in a number of the ‘Marketing’ benchmarks, the knobs are tuned to disadvantage the other side. So, pay close attention to the consistency settings.
One last thing to keep in mind is that the benchmark load may or may not reflect the performance of your application. So in order for benchmarks to be useful, it is very important to find a benchmark load that reflects the performance characteristics of your application. Here are some benchmarks you might want to look at:
- NoSQL Performance Benchmarks
- Cassandra vs. MongoDB vs. Couchbase vs. HBase
Ease of Use
If you had asked this question a couple of years ago MongoDB would be the hands-down winner. It’s a fairly simple task to get MongoDB up and running. In the last couple of years, however, Cassandra has made great strides in this aspect of the product. With the adoption of CQL as the primary interface for Cassandra, it has taken this a step further – they have made it very simple for legions of SQL programmers to use Cassandra very easily.
Verdict: Both are fairly easy to use and ramp up.
Native Aggregation
MongoDB has a built-in Aggregation framework to run an ETL pipeline to transform the data stored in the database. This is great for small to medium jobs but as your data processing needs become more complicated the aggregation framework becomes difficult to debug. Cassandra does not have a built-in aggregation framework. External tools like Hadoop, Spark are used for this.
Schema-less Models
In MongoDB, you can choose to not enforce any schema on your documents. While this was the default in prior versions in the newer version you have the option to enforce a schema for your documents. Each document in MongoDB can be a different structure and it is up to your application to interpret the data. While this is not relevant to most applications, in some cases the extra flexibility is important. Cassandra in the newer versions (with CQL as the default language) provides static typing. You need to define the type of very column upfront.
I'd also like to add Membase (www.couchbase.com) to this list.
As a product, Membase has been deployed at a number of Ad Agencies (AOL Advertising, Chango, Delta Projects, etc). There are a number of public case studies and examples of how these companies have used Membase successfully.
While it's certainly up for debate, we've found that Membase provides better performance and scalability than any other solution. What we lack in indexing/querying, we are planning on more than making up for with the integration of CouchDB as our new persistence backend.
As a company, Couchbase (the makers of Membase) has a large amount of knowledge and experience specifically serving the needs of Ad/targeting companies.
Would certainly love to engage with you on this particular use case to see if Membase is the right fit.
Please shoot me an email (perry -at- couchbase -dot- com) or visit us on the forums: http://www.couchbase.org/forums/
Perry Krug
I would look at New Relic as an example of a similar workload. They capture over 200 Billion data points a day to disk and are using MySQL 5.6 (Percona) as a backend.
A blog post is available here:
http://blog.newrelic.com/2014/06/13/store-200-billion-data-points-day-disk/

When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site?

What are the advantages of using NoSQL databases? I've read a lot about them lately, but I'm still unsure why I would want to implement one, and under what circumstances I would want to use one.
Relational databases enforces ACID. So, you will have schema based transaction oriented data stores. It's proven and suitable for 99% of the real world applications. You can practically do anything with relational databases.
But, there are limitations on speed and scaling when it comes to massive high availability data stores. For example, Google and Amazon have terabytes of data stored in big data centers. Querying and inserting is not performant in these scenarios because of the blocking/schema/transaction nature of the RDBMs. That's the reason they have implemented their own databases (actually, key-value stores) for massive performance gain and scalability.
NoSQL databases have been around for a long time - just the term is new. Some examples are graph, object, column, XML and document databases.
For your 2nd question: Is it okay to use both on the same site?
Why not? Both serves different purposes right?
NoSQL solutions are usually meant to solve a problem that relational databases are either not well suited for, too expensive to use (like Oracle) or require you to implement something that breaks the relational nature of your db anyway.
Advantages are usually specific to your usage, but unless you have some sort of problem modeling your data in a RDBMS I see no reason why you would choose NoSQL.
I myself use MongoDB and Riak for specific problems where a RDBMS is not a viable solution, for all other things I use MySQL (or SQLite for testing).
If you need a NoSQL db you usually know about it, possible reasons are:
client wants 99.999% availability on
a high traffic site.
your data makes
no sense in SQL, you find yourself
doing multiple JOIN queries for
accessing some piece of information.
you are breaking the relational
model, you have CLOBs that store
denormalized data and you generate
external indexes to search that data.
If you don't need a NoSQL solution keep in mind that these solutions weren't meant as replacements for an RDBMS but rather as alternatives where the former fails and more importantly that they are relatively new as such they still have a lot of bugs and missing features.
Oh, and regarding the second question it is perfectly fine to use any technology in conjunction with another, so just to be complete from my experience MongoDB and MySQL work fine together as long as they aren't on the same machine
Martin Fowler has an excellent video which gives a good explanation of NoSQL databases. The link goes straight to his reasons to use them, but the whole video contains good information.
You have large amounts of data - especially if you cannot fit it all on one physical server as NoSQL was designed to scale well.
Object-relational impedance mismatch - Your domain objects do not fit well in a relaitional database schema. NoSQL allows you to persist your data as documents (or graphs) which may map much more closely to your data model.
NoSQL is a database system where data is organized into the document (MongoDB), key-value pair (MemCache, Redis), and graph structure form(Neo4J).
Maybe there are possible questions and answer for "When to go for NoSQL":
Require flexible schema or deal with tree-like data?
Generally, in agile development we start designing systems without knowing all requirements upfront, whereas later on throughout the development database system may need to accommodate frequent design changes, showcasing MVP (Minimal Viable product).
Or you are dealing with a data schema that is dynamic in nature.
e.g. System logs, very precise example is AWS cloudtrail logs.
Data set is vast/big?
Yes NoSQL databases are the better candidate for applications where the database needs to manage millions or even billions of records without compromising performance and availability while may be trading for inconsistency(though modern databases are exception here where it allows tunable consistency over availability e.g. Casandra, Cloud provider databases CosmosDB, DynamoDB).
Trade-off between scaling over consistency
Unlike RDMS, NoSQL databases may make the dataset consistent across other nodes eventually which is the default behavior, but it's easy to scale in terms of performance and availability.
Example: This may be good for storing people who are online in the instant messaging app, API tokens in DB, and logging website traffic stats.
Performing Geolocation Operations:
MongoDB hash rich support for doing GeoQuerying & Geolocation operations. I really loved this feature of MongoDB. So does the PostresSQL but ease of implementation is something that depends on the use case
In nutshell, MongoDB is a great fit for applications where you can store dynamic structured data on a large scale.
Edits:
Updated the answer about the consistency of the database.
Some essential information is missing to answer the question: Which use cases must the database be able to cover? Do complex analyses have to be performed from existing data (OLAP) or does the application have to be able to process many transactions (OLTP)? What is the data structure? That is far from the end of question time.
In my view, it is wrong to make technology decisions on the basis of bold buzzwords without knowing exactly what is behind them. NoSQL is often praised for its scalability. But you also have to know that horizontal scaling (over several nodes) also has its price and is not free. Then you have to deal with issues like eventual consistency and define how to resolve data conflicts if they cannot be resolved at the database level. However, this applies to all distributed database systems.
The joy of the developers with the word "schema less" at NoSQL is at the beginning also very big. This buzzword is quickly disenchanted after technical analysis, because it correctly does not require a schema when writing, but comes into play when reading. That is why it should correctly be "schema on read". It may be tempting to be able to write data at one's own discretion. But how do I deal with the situation if there is existing data but the new version of the application expects a different schema?
The document model (as in MongoDB, for example) is not suitable for data models where there are many relationships between the data. Joins have to be done on application level, which is additional effort and why should I program things that the database should do.
If you make the argument that Google and Amazon have developed their own databases because conventional RDBMS can no longer handle the flood of data, you can only say: You are not Google and Amazon. These companies are the spearhead, some 0.01% of scenarios where traditional databases are no longer suitable, but for the rest of the world they are.
What's not insignificant: SQL has been around for over 40 years and millions of hours of development have gone into large systems such as Oracle or Microsoft SQL. This has to be achieved by some new databases. Sometimes it is also easier to find an SQL admin than someone for MongoDB. Which brings us to the question of maintenance and management. A subject that is not exactly sexy, but that is a part of the technology decision.
Handling A Large Number Of Read Write Operations
Look towards NoSQL databases when you need to scale fast. And when do you generally need to scale fast?
When there are a large number of read-write operations on your website & when dealing with a large amount of data, NoSQL databases fit best in these scenarios. Since they have the ability to add nodes on the fly, they can handle more concurrent traffic & big amount of data with minimal latency.
Flexibility With Data Modeling
The second cue is during the initial phases of development when you are not sure about the data model, the database design, things are expected to change at a rapid pace. NoSQL databases offer us more flexibility.
Eventual Consistency Over Strong Consistency
It’s preferable to pick NoSQL databases when it’s OK for us to give up on Strong consistency and when we do not require transactions.
A good example of this is a social networking website like Twitter. When a tweet of a celebrity blows up and everyone is liking and re-tweeting it from around the world. Does it matter if the count of likes goes up or down a bit for a short while?
The celebrity would definitely not care if instead of the actual 5 million 500 likes, the system shows the like count as 5 million 250 for a short while.
When a large application is deployed on hundreds of servers spread across the globe, the geographically distributed nodes take some time to reach a global consensus.
Until they reach a consensus, the value of the entity is inconsistent. The value of the entity eventually gets consistent after a short while. This is what Eventual Consistency is.
Though the inconsistency does not mean that there is any sort of data loss. It just means that the data takes a short while to travel across the globe via the internet cables under the ocean to reach a global consensus and become consistent.
We experience this behaviour all the time. Especially on YouTube. Often you would see a video with 10 views and 15 likes. How is this even possible?
It’s not. The actual views are already more than the likes. It’s just the count of views is inconsistent and takes a short while to get updated.
Running Data Analytics
NoSQL databases also fit best for data analytics use cases, where we have to deal with an influx of massive amounts of data.
I came across this question while looking for convincing grounds to deviate from RDBMS design.
There is a great post by Julian Brown which sheds lights on constraints of distributed systems. The concept is called Brewer's CAP Theorem which in summary goes:
The three requirements of distributed systems are : Consistency, Availability and Partition tolerance (CAP in short). But you can only have two of them at a time.
And this is how I summarised it for myself:
You better go for NoSQL if Consistency is what you are sacrificing.
I designed and implemented solutions with NoSQL databases and here is my checkpoint list to make the decision to go with SQL or document-oriented NoSQL.
DON'Ts
SQL is not obsolete and remains a better tool in some cases. It's hard to justify use of a document-oriented NoSQL when
Need OLAP/OLTP
It's a small project / simple DB structure
Need ad hoc queries
Can't avoid immediate consistency
Unclear requirements
Lack of experienced developers
DOs
If you don't have those conditions or can mitigate them, then here are 2 reasons where you may benefit from NoSQL:
Need to run at scale
Convenience of development (better integration with your tech stack, no need in ORM, etc.)
More info
In my blog posts I explain the reasons in more details:
7 reasons NOT to NoSQL
2 reasons to NoSQL
Note: the above is applicable to document-oriented NoSQL only. There are other types of NoSQL, which require other considerations.
Ran into this thread and wanted to add my experience.. Many SQL databases support json data in columns and support querying of this json. So what I have used is a hybrid using a relational database with columns containing json..

Are document-oriented databases meant to replace relational databases?

Recently I've been working a little with MongoDB and I have to say I really like it. However it is a completely different type of database then I am used. I've noticed that it is most definitely better for certain types of data, however for heavily normalized databases it might not be the best choice.
It appears to me however that it can completely take the place of just about any relational database you may have and in most cases perform better, which is mind boggling. This leads me to ask a few questions:
Are document-oriented databases being developed to be the next generation of databases and basically replace relational databases completely?
Is it possible that projects would be better off using both a document-oriented database and a relational database side by side for various data which is better suited for one or the other?
If document-oriented databases are not meant to replace relational databases, then does anyone have an example of a database structure which would absolutely be better off in a relational database (or vice-versa)?
Are document-oriented databases have been developed to be the next generation of databases and basically replace relational databases completely?
No. Document-oriented databases (like MongoDB) are very good at the type of tasks that we typically see in modern web sites (fast look-ups of individual items or small sets of items).
But they make some big trade-offs with relational systems. Without things like ACID compliance they're not going to be able to replace certain RDBMS. And if you look at systems like MongoDB, the lack of ACID compliance is a big reason it's so fast.
Is it possible that projects would be better off using both a document-oriented database and a relational database side by side for various data which is better suited for one or the other?
Yes. In fact, I'm running a very large production web-site that uses both. The system was started in MySQL, but we've migrated part of it over to MongoDB, b/c we need a Key-Value store and MySQL just isn't very good at finding one item in a 150M records.
If document-oriented databases are not meant to replace relational databases, then does anyone have an example of a database structure which would absolutely be better off in a relational database (or vice-versa)?
Document-oriented databases are great storing data that is easily contained in "key-value" and simple, linear "parent-child" relationships. Simple examples here are things like Blogs and Wikis.
However, relational databases still have a strong leg up on things like reporting, which tends to be "set-based".
Honestly, I can see a world where most data is "handled" by Document-oriented database, but where the reporting is done in a relational database that is updated by Map-reduce jobs.
This is really a question of fitness for purpose.
If you want to be able to join some tables together and return a filtered set of results, you can only do that with a relational database. If you want mind-bending performance and have incredible volumes of data, that's when column-family or document-oriented databases come into their own.
This is a classic trade-off. Relational databases offer you a whole suite of features, which comes with a performance cost. If you couldn't join, index, scan or perform a whole other list of features, you remove the need to have any view over ALL data, which gives you the performance and distribution you need to crunch serious data.
Also, I recommend you follow the blogs of Ayende Rahien on this topic.
http://ayende.com/blog/
#Sohnee is spot on. I might add that relational databases
are excellent for retrieving information in unexpected combinations -- even if that occasionally leads to the Bad Idea of extensive reports being run on time-sensitive production systems rather than on a separate data warehouse.
are a mature technology where you can easily find staff and well tested solutions to any number of problems (including the limitations of the relational model, as well as the imperfect implementation that is SQL).
Ask yourself what you want to do, and what qualities are important to you. You can do everything programming related in shell scripts. Do you want to?
I keep asking the same question, which is what landed me here. I use both MySQL and MongoDB (not in tandem currently, though its an idea). I have to honestly say I'm very happy to never touch MySQL again. Sure there's the "ACID" compliance, but have you ever run into the need to repair your tables with MySQL? Have you ever had a corrupted database? It happens. Have you ever had any other issues with MySQL? Any lock contentions or dead locks? Any problems with clustering? How easy was it to setup and configure?
MongoDB...You turn it on and it's done....Then it's autosharding. It's incredibly simple and it's also incredibly fast. So think about that. Your time.
No, they don't have JOINs but it's a completely incorrect statement to say that it discounts more than 99% of data management needs. I often get opposition when trying to explain MongoDB, people even snickering. Let's just face it. People don't want to learn new things and they think that what they know is all they need. Sure, you can get away using MySQL the rest of your life and build your web sites. It works, we know it works. We also know it fails. If it didn't, you'd never ask the question and we probably wouldn't see so many document oriented databases. We know that yes it does scale but it's a pain in the rear to scale it.
Also let's eliminate traffic and scaling from the picture. Take out setup. Now let's focus on use. What is your experience when using MySQL? How good are you with MySQL architecture and making efficient queries? How much time do you spend looking over queries with EXPLAIN? How much time do you spend making schema diagrams? ... I say take that time back. It's better spent elsewhere.
That's my two cents. I really do love MongoDB and hope to never use MySQL again and for the type of web sites I build, it's very possible that I won't need to. Though I'm still trying to find out WHEN I would want to use MySQL over MongoDB, not when I CAN (let's face it, it stores data, congratulations, I could write a ton of XML files too but it's not a good idea), but when it would BENEFIT to use one or the other. In the meantime, I'm going to go do my job with MongoDB and have less headaches.
As long as you don't need multi-object transactions, MongoDB can be a favorable replacement for an RDMBS, especially in a web application context. Speed, schemalessness, and document modeling are all helpful this domain.
In my opinion document-oriented databases are only good for
Databases which data is better represented using a hierarchical (tree) model. This is not common for website databases.
Databases with huge amount of data like the Facebook and Amazon databases. In this case it is required to sacrifice the benefits of the relational model.
MongoDB main characteristics are
document model (JSON)
high level(close to real world object), less collections
sharding (optional)
programmer friendly
drivers, same data structures arrays/hash maps
Document databases
A document is more general than a table, its far easier to represent a table with a JSON than storing JSON to a table.
So yes document databases could replace table databases.
Sharding
Joins in sharded collections are expensive for any database.
MongoDB added $lookup years now, and in MongoDB 5.1+ it can be used even when both collections are sharded.
But looks like joins in distributed databases are slow, and should be avoided, so relational way of modelling should be avoided.
No sharding
I think when sharding isn't used, MongoDB will co-exists and overlap with relational databases(especially after ACID support and $lookup support), to replace them its hard, and doesn't look like goal of MongoDB right now.
So overal looks like MongoDB could do what relational databases do,
but for now its not a replacement.
The opposite isn't true, relational databases have much bigger problems if they try to behave like MongoDB
AFAIK, document databases don't have JOIN. That's pretty much a show-stopper for > 99% of data management needs.
As Matthew Flaschen points out in the comments, even on the desktop, databases such as SQLite are introducing SQL semantics to areas that have traditionally used propriety file formats or XML.