Smart way to evaluate what is the right NoSQL database for me? - mongodb

There appears to be a myriad of NoSQL databases available these days:
CouchDB
MongoDB
Cassandra
Hadoop
There's also a boundary between these tools and tools such as Redis that work as a memcached replacement.
Without hand waving and throwing too many buzz words - my question is the following:
How does one intelligently decide which tool here makes the most sense for their project? Are the projects similar enough to where the answer to this is subjective, eg: Ruby is better than Python or Python is better than Ruby? Or are we talking Apples and oranges here in that they each of them solve different problems?
What's the best way to educate myself on this new trend?

Perhaps one way to think of it is, programming has recently evolved from using one general-purpose language for everything to using the general-purpose language for most things, plus domain-specific languages for the more appropriate parts. For example, you might use Lua to script artificial intelligence of a character in a game.
NoSQL databases might be similar. SQL is the general purpose database with the longest and broadest adoption. While it could be shoehorned to serve many tasks, programmers are beginning to use NoSQL as a domain-specific database when it is more appropriate.

I would argue, that the 4 major players you named do have quite different featuresets and try to solve different problems with different priority.
For instance, as far as i know Cassandra (and i assume Hadoop) central focus is on large scale installations.
MongoDb tries to be a better scaling alternative to classic SQL servers in providing comparably powerful query functions.
CouchDB's focus is comparably small scale (will not shard at all, "only" replicate), high durability and easy synchronization of data.
You might want to check out http://nosql-database.org/ for some more information.
I am facing pretty much the same problem as you, and i would say there is no real alternative to look at all solutions in detail.

Check out this site: http://cattell.net/datastores/ and in particular the PDF linked at the bottom (CACM Paper). The latter contains an excellent discussion of the relative merits of various data store solutions.

It's easy. NoSQL databases are ACID compliant databases minus some guarantees. So just decide which guarantees you can do without and find the database that fits. If you don't need durability for example, maybe redis is best. Or if you don't need multi-record transactions, then perhaps look into mongodb.

Related

Why would you use NoSQL to build posts/comments over relational? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.

rewrite website architecture using a nosql database

i want to rewrite an existing website, for a client, that has 100000+ visitors a day and i am considering using Cassandra db, Couch Db or Mongo Db instead of using Mysql and couple it with Solr.
what i want to ask is if it is a good idea to switch to nosql for a website that sits on a single server(would not use for now multiple nodes)?
what problems that may arise on the long term. I am a little afraid of using nosql because these db`s are relatively young. But considering the speed gain for queries makes it really attractive.
i am using php as the backend programming language.
Thanks
Although the platforms you mention are very young compared to SQL, they have now been around long enough that they are somewhat mature and you don't risk much by using them instead of SQL if they fit what you are trying to do.
However, in this case it may be better to stick with SQL - you already have all the code working well with SQL, and you can get most of the performance improvements you need by adding a search engine or cache component rather than rewriting the entire system.
If the rewrite is something you were planning to do anyway, you can use any datastore you want - just pick the one where the standard datamodel is closest to your data and the queries you need to support.
I suspect the most difficult thing will be to transform your data model for nosql DB. There will be no JOIN, and 'workarounds' for joins are not that straightforward in nosql databases.
Also, performance is not guaranteed out of the box, you will have to work hard to achieve it. Nosql databases have relaxed constraints on your data, which in turn provides developers with more options on how to work with that data; which in turn enables higher-performance solutions.
Many nosql DBs are still quite young. They may be used in many successful projects, but yet, in general they are not as reliable as popular relational DBs. Of course, it is unlikely for them to fail in a big way, but the likelihood of small bugs here and there is higher.
Perhaps the most well known failure associated with nosql was foursquare's mongodb outage. But it doesn't look that big of a deal to me.

MongoDB for personal non-distributed work

This might be answered here (or elsewhere) before but I keep getting mixed/no views on the internet.
I have never used anything else except SQL like databases and then I came across NoSQL DBs (mongoDB, specifically). I tried my hands on it. I was doing it just for fun, but everywhere the talk is that it is really great when you are using it across distributed servers. So I wonder, if it is any helpful(in a non-trivial way) for doing small projects and things mainly only on a personal computer? Are there some real advantages when there is just one server.
Although it would be cool to use MapReduce (and talk about it to peers :d) won't it be an overkill when used for small projects run on single servers? Or are there other advantages of this? I need some clear thought. Sorry if I sounded naive here.
Optional: Some examples where/how you have used would be great.
Thanks.
IMHO, MongoDB is perfectly valid for use for single server/small projects and it's not a pre-requisite that you should only use it for "big data" or multi server projects.
If MongoDB solves a particular requirement, it doesn't matter on the scale of the project so don't let that aspect sway you. Using MapReduce may be a bit overkill/not the best approach if you truly have low volume data and just want to do some basic aggregations - these could be done using the group operator (which currently has some limitations with regard to how much data it can return).
So I guess what I'm saying in general is, use the right tool for the job. There's nothing wrong with using MongoDB on small projects/single PC. If a RDBMS like SQL Server provides a better fit for your project then use that. If a NoSQL technology like MongoDB fits, then use that.
+1 on AdaTheDev - but there are 3 more things to note here:
Durability: From version 1.8 onwards, MongoDB has single server durability when started with --journal, so now it's more applicable to single-server scenarios
Choosing a NoSQL DB over say an RDBMS shouldn't be decided upon the single or multi server setting, but based on the modelling of the database. See for example 1 and 2 - it's easy to store comment-like structures in MongoDB.
MapReduce: again, it depends on the data modelling and the operation/calculation that needs to occur. Depending on the way you model your data you may or may not need to use MapReduce.

When NOT to use NoSQL?

There was an article on Hacker News a couple of days ago that reached first page titled something like
"2 cases when not to use Mongodb" but I really can't find it anymore...
Does anyone know where I can find the above described article?
What cases are there when NoSQL fails?
We use MongoDB for storing tons and tons of analytics data for which we don't care if some stuff occasionally gets lost in a server crash. The data really fits MongoDB well and it would have been a nightmare if we were to use an SQL database for this. But for bank transactions we wouldn't even consider MongoDB.
The write lock might be a problem for some people. On the other hand MongoDB supports easy sharding, much easier than with SQL. Sharding allows us to scale horizontally which is a huge plus for our data.
http://news.ycombinator.com/item?id=1691748
By any reasonable definition "NoSQL" ought to include non-SQL RDBMSs in its scope (because there's no sound reason why the relational model can't address the same requirements as other NoSQL models). If you accept that, then there is no limit to what NoSQL DBMSs could do. We would have no more need of SQL - ever!
Sadly, there seems to be a common assumption among NoSQL thought leaders that "NoSQL" has to mean "not relational". That is highly unfortunate because if the relational model is ignored then NoSQL is never likely to replace SQL for many purposes. (I take it for granted that finding a long-term, relational model replacement for SQL would actually be a good thing :)
You don't want to use NoSQL typically when you....
... don't want to use SQL! /hardy har har
Most of the NoSQL solutions I've seen seem to fall in the key-value store approach, and aren't relational. They tend to give up ACID properties.
So when you evaluate a database system, when you don't need ACID, when you don't want relational algebra, when you do have a need for a KV store, then the NoSQL approach is your friend.
Note too that there is a wide variety of 'NoSQL' systems, and they all are busily working on slightly different approaches.

What is the benefit of using a "document-oriented DBMS"?

I must be missing something, because everything I've seen so far suggests that it isn't any more interesting than a single table for storing blobs and a second table for tags that apply to it.
Now I certainly can see some benefit to that from a design pattern, but why would I want to use a "document-oriented DBMS" instead of just building it using a traditional database like SQL Server, Oracle, or Postgres?
I enjoyed listening to the floss weekly episode about CouchDB. Lots of reasoning and ideas there.
Prior to listening, most of the stuff I read about this topic triggered not much insight (for me). Listening to people talking and reasoning about why&where you want to use document-oriented DBs helped me a lot to really get the concepts, reasoning, pros and cons. Now all the articles and statements (IMHO) suddenly make a lot more sense.
Your mileage may vary, but this helped me a lot.
I presume you are meaning products such as Couch DB or Tokyo Cabinet (rather than ECM products like Documentum). I think the attraction for many developers is familiarity.
Firstly, the conceptual model (in most cases) is key-value pairs, like a configuration file. As most frameworks seem to require a lot of configuration-wrangling, front-end/middle tier developers are comfortable with that way of working working. Secondly, these tools offer interfaces in developer-friendly languages like Java, Python, etc.
Whereas, traditional RDBMS products require thinking in a different fashion - relationally. And they require learning not just a weird language, SQL, but a new way of programming: set-based rather than procedural. If you rehearse the arguments for putting business logic in the middle tier rather stored procedures in the database, well a lot of them apply to No SQL as well.
why would I want to use a "document-oriented DBMS" instead of just
building it using a traditional database like SQL Server, Oracle, or
Postgres?
Historically document-oriented DBMS do not allow to span ACID transactions across documents. They have poor support for Foreign Keys. The trade ought to allow for more accessible operations, maintenance and scaling.