Is CouchDB a good persistence layer for Membase?

Membase is great for social games due to its low latency.
As I understand it, CouchDB is an MVCC system built on a B+ tree, with an append-only design.
(http://guide.couchdb.org/draft/btree.html)
One of Membase's most important use cases is social games.
Social games involve a lot of write operations (50% or more),
and a good portion of those are in-place updates.
So why is CouchDB a suitable persistence layer for Membase?

I'd also add that CouchDB's append-only log format really doesn't have much to do with whether application writes are new items or updates. The append-only format gives us much better reliability and performance than an in-place system (like SQLite, which is still quite reliable). It also makes backups much easier.
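To illustrate why an append-only format does not care whether a write is an insert or an update, here is a minimal sketch of an append-only key-value log. It is a hypothetical toy (the `AppendOnlyStore` class and the JSON-lines format are illustrative assumptions), not CouchDB's on-disk format, which is an append-only B+ tree. An update simply appends a newer record, existing bytes are never overwritten, and recovery after a crash means replaying the log. The trade-off is that stale records accumulate until the file is compacted.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Hypothetical toy store: every write, insert or update, appends one record."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        if os.path.exists(path):
            self._replay()

    def _replay(self):
        # Rebuild the in-memory index by scanning the log; later records win.
        with open(self.path, "rb") as f:
            offset = 0
            for line in f:
                record = json.loads(line)
                self.index[record["key"]] = offset
                offset += len(line)

    def put(self, key, value):
        line = (json.dumps({"key": key, "value": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # new record starts at the current end
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # old records are never rewritten, only appended to
        self.index[key] = offset

    def get(self, key):
        with open(self.path, "rb") as f:
            f.seek(self.index[key])
            return json.loads(f.readline())["value"]


path = os.path.join(tempfile.mkdtemp(), "data.log")
store = AppendOnlyStore(path)
store.put("player:1", {"score": 10})
store.put("player:1", {"score": 25})  # an "in-place" update is just another append
print(store.get("player:1"))
print(AppendOnlyStore(path).get("player:1"))  # reopening replays the log
```

Both prints show the latest value, `{'score': 25}`, while the file still holds both records.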
Does Membase NEED an append-only log format? Maybe not. Does it NEED CouchDB? YES!
The benefits CouchDB brings to Membase (map-reduce, indexing, and eventually consistent replication) are nothing less than huge, and the benefits Membase brings to CouchDB (low latency, clustering, and the UI) are arguably just as important.
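To make "map-reduce and indexing" concrete, the sketch below emulates the shape of a CouchDB view in plain Python: a map function emits (key, value) rows per document, the rows sorted by key act as an index, and a reduce function aggregates values per key. This is not CouchDB's API (real view functions are usually written in JavaScript, stored in design documents, and maintained incrementally in a B+ tree); the documents and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents for a social game; field names are illustrative only.
docs = [
    {"_id": "s1", "type": "score", "player": "alice", "points": 30},
    {"_id": "s2", "type": "score", "player": "bob", "points": 12},
    {"_id": "s3", "type": "score", "player": "alice", "points": 5},
    {"_id": "u1", "type": "profile", "player": "bob"},
]


def map_fn(doc):
    # Like a view's map function: emit zero or more (key, value) rows per document.
    if doc.get("type") == "score":
        yield doc["player"], doc["points"]


def reduce_fn(values):
    # Like a summing reduce function over all values that share a key.
    return sum(values)


def build_index(docs):
    # The emitted rows, sorted by key, are what make the view usable as an index.
    return sorted((key, value, doc["_id"]) for doc in docs for key, value in map_fn(doc))


def lookup(index, key):
    # Linear scan for clarity; CouchDB serves view queries from a B+ tree instead.
    return [(doc_id, value) for k, value, doc_id in index if k == key]


def grouped_reduce(index):
    groups = defaultdict(list)
    for key, value, _ in index:
        groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


index = build_index(docs)
print(lookup(index, "alice"))
print(grouped_reduce(index))
```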
(Disclosure: I work for Couchbase)
Perry Krug

CouchDB has great file formats, a great ability to recover from crashes, sophisticated authentication and authorization tools, and a universal, standard interface: HTTP. CouchDB is poor at low-latency queries, optimized memory utilization, and very high update rates (on the order of a million per second).
Membase currently has only a simple SQLite file format for persistence and less sophisticated authentication and authorization, and it uses a more obscure protocol. Membase is amazing at low-latency queries, ideal memory utilization, and very high update rates.
I think the two complement each other very well. Since the merging effort is coming from core developers in both projects, collaborating together, I expect to see the strengths of both and the weaknesses of neither. Yes, CouchDB is a good persistence layer for Membase.

Money talks, and if there was ever a vote of confidence, here it is: not only from a new lead investor but from the existing ones as well.
http://www.couchbase.com/press-releases/couchbase-series-C
Besides, don't you think Membase itself is well enough qualified to evaluate such a merger decision?

Related

What makes a web-based framework scalable?

Thank you very much in advance.
First of all, I conceive of scalability as the ability to design a system that does not need to change when the demand for its services, whatever they are, increases considerably. If you need more hardware (vertically or horizontally), fine: add it at your leisure, because the system has been designed to cope with it.
My question is simple to ask but presumably very complex to answer. I would like to know what I should look at in a framework to make sure it will scale accordingly, both in number of hits and in number of simultaneous sessions.
This question is not about any particular technology or framework; it is more of a theoretical question.
I know that this depends very much on having a good database design and proper hardware behind it, with replication, etc. Let's assume all of that exists. Even so, my framework must meet some criteria. Which ones?
Should it provide a memcache layer?
Should it be able to run across multiple machines (at the web-server level) and use many replicated databases? If so, what is it in the software that makes that possible?
Etc.
Please, let's not tie the answers to any particular programming language or underlying technology.
Thanks again,
D.
I think scalability depends most of all on the use case. If you expect huge amounts of data, focus on the database; if it's about traffic, focus on the server; if it's about adding new features, focus on your data model and the framework you are using.
Compare a microblogging service like Twitter with a university website or a web service like Google Docs, and you will find quite different requirements.
First of all, the common notion of scalability is the ability of software to improve its throughput or capacity when more hardware resources (CPUs, memory, bandwidth, etc.) are added.
Software that does not improve when given more resources is not scalable.
Definitions aside, I think your question is about evaluating the frameworks you plan to introduce into your implementation, since they may affect your software's ability to scale.
IMHO the most important factor to evaluate when introducing a framework is whether it contains hidden serialization, i.e. points where work must proceed one request at a time (a global lock, for example), because that serialization is effectively transferred to your software.
So if you introduce a framework that serializes work in your application, it limits your ability to scale.
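One standard way to quantify the cost of hidden serialization is Amdahl's law (named here as a supporting technique, not something the answer itself invokes): if a fraction s of the work is serialized, the speedup on N parallel workers is at most 1 / (s + (1 - s) / N), which can never exceed 1 / s no matter how much hardware is added. A minimal sketch:

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Serialized part takes the same time regardless of workers; the rest is divided.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


for s in (0.0, 0.05, 0.25):
    print(s, [round(amdahl_speedup(s, n), 2) for n in (1, 8, 64)])
```

With just 5% serialized work, 64 workers give only about a 15x speedup, and no amount of hardware gets past 20x.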
How to evaluate?
Careful source code inspection (if it is open source).
Check whether those who build the framework offer any performance guarantees.
Take measurements yourself to see how introducing the framework affects your performance, and replace it if you are not satisfied.

Best suited NoSQL database for Content Recommender

I am currently working on a project that includes migrating a content recommender from MySQL to a NoSQL database for performance reasons. Our team has been evaluating some alternatives, like MongoDB, CouchDB, HBase and Cassandra. The idea is to choose a database that is capable of running on a single server or in a cluster.
So far we have discarded HBase due to its dependency on a distributed environment. Even though we plan to scale horizontally, we need to run the DB on a single server in production for a while. MongoDB was also discarded because it does not support map/reduce features.
We still have two alternatives and no solid basis for deciding between them. Any guidance or help is appreciated.
NOTE: I do not intend to start a religious debate built on unfounded arguments. This is a strictly technical question, to be discussed in the context of the problem.
Graph databases are usually considered best suited for recommendation engines, since a lot of recommendation algorithms are actually graph based. I recommend looking into Neo4j: it can handle billions of nodes/edges on a single machine, and it supports a so-called high availability mode, which is a master-slave setup with automatic master election.
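To make "graph based" concrete, a common recommendation pattern is a short traversal: from a user to the items they liked, to other users who liked those items, to those users' other items, counting how many paths reach each candidate. The sketch below uses plain Python dicts as a stand-in for a graph database; the data and the `recommend`/`build_reverse` names are hypothetical, and in Neo4j the same walk would run as a query over stored relationships rather than in application code.

```python
from collections import Counter, defaultdict

# Hypothetical "user LIKES item" edges, stored as adjacency sets.
likes = {
    "ann": {"post1", "post2"},
    "ben": {"post2", "post3"},
    "cat": {"post2", "post3", "post4"},
    "dan": {"post5"},
}


def build_reverse(likes):
    # Reverse edges (item -> users who liked it) so the walk can go both ways.
    liked_by = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            liked_by[item].add(user)
    return liked_by


def recommend(user, likes, liked_by, limit=3):
    seen = likes.get(user, set())
    scores = Counter()
    for item in seen:                          # hop 1: user -> liked item
        for other in liked_by[item]:           # hop 2: item -> other users who liked it
            if other == user:
                continue
            for candidate in likes[other]:     # hop 3: other user -> their items
                if candidate not in seen:
                    scores[candidate] += 1     # count paths reaching each candidate
    return [item for item, _ in scores.most_common(limit)]


liked_by = build_reverse(likes)
print(recommend("ann", likes, liked_by))  # ['post3', 'post4']
```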

Is Cassandra ready for prime time yet?

I started looking into Cassandra and I am really impressed with what it provides, but at the same time I read about how Reddit had a fire drill after migrating to Cassandra, and about Twitter deciding not to use it for tweets. Although those events were about a year ago, I am wondering whether the latest version is ready for prime time.
Netflix has talked extensively about how they are moving from Oracle and SimpleDB entirely to Cassandra.
Twitter was also at the Cassandra Summit a few weeks ago, talking about how they use Cassandra for multiple projects. Reddit had some early problems with being under capacity, but later said, "Our traffic more than tripled [in 2010], and the transparent scalability afforded to us by Apache Cassandra is in large part what allowed us to do it on our limited resources."
There are many other companies using Cassandra (and DataStax customers are the tip of the iceberg).
In short, Cassandra is solving real problems for real companies. Just don't go into it expecting MySQL and you'll be fine. The DataStax documentation is a good starting point.
(Chris is mistaken about API stability: we were clear that after 0.7 we would be strict about maintaining backwards compatibility, and we have, even for "maintenance" operations like schema updates and mixed-version cluster operation for downtime-free upgrades. I would also note that unlike many "NoSQL" databases, Cassandra has always taken data durability seriously.)
Cassandra is still under very heavy development. The API is still changing, and in that respect, no, the product isn't stable. There are still occasional glitches and a number of kinks to be worked out. It is still a very young product with a long way to go before actual maturity.
Having said that, Cassandra is quite capable, provided that you are able to structure your data in a manner suited to Cassandra's strong points. In other words, if you play to Cassandra's strengths, I think you'll find that it's "mature enough" at this point. There are already a number of large sites that use Cassandra, and in this regard it's certainly ready for "prime time" (whatever that really means).
It will be years (if ever) before it has the same reputation and stability as a traditional DBMS like MySQL.

Choosing NoSQL - availability prioritized

We have thought a bit about running a NoSQL database for our next project. However, we're not sure which platform will give us the best possible availability and the best built-in replication features to provide it, with the least headache.
Right now, Cassandra appears to be the best candidate, but we would like to hear more about this from someone who has more experience in this area than we do.
Thanks a lot!
High availability will most likely be achieved with a Dynamo clone.
Cassandra is a good option, although it has been bashed recently by several early adopters.
Project Voldemort is also Dynamo-based and therefore easily tuned for high availability; it's what LinkedIn uses.
Another interesting NoSQL option might be Membase. I haven't used it myself, but its notion of virtual buckets for rebalancing, as opposed to plain consistent hashing, makes a lot of sense and would appear to provide more robust high availability.
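A rough sketch of the two placement schemes, under stated assumptions: consistent hashing (as in Dynamo clones) puts many points per node on a hash ring and sends each key to the next point clockwise, while Membase hashes each key to one of a fixed set of virtual buckets and keeps a separate vbucket-to-server map, so rebalancing means moving whole vbuckets and rewriting that map. The vbucket count of 1024, the CRC32 key hash, the round-robin layout, and every function name here are illustrative assumptions, not Membase's actual client code. In this sketch both schemes move roughly a quarter of the keys when a fourth node joins, and only onto the new node; the point is the structural difference (an explicit map of movable units), not a proof that either scheme is more robust.

```python
import bisect
import hashlib
import zlib


def stable_hash(text):
    # 64-bit hash derived from MD5 so placement is reproducible across runs.
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Dynamo-style ring: each node owns many points; a key maps to the next point clockwise."""

    def __init__(self, nodes, points_per_node=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key):
        i = bisect.bisect(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[i][1]


NUM_VBUCKETS = 1024  # assumption: fixed number of vbuckets, chosen up front


def vbucket_for(key):
    # Assumption: a CRC32-based key -> vbucket hash; real clients may differ in detail.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def initial_vbucket_map(nodes):
    # Hypothetical starting layout: deal vbuckets out round-robin.
    return [nodes[v % len(nodes)] for v in range(NUM_VBUCKETS)]


def add_node(vbucket_map, new_node):
    # Hand every k-th vbucket to the new node. The key -> vbucket hash never
    # changes; only this vbucket -> server table is rewritten.
    k = len(set(vbucket_map)) + 1
    return [new_node if v % k == 0 else owner for v, owner in enumerate(vbucket_map)]


nodes = ["a", "b", "c"]
keys = [f"user:{i}" for i in range(10_000)]

old_ring, new_ring = ConsistentHashRing(nodes), ConsistentHashRing(nodes + ["d"])
ring_moved = [k for k in keys if old_ring.node_for(k) != new_ring.node_for(k)]

old_map = initial_vbucket_map(nodes)
new_map = add_node(old_map, "d")
vb_moved = [k for k in keys if old_map[vbucket_for(k)] != new_map[vbucket_for(k)]]

print(f"ring: {len(ring_moved) / len(keys):.1%} of keys moved")
print(f"vbuckets: {new_map.count('d')} of {NUM_VBUCKETS} moved, "
      f"{len(vb_moved) / len(keys):.1%} of keys")
```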

Regarding NoSQL - Alternatives to RDBMS

I have been stumbling on things like RDBMS alternatives very often nowadays, and I am following some of the open source implementations.
What I understand is that they are best suited for large-scale web apps (like Google and Amazon); they mainly concentrate on very large distributed data stores.
How could this help small startups looking for alternatives to existing, costly data stores? And does it really yield both performance and maintenance gains for small applications?
I am starting this discussion in the belief that somebody here has already gone through the same frustration trying these new approaches and has gained experience with them. That may help startups like us.
It all depends on your scaling requirements. An RDBMS relies on locks to work and so can only really be scaled "up". NoSQL-style DBs such as Google's Bigtable and CouchDB are massively scalable and very cheap, but writing an app on top of them can get very complicated, because developers have to deal with all kinds of data consistency and fault-tolerance issues in their application layer.
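As a concrete example of consistency work that lands in the application layer: in eventually consistent stores that accept concurrent writes on different replicas, a read can return several conflicting versions ("siblings"), and the application has to merge them. The sketch below uses version vectors to drop versions that are strictly older than another sibling and a caller-supplied merge function (set union here) for genuinely concurrent ones. The data shape and the `descends`/`resolve` names are hypothetical, and real stores differ in how they surface conflicts.

```python
def descends(a, b):
    """True if version vector `a` has already seen every update recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def resolve(siblings, merge):
    """Collapse conflicting replica versions into one.

    `siblings` is a list of (version_vector, value) pairs returned by a read.
    Versions strictly dominated by another sibling are obsolete and dropped;
    the remaining ones are concurrent and are combined with the app-supplied `merge`.
    """
    survivors = [
        (vv, value)
        for vv, value in siblings
        if not any(descends(other, vv) and not descends(vv, other) for other, _ in siblings)
    ]
    merged_vv = {}
    for vv, _ in survivors:
        for node, count in vv.items():
            merged_vv[node] = max(merged_vv.get(node, 0), count)
    merged_value = survivors[0][1]
    for _, value in survivors[1:]:
        merged_value = merge(merged_value, value)
    return merged_vv, merged_value


# Hypothetical inventory written on replica "x", then updated concurrently on "x" and "y".
siblings = [
    ({"x": 1}, {"sword"}),
    ({"x": 2}, {"sword", "shield"}),
    ({"x": 1, "y": 1}, {"sword", "potion"}),
]
print(resolve(siblings, lambda a, b: a | b))
```

Set union is the simplest merge, but it cannot express deletions (a removed item comes back after the merge), which is exactly the kind of issue developers end up handling themselves.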
I would say that for a small application you're probably better off with a SQL-based relational database. Although it is in theory much more expensive, at a small scale that price realistically buys you a much simpler system to work with.
If, however, your startup is a multi-tenant solution that needs to handle a lot of writes, I'd look carefully at the alternatives.