What is the difference between CouchDB and Couchbase? - nosql

Are there any essential differences between CouchDB and Couchbase?

I think there are some essential differences between CouchDB and Couchbase Server that need to be pointed out.
I will not write about the advantages of switching from CouchDB to the Couchbase Server because those are described pretty much everywhere (see The Future of CouchDB by Damien Katz or Couchbase vs. Apache CouchDB by Couchbase). Instead, I will try to enumerate features of CouchDB that you will not find in the Couchbase Server.
All of the names relating to CouchDB and Couchbase can be really confusing, so I've updated this answer to begin with a brief explanation of the most important ones.
Names and confusion
There is CouchDB, CouchIO, CouchOne, Couchbase, Couchbase Server, Couchbase Mobile, Couchbase Lite, CouchApps, BigCouch, Touchbase, Membase, Memcached, MemcacheDB... all different and yet related in a way not at all obvious from the names alone.
First, there was CouchDB, a database created by Damien Katz, a former IBM developer. Its official name was changed to Apache CouchDB after it became an Apache project.
A company named CouchIO was founded to work on Apache CouchDB and later changed its name to CouchOne (by "its name" I mean the company name - not the database name).
CouchOne (formerly CouchIO) merged with Membase (formerly NorthScale) to form a new company called Couchbase. Membase (the company) developed Membase (a product of the same name). Membase was created by several leaders of the Memcached project and it used the Memcached protocol. After the merger of CouchOne and Membase, Couchbase continued the development of the Membase software and later changed its name to Couchbase Server.
Today I think most people believe that Couchbase Server is a new version of CouchDB but it is, in fact, a new version of Membase. It still uses the Memcached protocol and not the RESTful API of CouchDB. Meanwhile, CouchDB is still CouchDB, actively maintained and enhanced as an Apache project.
Now to the relevant differences:
Licensing
The Couchbase Server is not entirely open-source/free software. There are two versions: Community Edition (free, but without the latest bug fixes) and Enterprise Edition (there are restrictions on usage, confidentiality provisions, audits by Couchbase Inc. that "will be conducted during regular business hours at Licensee's facilities" and other terms typical of proprietary software that many people may find unacceptable).
CouchDB is an open-source/free software (no strings attached) project of The Apache Software Foundation and is released under the Apache License, Version 2.0 (DFSG-compatible, FSF-approved, OSI-approved, GPL-compatible, non-copyleft, commercial-friendly).
Philosophy
I have never seen it pointed out directly, but this may actually be the most important difference between the two databases, because it concerns the underlying philosophy of distributed computing models and not just particular features, APIs or licensing. CouchDB and the Couchbase Server differ completely in their philosophy of building distributed systems and databases.
According to the CAP theorem it is impossible for a distributed database to simultaneously provide consistency, availability and partition tolerance.
CouchDB is an AP type system (provides Availability and Partition tolerance).
Couchbase Server is EITHER a CP type system (according to Wikipedia) OR a CA type system (according to Couchbase technical update) - WHICH OF THESE IS CORRECT? PLEASE COMMENT.
Features
This is what I found to be a list of CouchDB features that are not supported by the Couchbase Server:
no RESTful API (only for views, not for CRUD operations)
no _changes feed
no peer-to-peer replication
no CouchApps
no Futon (there is a different administration interface available)
no document IDs
no notion of databases (there are only buckets)
no replication between a CouchDB database and Couchbase Server
no explicit attachments (you have to store additional files as new key/value pairs)
no HTTP API for everything (you need to use the Couchbase Server SDKs or one of the Experimental Client Libraries at Couchbase Develop so no experiments with curl and wget)
no CouchDB API (it uses the Memcached API instead)
you can't do everything from the browser (you have to write a server-side application)
no two-tier architecture for Web apps is possible (you have to write a server-side application to sit between the browser and the database, like with relational databases)
no eventual consistency
not entirely open-source/free software
not a drop-in replacement for CouchDB (seems like a drop-in replacement for Memcached instead)
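To make the first two items of the list concrete, here is a minimal sketch of what "RESTful API" and "_changes feed" mean on the CouchDB side: every document operation is a plain HTTP request. The host, database name and document ID below are made up for illustration, and the code only builds the requests (nothing is sent), so treat it as a sketch rather than a client library.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: a CouchDB node listening on its default port. Nothing is sent
# until urllib.request.urlopen(...) is called on one of these Request objects.
COUCHDB_URL = "http://localhost:5984"


def put_document(db, doc_id, doc):
    """Build a PUT /{db}/{doc_id} request that creates or updates a document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{COUCHDB_URL}/{quote(db, safe='')}/{quote(doc_id, safe='')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def changes_feed(db, since="0"):
    """Build a GET /{db}/_changes request using the long-polling feed."""
    query = urlencode({"feed": "longpoll", "since": since})
    return urllib.request.Request(
        f"{COUCHDB_URL}/{quote(db, safe='')}/_changes?{query}", method="GET"
    )


# Hypothetical database and document names, for illustration only.
put_req = put_document("articles", "couchdb-vs-couchbase", {"title": "Names and confusion"})
feed_req = changes_feed("articles")
print(put_req.get_method(), put_req.full_url)
print(feed_req.get_method(), feed_req.full_url)
```

The same two requests could be issued with curl or straight from a browser, which is what makes the two-tier architecture mentioned in the list possible.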
Those features of CouchDB may or may not be important to you, so whether the lack of them is a disadvantage is strictly subjective, but I think the decision whether to switch from CouchDB to Couchbase Server should be based on those differences and on how much your current CouchDB deployments depend on those features.
For example, if you became interested in CouchDB after watching The CouchDB changes feed NodeCamp talk by Mikeal Rogers or one of the great CouchApp tutorials by J. Chris Anderson, then you have to realize that if you switch to the Couchbase Server you will have to forget about pretty much everything they were talking about.
Because of that, I would say that Couchbase Server looks like an evolution of Memcached and Membase (not an evolution of CouchDB), and as such it looks like a great product if you are currently using Memcached or Membase. If you are using CouchDB in the most basic way then you may consider using the Couchbase Server for the same things and it may or may not perform better (if you don't mind the license restrictions). But if you are actually using any of the features that are unique to CouchDB (like the changes feed, CouchApps, two-tier architecture, peer-to-peer replication etc.) then you can either forget about those features or stay with CouchDB.
In any case, make sure to read and understand the Migration to Couchbase for CouchDB Users tutorial before you think about switching.
People often get the wrong impression (maybe after reading things like "What's the future of CouchDB? It's Couchbase.") that CouchDB is somehow obsoleted by the Couchbase Server, or that it is an old, legacy version of Couchbase. In fact, CouchDB is an actively maintained open-source project, and Couchbase Server is a completely separate project (it is a newer project, but it is not a newer version of CouchDB, and the two are not even compatible). Since even new tools for creating CouchApps keep being developed (e.g. see the Kanso project), CouchDB is not going anywhere soon.
I hope it clarifies the confusion. Please correct me if I'm wrong on anything here.
Update:
Couchbase Server is actually a new name for the Membase Server (the Membase Server was renamed to Couchbase Server somewhere around version 1.8). See Couchbase 2011 Year in Review:
Unfortunately, we confused the heck out of many of our potential users. In addition to Membase Server and our new mobile products we also offered Couchbase Single Server which was a packaged “distribution” of Apache CouchDB. On top of that we began releasing developer previews of Couchbase Server 2.0, which incorporated CouchDB technology into Membase Server – but this product was not compatible with Couchbase Single Server (or CouchDB). [...] Membase Server will be renamed Couchbase Server 1.8 on its next release in January – a tiny step that simply alleviates “name” confusion. As has been planned from the beginning, the Couchbase Server 2.0 release (currently at Developer Preview 3) will add index and query functionality. While Couchbase Server 2.0 will incorporate substantial technology from the CouchDB project, it will not be upward compatible with CouchDB and it shouldn’t be viewed as a “version of CouchDB.” [emphasis added]
See also:
Comments to "The Future of CouchDB" by Damien Katz (removed in 2012 - available in the Web Archive)
Comments to "Why Couchbase?" by Damien Katz (removed in 2012 - available in the Web Archive)
Couchbase 2011 Year in Review
Membase Server is Now Couchbase Server
Couchbase technical update
Difference between Cloudant and CouchOne

They are different yet similar pieces of software. I've remixed the content from the top answer into a picture that might help clarify the "difference" as well as the common things:
A comment from Matt Ingenthron adds to this:
To add some context/corrections: NorthScale founders are Steve Yen and Dustin Sallings. I joined them shortly after founding. Also, Damien didn't later join Couchbase, he was part of CouchIO/Couch One prior to the merger. Citing a fun, historical source: https://youtube.com/watch?v=aZ_JOnU8tkI

I think Couchbase seems to be perceived as CouchDB's 'enterprise' alternative, which in a way seems to be true.
Apart from lacking the ability to attach files to records (documents) and the 'out-of-the-box' REST endpoints that CouchDB has, Couchbase has an SQL-like language, N1QL (pronounced "nickel"; UPDATE: renamed to SQL++ in Couchbase 7.0).
This is one of the reasons why I don't really like or recommend using the term 'NoSQL'. I personally prefer the term 'non-relational'.

Related

Is it possible to scale Axon Framework without Axon Server Enterprise

Is it possible to scale Axon Framework without Axon Server Enterprise? I'm interested in creating a prototype CQRS app with Axon, but the final, deployable system has to be free from licensing fees. If Axon Framework can't be scaled to half a dozen nodes using free software, then I should probably look elsewhere.
If Axon Framework turns out not to be a good choice for the system, what would you recommend? Would building something around Apache Pulsar be a sensible alternative?
I think I have good news for you then.
You can utilize Axon Framework perfectly fine without Axon Server Enterprise.
Firstly, you can use the Axon Server Standard edition, which is completely free and you can check out the code too if you want.
If you prefer to get infrastructure back in your own hands, you can also select different approaches to distributing the CommandBus and the EventBus/EventStore.
For the CommandBus the framework provides the DistributedCommandBus, for which two implementations are in place:
1. JGroups
2. Spring Cloud
I'd argue option 2 is the most ideal for distributing your commands, as it gives you the freedom to choose whichever Spring Cloud Discovery Service implementation you desire. That should give you the handles to work "free of licenses" in that area.
For distributing your Events, you can broadly look at two approaches:
1. Share the database, aka your EventStore, among all instances
2. Use an event message bus to distribute your event messages
If you want instances of your nodes to be able to Event Source the Command Model, you are inclined to use option 1. This is required as Axon Framework requires a dedicated EventStore to be able to source the Command Models from history.
When you just want to connect other applications to your Event Stream, option 2 would be a good fit. Again, the framework has two options in this area:
AMQP
Kafka
The only thing I'd like to point out on this part additionally is that the Kafka extension is still in a release candidate state. It is being worked on actively at the moment though.
All these extensions and their options should be clearly stated in the Reference Guide, so I'd definitely check this documentation out if you are gonna start an application.
There is a sole thing you cannot distribute though, which is the QueryBus.
There is an outstanding issue to resolve this, for which someone put the effort in to provide a PR.
After seeing how Axon Server Standard edition does it though, he intentionally closed the PR (with the following comment) as it didn't seem feasible to him to maintain such a tool at this stage.
So, can you use Axon Framework without Axon Server Enterprise?
Well, I think you can. :-)
Mind you though, although you'd be winning on not having a license fee if you don't use Axon Server Enterprise, it doesn't mean your production is gonna be free.
You'd be introducing back quite some infrastructure set up and production time to get this going.
Hope this gives you plenty of feedback #ahoffer!

MongoDB for commercial use

As I am pretty horrible at reading English legal documents, I hoped one of you could answer this question.
In about a month I need to do an internship at a company for my bachelor. They would like me to develop a system for internal use (will not be sold) that requires a database.
They are allowing me a free hand (from what I understood) in selecting a database. As (as far as I understand atm) the data that needs to be stored does not contain a lot of relations (1 or 2) and is not heavily queried, I was thinking of using mongoDB as a back-end server.
Can mongoDB community be used freely in this type of an application under the new license? Most I find using Google involves the old license.
First of all, it's important to know why MongoDB adopted a new license for its Community Server product. The change was made in response to an increasing number of cloud providers offering the MongoDB database as a paid service to their users without playing by the open-source rules. Indeed, it's pretty unfair to have companies reselling the free version of a product you spent a lot of money to develop without contributing anything back.
As you can read in the FAQ for MongoDB's new license, under "What specifically is the difference between the GPL and the SSPL?":
A company that offers a publicly available MongoDB as a service must release the software it uses to offer such service under the terms of the SSPL, including the management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the source code made available.
That means that a company that offers MongoDB Community Version as a service to its users must open the source code of the software developed to make that service work, such as monitoring tools, user interfaces, etc.
What changes for you: nothing.
Whether the software you are developing is for internal or external use, your company is just using MongoDB as a component of the project, not as the final product. So you are free to keep using it.

What are the pros and cons of DynamoDB with respect to other NoSQL databases?

We use MongoDB database add-on on Heroku for our SaaS product. Now that Amazon launched DynamoDB, a cloud database service, I was wondering how that changes the NoSQL offerings landscape?
Specifically for cloud based services or SaaS vendors, how will using DynamoDB be better or worse as compared to say MongoDB? Are there any cost, performance, scalability, reliability, drivers, community etc. benefits of using one versus the other?
For starters, it will be fully managed by Amazon's expert team, so you can bet that it will scale very well with virtually no input from the end user (developer).
Also, since it's built and managed by Amazon, you can assume that they have designed it to work very well with their infrastructure, so you can assume that performance will be top notch. In addition to being specifically built for their infrastructure, they have chosen to use SSDs as storage, so right from the start disk throughput will be significantly higher than other data stores on AWS that are HDD backed.
I haven't seen any drivers yet and I think it's too early to tell how the community will react to this, but I suspect that Amazon will have drivers for all of the most popular languages and the community will likely receive this well, and in turn create additional drivers and tools.
Using MongoDB through an add-on for Heroku effectively turns MongoDB into a SaaS product as well.
In reality one would be comparing the service offered by a chosen provider with what Amazon can offer, rather than comparing one persistence solution to another.
This is very hard to do. Each provider will have varying levels of service at different price points and one could consider the option of running it on their own hardware locally for development purposes a welcome option.
I think the key difference to consider is that MongoDB is software that you can install anywhere (including at AWS, at another cloud service, or in-house), whereas DynamoDB is a SaaS available exclusively as a hosted service from Amazon (AWS). If you want to retain the option of hosting your application in-house, DynamoDB is not an option. If hosting outside of AWS is not a consideration, then DynamoDB should be your default choice unless very specific features are of higher consideration.
There's a table in the following link that summarizes the attributes of DynamoDB and Cassandra:
http://www.datastax.com/dev/blog/amazon-dynamodb
Something that needs improvement on DynamoDB in order to become more usable is the possibility to index columns other than the primary key.
UPDATE 1 (06/04/2013)
On 04/18/2013, Amazon announced support for Local Secondary Indexes, which made DynamoDB f***ing great:
http://aws.amazon.com/about-aws/whats-new/2013/04/18/amazon-dynamodb-announces-local-secondary-indexes/
I have to be honest; I was very excited when I heard about the new DynamoDB and did attend the webinar yesterday. However, it's difficult to make a decision right now as everything they said was still very vague; I have no idea which functions are going to be available through their service.
The one thing I do know is that scaling is automatically handled; which is pretty awesome, yet there are still so many unknowns that it's tough to really make a great analysis until all the facts are in and we can start using it.
Thus far I still see mongo as working much better for me (personally) in the project undertaking that I've been working on.
Like most DB decisions, it's really going to come down to a project by project decision of what's best for your need.
I anxiously await more information on the product, as for now though it is in beta and I wouldn't jump ship to adopt the latest and greatest only to be a tester :)
I think one of the key differences between DynamoDB and other NoSQL offerings is the provisioned throughput: you pay for a specific throughput level on a table, and provided you keep your data well-partitioned you can always expect that throughput to be met. So as your application load grows you can scale up and keep your performance more-or-less constant.
Amazon DynamoDB seems like a pretty decent NoSQL solution. It is fast, and it is pretty easy to use. Other than having an AWS account, there really isn't any setup or maintenance required. The feature set and API is fairly small right now compared to MongoDB/CouchDB/Cassandra, but I would probably expect that to grow over time as feedback from the developer community is received. Right now, all of the official AWS SDKs include a DynamoDB client.
Pros
Lightning Fast (uses SSDs internally)
Really (really) reliable. (chances of write failures are lower)
Seamless scaling (no need to do manual sharding)
Works as webservices (no server, no configuration, no installation)
Easily integrated with other AWS features (can store the whole table into S3 or use EMR etc)
Replication is managed internally, so chances of accidental loss of data is negligible.
Cons
Very (very) limited querying.
Scanning is painful (I remember a scan run from Java that once took 6 hours)
Pre-defined throughput, which means a sudden increase beyond the set throughput will be throttled.
Throughput is partitioned because the table is sharded internally (so if you had a throughput of 1000 split across two partitions and you read only the latest data from one of them, your read throughput is only 500).
No joins; limited indexing allowed (basically 2).
No views, triggers, scripts or stored procedure.
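The throughput arithmetic from the Cons list can be written down as a small model. This is only a sketch of the reasoning in that answer, assuming provisioned throughput is split evenly across partitions; the function names are made up for illustration and are not part of any AWS SDK.

```python
def per_partition_throughput(provisioned, partitions):
    """Even split of a table's provisioned throughput across its partitions."""
    return provisioned / partitions


def reachable_throughput(provisioned, partitions, partitions_hit):
    """Throughput available when requests touch only some of the partitions."""
    return per_partition_throughput(provisioned, partitions) * partitions_hit


def throttled(requested, provisioned, partitions, partitions_hit):
    """Requests per second above the reachable limit, which would be throttled."""
    limit = reachable_throughput(provisioned, partitions, partitions_hit)
    return max(0.0, requested - limit)


# The example from the list: 1000 units, two partitions, reads hit only one.
print(reachable_throughput(1000, 2, 1))
# A burst of 800 requests/s against that single partition.
print(throttled(800, 1000, 2, 1))
```

Under this model, spreading reads over both partitions restores the full 1000, which is why the earlier answer stresses keeping data well-partitioned.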
It's really good as an alternative to session storage in a scalable application. Another good use would be logging/auditing in an extensive system. NOT preferable for a feature-rich application with frequent enhancements or changes.

Can mongodb be used as an embedded database?

I am working on an RSS reader application, and I need to find a backend database. I want the database to be embedded because I don't want the users to have to install a database server.
I know SQLite is a good choice, but I am wondering if there are any other nosql choices?
(I don't yet have 50 rep points to comment on, and build upon, the accepted answer; otherwise I would, sorry!)
You can embed MongoDB in your OEM solution but there are two things to consider:
It is written in C++, so if you are coding in a different language you might need to write a wrapper that launches the database process separately.
MongoDB is licensed under GNU AGPL-3.0, which is a copyleft server license. The accepted answer, and the Google group quote, both correctly state that this would normally force you to also be AGPL licensed. However, MongoDB states that the intention of the license is to allow refinements to their code to be submitted back, and that your product will remain separate. This makes me think that the normal copyleft rules don't apply.
The goal of the server license is to require that enhancements to MongoDB be released to the community. Traditional GPL often does not achieve this anymore as a huge amount of software runs in the cloud. For example, Google has no obligation to release their improvements to the MySQL kernel – if they do they are being nice.
To make the above practical, we promise that your client application which uses the database is a separate work. To facilitate this, the mongodb.org supported drivers (the part you link with your application) are released under Apache license, which is copyleft free. Note: if you would like a signed letter asserting the above promise please request via email.
Source: http://www.mongodb.org/display/DOCS/Licensing
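As a rough illustration of the "wrapper that launches the database process separately" idea, the sketch below starts a private mongod as a child process with its own data directory. It is not in-process embedding, and it assumes a mongod binary is shipped with (or installed alongside) the application; the helper names and the example path are hypothetical.

```python
import shutil
import subprocess


def mongod_command(dbpath, port=27017):
    """Command line for a private mongod that keeps its data under dbpath."""
    # --dbpath must point at an existing directory; binding to 127.0.0.1
    # keeps the server reachable only from the local machine.
    return ["mongod", "--dbpath", str(dbpath), "--port", str(port), "--bind_ip", "127.0.0.1"]


def start_mongod(dbpath, port=27017):
    """Start mongod as a child process, or return None if it is not installed."""
    if shutil.which("mongod") is None:
        return None
    return subprocess.Popen(mongod_command(dbpath, port))


# Hypothetical data directory for the application.
print(mongod_command("/tmp/rss-data", 27018))
```

The application would then connect to that port with a regular driver and terminate the child process on exit.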
According to the Google Group, yes it can, but it doesn't cover how exactly.
Yes, but it isn't pretty and will force your app to be AGPL licensed. If you are interested take a look at how the tools handle the --dbpath option.
Source: http://groups.google.com/group/mongodb-user/browse_thread/thread/463956a93d3fb734?pli=1
If you're using .NET, one option might be RavenDB, which is a document database, and can be embedded.
Please check out https://github.com/Softmotions/ejdb
This project is being developed to address this issue.
How about Couchbase Lite? It's an open source, embeddable document database. While it can function as a standalone document database, its real value is in its ability to synchronize with remote document databases. It may be aimed at iOS / Android, but it can run on anything with a JVM.
https://github.com/couchbase/couchbase-lite-java
There is no straightforward way to use MongoDB as an embedded library, in the sense of a properly reusable library. Eliot, head of 10gen, said "it would be nice to have one", but there is nothing available that could be reused in a sane way.
Looks like a lot of OEMs are trying to get Mongo onto their hardware and devices for real-time processing. A link from MongoDB's website
I usually use Buildroot to create a cross-compiled Embedded Linux root file-system along with all the user space packages.
I noticed that MongoDB is one of the packages that's already integrated as one of the Buildroot builtin packages.
You may check out the MongoDB makefile for some hints on how to build it for Embedded Linux.

NO-SQL reliable for small business app?

I'm deciding between a NoSQL engine and a regular SQL one for a document management system for small businesses.
I have experience with Firebird/SQL Server and have found them to have a good track record of reliability (especially Firebird).
This market is full of crappy "servers" (clone-built PCs, for the most part), cheap hard disks, and rarely any RAID or anything like it; some are in locations where power outages are normal, some have no UPS, etc. (I will include automatic off-site backup to external servers, but that doesn't change the internal setup.) (I know about educating end users on proper setups, but it is stupid to depend on that, so stick to the point.)
From the design point of view, a schema-less database is the way to go for my system, but I worry whether any of the current solutions (MongoDB, Tokyo Cabinet, etc.) are like Firebird and survive crashes, malfunctions and abuse so that data corruption is very rare.
The plan is to store the office documents there and provide a central repository.
Check out Neo4j. It is a graph database (schema-free) that can be used like a document or key/value store.
Neo4j has been in production for many years in environments like you describe. Unlike many other NOSQL databases Neo4j actually flushes data to disk and uses a transaction log to recover from an inconsistent state. It also has real transactions (full ACID) that can span multiple operations and treat them as a single unit (which also seems to be a feature that is frequently left out in many other NOSQL stores).
-Johan
(Disclaimer: I am part of the Neo4j team)
CouchDB has the reliability you need:
The CouchDB file layout and commitment system features all Atomic Consistent Isolated Durable (ACID) properties. On-disk, CouchDB never overwrites committed data or associated structures, ensuring the database file is always in a consistent state.
Look at the ACID Properties section here for more info.
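To show why "never overwrites committed data" helps on machines that lose power mid-write, here is a toy append-only store. It is only an illustration of the general technique, assuming a made-up record layout (length, CRC32, JSON payload); it is not CouchDB's actual file format.

```python
import json
import zlib


class AppendOnlyStore:
    """Toy append-only log: each write appends a record, nothing is overwritten."""

    def __init__(self, raw=b""):
        self.log = bytearray(raw)

    def put(self, key, value):
        """Append a new record for key; older records stay untouched."""
        payload = json.dumps({"k": key, "v": value}).encode("utf-8")
        header = len(payload).to_bytes(4, "big") + zlib.crc32(payload).to_bytes(4, "big")
        self.log += header + payload

    def recover(self):
        """Replay the log, stopping at the first incomplete or corrupt record."""
        state, pos = {}, 0
        while pos + 8 <= len(self.log):
            size = int.from_bytes(self.log[pos:pos + 4], "big")
            crc = int.from_bytes(self.log[pos + 4:pos + 8], "big")
            payload = bytes(self.log[pos + 8:pos + 8 + size])
            if len(payload) < size or zlib.crc32(payload) != crc:
                break  # torn write at the tail: ignore it, keep earlier data
            record = json.loads(payload)
            state[record["k"]] = record["v"]
            pos += 8 + size
        return state


store = AppendOnlyStore()
store.put("invoice-1", "draft")
store.put("invoice-1", "final")

# Simulate a power cut that loses the last 3 bytes of the second write.
torn = AppendOnlyStore(bytes(store.log[:-3]))
print(store.recover())
print(torn.recover())
```

Because the interrupted write only affects the tail of the log, the previously committed version of the document is still readable after the crash.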
With CouchDB you also get easy backup and replication.
I've no code in production using CouchDB yet, but so far I'm very happy with the tests and the development process with CouchDB.