Why isn't RDBMS Partition Tolerant in CAP Theorem and why is it Available? - distributed-computing

Two points I don’t understand about RDBMS being CA in CAP Theorem :
1) It says RDBMS is not Partition Tolerant but how is RDBMS any less Partition Tolerant than other technologies like MongoDB or Cassandra? Is there a RDBMS setup where we give up CA to make it AP or CP?
2) How is it CAP-Available? Is it through master-slave setup? As in when the master dies, slave takes over writes?
I’m a novice at DB architecture and CAP theorem so please bear with me.

It is very easy to misunderstand the CAP properties, hence I'm providing some illustrations to make it easier.
Consistency: A query Q will produce the same answer A regardless the node that handles the request. In order to guarantee full consistency we need to ensure that all nodes agree on the same value at all times. Not to be confused with eventual consistency in which the network moves towards having all data consistent but there are periods of time in which it is not.
Availability: If the distributed system receives query Q it will always produce an answer for that query. This should not be confused with "high-availability", this is not about having the capacity to process a higher troughput of queries, it is about not refusing to answer.
Partition Tolerance: The system continues to function despite the existence of a partition. This is not about having mechanisms to "fix" the partition, it is about tolerating the partition, i.e. continuing despite the partition.
Note that the following examples do not cover all possible scenarios. Consider the following caption:
An example for CP:
The system is partition tolerant because its nodes keep accepting requests despite the partition; it is consistent because the only nodes providing answers are those that maintain a connection to the master node that handles all the write requests; it is not available because the nodes in the other partition do not provide an answer to the queries they receive.
Examples for AP:
Either because (respectively) we have the slave nodes replying to requests regardless whether they able to reach master or because the slave nodes in the other partition elect a new master, or because we have a masterless cluster, availability is achieved because all questions are getting an answer - consistency is dropped because both partitions are replying while potentially yielding different states.
Examples for CA:
If we disconnect nodes when a partition occurs, we can ensure that we have at most one partition which ultimately means that the network is not partitioned anymore, or simply there is no service at all. This is the opposite of partition tolerance, because the system is avoiding the partition instead of functioning despite it. Consistency and availability holds in these partially or fully disconnected systems because all working nodes (if any) have the same state and all received queries (if any) will get an answer - shutdown nodes do not receive queries.
To answer the questions:
Under default configurations, databases such as Cassandra and MongoDB are partition tolerant because they do not shutdown nodes to cope with partitions, whereas RDBMS such as MySQL do.
Availability has very little to do with master/slave setup, e.g. Cassandra is masterless and very available because it doesn't really matter which node dies. As for availability in a master/slave setup, there is no reason to stop responding to all queries when master is dead, but you may need to suspend write operations while electing a new one.

A lot of databases now actually have different configurations and depending on the settings you set, it can be either CA, CP, AP, etc but can not achieve all three at the same time. Some databases actually make an effort to support all three but still prioritizes them in a certain way.
For example, MySQL can be CP and CA depending on the configurations. By default, it is CA because it follows a master slave paradigm which data is replicated to the slaves. Partition tolerance is sacrificed in the event that a set of the slaves loses the connection to the master and therefore decides to elect a new master creating two masters with their own set of slaves.
However, MySQL also has another configuration which is a clustered configuration. It prioritizes CP over availability eg. the cluster will shutdown if there are not enough live nodes to serve all the data.
There are probably more configurations for MySQL that makes it satisfy other CAP theorem combinations but overall, I just wanted say that it depends on what your system requires. Sometimes databases are better for one configuration vs another so its best to see what kinds of problems that may also occur in using a certain configuration.
As for implementing the CAP theorem, I would advise taking a further look into different databases and how they implement the priorities for the CAP theorem. There are just too many different ways of implementing them eg. generally, the master slave model is used for CA systems, the hash ring for AP systems, etc.

CAP theorem is problematic and it applies only to distributed database systems. When you have distributed databases then network partition and node crashes can happen. And when network partition happens you must have partition tolerance (the P of your CAP).
So to answer your question number 1) It’s either CP or AP. It can be configured as Will mentioned.
More about why partition tolerance is a must:
https://codahale.com/you-cant-sacrifice-partition-tolerance/
More about problems around CAP theorem:
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html

I agree that RDBMS can have all the properties of CAP. I have started studying noSQL DBs and had prior experience with IBM DB2.
Here is how IBM DB2 satisfies all the 3 CAP properties
C : Consistency : Every relational database satisfies this due to the transactional nature of RDBMS.
A : Availability : Availability means that when a query is made for a data that exists, it should be returned. Again, a relational database is designed to do this easily.
P : Partition Tolerance : This is the most interesting one. From DB2 stand point, in the application that I was working on, we had 2 databases spread across different data centres. One was the primary and communicated with the secondary via heartbeats. Each of these primary and secondary databases, had 12 physical instances where data was distributed on the basis of some predefined logic. If the primary goes down, the secondary detects this and takes the place of primary. Since the primary and secondary were always maintained in sync, data remains consistent as well.
This is how I think that RDBMS satisfies all 3 properties of CAP Theorem.
I may be wrong, and open to discussion on this.

Related

How CA distributed system according to Cap Theorem can exist

How can a distributed system be consistent and available (CA)?
Because I would argue when a network partition occurs, CA cannot be possible in a way where every node of the network, even the partioned nodes that users are connected to, continue to be available and answer with consistent data.
It can't.
As often mentioned, the CAP theorem in its original form is a little misleading. It can be restated as
in the presence of the network partition, a distributed system is either available or consistent
so you are right. Generally, systems cannot be classified as CA, CP or AP only, since partition tolerance is a property of the system, which describes what to choose in case of a network partition. So it is possible that a system can behave according to AP sometimes, and CP other times (however it is not common).
Another interesting part is that RDBMS databases are often at the CA side of the triangle. This is only the case in a single node setup. Even with master (write) - slave (read) setup, the system is not CA (or if it is termed "CA" for some reason, and cannot recover from network partitions, then a split-bran scenario may happen, a new master is elected for the partition, and chaos ensues, possibly breaking the consistency of the system).
Useful read: https://codahale.com/you-cant-sacrifice-partition-tolerance/.
It can, but it won't.
The CAP theorem reasons about guarantees when one or more nodes get isolated from the rest of the cluster. In such cases a node has three options which result in the three known CAP trade-offs: i) it keeps responding to any received requests AP; ii) it no longer responds to received requests until it is again able to reach the others CP; iii) it shuts down before receiving any requests to eliminate the partition along with it CA.
In other words you can achieve CA by having your nodes shutting down instead of tolerating the partition but bear in mind that partitions are likely to keep happening hence this will converge to the scenario in which you have a single node in your cluster and I assume this is the opposite of what you want, i.e. having a cluster with multiple nodes is kind of the whole point.
Therefore in practice you end up choosing between CP and CA. See this answer for more illustrative examples.
Dr. Stonebraker says: The guidance from the CAP theorem is that you
must choose either A or C, when a network partition is present. As is
obvious in the real world, it is possible to achieve both C and A in
this failure mode.
See this for thoughts on why CA can exist:
CA is a specification of the operating range: you specify that the
system does not work well under partition or, more precisely, that
partitions are outside the operating range of the system.
My background is far from these theoretical considerations and I must say it is highly confusing. I am researching distributed Blockchain systems and I don't see why those "generalized" definitions of C, A, P must always apply. If let's say 5% of nodes fail or are otherwise partitioned, the consensus still functions. If an end user is connected to a partitioned node, the node could let the user know it lost connection. I don't even see how any major Blockchain network is CP without defining conditions such as "if a certain amount of nodes fail or get partitioned, the consensus halts".

Why are RDBMS considered Available (CA) for CAP Theorem

If I understand the CAP Theorem correctly, availability means that the cluster continues to operate even if a node goes down.
I've seen a lot of people (http://blog.nahurst.com/tag/guide) list RDBMS as CA, but I do not understand how RBDMS is available, as if a node goes down, the cluster must go down to maintain consistency.
My only possible answer to this has been that most RDBMS are a single node, so there is no "non-failing" node. But, this seems to be a technicality, not true 'availability' and definitely not high availability.
Thank you.
First of all, let me clarify and state that the consistency in RDBMS is different than consistency in distributed systems. RDBMS (single system) applies consistency to transactional consistency, where as in distributed systems consistency means view from anywhere in the system (read from any node) is consistent. So RDMBS single node cannot be discussed with regards to CAP theorem. It is like comparing apple to orange.
RDBMS with master-slave can be compared to distributed systems. Here RDBMS can be configured to CA/CP or AP. MySQL for example, provides a way to configure the system in a way that if there is a quorum loss (not enough secondary available for commit log replication), the cluster is not available (CP system). MySQL also provides a configuration to allow the cluster to operate as long as master is available (CA system) with the potential of data loss. SQL Server AlwaysOn is an AP system, because commit log replication is asynchronous (even on sync replicas).
So RDBMS can be any of CA, CP or AP in a distributed world.
I believe you are misunderstanding the relation between CAP-Availability and node-UP/DOWN. Availability is about providing an answer to every received query - when a node is down it cannot receive queries, therefore if you bring down parts of or the entire cluster, the CAP-Availability property holds. Although this may sound counter intuitive at first glance, by shutting down nodes you are holding on to CAP-Availability and dropping CAP-Partition tolerance instead. I've recently posted an answer whose examples provide some clarification.
In a nutshell: A partition occurs that isolates node N. If N receives a request it can either: i) answer which grants availability but drops consistency because N is out of sync; ii) do not answer to avoid replying with an out-of-date result, thereby dropping availability because we received a request but issued no reply for it.
Alternatively we can shutdown N as soon as it becomes disconnected from the rest of the cluster which allows us to keep C and A, but drop P, because: i) N will not receive any requests; ii) all received requests will be performed to the fully connected and consistent cluster, hence they will all be answered with consistent values; iii) the cluster is not partition tolerant because it does not tolerate partitions - instead it shutdowns partitioned nodes.
In CAP Theorem P is for Partition tolerance , which is the ability of system to handle partitions(partitions are isolated clusters - due to network failure or any other reason ..).
In a distributed network to handle a partition , system has to pick either Consistency or Availability.
In case of RDBMS there is no chance for partitions (assuming not distributed which is normal case) ,So Those will be always CA.

Why is MongoDB supposed to Consistent & Partition tolerant but not Available [duplicate]

Everywhere I look, I see that MongoDB is CP.
But when I dig in I see it is eventually consistent.
Is it CP when you use safe=true? If so, does that mean that when I write with safe=true, all replicas will be updated before getting the result?
MongoDB is strongly consistent by default - if you do a write and then do a read, assuming the write was successful you will always be able to read the result of the write you just read. This is because MongoDB is a single-master system and all reads go to the primary by default. If you optionally enable reading from the secondaries then MongoDB becomes eventually consistent where it's possible to read out-of-date results.
MongoDB also gets high-availability through automatic failover in replica sets: http://www.mongodb.org/display/DOCS/Replica+Sets
I agree with Luccas post. You can't just say that MongoDB is CP/AP/CA, because it actually is a trade-off between C, A and P, depending on both database/driver configuration and type of disaster: here's a visual recap, and below a more detailed explanation.
Scenario
Main Focus
Description
No partition
CA
The system is available and provides strong consistency
partition, majority connected
AP
Not synchronized writes from the old primary are ignored
partition, majority not connected
CP
only read access is provided to avoid separated and inconsistent systems
Consistency:
MongoDB is strongly consistent when you use a single connection or the correct Write/Read Concern Level (Which will cost you execution speed). As soon as you don't meet those conditions (especially when you are reading from a secondary-replica) MongoDB becomes Eventually Consistent.
Availability:
MongoDB gets high availability through Replica-Sets. As soon as the primary goes down or gets unavailable else, then the secondaries will determine a new primary to become available again. There is an disadvantage to this: Every write that was performed by the old primary, but not synchronized to the secondaries will be rolled back and saved to a rollback-file, as soon as it reconnects to the set(the old primary is a secondary now). So in this case some consistency is sacrificed for the sake of availability.
Partition Tolerance:
Through the use of said Replica-Sets MongoDB also achieves the partition tolerance: As long as more than half of the servers of a Replica-Set is connected to each other, a new primary can be chosen. Why? To ensure two separated networks can not both choose a new primary. When not enough secondaries are connected to each other you can still read from them (but consistency is not ensured), but not write. The set is practically unavailable for the sake of consistency.
As a brilliant new article showed up and also some awesome experiments by Kyle in this field, you should be careful when labeling MongoDB, and other databases, as C or A.
Of course CAP helps to track down without much words what the database prevails about it, but people often forget that C in CAP means atomic consistency (linearizability), for example. And this caused me lots of pain to understand when trying to classify. So, besides MongoDB give strong consistency, that doesn't mean that is C. In this way, if one make this classifications, I recommend to also give more depth in how it actually works to not leave doubts.
Yes, it is CP when using safe=true. This simply means, the data made it to the masters disk.
If you want to make sure it also arrived on some replica, look into the 'w=N' parameter where N is the number of replicas the data has to be saved on.
see this and this for more information.
MongoDB selects Consistency over Availability whenever there is a Partition. What it means is that when there's a partition(P) it chooses Consistency(C) over Availability(A).
To understand this, Let's understand how MongoDB does replica set works. A Replica Set has a single Primary node. The only "safe" way to commit data is to write to that node and then wait for that data to commit to a majority of nodes in the set. (you will see that flag for w=majority when sending writes)
Partition can occur in two scenarios as follows :
When Primary node goes down: system becomes unavailable until a new
primary is selected.
When Primary node looses connection from too many
Secondary nodes: system becomes unavailable. Other secondaries will try to
elect a new Primary and current primary will step down.
Basically, whenever a partition happens and MongoDB needs to decide what to do, it will choose Consistency over Availability. It will stop accepting writes to the system until it believes that it can safely complete those writes.
Mongodb never allows write to secondary. It allows optional reads from secondary but not writes. So if your primary goes down, you can't write till a secondary becomes primary again. That is how, you sacrifice High Availability in CAP theorem. By keeping your reads only from primary you can have strong consistency.
I'm not sure about P for Mongo. Imagine situation:
Your replica gets split into two partitions.
Writes continue to both sides as new masters were elected
Partition is resolved - all servers are now connected again
What happens is that new master is elected - the one that has highest oplog, but the data from the other master gets reverted to the common state before partition and it is dumped to a file for manual recovery
all secondaries catch up with the new master
The problem here is that the dump file size is limited and if you had a partition for a long time you can loose your data forever.
You can say that it's unlikely to happen - yes, unless in the cloud where it is more common than one may think.
This example is why I would be very careful before assigning any letter to any database. There's so many scenarios and implementations are not perfect.
If anyone knows if this scenario has been addressed in later releases of Mongo please comment! (I haven't been following everything that was happening for some time..)
Mongodb gives up availability. When we talk about availability in the context of the CAP theorem, it is about avoiding single points of failure that can go down. In mongodb. there is a primary router host. and if that goes down,there is gonna be some downtime in the time that it takes for it to elect a new replacement server to take its place. In practical, that is gonna happen very qucikly. we do have a couple of hot standbys sitting there ready to go. So as soon as the system detects that primary routing host went down, it is gonna switch over to a new one pretty much right away. Technically speaking it is still single point of failure. There is still a chance of downtime when that happens.
There is a config server, that is the primary and we have an app server, that is primary at any given time. even though we have multiple backups, there is gonna be a brief period of downtime if any of those servers go down. the system has to first detect that there was an outage and then remaining servers need to reelect a new primary host to take its place. that might take a few seconds and this is enough to say that mongodb is trading off the availability

High Availability In MongoDB

Everybody say that mongoDB is CP in CAP Theorem! But with using master-slave replication, It has high availability too (If a primary fails, the remaining members will automatically try to elect a new primary). My question is, In which situations (and how) It can have AP (with Eventual Consistency)?
Actually, there is a two-part answer:
Sharding level: There's only one authoritative shard per data segment (C), shards work independently (P), if a shard is not available its data isn't available as well (A)
Replica set level: There's only one authoritative master node (C), if required a new master will be selected (P), if there's no primary (during the voting phase which should only last a few seconds, but that's enough) you cannot access the data on that node. If you enable reading from secondaries (eventual consistency), you can read data from secondaries during the voting phase, but still not write new data. Thus it's a CP system.
In general, you don't lose the third characteristic entirely, but trade it for additional latency / overhead or for not having it for a short amount of time.
it is not quite possible to have C,A,P all together, that is the theorem, you cant have them all, you can take only two.
See :
Where does mongodb stand in the CAP theorem?

NoSQL: What does it mean for MongoDB or BigTable to not always be "Available"

Reading Nathan Hurst's Visual Guide to NoSQL Systems, he includes the CAP triangle:
Consistency
Availibility
Partition Tolerance
With SQL Server being an AC system, and MongoDB being a CP system.
These definitions from come a UC Berkley professor Eric Brewer, and his talk at PODC 2000 (Principles of Distributed Computing):
Availability
Availability means just that - the service is available
(to operate fully or not as above). When you buy the book you want to
get a response, not some browser message about the web site being
uncommunicative. Gilbert & Lynch in their proof of CAP Theorem make
the good point that availability most often deserts you when you need
it most - sites tend to go down at busy periods precisely because they
are busy. A service that's available but not being accessed is of no
benefit to anyone.
What does it mean, in the context of MongoDB, or BigTable, for the system to not be "available"?
Do you go to connect (e.g. over TCP/IP), and the server does not respond? Do you attempt execute a query, but the query never returns - or returns an error?
What does it mean to not be available?
Availability in this case means that in the event of a network partition, the server that a client connects to may not be able to guarantee the level of consistency that the client expects (or that the system is configured to provide).
Assuming that you have 3 nodes, A, B, and C, in a hypothetical distributed system. A, B, and C are each running in their own rack of servers, with 2 switches between them:
[Node A] <- Switch #1 -> [Node B] <- Switch #2 -> [ Node C ]
Now assume that said system is set up so that it is GUARANTEED that any write will go to at least 2 nodes before it is considered committed. Now, lets assume that switch #2 gets unplugged, and some client is connected to node C:
[Node A] <- Switch #1 -> [Node B] [ Node C ] <-- Some client
That client will not be able to issue Consistent writes, because the distributed system is currently in a partitioned state (namely, Node C cannot contact enough other nodes to guarantee the 2-node consistency required).
I'd add to this that some NoSQL databases allow very dynamic selection of CAP attributes. Cassandra, for instance, allows clients to specify the number of servers that a write must go to before it is committed on a per-write basis. Writes going to a single server are "AP", writes going to a quorum (or all) servers are more "CA".
EDIT - from the comments below:
In MongoDB you can only have master/slave configuration within a replica set. What this means is that the choice of AP vs CP is made by the client at query time. The client can specify slaveOk, which will read from an arbitrarily selected slave (which may have stale data): mongodb.org/display/DOCS/…. If the client is not OK with stale data, don't specify slaveOk and the query will go to the master. If the client cannot reach the master, then you'll get an error. I'm not sure exactly what that error will be.
The CAP theorem applies to distributed computer systems. MongoDB supports two distinct forms of distributed computing: sharding for horizontal scaling and replica sets for failover/high availability. The two can be used together or independently. I think the CAP theorem applies slightly differently to the two forms:
Sharding level - MongoDB stores data on at most one authoritative shard.
Strong Consistency: A piece of data exists on at most one shard. Incorrect/stale data does not exist.
Strong Partition-tolerance: Even if network partitioned, requests never return incorrect/stale data. Shards continue working independent of other shards.
Weak Availability: Reads/writes of data on a downed shard will fail.
Replica set level - MongoDB replicates data within a shard, ensuring consistency via a single, authoritative primary node.
Strong Consistency: All reads/writes handled by the primary node.
Strong Partition-tolerance: If enough nodes become unreachable, a new primary is elected. The election process ensures there is always at most one primary node.
Weak Availability: Reads/writes will fail when no primary exists, even though the data could be accessed via secondary nodes.
The slaveOK/ReadPreference.SECONDARY option sacrifices some consistency (stale data can be read) for increased performance and availability.