I have been going through the Vespa documentation for a while, and I am interested in understanding the advantages and disadvantages of Vespa compared with NoSQL databases like HBase and Cassandra as a key-value store. I don't see any blog or post about it.
1) For HBase and Cassandra, the recommended maximum row size for good performance is 1 MB and 32 MB, respectively. How about Vespa? How large can a Vespa document be, and what is the recommended size?
2) Where does Vespa fit in CAP theorem?
Vespa is more of an Elasticsearch alternative than an HBase/Cassandra one, so while documents can be bigger, it isn't the same use case.
re 2 - Vespa is CP - as described in the documentation
To elaborate on question 2) re: CAP, Vespa is currently AP (with a caveat, see below), not CP. The C in CAP implies that the linearizability property holds for writes and reads, which is not offered by our existing consistency model. In particular, even though we have a write-ahead log per replica, there’s no consistent distributed log across replicas.
Note that our “A” in AP is “weak” in the sense that we depend on a centralised (but fault tolerant) cluster coordinator which tracks and communicates the availability of nodes. Nodes that are partitioned away from the coordinator leader are not guaranteed to successfully answer client requests (applies to both reads and writes).
I'll add a section to the linked documentation that explicitly states the CAP properties of Vespa.
Related
I have a question whose answer I couldn't find, although I read the whole book.
Consider a distributed system in which the database is replicated over five servers. At one point, the network between the replicated servers isolates three of the servers from the remaining two. Is it still possible for a transaction that involves read and write operations against the replicated database to commit? Motivate your answer.
I would appreciate if you could answer this question
This is a rather general question. To rephrase it: you are asking what happens when a split brain occurs in a distributed database.
And the answer is: it depends ;-)
Really. It depends on what type of distributed database you work with. From a high-level perspective, it depends on which CAP trade-off the database chose: is the database an AP or a CP system?
(I use this differentiation for the sake of brevity, see https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html)
If it's an AP system, then you can read and write on either side of the partition. When the split brain is resolved, the database has some recovery tooling that mends the partitions back into a consistent state. Or the database may leave this responsibility to the user, who gets a set of possible values and has to decide which one is correct.
If it's a CP system, then it depends on which partition you work with. If your client connects to the partition that consists of 3 of the 5 servers (the majority), you may read and write. If your client connects to the minority partition (2 servers of 5), you are probably not permitted to read or write. But it depends on the consistency you require. If you require linearizability, then that client can do neither reads nor writes.
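To make the majority rule in the CP case concrete for the five-server question, here is a minimal Python sketch. It assumes a plain strict-majority quorum; the name has_majority is made up for illustration, and real databases may apply different or more elaborate rules.

```python
def has_majority(partition_size: int, cluster_size: int) -> bool:
    """Return True if this side of a split holds a strict majority of nodes."""
    return partition_size > cluster_size // 2


cluster_size = 5
for side in (3, 2):
    allowed = has_majority(side, cluster_size)
    print(f"partition with {side} of {cluster_size} nodes may commit: {allowed}")
```

Under this assumption only the side with three servers can commit the transaction; the side with two has to refuse or wait until the partition heals.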
Mongo
From this resource I understand why MongoDB is not A (highly available), based on the statement below:
MongoDB supports a “single master” model. This means you have a master
node and a number of slave nodes. In case the master goes down, one of
the slaves is elected as master. This process happens automatically
but it takes time, usually 10-40 seconds. During this time of new
leader election, your replica set is down and cannot take writes
Is it for the same reason that Mongo is said to be Consistent (the write did not happen, so reads return the latest data in the system) but not Available (not available for writes)?
Until the re-election happens and while the write operation is pending, can a slave still perform read operations? Also, does the user have to re-initiate the write operation once a new master is elected?
But I do not understand, from another angle, why Mongo is highly consistent.
As said on Where does mongodb stand in the CAP theorem?,
Mongo is consistent when all reads go to the primary by default.
But that is not true. If, under the master/slave model, all reads go to the primary, what is the use of the slaves then? It further says: if you optionally enable reading from the secondaries, then MongoDB becomes eventually consistent, where it's possible to read out-of-date results. This means Mongo may not be consistent with master/slaves (provided I do not configure writes to reach all nodes before returning). It does not make sense to me to say Mongo is consistent if all reads and writes go to the primary. In that case every other DB (like Cassandra) would also be consistent, wouldn't it?
Cassandra
From this resource I understand why Cassandra is A (highly available), based on the statement below:
Cassandra supports a “multiple master” model. The loss of a single
node does not affect the ability of the cluster to take writes – so
you can achieve 100% uptime for writes
But I do not understand why Cassandra is not Consistent. Is it because a node that was unavailable for a write (because the coordinator node could not reach it) is still available for reads and can return stale data?
Go through: MongoDB, Cassandra, and RDBMS in CAP, for better understanding of the topic.
A brief definition of Consistency and availability.
Consistency simply means that when you write a piece of data to a distributed system, you should get the same data back when you read it from any node of the system.
Availability means, the system should always be available for read/write operation.
Note: Most systems are not only available or only consistent; they always offer a bit of both.
With the above definition let's see where MongoDB and Cassandra fall in CAP.
MongoDB
As you said, MongoDB is highly consistent when reads and writes go to the same node (the default case). Further, MongoDB lets you choose to read from secondary nodes instead of reading only from the leader/primary.
Now, when you try to read data from a secondary, your consistency will depend entirely on how you want to read the data:
You could ask for data that is at most, say, 5 seconds stale, or
You could just say: return data from a majority of nodes for your select statement.
In the same way, when you write from your client to the Mongo leader, you can say that a write is successful only once the data is replicated to, or stored on, a majority of servers.
Clearly, from the above, we can say MongoDB can be highly consistent or eventually consistent based on how you read/write your data.
Now, what about availability? MongoDB is available most of the time, but when the leader is down, MongoDB can't accept writes until it has elected a new leader. Hence, it is not highly available.
So, MongoDB is categorized under CP.
What about Cassandra?
In Cassandra, there is no leader and any node can accept writes, so the Cassandra cluster stays available for writes and reads even if some nodes go down.
What about consistency in Cassandra?
Same as MongoDB, Cassandra can be eventually consistent or highly consistent based on how you read/write data.
You can specify consistency levels in your read/write operations, for example:
read/write data from one node
read/write data from majority/quorum of nodes and more
Let's say you give a consistency level of one in your read/write operation. So, your write is successful as soon as data is written to one replica. Now, if your read request happens to go to the other replica where the data is not updated yet(could be due to high network latency or any other reason), you will end up reading the old data.
So, Cassandra is highly available but has configurable consistency levels and hence not always consistent.
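A toy simulation of the consistency-level example above, as a rough sketch only. It does not use the Cassandra driver: the replicas list and the write/read helpers are hypothetical stand-ins, and the background replication that would eventually repair the lagging replica is deliberately left out.

```python
# Toy model: three replicas of one key, each holding a (value, timestamp) pair.
replicas = [{"k": ("old", 1)} for _ in range(3)]


def write(key, value, ts, acked_replicas):
    """Apply the write only to the replicas that acknowledged it."""
    for i in acked_replicas:
        replicas[i][key] = (value, ts)


def read(key, contacted_replicas):
    """Return the newest value among the replicas that were contacted."""
    newest = max((replicas[i][key] for i in contacted_replicas), key=lambda vt: vt[1])
    return newest[0]


# Consistency level ONE: the write reaches replica 0, the read hits replica 2.
write("k", "new", ts=2, acked_replicas=[0])
stale = read("k", contacted_replicas=[2])

# Quorum (2 of 3) on both sides: any two replicas share at least one member.
write("k", "newer", ts=3, acked_replicas=[0, 1])
fresh = read("k", contacted_replicas=[1, 2])

print(stale, fresh)
```

The first read returns the old value because the single replica it contacted never saw the write. The second read sees the latest value because the read set and the write set overlap in replica 1.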
In conclusion, in their default behavior, MongoDB falls under CP and Cassandra in AP.
Consistency in the CAP paradigm also includes "eventual consistency", which MongoDB supports. In contrast to ACID systems, a read in CAP systems does not guarantee a safe return.
In simple words, this means that your master could have an updated value, but if you read from a slave, it does not necessarily return the updated value, and it is okay by design not to have this updated value.
The concept of eventual consistency is explained in an excellent answer here.
By architecture, Cassandra is supposed to be consistent; it offers a special implementation of eventual consistency called 'tunable consistency', which means that the client application may choose how to handle this. It even offers multi-data-centre consistency support at low levels!
Most row-wise inconsistency issues in Cassandra come from the fact that Cassandra uses client timestamps, not server-side ones, to determine which value is the most recent, which may be a tad confusing at first.
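To illustrate why client timestamps can be confusing, here is a small sketch of last-write-wins resolution. It assumes the newest timestamp simply wins; the function last_write_wins, the clock values, and the 5-second skew are all made up for the example and are not taken from Cassandra itself.

```python
def last_write_wins(a, b):
    """Keep whichever (value, timestamp) pair carries the larger timestamp."""
    return a if a[1] >= b[1] else b


# Hypothetical scenario: client B's clock runs 5 seconds behind client A's.
write_a = ("from A", 100.0)        # issued first, stamped by A's clock
write_b = ("from B", 101.0 - 5.0)  # issued one second later, stamped by B's slow clock

winner = last_write_wins(write_a, write_b)
print(winner[0])
```

Although B wrote later in real time, A's value survives, because only the timestamps supplied by the clients are compared.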
I hope this helps!
You only have to understand the "point in time": since you only write to the MongoDB master, a slave is consistent even if it is not up to date, because it has all the data generated until its last sync.
That is not true for Cassandra. Because Cassandra uses a masterless model, there is no guarantee that other nodes have all the data. At a given time, a node can have some recent data while lacking older data from nodes it has not yet synced with. Cassandra will only be consistent if you stop writing to all nodes and bring them all online. As soon as the sync finishes, you have consistent data.
In MongoDB documentation, here, it has been mentioned that in a replica set even with majority readConcern we would achieve eventual consistency. I am wondering how is this possible when we have majority in both reads and writes which leads to a quorum (R+W>N) in our distributed system? I expect a strong consistent system in this setting. This is the technique which Cassandra uses as well in order to achieve strong consistency.
Can someone clarify this for me please?
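For reference, the counting argument behind R+W>N can be checked by brute force. This is only a sketch of the overlap property the question relies on; the function name quorums_always_overlap is made up, and the check says nothing about how MongoDB's read concern actually behaves.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check that every read set of size r meets every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


for r, w in [(2, 2), (1, 1), (1, 3)]:
    print(f"N=3 R={r} W={w}: overlap guaranteed = {quorums_always_overlap(3, r, w)}")
```

The overlap is guaranteed exactly when R+W>N, so majority reads combined with majority writes always share at least one node. Whether that overlap is enough for strong consistency depends on further details of the system, which is what the question is asking about.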
MongoDB is not regarded very well in terms of strong consistency. If you have a typical sharded and replicated setup, then to increase consistency you will need to trade off some of the performance of the DB. As you know, you can execute write operations only on the master of the replica set. By default you can only read from it as well. This is possibly the strongest consistency you can get from MongoDB AFAIK, as the other nodes are used only for replication, failover and availability reasons. You could read from the secondary nodes only for operations where having the latest data is not crucial, and for long-running operations such as aggregation.
If you set up sharding, you could offload a big portion of the read/write operations to different primary nodes. I think that when it comes to MongoDB, that is all you can do to increase consistency and performance, in particular for larger data sets.
Two points I don’t understand about RDBMS being CA in CAP Theorem :
1) It says RDBMS is not Partition Tolerant but how is RDBMS any less Partition Tolerant than other technologies like MongoDB or Cassandra? Is there a RDBMS setup where we give up CA to make it AP or CP?
2) How is it CAP-Available? Is it through master-slave setup? As in when the master dies, slave takes over writes?
I’m a novice at DB architecture and CAP theorem so please bear with me.
It is very easy to misunderstand the CAP properties, hence I'm providing some illustrations to make it easier.
Consistency: A query Q will produce the same answer A regardless of which node handles the request. In order to guarantee full consistency, we need to ensure that all nodes agree on the same value at all times. Not to be confused with eventual consistency, in which the network moves towards having all data consistent but there are periods of time in which it is not.
Availability: If the distributed system receives query Q, it will always produce an answer for that query. This should not be confused with "high availability": it is not about having the capacity to process a higher throughput of queries, it is about not refusing to answer.
Partition Tolerance: The system continues to function despite the existence of a partition. This is not about having mechanisms to "fix" the partition, it is about tolerating the partition, i.e. continuing despite the partition.
Note that the following examples do not cover all possible scenarios.
An example for CP:
The system is partition tolerant because its nodes keep accepting requests despite the partition; it is consistent because the only nodes providing answers are those that maintain a connection to the master node that handles all the write requests; it is not available because the nodes in the other partition do not provide an answer to the queries they receive.
Examples for AP:
Either because (respectively) we have the slave nodes replying to requests regardless of whether they are able to reach the master, or because the slave nodes in the other partition elect a new master, or because we have a masterless cluster, availability is achieved because all questions are getting an answer. Consistency is dropped because both partitions are replying while potentially yielding different states.
Examples for CA:
If we disconnect nodes when a partition occurs, we can ensure that we have at most one partition, which ultimately means that the network is not partitioned anymore, or simply that there is no service at all. This is the opposite of partition tolerance, because the system is avoiding the partition instead of functioning despite it. Consistency and availability hold in these partially or fully disconnected systems because all working nodes (if any) have the same state and all received queries (if any) will get an answer; nodes that are shut down do not receive queries.
To answer the questions:
Under default configurations, databases such as Cassandra and MongoDB are partition tolerant because they do not shut down nodes to cope with partitions, whereas RDBMSs such as MySQL do.
Availability has very little to do with master/slave setup, e.g. Cassandra is masterless and very available because it doesn't really matter which node dies. As for availability in a master/slave setup, there is no reason to stop responding to all queries when master is dead, but you may need to suspend write operations while electing a new one.
A lot of databases now actually have different configurations and depending on the settings you set, it can be either CA, CP, AP, etc but can not achieve all three at the same time. Some databases actually make an effort to support all three but still prioritizes them in a certain way.
For example, MySQL can be CP or CA depending on the configuration. By default, it is CA because it follows a master-slave paradigm in which data is replicated to the slaves. Partition tolerance is sacrificed in the event that a set of the slaves loses the connection to the master and therefore decides to elect a new master, creating two masters, each with its own set of slaves.
However, MySQL also has another configuration, which is a clustered configuration. It prioritizes CP over availability, e.g. the cluster will shut down if there are not enough live nodes to serve all the data.
There are probably more configurations for MySQL that make it satisfy other CAP theorem combinations, but overall, I just wanted to say that it depends on what your system requires. Sometimes a database is better suited to one configuration than another, so it's best to check what kinds of problems may occur when using a certain configuration.
As for implementing the CAP theorem, I would advise taking a further look into different databases and how they implement the priorities for the CAP theorem. There are just too many different ways of implementing them eg. generally, the master slave model is used for CA systems, the hash ring for AP systems, etc.
CAP theorem is problematic and it applies only to distributed database systems. When you have distributed databases then network partition and node crashes can happen. And when network partition happens you must have partition tolerance (the P of your CAP).
So to answer your question number 1) It’s either CP or AP. It can be configured as Will mentioned.
More about why partition tolerance is a must:
https://codahale.com/you-cant-sacrifice-partition-tolerance/
More about problems around CAP theorem:
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
I agree that RDBMS can have all the properties of CAP. I have started studying noSQL DBs and had prior experience with IBM DB2.
Here is how IBM DB2 satisfies all the 3 CAP properties
C : Consistency : Every relational database satisfies this due to the transactional nature of RDBMS.
A : Availability : Availability means that when a query is made for a data that exists, it should be returned. Again, a relational database is designed to do this easily.
P : Partition Tolerance : This is the most interesting one. From DB2 stand point, in the application that I was working on, we had 2 databases spread across different data centres. One was the primary and communicated with the secondary via heartbeats. Each of these primary and secondary databases, had 12 physical instances where data was distributed on the basis of some predefined logic. If the primary goes down, the secondary detects this and takes the place of primary. Since the primary and secondary were always maintained in sync, data remains consistent as well.
This is how I think that RDBMS satisfies all 3 properties of CAP Theorem.
I may be wrong, and open to discussion on this.
Everybody says that MongoDB is CP in the CAP theorem! But by using master-slave replication, it has high availability too (if a primary fails, the remaining members will automatically try to elect a new primary). My question is: in which situations (and how) can it be AP (with eventual consistency)?
Actually, there is a two-part answer:
Sharding level: There's only one authoritative shard per data segment (C), shards work independently (P), if a shard is not available its data isn't available as well (A)
Replica set level: There's only one authoritative master node (C), if required a new master will be selected (P), if there's no primary (during the voting phase which should only last a few seconds, but that's enough) you cannot access the data on that node. If you enable reading from secondaries (eventual consistency), you can read data from secondaries during the voting phase, but still not write new data. Thus it's a CP system.
In general, you don't lose the third characteristic entirely, but trade it for additional latency / overhead or for not having it for a short amount of time.
It is not possible to have C, A and P all together. That is the theorem: you can't have them all, you can take only two.
See :
Where does mongodb stand in the CAP theorem?