How to implement a client authentication solution with NoSQL (Cassandra)?

I am currently thinking about how to implement authentication for a web application with a NoSQL solution. The problem I encounter is that most NoSQL solutions (e.g. Cassandra, MongoDB) may have delayed writes. For example, we write on node A, but it is not guaranteed that the write is visible on node B at the same time. This is a logical consequence of the approach behind these NoSQL solutions.
Now one idea would be to do no secondary reads (so everything goes through a master). This would probably work in MongoDB (where you actually have a master) but not in Cassandra (where all nodes are equal). But our application runs at several independent points all over the world, so we need multi-master capability.
At the moment I am not aware of a solution with Cassandra where I could update data and be sure that subsequent reads (from any of the nodes) see the change. So how could one build authentication on top of these NoSQL solutions when the authentication request (read) could hit several nodes in parallel?
Thanks for your help!

With respect to Apache Cassandra:
http://wiki.apache.org/cassandra/API#ConsistencyLevel
The ConsistencyLevel is an enum that controls both read and write behavior based on the ReplicationFactor in your schema definition. The different consistency levels have different meanings, depending on whether you are doing a write or a read operation. Note that if W + R > ReplicationFactor, where W is the number of nodes to block for on writes and R the number to block for on reads, you will have strongly consistent behavior; that is, readers will always see the most recent write. Of these, the most interesting choice is to do QUORUM reads and writes, which gives you consistency while still allowing availability in the face of node failures up to half of the ReplicationFactor. Of course, if latency is more important than consistency, you can use lower values for either or both.
This is managed on the application side. To your question specifically, it comes down to how you design your Cassandra implementation, the replication factor across the Cassandra nodes, and how your application behaves on reads and writes.
Write
ANY: Ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
ONE: Ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
QUORUM: Ensure that the write has been written to N / 2 + 1 replicas before responding to the client.
LOCAL_QUORUM: Ensure that the write has been written to N / 2 + 1 nodes within the local datacenter (requires NetworkTopologyStrategy)
EACH_QUORUM: Ensure that the write has been written to N / 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy)
ALL: Ensure that the write is written to all N replicas before responding to the client. Any unresponsive replicas will fail the operation.
Read
ANY: Not supported. You probably want ONE instead.
ONE: Will return the record returned by the first replica to respond. A consistency check is always done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used. This means subsequent calls will have correct data even if the initial read gets an older value. (This is called ReadRepair)
QUORUM: Will query all replicas and return the record with the most recent timestamp once at least a majority of replicas (N / 2 + 1) have reported. Again, the remaining replicas will be checked in the background.
LOCAL_QUORUM: Returns the record with the most recent timestamp once a majority of replicas within the local datacenter have replied.
EACH_QUORUM: Returns the record with the most recent timestamp once a majority of replicas within each datacenter have replied.
ALL: Will query all replicas and return the record with the most recent timestamp once all replicas have replied. Any unresponsive replicas will fail the operation.
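For reference, here is a minimal sketch of what this could look like with the DataStax Python driver; the keyspace, table, and contact points are hypothetical placeholders. The point is only that the consistency level is chosen per statement (or as a session default), so an authentication lookup can use QUORUM while less critical queries use ONE.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Hypothetical contact points, keyspace and table, for illustration only.
    cluster = Cluster(["10.0.0.1", "10.0.0.2"])
    session = cluster.connect("auth")

    # Write the credentials at QUORUM ...
    write = SimpleStatement(
        "INSERT INTO users (username, password_hash) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(write, ("alice", "bcrypt-hash-here"))

    # ... and read them back at QUORUM. With RF = 3 this means W = 2 and R = 2,
    # so W + R > RF and the read is guaranteed to see the write.
    read = SimpleStatement(
        "SELECT password_hash FROM users WHERE username = %s",
        consistency_level=ConsistencyLevel.QUORUM)
    row = session.execute(read, ("alice",)).one()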

Related

How to ensure consistent reading in distributed system?

In a distributed system, if a write only succeeds on half of the nodes, subsequent reads that hit nodes which have not yet received the write will return stale data. How can this situation be avoided?
client write --> Node1  v
             --> Node2  v
client read  --> Node3  x  (the latest data was not read)
My plan:
Compare the data version with other nodes when reading data.
If the current node's version is found to be lower, route the read to another node that has the newer data.
I am going to ignore tags [mongo and elastic] :)
What you are planning to do is called Dynamo-style replication. That system is eventually consistent by design. (I read a while ago that it could be made strongly consistent with some effort, but I don't remember if that paper was correct.)
Back to Dynamo and quorums: with three nodes you want at least 2 nodes to acknowledge a write before you assume the write has succeeded. The important point is that you only need two nodes to report success back to the client, but the data should still be sent to all three nodes.
Let's assume that the data is written to two nodes, and the third failed but came back online later. To read the data, you have to read it from any two nodes as well. You will send read requests to all three, but only two are needed to report back to the client. This gives you a quorum: 2 + 2 > 3. It guarantees that there is an intersection between the nodes involved in writes and reads.
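A rough sketch of the read side of that quorum, using a toy in-memory model rather than real networking: the client asks all replicas, stops once a read quorum has answered, and returns the value with the newest timestamp. This is also essentially the version comparison proposed in the question.

    # Toy model of a quorum read: replicas hold (value, timestamp) pairs and the
    # client returns the newest value once a read quorum (2 of 3) has answered.
    replicas = {
        "node1": ("new-password-hash", 2),   # got the latest write
        "node2": ("new-password-hash", 2),   # got the latest write
        "node3": ("old-password-hash", 1),   # stale; came back online later
    }

    def quorum_read(replica_names, quorum=2):
        answers = []
        for name in replica_names:           # in reality these are parallel requests
            answers.append(replicas[name])
            if len(answers) >= quorum:       # stop waiting once the quorum answered
                break
        return max(answers, key=lambda vt: vt[1])[0]   # newest timestamp wins

    print(quorum_read(["node3", "node1"]))   # new-password-hash, despite node3 being stale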
This will work fine when the network is good and the nodes are healthy. But you will run into major challenges, lost updates and conflict resolution to name a few. Either way, the system will not be strongly consistent, by design.
Let me describe another interesting issue to illustrate weak consistency:
node 1 gets the write
the rest of the process fails; node 1 has the new data, but nodes 2 and 3 don't
now, when you read under the quorum condition, you may or may not see the value from node 1, since you are picking any two nodes for a read and node 1 may not be in that set.
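To make that scenario concrete, here is a tiny enumeration, assuming only node 1 holds the new value: some 2-node read quorums include node 1 and see the write, others don't.

    from itertools import combinations

    # Only node 1 received the write before the rest of the process failed.
    has_new_value = {"node1"}

    for read_set in combinations(["node1", "node2", "node3"], 2):
        sees_it = bool(set(read_set) & has_new_value)
        print(read_set, "sees the new value" if sees_it else "sees only the old value")

    # ('node1', 'node2') sees the new value
    # ('node1', 'node3') sees the new value
    # ('node2', 'node3') sees only the old value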
Long story short, Dynamo-style replication is not good for strong consistency, and that brings us to the Raft part of the solution.
Raft will get you what you need: a consistent system. There is a catch to watch for. Most examples focus on writing; Raft maintains a log of messages, and consensus is used to agree on the order (and content) of these messages.
But when you do a read, you can't just go to one node, or any two nodes, or three, and read the value. You have to do the read via Raft as well, by attaching a read operation to Raft's log. This is called a linearizable read.
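A deliberately oversimplified, single-process toy of that "read through the log" idea; the class and method names are made up, and there is no leader election or quorum here, so this only shows where a linearizable read sits relative to writes, not how Raft itself works.

    # Toy: every operation, including reads, is appended to the log.
    # Real Raft would replicate each entry to a quorum before committing it.
    class ToyReplicatedLog:
        def __init__(self):
            self.log = []
            self.kv = {}

        def propose(self, op, key, value=None):
            self.log.append((op, key, value))   # "commit" immediately in this toy
            if op == "put":
                self.kv[key] = value
            return len(self.log) - 1            # index of the committed entry

        def linearizable_read(self, key):
            # The read is ordered in the log after every previously committed
            # write, so it cannot return a value older than those writes.
            self.propose("read", key)
            return self.kv.get(key)

    node = ToyReplicatedLog()
    node.propose("put", "user:alice", "hash-v2")
    print(node.linearizable_read("user:alice"))   # hash-v2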
I'll stop here, as this is pretty complicated topic (but not an impossible one to learn).
Hope this gave you enough ideas to explore.
I saw both mongodb and elasticsearch being tagged; I don't know which case you have in mind, but the two databases are very different.
For Mongo, replicas are not used by default to increase read throughput; see https://docs.mongodb.com/manual/core/read-preference — the default read preference only targets the primary and excludes all secondaries. Writes in Mongo also go to the primary first, and replication happens asynchronously, possibly after the write to the primary finishes; see https://docs.mongodb.com/manual/core/replica-set-members/. Because of that, if you force a read from a secondary, you are not guaranteed to get the newest data.
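As a hedged sketch with PyMongo (connection string, collection and field names are made up): the default read preference targets the primary, and forcing reads onto secondaries trades freshness for locality; a majority write concern only controls how many members acknowledge the write, it does not make secondary reads fresh by itself.

    from pymongo import MongoClient, ReadPreference, WriteConcern

    # Hypothetical replica set and collection names.
    client = MongoClient("mongodb://db1.example.com,db2.example.com/?replicaSet=rs0")
    db = client["app"]

    # Default behavior: reads go to the primary and see acknowledged writes.
    users = db.get_collection("users")
    users_majority = users.with_options(write_concern=WriteConcern(w="majority"))
    users_majority.insert_one({"username": "alice", "password_hash": "..."})

    # Forcing reads onto a secondary may return stale data, since replication
    # to secondaries is asynchronous.
    users_stale_ok = db.get_collection(
        "users", read_preference=ReadPreference.SECONDARY_PREFERRED)
    doc = users_stale_ok.find_one({"username": "alice"})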
For elasticsearch: elasticsearch naturally does not guarantee that you always read the most recent data, see https://www.elastic.co/guide/en/elasticsearch/reference/current/near-real-time.html, so either way, even with only one node, you may get data that is out of date.

Why isn't RDBMS Partition Tolerant in CAP Theorem and why is it Available?

Two points I don't understand about RDBMS being CA in the CAP theorem:
1) It says an RDBMS is not partition tolerant, but how is an RDBMS any less partition tolerant than other technologies like MongoDB or Cassandra? Is there an RDBMS setup where we give up CA to make it AP or CP?
2) How is it CAP-available? Is it through a master-slave setup, as in: when the master dies, a slave takes over writes?
I’m a novice at DB architecture and CAP theorem so please bear with me.
It is very easy to misunderstand the CAP properties, hence I'm providing some illustrations to make it easier.
Consistency: A query Q will produce the same answer A regardless of the node that handles the request. In order to guarantee full consistency we need to ensure that all nodes agree on the same value at all times. Not to be confused with eventual consistency, in which the system moves towards having all data consistent but there are periods of time in which it is not.
Availability: If the distributed system receives query Q, it will always produce an answer for that query. This should not be confused with "high availability"; it is not about having the capacity to process a higher throughput of queries, it is about not refusing to answer.
Partition Tolerance: The system continues to function despite the existence of a partition. This is not about having mechanisms to "fix" the partition, it is about tolerating the partition, i.e. continuing despite the partition.
Note that the following examples do not cover all possible scenarios.
An example for CP:
The system is partition tolerant because its nodes keep accepting requests despite the partition; it is consistent because the only nodes providing answers are those that maintain a connection to the master node that handles all the write requests; it is not available because the nodes in the other partition do not provide an answer to the queries they receive.
Examples for AP:
Either because (respectively) we have the slave nodes replying to requests regardless of whether they are able to reach the master, or because the slave nodes in the other partition elect a new master, or because we have a masterless cluster, availability is achieved because all queries get an answer; consistency is dropped because both partitions reply while potentially yielding different states.
Examples for CA:
If we disconnect nodes when a partition occurs, we can ensure that we have at most one partition which ultimately means that the network is not partitioned anymore, or simply there is no service at all. This is the opposite of partition tolerance, because the system is avoiding the partition instead of functioning despite it. Consistency and availability holds in these partially or fully disconnected systems because all working nodes (if any) have the same state and all received queries (if any) will get an answer - shutdown nodes do not receive queries.
To answer the questions:
Under default configurations, databases such as Cassandra and MongoDB are partition tolerant because they do not shutdown nodes to cope with partitions, whereas RDBMS such as MySQL do.
Availability has very little to do with a master/slave setup; e.g. Cassandra is masterless and highly available because it doesn't really matter which node dies. As for availability in a master/slave setup, there is no reason to stop responding to all queries when the master is dead, but you may need to suspend write operations while electing a new one.
A lot of databases now actually have different configurations, and depending on the settings you choose, a database can be CA, CP, AP, etc., but it cannot achieve all three at the same time. Some databases actually make an effort to support all three but still prioritize them in a certain way.
For example, MySQL can be CP or CA depending on the configuration. By default it is CA because it follows a master-slave paradigm in which data is replicated to the slaves. Partition tolerance is sacrificed in the event that a set of slaves loses the connection to the master and therefore decides to elect a new master, creating two masters with their own sets of slaves.
However, MySQL also has a clustered configuration. It prioritizes CP over availability, e.g. the cluster will shut down if there are not enough live nodes to serve all the data.
There are probably more MySQL configurations that satisfy other CAP combinations, but overall I just wanted to say that it depends on what your system requires. Sometimes a database is better suited to one configuration than another, so it's best to see what kinds of problems may occur when using a certain configuration.
As for implementing the CAP theorem, I would advise taking a further look into different databases and how they implement the priorities for the CAP theorem. There are just too many different ways of implementing them eg. generally, the master slave model is used for CA systems, the hash ring for AP systems, etc.
The CAP theorem is problematic and applies only to distributed database systems. When you have distributed databases, network partitions and node crashes can happen. And when a network partition happens, you must have partition tolerance (the P of CAP).
So to answer your question number 1) It’s either CP or AP. It can be configured as Will mentioned.
More about why partition tolerance is a must:
https://codahale.com/you-cant-sacrifice-partition-tolerance/
More about problems around CAP theorem:
https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
I agree that an RDBMS can have all the properties of CAP. I have started studying NoSQL DBs and had prior experience with IBM DB2.
Here is how IBM DB2 satisfies all the 3 CAP properties
C : Consistency : Every relational database satisfies this due to the transactional nature of RDBMS.
A : Availability : Availability means that when a query is made for a data that exists, it should be returned. Again, a relational database is designed to do this easily.
P : Partition Tolerance : This is the most interesting one. From a DB2 standpoint, in the application I was working on, we had 2 databases spread across different data centres. One was the primary and communicated with the secondary via heartbeats. Each of the primary and secondary databases had 12 physical instances where data was distributed on the basis of some predefined logic. If the primary goes down, the secondary detects this and takes its place. Since the primary and secondary were always kept in sync, data remains consistent as well.
This is how I think that RDBMS satisfies all 3 properties of CAP Theorem.
I may be wrong, and open to discussion on this.

Can Triggers be used in Cassandra for production for a multi datacenter environment?

I have a multi-datacenter (DC1, DC2) environment with 3 nodes in each datacenter and RF=3 per datacenter.
Wanted to know if triggers can be used in production in a multi-datacenter environment. If so, how can this be achieved?
Case A: If I start inserting data into DC1, it will have 3 replicas within DC1 and is responsible for replicating the data to the other datacenter, DC2. Every time an insert into DC2 takes place, I would like a trigger event to occur and notify the application about the latest inserted value. Is this possible?
Case B: If not, is it good to insert the data simultaneously into both datacenters DC1 and DC2 (pointing to a single table) and avoid the triggers concept?
Will it have any impact on network traffic? Based on the latest timestamp, the table would contain the last insert, which serves the purpose when queried from either of the regions.
Consistency level LOCAL_QUORUM for reads
Consistency level ONE for writes
DSE 4.8.2
With these consistency levels, good consistency can be achieved while lowering the latency of write operations across the datacenters.
Usecase:
We have an application (2 domains) for two different regions (DC1 & DC2). Users in the DC1 region use domain 1 to access the application, and users in the DC2 region use domain 2 for the same. Data is ingested into DC1 for that region, and once it has replicated within its DC, the coordinator of DC1 replicates the data to the other DC (DC2). The moment DC2 receives the data from DC1, we want to let the application know about the latest information, via polling or some trigger event mechanism. Just wanted to know if this can be implemented with Cassandra triggers.
Can someone give feedback on Case A and Case B, and which would be more efficient in production?
Thanks
In either case stated above I am not sure why you want to use a trigger to notify your application that a value was inserted. In the scenario as I understand it your application already knows the newest value. Once the write has been successful you can notify your application with the newest value.
In both cases A and B you are working against some of the basic principles of how Cassandra functions. At the application level you should not need to worry about ensuring replication or eventual consistency of your data across multiple nodes and data centers. That is a large part of what Cassandra brings to the table.
In both cases A and B you are going to get multiple inserts of the same data for each write, on every node it is replicated to in both data centers. As you write to DC1 it will also be written to DC2. If you then write to DC2, it will be written back to DC1. This will end up with a large number of rows containing the same data and will increase disk requirements and compaction frequency. It will also increase network traffic as the two DCs talk back and forth to reach eventual consistency.
From what I can see here, I also have to ask why you are using RF=3 on a 3-node cluster. This means that each node in each data center will have all the data, essentially making each server a complete replica of the others. This may be overkill (depending on the data, of course), as you are not going to get a lot of the scalability benefits that Cassandra offers.
Cassandra will handle the syncing of data between the data centers and across nodes so your application does not need to worry about this.
One other quick note: currently your writes use CL=ONE. This means you may end up with cross-DC latency on a write request. If you change this to LOCAL_ONE, the consistency level is satisfied as soon as one of the nodes in the local DC has written the value, instead of possibly a node in the other DC. Cassandra will still handle the replication and syncing of the data.
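For illustration, with the Python driver the change is just the consistency level on the write statement; the contact point, keyspace and table names below are hypothetical.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["10.0.0.1"]).connect("my_keyspace")   # hypothetical names

    # LOCAL_ONE: the coordinator only waits for one replica in its own datacenter,
    # so client-visible write latency stays local; cross-DC replication still
    # happens asynchronously in the background.
    insert = SimpleStatement(
        "INSERT INTO my_table (id, value) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.LOCAL_ONE)
    session.execute(insert, (42, "payload"))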
Generally, the multi-datacenter concept is used for workload separation (say, different DCs for real-time queries, analytics and search). Cassandra by itself takes care of replicating the data across multiple DCs.
So, coming to your question, Case B doesn't seem like the right option because:
Cassandra automatically replicates data across multiple DCs link
Case A is feasible: alerts/notifications using triggers.
Hope it will be helpful.

Cassandra write performance regarding consistency level

Here is a quote from the Cassandra documentation about writes (LINK):
If all replicas for the affected row key are down, it is still possible for a write to succeed if using a write consistency level of ANY. Under this scenario, the hint and written data are stored on the coordinator node, but will not be available to reads until the hint gets written to the actual replicas that own the row. The ANY consistency level provides absolute write availability at the cost of consistency, as there is no guarantee as to when written data will be available to reads (depending how long the replicas are down). Using the ANY consistency level can also potentially increase load on the cluster, as coordinator nodes must temporarily store extra rows whenever a replica is not available to accept a write.
My question is: is writing to Cassandra slower if we use a consistency level of ANY than if we use a consistency level of ONE?
Hints are generated when appropriate replica nodes are inaccessible at write time. Write requests are then serialized locally on the request coordinator node. Once a valid replica node becomes available and the coordinator node learns of it, the request is passed along to the newly available replica.
With that background, there are two write-time scenarios to consider:
1) At least one replica is up for the affected row. In this case, there is no difference between consistency levels of ANY and ONE. The write just goes to the replica(s), and hinted handoff is not triggered. No performance difference.
2) All replicas are down for the affected row. This is where hints enter the picture. With consistency ANY there is extra work to be done on the coordinator node at request time, as the hint is written to a local system table for later replay. With consistency ONE, you would simply get a refused write in the same circumstances. ONE will expose write failures to the client, and will be faster than ANY.
Essentially, the tradeoff is refusing requests vs. pushing work onto remaining nodes, but only when nodes responsible for storing that row are down.

Do NoSQL datacenter aware features enable fast reads and writes when nodes are distributed across high-latency connections?

We have a data system in which writes and reads can be made in a couple of geographic locations which have high network latency between them (crossing a few continents, but not this slow). We can live with 'last write wins' conflict resolution, especially since edits can't be meaningfully merged.
I'd ideally like to use a distributed system that allows fast, local reads and writes, and copes with the replication and write propagation over the slow connection in the background. Do the datacenter-aware features in e.g. Voldemort or Cassandra deliver this?
It's either this, or we roll our own, probably based on collecting writes using something like rsync and sorting out the conflict resolution ourselves.
You should be able to get the behavior you're looking for using Voldemort. (I can't speak to Cassandra, but imagine that it's similarly possible using it.)
The key settings in the configuration will be:
replication-factor — This is the total number of times the data is stored. Each put or delete operation must eventually hit this many nodes. A replication factor of n means that up to n - 1 node failures can be tolerated without data loss.
required-reads — The least number of reads that can succeed without throwing an exception.
required-writes — The least number of writes that can succeed without the client getting back an exception.
So for your situation, the replication factor would be set to whatever number makes sense for your redundancy requirements, while both required-reads and required-writes would be set to 1. Reads and writes would return quickly, with a concomitant risk of stale or lost data, and the data would only be replicated to the other nodes afterwards.
I have no experience with Voldemort, so I can only comment on Cassandra.
You can deploy Cassandra to multiple datacenters with an inter-DC latency higher than a few milliseconds (see http://spyced.blogspot.com/2010/04/cassandra-fact-vs-fiction.html).
To ensure fast local reads, you can configure the cluster to replicate your data to a certain number of nodes in each datacenter (see NetworkTopologyStrategy). For example, you can specify that there should always be two replicas in each data center. So even if you lose a node in a data center, you will still be able to read your data locally.
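For example (keyspace and datacenter names are placeholders and must match your snitch configuration), the per-datacenter replica counts are part of the keyspace definition, which could be issued through the Python driver like this:

    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()   # hypothetical contact point

    # Two replicas of every row in each datacenter, so each DC can keep serving
    # local reads even if one of its nodes is down.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS app
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'DC1': 2,
            'DC2': 2
        }
    """)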
Write requests can be sent to any node in a Cassandra cluster. So for fast writes, your clients would always speak to a local node. The node receiving the request (the "coordinator") will replicate the data to other nodes (in other datacenters) in the background. If nodes are down, the write request will still succeed and the coordinator will replicate the data to the failed nodes at a later time ("hinted handoff").
Conflict resolution is based on a client-supplied timestamp.
If you need more than eventual consistency, Cassandra offers several consistency options (including datacenter-aware options).