What should I do to avoid a single point of failure in Cassandra? - nosql

I am using the phpcassa library to get and set data in Cassandra, which I have installed on 2 servers. I am connecting to my seed node using CassandraConn::add_node('..*.**', 9160); so inserts automatically get replicated to the other node in the cluster. But if my seed node dies (if I shut down its Cassandra process), then my inserts stop working and I am unable to get data from the other node either. Am I doing the right thing? Because this way there is no use for the cluster; ideally, if one node dies the other node should still respond. Any help will be appreciated.

Connect with RRDNS instead of a single host. http://en.wikipedia.org/wiki/Round-robin_DNS
(You can also use a load balancer but that is usually overkill here.)

Most Cassandra clients will let you directly specify multiple server addresses, and will try them in turn if one fails.
I haven't used phpcassa (only pycassa) but the API docs at http://thobbs.github.com/phpcassa/api/index.html seem to suggest that you can specify multiple servers.
Round-robin DNS is another alternative, as per the previous answer.
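For illustration, here is a minimal sketch of that in pycassa (keyspace, column family and addresses are placeholders; phpcassa's ConnectionPool is assumed to accept a similar server list):

import pycassa

# Giving the pool several servers lets it fail over when one node is down.
pool = pycassa.ConnectionPool('MyKeyspace',
                              server_list=['10.0.0.1:9160', '10.0.0.2:9160'])
cf = pycassa.ColumnFamily(pool, 'MyColumnFamily')
cf.insert('row_key', {'column_name': 'value'})
print(cf.get('row_key'))

With a replication factor of 2 and a consistency level the surviving node can satisfy (e.g. ONE), reads and writes keep working while the seed node is down.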

Related

Proper Fault-tolerant/HA setup for KeyDB/Redis in Kubernetes

Sorry for the long post, but I hope it will spare us some clarifying questions. I also added some diagrams to break up the wall of text; hope you'll like those.
We are in the process of moving our current solution to local Kubernetes infrastructure, and the thing we are currently investigating is the proper way to set up a KV store (we've been using Redis for this) in K8s.
One of the main use-cases for the store is providing processes with exclusive ownership of resources via a simple version of the Distributed Lock pattern, as in the (discouraged) pattern here. (More on why we are not using Redlock below.)
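For concreteness, our locking is essentially the textbook single-instance pattern; a minimal sketch with redis-py (the key names, TTL and Service name are placeholders):

import uuid
import redis

r = redis.Redis(host='redis-service', port=6379)  # assumed K8s Service name

def acquire_lock(resource_id, ttl_seconds=30):
    token = str(uuid.uuid4())
    # SET key value NX EX ttl: succeeds only if the key does not exist yet
    if r.set('lock:%s' % resource_id, token, nx=True, ex=ttl_seconds):
        return token
    return None

def release_lock(resource_id, token):
    # Release only if we still hold the lock (shown non-atomically for brevity;
    # a Lua script would make the check-and-delete atomic).
    if r.get('lock:%s' % resource_id) == token.encode():
        r.delete('lock:%s' % resource_id)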
And once again, we are looking for a way to set this up in K8s so that the details of the HA setup are opaque to clients. Ideally, the setup would look like this:
So what is the proper way to setup Redis for this? Here are the options that we considered:
First of all, we discarded Redis Cluster, because we don't need sharding of the keyspace; ours is rather small.
Next, we discarded the Redis Sentinel setup, because with Sentinel clients are expected to be able to connect to the chosen Redis node, so we would have to expose all nodes. We would also have to provide some identity for each node (like distinct ports, etc.), which contradicts the idea of a K8s Service. Even worse, we would have to check that all of our (heterogeneous) clients support the Sentinel protocol and properly implement all that fiddling.
Somewhere around here we ran out of options for the first time. We thought about using regular Redis replication, but without Sentinel it's unclear how to set things up for fault-tolerance in case of master failure: there seems to be no auto-promotion of replicas, and no (easy) way to tell K8s that the master has changed, except maybe by inventing a custom K8s operator, but we are not that desperate (yet).
So here we came to the idea that Redis may not be very cloud-friendly, and started looking for alternatives. That is how we found KeyDB, which has promising additional modes, on top of an impressive performance boost and a 100% compatible API.
So here are the options that we considered with KeyDB:
Active replication with just two nodes. This would look like this:
This setup looks very promising at first: simple, clear, and even the official KeyDB docs recommend it as a preferred HA setup, superior to a Sentinel setup.
But there's a caveat. While the docs advocate this setup as tolerant to split-brain (because the nodes catch up with each other after connectivity is re-established), that would ruin our use-case, because two clients would be able to lock the same resource id:
And there's no way to tell K8s that one node is OK and the other is unhealthy, because both nodes have lost their replicas.
Well, it's clear that an even-node setup cannot be made split-brain-tolerant, so the next thing we considered was a KeyDB 3-node multi-master setup, which allows each node to be an (active) replica of multiple masters:
OK, things got more complicated, but it seems that the setup is split-brain-proof:
Note that we had to add more stuff here:
A health check, to consider a node that has lost all its replicas as unhealthy, so the K8s load balancer does not route new clients to it.
A WAIT 1 command after SET/EXPIRE, to ensure that we are writing to the healthy side of a split (preventing the case where a client connects to an unhealthy node before the load balancer learns it is ill); a sketch of this write path follows right after this list.
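A hedged sketch of that write path with redis-py (the key, token, Service name and timeout values are assumptions):

import redis

r = redis.Redis(host='keydb-service', port=6379)  # assumed K8s Service name

def set_lock_with_wait(key, token, ttl_seconds=30):
    # Exclusive SET, as in the plain single-node lock
    if not r.set(key, token, nx=True, ex=ttl_seconds):
        return False
    # WAIT numreplicas timeout_ms: block until at least 1 replica has
    # acknowledged the write, or 100 ms have passed.
    acked = r.execute_command('WAIT', 1, 100)
    if acked < 1:
        # Our side of the split has no reachable replica: treat the lock
        # as not acquired and remove our own key.
        r.delete(key)
        return False
    return True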
And this is when a sudden thought struck: what about consistency? Both of these setups with multiple writable nodes provide no guard against two clients locking the same key on different nodes!
Redis and KeyDB both use asynchronous replication, so there seems to be no guarantee that an (exclusive) SET which succeeds as a command will not be overwritten by another SET with the same key issued on another master a split-second later.
Adding WAITs does not help here, because it only covers spreading information from a master to its replicas, and seems to have no effect on these overlapping waves of overwrites spreading from multiple masters.
Okay, this is actually the Distributed Lock problem, and both Redis and KeyDB give the same answer: use the Redlock algorithm. But it seems far too complex:
It requires the client to communicate with multiple nodes explicitly (and we'd rather not do that).
The nodes are supposed to be independent, which is rather bad, because we use Redis/KeyDB for more than just this locking case, and we'd still like a reasonably fault-tolerant setup, not 5 separate nodes.
So, what options do we have? Both Redlock write-ups start from a single-node version, which is OK if the node never dies and is always available. While that is surely not the case, we are willing to accept the problems explained in the section "Why failover-based implementations are not enough", because we believe failovers would be quite rare and we think we fall under this clause:
Sometimes it is perfectly fine that under special circumstances, like during a failure, multiple clients can hold the lock at the same time. If this is the case, you can use your replication based solution.
So, having said all of this, let me finally get to the question: how do I set up a fault-tolerant "replication-based solution" of KeyDB in Kubernetes that has a single write node most of the time?
If it's a regular "single master, multiple replicas" setup (without "auto"), what mechanism would ensure that a replica is promoted in case of master failure, and what mechanism would tell Kubernetes that the master node has changed? And how? By re-assigning labels on pods?
Also, what would bring a previously dead master node back in such a way that it does not become a master again, but a replica of the substitute master?
Do we need some K8s operator for this? (Those that I found were not smart enough to do it.)
Or, if it's multi-master active replication from KeyDB (like in my last picture above), I'd still need something other than a load-balanced K8s Service to route all clients to a single node at a time, and then, again, some mechanism to switch this "actual master" role in case of failure.
And this is where I'd like to ask for your help!
I've found frustratingly little info on the topic, and it does not seem that many people face the problems we do. What are we doing wrong? How do you cope with Redis in the cloud?

Increased latency when using @Transactional(readOnly=true)

I am working with a backend service (Spring Boot 2.2.13.RELEASE + Hikari + JOOQ) that uses an AWS Aurora PostgreSQL DB cluster configured with a Writer (primary) node and a Reader (Read Replica) node. The reader node has just been sitting there idle/warm waiting to be promoted to primary in case of fail-over.
Recently, we decided to start serving queries exclusively from the reader node for some of our GET endpoints. To achieve this we used a "flavor" of RoutingDataSource, so that whenever a service is annotated with @Transactional(readOnly=true) the queries are performed against the reader datasource.
Up to this point everything was going smoothly. However, after applying this solution I noticed a latency increase of up to 3x compared with the primary datasource.
After drilling down on this I found out that each transaction was doing a couple of extra round trips to the DB to set the session characteristics:
SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
ACTUAL QUERY/QUERIES
SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
To improve this I tried to play with the readOnlyMode setting that was introduced in pgjdbc 42.2.10. This setting lets you control the driver's behavior when a connection is set to read-only (readOnly=true).
https://jdbc.postgresql.org/documentation/head/connect.html
In my first attempt I used readOnly=true and readOnlyMode=always. Even though I stopped seeing the SET SESSION CHARACTERISTICS statements, the latency remained unchanged. Finally I tried readOnly=false and readOnlyMode=ignore. This last option caused the latency to decrease, but it is still worse than it was before.
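For reference, the reader datasource URL in that last attempt looked roughly like this (host and database names are placeholders):

jdbc:postgresql://aurora-reader.example.com:5432/mydb?readOnly=false&readOnlyMode=ignore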
Has someone else experience with this kind of setup? What is the optimal configuration?
I don't need to flag the transaction as read-only (other than to tell the routing datasource to use the read replica), so I would like to figure out whether anything else can be done so that latency stays the same between the Writer and Reader nodes.
Note: at the moment the reader node is only serving 1% of all the traffic (~20 req/s).

MongoDB read preferences Secondary

I am just starting to use MongoDB while testing it with YCSB and I have a couple of questions about read preferences and its implementation.
I have set up 1 primary and 2 secondary nodes, and set the read preference in the YCSB Java client like this: mongo.setReadPreference(ReadPreference.secondary());
1. Why, if I point YCSB at the primary node, can it still perform read operations without generating an error message? I also checked the logs and can see that the primary is the node that served these requests.
2. How do clients learn about secondary nodes in a production environment? Where do you connect clients by default? Do all clients go to the primary, retrieve the list of secondaries, and then reconnect to the secondaries to perform reads?
3. By browsing the source code I found that the logic for selecting an appropriate replica based on preferences is in replica_set_monitor.cpp, although it is not yet clear to me where this code is executed: on the primary, a secondary, or the client?
Thank you
When your application connects only to the primary, it doesn't learn about any secondaries. ReadPreference.secondary() is just a preference, not a mandate. When the application doesn't know that a secondary exists, it will read from the primary.
To make your application aware of the secondaries, you need to use the class DBClientReplicaSet instead of DBClientConnection; it takes an std::vector of hosts as a constructor argument, and that list should include all members of the set.
If you would prefer to keep the application unaware of the replica-set members, you could set up a sharded cluster (which might consist of only a single shard) and connect to the router. The mongos process will then handle the replica-set abstraction.
When an application connects to any active replica-set member, it issues an internal command similar to rs.status(), which is in fact an isMaster command (http://docs.mongodb.org/meta-driver/latest/legacy/connect-driver-to-replica-set/), and caches the response for a certain time until it is deemed necessary to refresh that information. The C++ driver even tells you the class that holds this cache: http://api.mongodb.org/cxx/current/classmongo_1_1_replica_set_monitor.html
Holds state about a replica set and provides a means to refresh the local view.
There are a number of ways the application can discover the set, but the most common is to provide a seed list in the connection string that your application code passes to the driver; that way it can connect to any member and ask: "What is here?" A rough example follows below.
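To illustrate with Python's pymongo (the hosts and the replica-set name rs0 are made up here, not taken from the question):

from pymongo import MongoClient

# Seed list: the driver contacts any reachable member, runs isMaster,
# and discovers the rest of the replica set from the response.
client = MongoClient(
    'mongodb://10.0.0.1:27017,10.0.0.2:27017,10.0.0.3:27017/?replicaSet=rs0',
    readPreference='secondary')

coll = client.ycsb.usertable
doc = coll.find_one()  # routed to a secondary when one is available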

Zookeeper Newbie - Which nodes do I read and write? Should I load balance?

I am a ZooKeeper newbie. I have three nodes in three separate data centers. I will need to read and write data through the Python pykeeper API. So...
1) Which node do I read from and write to? Does it matter? Round robin? Write to the master, read from the slaves?
2) How do I know which server was elected as master? Do I care? That I have yet to figure out.
3) For now I am using the following to connect to ZooKeeper:
import zc.zk
from random import choice
zk_servers = ['111.111.111.111:2181','111.111.111.222:2181','111.111.111.333:2181']
zk = zc.zk.ZooKeeper(choice(zk_servers))
This raises the question: what if a ZK node fails? Should I place the nodes behind HAProxy to load-balance the requests?
Any advice on best practices for reading from and writing to ZK nodes is much appreciated.
Thanks
The general model is that you supply your clients with the full list of server nodes and connect to the cluster as a whole. The ZooKeeper client shuffles the list of server addresses and connects to one of them, failing over to another if that connection is lost. You don't pick particular servers for individual tasks; part of the point of ZooKeeper is that every node holds the full replicated data, so any node can serve reads (read capacity grows as you add nodes) while writes are forwarded internally to the elected leader, and clients don't need to care which server they happen to be talking to.
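For example, building on the snippet in the question, the whole list can go into a single connection string (assuming zc.zk passes it straight to the underlying ZooKeeper session, which accepts a comma-separated host list); the client then handles failover between the listed servers itself:

import zc.zk

# All three servers in one connection string; the client picks one and
# fails over to another if its current server becomes unreachable.
zk_servers = '111.111.111.111:2181,111.111.111.222:2181,111.111.111.333:2181'
zk = zc.zk.ZooKeeper(zk_servers)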

reliability: Master/slave pattern is doomed?

More and more of the NoSQL databases in the spotlight use the master/slave pattern to provide "availability", but what it does (at least from my perspective) is create a weak link in the chain that can break at any time: the master goes down, and the slaves stop functioning.
It's a great way to handle large amounts of data and to even out reads/writes, but from an availability perspective? Not so much...
I understand that in some NoSQL databases a slave can easily be promoted to master, but handling that would be a headache in most applications. Right?
So how do you people take care of this sort of stuff? How does master/slave-databases work in the real world?
This is a fairly generalized question; can you specify what data stores you're specifically talking about?
I've been working with MongoDB, and it handles this very gracefully; every member in a "replica set" (basically, a master-slave cluster) is eligible to become the master. On connecting to a replica set, the set will tell the connecting client about every member in the set. If the master in the set goes offline, the slaves will automatically elect a new master, and the client (since it has a list of all nodes in the set) will try new nodes until it connects; the node it connects to will inform the client about the new master, and the client switches its connection over. This allows for fully transparent master/slave failover without any changes in your application.
This is obviously fine for existing connections, but what about restarts? The MongoDB driver handles this as well; it can accept a list of nodes to try connecting to, and will try them in turn until it finds one it can connect to. Once connected, it will ask the node who the master is and forward the connection there.
The net result is that if you have a replica set established, you can effectively just not worry that any single node exploding will take you offline.
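As a rough sketch of that client-side behaviour in Python with pymongo (the hosts, replica-set name and retry policy are made up; in-flight operations can still fail during the brief election window, hence the retry):

import time
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

# The driver is given every member of the set; if the current primary dies,
# it re-discovers the set and switches to the newly elected primary.
client = MongoClient('mongodb://10.0.0.1:27017,10.0.0.2:27017,10.0.0.3:27017/'
                     '?replicaSet=rs0')

def save_with_retry(doc, attempts=5):
    for attempt in range(attempts):
        try:
            return client.mydb.mycoll.insert_one(doc)
        except AutoReconnect:
            # Wait briefly for the election to finish, then retry
            # against the new primary.
            time.sleep(0.5 * (attempt + 1))
    raise RuntimeError('replica set still unavailable after retries')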