Apache Ignite Cluster with ZooKeeper Discovery: behavior in a network segmentation situation

We are developing the architecture of a distributed system based on Apache Ignite.
The system has strict fault-tolerance requirements.
We have three data centers (DCs) for this: two main DCs (DC1 and DC2) and one reserve DC (DC3).
DC1 and DC2 are connected to each other by a fast 40 GbE channel.
The reserve DC3 is connected to DC1 and DC2 over slower 1 GbE channels.
We plan to use ZooKeeper Discovery for the Ignite cluster and want to spread the ZooKeeper ensemble across the three DCs: one node per DC.
We plan to place Ignite cluster nodes only in the main DCs (DC1 and DC2).
DC1 and DC2 will have an equal number of Ignite nodes.
(Schema of the architecture: diagram omitted)
What happens to the Ignite cluster when network segmentation occurs, i.e. the 40 GbE channel between the main DCs DC1 and DC2 goes down?
For example, suppose the ZK3 node in DC3 is the leader and ZK1 and ZK2 are followers. In this situation the leader can communicate with both followers, but the followers have lost the connection to each other.
The ZooKeeper cluster stays a single ensemble.
Ignite nodes in DC1 can communicate with the ZK1 and ZK3 nodes and with each other within DC1.
Ignite nodes in DC2 can communicate with the ZK2 and ZK3 nodes and with each other within DC2.
How will the split-brain situation be resolved in this network segmentation case, or will we get two independent Ignite clusters?
The documentation (https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling) says:
Whenever a node discovers that it cannot connect to some of the other nodes in the cluster, it initiates a communication failure resolve process by publishing special requests to the ZooKeeper cluster. When the process is started, all nodes try to connect to each other and send the results of the connection attempts to the node that coordinates the process (the coordinator node). Based on this information, the coordinator node creates a connectivity graph that represents the network situation in the cluster. Further actions depend on the type of network segmentation.
Can the coordinator choose one of the two halves of the Ignite cluster as the surviving one in this case?

Regardless of ZooKeeper's behaviour, if DC1 and DC2 lose the connection to each other, the cluster won't be able to operate normally. ZooKeeper backs the DiscoverySpi, but not the CommunicationSpi, which requires full connectivity between all nodes.
So even if the discovery SPI manages to resolve the lost-connectivity issue, cache operations won't be able to succeed anyway.
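To make the split concrete: ZooKeeper Discovery is just a DiscoverySpi setting on the node configuration. A minimal sketch, with illustrative connection string and timeouts (the zk1/zk2/zk3 addresses are hypothetical):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi;

public class ZkDiscoveryStart {
    public static void main(String[] args) {
        ZookeeperDiscoverySpi zkSpi = new ZookeeperDiscoverySpi();
        // Hypothetical addresses of ZK1, ZK2 and ZK3, one per DC.
        zkSpi.setZkConnectionString("zk1:2181,zk2:2181,zk3:2181");
        zkSpi.setSessionTimeout(30_000);
        zkSpi.setJoinTimeout(10_000);

        IgniteConfiguration cfg = new IgniteConfiguration();
        // ZooKeeper replaces only discovery; data exchange still runs over
        // CommunicationSpi's direct node-to-node TCP connections.
        cfg.setDiscoverySpi(zkSpi);

        Ignition.start(cfg);
    }
}

Even with this in place, cache operations between DC1 and DC2 would still depend on the broken direct link, which is the point made above.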
It's not recommended to include nodes that have a slow connection to the rest of the cluster, since they will slow down all operations and make the cluster unstable. In this case it's better to set up two separate clusters and configure replication between them (for example, GridGain provides such a capability: https://www.gridgain.com/products/software/enterprise-edition/data-center-replication).
You can also take a look at SegmentationResolver if you want to implement protection against lost connectivity between cluster segments.
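For illustration, a minimal sketch of such a resolver, assuming a segment is considered valid only while it can still reach some well-known address (the probe host and timeout below are hypothetical, e.g. a gateway in DC3):

import java.io.IOException;
import java.net.InetAddress;
import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.plugin.segmentation.SegmentationResolver;

public class ReachabilityResolver implements SegmentationResolver {
    private static final String PROBE_HOST = "10.0.3.1"; // hypothetical DC3 gateway
    private static final int TIMEOUT_MS = 2_000;

    /** The local segment is valid while the probe host still answers. */
    @Override public boolean isValidSegment() throws IgniteCheckedException {
        try {
            return InetAddress.getByName(PROBE_HOST).isReachable(TIMEOUT_MS);
        }
        catch (IOException e) {
            throw new IgniteCheckedException("Reachability probe failed", e);
        }
    }
}

Such a resolver would be registered via IgniteConfiguration.setSegmentationResolvers(...); note that whether the resolvers are actually evaluated depends on the segmentation support in your Ignite distribution or plugins.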

You can look at the "ZooKeeper cluster segmentation" section of https://apacheignite.readme.io/docs/zookeeper-discovery#section-failures-and-split-brain-handling
You have 3 ZooKeeper nodes in total, and ZooKeeper requires more than half of the nodes (2 nodes in your case) in a still-working segment in order to decide which segments should be shut down. In your architecture that gives three possible segments, each with only one ZooKeeper node. Since no segment would contain more than half of the nodes (2 nodes in your case), I think the result would be all three ZooKeeper nodes shutting down, and then all Ignite nodes shutting down as well, because they can't connect to ZooKeeper.

Related

Client cannot publish because of a firewall issue, despite a full replication factor

Setup: a three-node Kafka cluster running version 2.12-2.3.0. The replication factor is 3, with 20 partitions per topic.
Description:
All three nodes in the Kafka cluster can communicate with each other without issue. An incorrect firewall rule is introduced on the Kafka client's host which blocks the client from communicating with one Kafka node. The client can then no longer publish to any of the Kafka nodes, even though two Kafka nodes are still reachable from the client over the network. We understand this is a network split-brain issue.
Question: Is there a way to configure Kafka so that the Kafka client can communicate with the "surviving" Kafka nodes?
The client can no longer publish to any of the Kafka nodes
That shouldn't happen. The client should only be unable to communicate with the leader partitions on that one node, and it should continue communicating with the leader partitions on the other, reachable nodes.
There are no changes you can make on the server side if the client's host network/firewall is the issue.
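One way to check this is to ask the cluster which broker leads each partition; a hedged sketch using the Kafka AdminClient (the topic name and broker addresses are made up for the example):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class LeaderCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical addresses of the two brokers the client can still reach.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker2:9092,broker3:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("my-topic"))
                .all().get().get("my-topic");
            // Partitions whose leader is the firewalled broker are the ones the
            // client cannot produce to; the rest should keep working.
            desc.partitions().forEach(p ->
                System.out.printf("partition %d -> leader %s%n", p.partition(), p.leader()));
        }
    }
}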

Kafka setup strategy for replication?

I have two VM servers (say S1 and S2) and need to install Kafka in cluster mode, where there will be a topic with only one partition and two replicas (one being the leader and the other a follower) for reliability.
I got the high-level idea from this cluster setup. I want to confirm whether the strategy below is correct.
First, set up ZooKeeper as a cluster on both nodes for high availability (HA). If I set up ZK on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZK in the latest Kafka version as well? It looks like it is a must for older versions (Is Zookeeper a must for Kafka?).
Start the Kafka broker on both nodes. It can be on the same port, as the brokers are hosted on different nodes.
Create a topic on any node with one partition and two replicas.
ZooKeeper will select the broker on one node as leader and the other as follower.
The producer will connect to any broker and start publishing messages.
If the leader goes down, ZooKeeper will select another node as leader automatically. I am not sure how a replication factor of 2 will be maintained now, as there is only one live node.
Is the above strategy correct?
Useful resources
ISR
ISR vs replication factor
First, set up ZooKeeper as a cluster on both nodes for high availability (HA). If I set up ZK on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZK in the latest Kafka version as well? It looks like it is a must for older versions (Is Zookeeper a must for Kafka?).
Answer: Yes, Zookeeper is still a must until KIP-500 is released. Zookeeper is responsible for electing the controller, storing metadata about the Kafka cluster, and managing broker membership (link). Ideally the number of Zookeeper nodes should be at least 3; that way you can tolerate one node failure (2 healthy Zookeeper nodes, a majority of the cluster, are still capable of electing a controller). You should also consider setting up the Zookeeper cluster on machines other than the ones Kafka is installed on, so that the failure of a server won't lead to the loss of both a Zookeeper and a Kafka node.
Start the Kafka broker on both nodes. It can be on the same port, as the brokers are hosted on different nodes.
Answer: You should first start the Zookeeper cluster, then the Kafka cluster. The same port on different nodes is fine.
Create a topic on any node with one partition and two replicas.
Answer: Partitions are used for horizontal scalability; if you don't need that, one partition is okay. With a replication factor of 2, one of the nodes will be the leader and the other will be the follower at any given time. But that is not enough to avoid data loss completely, nor to provide HA. For the ideal configuration that avoids data loss without compromising HA, you should have at least 3 Kafka brokers, a replication factor of 3 for topics, min.insync.replicas=2 as a broker config, and acks=all as a producer config (you can check this for more information).
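As a sketch, creating a topic with those recommended settings via the AdminClient (the topic name and broker address are illustrative; the two-broker setup from the question would only support a replication factor of 2):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "s1:9092"); // hypothetical host
        try (AdminClient admin = AdminClient.create(props)) {
            // 1 partition, replication factor 3, min.insync.replicas=2:
            // writes with acks=all then survive the loss of one broker.
            NewTopic topic = new NewTopic("my-replicated-topic", 1, (short) 3)
                .configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}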
ZooKeeper will select the broker on one node as leader and the other as follower.
Answer: The Controller broker is responsible for maintaining the leader/follower relationship for all the partitions. One broker will be the partition leader and the other will be the follower. You can check partition leaders/followers with this command:
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
The producer will connect to any broker and start publishing messages.
Answer: Yes, setting only one broker in bootstrap.servers is enough to connect to the Kafka cluster, but for redundancy you should provide more than one broker in bootstrap.servers:
bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
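Putting that advice into a minimal producer sketch (host names are placeholders; acks=all pairs with the min.insync.replicas broker setting mentioned above):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical hosts; listing both brokers guards against one
        // being down at the moment of the initial connection.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "s1:9092,s2:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for all in-sync replicas
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-replicated-topic", "key", "value"));
        }
    }
}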
If the leader goes down, ZooKeeper will select another node as leader automatically. I am not sure how a replication factor of 2 will be maintained now, as there is only one live node.
Answer: If the Controller broker goes down, Zookeeper will select another broker as the new Controller. If the broker that is the leader of your partition goes down, one of the in-sync replicas will become the new leader (the Controller broker is responsible for this). But of course, if you have just two brokers and one goes down, replication won't be possible. That's why you should have at least 3 brokers in your Kafka cluster.
Yes - ZooKeeper is still needed in Kafka 2.4, but you can read about KIP-500, which plans to remove the dependency on ZooKeeper in the near future and use the Raft algorithm to form the quorum.
As you already understood, if you install ZK on a single node it will work in standalone mode and you won't have any resiliency. The classic ZK ensemble consists of 3 nodes, which allows you to lose 1 ZK node.
After pointing your Kafka brokers to the right ZK cluster, you can start your brokers and the cluster will be up and running.
In your example, I would suggest you create another node in order to gain better resiliency and meet the replication factor that you want, while still being able to lose one node without losing data.
Bear in mind that using a single partition means you are bound to a single active consumer per consumer group; the rest of the consumers will be idle.
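A hedged sketch illustrating this: with one partition, only one instance of this program per group gets the partition assigned; a second instance started with the same group.id joins the group but stays idle (the topic, group and hosts are made-up names):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "s1:9092,s2:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("my-replicated-topic"));
            // With a single partition, at most one member of "my-group"
            // receives records; extra members get no assignment.
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> System.out.printf("%s=%s%n", r.key(), r.value()));
            }
        }
    }
}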
I suggest you read this blog about Kafka best practices and how to choose the number of topics/partitions in a Kafka cluster.

Kafka cluster with single broker

I'm looking to start using Kafka for a system, and I'm trying to cover all use cases.
Normally it would run as a cluster of brokers on virtual servers (replication factor 3-5), but some customers don't care about resilience, and a broker failure requiring a manual reboot of the whole system is fine with them; they just care about hardware costs.
So my question is: are there any issues with using Kafka as a single-broker system for small installations with low throughput?
Cheers
It's absolutely OK to use a single Kafka broker. Note, however, that with a single broker you won't have a highly available service, meaning that when the broker fails you will have downtime.
Your replication factor will be limited to 1, and therefore all the partitions of a topic will be stored on the same node.
For a proof-of-concept or non-critical dev work, a single-node cluster works just fine. However, having a cluster brings multiple benefits. It's okay to go with a single-node cluster if the following are not important/relevant for you:
scalability [spreading load across multiple brokers to maintain a certain throughput]
fail-over [guarding against data loss in case one or more nodes go down]
availability [the system remains reachable and functioning even if one or more nodes go down]

How to scale Zookeeper with kafka

I am working on scaling the Kafka cluster in prod. Confluent provides an easy way to add Kafka brokers, but how do I know how to scale Zookeeper along with Kafka? What should the ratio be? Right now we have 5 Zookeeper nodes for 5 Kafka brokers. If I have 10 Kafka brokers, how many Zookeeper nodes should there be?
Zookeeper works as a coordination service for Apache Kafka and stores the metadata of the Kafka cluster. A Zookeeper cluster is called an ensemble.
The number of servers in a Zookeeper ensemble is an odd number (3, 5, etc.). This number determines how fault tolerant your cluster is: a three-node ensemble can run with one node missing, and a five-node ensemble can run with two nodes missing while the cluster remains available.
You can add as many Zookeeper servers as you need for the system to stay functional in spite of failures; however, a ZooKeeper cluster of more than 7 nodes is not recommended because of the latency overhead and the amount of communication between the nodes.
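To make the sizing arithmetic explicit: a quorum needs a strict majority, so an ensemble of n nodes tolerates the failure of n minus that majority. A tiny sketch:

public class QuorumMath {
    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5, 6, 7}) {
            int majority = n / 2 + 1; // strict majority of the ensemble
            System.out.printf("ensemble=%d majority=%d tolerated failures=%d%n",
                n, majority, n - majority);
        }
    }
}

This prints that a 4-node ensemble tolerates one failure just like a 3-node one, and a 6-node ensemble tolerates two just like a 5-node one: adding an even node buys no extra fault tolerance, which is why ensembles are kept at odd sizes.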

Running zookeeper on a cluster of 2 nodes

I am currently working on trying to use Zookeeper in a two-node cluster. I have my own cluster-formation algorithm running on the nodes, based on configuration. We only need Zookeeper's distributed DB functionality.
Is it possible to use Zookeeper in a two-node cluster? Do you know of any solutions where this has been done?
Can we still retain Zookeeper's DB functionality without forming a quorum?
Note: fault tolerance is not the main concern in this project. If one of the nodes goes down, we have enough logic in code to run without the Zookeeper service. We use Zookeeper to share data when both nodes are alive.
Would greatly appreciate any help.
Zookeeper is a coordination system, basically used to coordinate among nodes. When writes occur in such a distributed system, in order to coordinate and agree upon the values being stored, all writes go through the leader (a.k.a. master); reads can occur through any node. Zookeeper requires a leader to be elected by a quorum in order to serve write requests consistently, and it uses the ZAB protocol as its consensus algorithm.
To elect a leader, an ensemble should ideally have an odd number of nodes. In your case, with two nodes, the quorum consists of both nodes: the ensemble can only operate while both are up, so it cannot tolerate the failure of either node and will not work properly in network partitioning situations.
As mentioned, Zookeeper is not distributed storage. If you need to use it in a distributed manner (more than one node), it needs to form a quorum.
As I see it, what you need is a distributed database, not a distributed coordination system.