I need to use Kafka Connect to monitor changes to a MongoDB cluster with one primary and 2 replicas.
I see there is the official MongoDB connector, and I want to understand what the connector's behaviour would be if the primary node fails. Will it automatically read from whichever secondary becomes the new primary? I couldn't find information about this in the official docs.
I've seen this post about the tasks.max configuration, which I thought might be relevant to this scenario, but the answer implies that it always defaults to 1.
I've also looked at Debezium's implementation of the connector, which seems to support this scenario automatically:
The MongoDB connector is also quite tolerant of changes in membership
and leadership of the replica sets, of additions or removals of shards
within a sharded cluster, and network problems that might cause
communication failures. The connector always uses the replica set’s
primary node to stream changes, so when the replica set undergoes an
election and a different node becomes primary, the connector will
immediately stop streaming changes, connect to the new primary, and
start streaming changes using the new primary node.
Also, Debezium's version of the tasks.max configuration property states that:
The maximum number of tasks that should be created for this connector.
The MongoDB connector will attempt to use a separate task for each
replica set, [...] so that the work for each replica set can be
distributed by Kafka Connect.
The question is: can I get the same failover behaviour with the official connector as advertised for the Debezium one? For external reasons, I can't use the Debezium connector for now.
In a PSS deployment:
If one node is not available, the other two nodes can elect a primary
If two nodes are not available, there can be no primary
The quote you referenced suggests the connector uses the primary read preference, which means that as long as two of the three nodes are up it will keep working (a new primary can still be elected), and if only one node is up it will not retrieve any data.
Therefore, bring down two of the three nodes and observe whether you are able to query.
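If you want to verify this empirically, a minimal source connector definition along these lines should be enough to watch what happens during a failover (host names, database, collection and connector name below are placeholders; the option names are the ones documented for the official MongoDB source connector):

# submit/refresh the connector via the Connect REST API
curl -X PUT http://localhost:8083/connectors/mongo-source-test/config \
  -H 'Content-Type: application/json' \
  -d '{
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0&readPreference=primary",
        "database": "testdb",
        "collection": "testcoll",
        "topic.prefix": "mongo"
      }'

Then trigger an election with rs.stepDown() on the current primary (or simply stop its mongod) and watch the connector's task status and the output topic to see whether streaming resumes against the new primary.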
How can I find the current controller ID, preferably from the command line, on a Kafka cluster that is using KRaft?
Kafka Version: 3.3
You probably mean the Active Controller ID.
Kafka 3.3 comes with the kafka-metadata-quorum tool.
> bin/kafka-metadata-quorum.sh --bootstrap-server broker_host:port describe --status
ClusterId: fMCL8kv1SWm87L_Md-I2hg
LeaderId: 3002
...
Docs: https://kafka.apache.org/documentation/#kraft_metadata_tool
When using KRaft, a cluster no longer has a single controller. Instead, all nodes in the cluster that run with the "controller" role take part in the controller metadata quorum.
The reason tools report seemingly random IDs is an intentional choice to return a random quorum participant ID in the existing metadata APIs, which helps distribute load evenly across the nodes participating in the quorum.
Underneath, the participants of the metadata quorum maintain a special topic that is replicated with the Raft consensus algorithm. This topic has a leader, and you can get the leader ID of that topic. But it is important to note that this is not equivalent to the controller of a ZooKeeper-backed cluster; that role no longer exists when running with KRaft and, as mentioned, is now shared by several nodes.
You should be able to fetch the current leader ID of a cluster by requesting metadata for the __cluster_metadata topic, or as suggested in another answer by using the kafka-metadata-quorum script.
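For completeness, the same kafka-metadata-quorum tool can also show which quorum member currently leads that internal topic (node IDs and offsets below are just illustrative):

> bin/kafka-metadata-quorum.sh --bootstrap-server broker_host:port describe --replication
NodeId  LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
3002    2503          0    ...                 ...                    Leader
3001    2503          0    ...                 ...                    Follower
3003    2503          0    ...                 ...                    Follower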
In a cluster scenario with a replication factor > 1, why is it that we must always consume from the master/leader of a partition instead of being able to consume from a replica/follower node that contains a replica of that partition?
I understand that Kafka will always route the request to the master node (of that particular partition/topic), but doesn't this affect scalability, since all requests go to a single node? Wouldn't it be better if we could read from any node containing the replica information and not necessarily the master?
Partition leader replicas, from which you write and read data, are evenly distributed among the available brokers, so all requests do not land on a single node. That said, you may also want to leverage the "fetch from closest replica" functionality described in KIP-392, which is available since Kafka 2.4.0.
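Roughly, enabling it takes one broker-side setting and one consumer-side setting (the rack names below are placeholders):

# broker server.properties: declare the broker's rack and enable the rack-aware selector
broker.rack=us-east-1a
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# consumer configuration: declare which rack the consumer runs in
client.rack=us-east-1a

With that in place, a consumer whose client.rack matches a follower's broker.rack may be served by that follower instead of the leader.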
I'm planning to build a Kafka Cluster using two servers, and host Zookeeper on these two servers as well.
The question is: since Kafka requires ZooKeeper to run, what is the best ZooKeeper cluster layout for running a Kafka cluster on two servers?
For example, I'm currently running a ZooKeeper and a Kafka broker on each of the two servers, and in the Kafka configuration the brokers point to both ZooKeepers.
Is there a better way to do this?
First of all, you don't have to set up ZooKeeper and Kafka on the same servers. One of ZooKeeper's roles is electing the controller (the broker which is responsible for maintaining the leader/follower relationship for all the partitions). For an election, the majority of ZooKeeper nodes must be alive. In your case, if even one ZooKeeper instance is down, you cannot elect a controller, so there is no difference between having one ZooKeeper or two. That's why it is recommended to have at least 3 nodes in a ZooKeeper cluster; that way you can handle the failure of one ZooKeeper node.
In addition to this, it is highly recommended to have at least three brokers in your Kafka cluster to maintain both consistency and high availability. (link1, link2)
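For reference, a minimal 3-node ensemble config looks something like this (host names are placeholders):

# zoo.cfg for a 3-node ensemble; any single node can fail
# and the remaining two still form a majority
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888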
UPDATE:
As long as you are limited to only two servers, you can consider sacrificing some availability by setting min.insync.replicas=2 on your brokers and having topics with replication.factor=2. If HA is more important to you than potential data loss, you can instead use the min.insync.replicas=1 (default) broker config, again with topic replication.factor=2. Those are your options in this circumstance, IMHO. (Having one or two ZooKeepers is not important, as I mentioned above.)
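As a sketch of the first variant (topic name and broker address are placeholders), the setting can also be applied as a topic-level override at creation time:

# create a topic replicated across both brokers; writes with acks=all then require both replicas in sync
bin/kafka-topics.sh --bootstrap-server broker1:9092 --create \
  --topic my-topic --partitions 3 --replication-factor 2 \
  --config min.insync.replicas=2

With min.insync.replicas=2 and producers using acks=all, writes stop as soon as one of the two brokers is down; with min.insync.replicas=1 they keep going, at the risk of losing whatever had not yet been replicated.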
I am often faced with the same problem as you, #frisky5, where I would like to achieve a "suboptimal" HA system using only 2 nodes, and thus workarounds are always needed with cloud-native frameworks that rely on the assumption that clusters will have lots of nodes available.
That ain't always the case in real life, is it? ;)
That being said, I see you essentially having 2 options:
Externalize the ZooKeeper data/configuration onto a replicated storage system spanning the 2 nodes (e.g. DRBD)
Replicate the Kafka data volumes entirely onto the second node and use 2 one-node Kafka clusters that you switch on and off depending on which node is currently the master
I would go for the first option. In that case you would have 2 Kafka servers and one ZooKeeper server whose IP needs to be static (a virtual IP). When the ZooKeeper node goes down, it is restarted on the second node with the same VIP, but it needs access to the synchronized data folder.
I am not too familiar with ZooKeeper's internals and I can't tell you whether it will run into conflicts when starting up on a data store that "wasn't its own", but I would guess it makes sense for you to test it using a simple rsync setup.
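As a very rough sketch of such a test (the paths assume the default ZooKeeper dataDir, adjust them to your layout):

# run periodically (or on change) on the active node; node2 starts ZooKeeper
# on the synced folder when it takes over the VIP
rsync -az --delete /var/lib/zookeeper/ node2:/var/lib/zookeeper/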
Another way to achieve consensus, if you are using a k3s-based Kubernetes cluster, would be to rely on the internal k8s distributed consensus mechanics to "tell Kafka" which node is the leader. This works for the Postgres operator by Crunchy Data because Patroni is cool ( https://patroni.readthedocs.io/en/latest/kubernetes.html ) 😎, but I am not sure whether Kafka/ZooKeeper are that flexible and can communicate with a REST API to set their locks...
Once you have achieved this intermediate step, you can use a PostgreSQL DB as the external source of truth for k3s, and then it is as simple as syncing the Postgres data folder between the machines (easily done with rsync). The beauty of this approach is that it is much more generic and could be used for other systems too.
Let me know what you think about these two approaches and whether you manage to set up a test environment. If you do it on GitHub, I can help you out with the implementation.
For a cross-network Confluent Platform, we have one Kafka cluster on-premise and another on AWS, with data replicated from on-prem to AWS using MirrorMaker. Both clusters are independent, with their own Schema Registry, REST Proxy and Connect. Both clusters have a different set of producers and consumers, and selected topics are mirrored between the clusters.
What is the best practice for deploying Schema Registry? Should we have one master (say on-premise) and the others as non-eligible masters on on-prem and AWS?
We suspect Schema Registry can have issues with schema IDs when topics are replicated between clusters and we have 2 masters (AWS and on-prem).
Thanks!
If you use two different master registries, I find that would be difficult to manage. (See mistake #2 for self-managed registries.) The purpose of master.eligibility=false on a second instance/cluster is that all ID registration events have a single source of truth. As the docs say, "The Schema Registry nodes in both datacenters link to the primary Kafka cluster in DC A", so you would need to establish a valid network link between AWS and on-prem anyway.
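As a sketch of what the AWS-side registry could look like in that layout (host names are placeholders; newer Schema Registry releases spell the last property leader.eligibility instead):

# schema-registry.properties on the AWS instance(s)
listeners=http://0.0.0.0:8081
# point kafkastore at the primary (on-prem) Kafka cluster so every registry shares one _schemas topic
kafkastore.bootstrap.servers=PLAINTEXT://onprem-broker1:9092,PLAINTEXT://onprem-broker2:9092
kafkastore.topic=_schemas
# this instance can never become the master, so all ID registration stays in the primary datacenter
master.eligibility=false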
Otherwise, with multiple masters, you will need to mirror the schemas topic if you want exactly the same subjects and schema IDs between environments. However, that setup is primarily meant to be used as a backup, and you would eventually run into conflicting schema IDs for any producer in the destination region pushing schemas to the other master. Hence the first diagram shows only consumers in the remote datacenter.
If you do not do this, then let's say you mirrored a topic from cluster A to cluster B and the consumer used registry B in its settings: it would try to look up the ID embedded in the message (which was assigned by registry A) against registry B, and that ID either would not exist there or would point at the wrong schema for the topic being read.
I wrote a Kafka Connect plugin to work around that issue by registering a new ID in a remote master registry: https://github.com/cricket007/schema-registry-transfer-smt . However, you said you're using MirrorMaker, so you would need to take the logic there and apply it to the MessageHandler interface in MirrorMaker.
I've really only worked with one master, on-prem; in AWS, the registry settings have the ZooKeeper connection pointing at the on-prem cluster.
And we don't mirror everything as the docs suggest, only specific topics. The advantage of using Replicator rather than MirrorMaker is that consumer failover is better supported; rather than simply getting data "over the wire", your clients are also less dependent on where they are running.
We want to use the Debezium MongoDB Kafka source connector against a replica set secondary node rather than the primary (at least to start with, to be sure that we are not affecting the main functionality in any way).
The Debezium MongoDB tutorial says that "the connector always uses the replica set's primary node to tail the oplog". However, it looks like setting auto.discovery to false and specifying the secondary node in the connector config makes the connector tail the oplog from the secondary node just fine.
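For reference, a configuration along these lines is what we mean (connector name, host and logical name are placeholders, and the property names are spelled as in the Debezium 1.x docs):

curl -X PUT http://localhost:8083/connectors/mongo-secondary-test/config \
  -H 'Content-Type: application/json' \
  -d '{
        "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
        "mongodb.name": "myapp",
        "mongodb.hosts": "rs0/mongo-secondary-1:27017",
        "mongodb.members.auto.discover": "false"
      }'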
So the first question: are we right about this (and it's not that the Debezium connector "under the hood" somehow finds its way to the primary node)?
If, indeed, the oplog is tailed from the secondary node (as we want it to be), are there ways to switch to another secondary node automatically if the original one fails?
Thank you.
The MongoDB connector will currently always connect to the primary node of the replica set. Could you open a feature request in our JIRA tracker for optionally reading from secondary nodes? Any help with implementing it will of course be welcome, too.