Connecting Storm with a remote Kafka cluster - what would happen if new brokers are added?

We are working on an application that uses Storm to pull data from a remote Kafka cluster. As the two clusters lie in different environments, there is an issue with network connectivity between them. In simple terms, by default the remote ZooKeeper and Kafka brokers do not allow connections from our Storm worker/supervisor nodes; firewall access has to be granted for that.
My concern is: what would happen if new brokers or ZooKeeper nodes are added in the remote cluster? I understand that we don't have to specify all the ZK nodes in order to consume, but say they add a few brokers and we need to consume from a partition which is served by those new nodes. What would be the impact on the running Storm application?

Related

Using Kafka Connect in distributed mode, where are internal topics supposed to exist

As a follow-up to my previous question here, Attempting to run Kafka Connect in distributed mode locally, problem with internal topics, I have started to figure out what might really be going on (I'm learning Kafka as I go).
Kafka Connect, one way or another, requires three internal topics: config, offset, and status. Are these topics supposed to exist in the Kafka cluster that I am consuming data from? For context: someone else has a Kafka cluster set up with topics for me to consume. I spin up a Kafka Connect cluster on my local machine (to test), and this local instance (we'll call it that going forward) then connects to the remote Kafka cluster (we'll call it the remote cluster) by way of me typing in the bootstrap servers, some callback handler classes, and a connect.jaas file.
Do these three topics need to already exist on the remote cluster? So far I have been trying to create them on my own broker in my local instance, but through continued research I'm seeing that maybe these three internal topics need to be on the remote cluster (where I'm getting my data from). Does the owner of the remote Kafka cluster need to create these three topics for me? Where exactly would they create them? What if their cluster is not a Kafka Connect cluster specifically?
The topics need to be created on the cluster defined by bootstrap.servers in the Connect worker properties. This can be local or remote, depending on where you actually want the connector tasks to send/receive data. Individual Connect tasks cannot override which brokers are used (it is not possible, for example, to use a source connector to write to multiple Kafka clusters).
Recent versions of Kafka Connect will automatically create those internal topics if they are authorized to do so. Otherwise, yes, they'll need to be created using kafka-topics --create with appropriate partition counts and replication factors.
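For reference, a sketch of creating those three topics by hand. The topic names below are just the common conventions, not requirements; whatever names you pick must match the worker's config.storage.topic, offset.storage.topic, and status.storage.topic settings. The partition counts are the values typically recommended in the Connect documentation, and all three topics must be compacted (the config topic must have exactly one partition):

```shell
# Placeholder names and broker address; adjust to your worker config.
kafka-topics --create --bootstrap-server broker1:9092 \
  --topic connect-configs --partitions 1 --replication-factor 3 \
  --config cleanup.policy=compact

kafka-topics --create --bootstrap-server broker1:9092 \
  --topic connect-offsets --partitions 25 --replication-factor 3 \
  --config cleanup.policy=compact

kafka-topics --create --bootstrap-server broker1:9092 \
  --topic connect-status --partitions 5 --replication-factor 3 \
  --config cleanup.policy=compact
```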
If your data exists in a remote Kafka cluster, the only reason to run a local instance as well would be, for example, if you want to use MirrorMaker to replicate it.
What if their cluster is not a Kafka Connect cluster specifically?
Unclear what this means. Kafka Connect is a client, just like a Kafka Streams app or a normal producer or consumer. It doesn't store topics itself.

In Kafka Connect, how to connect with multiple kafka clusters?

I set up the Kafka Connect cluster in distributed mode, and I want to get connections to multiple Kafka CLUSTERS, not just multiple brokers.
Target brokers can be set with bootstrap.servers in connect-distributed.properties.
So, at first, I set broker1 from kafka-cluster-A like below:
bootstrap.servers=broker1:9092
Absolutely, it worked well.
And then, I added broker2 from kafka-cluster-B like below:
bootstrap.servers=broker1:9092,broker2:9092
So, these two brokers are in different clusters.
And this didn't work at all.
There was no error; it just hung, and requests such as creating a connector through the REST API got no response.
How can I connect with multiple kafka clusters?
As far as I know, you can only connect a Kafka Connect worker to one Kafka cluster.
If you have data on different clusters that you want to handle with Kafka Connect then run multiple Kafka Connect worker processes.
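In practice that means one worker configuration per cluster, each with its own bootstrap.servers, group.id, and internal topic names, started as separate connect-distributed processes. A minimal sketch, with placeholder hostnames and names:

```properties
# worker-a.properties: Connect worker for kafka-cluster-A
bootstrap.servers=broker1:9092
group.id=connect-cluster-a
config.storage.topic=connect-configs-a
offset.storage.topic=connect-offsets-a
status.storage.topic=connect-status-a

# worker-b.properties: Connect worker for kafka-cluster-B
bootstrap.servers=broker2:9092
group.id=connect-cluster-b
config.storage.topic=connect-configs-b
offset.storage.topic=connect-offsets-b
status.storage.topic=connect-status-b
```

Each worker then only ever talks to its own cluster, and connectors are created against whichever worker's REST API fronts the cluster holding the data.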

Kafka cluster continues to run without Zookeeper

I have a five-node Kafka cluster (Confluent 5.5 community edition) with 3 ZooKeeper nodes, each on different AWS instances.
While doing failover testing, I noticed that the Kafka cluster keeps working even if all ZooKeeper nodes are down.
I was able to produce, consume, and also create new consumers.
Why does the Kafka cluster not stop if it cannot connect to any ZooKeeper nodes?
What would be the possible issues if we are unaware of such a failure scenario in production and the Kafka cluster continues to run without ZooKeeper connectivity?
How do we handle such a scenario?
Broker leader election, topic creation, and simple ACLs (if you use them) still depend on ZooKeeper. Other basic functions that rely on the Kafka bootstrap protocol might still work, sure. There should definitely be broker logs indicating that the connection was lost.
Ideally you'd have basic process health checking and incident management software so that you don't miss critical services going down in prod.
How to handle? Restart Zookeeper...
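As a starting point for that kind of health checking, ZooKeeper's built-in four-letter-word commands can be polled from a cron job or monitoring agent. Note that on ZooKeeper 3.5+ the command must be whitelisted via 4lw.commands.whitelist; the hostnames below are placeholders:

```shell
# A healthy ZooKeeper node answers "imok" to the "ruok" command.
for zk in zk1 zk2 zk3; do
  resp=$(echo ruok | nc -w 2 "$zk" 2181)
  if [ "$resp" != "imok" ]; then
    echo "ALERT: ZooKeeper on $zk is not healthy" >&2
  fi
done
```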

Client cannot publish because of firewall issue despite full replication factor

Setup: three-node Kafka cluster running version 2.12-2.3.0. Replication factor is 3, with 20 partitions per topic.
Description:
All three nodes in the Kafka cluster can communicate among themselves without issue. An incorrect firewall rule is introduced on the Kafka client side which blocks the client from communicating with one Kafka node. The client can no longer publish to any of the Kafka nodes, even though two Kafka nodes are still network-reachable from the client. We understand this is a network split-brain issue.
Question: is there a way to configure Kafka so that the Kafka client can communicate with the "surviving" Kafka nodes?
The client can no longer publish to any of the Kafka node
That shouldn't happen. The client should only be unable to communicate with the leader partitions on that one node, and it should continue communicating with the leader partitions on the other, reachable nodes.
There are no changes you could make on the server-side if the client's host network/firewall is the issue.
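You can confirm which partitions are affected by checking the leader assignment: every partition whose Leader column shows the unreachable broker's id will fail to accept produce requests from that client, while the rest should keep working. The topic name below is a placeholder:

```shell
# The "Leader" column shows which broker id leads each partition.
kafka-topics --bootstrap-server broker1:9092 --describe --topic my-topic
```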

2 clusters of ZooKeeper servers in a Hadoop + Kafka cluster - is it possible?

We have a Kafka cluster with the following details:
3 Kafka machines
3 ZooKeeper servers
We also have a Hadoop cluster that includes datanode machines.
And all applications are using the ZooKeeper servers, including the Kafka machines.
Now, we want to make the following changes:
We want to add an additional 3 ZooKeeper servers that will form a separate cluster,
and only the Kafka machines will use these additional ZooKeeper servers.
Is this possible?
Yes, this can be achieved: edit ha.zookeeper.quorum in the Hadoop configuration to be separate from zookeeper.connect in the Kafka configuration, such that you have two individual ZooKeeper clusters.
However, I don't think Ambari or Cloudera Manager, for example, allow you to view or configure more than one ZooKeeper cluster at a time.
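Concretely, the separation would look something like this, with Hadoop still pointing at the original ensemble and Kafka pointing at the new one (all hostnames are placeholders):

```xml
<!-- core-site.xml on the Hadoop side: the original ZooKeeper ensemble -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```

```properties
# server.properties on each Kafka broker: the new, Kafka-only ensemble
zookeeper.connect=zk-kafka1:2181,zk-kafka2:2181,zk-kafka3:2181
```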
Yes, that's possible. Kafka uses Zookeeper to perform various distributed coordination tasks, such as deciding which Kafka broker is responsible for allocating partition leaders, and storing metadata on topics in the broker.
After shutting down Kafka, the original ZooKeeper cluster's data can be copied to the new cluster using a tool; zkcopy is a ZooKeeper cluster data transfer utility for this.
But if your Kafka cluster can't be stopped, you need to think about how to transfer the ZooKeeper data to the additional ZooKeeper servers while it is live.