Kafka: Killing a consumer's connection - apache-kafka

With EMS it is possible to see all connections to a particular EMS server, and kill any unwanted connections.
As far as I can tell, I have an unwanted process somewhere that is subscribing to my Kafka topic with the same consumer name as my process.
Therefore, my process is not receiving any messages and I don't know where this "rogue" process is located.
Is there any command I can run to kill such connections?
I am running Kafka 0.9.

If you use Confluent Control Center, you can see each consumer group and all of the clients in it. That might help you identify the "rogue" consumer.
Otherwise, you might have to just pick a new group id so that it won't matter what the other client is subscribing to (because it will be in a different consumer group).
It sounds like you should also configure some security and ACLs so that rogue apps can't authenticate and subscribe to topics they are not allowed to access.
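If you don't have Control Center, the consumer-group tooling that ships with Kafka can also show you who is in the group. A minimal sketch, assuming a broker at localhost:9092 and the shared group id my-group (both placeholders); on 0.9 the new-consumer variant of the command also needs the --new-consumer flag, and old ZooKeeper-based consumers are inspected with --zookeeper instead:
    # List all consumer groups known to the brokers
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

    # Describe the group that is being "stolen": the output includes the
    # owner/client id and host of each member, which should point you at
    # the rogue process (exact columns vary by Kafka version)
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
        --describe --group my-group
Note that Kafka (certainly 0.9) has no built-in command to forcibly kill a single consumer's connection; once you have located the host, you stop the process there, or lock it out with ACLs or a new group id as described above.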

Related

Kafka has multiple bootstrap servers, but we connect to only one of them. Can we still publish data to it?

Kafka has multiple bootstrap servers, e.g. b1.com, b2.com, and b3.com. In the producer configuration we are passing only b1.com as the bootstrap server. What will happen once we publish data to Kafka?
To my knowledge, it should not allow publishing the data if b1.com is not the leader, as Kafka only allows publishing data through the leader. Please guide me.
Even if b1.com is not the leader, you would still be able to publish data successfully. The reason is that once you connect to any one server, the client fetches the complete metadata for your topic (partitions, their respective leaders, etc.) and then sends each write directly to the relevant partition leader.
That being said, it is still recommended to provide all the servers. The reason is the scenario where b1.com goes down: since you provided only one server to your producer, it will not be able to connect to Kafka and your system effectively goes down.
On the other hand, if you had provided all the servers and your topic was replicated, the system would still be functional even if b1.com had gone down.
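As a quick, hedged illustration with the console producer (broker and topic names are placeholders taken from the question), listing all brokers only affects the initial bootstrap; after the metadata fetch the producer writes directly to each partition leader:
    # Providing the full list lets the producer bootstrap even if b1.com is down.
    # Newer releases call this flag --bootstrap-server instead of --broker-list.
    bin/kafka-console-producer.sh \
        --broker-list b1.com:9092,b2.com:9092,b3.com:9092 \
        --topic my-topic
The same idea applies to the bootstrap.servers setting of any producer client.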

Kafka - Error on specific consumer - Broker not available

We have deployed multiple Kafka consumers in container clusters. All are working properly except for one, which is throwing the warning "Connection to node 0 could not be established. Broker may not be available". This warning appears in only one of the containers, and that consumer runs on the same network and server as the others, so I have ruled out issues with the Kafka server configuration.
I tried changing the group id of the consumer and it worked for a few minutes, but now the warning is appearing again. I can consume all of the topics used by this consumer from a bash shell.
Taking the above context into account, I think it could be due to bad practices in the consumer code, or to the offsets having been damaged. How could I identify either of these from the Kafka logs?
You can exec into the container and netcat the broker's advertised addresses to verify connectivity.
You can also use the Kafka shell scripts to verify consuming functionality, as always.
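For instance, a rough sketch of both checks run from inside the affected container (via docker exec or kubectl exec, whichever applies), assuming the broker's advertised address is kafka-broker:9092 and the topic is my-topic, both placeholders:
    # 1. Check raw TCP connectivity to the advertised listener
    nc -vz kafka-broker 9092

    # 2. Check that the topic can actually be consumed with the stock tooling
    bin/kafka-console-consumer.sh --bootstrap-server kafka-broker:9092 \
        --topic my-topic --from-beginning --max-messages 5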
Corrupted offsets would prevent any consumer from reading, not only one, and bad coding practices wouldn't show up in the logs.
If you have the container running "on the same server as the others", I'd suggest working with affinity rules and constraints to spread your applications across multiple servers rather than placing them on the same machine.

What happens to consumer groups in Kafka if the entire cluster goes down?

We have a consumer service that is always trying to read data from a topic using a consumer group. Due to redeployments, our Kafka cluster is periodically brought down and recreated.
Whenever the cluster comes back up, we observe that although the previous topics are picked up (probably from ZooKeeper), the previous consumer groups are not recreated. Because of this, our running consumer process, which was created with a previous consumer group, gets stuck and never recovers.
Is this how the behavior of the consumer groups should be or is there a configuration we need to enable somewhere?
Any help is greatly appreciated.
Kafka brokers keep track of healthy consumers and consumer groups. If the entire cluster is destroyed and recreated, it no longer has any knowledge of those consumers and groups, including their committed offsets, so the consumers will have to reconnect, re-establish the group, and start reading from the beginning of the topic (subject to their auto.offset.reset setting).
Operationally, it makes more sense to keep the Kafka cluster running long-term and do version upgrades in a rolling fashion so you don't interrupt the service.
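As a rough illustration (host, topic, and group names are placeholders), you can confirm that the old group is gone after the cluster is recreated, and control where a rejoining consumer starts from with auto.offset.reset:
    # After the recreate, the previous group simply no longer exists
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

    # A consumer that rejoins with the same group id has no committed offsets,
    # so its starting position is governed by auto.offset.reset
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic my-topic \
        --consumer-property group.id=my-group \
        --consumer-property auto.offset.reset=earliest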

Kafka: What happens when the entire Kafka Cluster is down?

We're testing out the Producer and Consumer using Kafka. A few questions:
What happens when all the brokers are down and they're not responding at all?
Does the Producer need to keep pinging the Kafka brokers to know when they are back up online? Or is there a more elegant way for the Producer application to know?
How does Zookeeper help in all this? What if the ZK is down as well?
If one or more brokers are down, the producer will retry for a certain period of time (based on its settings), and during this time one or more consumers will not be able to read anything until the respective brokers are back up.
But if the cluster is down for longer than your total retry period, then you will probably need to find a way to resend those failed messages.
This is one scenario where Kafka mirroring (the MirrorMaker tool) comes into the picture.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
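To illustrate the retry window, here is a hedged sketch using the console producer (broker, topic, and values are placeholders); the same settings apply to any producer through its configuration:
    # The producer keeps retrying failed sends while brokers are unreachable;
    # if the outage outlasts the overall retry/blocking window, the send fails
    # and the application has to resend the message itself.
    bin/kafka-console-producer.sh \
        --broker-list b1.com:9092,b2.com:9092 \
        --topic my-topic \
        --producer-property retries=10 \
        --producer-property retry.backoff.ms=1000 \
        --producer-property max.block.ms=60000
Newer clients also expose delivery.timeout.ms, which caps the total time a send may spend retrying.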
The producer will fail because the cluster is unavailable. This means it will eventually get an error back from the Kafka client implementation and, depending on how your client handles it, messages will buffer in your application's local send queue.
I'm sure that if ZooKeeper is down your system will not work anymore. This is one of the weaknesses of Kafka: it needs ZooKeeper to work.

Understanding Kafka broker vs ZooKeeper

I notice that when sending messages to Kafka (as a producer), the samples show connecting to port 9092, i.e. writing directly to a broker. When consuming, the examples show connecting to port 2181, presumably ZooKeeper.
The latter makes sense: I want to read from "the cluster", letting ZooKeeper figure out which broker the client should communicate with and manage such things as knowing who is alive or dead in the cluster.
Why wouldn't publishes/writes work the same way, i.e. write to "the cluster" (via ZooKeeper)?
Am I understanding this correctly, that for producing I'm bypassing ZooKeeper (cluster knowledge) and must know the broker nodes directly (and presumably figure out what to do if one fails)?
The "high-level consumer" in Kafka uses ZooKeeper to keep track of which partitions each member of a consumer group is consuming, and sometimes to track which offsets were read in which partition. Since access to ZooKeeper is required anyway, we may as well use it to figure out where the brokers are...
In the new consumer (coming soon in the next release), ZooKeeper is no longer needed, and consumers connect directly to the brokers, just like producers currently do.
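For example (host and topic names are placeholders), the old high-level console consumer bootstrapped through ZooKeeper on 2181, whereas the new consumer talks straight to a broker on 9092, the same way the producer does; on the 0.9/0.10 tooling the second form also requires the --new-consumer flag:
    # Old ZooKeeper-based high-level consumer
    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic my-topic

    # New consumer: connects directly to the brokers, no ZooKeeper needed
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic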