Kafka consumer Behaviour - apache-kafka

I am writing a Kafka consumer and have a question about consumer processes.
I have a consumer with groupId="testGroupId", and with that same groupId I consume from multiple topics, say "topic1" and "topic2".
Assume "topic1" already exists on the broker, whereas "topic2" has not been created yet.
When I start the consumer, I see consumer threads for "topic1" (which already exists) in the ZooKeeper nodes, but I do not see any consumer threads for "topic2".
My question is: will the consumer threads for "topic2" be created only after we create the topic on the broker?

I assume you are using a Kafka ConsumerConnector method such as createMessageStreamsByFilter. In that case the consumer watches for topic events, and when a new topic that matches the filter is created, it subscribes to that topic automatically.
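To make the filter behaviour concrete, here is a minimal sketch in plain Python. It does not call any Kafka API; it only models the idea that a whitelist-style filter is re-evaluated against the set of existing topics. The function name matching_topics and the whitelist string are hypothetical, and treating the filter as a full-match regular expression is an assumption.

```python
import re


def matching_topics(whitelist, existing_topics):
    """Return the existing topics whose full name matches the whitelist regex."""
    pattern = re.compile(whitelist)
    return sorted(t for t in existing_topics if pattern.fullmatch(t))


# Hypothetical filter covering both topics from the question.
whitelist = "topic1|topic2"

# Only topic1 exists when the consumer starts.
before = matching_topics(whitelist, {"topic1"})

# Later topic2 is created; re-evaluating the filter now picks it up.
after = matching_topics(whitelist, {"topic1", "topic2"})

print(before)
print(after)
```

In this model nothing is consumed from "topic2" until the topic exists, which matches what the question observes in ZooKeeper.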

Related

Kafka consumer groups still exist after the ZooKeeper and Kafka servers are restarted

I'm using ZooKeeper and Kafka for a messaging use case in Java. I thought consumer group details would be removed when the ZooKeeper and Kafka servers are restarted, but they are not. Does ZooKeeper keep consumer group details in some kind of file?
Should I remove consumer group details manually if I want to reset the consumer groups?
Can anyone clarify this for me?
Since Kafka 0.9, consumer offsets are stored directly in Kafka, in an internal topic called __consumer_offsets.
Consumer offsets are preserved across restarts and are kept for at least offsets.retention.minutes (7 days by default since Kafka 2.0, 1 day in earlier versions).
If you want to reset a consumer group, you can do either of the following:
Use the kafka-consumer-groups.sh tool with the --reset-offsets option.
Use AdminClient.deleteConsumerGroups() to delete the consumer group entirely.

How can I run Kafka Consumer processor instance on multiple nodes with Apache Nifi

We currently use Apache NiFi to consume messages with a Kafka consumer processor. The output of the Kafka consumer is connected to a Hive processor.
I'm looking into how to run the Kafka consumer on a NiFi cluster.
I have a NiFi cluster with 3 nodes and a Kafka topic with 3 partitions. I want the Kafka consumer to run on every node so that each consumer polls messages from one of the topic's partitions.
After starting the Kafka consumer processor, I can see that the consumer always runs on a single node rather than on all nodes.
Is there any configuration that I missed?
NiFi uses the Apache Kafka client, which is what assigns partitions to consumers. When you start the processor, assuming it is set to 1 concurrent task, you should have 1 consumer on each node of your cluster, and each consumer should be assigned a different partition.
https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
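The expected outcome can be illustrated with a simplified model of how partitions are spread over the members of a consumer group. This is a sketch only, not the Kafka client's actual assignor: the function assign_partitions and the node names are hypothetical, and plain round-robin over sorted member names is an assumption made for illustration.

```python
def assign_partitions(consumers, partitions):
    """Spread partitions over the sorted group members in round-robin order."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical names: one consumer per NiFi node (1 concurrent task each).
nodes = ["nifi-node-1", "nifi-node-2", "nifi-node-3"]

# 3 partitions over 3 consumers: each consumer owns exactly one partition.
print(assign_partitions(nodes, [0, 1, 2]))

# 1 partition over 3 consumers: two consumers stay idle.
print(assign_partitions(nodes, [0]))
```

In this model a consumer only sits idle when there are more consumers in the group than partitions, so with 3 nodes and 3 partitions every node should receive data.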

Is it possible to kill a consumer from the Kafka server?

When I check the consumer lag, it shows that a particular consumer-id is running on a particular host and consuming from a topic.
But when I go to that host, there is no such consumer running.
How do I kill this consumer-id so that I can reset the consumer offsets for the group it is part of?
Kafka server version: 0.11.0.1
Kafka client version(stuck): 0.10.0.2
This consumer-id got stuck in the first place because it could not consume messages: some of the messages in Kafka have headers.
I've already tried the following:
Consuming from a different host with a different Kafka version. It consumes messages, but the consumer-id and host do not change.
Restarting the Kafka broker that is the leader for that topic.
Changing security groups to prevent the host from connecting to my broker.
Perhaps what you see is not a consumer id but a consumer group; see the consumer configuration section of the Kafka docs to learn about the difference.
Kafka uses consumer groups to keep track of the last consumed message (the consumer offset), so when you are looking at consumer lag, this is probably the explanation.
If so, there is no consumer running and you only need to get rid of the consumer offsets for this group. See e.g. "How do I delete a Kafka Consumer Group to reset offsets?"
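Lag is derived purely from stored offsets, which is why it can still be reported for a group with no running consumer. The following sketch shows the arithmetic, assuming lag is the log end offset minus the group's committed offset for each partition. The function name and the offset values are made up for illustration.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed offset."""
    return {
        partition: end - committed_offsets[partition]
        for partition, end in log_end_offsets.items()
    }


# Made-up offsets for a topic with two partitions.
log_end_offsets = {0: 1500, 1: 900}
committed_offsets = {0: 1200, 1: 900}

lag = consumer_lag(log_end_offsets, committed_offsets)
print(lag)
```

Nothing in this calculation depends on a live consumer process; it only needs the committed offsets that Kafka keeps for the group.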

Two Kafka consumers in the same group and one partition

I have a Kafka topic with one partition. When I created the consumers I did not set group.id, and there are 2 different consumers for the same partition.
I can receive messages, and my Flink source extends FlinkKafkaConsumer011.
Ideally only one consumer should receive messages since they are in the same group, but in my case both consumers receive messages, and I am not sure why.
As a test I restarted one of my jobs, and the newly started consumer does not resume from where it left off, apparently because the other consumer is committing offsets.
Flink 1.6.0
Kafka 0.11.2

Kafka Topic Distribution among brokers

When creating topics, can we determine which broker will be the leader for the topic? Are topics balanced across brokers in Kafka? (Assume the topics have just one partition.)
Kafka manages this internally, and in general you don't need to worry about it: http://kafka.apache.org/documentation/#basic_ops_leader_balancing
If you create a new topic, Kafka selects the brokers for it automatically and spreads partition replicas evenly across the cluster (the default assignment is not based on current broker load). If a topic has only one partition, it is hosted on a single broker (plus followers if you have multiple replicas), because a partition cannot be split over multiple brokers in Kafka.
Nevertheless, you can find out which broker hosts which topic, and you can also "move" topics and partitions: http://kafka.apache.org/documentation/#basic_ops_cluster_expansion
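The idea that one partition lives on one leader broker plus followers can be sketched as follows. This is a deliberately simplified model and not Kafka's real assignment algorithm: the function place_replicas, the broker ids, and the fixed start_index are assumptions for illustration. The first broker in each list stands for the preferred leader.

```python
def place_replicas(brokers, num_partitions, replication_factor, start_index=0):
    """Assign each partition a leader and followers by walking the broker list."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        first = (start_index + partition) % len(brokers)
        # The first entry is the preferred leader, the rest are followers.
        placement[partition] = [
            brokers[(first + offset) % len(brokers)]
            for offset in range(replication_factor)
        ]
    return placement


brokers = [1, 2, 3]  # hypothetical broker ids

# A single-partition topic ends up on one leader plus one follower.
single = place_replicas(brokers, num_partitions=1, replication_factor=2)
print(single)

# With three partitions the leaders are spread over all three brokers.
spread = place_replicas(brokers, num_partitions=3, replication_factor=2)
print(spread)
```

To choose the leader yourself rather than rely on automatic placement, the usual route is an explicit replica assignment or a partition reassignment, as covered in the cluster expansion section linked above.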