Best way to run multiple Kafka console consumers?

I write consumed Kafka messages to a file (backup.log). To do this, I created a service on my CentOS host which runs:
kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic test --consumer.config consumer.properties >> /backup/backup.log
I have two Linux hosts. Each host runs one Kafka consumer, and each consumer belongs to its own consumer group, so I have two consumer groups.
This way, if one consumer is down, the other consumer still has all the messages, since it belongs to a different consumer group.
Currently, I have only one topic with four partitions.
Question: I now have to consume multiple topics. I have read that it is recommended to add consumers (i.e., parallelize) rather than use a single consumer for too many topics/partitions.
What is the best way to achieve this? Duplicate the service which runs kafka-console-consumer.sh, as sketched below, or is there another way?
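For illustration, duplicating the service would mean one console consumer per topic, each appending to its own file (the topic names and paths below are placeholders):
kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic topic-a --consumer.config consumer.properties >> /backup/topic-a.log
kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic topic-b --consumer.config consumer.properties >> /backup/topic-b.log
Alternatively, one console consumer can subscribe to several topics at once by passing a regex via --whitelist (renamed --include in newer releases), at the cost of interleaving all topics in a single output file.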

Related

Unable to describe Kafka Streams Consumer Group

What I want is to be sure that my Kafka Streams consumer does not have lag.
I have a simple Kafka Streams application that materializes one topic as a store in the form of a GlobalKTable.
When I try to describe the consumer group on Kafka with the command:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-application-id
I see no results, and there is no error either. When I list all groups with:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
my application's consumer is listed correctly.
Any idea where to find additional information about why I can't describe this consumer group?
(Any other Kafka Streams consumers that write to topics can be described correctly.)
If your application only materializes a topic into a GlobalKTable, no consumer group is formed. Internally, the "global consumer" does not use subscribe() but assign(), there is no consumer group.id configured (as you can verify from the logs), and no offsets are committed.
The reason is that all application instances need to consume all topic partitions (i.e., a broadcast pattern). A consumer group, however, is designed so that different instances read different partitions of the same topic. Also, per consumer group only one offset can be committed per partition; if multiple instances read the same partition and committed offsets using the same group.id, the commits would overwrite each other.
Hence, using a consumer group while "broadcasting" data does not work.
However, all consumers expose the lag metrics records-lag-max and records-lag (cf. https://kafka.apache.org/documentation/#consumer_fetch_monitoring), so you should be able to hook in via JMX to monitor the lag. Kafka Streams also exposes its clients' metrics via KafkaStreams#metrics().
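For example, here is a minimal sketch of reading those lag metrics in-process through KafkaStreams#metrics(); the LagPrinter class name is made up, and it assumes you already have a running KafkaStreams instance to pass in:
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public class LagPrinter {
    // print the fetch-lag metrics of all embedded clients, including the "global consumer"
    public static void printLag(KafkaStreams streams) {
        for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if (name.name().equals("records-lag-max") || name.name().equals("records-lag")) {
                System.out.println(name.group() + " " + name.tags() + " = " + entry.getValue().metricValue());
            }
        }
    }
}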

Kafka consumer groups still exist after the ZooKeeper and Kafka servers are restarted

I'm using ZooKeeper and Kafka for a messaging use case in Java. I thought the consumer group details would be removed when the ZooKeeper and Kafka servers are restarted, but they aren't. Does ZooKeeper keep consumer group details in some kind of file?
Should I remove the consumer group details manually if I want to reset the consumer groups?
Can anyone clarify this for me?
Since Kafka 0.9, Consumer Offsets are stored directly in Kafka in an internal topic called __consumer_offsets.
Consumer Offsets are preserved across restarts and are kept for at least offsets.retention.minutes (7 days by default).
If you want to reset a Consumer Group, you can:
use the kafka-consumer-groups.sh tool with the --reset-offsets option
use AdminClient.deleteConsumerGroups() to fully delete the Consumer group
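For example, to rewind a group to the beginning of a topic (group and topic names are placeholders):
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --topic my-topic --reset-offsets --to-earliest --execute
And a minimal sketch of deleting a group programmatically; the DeleteGroup class name and the group name are made up:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DeleteGroup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // the group must have no active members, or the deletion will fail
            admin.deleteConsumerGroups(Collections.singleton("my-group")).all().get();
        }
    }
}
Note that resetting offsets likewise requires the group to be inactive (no running consumers).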

Kafka topics not created empty

I have a Kafka cluster consisting of 3 servers, all connected through ZooKeeper. But when I delete a topic that contains some data and then create a topic with the same name again, the offsets do not start from zero.
I tried restarting both Kafka and ZooKeeper, and deleting the topics directly from ZooKeeper.
What I expect is a clean topic when I create it again.
I found the problem: a consumer was still consuming from the topic, so the topic was never actually deleted. I used this tool, which provides a GUI that let me inspect the topics easily: https://github.com/tchiotludo/kafkahq. In any case, the active consumer groups can be listed with:
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
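Once you have the group name, describing it shows which topics and partitions it is attached to (the group name here is a placeholder):
bin/kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092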

Messages sent to all consumers with the same consumer group name

There is the following consumer code (using the old kafka-python SimpleConsumer API):
from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer

# join consumer group "my-group" and consume "my-topic"
kafka = KafkaClient("localhost", 9092)
consumer = SimpleConsumer(kafka, "my-group", "my-topic")
consumer.seek(0, 2)  # whence=2: seek relative to the tail, i.e. skip to the newest messages

for message in consumer:
    print(message)

kafka.close()
Then I produce messages with the script:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic
The thing is that when I start the consumer in two different processes, each process receives the new messages. However, I want each message to be delivered to only one consumer, not broadcast to all of them.
The Kafka documentation (https://kafka.apache.org/documentation.html) says:
If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.
The group for these consumers is the same: my-group.
How can I make each new message be read by exactly one consumer instead of being broadcast?
The consumer-group API was not officially supported until Kafka v0.8.1 (released March 12, 2014); for earlier server versions, consumer groups do not work correctly. And as of this post, the kafka-python library does not attempt to send group offset data:
https://github.com/mumrah/kafka-python/blob/c9d9d0aad2447bb8bad0e62c97365e5101001e4b/kafka/consumer.py#L108-L115
It's hard to tell from the example above what your ZooKeeper configuration is, or whether there is one at all. You'll need a ZooKeeper cluster for the consumer group information to be persisted, i.e., which consumer within each group has consumed up to a given offset.
A solid example is here:
Official Kafka documentation - Consumer Group Example
This should not happen. Make sure that both consumers are registered under the same consumer group in the ZooKeeper znodes. Each message on a topic should be consumed by a consumer group exactly once, so only one consumer in the group should receive each message, not what you are experiencing. What version of Kafka are you using?
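For reference, with a client and broker that fully support consumer groups, two processes sharing the same group.id split the topic's partitions between them, so each record is delivered to only one of them. A minimal sketch using the Java client (broker address, topic, and group name are placeholders):
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class QueueConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group"); // same group in every process
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("my-topic"));
            while (true) {
                // each record goes to exactly one member of the group
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }
}
Note that the topic needs at least as many partitions as there are consumers in the group, or the extra consumers will sit idle.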