Kafka does not retrieve messages that were sent while the consumer was offline

I have set up a Kafka cluster on a machine and am testing with kafka-console-producer.bat and kafka-console-consumer.bat
I started ZooKeeper and the Kafka server, produced some test messages with kafka-console-producer, and then started kafka-console-consumer: no messages were printed.
However, if I start ZooKeeper, the Kafka server, kafka-console-consumer and, lastly, kafka-console-producer, and then produce the test messages, the messages are printed by kafka-console-consumer.
Why is Kafka unable to pick up messages that were produced while the consumer was offline? I am only using one broker.

Kafka has a concept called consumer groups; every consumer joins one when it connects to a broker. For every consumer group, Kafka keeps track of the offset of the last message that was read. If a consumer group is unknown to the broker, a consumer parameter called auto.offset.reset determines what happens:
earliest: start reading messages from the beginning of the topic
latest: start reading from the current end of the topic (so only messages produced after the consumer was started)
The default for this parameter is latest, and since the console consumer randomizes its consumer group, this is what happens in your case and why you don't see any messages that were produced before the consumer was started.
You can add the parameter --from-beginning to your console consumer command, which is how this tool controls that behavior. Then you should see all messages.
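For example, a minimal invocation (the topic name test and the broker address 127.0.0.1:9092 are assumptions; substitute your own):
./kafka-console-consumer --topic test --bootstrap-server 127.0.0.1:9092 --from-beginning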
Update:
If you want to ensure you pick up where your consumer left off, you will need to manually set a consumer group and keep it the same every time you run your consumer.
You can do this by creating a text file with this parameter and passing it to your console consumer:
echo "group.id=test" > consumer.config
./kafka-console-consumer --topic test --new-consumer --bootstrap-server 127.0.0.1:9092 --consumer.config consumer.config
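To confirm that offsets are now being committed for that group, you can describe it once the consumer has run (a sketch; this assumes the group.id test from the config file above):
./kafka-consumer-groups --bootstrap-server 127.0.0.1:9092 --describe --group test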

Related

MM2.0 consumer group behavior

I'm trying to run some tests to understand MM2 behavior. As part of that, I have the following questions:
How to correctly pass a custom consumer group for MM2 in mm2.properties?
Based on this question, I tried passing <alias>.group.id=temp_cons_group in mm2.properties, and on restarting the MM2 instance I could see the consumer group mentioned in the MM2 logs.
However, when I try listing the consumer groups registered on the source broker, the group doesn't show up.
How to test if the property <alias>.consumer.auto.offset.reset works?
Here I want to consume the same messages again, so, with reference to the question, I tried setting <source_alias>.consumer.auto.offset.reset to earliest and restarted MM2.
I could see the property set correctly in the MM2 logs, but did not get the messages from the beginning in the target cluster topic.
How do I start an MM2 instance so that it starts consuming messages from a specific offset for a topic present in the source cluster?
MirrorMaker does not use a consumer group to run; instead it uses the assign() API, so it's expected that you don't see a group.
It's hard to "test" this. One way to verify that this configuration was picked up is to check that it's present in the logs when MirrorMaker starts its consumers.
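For example, when a consumer starts it logs its full ConsumerConfig, so you can grep the MirrorMaker log for the setting (the log file path below is an assumption; adjust it to your deployment):
# log path is illustrative
grep "auto.offset.reset" /path/to/mirrormaker.log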
This is currently not trivial to do. There's a KIP in progress to improve the process, but at the moment it requires manually updating the internal offsets topic of your Connect instance. At a very high level, here's the process:
First, ensure MirrorMaker is not running. Then you need to find the offset records for MirrorMaker in the offsets topic using a command like:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic <CONNECT_OFFSET_TOPIC> \
  --from-beginning \
  --property print.key=true | grep <SOURCE_CONNECTOR_NAME>
You will see records with offsets for each partition MirrorMaker handles. To update the offsets, you need to produce new records to this topic with the offsets you want. For each partition, ensure your record has the same key as the existing message so it replaces the existing stored offsets.
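As a sketch of that last step: once you have copied the exact key of an existing record, you can produce a replacement with the console producer in key-parsing mode. The separator and the JSON shapes below are illustrative, not the exact format; copy the key verbatim from the record you found and change only the offset value:
./bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic <CONNECT_OFFSET_TOPIC> \
  --property parse.key=true \
  --property key.separator=@
# then type one line per partition, e.g. (illustrative shapes):
# ["<SOURCE_CONNECTOR_NAME>",{"cluster":"source","partition":0,"topic":"my-topic"}]@{"offset":42}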

Unable to describe Kafka Streams Consumer Group

What I want to achieve is to be sure that my Kafka Streams consumer does not have lag.
I have a simple Kafka Streams application that materializes one topic as a store in the form of a GlobalKTable.
When I try to describe the consumer group on Kafka with the command:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-application-id
I can't see any results, and there is no error either. When I list all consumer groups with:
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
my application's consumer is listed correctly.
Any idea where to find additional information about why I can't describe the consumer?
(Any other Kafka Streams consumers that write to topics can be described correctly.)
If your application only materializes a topic into a GlobalKTable, no consumer group is formed. Internally, the "global consumer" does not use subscribe() but assign(); no consumer group.id is configured (as you can verify from the logs) and no offsets are committed.
The reason is that all application instances need to consume all topic partitions (i.e., a broadcast pattern). However, a consumer group is designed so that different instances read different partitions of the same topic. Also, per consumer group, only one offset can be committed per partition; if multiple instances read the same partition and committed offsets using the same group.id, the commits would overwrite each other.
Hence, using a consumer group while "broadcasting" data does not work.
However, all consumers expose the "lag" metrics records-lag-max and records-lag (cf. https://kafka.apache.org/documentation/#consumer_fetch_monitoring). Hence, you should be able to hook in via JMX to monitor the lag. Kafka Streams exposes client metrics via KafkaStreams#metrics(), too.
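As a minimal sketch of the JMX route, Kafka ships JmxTool, which you can point at the application; this assumes you started your Streams application with JMX enabled on port 9999 (via the usual com.sun.management.jmxremote.* system properties):
./bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name 'kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*' \
  --attributes records-lag-max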

Cannot setup consumer group in Kafka with Python

I'm new to Kafka and I've tried the kafka-python package.
I managed to set up a simple producer and consumer that can send and receive messages. In this case the consumer does not use a consumer group, as below:
consumer = KafkaConsumer(queue_name, bootstrap_servers='kafka:9092')
However, when I started to use a group_id as below, it stopped receiving any messages:
consumer = KafkaConsumer(bootstrap_servers='kafka:9092', auto_offset_reset='earliest', group_id='my-group')
consumer.subscribe([queue_name])
For comparison, I've also tried the confluent-kafka-python package, where I have the following consumer code, which also doesn't work:
consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'mygroup',
    'auto.offset.reset': 'earliest'
})
consumer.subscribe([queue_name])
Also, running ./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list gives an empty result.
Is there any configuration I'm missing here?
By default, the consumer starts consuming from the last committed offsets, which in your case is probably the last offset.
The auto.offset.reset setting only applies when there are no committed offsets. As the consumer automatically commits offsets by default, it usually only applies the first time you run it (there are a few other cases, but they don't matter in this example).
So to see messages flowing, you need to either start producing once your consumer is running, or use a different group name to allow auto.offset.reset to apply.
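Alternatively, if you want to keep the same group name and re-read the topic, you can reset the group's committed offsets while the consumer is stopped (a sketch; the topic placeholder stands in for your queue_name):
./kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --group my-group --topic <queue_name> \
  --reset-offsets --to-earliest --execute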

Messages sent to all consumers with the same consumer group name

There is the following consumer code:
from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer

kafka = KafkaClient("localhost", 9092)
consumer = SimpleConsumer(kafka, "my-group", "my-topic")

# seek(0, 2): start from the end of the topic (whence=2 is relative to the tail)
consumer.seek(0, 2)

for message in consumer:
    print(message)

kafka.close()
Then I produce messages with the script:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic
The thing is that when I start the consumers as two different processes, I receive each new message in both processes. However, I want a message to be delivered to only one consumer, not broadcast.
In the Kafka documentation (https://kafka.apache.org/documentation.html) it is written:
If all the consumer instances have the same consumer group, then this
works just like a traditional queue balancing load over the consumers.
I see that group for these consumers is the same - my-group.
How can I make it so that a new message is read by exactly one consumer instead of being broadcast?
The consumer-group API was not officially supported until Kafka v0.8.1 (released March 12, 2014). For earlier server versions, consumer groups do not work correctly. And as of this post, the kafka-python library does not attempt to send group offset data:
https://github.com/mumrah/kafka-python/blob/c9d9d0aad2447bb8bad0e62c97365e5101001e4b/kafka/consumer.py#L108-L115
It's hard to tell from the example above what your ZooKeeper configuration is, or if there is one at all. You'll need a ZooKeeper cluster for the consumer-group information to be persisted with respect to which consumer within each group has consumed to a given offset.
A solid example is here:
Official Kafka documentation - Consumer Group Example
This should not happen - make sure that both consumers are registered under the same consumer group in the ZooKeeper znodes. Each message to a topic should be consumed by a consumer group exactly once, so only one consumer in the group should receive the message, not what you are experiencing. What version of Kafka are you using?
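To check what was actually registered, you can inspect the znodes directly with the bundled ZooKeeper shell (a sketch, assuming ZooKeeper runs on localhost:2181; the /consumers path is where the old ZooKeeper-based consumers register):
./bin/zookeeper-shell.sh localhost:2181 ls /consumers
# and the consumer ids registered for a specific group:
./bin/zookeeper-shell.sh localhost:2181 ls /consumers/my-group/ids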