Kafka console producer skipping messages - apache-kafka

I'm trying to send a file to a topic using:
cat myfile | kafka-console-producer.sh --broker-list $BROKER_URL --topic mytopic
When I check the count of messages on the topic I see few hundred messages less than actual.
During the write I see a message:
[2017-11-15 14:05:26,864] WARN Error while fetching metadata with correlation id 0 : {abc123=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
I have correctly set the advertised hostname and listeners.
What confuses me is that if leader is not available how does it manage to put any messages into the topic? Furthermore, the message appears randomly, sometimes it doesn't.
How can I debug this?

As pointed out by vahid in comments this is a know issue.
The workaround is to specify --request-required-acks 1 to the console producer.
The random occurence of LEADER_NOT_AVAILABLE happens when I write to a new topic without explicitly creating it first. (Thanks to amethystic)

Related

Kafka topic partition has missing offsets

I have a Flink streaming application which is consuming data from a Kafka topic which has 3 partitions. Even though, the application is continuously running and working without any obvious errors, I see a lag in the consumer group for the flink app on all 3 partitions.
./kafka-consumer-groups.sh --bootstrap-server $URL --all-groups --describe
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
group-1 topic-test 0 9566 9568 2 - - -
group-1 topic-test 1 9672 9673 1 - - -
group-1 topic-test 2 9508 9509 1 - - -
If I send new records, they get processed but the lag still exists. I tried to view the last few records for partition 0 and this is what I got (ommiting the message part) -
./kafka-console-consumer.sh --topic topic-test --bootstrap-server $URL --property print.offset=true --partition 0 --offset 9560
Offset:9560
Offset:9561
Offset:9562
Offset:9563
Offset:9564
Offset:9565
The log-end-offset value is at 9568 and the current offset is at 9566. Why are these offsets not available in the console consumer and why does this lag exist?
There were a few instances where I noticed missing offsets. Example -
Offset:2344
Offset:2345
Offset:2347
Offset:2348
Why did the offset jump from 2345 to 2347 (skipping 2346)? Does this have something to do with how the producer is writing to the topic?
You can describe your topic for any sort of configuration added while it was created. If the log compaction is enabled through log.cleanup.policy=compact, then the behaviour will be different in the runtime. You can see these lags, due to log compactions lags value setting or missing offsets may be due messages produced with a key but null value.
Configuring The Log Cleaner
The log cleaner is enabled by default. This will start the pool of cleaner threads. To enable log cleaning on a particular topic, add the log-specific property log.cleanup.policy=compact.
The log.cleanup.policy property is a broker configuration setting defined in the broker's server.properties file; it affects all of the topics in the cluster that do not have a configuration override in place. The log cleaner can be configured to retain a minimum amount of the uncompacted "head" of the log. This is enabled by setting the compaction time lag log.cleaner.min.compaction.lag.ms.
This can be used to prevent messages newer than a minimum message age from being subject to compaction. If not set, all log segments are eligible for compaction except for the last segment, i.e. the one currently being written to. The active segment will not be compacted even if all of its messages are older than the minimum compaction time lag.
The log cleaner can be configured to ensure a maximum delay after which the uncompacted "head" of the log becomes eligible for log compaction log.cleaner.max.compaction.lag.ms.
The lag is calculated based on the latest offset committed by the Kafka consumer (lag=latest offset-latest offset committed). In general, Flink commits Kafka offsets when it performs a checkpoint, so there is always some lag if check it using the consumer groups commands.
That doesn't mean that Flink hasn't consumed and processed all the messages in the topic/partition, it just means that it has still not committed them.

unable to send large files to Kafka broker | Error message: The request included a message larger than the max message size the server will accept

I am unable to send large file to the broker, I have edited the configuration files and inside my client program but it didn't work.
Error Message
The request included a message larger than the max message size the server will accept.
Server properties
message.max.bytes=900000000
replica.fetch.max.bytes=900000000
Producer
max.request.size=100000000
Consumer
max.partition.fetch.bytes=100000000
fetch.message.max.bytes=100000000
Client app
props.put("max.request.size",96214400);
props.put("message.max.bytes",96214400);
Moving comment to answer...
The message.max.bytes is a cluster-wide setting and you need to make sure that the corresponding max.message.bytes configuration is also set at topic-level. As the topic-level configuration defaults to 1048588 you are seeing this error.
kafka-topics.bat --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1 --topic test --config max.message.bytes=9000000

Kafka-Verifiable-Producer and Consumer Problem

I am doing experimenting with kafka. I already start the
kafka-console-producer and
kafka-console-consumer.
I send messages with kafka-producer and successfully receive at the kafka-console-consumer.
Now I want to produce and consume around 5000 messages at once. I look into documentation and get to know that there are two commands.
kafka-verifiable-producer.sh
kafka-verifiable-consumer.sh
I tried to use these commands .
kafka-verifiable-producer.sh --broker-list localhost:9092 --max-messages 5000 --topic data-sending
kafka-verifiable-consumer.sh --group-instance-id 1 --group-id data-world --topic data-sending --broker-list localhost:9092
The result is as follow
"timestamp":1581268289761,"name":"producer_send_success","key":null,"value":"4996","offset":44630,"topic":"try_1","partition":0}
{"timestamp":1581268289761,"name":"producer_send_success","key":null,"value":"4997","offset":44631,"topic":"try_1","partition":0}
{"timestamp":1581268289761,"name":"producer_send_success","key":null,"value":"4998","offset":44632,"topic":"try_1","partition":0}
{"timestamp":1581268289761,"name":"producer_send_success","key":null,"value":"4999","offset":44633,"topic":"try_1","partition":0}
{"timestamp":1581268289769,"name":"shutdown_complete"}
{"timestamp":1581268289771,"name":"tool_data","sent":5000,"acked":5000,"target_throughput":-1,"avg_throughput":5285.412262156448}
On the consumer console the result is as follow
{"timestamp":1581268089357,"name":"records_consumed","count":352,"partitions":[{"topic":"try_1","partition":0,"count":352,"minOffset":32777,"maxOffset":33128}]}
{"timestamp":1581268089359,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":33129}],"success":true}
{"timestamp":1581268089384,"name":"records_consumed","count":500,"partitions":[{"topic":"try_1","partition":0,"count":500,"minOffset":33129,"maxOffset":33628}]}
{"timestamp":1581268089391,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":33629}],"success":true}
{"timestamp":1581268089392,"name":"records_consumed","count":270,"partitions":[{"topic":"try_1","partition":0,"count":270,"minOffset":33629,"maxOffset":33898}]}
{"timestamp":1581268089394,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":33899}],"success":true}
{"timestamp":1581268089415,"name":"records_consumed","count":500,"partitions":[{"topic":"try_1","partition":0,"count":500,"minOffset":33899,"maxOffset":34398}]}
{"timestamp":1581268089416,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":34399}],"success":true}
{"timestamp":1581268089417,"name":"records_consumed","count":235,"partitions":[{"topic":"try_1","partition":0,"count":235,"minOffset":34399,"maxOffset":34633}]}
{"timestamp":1581268089419,"name":"offsets_committed","offsets":[{"topic":"try_1","partition":0,"offset":34634}],"success":true}
In above results, the key is null.
How i can send a bulk of messages with this command ?
I tried to look into one example how to use them but didn't found any. It produces integer number like values but where i can insert the messages?.
Is there any way i can use this command to produce messages in bulk? Also is it possible to implement such commands in windows or it is just for linux?
Any link to the examples would be greatly appreciated.
The script kafka-verifiable-producer.sh executes the classorg.apache.kafka.tools.VerifiableProducer.
(https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/tools/VerifiableProducer.java)
Its program arguments --throughput, --repeating-keys and --value-prefix may fulfil your needs.
For example, the following produces messages with prefix value, 111 and with an incremental key which rotates for every 5 messages. You can also configure the throughput of the messages with the --throughput option. Int this example, it produces an average of 5 messages per second.
./kafka-verifiable-producer.sh --broker-list localhost:9092 --max-messages 10 --repeating-keys 5 --value-prefix 100 --throughput 5 --topic test
{"timestamp":1581271492652,"name":"startup_complete"}
{"timestamp":1581271492860,"name":"producer_send_success","key":"0","value":"100.0","offset":45,"topic":"test","partition":0}
{"timestamp":1581271492862,"name":"producer_send_success","key":"1","value":"100.1","offset":46,"topic":"test","partition":0}
{"timestamp":1581271493048,"name":"producer_send_success","key":"2","value":"100.2","offset":47,"topic":"test","partition":0}
{"timestamp":1581271493254,"name":"producer_send_success","key":"3","value":"100.3","offset":48,"topic":"test","partition":0}
{"timestamp":1581271493256,"name":"producer_send_success","key":"4","value":"100.4","offset":49,"topic":"test","partition":0}
{"timestamp":1581271493457,"name":"producer_send_success","key":"0","value":"100.5","offset":50,"topic":"test","partition":0}
{"timestamp":1581271493659,"name":"producer_send_success","key":"1","value":"100.6","offset":51,"topic":"test","partition":0}
{"timestamp":1581271493860,"name":"producer_send_success","key":"2","value":"100.7","offset":52,"topic":"test","partition":0}
{"timestamp":1581271494063,"name":"producer_send_success","key":"3","value":"100.8","offset":53,"topic":"test","partition":0}
{"timestamp":1581271494268,"name":"producer_send_success","key":"4","value":"100.9","offset":54,"topic":"test","partition":0}
{"timestamp":1581271494483,"name":"shutdown_complete"}
{"timestamp":1581271494484,"name":"tool_data","sent":10,"acked":10,"target_throughput":5,"avg_throughput":5.452562704471101}
The easiest is to modify/extend the above class If you are looking for more customized message keys and values.

Error while fetching metadata kafka {test=LEADER_NOT_AVAILABLE}?

I get the error after running those simple commands -
I started the Zookeeper and the Kafka servers,
I execute the command:
./kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
and execute the command:
./kafka-console-producer --broker-list localhost:9092 --topic test
I obtain a list of WARN like:
[2019-12-08 21:36:13,024] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 37 : {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
what did I do wrong?
Thanks
If your broker has the auto.create.topics.enable set to true, then this error will be transient and you should be able to produce message without any further error.
It happens just because the producer is asking for metadata about the topic it wants to write to but that topic doesn't exist in the cluster and the partition leader (where the producer wants to write) doesn't exist yet.
If you retry, the broker will create the topic and the command will work fine.
If the above configuration is set to false, then the broker doesn't create the topic automatically on the first request from a client so you have to create it upfront.
Finally, but it's not your case, the above error could even happen when the topic exist but, for example, the broker which is leader for the specific topic partition is down and a new leader election is in progress.
I wanted to add a comment, but its seems I can't. Just go through this link. Someone had a similar problem and it seems the problem is not what you have done but can be somethings different.
Link: https://grokbase.com/t/kafka/users/134qvay38q/leadernotavailable-exception

Missing required argument "[zookeeper]"

i'm trying to start a consumer using Apache Kafka, it used to work well, but i had to format my pc and reinstall everything again, and now when trying to run this:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
this is what i'm getting:
Missing required argument "[zookeeper]"
Option Description
------ -----------
--blacklist <blacklist> Blacklist of topics to exclude from
consumption.
--bootstrap-server <server to connect
to>
--consumer.config <config file> Consumer config properties file.
--csv-reporter-enabled If set, the CSV metrics reporter will
be enabled
--delete-consumer-offsets If specified, the consumer path in
zookeeper is deleted when starting up
--formatter <class> The name of a class to use for
formatting kafka messages for
display. (default: kafka.tools.
DefaultMessageFormatter)
--from-beginning If the consumer does not already have
an established offset to consume
from, start with the earliest
message present in the log rather
than the latest message.
--key-deserializer <deserializer for
key>
--max-messages <Integer: num_messages> The maximum number of messages to
consume before exiting. If not set,
consumption is continual.
--metrics-dir <metrics directory> If csv-reporter-enable is set, and
this parameter isset, the csv
metrics will be outputed here
--new-consumer Use the new consumer implementation.
--property <prop>
--skip-message-on-error If there is an error when processing a
message, skip it instead of halt.
--timeout-ms <Integer: timeout_ms> If specified, exit if no message is
available for consumption for the
specified interval.
--topic <topic> The topic id to consume on.
--value-deserializer <deserializer for
values>
--whitelist <whitelist> Whitelist of topics to include for
consumption.
--zookeeper <urls> REQUIRED: The connection string for
the zookeeper connection in the form
host:port. Multiple URLS can be
given to allow fail-over.
my guess is that there's some kind of problem with the zookeeper connection port, because it's telling me to specify the port which zookeeper has to use to get connected to kafka. I'm not sure of this though, and don't know how to figure out the port to specify if this was the problem. Any suggestions??
Thanks in advance for the help
It looks like you are using an old version of the Kafka tools that requires to set --new-consumer if you want to directly connect to the brokers.
I'd recommend picking a recent version of Kafka so you only need to specify --bootstrap-server like in your example: http://kafka.apache.org/downloads