Constant difference between producer and consumer Kafka stream metrics - apache-kafka

Using the Kafka Streams metrics sum(irate(kafka_producer_producer_metrics_record_send_total{}[1m])) and sum(irate(kafka_consumer_consumer_fetch_manager_metrics_records_consumed_total{}[1m])), I have noticed that for every Kafka Streams app the consumer rate is always twice the producer rate. What is the reason for this behavior?
If the consumer rate were genuinely higher all the time, I would expect some kind of buffer to fill up, and the producer rate to exceed the consumer rate at least occasionally. Neither happens, and memory doesn't explode.
Other info:
The topic has 1 partition with replication factor 2.
The Kafka Streams app is a simple map.
│ Kafka:
│ Config:
│ default.replication.factor: 3
│ inter.broker.protocol.version: 3.3
│ min.insync.replicas: 2
│ offsets.topic.replication.factor: 3
│ transaction.state.log.min.isr: 2
│ transaction.state.log.replication.factor: 3
Kafka streams config:
props.put(StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG, StreamsConfig.OPTIMIZE);
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 2);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

Related

Kafka console consumer commits wrong offset when using --max-messages

I use the Kafka console consumer, version 1.1.0, to read messages from Kafka.
When I run the kafka-console-consumer.sh script with the --max-messages option, it seems to commit the wrong offsets.
I created a topic and a consumer group and read some messages:
/kafka_2.11-1.1.0/bin/kafka-consumer-groups.sh --bootstrap-server 192.168.1.23:9092 --describe --group my-consumer-group
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test.offset 1 374 374 0 - - -
test.offset 0 0 375 375 - - -
Then I read 10 messages like this:
/kafka_2.11-1.1.0/bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.23:9092 --topic test.offset --timeout-ms 1000 --max-messages 10 --consumer.config /kafka_2.11-1.1.0/config/consumer.properties
1 var_1
3 var_3
5 var_5
7 var_7
9 var_9
11 var_11
13 var_13
15 var_15
17 var_17
19 var_19
Processed a total of 10 messages
But now the offsets show that it has read all the messages in the topic:
/kafka_2.11-1.1.0/bin/kafka-consumer-groups.sh --bootstrap-server 192.168.1.23:9092 --describe --group my-consumer-group
Note: This will not show information about old Zookeeper-based consumers.
Consumer group 'my-consumer-group' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
test.offset 1 374 374 0 - - -
test.offset 0 375 375 0 - - -
Now when I try to read more messages, I get an error saying that there are no more messages in the topic:
/kafka_2.11-1.1.0/bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.23:9092 --topic test.offset --timeout-ms 1000 --max-messages 10 --consumer.config /kafka_2.11-1.1.0/config/consumer.properties
[2020-02-28 08:27:54,782] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:98)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:129)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:84)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
Processed a total of 0 messages
What am I doing wrong? Why did the offset move to the last message in the topic instead of advancing by just 10 messages?
This is due to the auto-commit feature of the Kafka consumer. As mentioned in this link:
The easiest way to commit offsets is to allow the consumer to do it
for you. If you configure enable.auto.commit=true, then every five
seconds the consumer will commit the largest offset your client
received from poll(). The five-second interval is the default and is
controlled by setting auto.commit.interval.ms. Just like everything
else in the consumer, the automatic commits are driven by the poll
loop. Whenever you poll, the consumer checks if it is time to commit,
and if it is, it will commit the offsets it returned in the last poll.
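So in your case, when the consumer polls, it receives up to 500 messages (the default value of max.poll.records), and auto-commit then commits the largest offset returned by that poll (375 in your case), even though you set --max-messages to 10. Auto-commit runs every 5 seconds by default and also when the consumer is closed, so the commit still happens when the tool exits right after printing 10 messages.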
--max-messages: The maximum number of messages to
consume before exiting. If not set,
consumption is continual.
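The effect described above can be reproduced with a small model. This is a sketch only: it does not use the Kafka client, every name in it is hypothetical, and it assumes a single partition that one poll can drain completely.

```java
// Simplified model of consumer auto-commit for one partition.
// This is NOT the real KafkaConsumer; all names here are hypothetical.
final class AutoCommitSketch {

    /**
     * Simulates one poll() followed by an auto-commit.
     *
     * @return {recordsProcessedByApp, committedOffset}
     */
    static long[] pollThenCommit(long startOffset, long logEndOffset,
                                 int maxPollRecords, int maxMessages) {
        long available = logEndOffset - startOffset;
        // poll() hands back at most max.poll.records records.
        long returned = Math.min(available, maxPollRecords);
        // The application stops after --max-messages records.
        long processed = Math.min(returned, maxMessages);
        // Auto-commit stores the position after the last poll,
        // not the number of records the application handled.
        long committed = startOffset + returned;
        return new long[] {processed, committed};
    }

    public static void main(String[] args) {
        long[] defaults = pollThenCommit(0, 375, 500, 10);
        System.out.println("processed=" + defaults[0] + " committed=" + defaults[1]);
        long[] capped = pollThenCommit(0, 375, 10, 10);
        System.out.println("processed=" + capped[0] + " committed=" + capped[1]);
    }
}
```

In this model, capping max.poll.records at the same value as --max-messages keeps the committed offset in line with what was printed. Whether that is the right fix for a real setup is an assumption to verify; disabling enable.auto.commit is the other obvious lever.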

Understanding Kafka poll(), flush() & commit()

I'm new to Kafka and am trying out a few small use cases for my new application. The use case is basically:
Kafka producer -> Kafka consumer -> Flume Kafka source -> Flume HDFS sink.
When consuming (step 2), the sequence of steps is:
1. consumer.poll(1.0)
1.a. Produce to multiple topics (multiple Flume agents are listening)
1.b. producer.poll()
2. flush() every 25 messages
3. commit() every message (asynchCommit=false)
Question 1: Is this sequence of actions right?
Question 2: Will this cause any data loss, given that the flush happens every 25 messages and the commit happens for every message?
Question 3: What is the difference between poll() for the producer and poll() for the consumer?
Question 4: What happens when messages are committed but not flushed?
I would really appreciate it if someone could help me understand this with offset examples between producer and consumer for poll, flush and commit.
Thanks in advance!
Let us first understand Kafka in short:
What is a Kafka producer:
t.turner#devs:~/developers/softwares/kafka_2.12-2.2.0$ bin/kafka-console-producer.sh --broker-list 100.102.1.40:9092,100.102.1.41:9092 --topic company_wallet_db_v3-V3_0_0-transactions
>{"created_at":1563415200000,"payload":{"action":"insert","entity":{"amount":40.0,"channel":"INTERNAL","cost_rate":1.0,"created_at":"2019-07-18T02:00:00Z","currency_id":1,"direction":"debit","effective_rate":1.0,"explanation":"Voucher,"exchange_rate":null,expired","id":1563415200,"instrument":null,"instrument_id":null,"latitude":null,"longitude":null,"other_party":null,"primary_account_id":2,"receiver_phone":null,"secondary_account_id":362,"sequence":1,"settlement_id":null,"status":"success","type":"voucher_expiration","updated_at":"2019-07-18T02:00:00Z","primary_account_previous_balance":0.0,"secondary_account_previous_balance":0.0}},"track_id":"a011ad33-2cdd-48a5-9597-5c27c8193033"}
[2019-07-21 11:53:37,907] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 7 : {company_wallet_db_v3-V3_0_0-transactions=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
You can ignore the warning. It appears because Kafka could not find the topic yet and is auto-creating it.
Let us see how Kafka has stored this message:
The broker creates a directory for each partition under its log directory, for example /kafka-logs (for Apache Kafka) or /kafka-cf-data (for the Confluent version):
drwxr-xr-x 2 root root 4096 Jul 21 08:53 company_wallet_db_v3-V3_0_0-transactions-0
cd into this directory and then list the files. You will see the .log file that stores the actual data:
-rw-r--r-- 1 root root 10485756 Jul 21 08:53 00000000000000000000.timeindex
-rw-r--r-- 1 root root 10485760 Jul 21 08:53 00000000000000000000.index
-rw-r--r-- 1 root root 8 Jul 21 08:53 leader-epoch-checkpoint
drwxr-xr-x 2 root root 4096 Jul 21 08:53 .
-rw-r--r-- 1 root root 762 Jul 21 08:53 00000000000000000000.log
If you open the log file, you will see:
^#^#^#^#^#^#^#^#^#^#^Bî^#^#^#^#^B<96>T<88>ò^#^#^#^#^#^#^#^#^Al^S<85><98>k^#^#^Al^S<85><98>kÿÿÿÿÿÿÿÿÿÿÿÿÿÿ^#^#^#^Aö
^#^#^#^Aè
{"created_at":1563415200000,"payload":{"action":"insert","entity":{"amount":40.0,"channel":"INTERNAL","cost_rate":1.0,"created_at":"2019-07-18T02:00:00Z","currency_id":1,"direction":"debit","effective_rate":1.0,"explanation":"Voucher,"exchange_rate":null,expired","id":1563415200,"instrument":null,"instrument_id":null,"latitude":null,"longitude":null,"other_party":null,"primary_account_id":2,"receiver_phone":null,"secondary_account_id":362,"sequence":1,"settlement_id":null,"status":"success","type":"voucher_expiration","updated_at":"2019-07-18T02:00:00Z","primary_account_previous_balance":0.0,"secondary_account_previous_balance":0.0}},"track_id":"a011ad33-2cdd-48a5-9597-5c27c8193033"}^#
Let us understand how the consumer would poll and read records :
What is Kafka Poll :
Kafka maintains a numerical offset for each record in a partition.
This offset acts as a unique identifier of a record within that
partition, and also denotes the position of the consumer in the
partition. For example, a consumer which is at position 5 has consumed
records with offsets 0 through 4 and will next receive the record with
offset 5. There are actually two notions of position relevant to the
user of the consumer: The position of the consumer gives the offset of
the next record that will be given out. It will be one larger than the
highest offset the consumer has seen in that partition. It
automatically advances every time the consumer receives messages in a
call to poll(long).
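So, poll() takes a timeout as input. The consumer fetches records from the broker, which serves them from segment files such as 00000000000000000000.log, waits up to that timeout, and returns whatever it received to the application.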
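The quoted rule (position is one larger than the highest offset seen) can be illustrated with a toy consumer. This is a sketch with hypothetical names that only tracks offsets; it does not talk to a broker.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a consumer's position in one partition (hypothetical names).
final class PositionSketch {
    // Offset of the next record that poll() will hand out.
    private long position;

    PositionSketch(long startOffset) {
        this.position = startOffset;
    }

    /** Returns up to maxRecords offsets below logEndOffset and advances the position. */
    List<Long> poll(long logEndOffset, int maxRecords) {
        List<Long> offsets = new ArrayList<>();
        while (position < logEndOffset && offsets.size() < maxRecords) {
            offsets.add(position);
            position++;
        }
        return offsets;
    }

    long position() {
        return position;
    }

    public static void main(String[] args) {
        PositionSketch consumer = new PositionSketch(0);
        System.out.println(consumer.poll(100, 5)); // offsets 0 through 4
        System.out.println(consumer.position());   // 5, the next offset to be handed out
    }
}
```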
When are messages removed:
Kafka takes care of deleting old messages through its retention settings.
There are 2 ways:
Time-based: the default is 7 days. It can be altered with a value in milliseconds, for example
log.retention.ms=1680000
Size-based: can be set like
log.retention.bytes=10487500
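Since log.retention.ms is expressed in milliseconds, it helps to convert before picking a value. A small sketch of the arithmetic (the class and method names are made up for illustration): the 7-day default corresponds to 604800000 ms, while the example value 1680000 ms is only 28 minutes.

```java
import java.time.Duration;

// Unit conversions for time-based retention values (hypothetical helper).
final class RetentionSketch {

    /** Milliseconds to put into log.retention.ms for a retention of the given number of days. */
    static long daysToMs(long days) {
        return Duration.ofDays(days).toMillis();
    }

    /** Converts a log.retention.ms value to minutes for a quick sanity check. */
    static double msToMinutes(long ms) {
        return ms / 60000.0;
    }

    public static void main(String[] args) {
        System.out.println(daysToMs(7));           // 604800000
        System.out.println(msToMinutes(1680000L)); // 28.0
    }
}
```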
Now let us look at the consumer:
t.turner#devs:~/developers/softwares/kafka_2.12-2.2.0$ bin/kafka-console-consumer.sh --bootstrap-server 100.102.1.40:9092 --topic company_wallet_db_v3-V3_0_0-transactions --from-beginning
{"created_at":1563415200000,"payload":{"action":"insert","entity":{"amount":40.0,"channel":"INTERNAL","cost_rate":1.0,"created_at":"2019-07-18T02:00:00Z","currency_id":1,"direction":"debit","effective_rate":1.0,"explanation":"Voucher,"exchange_rate":null,expired","id":1563415200,"instrument":null,"instrument_id":null,"latitude":null,"longitude":null,"other_party":null,"primary_account_id":2,"receiver_phone":null,"secondary_account_id":362,"sequence":1,"settlement_id":null,"status":"success","type":"voucher_expiration","updated_at":"2019-07-18T02:00:00Z","primary_account_previous_balance":0.0,"secondary_account_previous_balance":0.0}},"track_id":"a011ad33-2cdd-48a5-9597-5c27c8193033"}
^CProcessed a total of 1 messages
The above command instructs the consumer to read from the beginning of the topic (offset 0 here). The console consumer runs under a group id, and Kafka maintains the last offset that this group id has committed, so the group can pick up newer messages from there on a later read.
What is Kafka Commit:
A commit is a way to tell Kafka which messages the consumer has successfully processed. It can be thought of as updating a lookup from group id to the next offset to read, that is, the last processed offset + 1.
You can manage this using the commitAsync() or commitSync() methods of the consumer object.
Reference: https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
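For the question about committing every message while flushing every 25, a worst-case model may help. It is only a sketch with hypothetical names, and it assumes that nothing leaves the producer's buffer until flush() is called; a real producer also sends in the background, so actual loss can be smaller.

```java
// Worst-case model of a consume -> produce pipeline that crashes after
// `processed` messages. Assumes buffered messages are delivered only by flush().
final class FlushCommitSketch {

    /** Messages actually delivered downstream when flush() runs every flushEvery messages. */
    static int delivered(int processed, int flushEvery) {
        return (processed / flushEvery) * flushEvery;
    }

    /** Commit after every message: committed offset = last processed offset + 1. */
    static int lostWhenCommittingEveryMessage(int processed, int flushEvery) {
        int committed = processed;
        // Committed but never delivered: these are not read again after a restart.
        return committed - delivered(processed, flushEvery);
    }

    /** Commit only right after a flush: nothing is lost, but the tail is read again. */
    static int rereadWhenCommittingAfterFlush(int processed, int flushEvery) {
        int committed = delivered(processed, flushEvery);
        return processed - committed;
    }

    public static void main(String[] args) {
        System.out.println(lostWhenCommittingEveryMessage(30, 25)); // 5
        System.out.println(rereadWhenCommittingAfterFlush(30, 25)); // 5
    }
}
```

The trade-off shown here is between losing the unflushed tail (commit first) and reprocessing it (flush first), which is why flushing before committing is the safer ordering when duplicates are acceptable.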

Kafka brokers keep restarting?

I have 4 Kafka brokers and 3 ZooKeeper nodes deployed on Kubernetes. Only 2 of the 4 brokers are working; the other 2 keep shutting down and restarting with the error below:
Exiting because log truncation is not allowed for partition byfn-sys-channel-0, current leader's latest offset 2 is less than replica's latest offset 21 (kafka.server.ReplicaFetcherThread)
Below is the Kafka config:
KAFKA_ZOOKEEPER_CONNECT zookeeper0:2181,zookeeper1:2181,zookeeper2:2181
KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE false
KAFKA_REPLICA_FETCH_MAX_BYTES 103809024
KAFKA_MIN_INSYNC_REPLICAS 1
KAFKA_MESSAGE_MAX_BYTES 103809024
KAFKA_LOG_RETENTION_MS -1
KAFKA_LOG_DIRS /var/kafkas/kafka2
KAFKA_DEFAULT_REPLICATION_FACTOR 3
KAFKA_BROKER_ID 2
KAFKA_ADVERTISED_LISTENERS PLAINTEXT://kafka2:9092
Please let me know how I can fix this.
Kafka halting because log truncation is not allowed for topic error shuttng down kafka nodes
The linked question above covers the case where log truncation is not allowed for a topic.
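The error message in the question compares the leader's latest offset (2) with the replica's latest offset (21) while KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE is false. A minimal sketch of that decision follows. It is based only on the wording of the error and the posted config; it is a simplified model with hypothetical names, not broker code.

```java
// Simplified model of what a follower does when it is ahead of the leader.
final class TruncationCheckSketch {

    enum Action { CONTINUE, TRUNCATE, HALT }

    static Action onFetch(long leaderEndOffset, long followerEndOffset,
                          boolean uncleanLeaderElectionEnabled) {
        if (followerEndOffset <= leaderEndOffset) {
            // Normal case: the follower is at or behind the leader and keeps fetching.
            return Action.CONTINUE;
        }
        // The follower holds records the leader does not have. Catching up would
        // mean discarding them, which is refused unless unclean election is allowed.
        return uncleanLeaderElectionEnabled ? Action.TRUNCATE : Action.HALT;
    }

    public static void main(String[] args) {
        // Values from the error message: leader at 2, replica at 21.
        System.out.println(onFetch(2, 21, false)); // HALT
    }
}
```

In this model, allowing truncation lets the follower rejoin at the price of dropping the records it holds beyond the leader's offset, so it is a data-loss trade-off rather than a free fix.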

Kafka Mirror Maker : Sync __consumer_offsets topic duplicates

Following the solution mentioned in kafka-mirror-maker-failing-to-replicate-consumer-offset-topic, I was able to start MirrorMaker without any error between the DC1 (live Kafka cluster) and DC2 (backup Kafka cluster) clusters.
It also appears to sync the __consumer_offsets topic from the DC1 cluster to the DC2 cluster.
Issue
If I shut down the consumer for DC1 and point the same consumer (same group_id) to DC2, it reads the same messages again, even though MirrorMaker is able to sync offsets for this topic and its partitions.
I can see that LOG-END-OFFSET is shown correctly, but CURRENT-OFFSET still points to an old value, causing LAG.
Example
Mirror Maker is still running in DC2.
Before consumer shut down in DC1
//DC1 __consumer_offsets topic
+-----------------------------------------------------------------+
| TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG |
+-----------------------------------------------------------------+
| gs.suraj.test.1 0 10626 10626 0 |
| gs.suraj.test.1 2 10619 10619 0 |
| gs.suraj.test.1 1 10598 10598 0 |
+-----------------------------------------------------------------+
Stop consumer in DC1
Before consumer start up in DC2
//DC2 __consumer_offsets topic
+-----------------------------------------------------------------+
| TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG |
+-----------------------------------------------------------------+
| gs.suraj.test.1 0 9098 10614 1516 |
| gs.suraj.test.1 2 9098 10614 1516 |
| gs.suraj.test.1 1 9098 10615 1517 |
+-----------------------------------------------------------------+
Because of this lag, when I start the same consumer in DC2 it reads 4549 messages again, which should not happen, as they were already read and committed in DC1 and MirrorMaker has synced the __consumer_offsets topic from DC1 to DC2.
Please let me know if I am missing anything here.
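The 4549 figure in the question is the sum of the per-partition LAG values in the DC2 table, where LAG is LOG-END-OFFSET minus CURRENT-OFFSET. A short sketch of that arithmetic, using the table's numbers (the class and method names are made up for illustration):

```java
// Recomputes the LAG column and its total from the DC2 table in the question.
final class LagSketch {

    /** Lag of one partition: records between the committed offset and the log end. */
    static long lag(long currentOffset, long logEndOffset) {
        return Math.max(0, logEndOffset - currentOffset);
    }

    /** Sums the lag over rows of {currentOffset, logEndOffset}. */
    static long totalLag(long[][] partitions) {
        long total = 0;
        for (long[] p : partitions) {
            total += lag(p[0], p[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        long[][] dc2 = {
            {9098, 10614}, // partition 0
            {9098, 10614}, // partition 2
            {9098, 10615}, // partition 1
        };
        System.out.println(totalLag(dc2)); // 4549
    }
}
```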
If you are using MirrorMaker 2.0, the Motivation section of the KIP states explicitly that there is no support for exactly-once:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-Motivation
But they intend to add it in the future.

Why isn't kafka continuing to work on fail of one of the brokers?

I am under the impression that with two brokers and sync turned on, my Kafka setup should keep working even if one of the brokers fails.
To test it, I created a new topic named topicname. Its description is as follows:
Topic:topicname PartitionCount:1 ReplicationFactor:1 Configs:
Topic: topicname Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Then I ran producer.sh and consumer.sh in the following way:
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9095 --sync --topic topicname
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topicname --from-beginning
While both brokers were running, the consumer received messages properly, but when I killed one of the broker instances with the kill command, the consumer stopped showing any new messages. Instead it showed the following error message:
WARN [ConsumerFetcherThread-console-consumer-57116_ip-<internalipvalue>-1438604886831-603de65b-0-0], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 865; ClientId: console-consumer-57116; ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes; RequestInfo: [topicname,0] -> PartitionFetchInfo(9,1048576). Possible cause: java.nio.channels.ClosedChannelException (kafka.consumer.ConsumerFetcherThread)
[2015-08-03 12:29:36,341] WARN Fetching topic metadata with correlation id 1 for topics [Set(topicname)] from broker [id:0,host:<hostname>,port:9092] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
I had a similar problem. Setting the producer config "topic.metadata.refresh.interval.ms" to -1 (or whatever value is suitable for you) solved the issue for me.
In my case, I had 3 brokers (a multi-broker setup on my local machine) and created the topic with 3 partitions and replication factor 2.
Test setup:
Before the producer config:
With 3 brokers running, I killed one of the brokers after the producer started. The local ZooKeeper updated the ISR and topic metadata (removing the downed broker as leader), but the producer did not pick it up (maybe due to the default 10-minute refresh time), so sends ended up failing with send exceptions.
After the producer config (-1 in my case):
With 3 brokers running, I killed one of the brokers after the producer started. The local ZooKeeper updated the ISR info (removing the downed broker as leader), the producer refreshed the ISR/topic metadata, and sends did not fail.
-1 makes it refresh topic metadata on each failed attempt, so you may want to reduce the refresh time to something reasonable instead.
I think two things can stop your consumer from working after a broker goes down in a Kafka HA cluster:
--replication-factor should be greater than 1 for your topic, so that every topic partition has at least one backup.
The replication factor for Kafka's internal topics should also be greater than 1:
offsets.topic.replication.factor = 3
transaction.state.log.replication.factor = 3
transaction.state.log.min.isr = 2
These two modifications kept my producer and consumer working after broker shutdowns (5 brokers, with every broker going down once).
You can see in the topic description that you posted that your topic has only a single replica.
With a single replica there is no fault tolerance and if broker 0 (the broker that contains the replica) goes away, the topic will be unavailable.
Create a topic with more replicas (with --replication-factor 3) to have fault tolerance in case of crashes.
I ran into the same problem even when using a topic with a replication factor of 2.
Setting the following property on the producer worked for me:
"metadata.max.age.ms" (Kafka 0.8.2.1)
Otherwise, my producer waited 1 minute by default before fetching the new leader and contacting it.
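The two answers name different keys because they target different clients: topic.metadata.refresh.interval.ms is read by the old Scala producer, metadata.max.age.ms by the Java producer. A sketch of both as plain java.util.Properties follows; the helper names and the 10000 ms value are assumptions for illustration, not recommendations.

```java
import java.util.Properties;

// Builds the metadata-refresh settings mentioned in the answers (hypothetical helpers).
final class MetadataRefreshSketch {

    /** Old Scala producer: a negative value refreshes metadata only after a failure. */
    static Properties oldProducer() {
        Properties props = new Properties();
        props.setProperty("topic.metadata.refresh.interval.ms", "-1");
        return props;
    }

    /** Java producer: force a metadata refresh at least every maxAgeMs milliseconds. */
    static Properties newProducer(long maxAgeMs) {
        Properties props = new Properties();
        props.setProperty("metadata.max.age.ms", Long.toString(maxAgeMs));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(oldProducer());
        System.out.println(newProducer(10000)); // 10 seconds, an arbitrary example value
    }
}
```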
For a topic with replication factor N, Kafka tolerates up to N-1 server failures. E.g. a replication factor of 3 allows you to handle up to 2 server failures.
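The N-1 rule can be written down directly. The sketch below also covers the stricter case for writes when the producer uses acks=all together with min.insync.replicas; the class and method names are made up for illustration.

```java
// Failure-tolerance arithmetic for a replicated partition (hypothetical helper).
final class FaultToleranceSketch {

    /** Broker failures a partition survives while at least one replica remains. */
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - 1;
    }

    /** Failures tolerated for acks=all writes, which need minInsyncReplicas replicas in sync. */
    static int toleratedFailuresForWrites(int replicationFactor, int minInsyncReplicas) {
        return Math.max(0, replicationFactor - minInsyncReplicas);
    }

    public static void main(String[] args) {
        System.out.println(toleratedFailures(3));             // 2
        System.out.println(toleratedFailures(1));             // 0, the topic in the question
        System.out.println(toleratedFailuresForWrites(3, 2)); // 1
    }
}
```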