Kafka issue in Kubernetes

Whenever a message is pushed to a Kafka topic, I see the error below.
My message contains only 5 elements, is plain text, and does not exceed 300 characters, yet I still see this error.
WARN [SocketServer brokerId=0] Unexpected error from /127.0.0.1; closing connection (org.apache.kafka.common.network.Selector)
org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1920298859 larger than 504857600)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:104)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:381)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:342)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:609)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:541)
at org.apache.kafka.common.network.Selector.poll(Selector.java:467)
at kafka.network.Processor.poll(SocketServer.scala:689)
at kafka.network.Processor.run(SocketServer.scala:594)
at java.lang.Thread.run(Thread.java:748)
If there is no active consumer for a long time (around 8 to 10 hours), Kafka itself gets killed in Kubernetes.
I would also appreciate a good article on setting up a Kafka cluster in Kubernetes.

1920298859 is a recognizable value: it's ruok decoded as an integer:
>>> import struct
>>> struct.unpack("!I", struct.pack("!4s", "ruok".encode("UTF8")))
(1920298859,)
It looks like you have some Zookeeper tooling/healthcheck trying to connect to Kafka. The ruok command should only be sent to Zookeeper hosts. Kafka only accepts connections from Kafka clients.
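For illustration, here is a minimal sketch of the kind of ZooKeeper four-letter-word health check that produces exactly this error when it is pointed at a Kafka listener instead of a ZooKeeper port (the host and port below are placeholders, and the probe assumes plain TCP with no TLS):

import socket

# Typical ZooKeeper liveness probe: send the four-letter word "ruok" and
# expect "imok" back. If the same probe is aimed at a Kafka listener, the
# broker reads the first 4 bytes ("ruok") as a big-endian request size of
# 1920298859, which exceeds its configured maximum (socket.request.max.bytes)
# and causes the InvalidReceiveException shown above.
with socket.create_connection(("zookeeper-host", 2181), timeout=5) as sock:
    sock.sendall(b"ruok")
    print(sock.recv(4))  # expected reply from ZooKeeper: b'imok'

Pointing such a probe at the Kafka port (9092) reproduces the "Invalid receive (size = 1920298859)" warning.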

Related

How to set consumer config values for Kafka Mirrormaker-2 2.6.1?

I am attempting to use mirrormaker 2 to replicate data between AWS Managed Kafkas (MSK) in 2 different AWS regions - one in eu-west-1 (CLOUD_EU) and the other in us-west-2 (CLOUD_NA), both running Kafka 2.6.1. For testing I am currently trying just to replicate topics 1 way, from EU -> NA.
I am starting a mirrormaker connect cluster using ./bin/connect-mirror-maker.sh and a properties file (included)
This works fine for topics with small messages on them, but one of my topics has binary messages up to 20MB in size. When I try to replicate that topic, I get an error every 30 seconds:
[2022-04-21 13:47:05,268] INFO [Consumer clientId=consumer-29, groupId=null] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2: {}. (org.apache.kafka.clients.FetchSessionHandler:481)
org.apache.kafka.common.errors.DisconnectException
When logging in DEBUG to get more information we get
[2022-04-21 13:47:05,267] DEBUG [Consumer clientId=consumer-29, groupId=null] Disconnecting from node 2 due to request timeout. (org.apache.kafka.clients.NetworkClient:784)
[2022-04-21 13:47:05,268] DEBUG [Consumer clientId=consumer-29, groupId=null] Cancelled request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=consumer-29, correlationId=35) due to node 2 being disconnected (org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient:593)
It gets stuck in a loop constantly disconnecting with request timeout every 30s and then trying again.
Looking at this, I suspect the problem is that request.timeout.ms is at its default (30s) and the consumer times out trying to read the topic with many large messages.
I followed the guide at https://github.com/apache/kafka/tree/trunk/connect/mirror to configure the consumer properties; however, no matter what I set, the consumer timeout remains fixed at the default, confirmed both by Kafka printing its config in the log and by timing the interval between disconnect messages. For example, I set:
CLOUD_EU.consumer.request.timeout.ms=120000
in the properties file that I start MM-2 with.
Based on various guides I found while looking into this, I have also tried:
CLOUD_EU.request.timeout.ms=120000
CLOUD_EU.cluster.consumer.request.timeout.ms=120000
CLOUD_EU.consumer.override.request.timeout.ms=120000
CLOUD_EU.cluster.consumer.override.request.timeout.ms=120000
None of these have worked.
How can I change the consumer request.timeout.ms setting? The log is approximately 10,000 lines long, but everywhere the ConsumerConfig is printed it shows request.timeout.ms = 30000.
Properties file I am using:
# specify any number of cluster aliases
clusters = CLOUD_EU, CLOUD_NA
# connection information for each cluster
CLOUD_EU.bootstrap.servers = kafka.eu-west-1.amazonaws.com:9092
CLOUD_NA.bootstrap.servers = kafka.us-west-2.amazonaws.com:9092
# enable and configure individual replication flows
CLOUD_EU->CLOUD_NA.enabled = true
CLOUD_EU->CLOUD_NA.topics = METRICS_ATTACHMENTS_OVERSIZE_EU
CLOUD_NA->CLOUD_EU.enabled = false
replication.factor=3
tasks.max = 1
############################# Internal Topic Settings #############################
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
############################ Kafka Settings ###################################
# CLOUD_EU cluster over writes
CLOUD_EU.consumer.request.timeout.ms=120000
CLOUD_EU.consumer.session.timeout.ms=150000

Cannot create more than 15 topics in Kafka

My colleague and I were testing Kafka on a 3-node cluster, and we encountered this problem while trying to test the performance of sending messages to multiple topics. We can't create more than 15 topics. The first 15 topics work fine, but when we try to create the 16th topic (and any topics after it), a lot of errors start to appear on the 2 follower servers.
One with a lot of errors like this:
ERROR [ReplicaFetcherThread-0-1], Error for partition [__consumer_offsets,36] to broker 1:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
The other with errors like this:
[2017-06-16 18:44:07,146] ERROR [KafkaApi-1] Error when handling request {replica_id=2,max_wait_time=500,min_bytes=1,max_bytes=10485760,topics=[{topic=__consumer_offsets,partitions=[{partition=6,fetch_offset=5,max_bytes=1048576},{partition=36,fetch_offset=3,max_bytes=1048576},{partition=18,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-12,partitions=[{partition=1,fetch_offset=1,max_bytes=1048576}]},{topic=multi-test-11,partitions=[{partition=2,fetch_offset=1,max_bytes=1048576}]},{topic=__consumer_offsets,partitions=[{partition=0,fetch_offset=5,max_bytes=1048576},{partition=45,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-16,partitions=[{partition=0,fetch_offset=0,max_bytes=1048576}]},{topic=__consumer_offsets,partitions=[{partition=27,fetch_offset=0,max_bytes=1048576},{partition=12,fetch_offset=0,max_bytes=1048576},{partition=9,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-10,partitions=[{partition=0,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-9,partitions=[{partition=2,fetch_offset=0,max_bytes=1048576}]},{topic=__consumer_offsets,partitions=[{partition=39,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-4,partitions=[{partition=1,fetch_offset=1,max_bytes=1048576}]},{topic=__consumer_offsets,partitions=[{partition=21,fetch_offset=10,max_bytes=1048576}]},{topic=multi-test-3,partitions=[{partition=2,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-13,partitions=[{partition=0,fetch_offset=1,max_bytes=1048576}]},{topic=__consumer_offsets,partitions=[{partition=3,fetch_offset=10,max_bytes=1048576},{partition=48,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-8,partitions=[{partition=0,fetch_offset=0,max_bytes=1048576}]},{topic=__consumer_offsets,partitions=[{partition=33,fetch_offset=0,max_bytes=1048576},{partition=30,fetch_offset=15,max_bytes=1048576},{partition=15,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-1,partitions=[{partition=1,fetch_offset=1,max_bytes=1048576}]},{topic=multi-test-0,partitions=[{partition=2,fetch_offset=0,max_bytes=1048576}]},{topic=multi-test-2,partitions=[{partition=1,fetch_offset=1,max_bytes=1048576}]},{topic=__consumer_offsets,partitions=[{partition=42,fetch_offset=3,max_bytes=1048576},{partition=24,fetch_offset=0,max_bytes=1048576}]}]} (kafka.server.KafkaApis)
kafka.common.NotAssignedReplicaException: Leader 1 failed to record follower 2's position -1 since the replica is not recognized to be one of the assigned replicas for partition multi-test-16-0.
at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:246)
at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:920)
at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:917)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:917)
at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:462)
at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:530)
at kafka.server.KafkaApis.handle(KafkaApis.scala:81)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
at java.lang.Thread.run(Thread.java:748)
We assigned a replication factor of 2 and 3 partitions to each topic, and every topic is created in the same way. I deleted and recreated each topic manually just to make sure that 15-16 is the exact point at which everything goes wrong.
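For reference, here is a minimal sketch of how topics like these could be created programmatically, assuming the kafka-python admin client (the bootstrap address below is a placeholder; the actual cluster may well have used the standard CLI tooling instead):

from kafka.admin import KafkaAdminClient, NewTopic

# Each topic gets 3 partitions and a replication factor of 2, matching the
# setup described above; "broker-1:9092" is a placeholder bootstrap address.
admin = KafkaAdminClient(bootstrap_servers="broker-1:9092")
topics = [NewTopic(name=f"multi-test-{i}", num_partitions=3, replication_factor=2)
          for i in range(16)]
admin.create_topics(new_topics=topics)
admin.close()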
Well, weird problems seem to always have weird answers.
It turns out the problem was that one of our nodes was using a 32-bit CPU; switching it to a 64-bit machine solved the problem.

Kafka Connection error in controller.log

I am using a single-node Kafka (v0.10.2) and a single-node ZooKeeper (v3.4.8), and my controller.log file is filled with this exception:
java.io.IOException: Connection to 1 was disconnected before the response was read
at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3(NetworkClientBlockingOps.scala:114)
at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3$adapted(NetworkClientBlockingOps.scala:112)
at scala.Option.foreach(Option.scala:257)
at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$1(NetworkClientBlockingOps.scala:112)
at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
at kafka.utils.NetworkClientBlockingOps$.pollContinuously$extension(NetworkClientBlockingOps.scala:142)
at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:192)
at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:184)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
I googled this exception but was not able to find its root cause. Can someone suggest why this error is happening and how to prevent it?
I also encountered the same issue in a multi-node cluster scenario. It was caused by the connection between the Kafka node and ZooKeeper being shut down. I would suggest restarting the ZooKeeper server and then the Kafka node to re-establish the connection, after which the broker should handle pub/sub messages again.
Hope this helps.

Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for test-0

Today an error appeared when I tried to send a message from the producer console to the consumer console:
[2016-11-02 15:12:58,168] ERROR Error when sending message to topic test with
key: null, value: 5 bytes with error:
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s)
expired due to timeout while requesting metadata from brokers for test-0
Why did this happen? Is this a Kafka problem or a ZooKeeper problem?
It seems that the client failed to retrieve metadata for test-0 from the Kafka brokers.
Either make sure you are able to connect to the Kafka brokers, or check that 'advertised.listeners' is set if you are running Kafka on IaaS machines.
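As a quick way to check whether the client can actually reach the brokers and fetch metadata for the topic, here is a small sketch assuming the kafka-python library (the bootstrap address and topic name are placeholders):

from kafka import KafkaProducer

# Asking for the topic's partitions forces a metadata fetch; if
# advertised.listeners points at an unreachable address, this is
# where the request times out.
producer = KafkaProducer(bootstrap_servers="broker-host:9092", request_timeout_ms=10000)
print(producer.partitions_for("test"))  # e.g. {0} when metadata is reachable
producer.close()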
Well, after I rebooted the whole server the problem was gone.

Kafka Console Producer sending message failed

I would like to ask whether anyone has faced this kind of situation before.
A few days ago Kafka was working properly, but today it started having problems. The console producer is unable to send messages that the console consumer can receive. After a few seconds it prints:
" ERROR Error when sending message to topic test with key: null, value: 11 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms. "
Can anyone help? :'(
You'll see this if the broker you are bootstrapping from is not reachable. Try to telnet to the host and port you are trying to access from the producer. If that communication is OK, turn on debug logging by locating the tools-log4j.properties and changing the warn level to debug.
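If telnet is not available, an equivalent TCP connectivity check can be done in a few lines of Python (the host and port are placeholders for whatever the producer bootstraps from):

import socket

# Rough equivalent of "telnet broker-host 9092": succeeds only if the
# broker's listener is reachable from this machine.
try:
    with socket.create_connection(("broker-host", 9092), timeout=5):
        print("broker port is reachable")
except OSError as e:
    print("cannot reach broker:", e)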