Kafka gives Invalid receive size with Hyperledger Fabric Orderer connection

I was setting up a new cluster for Hyperledger Fabric on EKS. The cluster has 4 Kafka nodes, 3 ZooKeeper nodes, 4 peers, 3 orderers, and 1 CA. All the containers come up individually, and the Kafka/ZooKeeper backend is stable: I can SSH into any Kafka or ZooKeeper node and check connections to any other node, create topics, post messages, and so on. Kafka is also reachable via Telnet from all orderers.
When I try to create a channel, I get the following error from the orderer:
2019-04-25 13:34:17.660 UTC [orderer.common.broadcast] ProcessMessage -> WARN 025 [channel: channel1] Rejecting broadcast of message from 192.168.94.15:53598 with SERVICE_UNAVAILABLE: rejected by Consenter: backing Kafka cluster has not completed booting; try again later
2019-04-25 13:34:17.660 UTC [comm.grpc.server] 1 -> INFO 026 streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Broadcast grpc.peer_address=192.168.94.15:53598 grpc.code=OK grpc.call_duration=14.805833ms
2019-04-25 13:34:17.661 UTC [common.deliver] Handle -> WARN 027 Error reading from 192.168.94.15:53596: rpc error: code = Canceled desc = context canceled
2019-04-25 13:34:17.661 UTC [comm.grpc.server] 1 -> INFO 028 streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=192.168.94.15:53596 error="rpc error: code = Canceled desc = context canceled" grpc.code=Canceled grpc.call_duration=24.987468ms
And the Kafka leader reports the following error:
[2019-04-25 14:07:09,453] WARN [SocketServer brokerId=2] Unexpected error from /192.168.89.200; closing connection (org.apache.kafka.common.network.Selector)
org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 369295617 larger than 104857600)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:132)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:231)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:192)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:528)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:469)
at org.apache.kafka.common.network.Selector.poll(Selector.java:398)
at kafka.network.Processor.poll(SocketServer.scala:535)
at kafka.network.Processor.run(SocketServer.scala:452)
at java.lang.Thread.run(Thread.java:748)
[2019-04-25 14:13:53,917] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

The error indicates that the broker is receiving requests larger than the permitted maximum size, which defaults to ~100 MB. Try increasing the following property in the server.properties file so that it can accommodate a larger receive (in this case, at least 369295617 bytes):
# Set to 500MB
socket.request.max.bytes=500000000
and then restart your Kafka cluster.
If this doesn't work for you, then my guess is that a TLS client is connecting to a non-SSL listener: the reported size 369295617 is 0x16030101 in hex, which is exactly how the first bytes of a TLS handshake record look when misread as a request length. In that case, verify that the broker's SSL listener is on port 9092 (or the corresponding port, if you are not using the default one). The following should do the trick:
listeners=SSL://:9092
advertised.listeners=SSL://:9092
inter.broker.listener.name=SSL
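As a quick sanity check (a standalone sketch, not part of the original answer), you can decode the reported size as hex: 0x16 is the TLS handshake content type and 0x0301 the TLS record version, which is why a TLS handshake hitting a plaintext port shows up as this particular bogus "size".

public class DecodeReceiveSize {
    public static void main(String[] args) {
        // Kafka reads the first 4 bytes of an incoming request as a
        // big-endian length. For a TLS ClientHello those bytes are
        // 0x16 (handshake), 0x03 0x01 (record version), 0x01... (length).
        int reportedSize = 369295617;
        System.out.println(Integer.toHexString(reportedSize)); // prints 16030101
    }
}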

Related

Kafka producer does not signal that all brokers are unreachable

When all brokers/nodes of a cluster are unreachable, the error in the Kafka producer callback is a generic "Topic XXX not present in metadata after 60000 ms".
When I activate the DEBUG log level, I can see that all attempts to deliver the message to any node are failing:
DEBUG org.apache.kafka.clients.NetworkClient - Initialize connection to node node2.url:443 (id: 2 rack: null) for sending metadata request
DEBUG org.apache.kafka.clients.NetworkClient - Initiating connection to node node2.url:443 (id: 2 rack: null) using address node2.url:443/X.X.X.X:443
....
DEBUG org.apache.kafka.clients.NetworkClient - Disconnecting from node 2 due to socket connection setup timeout. The timeout value is 16024 ms.
DEBUG org.apache.kafka.clients.NetworkClient - Initialize connection to node node0.url:443 (id: 0 rack: null) for sending metadata request
DEBUG org.apache.kafka.clients.NetworkClient - Initiating connection to node node0.url:443 (id: 0 rack: null) using address node0.url:443/X.X.X.X:443
....
DEBUG org.apache.kafka.clients.NetworkClient - Disconnecting from node 0 due to socket connection setup timeout. The timeout value is 17408 ms.
and so on until, after the delivery timeout, the send() callback gets the error:
ERROR my.kafka.SenderClass - Topic XXX not present in metadata after 60000 ms.
Unlike the bootstrap URLs, all nodes could be unreachable, for example because of wrong DNS entries.
How can the application understand that none of the nodes were reachable? This is traced only as DEBUG information and is not available to the producer send() callback.
Such error detail at the application level would speed up troubleshooting; this kind of error is usually signaled explicitly by standard SOAP/REST web service interfaces.
The producer only needs its bootstrap servers for the initial metadata and the leaders of the partitions it writes to (one of those leaders could happen to be the cluster Controller). That being said, it doesn't need to know about "all" brokers.
How can the application understand that all nodes were not reachable?
If you set acks=1 or acks=all, then the callback should know that at least one broker had the data written. If not, there was some error.
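For example (a minimal sketch; the broker address and topic name are placeholders), the exception passed to the callback tells you whether the write succeeded anywhere:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CallbackExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node0.url:9092"); // placeholder
        props.put("acks", "all");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("XXX", "hello"), (metadata, exception) -> {
                if (exception != null) {
                    // A TimeoutException here covers "metadata never arrived",
                    // i.e. no broker was reachable before the delivery timeout.
                    System.err.println("Send failed: " + exception);
                } else {
                    System.out.println("Written to partition " + metadata.partition());
                }
            });
        }
    }
}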
You can use an AdminClient, outside of the producer client, to describe the topic(s) and fetch metadata about the partition leaders, then use standard TCP socket requests to try to reach those advertised listeners from Java, as sketched below.
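A rough sketch of that idea (topic name, bootstrap address, and the 5-second timeout are assumptions):

import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.DescribeTopicsResult;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionInfo;

public class ReachabilityCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node0.url:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            DescribeTopicsResult result =
                    admin.describeTopics(Collections.singletonList("XXX"));
            for (TopicPartitionInfo p : result.all().get().get("XXX").partitions()) {
                Node leader = p.leader();
                if (leader == null) {
                    System.out.println("Partition " + p.partition() + " has no leader");
                    continue;
                }
                // Plain TCP connect to the leader's advertised listener.
                try (Socket socket = new Socket()) {
                    socket.connect(new InetSocketAddress(leader.host(), leader.port()), 5000);
                    System.out.println("Leader reachable: " + leader);
                } catch (Exception e) {
                    System.out.println("Leader NOT reachable: " + leader + " (" + e + ")");
                }
            }
        }
    }
}

Note that if the whole cluster is down, the describeTopics() call itself will fail with a timeout, which at least gives the application a distinguishable error.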
FWIW, port 443 should ideally be reserved for HTTPS traffic, not Kafka. Kafka is not a REST/SOAP service.

filebeat to kafka : Failed to connect to broker

I'm new to the Apache environment, and currently I'm trying to send log data from a Filebeat producer to a Kafka broker.
Environment:
Kafka 2.11 (installed via Ambari)
Filebeat 7.4.2 (Windows)
I tried to send logs from Filebeat into the Ambari-managed Kafka: I started the Kafka servers and created a topic named "test", which shows up under --list. But I'm pretty confused about the Kafka broker's port. In some tutorials I saw 9092 being used instead of 2181. So which port should I use to send logs from Filebeat?
Here is my filebeat.conf:
filebeat.inputs:
- type: log
  paths:
    - C:/Users/A/Desktop/DATA/mailbox3.csv

output.kafka:
  hosts: ["x.x.x.x:9092"]
  topic: "test"
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000
Result:
2020-06-10T09:00:32.214+0700 INFO kafka/log.go:53 Failed to connect to broker x.x.x.x:9092: dial tcp x.x.x.x:9092: connectex: No connection could be made because the target machine actively refused it.
2020-06-10T09:00:32.214+0700 INFO kafka/log.go:53 client/metadata got error from broker -1 while fetching metadata: dial tcp x.x.x.x:9092: connectex: No connection could be made because the target machine actively refused it.
2020-06-10T09:00:32.215+0700 INFO kafka/log.go:53 kafka message: client/metadata no available broker to send metadata request to
2020-06-10T09:00:32.215+0700 INFO kafka/log.go:53 client/brokers resurrecting 1 dead seed brokers
2020-06-10T09:00:32.215+0700 INFO kafka/log.go:53 client/metadata retrying after 250ms... (3 attempts remaining)
2020-06-10T09:00:32.466+0700 INFO kafka/log.go:53 client/metadata fetching metadata for [test] from broker x.x.x.x:9092
2020-06-10T09:00:34.475+0700 INFO kafka/log.go:53 Failed to connect to broker x.x.x.x:9092: dial tcp x.x.x.x:9092: connectex: No connection could be made because the target machine actively refused it.
2020-06-10T09:00:34.475+0700 INFO kafka/log.go:53 client/metadata got error from broker -1 while fetching metadata: dial tcp x.x.x.x:9092: connectex: No connection could be made because the target machine actively refused it.
2020-06-10T09:00:34.477+0700 INFO kafka/log.go:53 kafka message: client/metadata no available broker to send metadata request to
2020-06-10T09:00:34.477+0700 INFO kafka/log.go:53 client/brokers resurrecting 1 dead seed brokers
2020-06-10T09:00:34.478+0700 INFO kafka/log.go:53 client/metadata retrying after 250ms... (2 attempts remaining)
2020-06-10T09:00:34.729+0700 INFO kafka/log.go:53 client/metadata fetching metadata for [test] from broker x.x.x.x:9092
2020-06-10T09:00:36.737+0700 INFO kafka/log.go:53 Failed to connect to broker x.x.x.x:9092: dial tcp x.x.x.x:9092: connectex: No connection could be made because the target machine actively refused it.
2020-06-10T09:00:36.737+0700 INFO kafka/log.go:53 client/metadata got error from broker -1 while fetching metadata: dial tcp x.x.x.x:9092: connectex: No connection could be made because the target machine actively refused it.
2020-06-10T09:00:36.738+0700 INFO kafka/log.go:53 kafka message: client/metadata no available broker to send metadata request to
2020-06-10T09:00:36.738+0700 INFO kafka/log.go:53 client/brokers resurrecting 1 dead seed brokers
2020-06-10T09:00:36.738+0700 INFO kafka/log.go:53 client/metadata retrying after 250ms... (1 attempts remaining)
2020-06-10T09:00:36.989+0700 INFO kafka/log.go:53 client/metadata fetching metadata for [test] from broker x.x.x.x:9092
2020-06-10T09:00:39.002+0700 INFO kafka/log.go:53 Failed to connect to broker x.x.x.x:9092: dial tcp x.x.x.x:9092: connectex: No connection could be made because the target machine actively refused it.
2020-06-10T09:00:39.002+0700 INFO kafka/log.go:53 client/metadata got error from broker -1 while fetching metadata: dial tcp x.x.x.x:9092: connectex: No connection could be made because the target machine actively refused it.
2020-06-10T09:00:39.004+0700 INFO kafka/log.go:53 kafka message: client/metadata no available broker to send metadata request to
2020-06-10T09:00:39.004+0700 INFO kafka/log.go:53 client/brokers resurrecting 1 dead seed brokers
2020-06-10T09:00:39.004+0700 INFO kafka/log.go:53 client/metadata fetching metadata for [test] from broker x.x.x.x:9092
This makes me wonder whether I really have port 9092 at all, so I checked server.properties. What concerns me most is:
port=6667
listeners=PLAINTEXT://x.x.x.x:6667
So then I edited filebeat.conf again, changing port 9092 to 6667, and here is the result:
2020-06-10T09:18:01.448+0700 INFO kafka/log.go:53 client/metadata fetching metadata for [test] from broker x.x.x.x:6667
2020-06-10T09:18:01.450+0700 INFO kafka/log.go:53 producer/broker/1001 starting up
2020-06-10T09:18:01.451+0700 INFO kafka/log.go:53 producer/broker/1001 state change to [open] on test/0
2020-06-10T09:18:01.451+0700 INFO kafka/log.go:53 producer/leader/test/0 selected broker 1001
2020-06-10T09:18:01.451+0700 INFO kafka/log.go:53 Failed to connect to broker x.x.x.x:6667: dial tcp: lookup x.x.x.x: no such host
2020-06-10T09:18:01.452+0700 INFO kafka/log.go:53 producer/broker/1001 state change to [closing] because dial tcp: lookup x.x.x.x: no such host
2020-06-10T09:18:01.453+0700 DEBUG [kafka] kafka/client.go:264 finished kafka batch
2020-06-10T09:18:01.453+0700 DEBUG [kafka] kafka/client.go:278 Kafka publish failed with: dial tcp: lookup x.x.x.x: no such host
2020-06-10T09:18:01.454+0700 INFO kafka/log.go:53 producer/leader/test/0 state change to [flushing-3]
2020-06-10T09:18:01.456+0700 INFO kafka/log.go:53 producer/leader/test/0 state change to [normal]
2020-06-10T09:18:01.456+0700 INFO kafka/log.go:53 producer/leader/test/0 state change to [retrying-3]
2020-06-10T09:18:01.456+0700 INFO kafka/log.go:53 producer/leader/test/0 abandoning broker 1001
2020-06-10T09:18:01.456+0700 INFO kafka/log.go:53 producer/broker/1001 shut down
Questions:
What happened? Which port should be used? What is the use of each port?
Any response will be much appreciated. Thank you.
UPDATE
According to this source, the right port is 6667, since Kafka was installed via Ambari.
There is no restriction on which port can be used; it only depends on availability. (Note that 2181 is ZooKeeper's client port, not a Kafka listener; producers such as Filebeat must talk to the broker's own listener.)
In the first case, as you said, the broker was started on 6667, and hence no process was listening on 9092.
2020-06-10T09:18:01.451+0700 INFO kafka/log.go:53 Failed to connect to broker x.x.x.x:6667: dial tcp: lookup x.x.x.x: no such host
Next, regarding the advertised.listeners property: you should ensure that the address you put in advertised.listeners is actually assigned to that machine. You cannot advertise, say, 1.1.1.1:9092 arbitrarily.
Execute ifconfig (Linux) or ipconfig (Windows) and check the IP of your machine on the network interface that is reachable from your application machine. On Linux, it will mostly be eth0.
This address must be both resolvable and reachable from the machine where your application is running, so you may also want to check the network connection between your Kafka broker and that machine. A sketch of what that looks like follows.
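For example, a server.properties sketch (the hostname is an assumption) that listens on all interfaces but advertises an address the Filebeat machine can resolve and reach:

# Bind on all interfaces, advertise a resolvable name to clients
listeners=PLAINTEXT://0.0.0.0:6667
advertised.listeners=PLAINTEXT://kafka-broker.example.internal:6667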

Kafka: Continuously getting FETCH_SESSION_ID_NOT_FOUND

I am continuously getting FETCH_SESSION_ID_NOT_FOUND, and I'm not sure why it's happening. Can anyone please tell me what the problem is, and what the impact will be on consumers and brokers?
Kafka Server Log:
INFO [2019-10-18 12:09:00,709] [ReplicaFetcherThread-1-8][] org.apache.kafka.clients.FetchSessionHandler - [ReplicaFetcher replicaId=6, leaderId=8, fetcherId=1] Node 8 was unable to process the fetch request with (sessionId=258818904, epoch=2233): FETCH_SESSION_ID_NOT_FOUND.
INFO [2019-10-18 12:09:01,078] [ReplicaFetcherThread-44-10][] org.apache.kafka.clients.FetchSessionHandler - [ReplicaFetcher replicaId=6, leaderId=10, fetcherId=44] Node 10 was unable to process the fetch request with (sessionId=518415741, epoch=4416): FETCH_SESSION_ID_NOT_FOUND.
INFO [2019-10-18 12:09:01,890] [ReplicaFetcherThread-32-9][] org.apache.kafka.clients.FetchSessionHandler - [ReplicaFetcher replicaId=6, leaderId=9, fetcherId=32] Node 9 was unable to process the fetch request with (sessionId=418200413, epoch=3634): FETCH_SESSION_ID_NOT_FOUND.
Kafka Consumer Log:
12:29:58,936 INFO [FetchSessionHandler:383] [Consumer clientId=bannerGroupMap#87e2af7cf742#test, groupId=bannerGroupMap#87e2af7cf742#test] Node 8 was unable to process the fetch request with (sessionId=1368981303, epoch=60): FETCH_SESSION_ID_NOT_FOUND.
12:29:58,937 INFO [FetchSessionHandler:383] [Consumer clientId=bannerGroupMap#87e2af7cf742#test, groupId=bannerGroupMap#87e2af7cf742#test] Node 3 was unable to process the fetch request with (sessionId=1521862194, epoch=59): FETCH_SESSION_ID_NOT_FOUND.
12:29:59,939 INFO [FetchSessionHandler:383] [Consumer clientId=zoneGroupMap#87e2af7cf742#test, groupId=zoneGroupMap#87e2af7cf742#test] Node 7 was unable to process the fetch request with (sessionId=868804875, epoch=58): FETCH_SESSION_ID_NOT_FOUND.
12:30:06,952 INFO [FetchSessionHandler:383] [Consumer clientId=creativeMap#87e2af7cf742#test, groupId=creativeMap#87e2af7cf742#test] Node 3 was unable to process the fetch request with (sessionId=1135396084, epoch=58): FETCH_SESSION_ID_NOT_FOUND.
12:30:12,965 INFO [FetchSessionHandler:383] [Consumer clientId=creativeMap#87e2af7cf742#test, groupId=creativeMap#87e2af7cf742#test] Node 6 was unable to process the fetch request with (sessionId=1346340004, epoch=56): FETCH_SESSION_ID_NOT_FOUND.
Cluster Details:
Broker: 13 (1 Broker : 14 cores & 36GB memory)
Kafka cluster version: 2.0.0
Kafka Java client version: 2.0.0
Number of topics: ~15
Number of consumers: 7K (all independent; all partitions of a topic are manually assigned to a single consumer, i.e., each consumer consumes all partitions of exactly one topic)
This is not an error; it's logged at INFO level, and it's telling you that you are connected but the broker can't find the requested fetch session ID, because there is none to fetch. It's normal to see this message, and the flushing message, in the log.
Increase the value of max.incremental.fetch.session.cache.slots. The default value is 1K; in my case I increased it to 10K and that fixed it.
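For example, in the broker's server.properties (10K is the value that worked here; tune it for your own cluster):

# Default is 1000; raise it when many consumers keep incremental fetch sessions open
max.incremental.fetch.session.cache.slots=10000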
I increased it at first from 1K to 2K, and in a second step from 2K to 4K, and as long as the limit was not exhausted, the error did not appear.
Since it looked to me like a session leak by some unidentified consumer, I haven't tried the 10K limit yet, but after reading Hrishikesh Mishra's answer, I definitely will. Increasing the limit also decreased the frequency of the error, so the question of identifying the individual consumer groups that open an excessive number of incremental fetch sessions, raised in "How to check the actual number of incremental fetch session cache slots used in Kafka cluster?", may be irrelevant in the end.
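If you do want to watch the cache usage directly, brokers expose it over JMX via the kafka.server:type=FetchSessionCache gauges introduced with incremental fetch sessions (KIP-227). A minimal sketch (the broker host and JMX port are assumptions; remote JMX is not enabled by default):

import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class FetchSessionCacheCheck {
    public static void main(String[] args) throws Exception {
        // Assumption: the broker was started with remote JMX on port 9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            // Gauge tracking how many incremental fetch sessions are cached.
            ObjectName gauge = new ObjectName(
                    "kafka.server:type=FetchSessionCache,name=NumIncrementalFetchSessions");
            Object value = connector.getMBeanServerConnection().getAttribute(gauge, "Value");
            System.out.println("Cached incremental fetch sessions: " + value);
        }
    }
}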

Kafka mirror maker duplicates when DCs are isolated

We have 5 Kafka 1.0.0 clusters:
4 of them have 3 nodes each and sit in different regions of the world;
the last one has 5 nodes and is an aggregate-only cluster.
We are using MirrorMaker (referred to below as MM) to read from the regional clusters and copy the data into the aggregate cluster in our HQ datacenter.
Not being sure where best to run it, we currently have 2 setups in our prod environment (both run the same tool; see the invocation sketch after this list):
MM in the region: reading locally and pushing to the aggregate cluster in the remote datacenter (DC), before committing offsets locally. I tend to call this the push mode (pushing the data).
MM in the DC of the aggregate cluster: reading the data remotely and writing it locally, before committing the offsets on the remote DC.
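For reference, a typical MirrorMaker invocation on this version looks like the following (property file names are placeholders; the only difference between the two modes is which machine it runs on):

kafka-mirror-maker.sh --consumer.config consumer.properties \
    --producer.config producer.properties --whitelist ".*"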
What happened is that the entire DC hosting our aggregate cluster became totally isolated from a network point of view, and in both cases we got duplicated records in our aggregate cluster.
Push mode = MM local to the regional cluster, pushing data to remote aggregate cluster
MM started to throw errors like this:
WARN [Producer clientId=producer-1] Got error produce response with correlation id 674364 on topic-partition <topic>-4, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION (org.apache.kafka.clients.producer.internals.Sender)
then:
WARN [Producer clientId=producer-1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
which is ok so far because of idempotence.
But finally we got errors like:
ERROR Error when sending message to topic debug_sip_callback-delivery with key: null, value: 1640 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for <topic>-4: 30032 ms has passed since batch creation plus linger time
ERROR Error when sending message to topic <topic> with key: null, value: 1242 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
java.lang.IllegalStateException: Producer is closed forcefully.
causing MM to stop, and I think this is what produced the duplicates (I need to dig into the code, but it could be that MM lost its idempotence state: the idempotent producer's guarantees are scoped to a single producer session, so after a restart MM gets a new producer ID, the broker cannot deduplicate re-sent records, and MM resumes from the previously committed offsets).
Pull mode = MM local to the aggregate cluster, pulling data from remote regional cluster
MM instances (with logs at INFO level in this case) started seeing the broker as dead:
INFO [Consumer clientId=mirror-maker-region1-agg-0, groupId=mirror-maker-region1-agg] Marking the coordinator kafka1.region1.internal:9092 (id: 2147483646 rack: null) dead (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
At the same time on the broker side, we got:
INFO [GroupCoordinator 1]: Member mirror-maker-region1-agg-0-de2af312-befb-4af7-b7b0-908ca8ecb0ed in group mirror-maker-region1-agg has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
...
INFO [GroupCoordinator 1]: Group mirror-maker-region1-agg with generation 42 is now empty (__consumer_offsets-2) (kafka.coordinator.group.GroupCoordinator)
Later, on the MM side, a lot of:
WARN [Consumer clientId=mirror-maker-region1-agg-0, groupId=mirror-maker-region1-agg] Connection to node 2 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
and finally when network came back:
ERROR [Consumer clientId=mirror-maker-region1-agg-0, groupId=mirror-maker-region1-agg] Offset commit failed on partition <topic>-dr-8 at offset 382424879: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
i.e., because of the rebalance it could not commit in region1 the offsets for data already written to agg, and after the rebalance it resumed from the previously committed offsets, causing duplicates.
Configuration
Our MM instances are configured like this:
For our consumer:
bootstrap.servers=kafka1.region1.internal:9092,kafka2.region1.internal:9092,kafka3.region1.internal:9092
group.id=mirror-maker-region-agg
auto.offset.reset=earliest
isolation.level=read_committed
For our producer:
bootstrap.servers=kafka1.agg.internal:9092,kafka2.agg.internal:9092,kafka3.agg.internal:9092,kafka4.agg.internal:9092,kafka5.agg.internal:9092
compression.type=none
request.timeout.ms=30000
max.block.ms=60000
linger.ms=15000
max.request.size=1048576
batch.size=32768
buffer.memory=134217728
retries=2147483647
max.in.flight.requests.per.connection=1
acks=all
enable.idempotence=true
Any idea how we can get "only once" delivery into the aggregate cluster, on top of the idempotent producer, in the case of DCs isolated for 30 minutes?

Kafka user authentication using SASL_PLAINTEXT

I'm trying to implement security in Kafka to authenticate clients using a username and password. The JAAS config file is configured properly. I start ZooKeeper first and then start just one Kafka node. However, Kafka fails to start cleanly, logging the error below:
[2017-08-07 13:07:08,029] INFO Registered broker 0 at path /brokers/ids/0 with addresses: EndPoint(localhost,9092,ListenerName(SASL_PLAINTEXT),SASL_PLAINTEXT) (kafka.utils.ZkUtils)
[2017-08-07 13:07:08,035] INFO Kafka version : 0.11.0.0 (org.apache.kafka.common.utils.AppInfoParser)
[2017-08-07 13:07:08,036] INFO Kafka commitId : cb8625948210849f (org.apache.kafka.common.utils.AppInfoParser)
[2017-08-07 13:07:08,037] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
[2017-08-07 13:07:08,447] WARN Connection to node 0 terminated during authentication. This may indicate that authentication failed due to invalid credentials. (org.apache.kafka.clients.NetworkClient)
[2017-08-07 13:07:08,554] WARN Connection to node 0 terminated during authentication. This may indicate that authentication failed due to invalid credentials. (org.apache.kafka.clients.NetworkClient)
[2017-08-07 13:07:08,662] WARN Connection to node 0 terminated during authentication. This may indicate that authentication failed due to invalid credentials. (org.apache.kafka.clients.NetworkClient)
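For reference (a sketch with placeholder credentials, not the poster's actual file), a broker-side JAAS file for SASL/PLAIN typically looks like this; the top-level username/password pair is what the broker itself presents on inter-broker connections, and each user_<name> entry defines an accepted login, so a mismatch between the two produces exactly this kind of authentication warning:

KafkaServer {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="admin"
    password="admin-secret"
    user_admin="admin-secret";
};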