kafka client getting TOPIC_AUTHORIZATION_FAILED on cluster restart - apache-kafka

I am using an SSL-enabled Kafka cluster to consume and publish messages. Below is the tech stack:
spring-kafka : 2.6.6
spring-boot : 2.4.3
Kafka properties
kafka:
  bootstrap-servers: ${BOOTSTRAP-SERVERS-HOST}
  subscription-topic: TEST
  properties:
    security.protocol: SSL
    ssl.truststore.location: ${SUBSCRIPTION_TRUSTSTORE_PATH}
    ssl.truststore.password: ${SUBSCRIPTION_TRUSTSTORE_PWD}
    ssl.keystore.location: ${SUBSCRIPTION_KEYSTORE_PATH}
    ssl.keystore.password: ${SUBSCRIPTION_KEYSTORE_PWD}
Issue:
The Kafka client application is up, connected to the Kafka cluster, and consuming and publishing messages as expected.
Now we stop the Kafka broker/cluster, and the error below is logged:
could not be established. Broker may not be available.
This is fine and expected, as the broker/cluster is down.
Now we start the broker/cluster again; the error below starts appearing and the Kafka consumer stops consuming messages from the topic, although the Kafka publisher is still able to send messages to the topic. [An application restart resolves this issue.]
Trying to understand the root cause; any help is much appreciated.
2022-01-13 13:34:52.078 [TEST.CONSUMER-GROUP-0-C-1] ERROR--SUBSCRIPTION - -org.apache.kafka.clients.Metadata.checkUnauthorizedTopics - [Consumer clientId=consumer-TEST.CONSUMER-GROUP-1, groupId=TEST.CONSUMER-GROUP] Topic authorization failed for topics [TEST]
2022-01-13 13:34:52.078 [TEST.CONSUMER-GROUP-0-C-1] ERROR- -SUBSCRIPTION - -org.springframework.core.log.LogAccessor.error - Authorization Exception and no authorizationExceptionRetryInterval set
org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [TEST]
2022-01-13 13:34:52.081 [TEST.CONSUMER-GROUP-0-C-1] ERROR- IRVS-SUBSCRIPTION - -org.springframework.core.log.LogAccessor.error - Fatal consumer exception; stopping container
2022-01-13 13:34:52.083 [TEST.CONSUMER-GROUP-0-C-1] INFO - IRVS-SUBSCRIPTION - -org.springframework.scheduling.concurrent.ExecutorConfigurationSupport.shutdown - Shutting down ExecutorService

The above issue was resolved after setting the authorizationExceptionRetryInterval container property.
Below is an example illustrating this:
@Bean
ConcurrentKafkaListenerContainerFactory<Object, Object> kafkaListenerContainerFactory(
        ConcurrentKafkaListenerContainerFactoryConfigurer configurer,
        ConsumerFactory<Object, Object> kafkaConsumerFactory) {
    ConcurrentKafkaListenerContainerFactory<Object, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConcurrency(2);
    configurer.configure(factory, kafkaConsumerFactory);
    // other setters like error handler, retry handler
    // setting the authorization exception retry interval
    factory.getContainerProperties()
            .setAuthorizationExceptionRetryInterval(Duration.ofSeconds(5L));
    return factory;
}

Related

Kafka Admin client unregistered causing metadata issues

After migrating our microservice functionality to Spring Cloud Function, we have been facing issues with one of the producer topics.
Event of type: abc and key: xxx_yyy could not be sent to kafka org.springframework.messaging.MessageHandlingException: error occurred in message handler [org.springframework.cloud.stream.binder.kafka.KafkaMessageChannelBinder$ProducerConfigurationMessageHandler#2333d598]; nested exception is org.springframework.kafka.KafkaException: Send failed; nested exception is org.apache.kafka.common.errors.TimeoutException: Topic pc-abc not present in metadata after 60000 ms.
o.s.kafka.support.LoggingProducerListener - Exception thrown when sending a message with key='byte[15]' and payload='byte[256]' to topic pc-abc and partition 6: org.apache.kafka.common.errors.TimeoutException: Topic pc-abc not present in metadata after 60000 ms.
FYI: the topics are already created in our staging/prod environments and are not meant to be created when the application starts.
My producer config:
spring.cloud.stream.bindings.pc-abc-out-0.content-type=application/json
spring.cloud.stream.bindings.pc-abc-out-0.destination=pc-abc
spring.cloud.stream.bindings.pc-abc-out-0.producer.header-mode=headers
spring.cloud.stream.bindings.pc-abc-out-0.producer.partition-count=5
spring.cloud.stream.bindings.pc-abc-out-0.producer.partitionKeyExpression=payload.key
spring.cloud.stream.kafka.bindings.pc-abc-out-0.producer.sync=true
I am kind of stuck at this point and exhausted. Has anyone else faced this issue?
Spring Cloud version: 2.5.5
Kafka: 2.7.1
The issue is:
The producer is configured with partition-count=5, yet Kafka is looking for partition number 6, which obviously does not exist. I have commented out the auto-add-partitions property, but the issue still turns up! Is it stale configuration? How do I force Kafka to pick up the new configuration?
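One way to check whether the broker-side topic really has fewer partitions than the binder expects, and to grow it if needed, is the Kafka AdminClient. The sketch below is only an illustration of that idea: the bootstrap address is a placeholder, and increaseTo(7) is derived from the error mentioning partition 6 (partition indexes are zero-based), not from anything else in the question.

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewPartitions;
import org.apache.kafka.clients.admin.TopicDescription;

public class PartitionCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder address
        try (AdminClient admin = AdminClient.create(props)) {
            // How many partitions does pc-abc actually have right now?
            TopicDescription desc = admin.describeTopics(Collections.singleton("pc-abc"))
                    .all().get().get("pc-abc");
            System.out.println("pc-abc currently has " + desc.partitions().size() + " partitions");

            // If the partitioner selects index 6, the topic needs at least 7 partitions.
            // createPartitions only grows a topic; it never shrinks it.
            admin.createPartitions(Map.of("pc-abc", NewPartitions.increaseTo(7))).all().get();
        }
    }
}

Whether growing the topic is the right fix depends on why the partitioner picks partition 6 in the first place; if the binder's partition-count and the broker-side partition count have drifted apart, aligning the two is the underlying goal.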

Not authorized to access topics inside Event Hub namespace

I have an Event Hub namespace with two Event Hubs (event-hub and event-hub-2). To establish the connection I use Kafka - the namespace is on the Standard tier, of course. When I try to connect to the second EH (event-hub-2 as the Kafka topic, the connection string as the Kafka password), I get the following stack trace:
2021-06-17T15:56:04.976Z - WARN: [NetworkClient] [Consumer clientId=consumer-$Default-1, groupId=$Default] Error while fetching metadata with correlation id 11 : {event-hub=TOPIC_AUTHORIZATION_FAILED}
2021-06-17T15:56:04.980Z - ERROR: [Metadata] [Consumer clientId=consumer-$Default-1, groupId=$Default] Topic authorization failed for topics [event-hub]
2021-06-17T15:56:05.007Z - ERROR: [KafkaConsumerActor] [9e1ad] Exception when polling from consumer, stopping actor: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [event-hub]
org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [event-hub]
My question is: why would I get this kind of stack trace when I didn't even try to connect to the topic/EH that appears in it? It's weird...
If you are using the same consumer group in both scenarios, your consumer needs read access to all topics used in that consumer group; try changing the group.id and test again.
The problem came back when I connected my subscribers to both Event Hubs simultaneously. Just like Ran said, connecting with different consumer groups resolved the problem. Many thanks!
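For illustration only, here is a minimal sketch of giving each Event Hub consumer its own group.id; the namespace, group names, and connection string below are placeholders, not values from the question:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EventHubConsumers {
    private static KafkaConsumer<String, String> consumerFor(String topic, String groupId) {
        Properties props = new Properties();
        // Event Hubs Kafka endpoint and SASL settings (placeholders)
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "mynamespace.servicebus.windows.net:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" password=\"<event-hub-connection-string>\";");
        // A distinct consumer group per Event Hub means each consumer only needs
        // read access to its own topic, not to every topic the shared group touches.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singleton(topic));
        return consumer;
    }

    public static void main(String[] args) {
        KafkaConsumer<String, String> first = consumerFor("event-hub", "group-event-hub");
        KafkaConsumer<String, String> second = consumerFor("event-hub-2", "group-event-hub-2");
        // ... poll each consumer from its own thread
    }
}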

Can't start Kafka Connect: Timeout expired while fetching topic metadata

Trying to run Kafka Connect for the first time, against an existing Kafka deployment, using SASL_PLAINTEXT and Kerberos authentication.
The first time I try to start connect-distributed, I see:
ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:227)
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
If I immediately run it a second time, without changing anything, I instead see:
ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:227)
org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [Offsets]
This is reproducible.
Worker config:
producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
bootstrap.servers=mybroker:9092
rest.port=28082
group.id=some-group
config.storage.topic=Configs
offset.storage.topic=Offsets
status.storage.topic=Status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
rest.advertised.host.name=localhost
log4j.root.loglevel=INFO
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
sasl.mechanism=GSSAPI
consumer.security.protocol=SASL_PLAINTEXT
consumer.sasl.kerberos.service.name=kafka
consumer.sasl.mechanism=GSSAPI
producer.security.protocol=SASL_PLAINTEXT
producer.sasl.kerberos.service.name=kafka
producer.sasl.mechanism=GSSAPI
A career in software has taught me to always assume that the problem is completely unrelated to the error log, but for once it was correct:
Ranger was configured incorrectly and I genuinely wasn't authorized to access that topic.
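If you want to double-check from the client side whether the worker's principal can actually see the internal topics, one option is a small probe with the Java AdminClient. This is only a sketch; it assumes the same Kerberos/JAAS setup the worker already uses (for example via the java.security.auth.login.config system property) and the topic names from the worker config above:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.errors.TopicAuthorizationException;

public class ConnectTopicAccessCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "mybroker:9092");
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "GSSAPI");
        props.put("sasl.kerberos.service.name", "kafka");
        try (AdminClient admin = AdminClient.create(props)) {
            // Describing the Connect internal topics surfaces the same
            // TopicAuthorizationException if the principal lacks access.
            admin.describeTopics(Arrays.asList("Configs", "Offsets", "Status")).all().get();
            System.out.println("Principal can describe all three internal topics.");
        } catch (Exception e) {
            if (e.getCause() instanceof TopicAuthorizationException) {
                System.out.println("Not authorized: " + e.getCause().getMessage());
            } else {
                e.printStackTrace();
            }
        }
    }
}

If this fails with the same exception, the fix lives on the authorizer side (the Ranger policies in this case), not in the worker config.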

Kafka - org.apache.kafka.common.errors.NetworkException

I have Kafka client code which connects to Kafka brokers (server 0.10.1, client 0.10.2). There are 2 topics with 2 different consumer groups in the code, and there is also a producer. We are getting a NetworkException from the producer code once in a while (once in 2 days, once in 5 days, ...). We see consumer group (re)joining info in the logs for both consumer groups, followed by the NetworkException from the producer's future.get() call. Not sure why we are getting this error.
Code:
final Future<RecordMetadata> futureResponse =
        producer.send(new ProducerRecord<>("ping_topic", "ping"));
futureResponse.get();
Exception:
org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:70)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:57)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
The Kafka API definition for NetworkException:
"A misc. network-related IOException occurred when making a request.
This could be because the client's metadata is out of date and it is
making a request to a node that is now dead."
Thanks
I had the same error while testing a Kafka consumer. I was using a sender template for it.
In the consumer configuration I additionally set the following properties:
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 15000);
After sending the message I added a thread sleep:
ListenableFuture<SendResult<String, String>> future =
        senderTemplate.send(MyConsumer.TOPIC_NAME, jsonPayload);
Thread.sleep(10000);
It was necessary to make the test work, but maybe not suitable for your case.
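As a side note, instead of a fixed sleep you can block on the send future itself with a timeout; this continues the snippet above, and the ten-second timeout is just an example value:

import java.util.concurrent.TimeUnit;
import org.springframework.kafka.support.SendResult;
import org.springframework.util.concurrent.ListenableFuture;

// ... continuing the test from the snippet above
ListenableFuture<SendResult<String, String>> future =
        senderTemplate.send(MyConsumer.TOPIC_NAME, jsonPayload);
// Wait for the broker acknowledgement (or a failure such as a NetworkException)
// rather than sleeping for a fixed interval.
SendResult<String, String> result = future.get(10, TimeUnit.SECONDS);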

Kafka Producer: Got error produce response with correlation NETWORK_EXCEPTION

We are running Kafka in distributed mode across 2 servers.
I'm sending messages to Kafka through the Java SDK to a topic which has replication factor 2 and 1 partition.
We are running in async mode.
I don't find anything abnormal in the Kafka logs.
Can anyone help in finding out what the cause could be?
Properties props = new Properties();
props.put("bootstrap.servers", serverAdress);
props.put("acks", "all");
props.put("retries", "1");
props.put("linger.ms",0);
props.put("buffer.memory",10240000);
props.put("max.request.size", 1024000);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, Object> producer = new org.apache.kafka.clients.producer.KafkaProducer<>(props);
Exception trace:
2017-08-15T02:36:29,148 [kafka-producer-network-thread | producer-1] WARN producer.internals.Sender - Got error produce response with correlation id 353736 on topic-partition BPA_BinLogQ-0, retrying (0 attempts left). Error: NETWORK_EXCEPTION
You are getting a NETWORK_EXCEPTION, so this tells you that something is wrong with the network connection to the Kafka broker you were producing toward. Either the broker shut down or the TCP connection was closed for some reason.
A quick code dive shows the most probable cause: a lost connection to the upstream broker, which makes delivery fail internally inside the Sender (link) - you might want to enable trace logging in Sender to confirm that:
if (response.wasDisconnected()) {
    log.trace("Cancelled request with header {} due to node {} being disconnected",
            requestHeader, response.destination());
    for (ProducerBatch batch : batches.values())
        completeBatch(batch, new ProduceResponse.PartitionResponse(Errors.NETWORK_EXCEPTION, String.format("Disconnected from node %s", response.destination())),
                correlationId, now);
}
Now, with the batch completed unsuccessfully, it gets retried; but from the logs you attached, it looks like you ran out of retries (0 attempts left), so the error propagates up to your level (link):
if (canRetry(batch, response, now)) {
    log.warn(
        "Got error produce response with correlation id {} on topic-partition {}, retrying ({} attempts left). Error: {}",
        ....
    reenqueueBatch(batch, now);
}
So the ideas are:
investigate your network connectivity - unfortunately this might mean enabling trace logging, at least on the client side (especially in NetworkClient, which does all the upstream broker management), to see whether there is any connection loss;
increase the producer's retries value (newer versions of Kafka default it to MAX_INT or so) - see the sketch below.
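For illustration, here is what the producer configuration from the question might look like with more retries and an explicit delivery timeout; the values are example numbers, not tuned recommendations, and delivery.timeout.ms is only honored by newer clients (roughly 2.1 and later):

Properties props = new Properties();
props.put("bootstrap.servers", serverAdress);
props.put("acks", "all");
// Allow several delivery attempts so that a transient disconnect is retried
// internally instead of surfacing as NETWORK_EXCEPTION after a single retry.
props.put("retries", "10");
props.put("retry.backoff.ms", "500");
// Overall upper bound on how long a send may take, including retries (Kafka 2.1+).
props.put("delivery.timeout.ms", "120000");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, Object> producer = new org.apache.kafka.clients.producer.KafkaProducer<>(props);

Keep in mind that with retries greater than one and the default max.in.flight.requests.per.connection, message ordering within a partition is no longer guaranteed on retry.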