Kafka Stream Exception: GroupAuthorizationException - apache-kafka

I'm developing a Kafka Streams application that reads messages from an input Kafka topic, filters out unwanted data, and pushes the rest to an output Kafka topic.
Kafka Stream Configuration:
@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
public KafkaStreamsConfiguration kStreamsConfigs() {
Map<String, Object> streamsConfiguration = new HashMap<>();
streamsConfiguration.put(ConsumerConfig.GROUP_ID_CONFIG, "abcd");
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "QC-NormalizedEventProcessor-v1.0.0");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "test:9072");
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
streamsConfiguration.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
streamsConfiguration.put(StreamsConfig.producerPrefix(ProducerConfig.ACKS_CONFIG), "all"); // acks is a String config; "all" is equivalent to -1
streamsConfiguration.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
streamsConfiguration.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, kafkaConsumerProperties.getConsumerJKSFileLocation());
streamsConfiguration.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, kafkaConsumerProperties.getConsumerJKSPwd());
streamsConfiguration.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
streamsConfiguration.put(SASL_MECHANISM, "PLAIN");
return new KafkaStreamsConfiguration(streamsConfiguration);
}
KStream Filter Logic:
@Bean
public KStream<String, String> kStreamJson(StreamsBuilder builder) {
KStream<String, String> stream = builder.stream(kafkaConsumerProperties.getConsumerTopic(), Consumed.with(Serdes.String(), Serdes.String()));
/** Printing the source message */
stream.foreach((key, value) -> LOGGER.info(THREAD_NO + Thread.currentThread().getId() + " *****Message From Input Topic: " + key + ": " + value));
KStream<String, String> filteredDocument = stream.filter((k, v) -> filterCondition.test(k, v));
filteredDocument.to(kafkaConsumerProperties.getProducerTopic(), Produced.with(Serdes.String(), Serdes.String()));
/** After filtering printing the same message */
filteredDocument.foreach((key, value) -> LOGGER.info(THREAD_NO + Thread.currentThread().getId() + " #####Filtered Document: " + key + ": " + value));
return stream;
}
While starting the above Spring-based Kafka Streams application, I get the following exception:
2019-05-27T07:58:36.018-0500 ERROR stream-thread [QC-NormalizedEventProcessor-v1.0.0-e9cb1bed-3d90-41f1-957a-4fc7efc12a02-StreamThread-1] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors:
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: QC-NormalizedEventProcessor-v1.0.0
Our Kafka infrastructure team has granted the necessary permissions for the "group.id". Using this same group id, I can consume messages with other Kafka consumer applications. I chose the "application.id" name myself, and we have not added the "application.id" to the Kafka Access Control List.
I'm really not sure whether we need to grant any permission for the "application.id" or whether I'm missing something in the Kafka Streams configuration. Please advise.
Please note: I have tried both with and without "group.id" in the Kafka Streams configuration, and I always get the same exception.
Thanks!
Bharathiraja Shanmugam

I am not at my desk but I think Streams sets the group.id to application.id.

You need to set access for the application.id as well, because Kafka Streams uses it as the consumer group id.
For more information, please refer to https://docs.confluent.io/current/streams/developer-guide/security.html
Required ACL setting for secure Kafka clusters
Kafka clusters can use ACLs to control access to resources (like the ability to create topics), and for such clusters each client, including Kafka Streams, is required to authenticate as a particular user in order to be authorized with appropriate access. In particular, when Streams applications are run against a secured Kafka cluster, the principal running the application must have the ACL set so that the application has the permissions to create internal topics.
Since all internal topics as well as the embedded consumer group name are prefixed with the application ID, it is recommended to use ACLs on a prefixed resource pattern, so that the client can manage all topics and consumer groups starting with this prefix, as in --resource-pattern-type prefixed --topic <application.id> --operation All (see KIP-277 and KIP-290 for details).
For example, given the following setup of your Streams application:
• The application.id config value is team1-streams-app1.
• The application authenticates with the Kafka cluster as the team1 user.
• The application's coded topology reads from input topics input-topic1 and input-topic2.
• The application's topology writes to output topics output-topic1 and output-topic2.
Then the following commands would create the necessary ACLs in the Kafka cluster to allow your application to operate:
# Allow Streams to read the input topics:
bin/kafka-acls ... --add --allow-principal User:team1 --operation Read --topic input-topic1 --topic input-topic2
# Allow Streams to write to the output topics:
bin/kafka-acls ... --add --allow-principal User:team1 --operation Write --topic output-topic1 --topic output-topic2
# Allow Streams to manage its own internal topics and consumer groups:
bin/kafka-acls ... --add --allow-principal User:team1 --operation All --resource-pattern-type prefixed --topic team1-streams-app1 --group team1-streams-app1
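If you prefer to create these ACLs from code, the Java AdminClient exposes the same operations through createAcls(). The sketch below mirrors the three commands above. It assumes kafka-clients 2.0 or later (for prefixed resource patterns), a bootstrap address of localhost:9092, and that the admin client itself authenticates as a principal allowed to alter ACLs (its SASL/SSL settings would go into props). The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class StreamsAclSetup {

    // Hypothetical helper: builds the same ACLs as the three kafka-acls commands above.
    public static List<AclBinding> streamsAcls(String principal, String applicationId,
                                               List<String> inputTopics, List<String> outputTopics) {
        List<AclBinding> acls = new ArrayList<>();
        for (String topic : inputTopics) {
            acls.add(allow(principal, ResourceType.TOPIC, topic, PatternType.LITERAL, AclOperation.READ));
        }
        for (String topic : outputTopics) {
            acls.add(allow(principal, ResourceType.TOPIC, topic, PatternType.LITERAL, AclOperation.WRITE));
        }
        // Internal topics and the consumer group all start with application.id,
        // so one PREFIXED pattern per resource type covers them.
        acls.add(allow(principal, ResourceType.TOPIC, applicationId, PatternType.PREFIXED, AclOperation.ALL));
        acls.add(allow(principal, ResourceType.GROUP, applicationId, PatternType.PREFIXED, AclOperation.ALL));
        return acls;
    }

    private static AclBinding allow(String principal, ResourceType type, String name,
                                    PatternType patternType, AclOperation operation) {
        return new AclBinding(
                new ResourcePattern(type, name, patternType),
                // "*" allows the operation from any host, like kafka-acls without --allow-host
                new AccessControlEntry(principal, "*", operation, AclPermissionType.ALLOW));
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed address; add the SASL/SSL settings of an ACL-admin principal here.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            admin.createAcls(streamsAcls("User:team1", "team1-streams-app1",
                    List.of("input-topic1", "input-topic2"),
                    List.of("output-topic1", "output-topic2"))).all().get();
        }
    }
}
```

For the question above, applicationId would be QC-NormalizedEventProcessor-v1.0.0 and the principal would be the SASL user the application logs in with.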

Related

Changing the Dynamic Default Broker Config using Java AdminClient

Currently I am changing the default broker configurations in my kafka cluster using the kafka-configs.sh script.
./kafka-configs.sh --bootstrap-server <bootstrap_server> --entity-type brokers --entity-default --alter --add-config max.connections=100
The above command would set the default value of max.connections configuration to 100 in all my brokers of the cluster. I would like to achieve the same through Java.
I tried using the alterConfigs method in the AdminClient class. With this method I am able to set the configuration value, but only at the individual broker level.
Because of this, I would have to call alterConfigs for every broker in the cluster, which is not scalable.
Could anyone help me change the default broker configuration with the AdminClient class, the way I do with the shell script?
Thank you.
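You can set configs at the broker-default level (the equivalent of --entity-default) by passing an empty broker name to ConfigResource, so a single call applies cluster-wide: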
import java.util.*;
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.config.ConfigResource;

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// An empty broker name ("") targets the cluster-wide default, like --entity-default
ConfigResource configResource = new ConfigResource(ConfigResource.Type.BROKER, "");
ConfigEntry entry = new ConfigEntry("max.connections", String.valueOf(100));
AlterConfigOp op = new AlterConfigOp(entry, AlterConfigOp.OpType.SET);
Map<ConfigResource, Collection<AlterConfigOp>> configs = new HashMap<>(1);
configs.put(configResource, Arrays.asList(op));
// incrementalAlterConfigs needs Kafka 2.3+ on both the client and the brokers
try (Admin admin = AdminClient.create(props)) {
    admin.incrementalAlterConfigs(configs).all().get();
}

Kafka streams fail on decoding timestamp metadata inside StreamTask

We got strange errors from Kafka Streams while starting the app:
java.lang.IllegalArgumentException: Illegal base64 character 7b
at java.base/java.util.Base64$Decoder.decode0(Base64.java:743)
at java.base/java.util.Base64$Decoder.decode(Base64.java:535)
at java.base/java.util.Base64$Decoder.decode(Base64.java:558)
at org.apache.kafka.streams.processor.internals.StreamTask.decodeTimestamp(StreamTask.java:985)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:303)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:265)
at org.apache.kafka.streams.processor.internals.AssignedTasks.initializeNewTasks(AssignedTasks.java:71)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:385)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
As a result, the stream failed with: ERROR KafkaStreams - stream-client [xxx] All stream threads have died. The instance will be in error state and should be closed.
According to the code in org.apache.kafka.streams.processor.internals.StreamTask, the failure happened while decoding the timestamp metadata (StreamTask.decodeTimestamp()). It happened in production, and we can't reproduce it on staging.
What could be the root cause of such errors?
Extra info: our app uses Kafka Streams and consumes messages from several Kafka brokers using the same application.id and state.dir. (We were actually switching from one broker to another, but for some period we were connected to both, so we had two Kafka Streams instances, one per broker.) As I understand it, the consumer group lives on the broker side (so it shouldn't be a problem), but the state dir is on the client side. Could a race condition caused by two Kafka Streams instances sharing the same state.dir be the root cause?
We use kafka-streams v.2.4.0, kafka-clients v.2.4.0, Kafka Broker v.1.1.1, with the following configs:
default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
default.timestamp.extractor: org.apache.kafka.streams.processor.WallclockTimestampExtractor
default.deserialization.exception.handler: org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
commit.interval.ms: 5000
num.stream.threads: 1
auto.offset.reset: latest
Finally, we figured out the root cause of the corrupted metadata in some consumer groups.
It was one of our internal monitoring tools (written with pykafka), which overwrote the metadata of temporarily inactive consumer groups.
The metadata was not Base64-encoded and contained invalid data like the following: {"consumer_id": "", "hostname": "monitoring-xxx"}. Its first character, '{', is 0x7b, which is exactly the character reported in "Illegal base64 character 7b".
To see exactly what metadata is committed for a consumer group, you can use the following code:
// group.id must be the Streams application.id (Kafka Streams uses it as its consumer group id)
Map<String, Object> config = Map.of("group.id", "...", "bootstrap.servers", "...");
String topicName = "...";
Consumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<byte[], byte[]>(config, new ByteArrayDeserializer(), new ByteArrayDeserializer());
Set<TopicPartition> topicPartitions = kafkaConsumer.partitionsFor(topicName).stream()
        .map(partitionInfo -> new TopicPartition(topicName, partitionInfo.partition()))
        .collect(Collectors.toSet());
// committed(Set<TopicPartition>) is available in kafka-clients 2.4+
kafkaConsumer.committed(topicPartitions).forEach((key, value) ->
        System.out.println("Partition: " + key + " metadata: " + (value != null ? value.metadata() : null)));
Several options to fix already corrupted metadata:
1. Switch to a new consumer group (for Kafka Streams, that means a new application.id). Be careful: depending on whether the offset reset policy is latest or earliest, you might lose or duplicate messages, so in some cases this option might not be acceptable.
2. Overwrite the metadata manually (the timestamp must be encoded the way StreamTask.decodeTimestamp() expects to decode it):
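// Run this while the Streams application is stopped: commits from outside the group fail while it has active members.
// "AQAAAXGhcf01" = version byte 1 + partition time 1587551534389 (epoch ms), Base64-encoded
Map<TopicPartition, OffsetAndMetadata> updatedTopicPartitionToOffsetMetadataMap = kafkaConsumer.committed(topicPartitions).entrySet().stream()
        .filter(entry -> entry.getValue() != null) // skip partitions with no committed offset
        .collect(Collectors.toMap(Map.Entry::getKey, entry -> new OffsetAndMetadata(entry.getValue().offset(), "AQAAAXGhcf01")));
kafkaConsumer.commitSync(updatedTopicPartitionToOffsetMetadataMap);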
Alternatively, set the metadata to Af//////////, which Kafka Streams decodes as NO_TIMESTAMP.
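Both magic strings above follow the layout that, as far as we can tell from the 2.4 source, StreamTask uses for this metadata: one version byte (1) followed by the partition time as an 8-byte long, Base64-encoded. The standalone sketch below reimplements that layout with only the JDK, so you can check existing metadata or generate a replacement value offline. It is not a Kafka API; the class and method names are hypothetical.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.Base64;

// Standalone re-implementation (not a Kafka API) of our reading of the Kafka Streams 2.4
// offset-metadata layout: 1 version byte + 8-byte partition time, Base64-encoded.
public final class TimestampMetadataCodec {
    public static final byte VERSION = 1;          // assumed "latest magic byte"
    public static final long NO_TIMESTAMP = -1L;   // same value as ConsumerRecord.NO_TIMESTAMP

    public static String encode(long partitionTime) {
        ByteBuffer buffer = ByteBuffer.allocate(9);
        buffer.put(VERSION);
        buffer.putLong(partitionTime);
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    // Empty metadata or an unknown version yields NO_TIMESTAMP; non-Base64 input
    // (e.g. JSON starting with '{', 0x7b) throws IllegalArgumentException.
    public static long decode(String metadata) {
        if (metadata.isEmpty()) {
            return NO_TIMESTAMP;
        }
        ByteBuffer buffer = ByteBuffer.wrap(Base64.getDecoder().decode(metadata));
        byte version = buffer.get();
        return version == VERSION ? buffer.getLong() : NO_TIMESTAMP;
    }

    // Returns false for metadata that would make decode() throw.
    public static boolean isDecodable(String metadata) {
        try {
            decode(metadata);
            return true;
        } catch (IllegalArgumentException | BufferUnderflowException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("AQAAAXGhcf01"));   // 1587551534389
        System.out.println(encode(NO_TIMESTAMP));      // Af//////////
        System.out.println(isDecodable("{\"consumer_id\": \"\", \"hostname\": \"monitoring-xxx\"}")); // false
    }
}
```

Running isDecodable() over the metadata printed by the inspection code above is one way to find which partitions need to be overwritten.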

Too many UnknownProducerIdException in kafka broker for kafka internal topics created for kafka streams application

One of our Kafka Streams applications is generating a lot of UNKNOWN_PRODUCER_ID errors, both in the Kafka brokers and on the consumer side.
The Streams configuration is as follows:
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, appName);
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG,appName + "-Client");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, this.bootstrapServer);
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.Long().getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,StreamsConfig.EXACTLY_ONCE);
streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, offset);
streamsConfiguration.put(StreamsConfig.STATE_DIR_CONFIG,state_dir);
streamsConfiguration.put(StreamsConfig.REPLICATION_FACTOR_CONFIG,defaultReplication);
return streamsConfiguration;
Error on the broker side:
Error on the consumer side:
Custom configuration for the repartition internal topic:
prod.Prod-Job-Summary-v0.4-KTABLE-AGGREGATE-STATE-STORE-0000000049-repartition
What could be the reason for these errors?
It's a known issue. See KAFKA-7190 ("Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID") and KIP-360 ("Improve handling of unknown producer").

Kafka: dynamically query configurations

Is there a way to access the configuration values in server.properties without direct access to that file itself?
I thought that:
kafka-configs.sh --describe --entity-type topics --zookeeper localhost:2181
might give me what I want, but I did not see the values set in server.properties. Just the following (I set 'ddos' as my own topic from kafka-topics.sh):
Configs for topics:ddos are
Configs for topics:__consumer_offsets are segment.bytes=104857600,cleanup.policy=compact
I was thinking I'd also see globally configured options, like this from the default configuration I have:
log.retention.hours=168
Thanks in advance.
Since Kafka 0.11, you can use the AdminClient describeConfigs() API to retrieve configuration of brokers.
For example, skeleton code to retrieve configuration for broker 0:
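import java.io.FileInputStream;
import java.util.*;
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.config.ConfigResource;

Properties adminProps = new Properties();
try (FileInputStream in = new FileInputStream("admin.properties")) {
    adminProps.load(in);
}
try (AdminClient admin = AdminClient.create(adminProps)) {
    Collection<ConfigResource> resources = new ArrayList<>();
    ConfigResource cr = new ConfigResource(ConfigResource.Type.BROKER, "0");
    resources.add(cr);
    DescribeConfigsResult dcr = admin.describeConfigs(resources);
    System.out.println(dcr.all().get());
}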
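To pick out a single setting such as log.retention.hours, and to see where each value came from, you can walk the entries of the returned Config. A sketch, assuming a 1.1+ client and broker (we believe ConfigEntry.source() arrived with dynamic broker configs in 1.1), a broker with id 0, and a bootstrap address of localhost:9092; the helper configValue is hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class BrokerConfigLookup {

    // Hypothetical helper: value of one config key, or null if the broker did not report it
    // (sensitive values such as passwords are also reported as null).
    public static String configValue(Config config, String name) {
        ConfigEntry entry = config.get(name);
        return entry == null ? null : entry.value();
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        ConfigResource broker0 = new ConfigResource(ConfigResource.Type.BROKER, "0");
        try (AdminClient admin = AdminClient.create(props)) {
            Map<ConfigResource, Config> configs =
                    admin.describeConfigs(Collections.singleton(broker0)).all().get();
            Config config = configs.get(broker0);
            // source() distinguishes values from server.properties (STATIC_BROKER_CONFIG),
            // dynamic updates, and built-in defaults (DEFAULT_CONFIG).
            for (ConfigEntry entry : config.entries()) {
                System.out.println(entry.name() + "=" + entry.value() + " [" + entry.source() + "]");
            }
            System.out.println("log.retention.hours=" + configValue(config, "log.retention.hours"));
        }
    }
}
```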

How can I get the group.id of a topic in command line in Kafka?

I installed Kafka on my server and want to learn how to use it. I found some sample code written in Scala; below is part of it:
def createConsumerConfig(zookeeper: String, groupId: String): ConsumerConfig = {
val props = new Properties()
props.put("zookeeper.connect", zookeeper)
props.put("group.id", groupId)
props.put("auto.offset.reset", "largest")
props.put("zookeeper.session.timeout.ms", "400")
props.put("zookeeper.sync.time.ms", "200")
props.put("auto.commit.interval.ms", "1000")
val config = new ConsumerConfig(props)
config
}
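The sample above uses the old ZooKeeper-based consumer. For comparison, here is a sketch of roughly equivalent settings for the newer Java consumer, which connects to brokers through bootstrap.servers instead of ZooKeeper; group.id is still just a string you choose. The helper name, broker address, group name and String deserializers are illustrative assumptions. The old "largest" reset value corresponds to "latest", and the two zookeeper.* timeouts have no counterpart because the new consumer does not talk to ZooKeeper.

```java
import java.util.Properties;

public class ConsumerProps {

    // Hypothetical helper: roughly what createConsumerConfig above becomes for the
    // newer Java consumer (org.apache.kafka.clients.consumer.KafkaConsumer).
    public static Properties createConsumerConfig(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // brokers, not ZooKeeper
        props.put("group.id", groupId);                   // any name you choose
        props.put("auto.offset.reset", "latest");        // the old consumer's "largest"
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Prints the resulting settings; the same Properties can be passed to new KafkaConsumer<>(props).
        createConsumerConfig("localhost:9092", "my-group").list(System.out);
    }
}
```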
but I don't know how to find the group id on my server.
The group id is something you define yourself for your consumer by providing a string id for it. All consumers started with the same id will "cooperate" and read topics in a coordinated way, where each consumer instance handles a subset of the messages in a topic. A group id that does not exist yet is treated as a new consumer group, and a new entry is created in ZooKeeper where its committed offsets will be stored (this is how the old ZooKeeper-based consumer in your sample works; newer consumers store committed offsets in Kafka's internal __consumer_offsets topic).
You can open a ZooKeeper shell and list the path where Kafka stores consumer offsets like this:
./bin/zookeeper-shell.sh localhost:2181
ls /consumers
You'll get a list of all groups.
EDIT: I missed the part where you said that you're setting this up yourself so I thought that you want to list the consumer groups of an existing cluster.
Lundahl is right, this is a property that you define, which is used to coordinate consumer threads so that they don't consume "each other's" messages (each consumes a subset). If you, for example, use 2 consumers with different groups, they'll each consume the whole topic.
/kafkadir/kafka-consumer-groups.sh --all-topics --bootstrap-server hostname:port --list
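The same list can be fetched programmatically with the AdminClient listConsumerGroups() call (available in kafka-clients 2.0+, we believe). Like the --bootstrap-server form of the command, this shows groups managed by the brokers, not old ZooKeeper-based groups under /consumers. A minimal sketch, assuming a broker at localhost:9092; groupIds is a hypothetical helper.

```java
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupListing;

public class ListGroups {

    // Hypothetical helper: sorted group ids, similar to kafka-consumer-groups.sh --list.
    public static List<String> groupIds(Collection<ConsumerGroupListing> listings) {
        return listings.stream()
                .map(ConsumerGroupListing::groupId)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        try (AdminClient admin = AdminClient.create(props)) {
            groupIds(admin.listConsumerGroups().all().get()).forEach(System.out::println);
        }
    }
}
```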