Kafka streams fail on decoding timestamp metadata inside StreamTask - apache-kafka

We got strange errors on Kafka Streams during starting app
java.lang.IllegalArgumentException: Illegal base64 character 7b
at java.base/java.util.Base64$Decoder.decode0(Base64.java:743)
at java.base/java.util.Base64$Decoder.decode(Base64.java:535)
at java.base/java.util.Base64$Decoder.decode(Base64.java:558)
at org.apache.kafka.streams.processor.internals.StreamTask.decodeTimestamp(StreamTask.java:985)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:303)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:265)
at org.apache.kafka.streams.processor.internals.AssignedTasks.initializeNewTasks(AssignedTasks.java:71)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:385)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
and, as a result, error about failed stream: ERROR KafkaStreams - stream-client [xxx] All stream threads have died. The instance will be in error state and should be closed.
According to code inside org.apache.kafka.streams.processor.internals.StreamTask, failure happened due to error in decoding timestamp metadata (StreamTask.decodeTimestamp()). It happened on prod, and can't reproduce on stage.
What could be the root cause of such errors?
Extra info: our app uses Kafka-Streams and consumes messages from several kafka brokers using the same application.id and state.dir (actually we switch from one broker to another, but during some period we connected to both brokers, so we have two kafka streams, one per each broker). As I understand, consumer group lives on broker side (so shouldn't be a problem), but state dir is on client side. Maybe some race condition occurred due to using the same state.dir for two kafka streams? could it be the root cause?
We use kafka-streams v.2.4.0, kafka-clients v.2.4.0, Kafka Broker v.1.1.1, with the following configs:
default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
default.timestamp.extractor: org.apache.kafka.streams.processor.WallclockTimestampExtractor
default.deserialization.exception.handler: org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
commit.interval.ms: 5000
num.stream.threads: 1
auto.offset.reset: latest

Finally, we figured out what is the root cause of corrupted metadata by some consumer groups.
It was one of our internal monitoring tool (written with pykafka) that corrupted metadata by temporarily inactive consumer groups.
Metadata were unencrupted and contained invalid data like the following: {"consumer_id": "", "hostname": "monitoring-xxx"}.
In order to understand what exactly we have in consumer metadata, we could use the following code:
Map<String, Object> config = Map.of( "group.id", "...", "bootstrap.servers", "...");
String topicName = "...";
Consumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<byte[], byte[]>(config, new ByteArrayDeserializer(), new ByteArrayDeserializer());
Set<TopicPartition> topicPartitions = kafkaConsumer.partitionsFor(topicName).stream()
.map(partitionInfo -> new TopicPartition(topicName, partitionInfo.partition()))
.collect(Collectors.toSet());
kafkaConsumer.committed(topicPartitions).forEach((key, value) ->
System.out.println("Partition: " + key + " metadata: " + (value != null ? value.metadata() : null)));
Several options to fix already corrupted metadata:
change consumer group to a new one. caution that you might lose or duplicate messages depending on the latest or earliest offset reset policy. so for some cases, this option might be not acceptable
overwrite metadata manually (timestamp is encoded according to logic inside StreamTask.decodeTimestamp()):
Map<TopicPartition, OffsetAndMetadata> updatedTopicPartitionToOffsetMetadataMap = kafkaConsumer.committed(topicPartitions).entrySet().stream()
.collect(Collectors.toMap(Map.Entry::getKey, (entry) -> new OffsetAndMetadata((entry.getValue()).offset(), "AQAAAXGhcf01")));
kafkaConsumer.commitSync(updatedTopicPartitionToOffsetMetadataMap);
or specify metadata as Af////////// that means NO_TIMESTAMP in Kafka Streams.

Related

Kafka stream: "TopicAuthorizationException: Not authorized to access topics" for an internal state store

Java: OpenJdk 11
Kafka: 2.2.0
Kafka streams lib: 2.3.0
I am trying to deploy my Kafka streams application in a docker container and it fails while trying to create an internal state store with a TopicAuthorizationException.
It works well locally. The main difference between locally and on the server is that there it connects to a server deployed Kafka and authenticates using the usual Kerberos auth.
I fail to understand the link between authentication and the local stores.
My stream looks like that:
StreamsBuilder builder = new StreamsBuilder();
//We stream from the source topic
KStream<String, EnrichedMessagePayload> sourceMessagesStream = builder.stream(sourceTopic, Consumed
.with(Serdes.serdeFrom(String.class), INPUT_SERDE));
//We group per room and window
TimeWindowedKStream<String, EnrichedMessagePayload> windowed = sourceMessagesStream
.groupByKey().windowedBy(TimeWindows.of(Duration.ofMillis(windowSize)).grace(Duration.ZERO));
//We make them a list
KStream<Windowed<String>, WindowedMessages> grouped = windowed
.aggregate(WindowedMessages::new,
(key, value, aggregate) -> aggregate.add(value),
Materialized.with(Serdes.String(), Serdes.serdeFrom(windowSerializer, windowSerializer)))
.suppress(Suppressed.untilWindowCloses(unbounded()))
.toStream();
//Filter
KStream<Windowed<String>, FilterResult> filtered = grouped
.mapValues((readOnlyKey, value) -> filterWindow(value.getMessages()));
//Re map to its original form
KStream<String, OutputPayload> reduced = filtered
.flatMap((KeyValueMapper<Windowed<String>, WindowedMessages, Iterable<KeyValue<String, OutputPayload>>>) (key, value) -> value
.getMessages()
.stream().map(payload -> new KeyValue<>(key.key(), payload))
.collect(toList()));
//Target topic
reduced.to(sinkTopic, Produced
.with(Serdes.serdeFrom(String.class), SERDE));
return builder.build();
It receives a stream of messages, windows it, aggregates all the messages per window, keeps only the last version of the list with a 'Suppressed' and then flatMaps the whole to forward it to another topic.
Every time i get that kind of exception:
Error message was: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [Topic authorization failed.]
2019-10-09 06:44:03.255 +0000 ERROR [filterer-d83f2f60-b2bd-40b2-a314-4b20f32918f7-StreamThread-1] [StreamThread.java:777] - stream-thread [filterer-d83f2f60-b2bd-40b2-a314-4b20f32918f7-StreamThread-1] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors: - [rapid_r-live-message-filterer-0-0-1-snapshot-10.1e842f1a-ea60-11e9-9c7d-024298932744] - [] - []
org.apache.kafka.streams.errors.StreamsException: Could not create topic filterer-KTABLE-SUPPRESS-STATE-STORE-0000000005-changelog.
at org.apache.kafka.streams.processor.internals.InternalTopicManager.getNumPartitions(InternalTopicManager.java:212)
at org.apache.kafka.streams.processor.internals.InternalTopicManager.validateTopics(InternalTopicManager.java:226)
at org.apache.kafka.streams.processor.internals.InternalTopicManager.makeReady(InternalTopicManager.java:104)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.prepareTopic(StreamsPartitionAssignor.java:971)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:618)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:424)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:622)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:107)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:544)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:527)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:978)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:958)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:578)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:388)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:294)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:212)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:415)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:358)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:353)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:941)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:846)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
Caused by: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [Topic authorization failed.]
It is not "authentication" but "authorization". Look at your log messages, it says "Not authorized to access topics". As far as I can see, you are not authorized to create the internal topic 'filterer-KTABLE-SUPPRESS-STATE-STORE-0000000005-changelog' that backs your local suppress state store. State stores included in Kafka Streams are backed by default by a topic on the Kafka brokers. This internal topics are used during failover to restore local state stores. These internal topics are created automatically by the Kafka Streams application, thus the application needs to have appropriate permissions to create them.
See https://kafka.apache.org/23/documentation/streams/developer-guide/security.html#id1 for more information. There it says "the principal running the application must have the ACL set so that the application has the permissions to create, read and write internal topics."

Cannot Restart Kafka Consumer Application, Failing due to OffsetOutOfRangeException

Currently, my Kafka Consumer streaming application is manually committing the offsets into Kafka with enable.auto.commit set to false.
The application failed when I tried restarting it throwing below exception:
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions:{partition-12=155555555}
Assuming the above error is due to the message not present/partition deleted due to retention period, I tried below method:
I disabled the manual commit and enabled auto commit(enable.auto.commit=true and auto.offset.reset=earliest)
Still it fails with the same error
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions:{partition-12=155555555}
Please suggest ways to restart the job so that it can successfully read the correct offset for which message/partition is present
You are trying to read offset 155555555 from partition 12 of topic partition, but -most probably- it might have already been deleted due to your retention policy.
You can either use Kafka Streams Application Reset Tool in order to reset your Kafka Streams application's internal state, such that it can reprocess its input data from scratch
$ bin/kafka-streams-application-reset.sh
Option (* = required) Description
--------------------- -----------
* --application-id <id> The Kafka Streams application ID (application.id)
--bootstrap-servers <urls> Comma-separated list of broker urls with format: HOST1:PORT1,HOST2:PORT2
(default: localhost:9092)
--intermediate-topics <list> Comma-separated list of intermediate user topics
--input-topics <list> Comma-separated list of user input topics
--zookeeper <url> Format: HOST:POST
(default: localhost:2181)
or start your consumer using a fresh consumer group ID.
I met the same problem and I use package org.apache.spark.streaming.kafka010 in my application.In the begining,I suscepted the auto.offset.reset strategy take no effect,but when I read the description of the method fixKafkaParams in the object KafkaUtils,i found the configuration has been overwrited.I guess the reason why it tweak the configuration ConsumerConfig.AUTO_OFFSET_RESET_CONFIG for executor is to keep consistent offset obtained by driver and executor.

Kafka - how to use #KafkaListener(topicPattern="${kafka.topics}") where property kafka.topics is 'sss.*'?

I'm trying to implement Kafka consumer with topic names as a pattern. E.g. #KafkaListener(topicPattern="${kafka.topics}") where property kafka.topics is 'sss.*'. Now when I send message to topic 'sss.test' or any other topic name like 'sss.xyz', 'sss.pqr', it's throwing error as below:
WARN o.apache.kafka.clients.NetworkClient - Error while fetching metadata with correlation id 12 : {sss.xyz-topic=LEADER_NOT_AVAILABLE}
I tried to enable listeners & advertised.listeners in the server.properties file but when I re-start Kafka it consumes messages from all old topics which were tried. The moment I use new topic name, it throws above error.
Kafka doesn't support pattern matching? Or there's some configuration which I'm missing? Please suggest.

How to recover lost message in kafka consumer

I'm writing an application in Apache camel. I am consume messages from some Kafka topic via camel Kafka component and dumps into database for recovery in case of any crash/restart happens. Below is the camel URI
kafka:?autoCommitEnable=false&groupId=r&keySerializerClass=org.apache.kafka.common.serialization.StringSerializer&serializerClass=org.apache.kafka.common.serialization.StringSerializer&topic=
My use case is - I have consumed some message(s) from Kafka but could not dumped the same into the database for recovery and crash happens.Now how to get the all the lost messages with the same consumer group ID after restarting the application ?
Thanks
Now how to get the all the lost messages with the same consumer group ID after restarting the application ?
Actually kafka store the consumer offset for you, If you do commit offset in your application. So when you restart the application, It will consume message from the last offset stored in kafka.
You could set the AutocommitEnable=true OR
I also found this https://github.com/apache/camel/blob/camel-2.18.2/components/camel-kafka/src/main/java/org/apache/camel/component/kafka/KafkaConsumer.java.
There are some piece code :
if (endpoint.getConfiguration().isAutoCommitEnable() != null
&& !endpoint.getConfiguration().isAutoCommitEnable()) {
long partitionLastoffset = partitionRecords.get(partitionRecords.size() - 1).offset();
consumer.commitSync(Collections.singletonMap(
partition, new OffsetAndMetadata(partitionLastoffset + 1)));
}
The Camel will take care of this even you do not set the AutocommitEnable.

flink kafka consumer groupId not working

I am using kafka with flink.
In a simple program, I used flinks FlinkKafkaConsumer09, assigned the group id to it.
According to Kafka's behavior, when I run 2 consumers on the same topic with same group.Id, it should work like a message queue. I think it's supposed to work like:
If 2 messages sent to Kafka, each or one of the flink program would process the 2 messages totally twice(let's say 2 lines of output in total).
But the actual result is that, each program would receive 2 pieces of the messages.
I have tried to use consumer client that came with the kafka server download. It worked in the documented way(2 messages processed).
I tried to use 2 kafka consumers in the same Main function of a flink programe. 4 messages processed totally.
I also tried to run 2 instances of flink, and assigned each one of them the same program of kafka consumer. 4 messages.
Any ideas?
This is the output I expect:
1> Kafka and Flink2 says: element-65
2> Kafka and Flink1 says: element-66
Here's the wrong output i always get:
1> Kafka and Flink2 says: element-65
1> Kafka and Flink1 says: element-65
2> Kafka and Flink2 says: element-66
2> Kafka and Flink1 says: element-66
And here is the segment of code:
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
ParameterTool parameterTool = ParameterTool.fromArgs(args);
DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer09<>(parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties()));
messageStream.rebalance().map(new MapFunction<String, String>() {
private static final long serialVersionUID = -6867736771747690202L;
#Override
public String map(String value) throws Exception {
return "Kafka and Flink1 says: " + value;
}
}).print();
env.execute();
}
I have tried to run it twice and also in the other way:
create 2 datastreams and env.execute() for each one in the Main function.
There was a quite similar question on the Flink user mailing list today, but I can't find the link to post it here. So here a part of the answer:
"Internally, the Flink Kafka connectors don’t use the consumer group
management functionality because they are using lower-level APIs
(SimpleConsumer in 0.8, and KafkaConsumer#assign(…) in 0.9) on each
parallel instance for more control on individual partition
consumption. So, essentially, the “group.id” setting in the Flink
Kafka connector is only used for committing offsets back to ZK / Kafka
brokers."
Maybe that clarifies things for you.
Also, there is a blog post about working with Flink and Kafka that may help you (https://data-artisans.com/blog/kafka-flink-a-practical-how-to).
Since there is not much use of group.id of flink kafka consumer other than commiting offset to zookeeper. Is there any way of offset monitoring as far as flink kafka consumer is concerned. I could see there is a way [with the help of consumer-groups/consumer-offset-checker] for console consumers but not for flink kafka consumers.
We want to see how our flink kafka consumer is behind/lagging with kafka topic size[total number of messages in topic at given point of time], it is fine to have it at partition level.