java.lang.NullPointerException: Cannot invoke "org.apache.kafka.streams.processor.api.ProcessorSupplier.get()" because this.processorSupplier is null - apache-kafka

I am using Kafka Streams 3.3.1 with a pretty simple topology (it calculates the overall number of messages), but I get a rather weird exception when creating the topology:
java.lang.NullPointerException: Cannot invoke "org.apache.kafka.streams.processor.api.ProcessorSupplier.get()" because "this.processorSupplier" is null
at org.apache.kafka.streams.kstream.internals.graph.ProcessorParameters.toString(ProcessorParameters.java:133)
at java.base/java.lang.String.valueOf(String.java:4225)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:173)
at org.apache.kafka.streams.kstream.internals.graph.BaseRepartitionNode.toString(BaseRepartitionNode.java:77)
at org.apache.kafka.streams.kstream.internals.graph.OptimizableRepartitionNode.toString(OptimizableRepartitionNode.java:63)
My code to create topology:
private static Topology createTopology() {
    StreamsBuilder builder = new StreamsBuilder();
    KTable<Integer, Long> table = builder.stream("messages")
            .selectKey((key, value) -> 1)
            .groupByKey()
            .count();
    table.toStream().to("stats");
    return builder.build();
}

This is a bug that has already been fixed:
https://issues.apache.org/jira/browse/KAFKA-14325
It should only affect you when toString is called on the topology. Please upgrade to 3.3.2 or, as a workaround, catch the exception around the code that calls toString (the caller could be Kafka itself if the DEBUG log level is enabled).

Related

Kafkastream springcloud kafka join selectKey

Could you please help me configure a Spring Cloud Stream app based on Kafka? I'm facing an issue with the selectKey operation.
Let me explain what I'm trying to achieve:
Two incoming topics, Person and RefGenre.
Person contains the key of RefGenre (in its value):
public class Person {
    String nom;
    String prenom;
    String codeGenre; // <<--- here is the key of the second topic RefGenre
}
So I'm using the selectKey operator to prepare my stream before the join operation.
A new topic is created by selectKey (my-app-KSTREAM-KEY-SELECT-0000000004-repartition), and then a serialization issue happens:
Exception in thread "my-app-3c57b31c-28e5-4199-b07d-87f8940425ab-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: ClassCastException while producing data to topic my-app-KSTREAM-KEY-SELECT-0000000004-repartition. A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: statefull.serde.PersonWithGenreSerde) is not compatible to the actual key or value type (key type: java.lang.String / value type: statefull.model.Person). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters (for example if using the DSL, #to(String topic, Produced<K, V> produced) with Produced.keySerde(WindowedSerdes.timeWindowedSerdeFrom(String.class))).
Where can I specify the serde for this repartition topic, and can I specify the name of this "internal" topic?
@Bean
public BiFunction<KStream<String, Person>, KTable<String, ReferentielGenre>, KStream<Long, PersonWithGenre>> joinKtable() {
    return (persons, referentielGenres) ->
            persons.selectKey((k, v) -> v.getCodeGenre())
                   .join(referentielGenres,
                         (person, genre) -> new PersonWithGenre(person.getNom(), person.getPrenom(), genre),
                         Joined.with(Serdes.String(), new PersonWithGenreSerde(), null));
}
Here is the full code of my non-working job: https://github.com/YohanAlard/joinkstream
Is there a better way to handle this use case?
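One thing worth checking, based on the error message: the serdes used for the selectKey repartition topic come from the Joined configuration, and the second argument of Joined.with is the serde for the left-hand stream's value, which at that point is still a Person, not a PersonWithGenre. A minimal sketch, assuming hypothetical PersonSerde and ReferentielGenreSerde classes exist; the optional fourth argument of Joined.with names the join and is also used when naming the repartition topic:
@Bean
public BiFunction<KStream<String, Person>, KTable<String, ReferentielGenre>, KStream<String, PersonWithGenre>> joinKtable() {
    return (persons, referentielGenres) ->
            persons.selectKey((k, v) -> v.getCodeGenre())
                   .join(referentielGenres,
                         (person, genre) -> new PersonWithGenre(person.getNom(), person.getPrenom(), genre),
                         // key serde, left (stream) value serde, right (table) value serde, base name
                         Joined.with(Serdes.String(), new PersonSerde(), new ReferentielGenreSerde(), "person-with-genre"));
}
Note that the key after selectKey is the String codeGenre, so in this sketch the result stream is typed KStream<String, PersonWithGenre> rather than KStream<Long, PersonWithGenre>.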

apache flink with Kafka: InvalidTypesException

I have the following code:
Properties properties = new Properties();
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, MyCustomClassDeserializer.class.getName());

FlinkKafkaConsumer<MyCustomClass> kafkaConsumer = new FlinkKafkaConsumer(
        "test-kafka-topic",
        new SimpleStringSchema(),
        properties);

final StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<MyCustomClass> kafkaInputStream = streamEnv.addSource(kafkaConsumer);

DataStream<String> stringStream = kafkaInputStream
        .map(new MapFunction<MyCustomClass, String>() {
            @Override
            public String map(MyCustomClass message) {
                logger.info("--- Received message : " + message.toString());
                return message.toString();
            }
        });

streamEnv.execute("Published messages");
MyCustomClassDeserializer is implemented to convert a byte array to a Java object.
When I run this program locally, I get the error:
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: Input mismatch: Basic type expected.
And I get it for this line of code:
.map(new MapFunction<MyCustomClass,String>() {
I'm not sure why I get this.
So, you have a deserializer that returns a POJO, yet you are telling Flink that it should deserialize the record from byte[] to String by using SimpleStringSchema.
See the problem now? :)
I don't think you should use custom Kafka deserializers in FlinkKafkaConsumer in general. What you should aim for instead is to create a custom class that implements DeserializationSchema from Flink. It should be much better in terms of type safety and testability.
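A minimal sketch of such a schema, assuming MyCustomClass exists and that MyCustomClassDeserializer implements Kafka's Deserializer interface (as it must, to be usable as value.deserializer), so its parsing logic can simply be reused:
import java.io.IOException;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

public class MyCustomClassSchema implements DeserializationSchema<MyCustomClass> {

    // Reuse the existing byte[] -> MyCustomClass logic from the Kafka deserializer.
    private final MyCustomClassDeserializer kafkaDeserializer = new MyCustomClassDeserializer();

    @Override
    public MyCustomClass deserialize(byte[] message) throws IOException {
        return kafkaDeserializer.deserialize("test-kafka-topic", message);
    }

    @Override
    public boolean isEndOfStream(MyCustomClass nextElement) {
        return false; // the Kafka topic is unbounded
    }

    @Override
    public TypeInformation<MyCustomClass> getProducedType() {
        // This is what gives Flink the correct type information and avoids the InvalidTypesException.
        return TypeInformation.of(MyCustomClass.class);
    }
}
It is then passed to the consumer in place of SimpleStringSchema, and the value.deserializer property is no longer needed:
FlinkKafkaConsumer<MyCustomClass> kafkaConsumer = new FlinkKafkaConsumer<>(
        "test-kafka-topic",
        new MyCustomClassSchema(),
        properties);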

Kafka Streams persistent store error: the state store, may have migrated to another instance

I am using Kafka Streams with Spring Boot. In my use case, when I receive a customer event from another microservice I need to store it in a customer materialized view, and when I receive an order event I need to join the customer and the order and store the result in a customer-order materialized view. To achieve this I created a persistent key-value store, customer-store, and update it when a new event comes in.
StoreBuilder customerStateStore = Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("customer"), Serdes.String(), customerSerde)
        .withLoggingEnabled(new HashMap<>());
streamsBuilder.addStateStore(customerStateStore);

KTable<String, Customer> customerKTable = streamsBuilder.table("customer", Consumed.with(Serdes.String(), customerSerde));
customerKTable.foreach((key, value) -> System.out.println("Customer from Topic: " + value));
I configured the topology and the streams object and started it. When I try to access the store using ReadOnlyKeyValueStore, I get the following exception, even though I stored some objects a few moments ago:
streams.start();
ReadOnlyKeyValueStore<String, Customer> customerStore = streams.store("customer", QueryableStoreTypes.keyValueStore());
System.out.println("customerStore.approximateNumEntries()-> " + customerStore.approximateNumEntries());
The code is uploaded to GitHub for reference. I'd appreciate your help.
Exception:
org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, customer, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:60)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1043)
at com.kafkastream.service.EventsListener.main(EventsListener.java:94)
The state store usually needs some time before it is ready to be queried. The simplest approach is shown below (code from the official documentation):
public static <T> T waitUntilStoreIsQueryable(final String storeName,
                                              final QueryableStoreType<T> queryableStoreType,
                                              final KafkaStreams streams) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, queryableStoreType);
        } catch (InvalidStateStoreException ignored) {
            // store not yet ready for querying
            Thread.sleep(100);
        }
    }
}
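Applied to the store lookup above, a usage sketch would be:
streams.start();
// waitUntilStoreIsQueryable throws InterruptedException, so the caller declares or handles it
ReadOnlyKeyValueStore<String, Customer> customerStore =
        waitUntilStoreIsQueryable("customer", QueryableStoreTypes.<String, Customer>keyValueStore(), streams);
System.out.println("customerStore.approximateNumEntries() -> " + customerStore.approximateNumEntries());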
You can find additional info in the documentation:
https://docs.confluent.io/current/streams/faq.html#interactive-queries

Spring Kafka: How to discard messages already retrieved by poll() after doing a seek()?

This is a follow-up question to Reading the same message several times from Kafka. If there is a better way to ask this question without posting a new one, let me know. In that post Gary mentions:
"But you will still see later messages first if they have already been retrieved so you will have to discard those too."
Is there a clean way to discard messages already read by poll() after calling seek()? I started implementing logic to do this by saving the initial offset (in recordOffset), incrementing it on success. On failure, I call seek() and set the value of recordOffset to record.offset(). Then for every new message I check to see if the record.offset() is greater than recordOffset. If it is, I simply call acknowledge(), thereby "discarding" all the previously read messages. Here is the code -
// in onMessage()...
if (record.offset() > recordOffset) {
    acknowledgment.acknowledge();
    return;
}

try {
    processRecord(record);
    recordOffset = record.offset() + 1;
    acknowledgment.acknowledge();
} catch (Exception e) {
    recordOffset = record.offset();
    consumerSeekCallback.seek(record.topic(), record.partition(), record.offset());
}
The problem with this approach is that it gets complicated with multiple partitions. Is there an easier/cleaner way?
EDIT 1
Based on Gary's suggestion below, I tried adding an errorHandler like this -
@KafkaListener(topicPartitions =
        {@org.springframework.kafka.annotation.TopicPartition(topic = "${kafka.consumer.topic}", partitions = { "1" })},
        errorHandler = "SeekToCurrentErrorHandler")
Is there something wrong with this syntax as I get "Cannot resolve method 'errorHandler'"?
EDIT 2
After Gary explained the two error handlers, I removed the above errorHandler and added the following to the config file:
@Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory factory = new ConcurrentKafkaListenerContainerFactory();
    factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(kafkaProps()));
    factory.getContainerProperties().setAckOnError(false);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
    return factory;
}
When I start the application, I get this error now...
java.lang.NoSuchMethodError: org.springframework.util.Assert.state(ZLjava/util/function/Supplier;)V
at org.springframework.kafka.listener.adapter.MessagingMessageListenerAdapter.determineInferredType(MessagingMessageListenerAdapter.java:396)
Here is line 396 -
Assert.state(!this.isConsumerRecordList || validParametersForBatch,
        () -> String.format(stateMessage, "ConsumerRecord"));
Assert.state(!this.isMessageList || validParametersForBatch,
        () -> String.format(stateMessage, "Message<?>"));
Starting with version 2.0.1, if the container's ErrorHandler is a RemainingRecordsErrorHandler, such as the SeekToCurrentErrorHandler, the remaining records (including the failed one) are sent to the error handler instead of the listener.
This allows the SeekToCurrentErrorHandler to reposition every partition so the next poll will return the unprocessed record(s).
/**
 * An error handler that seeks to the current offset for each topic in the remaining
 * records. Used to rewind partitions after a message failure so that it can be
 * replayed.
 *
 * @author Gary Russell
 * @since 2.0.1
 *
 */
public class SeekToCurrentErrorHandler implements RemainingRecordsErrorHandler
EDIT
There are two types of error handler. The KafkaListenerErrorHandler (specified in the annotation) works at the listener level; it is wired into the listener adapter that invokes the @KafkaListener method and thus only has access to the current record.
The second error handler (configured on the listener container) works at the container level and thus has access to the remaining records. The SeekToCurrentErrorHandler is a container-level error handler.
It is configured on the container properties in the container factory...
@Bean
public ConcurrentKafkaListenerContainerFactory kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory factory = new ConcurrentKafkaListenerContainerFactory();
    factory.setConsumerFactory(this.consumerFactory);
    factory.getContainerProperties().setAckOnError(false);
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    factory.getContainerProperties().setAckMode(AckMode.RECORD);
    return factory;
}
You are going the right way, and yes, you have to deal with the different partitions as well. There is a FilteringMessageListenerAdapter, but you would still have to write the filtering logic yourself.
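A minimal sketch of that adapter, assuming a hypothetical alreadyProcessed(partition, offset) check backed by the per-partition offset bookkeeping described above (returning true from the filter strategy discards the record before it reaches the listener):
import org.springframework.kafka.listener.MessageListener;
import org.springframework.kafka.listener.adapter.FilteringMessageListenerAdapter;
import org.springframework.kafka.listener.adapter.RecordFilterStrategy;

public MessageListener<String, String> filteringListener(MessageListener<String, String> delegate) {
    // alreadyProcessed(...) is a hypothetical lookup into the per-partition offsets you track
    RecordFilterStrategy<String, String> skipAlreadySeen =
            record -> alreadyProcessed(record.partition(), record.offset());
    // records for which the strategy returns true are discarded
    return new FilteringMessageListenerAdapter<>(delegate, skipAlreadySeen);
}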

Kafka Streams dynamic routing (ProducerInterceptor might be a solution?)

I'm working with Apache Kafka and I've been experimenting with the Kafka Streams functionality.
What I'm trying to achieve is very simple, at least in words, and it can be achieved easily with the regular plain Consumer/Producer approach:
Read from a dynamic list of topics
Do some processing on the message
Push the message to another topic whose name is computed based on the message content
Initially I thought I could create a custom sink or inject some kind of endpoint resolver in order to programmatically define the topic name for each single message, but ultimately I couldn't find any way to do that.
So I dug into the code and found the ProducerInterceptor class that is (quoting from the JavaDoc):
A plugin interface that allows you to intercept (and possibly mutate)
the records received by the producer before they are published to the
Kafka cluster.
And its onSend method:
This is called from KafkaProducer.send(ProducerRecord) and
KafkaProducer.send(ProducerRecord, Callback) methods, before key and
value get serialized and partition is assigned (if partition is not
specified in ProducerRecord).
It seemed like the perfect solution for me as I can effectively return a new ProducerRecord with the topic name I want.
Although apparently there's a bug (I've opened an issue on their JIRA: KAFKA-4691) and that method is called when the key and value have already been serialized.
Bummer, as I don't think doing an additional deserialization at this point is acceptable.
My question to you more experienced and knowledgeable users: what would be an efficient and elegant way of achieving this? Any input, ideas, or suggestions are welcome.
Thanks in advance for your help/comments/suggestions/ideas.
Below are some code snippets of what I've tried:
public static void main(String[] args) throws Exception {
    StreamsConfig streamingConfig = new StreamsConfig(getProperties());

    StringDeserializer stringDeserializer = new StringDeserializer();
    StringSerializer stringSerializer = new StringSerializer();
    MyObjectSerializer myObjectSerializer = new MyObjectSerializer();

    TopologyBuilder topologyBuilder = new TopologyBuilder();
    topologyBuilder.addSource("SOURCE", stringDeserializer, myObjectSerializer, Pattern.compile("input-.*"))
            .addProcessor("PROCESS", MyCustomProcessor::new, "SOURCE");

    System.out.println("Starting PurchaseProcessor Example");
    KafkaStreams streaming = new KafkaStreams(topologyBuilder, streamingConfig);
    streaming.start();
    System.out.println("Now started PurchaseProcessor Example");
}

private static Properties getProperties() {
    Properties props = new Properties();
    .....
    .....
    props.put(StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG), "com.test.kafka.streams.OutputTopicRouterInterceptor");
    return props;
}
OutputTopicRouterInterceptor onSend implementation:
@Override
public ProducerRecord<String, MyObject> onSend(ProducerRecord<String, MyObject> record) {
    MyObject obj = record.value();
    String topic = computeTopicName(obj);
    ProducerRecord<String, MyObject> newRecord = new ProducerRecord<String, MyObject>(topic, record.partition(), record.timestamp(), record.key(), obj);
    return newRecord;
}