KStream: Error Reading and Writing Avro records - apache-kafka

I’m trying to write an Avro record that I read from one topic into another topic; my intention is to augment it with a transformation once I get this routing working. I used the KStream-with-Avro code from one of the examples, with some modifications to connect to the Schema Registry for retrieving the Avro schema.
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "mysql-stream-processing");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
streamsConfiguration.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
final Serde<GenericRecord> keySerde = new GenericAvroSerde(
new CachedSchemaRegistryClient(schemaRegistryUrl, 100),
Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
schemaRegistryUrl));
final Serde<GenericRecord> valueSerde = new GenericAvroSerde(
new CachedSchemaRegistryClient(schemaRegistryUrl, 100),
Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
schemaRegistryUrl));
streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);
final KStreamBuilder builder = new KStreamBuilder();
final KStream<GenericRecord, GenericRecord> record = builder.stream("dbserver1.employees.employees");
record.print(keySerde, valueSerde);
record.to(keySerde, valueSerde, "newtopic");
record.foreach((key, val) -> System.out.println(key.toString()+" "+val.toString()));
final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
streams.cleanUp();
streams.start();
When run, print() works and I can see the records in the console, but I’m unable to get the records written to “newtopic”; it fails with the error below:
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=dbserver1.employees.employees, partition=0, offset=0
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:217)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:627)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: io.confluent.examples.streams.utils.GenericAvroSerializer / value: io.confluent.examples.streams.utils.GenericAvroSerializer) is not compatible to the actual key or value type (key type: [B / value type: [B). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:81)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:70)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:198)
... 2 more
Caused by: java.lang.ClassCastException: [B cannot be cast to org.apache.avro.generic.GenericRecord
at io.confluent.examples.streams.utils.GenericAvroSerializer.serialize(GenericAvroSerializer.java:25)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:77)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:79)
... 5 more

I guess you need to configure the correct Serdes:
https://docs.confluent.io/current/streams/developer-guide/config-streams.html#default-key-serde
https://docs.confluent.io/current/streams/developer-guide/dsl-api.html
Either set correct global Serdes, or specify Serdes for each operator. If an operator needs a Serde, then it has a corresponding overload taking Serdes as parameters.
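Concretely for the code above, that means passing the Avro Serdes when reading the source topic; otherwise the default byte-array Serdes are used and the sink receives byte[] instead of GenericRecord. A minimal sketch using the same (old) KStreamBuilder API as in the question:
// Pass the Avro Serdes explicitly so the source deserializes to GenericRecord
// instead of the default byte[] (the cause of the ClassCastException above).
final KStream<GenericRecord, GenericRecord> record =
    builder.stream(keySerde, valueSerde, "dbserver1.employees.employees");
record.to(keySerde, valueSerde, "newtopic");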

Related

Ktable to KGroupTable - Schema Not available (State Store ChangeLog Schema not registered)

I have a Kafka topic - let's say activity-daily-aggregate - and I want to aggregate (add/sub) over it using a KGroupedTable.
1. I read the topic using:
final KTable<String, GenericRecord> inputKTable =
builder.table("activity-daily-aggregate", Consumed.with(new StringSerde(), getConsumerSerde()));
Note: getConsumerSerde() returns new GenericAvroSerde(mockSchemaRegistryClient).
2. Next, I group the table:
inputKTable.groupBy(
(key,value)->KeyValue.pair(KeyMapper.generateGroupKey(value), new JsonValueMapper().apply(value)),
Grouped.with(AppSerdes.String(), AppSerdes.jsonNode())
);
Before steps 1 and 2 I configured the MockSchemaRegistryClient with:
mockSchemaRegistryClient.register("activity-daily-aggregate-key",
Schema.parse(AppUtils.class.getResourceAsStream("/avro/key.avsc")));
mockSchemaRegistryClient.register("activity-daily-aggregate-value",
Schema.parse(AppUtils.class.getResourceAsStream("/avro/daily-activity-aggregate.avsc")));
When I run the topology using test cases, I get an error at step 2:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000011, topic=activity-daily-aggregate, partition=0, offset=0, stacktrace=org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema: {"type":"record","name":"FactActivity","namespace":"com.ascendlearning.avro","fields":.....}
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema Not Found; error code: 404001
The error goes away when I register the schema with mockSchemaRegistryClient for the subjects:
stream-app-id-activity-daily-aggregate-STATE-STORE-0000000010-changelog-key
stream-app-id-activity-daily-aggregate-STATE-STORE-0000000010-changelog-value
=> /avro/daily-activity-aggregate.avsc
Do we need to do this step? I thought it would be handled automatically by the topology.
From the blog,
https://blog.jdriven.com/2019/12/kafka-streams-topologytestdriver-with-avro/
When you configure the same mock:// URL in both the Properties passed into TopologyTestDriver, as well as for the (de)serializer instances passed into createInputTopic and createOutputTopic, all (de)serializers will use the same MockSchemaRegistryClient, with a single in-memory schema store.
// Configure Serdes to use the same mock schema registry URL
Map<String, String> config = Map.of(
AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, MOCK_SCHEMA_REGISTRY_URL);
avroUserSerde.configure(config, false);
avroColorSerde.configure(config, false);
// Define input and output topics to use in tests
usersTopic = testDriver.createInputTopic(
"users-topic",
stringSerde.serializer(),
avroUserSerde.serializer());
colorsTopic = testDriver.createOutputTopic(
"colors-topic",
stringSerde.deserializer(),
avroColorSerde.deserializer());
I was not passing the mock schema registry URL to the Serdes passed into the input/output topics.
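For example, the Serde used inside the topology (getConsumerSerde) could be built against the same mock:// URL instead of being handed a MockSchemaRegistryClient directly. A rough sketch, where MOCK_SCHEMA_REGISTRY_URL is a made-up constant:
// Sketch: configure the topology's value Serde against the shared mock:// URL so its
// schema registrations land in the same in-memory store as the test (de)serializers.
static final String MOCK_SCHEMA_REGISTRY_URL = "mock://test-registry"; // hypothetical name

static Serde<GenericRecord> getConsumerSerde() {
    final Serde<GenericRecord> serde = new GenericAvroSerde();
    serde.configure(
        Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
            MOCK_SCHEMA_REGISTRY_URL),
        false); // false = value Serde
    return serde;
}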

Kafka stream : class cast exception during left join

I am new to Kafka. I am trying to leftJoin a Kafka stream (named inputStream) to a KTable (named detailTable), where the stream is built as:
//The consumer to consume the input topic
Consumed<String, NotifyRecipient> inputNotificationEventConsumed = Consumed
.with(Constants.CONSUMER_KEY_SERDE, Constants.CONSUMER_VALUE_SERDE);
//Now create the stream that is directly reading from the topic
KStream<NotifyKey, NotifyVal> initialInputStream =
streamsBuilder.stream(properties.getInputTopic(), inputNotificationEventConsumed);
//Now re-key the above stream for the purpose of left join
KStream<String, NotifyVal> inputStream = initialInputStream
.map((notifyKey,notifyVal) ->
KeyValue.pair(notifyVal.getId(),notifyVal)
);
And the KTable is created this way:
//The consumer for the table
Consumed<String, Detail> notifyDetailConsumed =
Consumed.with(Serdes.String(), Constants.DET_CONSUMER_VALUE_SERDE);
//Now consume from the topic into ktable
KTable<String, Detail> detailTable = streamsBuilder
.table(properties.getDetailTopic(), notifyDetailConsumed);
Now I am trying to join the inputStream to the detailTable as:
//Now join
KStream<String,Pair<Long, SendCmd>> joinedStream = inputStream
.leftJoin(detailTable, valJoiner)
.filter((key,value)->value!=null);
I am getting an error which suggests that during the join, the key and value of the inputStream are being handled with the default key and value Serdes, resulting in a ClassCastException.
Not sure how to fix this and need help there.
Let me know if I should provide more info.
Because you use map(), the key and value types might have changed, so you need to specify the correct Serdes via Joined.with(...) as the third parameter of leftJoin().
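A rough sketch of what that could look like; notifyValSerde is an assumed Serde<NotifyVal>, since the actual constant isn't shown in the snippets above:
// Joined.with(keySerde, valueSerde, otherValueSerde) overrides the default Serdes
// for this join; the key is a String after the re-keying map().
KStream<String, Pair<Long, SendCmd>> joinedStream = inputStream
    .leftJoin(detailTable,
              valJoiner,
              Joined.with(Serdes.String(),
                          notifyValSerde,                     // hypothetical Serde<NotifyVal>
                          Constants.DET_CONSUMER_VALUE_SERDE))
    .filter((key, value) -> value != null);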

How to handle avro deserialization exceptions when writing kafka stream to a topic

I am seeing some exceptions while writing a stream to a topic.
Output:
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to deserialize value for record. topic=input_topic, partition=4, offset=9048083
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 572
Here is the code. The key is null (String) and the value uses Avro Serdes:
streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
I am using the specific Avro Serde, so I gave it the endpoint of the Schema Registry:
final Map<String, String> serdeConfig = Collections.singletonMap(
AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
final Serde<avroschema> avroserde = new SpecificAvroSerde<>();
avroserde.configure(serdeConfig, false); // `false` for record values
Reading the source stream as below
final KStreamBuilder builder = new KStreamBuilder();
final KStream<String, avroschema> feeds = builder.stream("input_topic");
feeds.to(Serdes.String(), avroserde, "output_topic");
return new KafkaStreams(builder, streamsConfiguration);
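One way to keep a single bad Avro record from killing the stream thread (assuming Kafka Streams 1.0+, which added a default deserialization exception handler config) would be to log and skip such records; a minimal sketch:
// Hedged sketch: log and skip records whose value cannot be deserialized,
// instead of failing the StreamThread with a StreamsException.
streamsConfiguration.put(
    StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
    LogAndContinueExceptionHandler.class);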

Null Pointer Exception / Not Found Exception when I tried to process & sink data in Avro schema

I am using a processor to consume byte-array data (with byte-array Serdes) from a topic, process it into a GenericRecord based on a schema I get from an HTTP GET request, and send it on to a topic in Avro format via the Schema Registry.
I had no problem retrieving the schema from the HTTP GET request and mapping my data according to it to generate a GenericRecord that follows the schema. However, when I try to sink it to the topic I get a NullPointerException:
org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: java.lang.NullPointerException
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:72)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:54)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:78)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:79)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
at streamProcessor.XXXXprocessor.process(XXXXprocessor.java:80)
at streamProcessor.XXXXprocessor.process(XXXXprocessor.java:1)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:48)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:134)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:111)
at streamProcessor.SelectorProcessor.process(SelectorProcessor.java:33)
at streamProcessor.SelectorProcessor.process(SelectorProcessor.java:1)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:48)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:134)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:70)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:197)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:627)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
This is my topology code:
//Stream Properties
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "processor-kafka-streams234");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "xxxxxxxxxxxxxxxxxxxxxx:xxxx");
config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass().getName());
config.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class);
//Build topology
TopologyBuilder builder = new TopologyBuilder();
builder.addSource("messages-source", "mytest2");
builder.addProcessor("selector-processor", () -> new SelectorProcessor(), "messages-source");
builder.addProcessor("XXXX-processor", () -> new XXXXprocessor(), "selector-processor");
builder.addSink("XXXX-sink", "XXXXavrotest", new KafkaAvroSerializer(), new KafkaAvroSerializer(), "XXXX-processor");
//Start Streaming
KafkaStreams streaming = new KafkaStreams(builder, config);
streaming.start();
System.out.println("processor streaming...");
After some reading on the issues forum I discovered that I might need to inject a client when creating the KafkaAvroSerializers, so I changed that line to:
SchemaRegistryClient client = new CachedSchemaRegistryClient("xxxxxxxxxxxxxxxxxxxxxx:xxxx/subjects/xxxxschemas/versions", 1000);
builder.addSink("XXXX-sink", "XXXXavrotest", new KafkaAvroSerializer(client), new KafkaAvroSerializer(client), "XXXX-processor");
This resulted in an HTTP 404 Not Found exception ...
I had my URL wrong :P
Also, the key had to be initialized to something other than null because of the cleanup.policy setting on my topic.
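For reference, a sketch of the corrected construction (host and port are still placeholders): the client takes the registry's base URL, and the serializer builds the /subjects/.../versions paths itself.
// The client is given the Schema Registry base URL, not a /subjects/.../versions path.
SchemaRegistryClient client = new CachedSchemaRegistryClient("http://xxxxxxxxxxxxxxxxxxxxxx:xxxx", 1000);
builder.addSink("XXXX-sink", "XXXXavrotest", new KafkaAvroSerializer(client), new KafkaAvroSerializer(client), "XXXX-processor");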

Kafka Stream giving weird output

I'm playing around with Kafka Streams trying to do basic aggregations (for the purpose of this question, just incrementing by 1 on each message). On the output topic that receives the changes done to the KTable, I get really weird output:
#B�
#C
#C�
#D
#D�
#E
#E�
#F
#F�
I recognize that the "�" means that it's printing out some kind of character that doesn't exist in the character set, but I'm not sure why. Here's my code for reference:
public class KafkaMetricsAggregator {
public static void main(final String[] args) throws Exception {
final String bootstrapServers = args.length > 0 ? args[0] : "my-kafka-ip:9092";
final Properties streamsConfig = new Properties();
streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, "metrics-aggregator");
// Where to find Kafka broker(s).
streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
// Specify default (de)serializers for record keys and for record values.
streamsConfig.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfig.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
// Records should be flushed every 10 seconds. This is less than the default
// in order to keep this example interactive.
streamsConfig.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);
// For illustrative purposes we disable record caches
streamsConfig.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
// Class to extract the timestamp from the event object
streamsConfig.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, "my.package.EventTimestampExtractor");
// Set up serializers and deserializers, which we will use for overriding the default serdes
// specified above.
final Serde<JsonNode> jsonSerde = Serdes.serdeFrom(new JsonSerializer(), new JsonDeserializer());
final Serde<String> stringSerde = Serdes.String();
final Serde<Double> doubleSerde = Serdes.Double();
final KStreamBuilder builder = new KStreamBuilder();
final KTable<String, Double> aggregatedMetrics = builder.stream(jsonSerde, jsonSerde, "test2")
.groupBy(KafkaMetricsAggregator::generateKey, stringSerde, jsonSerde)
.aggregate(
() -> 0d,
(key, value, agg) -> agg + 1,
doubleSerde,
"metrics-table2");
aggregatedMetrics.to(stringSerde, doubleSerde, "metrics");
final KafkaStreams streams = new KafkaStreams(builder, streamsConfig);
// Only clean up in development
streams.cleanUp();
streams.start();
// Add shutdown hook to respond to SIGTERM and gracefully close Kafka Streams
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
}
EDIT: Using aggregatedMetrics.print(); does print out the correct output to the console:
[KSTREAM-AGGREGATE-0000000002]: my-generated-key , (43.0<-null)
Any ideas about what's going on?
You're using Serdes.Double() for your values, which uses an efficient binary encoding [1] for the serialized values, and that's what you're seeing on your topic. To get human-readable numbers on the console, you need to instruct the consumer to use the DoubleDeserializer too.
[1] https://github.com/apache/kafka/blob/e31c0c9bdbad432bc21b583bd3c084f05323f642/clients/src/main/java/org/apache/kafka/common/serialization/DoubleSerializer.java#L29-L44
Specify DoubleDeserializer as the value deserializer on the console consumer's command line, as shown below:
--property value.deserializer=org.apache.kafka.common.serialization.DoubleDeserializer
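Equivalently, a plain Java consumer needs the same deserializer; a small sketch, with the broker address taken from the code above and a made-up group id:
// Sketch of a consumer that decodes the Double values written to the "metrics" topic.
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "my-kafka-ip:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "metrics-reader"); // hypothetical group id
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, DoubleDeserializer.class.getName());
try (KafkaConsumer<String, Double> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("metrics"));
    consumer.poll(5000).forEach(rec -> System.out.println(rec.key() + " -> " + rec.value()));
}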