Avro data not appearing in ksql query - apache-kafka

I am trying to set up a topic with an Avro schema on Confluent Platform (with Docker).
My topic is running and I have messages.
I also configured the Avro schema for the value of this specific topic.
Still, I can't use the data from, for example, ksql.
Any idea what I am doing wrong?
EDIT 1:
So what I expect is:
From the Confluent platform, in the topic view, I expect to see the value in a readable format (not raw Avro) once the schema is in the registry.
From KSQL, I tried to create a stream with the following command:
CREATE STREAM hashtags
WITH (KAFKA_TOPIC='mytopic',
VALUE_FORMAT='AVRO');
But when I try to visualize the created stream, no data shows up.
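A quick sanity check, using the same diagnostics the answers below suggest: print the raw topic from the ksql CLI to confirm the messages really are Avro, and note that ksql queries read from the latest offset by default, so pre-existing messages only appear after resetting it. The topic and stream names here are the ones from the question:

PRINT 'mytopic' FROM BEGINNING;
SET 'auto.offset.reset'='earliest';
SELECT * FROM hashtags EMIT CHANGES;  -- drop EMIT CHANGES on pre-5.4 KSQL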

Related

How does a kafka connect connector know which schema to use?

Let's say I have a bunch of different topics, each with their own JSON schema. In the Schema Registry, I indicated which schemas exist, without directly referring to which topic each schema applies to. Then, in my sink connector, I only refer to the endpoint (URL) of the Schema Registry. So, to my knowledge, I never indicated which registered schema a Kafka connector (e.g., a JDBC sink) should use to deserialize a message from a certain topic.
Asking here as I can't seem to find anything online.
I am trying to decrease my Kafka message size by removing the overhead of specifying the schema in each message and using the Schema Registry instead. However, I can't seem to understand how this works.
Your producer serializes the schema ID directly into the bytes of the record. Connect (or any consumer using the JSON deserializer) uses the schema that's part of each record.
https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#wire-format
If you're trying to decrease message size, don't use JSON; use a binary format and enable topic compression, such as ZSTD.
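To make that concrete: per the wire-format doc linked above, byte 0 of each value is a magic byte (0x00) and bytes 1-4 are the big-endian schema ID, which is how any consumer or sink finds the right schema without it being embedded in the message. A rough way to see this, assuming kafkacat is installed and a broker on localhost:9092:

# First bytes of one record: 00, then the 4-byte schema ID, then the Avro payload
kafkacat -C -b localhost:9092 -t mytopic -c 1 -e | xxd | head -n 2

# Enabling ZSTD compression on a (hypothetical) topic:
kafka-configs --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name mytopic \
  --add-config compression.type=zstd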

ksqlDB can't get data from Schema Registry

Case: I have a topic in Kafka named some_table_prefix.table_name. The data is serialized with Avro, but for historical reasons the subject in the Schema Registry is named table_name-value.
When I try to set up a ksqlDB stream:
CREATE STREAM some_stream_name
WITH (KAFKA_TOPIC='some_table_prefix.table_name', VALUE_FORMAT='AVRO');
I'm getting the error: Schema for message values on topic some_table_prefix.table_name does not exist in the Schema Registry. Subject: some_table_prefix.table_name-value.
The Schema Registry is integrated correctly; for other topics everything works fine.
So, is it possible to specify the Schema Registry subject name when creating a ksqlDB stream, or to resolve this issue some other way?
If you have a topic named table_name that has Avro being produced to it (which would automatically create table_name-value in the Registry), then that's what ksqlDB should consume from. If you'd manually created that subject by posting the schema on your own, without matching the topic name, then that's part of the problem.
As the error says, it's looking for a specific subject in the Registry based on the topic you've provided. To my knowledge, it's not possible to use another subject name, so the workaround is to POST the old subject's schemas into the new one.
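A minimal sketch of that workaround using the Schema Registry REST API, assuming the Registry is at localhost:8081 and jq is available: fetch the latest schema from the historical subject and re-POST it under the subject name ksqlDB expects:

# Confirm which subjects exist
curl -s http://localhost:8081/subjects

# Copy the latest schema from the old subject into the expected one
curl -s http://localhost:8081/subjects/table_name-value/versions/latest \
  | jq '{schema: .schema}' \
  | curl -s -X POST \
      -H "Content-Type: application/vnd.schemaregistry.v1+json" \
      --data @- \
      http://localhost:8081/subjects/some_table_prefix.table_name-value/versions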

How to monitor 'bad' messages written to kafka topic with no schema

I use Kafka Connect to take data from RabbitMQ into a Kafka topic. The data comes without a schema, so in order to associate a schema I use a ksql stream. On top of the stream I create a new topic that now has a defined schema. At the end I take the data into a BigQuery database. My question is: how do I monitor messages that have not passed the stream stage? Also, does this approach support schema evolution? And if not, how can I use the Schema Registry functionality?
Thanks
use Kafka Connect to take data ... data comes without schema
I'm not familiar with the RabbitMQ connector specifically, but if you use the Confluent converter classes that do use schemas, then the data would have one, although maybe only a string or bytes schema.
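For reference, the converter is chosen in the worker or connector config. A sketch of the two schema-carrying options (the property names are standard Connect settings; the registry URL is an assumption):

# JSON with the schema embedded in every message:
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true

# Or Avro, with the schema kept in the Schema Registry instead:
# value.converter=io.confluent.connect.avro.AvroConverter
# value.converter.schema.registry.url=http://localhost:8081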
If ksql is consuming the non-schema topic, then there's a consumer group associated with that process. You can monitor its lag to know how many messages have not yet been processed by ksql. If ksql is unable to parse a message because it's "bad", then I assume it's either skipped or the stream stops consuming completely; this is likely configurable
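A sketch of that lag check with the stock Kafka CLI tools; ksql consumer groups are named after the service and query IDs, so the exact group name (hypothetical below) has to be looked up first:

# Find the ksql query's consumer group, then inspect its lag
kafka-consumer-groups --bootstrap-server localhost:9092 --list | grep ksql
kafka-consumer-groups --bootstrap-server localhost:9092 --describe \
  --group _confluent-ksql-default_query_CSAS_MY_STREAM_0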
If you've set the output topic format to Avro, for example, then the schema will automatically be registered to the Registry. There will be no evolution until you modify the fields of the stream
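As a sketch with hypothetical names, assuming a stream rabbit_raw was already declared over the schemaless RabbitMQ topic (e.g. with VALUE_FORMAT='JSON'): writing the derived stream as Avro makes ksql register rabbit_typed-value in the Registry automatically:

-- id and payload are hypothetical columns of the raw stream
CREATE STREAM rabbit_typed
  WITH (KAFKA_TOPIC='rabbit_typed', VALUE_FORMAT='AVRO') AS
  SELECT id, payload FROM rabbit_raw;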

Specify KSQL Stream Subject names explicitly

I have two Kafka topics, my-topic-1 and my-topic-2, with messages serialised via Avro. For historical reasons, the schema for my-topic-1 is not registered under the recommended topic-value subject name, but under my-custom-subject-name instead.
I want to move records from one topic to the other via KSQL.
First up, let's create a stream:
CREATE STREAM my-stream-1
WITH (KAFKA_TOPIC='my-topic-1', VALUE_FORMAT='AVRO');
oops:
Avro schema for message values on topic my-topic-1 does not exist in the Schema Registry.
Subject: my-topic-1-value
Possible causes include:
- The topic itself does not exist
-> Use SHOW TOPICS; to check
- Messages on the topic are not Avro serialized
-> Use PRINT 'my-topic-1' FROM BEGINNING; to verify
- Messages on the topic have not been serialized using the Confluent Schema Registry Avro serializer
-> See https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html
- The schema is registered on a different instance of the Schema Registry
-> Use the REST API to list available subjects
https://docs.confluent.io/current/schema-registry/docs/api.html#get--subjects
It's looking for the subject my-topic-1-value
Does anyone have any idea if this is possible? VALUE_AVRO_SCHEMA_FULL_NAME mentioned here doesn't do what I want it to.
This appears to be a bug. I've updated https://github.com/confluentinc/ksql/issues/3188 with an example to reproduce. I suggest we track it there.

How to define schema while creating a Kafka connector using REST API

I have configured Kafka Connect workers to run in a cluster and am able to get DB data. I have also stored the DB data in Kafka topics in JSON format, using the JSON converter to serialize the data.
When viewing the DB data in the Kafka console consumer, I can see that the UserCreatedon column value is displayed as an integer. The column's type in the DB is int64 (Unix epoch time), which is why the Kafka consumer displays the timestamp value as an int.
Is there any way to send a schema during connector creation? I want UserCreatedon to be displayed in timestamp format instead of as an int.
Sample output
{"schema":{"type":"struct","fields":[{"type":"string","optional":false,"field":"NAME"},{"type":"int64","optional":true,"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"UserCreatedON"}],"optional":false},"payload":{"NAME":"UserProvision","UserCreatedon ":1567688965261}}
Any help here is much appreciated.
You have not mentioned what type of connector you are using to bring data from the DB into Kafka. Kafka Connect supports transformations:
Single Message Transformations (SMTs) are applied to messages as they flow through Connect. SMTs transform inbound messages after a source connector has produced them, but before they are written to Kafka.
See here
Specifically, for your case you can use the TimestampConverter transform.
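A hedged sketch of such a config posted through the Connect REST API: the transforms.* keys are the standard TimestampConverter settings, while the connector name, class, and connection details below are assumptions based on the question:

# Connector name/class/connection.url are hypothetical; adjust to your setup
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "db-source-with-timestamp",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://db:3306/mydb",
    "topic.prefix": "db-",
    "mode": "bulk",
    "transforms": "ts",
    "transforms.ts.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.ts.field": "UserCreatedon",
    "transforms.ts.target.type": "string",
    "transforms.ts.format": "yyyy-MM-dd HH:mm:ss"
  }
}'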