Kafka multiple topics into the same Avro file - apache-kafka

I am new to the Kafka world and I would like to ask for some important information related to my project.
I am using an Avro file for producing and consuming messages. I want to know whether I can use the same Avro file for multiple topics, for example by using a different "name" attribute in the producer and a specific "name" attribute in the consumer.
Thanks a lot.
Stefano

You can use one file to send data to multiple topics, yes, although I'm not sure why one would do that.
I would be cautious about merging multiple topics into one Avro file, because the schema must match in every topic that uses that file.
I would suggest using the Confluent Schema Registry rather than sending self-contained Avro events: if you are not using a registry, then you are likely sending the Avro schema as part of every message, which reduces the possible throughput of your topic. With the registry, the subject under which the schema is registered is derived from the topic name by default.
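For illustration, a minimal producer sketch, assuming Confluent's KafkaAvroSerializer and a Schema Registry at http://localhost:8081 (the topic names and the User schema here are hypothetical):

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The Avro serializer registers the schema once and sends only its ID per message
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");
        GenericRecord record = new GenericData.Record(schema);
        record.put("id", "42");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // The same schema (and even the same record) can be sent to different topics
            producer.send(new ProducerRecord<>("topic-a", record));
            producer.send(new ProducerRecord<>("topic-b", record));
        }
    }
}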

Related

How does a Kafka Connect connector know which schema to use?

Let's say I have a bunch of different topics, each with its own JSON schema. In the Schema Registry I indicated which schemas exist, without directly referring to which topic each schema applies to. Then, in my sink connector, I only refer to the endpoint (URL) of the Schema Registry. So to my knowledge, I never indicated which registered schema a Kafka connector (e.g., a JDBC sink) should use in order to deserialize a message from a certain topic.
Asking here as I can't seem to find anything online.
I am trying to decrease my Kafka message size by removing the overhead of specifying the schema in each message and using the Schema Registry instead. However, I cannot seem to understand how this could work.
Your producer serializes the schema ID directly into the bytes of the record. Connect (or consumers with the JSON deserializer) uses the schema that is part of each record.
https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#wire-format
If you're trying to decrease message size, don't use JSON; use a binary format instead, and enable topic compression such as zstd.
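As a concrete illustration of the wire format linked above: byte 0 of the record value is a magic byte, bytes 1-4 are the schema ID as a big-endian int, and the serialized payload follows. A minimal sketch that extracts the ID (the class and method names are just for illustration):

import java.nio.ByteBuffer;

public class WireFormatSketch {
    // Returns the Schema Registry ID embedded at the front of every record value
    public static int schemaIdOf(byte[] recordValue) {
        ByteBuffer buffer = ByteBuffer.wrap(recordValue);
        byte magic = buffer.get();
        if (magic != 0) { // 0 is the only magic byte currently defined
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        return buffer.getInt(); // big-endian schema ID; the payload follows
    }
}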

How to monitor 'bad' messages written to a Kafka topic with no schema

I use Kafka Connect to take data from RabbitMQ into a Kafka topic. The data comes without a schema, so in order to associate a schema I use a ksqlDB stream. On top of the stream I create a new topic that now has a defined schema. At the end I take the data to a BigQuery database. My question is: how do I monitor messages that have not passed the stream stage? Doing it this way, do I support schema evolution? And if not, how can I use the Schema Registry functionality?
Thanks
use Kafka Connect to take data ... data comes without schema
I'm not familiar with the RabbitMQ connector specifically, but if you use the Confluent converter classes that do use schemas, then the data would have one, although maybe only a string or bytes schema.
If ksqlDB is consuming the schemaless topic, then there is a consumer group associated with that process. You can monitor its lag to know how many messages have not yet been processed by ksqlDB. If ksqlDB is unable to parse a message because it is "bad", then I assume it is either skipped or the stream stops consuming completely; this is likely configurable.
If you've set the output topic format to Avro, for example, then the schema will automatically be registered in the Registry. There will be no evolution until you modify the fields of the stream.
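A minimal sketch of the lag check described above, using the Kafka AdminClient (the group ID shown is hypothetical; look up the consumer group your ksqlDB query actually uses):

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagCheckSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        String groupId = "_confluent-ksql-default_query_CSAS_MY_STREAM_1"; // hypothetical

        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets the group has committed so far
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets(groupId)
                     .partitionsToOffsetAndMetadata().get();
            // Latest end offsets for the same partitions
            Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                admin.listOffsets(latestSpec).all().get();
            // Lag per partition = end offset minus committed offset
            committed.forEach((tp, offset) -> {
                long lag = latest.get(tp).offset() - offset.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
            });
        }
    }
}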

How to consume and parse different Avro messages in a Kafka consumer

In my application, Kafka topics are dedicated to a domain (can't change that), and multiple different types of events (1 event = 1 Avro schema) related to that domain are produced into that one topic by different microservices.
Now I have only one consumer app, in which I should be able to apply a different schema dynamically (by inspecting the event name in the message) and transform the message into the appropriate POJO (generated from the specific Avro schema) for further event-specific actions.
Every example I find online is about a single-schema-type consumer, so I need some help.
Related blog post: https://www.confluent.io/blog/multiple-event-types-in-the-same-kafka-topic/
How to configure the consumer:
https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-avro.html#avro-deserializer
https://github.com/openweb-nl/kafka-graphql-examples/blob/307bbad6f10e4aaa6b797a3bbe3b6620d3635263/graphql-endpoint/src/main/java/nl/openweb/graphql_endpoint/service/AccountCreationService.java#L47
https://github.com/openweb-nl/kafka-graphql-examples/blob/307bbad6f10e4aaa6b797a3bbe3b6620d3635263/graphql-endpoint/src/main/resources/application.yml#L20
You need the generated Avro classes on the classpath, most likely by adding a dependency.
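A minimal consumer sketch along those lines, assuming Confluent's KafkaAvroDeserializer with specific-record decoding enabled (the topic name and the event names in the switch are hypothetical):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.avro.specific.SpecificRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MultiEventConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "domain-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");
        // Deserialize into the generated Avro classes instead of GenericRecord
        props.put("specific.avro.reader", "true");

        try (KafkaConsumer<String, SpecificRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("domain-topic"));
            while (true) {
                for (ConsumerRecord<String, SpecificRecord> record : consumer.poll(Duration.ofSeconds(1))) {
                    SpecificRecord event = record.value();
                    // Every Avro record carries its schema, so dispatch on the record name
                    switch (event.getSchema().getName()) {
                        case "UserCreated" -> handleUserCreated(event);
                        case "UserMailConfirmed" -> handleUserMailConfirmed(event);
                        default -> System.err.println("Unexpected event: " + event.getSchema().getFullName());
                    }
                }
            }
        }
    }

    private static void handleUserCreated(SpecificRecord event) { /* cast to the generated class here */ }

    private static void handleUserMailConfirmed(SpecificRecord event) { /* cast to the generated class here */ }
}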

Sink events of the same Kafka topic into multiple paths in GCS

I am using the Schema Registry with the RecordNameStrategy naming policy, so I have events with totally different Avro schemas in the same Kafka topic.
I am doing that because I want to group logically related events that may have different data structures under the same topic, to preserve ordering for that data.
For instance:
a user_created event and a user_mail_confirmed event might have different schemas, but it's important to keep them in the same topic partition to guarantee ordering for consumers.
I am trying to sink this data, coming from a single topic, into GCS under multiple paths (one path per schema).
Does anyone know whether the Confluent Kafka Connect GCS Sink connector (or any other connector) provides that feature?
I haven't used the GCS connector, but I suspect this is not possible with Confluent connectors in general.
You should probably copy your source topic, with its different data structures, into a new set of topics where each topic has a common data structure. This is possible with ksqlDB (check an example) or with a Kafka Streams application (see the sketch below). Then you can create connectors for those topics.
Alternatively, you can use the RegexRouter transformation with a set of predicates based on the message headers.
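A minimal sketch of the Kafka Streams option: split the mixed-schema topic into per-schema topics by inspecting the Avro record name. All topic names here are hypothetical, and the default serdes would need to be configured (e.g. with a GenericAvroSerde):

import java.util.Properties;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Branched;
import org.apache.kafka.streams.kstream.KStream;

public class TopicSplitterSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-splitter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Default key/value serdes (e.g. an Avro serde) must be configured here.

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, GenericRecord> events = builder.stream("mixed-events");

        // Route each event to a dedicated topic based on its Avro record name
        events.split()
              .branch((key, value) -> "user_created".equals(value.getSchema().getName()),
                      Branched.withConsumer(ks -> ks.to("user-created-events")))
              .branch((key, value) -> "user_mail_confirmed".equals(value.getSchema().getName()),
                      Branched.withConsumer(ks -> ks.to("user-mail-confirmed-events")))
              .noDefaultBranch();

        new KafkaStreams(builder.build(), props).start();
    }
}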

How can I get the Avro schema object from a received message in Kafka?

I am trying to publish/consume my Java objects to Kafka. I use an Avro schema.
My basic program works fine. In my program I use my schema in the producer (for encoding) and in the consumer (for decoding).
If I publish different objects to different topics (e.g. 100 topics), at the receiver I do not know what type of message I have received. I would like to get the Avro schema from the received bytes and use that for decoding.
Is my understanding correct? If so, how can I retrieve it from the received object?
You won't receive the Avro schema in the received bytes -- and you don't really want to. The whole idea of Avro is to separate the schema from the record, so that the format is much more compact. The way I do it, I have a topic called Schema. The first thing a Kafka consumer process does is listen to this topic from the beginning and parse all of the schemas.
Avro schemas are just JSON strings -- you can store one schema per record in the Schema topic.
As to figuring out which schema goes with which topic: as I said in a previous answer, you want one schema per topic, no more. So when you parse a message from a specific topic, you know exactly which schema applies, because there can be only one.
If you never reuse a schema, you can just name the schema the same as the topic. In practice, however, you will probably use the same schema on multiple topics, in which case you want a separate topic that maps schemas to topics. You could create an Avro schema like this:
{"name":"SchemaMapping", "type":"record", "fields":[
{"name":"schemaName", "type":"string"},
{"name":"topicName", "type":"string"}
]}
You would publish a single record per topic with your Avro-encoded mapping into a special topic -- for example, one called SchemaMapping -- and after consuming the Schema topic from the beginning, a consumer would listen to SchemaMapping; after that, it would know exactly which schema to apply for each topic.
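A minimal sketch of the first step described above: reading every schema from the Schema topic from the beginning. The topic name follows the answer; the group ID is hypothetical, and a real loader would poll until it reaches the end offsets rather than polling once:

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SchemaTopicLoaderSketch {
    public static Map<String, Schema> loadSchemas() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "schema-loader"); // hypothetical group ID
        props.put("auto.offset.reset", "earliest"); // read the topic from the beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, Schema> schemasByName = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("Schema"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                // Each record value is one schema's JSON representation
                Schema schema = new Schema.Parser().parse(record.value());
                schemasByName.put(schema.getFullName(), schema);
            }
        }
        return schemasByName;
    }
}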