I am trying to use Kafka Streams for a computation and send the result to a topic that is then written to a database by the JDBC sink connector. The result needs to be serialized as Avro using the Confluent Schema Registry. Is there a demo or guide showing how to handle this scenario?
It's not clear what you mean by "integrate": Kafka Streams is independent of Kafka Connect, though both can be used from ksqlDB.
The existing Kafka Connect examples should be adequate; just point your sink connector at the output topic of your Streams application.
As for Kafka Streams, you'd need to use the Confluent Avro Serdes and add the Schema Registry URL to the StreamsConfig.
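As a minimal sketch (the application ID, broker address, and registry URL below are placeholders for your environment), the relevant Streams configuration would look something like:

```properties
application.id=my-streams-app
bootstrap.servers=broker:9092
default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde=io.confluent.kafka.streams.serdes.avro.GenericAvroSerde
# Passed through to the Avro serde so it can register/fetch schemas
schema.registry.url=http://schema-registry:8081
```

If you have generated Avro classes, use SpecificAvroSerde instead of GenericAvroSerde.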
Related
I have a requirement to read messages from a topic, enrich the message based on provided configuration (data required for enrichment is sourced from external systems), and publish the enriched message to an output topic. Messages on both the source and output topics should be in Avro format.
Is this a good use case for a custom Kafka Connector or should I use Kafka Streams?
Why I am considering Kafka Connect?
Lightweight in terms of code and deployment
Configuration driven
Connection and error handling
Scalability
I like the plugin-based approach in Connect. If there is a new type of message that needs to be handled, I just deploy a new connector without having to deploy a full-scale Java app.
Why I am not sure this is good candidate for Kafka Connect?
Calls to external system
Can Kafka be both source and sink for a connector?
Can we use Avro schemas in connectors?
Performance under load
Cannot do stateful processing (currently there is no requirement)
I have experience with Kafka Streams but not with Connect
Use both?
Use Kafka Connect to source external database into a topic.
Use Kafka Streams to build that topic into a stream/table that can then be manipulated.
Use Kafka Connect to sink back into a database, or other system other than Kafka, as necessary.
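For the first step, a JDBC source connector configuration might look roughly like this (the connector name, connection details, and table/column names are all placeholder assumptions):

```properties
name=jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://db-host:5432/mydb
connection.user=user
connection.password=password
table.whitelist=customers
mode=incrementing
incrementing.column.name=id
topic.prefix=db-
```

Each whitelisted table lands in its own topic (here, db-customers), which your Streams application can then consume.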
Kafka Streams can also be config-driven, can use plugins (i.e., via reflection), is just as scalable, and has no different connection modes (to Kafka). Performance should be similar. Error handling is really the only complex part. ksqlDB is entirely "config driven" via SQL statements, and can connect to external Connect clusters or embed its own.
Avro works for both, yes.
Some connectors are temporarily stateful, as they build in-memory batches, such as the S3 or JDBC sink connectors.
I want to write data from Kafka to Cassandra with Flink. The messages in Kafka are serialized via Avro schema which is managed by a Confluent schema registry.
I have already got
a Kafka Consumer, which can consume the Avro messages properly
and a CassandraSink which can write data using Flink's Row datatype
I've already tried AvroRowDeserializationSchema (see here), but it didn't work because the messages are framed with the Confluent wire format that Schema Registry-aware serializers prepend. Unfortunately, I couldn't find any resources on deserializing Confluent Avro to Row.
But maybe there is a completely different and easier way to achieve this simple goal (read Confluent Avro and write to Cassandra table). I would really appreciate any hint or idea.
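One option (a sketch, not tested against your setup): Flink ships a Schema Registry-aware deserializer in the flink-avro-confluent-registry module, which strips the Confluent wire-format header and fetches the writer schema for you. The topic name, broker address, group ID, and registry URL below are placeholder assumptions:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import java.util.Properties;

public class ConfluentAvroSource {
    public static FlinkKafkaConsumer<GenericRecord> build(Schema readerSchema) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092");   // placeholder
        props.setProperty("group.id", "flink-cassandra-writer"); // placeholder

        // Deserializes Confluent-framed Avro into GenericRecord,
        // looking schemas up in the Schema Registry by the embedded ID.
        return new FlinkKafkaConsumer<>(
                "source-topic",                                  // placeholder
                ConfluentRegistryAvroDeserializationSchema.forGeneric(
                        readerSchema, "http://schema-registry:8081"),
                props);
    }
}
```

From there you'd map each GenericRecord's fields into a Row before handing it to your CassandraSink.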
I'm using Kafka streams to fetch the data from a topic and now I would like to load these data to Postgres. Is it possible?
Kafka Streams is a small-footprint client library meant only to process data from Kafka to Kafka. To copy data into or out of Kafka, you should use Kafka Connect, or build your own Kafka consumer/producer.
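As an illustration (all names and connection details below are placeholders), a JDBC sink connector writing your Streams output topic to Postgres could be configured roughly as:

```properties
name=postgres-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=my-output-topic
connection.url=jdbc:postgresql://db-host:5432/mydb
connection.user=user
connection.password=password
auto.create=true
insert.mode=upsert
pk.mode=record_key
pk.fields=id
```

The sink reads the records your Streams app produces and upserts them into the target table.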
Is there a way for a Kafka consumer to read directly from an AWS Kinesis Stream?
Azure EventHub provides the option of enabling Kafka, so that a Kafka consumer can seamlessly read from EventHub. Is there something similar in AWS Kinesis?
There is not. Kafka and Kinesis use different wire protocols and have different representations of their records.
You could use Kafka Connect to source data from Kinesis into Kafka, then consume from that, though
I am using Confluent JDBC Kafka connector to publish messages into topic. The source connector will send data to topic along with schema on each poll. I want to retrieve this schema.
Is it possible? How? Can anyone suggest an approach?
My intention is to create a KSQL stream or table based on the schema built by the Kafka connector on each poll.
The best way to do this is to use Avro, in which the schema is stored separately and automatically used by Kafka Connect and KSQL.
You can use Avro by configuring Kafka Connect to use the AvroConverter. In your Kafka Connect worker configuration set:
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
(Update schema-registry to the hostname where your Schema Registry is running.)
From there, in KSQL you just use
CREATE STREAM my_stream WITH (KAFKA_TOPIC='source_topic', VALUE_FORMAT='AVRO');
You don't need to specify the schema itself here, because KSQL fetches it from the Schema Registry.
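To see why KSQL can do this, it helps to know how Schema Registry-aware serializers frame each message: a magic byte 0x00, then a 4-byte big-endian schema ID, then the Avro payload. Consumers look the schema up by that ID. A self-contained sketch of reading the header (the example bytes are made up):

```java
import java.nio.ByteBuffer;

public class WireFormat {
    // Returns the Schema Registry ID embedded in a Confluent-framed message.
    public static int extractSchemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != 0x00) {
            throw new IllegalArgumentException("Not Confluent wire format");
        }
        return buf.getInt(); // 4-byte big-endian schema ID
    }

    public static void main(String[] args) {
        // magic byte 0x00, schema ID 42, then a (fake) Avro payload
        byte[] message = {0x00, 0x00, 0x00, 0x00, 0x2A, 0x01, 0x02};
        System.out.println(extractSchemaId(message)); // prints 42
    }
}
```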
You can read more about Converters and serialisers here.
Disclaimer: I work for Confluent, and wrote the referenced blog post.