How to update KSQL stream definition dynamically based on schema-registry - apache-kafka

I created a KSQL stream based on the Schema Registry by following this post. The Kafka JDBC connector registers the latest schema in the Schema Registry. A newly created stream picks up the latest schema, but the existing stream is still on the old schema.
I don't know when the schema of the data source will change, so I would expect KSQL to dynamically refresh the stream's definition with the newest schema available in the Schema Registry.
Any idea how to achieve this?

At the moment you have to manually drop and recreate a stream to pick up the new schema.
I've logged #2215 if you want to upvote/discuss desired behaviour there.
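For example, a minimal sketch of that, assuming a stream called mystream over an Avro topic called my_topic (both names are illustrative):

-- Drop the old definition
DROP STREAM mystream;
-- Recreate it; with VALUE_FORMAT='AVRO' and no explicit column list,
-- the columns are taken from the latest schema in Schema Registry
CREATE STREAM mystream WITH (KAFKA_TOPIC='my_topic', VALUE_FORMAT='AVRO');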

Related

Streaming Database (MariaDB) Changes into another database table using apache-kafka and debezium connector

What I am aiming to do is stream data changes into a new database table using apache-kafka along with the debezium-connectors, but I don't have the slightest idea how to achieve it. I know how to start Kafka and ZooKeeper, create topics, and subscribe to a topic, but I am unfamiliar with all the next steps. How do I achieve data streaming and capture that data into a new database table using Change Data Capture (CDC)?
Debezium only sources data into Kafka; it won't read from Kafka or write to a new database.
You can refer to an old blog post of theirs that uses the JDBC Sink Kafka Connector to write to a new server:
https://debezium.io/blog/2017/09/25/streaming-to-another-database/
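As a rough sketch, a JDBC sink configuration along the lines of that post might look like this (connection details, topic name, and key field are placeholders, not taken from the post):

# Hypothetical JDBC sink: reads the Debezium-produced topic and writes rows to a target database
name=jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=customers
connection.url=jdbc:postgresql://target-db:5432/inventory
connection.user=postgresuser
connection.password=postgrespw
# Flatten Debezium's change-event envelope into a plain row before writing
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
auto.create=true
insert.mode=upsert
pk.mode=record_key
pk.fields=id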

How do we check how many records have been loaded so far from a Kafka topic into the DB?

I'm trying to load data from a Kafka topic into Postgres using the JDBC sink connector. How do we know the number of records loaded so far into Postgres? As of now I keep checking the number of records in the DB using a SQL query. Is there any other way to know?
Kafka Connect doesn't track this. I see nothing wrong with SELECT COUNT(*) on the table; however, this doesn't exclude other processes writing to that table as well.
It is not possible in Kafka itself, because once you have sunk the records into the target DB, Kafka has already done its job. But you can track the number of records you are updating from the SinkRecord collections, writing the count to a local file or into a Kafka state store.
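If you do stay with checking the database, a plain count is the simplest option (the table name is just an example):

-- Example only; substitute your target table
SELECT COUNT(*) FROM kafka_sink_table;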

Is it possible to use Kafka Connect to mirror an RDBMS table to a Kafka Stream?

I know it's possible to push updates from a database to a Kafka stream using Kafka Connect. My question is, can I create a consumer to write changes from that same stream back into the table without creating an infinite loop?
I'm assuming if I create a consumer that writes updates into the database table, it would trigger Connect to push that update to the stream, etc. Is there a way around this so I can mirror a database table to a stream?
You can stream from a Kafka topic to a database using the JDBC Sink connector for Kafka Connect.
You'd need to build the business logic for avoiding an infinite replication loop into either the connectors or your consumer. For example:
JDBC Source connector uses a WHERE clause to only pull records with a flag set to indicate they are the original record
Custom Single Message Transform in the source connector to drop records with a flag set to indicate they are not the original record
Stream application (e.g. KSQL / Kafka Streams) processes the inbound stream of all database changes and filters out only those with a flag set to indicate they are the original record (see the sketch after this list)
This last option is inefficient, though, because you're still streaming everything from the database
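A sketch of that third option in KSQL, assuming the change events carry a boolean source_flag column (stream and column names are illustrative):

-- Keep only rows that originated in the database, dropping the ones this pipeline wrote back itself
CREATE STREAM original_changes AS
  SELECT * FROM db_changes
  WHERE source_flag = TRUE;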
Yes. It is possible to configure synchronisation/replication.

Consumer schema update during deserialization

I'm currently studying the Avro schema system and from what I understand the flow of a schema update is:
1) A client changes the schema (maybe by adding a new field with a default value for backwards compatibility) and sends data to Kafka serialized with the updated schema (see the example schema after these steps).
2) Schema Registry does compatibility checks and registers the new version of the schema under a new version number and a unique ID.
3) The consumer (still using the old schema) attempts to deserialize the data and schema evolution drops the new field, allowing the consumer to deserialize the data.
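For example, a backwards-compatible change of that kind could be adding an optional field with a default; this schema is purely illustrative, not from the question:

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}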
From what I understand, we need to explicitly update the consumers after a schema change in order to supply them with the latest schema.
But why can't the consumer just pull the latest schema when it sees that the ID has changed?
You need to update consumer schemas if they are using a SpecificRecord subclass. That's effectively skipping the schema ID lookup.
If you want it to always match the latest, then you can make an HTTP call to the registry's /subjects/<subject>/versions/latest endpoint to get it, then restart the app.
Or if you always want the consumer to use the ID of the message, use GenericRecord as the object type.
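As a minimal sketch of the GenericRecord route, the relevant consumer settings would be roughly as follows (the registry URL is a placeholder):

# Look up the writer schema by the ID embedded in each message
value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
schema.registry.url=http://localhost:8081
# false (the default) yields GenericRecord; true requires a generated SpecificRecord class
specific.avro.reader=false

And for the restart-with-latest approach, the registry exposes the newest version of a subject over HTTP (the subject name below assumes the default topic-name strategy):

curl http://localhost:8081/subjects/my-topic-value/versions/latest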

Update ksql stream with new topic schema

I write Avro messages into a Kafka topic using the Schema Registry.
Then I created a stream based on the topic. The stream was created with the current schema.
Then I added a new field to the schema. The Schema Registry updated fine, but the stream stays with the original structure.
Can I update the stream with the new schema?
It's problematic for me to drop and create the stream again because I have a lot of other streams/tables that depend on it, and KSQL doesn't allow dropping a stream with dependencies.
I think the best way is to stop the dependent streams and then DROP STREAM MYSTREAM.
https://docs.confluent.io/current/ksql/docs/developer-guide/create-a-stream.html#delete-a-ksql-stream
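A sketch of that approach (the query ID, stream, and topic names below are placeholders; list your real query IDs with SHOW QUERIES):

-- Find and stop the persistent queries that depend on the stream
SHOW QUERIES;
TERMINATE CSAS_DEPENDENT_STREAM_0;

-- Drop and recreate so the columns are re-read from the latest Avro schema in Schema Registry
DROP STREAM MYSTREAM;
CREATE STREAM MYSTREAM WITH (KAFKA_TOPIC='my_topic', VALUE_FORMAT='AVRO');

The dependent queries you terminated then need to be re-created against the new definition.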