Is it a good practice to use the existing topic for multiple connectors? - postgresql

I am using the Debezium PostgreSQL connector to get the users table into a Kafka Topic.
I have a JDBC sink connector that then reads the data from the topic and pushes it into its own database.
Now, I need a subset of the data for another microservice's database, so I am planning to write another JDBC sink connector.
The question: is it a good practice to use the existing users table topic? If yes, then how can I make sure that the new JDBC connector gets a snapshot of the entire users table?
 

If Debezium snapshotted the table and data hasn't been lost in the topic due to retention, then that's what any sink or other consumer will read.
Each sink connector has its own consumer group (derived from its unique name), so it tracks its own offsets on the topic. Nothing bad will happen with multiple consumers reading the same topic; this is how Kafka is intended to be used.
You may need to ensure consumer.auto.offset.reset=earliest is set on the Connect worker (or consumer.override.auto.offset.reset=earliest on the connector itself, if overrides are allowed) so the new connector reads from the start of the topic.
To sink only a subset of the fields, you'll need the ReplaceField transform (a sketch follows) - https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#replacefield
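For illustration, here's a rough sketch of what the second sink connector could look like, in standalone .properties form. The topic name, connection details, and field list are placeholders, and the ReplaceField option is called "whitelist" on older Connect versions:

    # second-users-sink.properties (sketch only)
    name=users-sink-microservice-b
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    topics=dbserver1.public.users
    connection.url=jdbc:postgresql://microservice-b-db:5432/users_copy
    connection.user=connect
    connection.password=********
    auto.create=true
    # Upsert on the primary key so snapshot rows and later updates collapse into one row
    insert.mode=upsert
    pk.mode=record_key
    pk.fields=id
    # A new connector name means a new consumer group, which starts from the earliest offsets;
    # the explicit override below requires connector.client.config.override.policy=All on the worker
    consumer.override.auto.offset.reset=earliest
    # Keep only the fields this microservice needs
    transforms=subset
    transforms.subset.type=org.apache.kafka.connect.transforms.ReplaceField$Value
    transforms.subset.include=id,email,created_at

Whatever the existing sink already does to handle the Debezium envelope (converters, the ExtractNewRecordState unwrap transform) should be copied over as well; only the connection details and the field list change.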

Related

Kafka Connect: a sink connector for writing multiple tables from one topic

I'd like to use Kafka Connect to detect changes on a Postgres DB via CDC for a group of tables, and push them as messages into one single topic, with the key being the logical key of the main table.
This would let the consumer read the data changes in the right order and apply them to a destination DB.
Are there Source and Sink connectors allowing me to achieve this goal?
I'm using Debezium CDC Source Connector for Postgres... which I can configure to route all the messages for all the tables into one single topic.
But then I'm not able to find a sink connector capable of consuming the messages and writing to the right table depending on the schema of the message.
There's no property for that in any specific sink connector, if that's what you're looking for.
You can use a Single Message Transform to extract part of the record and set it as the outgoing topic name; since most sink connectors derive the table name from the topic, one sink connector consuming a single topic can then write to the right tables.
Example : https://docs.confluent.io/platform/current/connect/transforms/extracttopic.html
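As a sketch, assuming the records carry a field (here called table_name) holding the destination table, and noting that ExtractTopic ships in Confluent's connect-transforms plugin, which is installed separately:

    # single-topic-jdbc-sink.properties (sketch only; topic, field, and connection details are placeholders)
    name=cdc-router-sink
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    topics=all_tables_cdc
    connection.url=jdbc:postgresql://target-db:5432/replica
    # Rewrite the topic name from a field in the record value; the JDBC sink then
    # writes to a table named after the (rewritten) topic, since table.name.format defaults to ${topic}
    transforms=routeByField
    transforms.routeByField.type=io.confluent.connect.transforms.ExtractTopic$Value
    transforms.routeByField.field=table_name
    transforms.routeByField.skip.missing.or.null=false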

How to monitor 'bad' messages written to a Kafka topic with no schema

I use Kafka Connect to take data from RabbitMQ into a Kafka topic. The data comes without a schema, so in order to associate a schema with it I use a ksql stream, and on top of that stream I create a new topic that now has a defined schema. Finally, I sink the data into a BigQuery database. My questions: how do I monitor messages that have not made it past the stream stage? Does this setup support schema evolution? And if not, how can I use the Schema Registry functionality?
Thanks
use Kafka Connect to take data ... data comes without schema
I'm not familiar with the RabbitMQ connector specifically, but if you use the Confluent converter classes that do use schemas, then the data would have one, although maybe only a string or bytes schema.
If ksql is consuming the non-schema topic, then there's a consumer group associated with that process. You can monitor its lag to know how many messages have not yet been processed by ksql. If ksql is unable to parse a message because it's "bad", then I assume it's either skipped or the stream stops consuming completely; this is likely configurable
If you've set the output topic format to Avro, for example, then the schema will automatically be registered in the Schema Registry. The schema won't evolve until you modify the fields of the stream.
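On the converter point above, a sketch of the converter settings involved (the class names are the standard Kafka and Confluent ones; hostnames are placeholders):

    # On the RabbitMQ source connector: pass the payload through without interpreting it
    value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
    # or, if the payloads are plain JSON text with no embedded schema:
    # value.converter=org.apache.kafka.connect.json.JsonConverter
    # value.converter.schemas.enable=false

    # On the BigQuery sink reading the ksqlDB output topic (written as Avro),
    # so that the schema comes from the Schema Registry:
    # value.converter=io.confluent.connect.avro.AvroConverter
    # value.converter.schema.registry.url=http://schema-registry:8081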

What should the Kafka serde configuration be when we use Kafka Streams?

We are using a JDBC source connector to sync data from a table to a topic (call this Topic 1) in Kafka. As we know this captures only inserts and updates, we have added a trigger to capture deletes. This trigger captures the deleted record and writes to a new table which gets synced to another Kafka topic (call this Topic 2).
We have configured the JDBC source connector to use AvroConverter.
Now we have written a Kafka Streams application that consumes data from Topic 2 and publishes to Topic 1. My question is: what should the serializer and deserializer configuration be for the Kafka Streams application? Is it OK to use KafkaAvroSerializer and KafkaAvroDeserializer?
I was going through the AvroConverter code (https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroConverter.java) to see if I could get some ideas, and I navigated the GitHub code for quite some time, but I was not able to conclude whether using KafkaAvroSerializer and KafkaAvroDeserializer is the right choice for the Kafka Streams application. Can someone please help me?
Why does your JDBC connector only capture inserts and updates?
EDITED: We use the SQL Server Debezium connector (rather than the Confluent JDBC source connector) and it performs well, even for deletes. With the JDBC connector, pay particular attention to the query modes.
Maybe try switching to this connector; you might end up with the problem solved, having only one stream containing all the relevant events.
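For reference, a minimal sketch of a Debezium SQL Server source connector (property names follow the Debezium 1.x docs and differ slightly in 2.x, e.g. topic.prefix instead of database.server.name; hosts, credentials, and table names are placeholders, and CDC has to be enabled on the SQL Server table first):

    name=sqlserver-cdc-source
    connector.class=io.debezium.connector.sqlserver.SqlServerConnector
    database.hostname=mssql
    database.port=1433
    database.user=debezium
    database.password=********
    database.dbname=inventory
    database.server.name=sqlserver1
    # Capture just the one table; inserts, updates, and deletes all arrive as change events
    table.include.list=dbo.orders
    # Debezium keeps its DDL history in its own Kafka topic
    database.history.kafka.bootstrap.servers=kafka:9092
    database.history.kafka.topic=schema-changes.inventory

With deletes arriving as ordinary change events on the same topic, the trigger table and the second topic may no longer be needed.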

Is it possible to use Kafka Connect to mirror an RDBMS table to a Kafka Stream?

I know it's possible to push updates from a database to a Kafka stream using Kafka Connect. My question is, can I create a consumer to write changes from that same stream back into the table without creating an infinite loop?
I'm assuming if I create a consumer that writes updates into the database table, it would trigger Connect to push that update to the stream, etc. Is there a way around this so I can mirror a database table to a stream?
You can stream from a Kafka topic to a database using the JDBC Sink connector for Kafka Connect.
You'd need to build the business logic for avoiding an infinite replication loop into either the connectors or your consumer. For example (a config sketch of the first option follows the list):
JDBC Source connector uses a WHERE clause to only pull records with a flag set to indicate they are the original record
Custom Single Message Transform in the source connector to drop records with a flag set to indicate they are not the original record
Stream application (e.g. KSQL / Kafka Streams) processes the inbound stream of all database changes to filter out only those with a flag set to indicate they are the original record
(The last option is less efficient, because you're still streaming everything from the database.)
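A sketch of the first option, assuming a hypothetical origin column that the original application sets on its own writes (table, column, and connection details are placeholders):

    name=jdbc-source-original-rows-only
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    connection.url=jdbc:postgresql://db:5432/app
    mode=timestamp+incrementing
    incrementing.column.name=id
    timestamp.column.name=updated_at
    # Wrapped in a subquery so the connector can append its own incremental WHERE clause
    query=SELECT * FROM (SELECT * FROM orders WHERE origin = 'app') filtered_orders
    # With a custom query, topic.prefix is used as the full topic name
    topic.prefix=orders-from-app
    poll.interval.ms=5000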
Yes. It is possible to configure synchronisation/replication.

How to setup multiple Kafka JDBC sink connectors for a single topic

I want to stream data from a particular Kafka topic into two distinct databases (MySQL and SQL Server). Every record should be written to the corresponding table in both databases. What configuration is required in the sink connectors to achieve this?
Create two JDBC sink connectors using the same source topic. They'll function independently, and each will send the messages from that topic to its own target RDBMS.
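A sketch of the two configs in standalone .properties form (topic name, URLs, and credentials are placeholders; both JDBC drivers must be available to the Connect worker). Each connector gets its own consumer group, connect-<connector name>, so they read the topic independently:

    # mysql-sink.properties
    name=orders-sink-mysql
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    topics=orders
    connection.url=jdbc:mysql://mysql-host:3306/analytics
    connection.user=connect
    connection.password=********
    auto.create=true

    # sqlserver-sink.properties
    name=orders-sink-sqlserver
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    topics=orders
    connection.url=jdbc:sqlserver://mssql-host:1433;databaseName=analytics
    connection.user=connect
    connection.password=********
    auto.create=true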