Debezium connector for SQL Server: filtering at the Debezium layer - debezium

We are using the Debezium connector for SQL Server and want to do the filtering at the Debezium layer itself, so that we can selectively pass data on to a Kafka topic.

You can check this documentation: https://debezium.io/documentation/reference/stable/transformations/filtering.html
Probably you will need something like this:
"transforms": "filter",
"transforms.filter.type": "io.debezium.transforms.Filter",
"transforms.filter.language": "jsr223.groovy",
"transforms.filter.condition": "value.op == \"c\""

Related

Stream both schema and data changes from MySQL to MySQL using Kafka Connect

How can we stream schema and data changes, along with some transformations, into another MySQL instance using a Kafka Connect source connector?
Is there also a way to propagate schema changes if I use Kafka's Python library (confluent_kafka) to consume and transform messages before loading them into the target DB?
You can use Debezium to stream MySQL binlogs into Kafka. Debezium is built on top of the Kafka Connect framework.
From there, you can use whatever client you want, including Python, to consume and transform the data.
If you want to write to MySQL, you can use Kafka Connect JDBC sink connector.
Here is an old post on this topic - https://debezium.io/blog/2017/09/25/streaming-to-another-database/
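As a rough sketch of the sink side of that pipeline (the topic name, target host, and credentials are placeholders, and the MySQL JDBC driver has to be available to the connector), a JDBC sink configuration writing Debezium change events back into MySQL could look something like this:
{
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"topics": "dbserver1.inventory.customers",
"connection.url": "jdbc:mysql://target-mysql:3306/inventory",
"connection.user": "mysqluser",
"connection.password": "******",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"pk.mode": "record_key",
"delete.enabled": "true"
}
The ExtractNewRecordState transform flattens Debezium's change-event envelope into plain rows, and auto.create/auto.evolve let the sink create the target table and add new columns, which covers basic schema evolution on the target side.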

PLC4X OPC UA - Kafka Connector

I want to use the PLC4X Connector (https://www.confluent.io/hub/apache/kafka-connect-plc4x-plc4j) to connect OPC UA (Prosys Simulation Server) with Kafka.
However, I really cannot find any documentation that describes the Kafka Connect configuration options.
I tried to connect to the Prosys OPC UA simulation server and then stream the data to a Kafka topic.
I managed to simply send the data and consume it; however, I want to use a schema and the Avro converter.
The output from my Python sink connector looks like this, which seems a bit strange to me:
b'Struct{fields=Struct{ff=-5.4470555688606E8,hhh=Sean Ray MD},timestamp=1651838599206}'
How can I use the PLC4X connector with the Avro converter and a Schema?
Thanks!
{
"connector.class": "org.apache.plc4x.kafka.Plc4xSourceConnector",
"default.topic":"plcTestTopic",
"connectionString":"opcua.tcp://127.0.0.1:12345",
"tasks.max": "2",
"sources": "machineA",
"sources.machineA.connectionString": "opcua:tcp://127.0.0.1:12345",
"sources.machineA.jobReferences": "jobA",
"jobs": "jobA",
"jobs.jobA.interval": "5000",
"jobs.jobA.fields": "job1,job2",
"jobs.jobA.fields.job1": "ns=2;i=2",
"jobs.jobA.fields.job2": "ns=2;i=3"
}
When using a schema with Avro and the Confluent Schema Registry, the following settings should be used. You can also choose to use different settings for the keys and the values.
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://127.0.0.1:8081
value.converter.schema.registry.url=http://127.0.0.1:8081
key.converter.schemas.enable=true
value.converter.schemas.enable=true
Sample configuration files are also available in the PLC4X GitHub repository.
https://github.com/apache/plc4x/tree/develop/plc4j/integrations/apache-kafka/config
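As a side note (a sketch, assuming the Schema Registry runs at the URL shown), the same converters can also be set per connector by adding them to the connector's JSON configuration instead of the worker properties:
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://127.0.0.1:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://127.0.0.1:8081"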

How to Connect Kafka to Postgres in Heroku

I have some Kafka consumers and producers running through my Kafka instance on my Heroku cluster. I'm looking to create a data sink connector to connect Kafka to PostgreSQL, to put data FROM Kafka TO my Heroku PostgreSQL instance. Pretty much like the Heroku docs, but one-way.
I can't figure out the steps I need to take to achieve this.
The docs say to look at the GitLab or Confluence Ecosystem page, but I can't find any mention of Postgres in either.
Looking in the Confluent Kafka connectors library, there seems to be something from Debezium, but I'm not running Confluent.
The diagram in the Heroku docs mentions a JDBC connector. I found this Postgres JDBC driver; should I be using that?
I'm happy to create a consumer and update Postgres manually as the data comes in if that's what's needed, but Kafka to Postgres must be a common enough integration that there should be something out there to manage this?
I'm just looking for some high level help or examples to set me on the right path.
Thanks
You're almost there :)
Bear in mind that Kafka Connect is part of Apache Kafka, and you get a variety of connectors. Some (e.g. Debezium) are community projects from Red Hat, others (e.g. JDBC Sink) are community projects from Confluent.
The JDBC Sink connector will let you stream data from Kafka to a database with a JDBC driver - such as Postgres.
Here's an example configuration:
{
"connector.class" : "io.confluent.connect.jdbc.JdbcSinkConnector",
"key.converter" : "org.apache.kafka.connect.storage.StringConverter",
"connection.url" : "jdbc:postgresql://postgres:5432/",
"connection.user" : "postgres",
"connection.password": "postgres",
"auto.create" : true,
"auto.evolve" : true,
"insert.mode" : "upsert",
"pk.mode" : "record_key",
"pk.fields" : "MESSAGE_KEY"
}
Here's a walkthrough and a couple of videos that you might find useful:
Kafka Connect in Action: JDBC Sink
ksqlDB and the Kafka Connect JDBC Sink
Do I actually need to install anything?
Kafka Connect comes with Apache Kafka. You need to install the JDBC connector.
Do I actually need to write any code?
No, just the configuration, similar to what I quoted above.
Can I just call the Connect endpoint, which comes with Kafka?
Once you've installed the connector, you run Kafka Connect (a binary that ships with Apache Kafka) and then use its REST endpoint to create the connector using the configuration.
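For example (a sketch assuming Kafka Connect runs in distributed mode on its default port 8083; the connector name and topic are placeholders), creating the connector via the REST endpoint could look roughly like this:
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "postgres-sink",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
      "topics": "my_topic",
      "key.converter": "org.apache.kafka.connect.storage.StringConverter",
      "connection.url": "jdbc:postgresql://postgres:5432/",
      "connection.user": "postgres",
      "connection.password": "postgres",
      "auto.create": true,
      "auto.evolve": true,
      "insert.mode": "upsert",
      "pk.mode": "record_key",
      "pk.fields": "MESSAGE_KEY"
    }
  }'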

Confluent Sink Connector - Sink to Multiple Database Destinations

If I need to copy the same data from a topic to multiple databases with different IPs, do I need to create multiple sink connectors, or can I somehow specify multiple destinations in connection.url or in some other way?
Your help is most appreciated.
I'm assuming that you're talking about Kafka Connect and the JDBC Sink connector? If so, then you need to create one sink connector per target database. One connector cannot target multiple databases.
You need to spin up individual sink connectors, since the connector class and the configuration being used vary per database. Specifying multiple destination IPs for different databases in connection.url won't work.
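As a sketch (the connector names, topic, hosts, and credentials below are placeholders), two JDBC sink connectors reading the same topic and differing only in name and connection settings would achieve this:
{
"name": "jdbc-sink-db-a",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"topics": "orders",
"connection.url": "jdbc:postgresql://10.0.0.1:5432/targetdb",
"connection.user": "user_a",
"connection.password": "******",
"insert.mode": "upsert",
"pk.mode": "record_key",
"auto.create": "true"
}
}
{
"name": "jdbc-sink-db-b",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"topics": "orders",
"connection.url": "jdbc:postgresql://10.0.0.2:5432/targetdb",
"connection.user": "user_b",
"connection.password": "******",
"insert.mode": "upsert",
"pk.mode": "record_key",
"auto.create": "true"
}
}
Both connectors consume the same topic independently (each gets its own consumer group), so the data is copied to both databases without either one affecting the other.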

Debezium CDC connector to EventHub

I am trying to use the Debezium CDC connector to send data from SQL Server to Azure Event Hubs, but the table is not getting created in Event Hubs from SQL Server. I am not getting any error either. All the default topics were created in Event Hubs when I started the connector.
I followed this doc https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-connect-tutorial and it worked fine. Will the CDC connector work fine with Event Hubs? Any idea?
Debezium today only supports Kafka as an out-of-the-box sink. If you need to write change notifications to any other sink, you need to run Debezium in embedded mode. See https://debezium.io/documentation/reference/1.0/operations/embedded.html