Build a pipeline with Kafka Connect using JDBC connectors for SQLite - apache-kafka

I am new to Kafka Connect and am trying to build a pipeline to get data from SQLite into a Kafka topic.

Assuming your SQLite DB is at /tmp/test.db, use this config:
{
  "name": "jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:sqlite:/tmp/test.db",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "test-sqlite-jdbc-",
    "name": "jdbc-source"
  }
}
For more details, see:
https://docs.confluent.io/current/connect/connect-jdbc/docs/source_connector.html#quick-start
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
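As a quick sketch of how this config behaves (assuming a Connect worker on the default REST port 8083, and a hypothetical table named accounts for illustration): the source connector names each output topic as topic.prefix followed by the table name, and the JSON above is what you POST to the worker's /connectors endpoint.

```python
import json

# The connector config from the answer above.
config = {
    "name": "jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "tasks.max": "1",
        "connection.url": "jdbc:sqlite:/tmp/test.db",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "test-sqlite-jdbc-",
        "name": "jdbc-source",
    },
}

# The source connector writes each table to <topic.prefix><table name>,
# so a table called "accounts" (hypothetical) would land in:
table = "accounts"
topic = config["config"]["topic.prefix"] + table
print(topic)  # test-sqlite-jdbc-accounts

# Serialize the body to create the connector, e.g.
# curl -X POST -H "Content-Type: application/json" \
#      -d @config.json http://localhost:8083/connectors
payload = json.dumps(config)
```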

Related

Produce Avro messages in Confluent Control Center UI

To develop a data transfer application, I first need to define key/value Avro schemas. The producer application will not be developed until the Avro schema is defined.
I cloned a topic and its key/value Avro schemas that are already working, and also cloned the JDBC sink connector, changing only the topic and connector names.
Then I copied an existing message that had previously been sent through the sink successfully, using the Confluent Topic Message UI producer.
But it fails with the error: "Unknown magic byte!"
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.getByteBuffer(AbstractKafkaSchemaSerDe.java:250)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer$DeserializationContext.<init>(AbstractKafkaAvroDeserializer.java:323)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaAvroDeserializer.java:164)
at io.confluent.connect.avro.AvroConverter$Deserializer.deserialize(AvroConverter.java:172)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:107)
... 17 more
[2022-07-25 03:45:42,385] INFO Stopping task (io.confluent.connect.jdbc.sink.JdbcSinkTask)
Reading other questions, it seems the message has to be serialized using the schema (see "Unknown magic byte with kafka-avro-console-consumer").
Is it possible to send a message to a topic with Avro key/value schemas using the Confluent Topic UI?
Do the Avro schemas need information that depends on the connector/source? Does the namespace depend on the topic name?
This is my key schema; the topic's name is knov_03:
{
  "connect.name": "dbserv1.MY_DB_SCHEMA.ps_sap_incoming.Key",
  "fields": [
    {
      "name": "id_sap_incoming",
      "type": "long"
    }
  ],
  "name": "Key",
  "namespace": "dbserv1.MY_DB_SCHEMA.ps_sap_incoming",
  "type": "record"
}
Connector:
{
  "name": "knov_05",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "topics": "knov_03",
    "connection.url": "jdbc:mysql://eXXXXX:3306/MY_DB_SCHEMA?useSSL=FALSE&nullCatalogMeansCurrent=true",
    "connection.user": "USER",
    "connection.password": "PASSWORD",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "pk.mode": "record_key",
    "pk.fields": "id_sap_incoming",
    "auto.create": "true",
    "auto.evolve": "true",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "key.converter.schema.registry.url": "http://schema-registry:8081"
  }
}
Thanks.
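A likely cause, stated here as an assumption rather than a confirmed diagnosis: messages produced by the Avro serializer use the Confluent wire format, a magic byte 0 followed by a 4-byte schema ID, before the Avro payload. The AvroConverter raises "Unknown magic byte!" when that 5-byte header is missing, i.e. when the UI produced the message as plain text/JSON instead of Avro-serializing it against the registered schema. A minimal sketch of the framing and the check that fails:

```python
import struct

MAGIC_BYTE = 0

def frame(schema_id: int, avro_payload: bytes) -> bytes:
    # Confluent wire format: magic byte (0), then the schema ID as a
    # 4-byte big-endian unsigned int, then the Avro-encoded body.
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def looks_avro_framed(message: bytes) -> bool:
    # This is essentially the check that fails with "Unknown magic byte!"
    return len(message) >= 5 and message[0] == MAGIC_BYTE

framed = frame(42, b"\x02")            # hypothetical schema ID 42
plain = b'{"id_sap_incoming": 1}'      # what a plain-JSON producer sends

print(looks_avro_framed(framed))  # True
print(looks_avro_framed(plain))   # False -> deserializer raises the error
```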

Kafka Connect - JDBC Source Connector - Setting Avro Schema

How can I make the Kafka Connect JDBC connector use a predefined Avro schema? It creates a new schema version when the connector is created. I am reading from DB2 and writing into a Kafka topic.
I am setting the schema name and version during creation, but it does not work. Here are my connector settings:
{
  "name": "kafka-connect-jdbc-db2-tst-2",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:db2://mydb2:50000/testdb",
    "connection.user": "DB2INST1",
    "connection.password": "12345678",
    "query": "SELECT CORRELATION_ID FROM TEST.MYVIEW4 ",
    "mode": "incrementing",
    "incrementing.column.name": "CORRELATION_ID",
    "validate.non.null": "false",
    "topic.prefix": "tst-4",
    "auto.register.schemas": "false",
    "use.latest.version": "true",
    "transforms": "RenameField,SetSchemaMetadata",
    "transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.RenameField.renames": "CORRELATION_ID:id",
    "transforms.SetSchemaMetadata.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
    "transforms.SetSchemaMetadata.schema.name": "foo.bar.MyMessage",
    "transforms.SetSchemaMetadata.schema.version": "1"
  }
}
And here are the schemas: V.1 is mine, and V.2 was created by the JDBC source connector:
$ curl localhost:8081/subjects/tst-4-value/versions/1 | jq .
{
  "subject": "tst-4-value",
  "version": 1,
  "id": 387,
  "schema": "{\"type\":\"record\",\"name\":\"MyMessage\",\"namespace\":\"foo.bar\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}"
}
$ curl localhost:8081/subjects/tst-4-value/versions/2 | jq .
{
  "subject": "tst-4-value",
  "version": 2,
  "id": 386,
  "schema": "{\"type\":\"record\",\"name\":\"MyMessage\",\"namespace\":\"foo.bar\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}],\"connect.version\":1,\"connect.name\":\"foo.bar.MyMessage\"}"
}
Any idea how I can force the Kafka connector to use my schema?
Thanks in advance.
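One plausible explanation, offered as an assumption rather than a confirmed answer: the converter embeds Connect metadata (connect.version, connect.name) as extra properties inside the Avro schema JSON it registers, so even though the record shape matches V.1, the schema document is not identical and the registry stores it as a new version. A sketch comparing the two schema documents from the question:

```python
import json

# V.1: the manually registered schema (from the curl output above).
mine = json.loads(
    '{"type":"record","name":"MyMessage",'
    '"namespace":"foo.bar","fields":[{"name":"id","type":"int"}]}'
)

# V.2: what the connector registered, with Connect metadata added.
from_connector = json.loads(
    '{"type":"record","name":"MyMessage","namespace":"foo.bar",'
    '"fields":[{"name":"id","type":"int"}],'
    '"connect.version":1,"connect.name":"foo.bar.MyMessage"}'
)

# The record shape is identical...
same_shape = all(mine[k] == from_connector[k]
                 for k in ("type", "name", "namespace", "fields"))

# ...but the extra keys make the documents differ, so Schema Registry
# treats the connector's schema as a distinct version.
extra_keys = sorted(set(from_connector) - set(mine))
print(same_shape, extra_keys)
```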

kafka connect sql server incremental changes from cdc

I am very new to Kafka (started reading about it and setting it up in my sandbox environment just a week ago) and am trying to set up the SQL Server JDBC connector.
I set up Confluent Community per the Confluent guide and installed io.debezium.connector.sqlserver.SqlServerConnector using confluent-hub.
I enabled CDC on the SQL Server database and the required table, and it is working fine.
I have tried the following connectors (one at a time):
io.debezium.connector.sqlserver.SqlServerConnector
io.confluent.connect.jdbc.JdbcSourceConnector
Both load fine, with the connector and task status showing as running with no errors.
Here is my io.confluent.connect.jdbc.JdbcSourceConnector configuration:
{
  "name": "mssql-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "mode": "timestamp",
    "timestamp.column.name": "CreatedDateTime",
    "query": "select * from dbo.sampletable",
    "tasks.max": "1",
    "table.types": "TABLE",
    "key.converter.schemas.enable": "false",
    "topic.prefix": "data_",
    "value.converter.schemas.enable": "false",
    "connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=sampledb",
    "connection.user": "kafka",
    "connection.password": "kafkaPassword#789",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "poll.interval.ms": "5000",
    "table.poll.interval.ms": "120000"
  }
}
And here is my io.debezium.connector.sqlserver.SqlServerConnector configuration:
{
  "name": "mssql-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "database.server.name": "SQL2016",
    "database.hostname": "SQL2016",
    "database.port": "1433",
    "database.user": "kafka",
    "database.password": "kafkaPassword#789",
    "database.dbname": "sampleDb",
    "database.history.kafka.bootstrap.servers": "kafkanode1:9092",
    "database.history.kafka.topic": "schema-changes.sampleDb"
  }
}
Both connectors create a snapshot of the table in a topic (i.e. they pull all the rows initially),
but when I make changes to the table "sampletable" (insert/update/delete), those changes are not pulled into Kafka.
Can someone please help me understand how to make CDC work with Kafka?
Thanks
This seems to have worked 100%. I am posting the answer just in case someone like me is stuck on the JDBC source connector.
{
  "name": "piilog-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "mode": "incrementing",
    "value.converter.schemas.enable": "false",
    "connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=SIAudit",
    "connection.user": "kafka",
    "connection.password": "kafkaPassword#789",
    "query": "select * from dbo.sampletable",
    "incrementing.column.name": "Id",
    "validate.non.null": false,
    "topic.prefix": "data_",
    "tasks.max": "1",
    "poll.interval.ms": "5000",
    "table.poll.interval.ms": "5000"
  }
}
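For context on why this helps with inserts but is not full CDC (a simplified model under stated assumptions, not the connector's actual implementation): in incrementing mode the connector remembers the largest value of incrementing.column.name it has seen and, on each poll, fetches only rows with a strictly greater value. That is why it picks up new inserts but never sees updates or deletes to existing rows:

```python
# Simplified model of JDBC-source incrementing mode: each poll fetches
# only rows whose Id exceeds the last stored offset.
def poll(rows, last_id):
    new_rows = [r for r in rows if r["Id"] > last_id]
    new_last = max((r["Id"] for r in new_rows), default=last_id)
    return new_rows, new_last

table = [{"Id": 1, "v": "a"}, {"Id": 2, "v": "b"}]
batch, offset = poll(table, last_id=0)       # initial snapshot: both rows
table.append({"Id": 3, "v": "c"})            # an INSERT gets a new Id...
table[0]["v"] = "a2"                         # ...but an UPDATE keeps Id=1
batch2, offset = poll(table, last_id=offset)
print([r["Id"] for r in batch2])  # [3] - the update to Id=1 is missed
```

Capturing updates and deletes requires a log-based approach such as the Debezium connector above (or timestamp mode with a reliably updated timestamp column, which still cannot see deletes).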

write Debezium Kafka topic data to Hive through HDFS sink connector not work

I am trying to capture MySQL data changes with the Debezium MySQL connector into Kafka, then write the changes to Hive on Hadoop through the HDFS sink connector. The pipeline is: MySQL -> Kafka -> Hive.
The sink connector is configured as follows:
{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "customers",
    "hdfs.url": "hdfs://192.168.10.15:8020",
    "flush.size": "3",
    "hive.integration": "true",
    "hive.database": "inventory",
    "hive.metastore.uris": "thrift://192.168.10.14:9083",
    "schema.compatibility": "BACKWARD",
    "transforms": "unwrap,key",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.key.field": "id"
  }
}
This seems to work, but when I query the Hive table, the changed data (wrapped in the after key) is displayed in a single after column instead of being split into the original table columns.
As you can see in the sink configuration, I already tried using Debezium's "io.debezium.transforms.UnwrapFromEnvelope" transform to unwrap the event message, but it is obviously not working.
What are the minimal settings that let me write DB change events from Kafka to Hive? Is the HDFS sink connector the right choice for this job?
Update:
I tested this with the sample "inventory" database from Debezium.
I got the test environment from the Debezium images, so they should be the latest ones. Some version info: Debezium 1.0, Kafka 2.0, confluent-kafka-connect-hdfs sink connector 5.4.1.
Update 2:
I moved on to the following sink config, but still no luck:
{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.inventory.customers",
    "hdfs.url": "hdfs://172.17.0.8:8020",
    "flush.size": "3",
    "hive.integration": "true",
    "hive.database": "inventory",
    "hive.metastore.uris": "thrift://172.17.0.8:9083",
    "schema.compatibility": "BACKWARD",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false"
  }
}
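For reference, ExtractNewRecordState (the replacement for the deprecated UnwrapFromEnvelope) collapses a Debezium change event down to its after state, which is what allows the sink to see plain row columns instead of one wrapped after field. A rough sketch of the transformation, assuming a typical Debezium envelope shape (the sample values below are hypothetical):

```python
# Sketch of what io.debezium.transforms.ExtractNewRecordState does:
# collapse the Debezium envelope to just the "after" row state.
def unwrap(event):
    # Tombstones (null values) pass through when drop.tombstones=false.
    if event is None:
        return None
    return event["after"]

# A typical (hypothetical) Debezium update event for the customers table:
event = {
    "before": {"id": 1001, "first_name": "Sally"},
    "after":  {"id": 1001, "first_name": "Anne"},
    "op": "u",
    "source": {"db": "inventory", "table": "customers"},
}
flat = unwrap(event)
print(flat)  # {'id': 1001, 'first_name': 'Anne'}
```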

how to dynamically update topics of SinkConnector in Kafka Connect?

I have written a Kafka Connect sink to consume topics, but my topics will change at runtime, so I need to reconfigure the connector's topics.
I know the REST API can update the topics; is there another way to achieve this?
Kafka Connect is intended to run as a service, and it supports a REST API for managing connectors.
The only way to update this at run time is through the REST API:
PUT /connectors/{name}/config - updates the connector's configuration parameters at run time.
Request JSON object - config (map):
{
  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "tasks.max": "20",
  "topics": "kafkaConnectTopic",
  "hdfs.url": "hdfs://smoketest:9000",
  "hadoop.conf.dir": "/etc/hadoop/conf",
  "hadoop.home": "/etc/hadoop",
  "flush.size": "1000",
  "rotate.interval.ms": "100"
}
Response:
{
  "name": "hdfs-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "20",
    "topics": "kafkaConnectTopic",
    "hdfs.url": "hdfs://smoketest:9000",
    "hadoop.conf.dir": "/etc/hadoop/conf",
    "hadoop.home": "/etc/hadoop",
    "flush.size": "1000",
    "rotate.interval.ms": "100"
  },
  "tasks": [
    { "connector": "hdfs-sink-connector", "task": 1 },
    { "connector": "hdfs-sink-connector", "task": 2 },
    { "connector": "hdfs-sink-connector", "task": 3 }
  ]
}
For further reading, see http://docs.confluent.io/3.0.0/connect/userguide.html#connect-administration.
You can specify a list of topics to consume in the connector configuration if you know ahead of time the set of topics you will want to switch to. Otherwise, the REST API is the only way to update configurations on the fly.
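A minimal sketch of building the update request (assuming a worker on localhost:8083 and the connector name hdfs-sink-connector from the response above): note that the PUT body carries the bare config map shown under "Request JSON object", not the {"name": ..., "config": ...} wrapper used when creating a connector.

```python
import json

# Current config map for the connector (from the answer above).
config = {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "20",
    "topics": "kafkaConnectTopic",
    "hdfs.url": "hdfs://smoketest:9000",
    "hadoop.conf.dir": "/etc/hadoop/conf",
    "hadoop.home": "/etc/hadoop",
    "flush.size": "1000",
    "rotate.interval.ms": "100",
}

def with_topics(cfg, topics):
    # Return a new config map pointing the sink at a different topic list;
    # PUT this JSON to /connectors/hdfs-sink-connector/config
    # (topic names here are hypothetical).
    updated = dict(cfg)
    updated["topics"] = ",".join(topics)
    return updated

body = json.dumps(with_topics(config, ["orders", "payments"]))
print(json.loads(body)["topics"])  # orders,payments
```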