What are the extra topics created when creating a Debezium source connector? - apache-kafka

Q1) Following is the config I used while creating the Kafka connector for a MySQL source.
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "minimal",
"database.user": "cdc_user",
"tasks.max": "3",
"database.history.kafka.bootstrap.servers": "10.49.115.X:9092,10.48.X.211:9092,10.X.178.121:9092,10.53.4.X:9092",
"database.history.kafka.topic": "history.cdc.fkw.supply.mp.seller_platform",
"database.server.name": "cdc.fkw.supply.mp",
"heartbeat.interval.ms": "5000",
"database.port": "3306",
"table.whitelist": "seller_platform.Contacts, seller_platform.EmailVerificationConfigs, seller_platform.financial_account_tag, seller_platform.HolidayConfigs, seller_platform.Preferences, seller_platform.Sellers",
"database.hostname": "something.cloud.in",
"database.password": "ABCDE",
"database.history.kafka.recovery.poll.interval.ms": "5000",
"name": "cdc.fkw.supply.mp.seller_platform.connector",
"database.history.skip.unparseable.ddl": "true",
"errors.tolerance": "all",
"database.whitelist": "seller_platform",
"snapshot.mode": "when_needed"
}
curl -s --location --request GET "http://10.24.18.167:80/connectors/cdc.fkw.supply.mp.seller_platform.connector/topics" | jq '.'
{
  "cdc.fkw.supply.mp.seller_platform.connector": {
    "topics": [
      "cdc.fkw.supply.mp.seller_platform.Sellers",
      "cdc.fkw.supply.mp",
      "cdc.fkw.supply.mp.seller_platform.HolidayConfigs",
      "cdc.fkw.supply.mp.seller_platform.EmailVerificationConfigs",
      "cdc.fkw.supply.mp.seller_platform.Contacts",
      "cdc.fkw.supply.mp.seller_platform.Preferences",
      "__debezium-heartbeat.cdc.fkw.supply.mp",
      "cdc.fkw.supply.mp.seller_platform.financial_account_tag"
    ]
  }
}
Why do the cdc.fkw.supply.mp and __debezium-heartbeat.cdc.fkw.supply.mp topics get created?
I see what looks like garbage data inside these two topics.
Q2)
Is there a REST API to find out the Kafka Connect converter configuration on the worker server?
If there is no such API, then what is the path of the configuration file where all the worker properties are stored?
This is the link to the worker properties:
https://docs.confluent.io/platform/current/connect/references/allconfigs.html
curl -s --location --request GET "http://10.24.18.167:80"
{"version":"6.1.1-ccs","commit":"c209f70c6c2e52ae","kafka_cluster_id":"snBlf-kfTdCYWEO9IIEXTA"}%

A1)
The __debezium-heartbeat.<database.server.name> topic is where the connector periodically writes heartbeat messages, at the interval set by heartbeat.interval.ms, so that it can keep committing offsets even when the captured tables see few or no changes.
The topic named after the database.server.name value is the schema change topic: the connector writes the schema (DDL) changes that take place in the database there.
https://debezium.io/documentation/reference/1.7/connectors/mysql.html#mysql-schema-change-topic
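If you want to check what those records are, here is a quick sketch of inspecting both topics with the console consumer (the broker address is a placeholder; use one of your brokers):
kafka-console-consumer.sh --bootstrap-server 10.49.115.X:9092 --topic __debezium-heartbeat.cdc.fkw.supply.mp --from-beginning
kafka-console-consumer.sh --bootstrap-server 10.49.115.X:9092 --topic cdc.fkw.supply.mp --from-beginning
The first should show small periodic heartbeat records, and the second the schema change events, rather than garbage.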

Related

HTTP Sink Connector not Batching the messages

I am using the HTTP Sink connector config below and it is still sending records one by one. It is supposed to send data in batches of 50 messages.
{
  "name": "HTTPSinkConnector_1",
  "config": {
    "topics": "topic_1",
    "tasks.max": "1",
    "connector.class": "io.confluent.connect.http.HttpSinkConnector",
    "http.api.url": "http://localhost/messageHandler",
    "request.method": "POST",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "confluent.topic.bootstrap.servers": "kafka:19092",
    "confluent.topic.replication.factor": "1",
    "batching.enabled": true,
    "batch.max.size": 50,
    "reporter.bootstrap.servers": "kafka:19092",
    "reporter.result.topic.name": "success-responses",
    "reporter.result.topic.replication.factor": "1",
    "reporter.error.topic.name": "error-responses",
    "reporter.error.topic.replication.factor": "1",
    "request.body.format": "json"
  }
}
Could someone please suggest whether any other property is missing?
The HTTP Sink connector does not batch requests for messages containing Kafka header values that are different.
https://docs.confluent.io/kafka-connectors/http/current/overview.html#features
The workaround would be to either:
remove the headers (a sketch of one way to do this follows below), or
use Kafka Streams to manually window the data into a new topic yourself, and have the connector read that topic instead.
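If the headers are not needed downstream, one way to strip them before the sink sees them is the built-in DropHeaders transform; this is only a sketch, assuming a reasonably recent Kafka Connect version, with placeholder header names, added to the sink connector config:
"transforms": "dropHeaders",
"transforms.dropHeaders.type": "org.apache.kafka.connect.transforms.DropHeaders",
"transforms.dropHeaders.headers": "header-a,header-b"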

Produce Avro messages in Confluent Control Center UI

To develop a data transfer application I first need to define the key/value Avro schemas. The producer application will not be developed until the Avro schemas are defined.
I cloned a topic and its key/value Avro schemas that are already working, and I also cloned the JDBC sink connector; I simply changed the topic and connector names.
Then I took an existing message that had already been sent to the sink successfully and produced it again using the Confluent Topic Message UI Producer.
But it is failing with the error: "Unknown magic byte!"
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.getByteBuffer(AbstractKafkaSchemaSerDe.java:250)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer$DeserializationContext.<init>(AbstractKafkaAvroDeserializer.java:323)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaAvroDeserializer.java:164)
at io.confluent.connect.avro.AvroConverter$Deserializer.deserialize(AvroConverter.java:172)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:107)
... 17 more
[2022-07-25 03:45:42,385] INFO Stopping task (io.confluent.connect.jdbc.sink.JdbcSinkTask)
Reading other questions, it seems the message has to be serialized using the schema:
Unknown magic byte with kafka-avro-console-consumer
Is it possible to send a message to a topic with Avro key/value schemas using the Confluent topic UI?
Any idea whether the Avro schemas need information that depends on the connector/source, or whether the namespace depends on the topic name?
This is my key schema, and the topic's name is knov_03:
{
  "connect.name": "dbserv1.MY_DB_SCHEMA.ps_sap_incoming.Key",
  "fields": [
    {
      "name": "id_sap_incoming",
      "type": "long"
    }
  ],
  "name": "Key",
  "namespace": "dbserv1.MY_DB_SCHEMA.ps_sap_incoming",
  "type": "record"
}
Connector:
{
  "name": "knov_05",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "topics": "knov_03",
    "connection.url": "jdbc:mysql://eXXXXX:3306/MY_DB_SCHEMA?useSSL=FALSE&nullCatalogMeansCurrent=true",
    "connection.user": "USER",
    "connection.password": "PASSWORD",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "pk.mode": "record_key",
    "pk.fields": "id_sap_incoming",
    "auto.create": "true",
    "auto.evolve": "true",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "key.converter.schema.registry.url": "http://schema-registry:8081"
  }
}
Thanks.
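The "Unknown magic byte" error means the bytes in the topic do not start with the Confluent wire-format prefix, i.e. they were not written by the Confluent Avro serializer, which is what happens when a message is pasted as plain text into the topic UI. For comparison, a record the AvroConverter can read can be produced from the command line with kafka-avro-console-producer; this is only a sketch, with a placeholder broker address, the registry URL taken from the connector config above, and the schemas assumed to be in local files:
kafka-avro-console-producer --broker-list BROKER:9092 --topic knov_03 \
  --property schema.registry.url=http://schema-registry:8081 \
  --property parse.key=true --property key.separator='|' \
  --property key.schema="$(cat key-schema.json)" \
  --property value.schema="$(cat value-schema.json)"
# then type one record per line, e.g.:
# {"id_sap_incoming": 1}|{ ...value fields matching the value schema... }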

What property to use in kafka mysql source connector to register a new version for any schema change

This is the configuration of schema-registry.properties
listeners=http://10.X.X.76:8081
kafkastore.bootstrap.servers=PLAINTEXT://10.XXX:9092,PLAINTEXT://10.XXX:9092,PLAINTEXT://10.XXXX.1:9092,PLAINTEXT://1XXXX.69:9092
kafkastore.topic=_schemas
debug=false
master.eligibility=true
This is the configuration of my connector,
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "minimal",
"database.user": "cdc_user",
"tasks.max": "3",
"database.history.kafka.bootstrap.servers": "10.49.115.249:9092,10.48.130.211:9092,10.54.178.121:9092,10.53.4.69:9092",
"database.history.kafka.topic": "history.cdc.fkw.supply.mp.seller_facility",
"database.server.name": "cdc.fkw.supply.mp",
"heartbeat.interval.ms": "5000",
"database.port": "3306",
"table.whitelist": "seller_facility.addresses, seller_facility.location, seller_facility.default_location, seller_facility.location_document_mapping",
"database.hostname": "dog-rr.ffb-supply-ffb-supply-mp.prod.altair.fkcloud.in",
"database.password": "6X5DpJrVzI",
"database.history.kafka.recovery.poll.interval.ms": "5000",
"name": "cdc.fkw.supply.mp.seller_facility.connector",
"database.history.skip.unparseable.ddl": "true",
"errors.tolerance": "all",
"database.whitelist": "seller_facility",
"snapshot.mode": "when_needed"
}
How do I register a new schema version when there is any change in the schema?
What property can I add so that it just adds a new, fully compatible version to the Schema Registry for that particular topic?
Assuming your key/value.converter settings use one of the Confluent converters, such as AvroConverter for example, any new/removed database columns will automatically be picked up by the Connect framework and registered with the Registry as part of serialization by the KafkaAvroSerializer.
Changing database column types might generate errors, however, for example changing VARCHAR to INT.
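The "fully compatible" part is controlled on the Schema Registry side rather than by a connector property. Here is a sketch of setting the compatibility level for one subject via the Registry REST API; the subject name assumes the default TopicNameStrategy for one of the topics this connector produces:
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "FULL"}' \
  http://10.X.X.76:8081/config/cdc.fkw.supply.mp.seller_facility.addresses-value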

Error handling for invalid JSON in kafka sink connector

I have a sink connector for MongoDB that takes JSON from a topic and puts it into a MongoDB collection. But when I send invalid JSON from a producer to that topic (e.g. with an invalid special character " => {"id":1,"name":"\"}), the connector stops. I tried using errors.tolerance = all, but the same thing keeps happening. What should happen is that the connector skips the invalid JSON, logs it, and keeps running. My distributed-mode connector config is as follows:
{
  "name": "sink-mongonew_test1",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "error7",
    "connection.uri": "mongodb://****:27017",
    "database": "abcd",
    "collection": "abc",
    "type.name": "kafka-connect",
    "key.ignore": "true",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "value.projection.list": "id",
    "value.projection.type": "whitelist",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
    "delete.on.null.values": "false",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "errors.deadletterqueue.topic.name": "crm_data_deadletterqueue",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "errors.deadletterqueue.context.headers.enable": "true"
  }
}
Since Apache Kafka 2.0, Kafka Connect has included error handling options, including the functionality to route messages to a dead letter queue, a common technique in building data pipelines.
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/
As commented, you're using connect-api-1.0.1.*.jar, i.e. version 1.0.1, which explains why those properties are not working: they only exist from Kafka Connect 2.0 onwards.
Your alternatives, outside of running a newer version of Kafka Connect, include NiFi or Spark Structured Streaming.
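Once you are on a version where the dead letter queue properties take effect, the failed records (and, with errors.deadletterqueue.context.headers.enable=true, the failure reason carried in the record headers) can be inspected with the console consumer. A sketch, with a placeholder broker address; printing headers requires a reasonably recent console consumer:
kafka-console-consumer.sh --bootstrap-server BROKER:9092 \
  --topic crm_data_deadletterqueue --from-beginning \
  --property print.headers=true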

Kafka connect jdbc source mssql server loading millions record throwing out of memory error

I have tried to load 77 million records from an MSSQL server into a Kafka topic through the Kafka Connect JDBC source.
I tried the batch approach with batch.max.rows set to 1000. In this case, it throws an out-of-memory error after 1000 records. Please share suggestions on how to make it work.
Below are the connector configurations I tried:
curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "mssql_jdbc_rsitem_pollx",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx",
    "connection.user": "xxxx",
    "connection.password": "xxxx",
    "topic.prefix": "mssql-rsitem_pollx-",
    "mode": "incrementing",
    "table.whitelist": "test",
    "timestamp.column.name": "itemid",
    "max.poll.records": "100",
    "max.poll.interval.ms": "3000",
    "validate.non.null": false
  }
}'
curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "mssql_jdbc_test_polly",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "10",
    "connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx;defaultFetchSize=10000;useCursorFetch=true",
    "connection.user": "xxxx",
    "connection.password": "xxxx",
    "topic.prefix": "mssql-rsitem_polly-",
    "mode": "incrementing",
    "table.whitelist": "test",
    "timestamp.column.name": "itemid",
    "poll.interval.ms": "86400000",
    "validate.non.null": false
  }
}'
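A side note on the configs above: batching in the JDBC source connector itself is controlled by batch.max.rows (together with the fetch-size hints in the JDBC URL); max.poll.records and max.poll.interval.ms are Kafka consumer settings and, as far as I know, are not options of the JDBC source connector. A sketch of the relevant fragment, with illustrative values only:
"batch.max.rows": "500",
"poll.interval.ms": "5000"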
Try to increase the Java heap size by setting it on the command line before starting the Connect worker:
export KAFKA_HEAP_OPTS="-Xms1g -Xmx2g"
You can change the "-Xmx2g" part to match your capacity.
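For example, applied to a distributed worker (a sketch; the script and properties paths are the stock Apache Kafka ones and may differ in your installation):
export KAFKA_HEAP_OPTS="-Xms1g -Xmx4g"
bin/connect-distributed.sh config/connect-distributed.properties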