I have started learning Avro and I want to use it with Kafka Connect. I'm using a configuration like the following. Is this the right configuration?
{
"name": "surveyWawancara-connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"key.deserializer": "org.apache.kafka.connect.json.JsonDeserializer",
"database.user": "Acquisition.ro",
"database.dbname": "acquisition",
"value.deserializer": "org.apache.kafka.connect.json.JsonDeserializer",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"database.history.kafka.bootstrap.servers": "beta-kafka-brokers.amq-streams-beta.svc:9092",
"database.history.kafka.topic": "schema-changes.sl.surveyWawancara",
"time.precision.mode": "connect",
"database.server.name": "beta-sl-bn",
"database.port": "1433",
"table.whitelist": "dbo.SurveyWawancara",
"key.converter.schemas.enable": "true",
"database.hostname": "10.7.76.62",
"database.password": "Acquisition_ro231!",
"value.converter.schemas.enable": "true",
"name": "surveyWawancara-connector",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter"
}
}
You've duplicated the converter fields, but yes, these properties are correct:
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
Avro always has a schema, so the *.schemas.enable properties do nothing and can be removed. Similarly, the deserializer class configs are not applicable to Connect and are covered by the converter configs, so they should be removed as well.
Worth mentioning that the key format does not have to (and often doesn't) match the value's
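Putting that together, a cleaned-up version of your config would look like this (same connection details as yours, the Avro converters declared once, and the deserializer and *.schemas.enable entries removed):
{
  "name": "surveyWawancara-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "tasks.max": "1",
    "database.hostname": "10.7.76.62",
    "database.port": "1433",
    "database.user": "Acquisition.ro",
    "database.password": "Acquisition_ro231!",
    "database.dbname": "acquisition",
    "database.server.name": "beta-sl-bn",
    "table.whitelist": "dbo.SurveyWawancara",
    "time.precision.mode": "connect",
    "database.history.kafka.bootstrap.servers": "beta-kafka-brokers.amq-streams-beta.svc:9092",
    "database.history.kafka.topic": "schema-changes.sl.surveyWawancara",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081"
  }
}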
Below is my MongoDbConnector configuration:
{
"connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
"collection.include.list": "dbname.messages,dbname.comments",
"mongodb.password": "mongodbpassword",
"tasks.max": "1",
"database.history.kafka.topic": "dev.dbhistory.unwrap_with_key_id_8",
"mongodb.user": "mongodbuser",
"heartbeat.interval.ms": "90000",
"mongodb.name": "analytics",
"snapshot.delay.ms": "120000",
"key.converter.schemas.enable": "false",
"poll.interval.ms": "3000",
"value.converter.schemas.enable": "false",
"mongodb.authsource": "admin",
"errors.tolerance": "all",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"mongodb.hosts": "rs0/ip:27017",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.include.list": "dbname",
"snapshot.mode": "initial"
}
I need this to publish to only one topic, but it creates two topics: analytics.dbname.messages and analytics.dbname.comments. How can I do that?
My English is not good! Thanks!
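One way to do this is with Kafka Connect's RegexRouter SMT (the same transform used in one of the Postgres configs further down), which rewrites the topic name before records are written. A minimal sketch, assuming the merged topic should be called analytics.dbname.all (a made-up name), added to the connector config above:
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "analytics\\.dbname\\..*",
"transforms.route.replacement": "analytics.dbname.all"
Both analytics.dbname.messages and analytics.dbname.comments then match the regex and end up in the single replacement topic; telling the events from the two collections apart inside that topic is left to the consumer.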
The initial sync works as expected, but then the connector just stops and does not pick up further table changes. There are no errors thrown and the connector is still marked as active and running.
Database: Amazon Postgres v10.7
Debezium config:
"name": "postgres_cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "...",
"database.port": "5432",
"database.user": "...",
"database.password": "...",
"database.dbname": "...",
"database.server.name": "...",
"table.whitelist": "public.table1,public.table2,public.table3",
"plugin.name": "pgoutput",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "unwrap, route, extractId",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": false,
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "[^.]+\\.[^.]+\\.(.+)",
"transforms.route.replacement": "postgres_$1",
"transforms.extractId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractId.field": "id"
}
}
Any thoughts about what the problem could be?
Edit:
Log-Errors:
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to flush, timed out while waiting for producer to flush outstanding 75687 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
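Those two errors suggest the offset commit is timing out while the producer still has a large backlog (75687 records), which can happen during or right after a big initial snapshot. The relevant knobs are worker-level settings, so they go in the Connect worker properties rather than the connector JSON; a sketch with assumed values:
# Connect worker config (e.g. connect-distributed.properties), not the connector JSON
# allow more time for the offset commit to flush a large backlog (default is 5000 ms)
offset.flush.timeout.ms=60000
# give the worker's producer more buffer room and bigger batches for the snapshot burst (example values)
producer.buffer.memory=67108864
producer.batch.size=32768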
I'm using debezium/kafka (v1.0), which uses Apache Kafka 2.4.
In addition, I have deployed a Debezium MySQL connector that monitors some tables and is configured to take a snapshot at the beginning; up to this point everything is fine.
After a while I need to start monitoring other tables, so I create another connector, this time without a snapshot because it is not needed.
This causes the first connector to start taking its snapshot again.
Is this expected behavior?
What is the procedure for starting to monitor new tables without the other connectors taking their snapshots again?
Thanks in advance.
EDIT configs added:
First Connector
{
"name": "orion_connector_con_snapshot_prod_v1",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "my_host",
"database.port": "3306",
"database.user": "my_db",
"database.password": "*********************",
"database.server.name": "orion_kafka",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "history_orion",
"database.history.skip.unparseable.ddl": "true",
"database.history.store.only.monitored.tables.ddl": "true",
"table.whitelist": "my_db.my_table_1,my_db.my_table_2,my_db.my_table_3",
"snapshot.mode": "when_needed",
"snapshot.locking.mode": "none"
}
The second connector, which triggers the problem:
{
"name": "nexo_impactos_connector_sin_snapshot_v1",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "my_host",
"database.port": "3306",
"database.user": "my_db",
"database.password": "*********************",
"database.server.name": "nexo_kafka",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "nexo_impactos",
"database.history.skip.unparseable.ddl": "true",
"database.history.store.only.monitored.tables.ddl": "true",
"table.whitelist": "other_db.other_table",
"snapshot.mode": "schema_only",
"snapshot.locking.mode": "none"
}
I want to create multiple Debezium connectors, each with its own replication slot, but I am unable to create multiple replication slots for the Postgres Debezium connector.
I am using Docker containers for Postgres and Kafka. I tried setting max_replication_slots = 2 in postgresql.conf and gave each connector a different slot.name, but it still did not create two replication slots for me.
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702771",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium1",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702771",
"tasks": [],
"type": "source"
}
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702772",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium2",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702772",
"tasks": [],
"type": "source"
}
It creates multiple connectors but not multiple replication slots, even after giving them different slot names. Do I need to do anything else here?
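As a first check (a sketch, not specific to your setup), it is worth confirming from inside the Postgres container which slots actually exist and whether the max_replication_slots change was picked up - it only takes effect after a Postgres restart:
-- list the replication slots that have actually been created
SELECT slot_name, plugin, active FROM pg_replication_slots;
-- show the value the running server picked up from postgresql.conf
SHOW max_replication_slots;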
I'm using the Debezium SQL Server connector to track changes on a production database.
The topic is created and CDC is working like a charm, but when trying to use the JdbcSinkConnector to dump the data into another SQL Server DB, I'm encountering the following error:
com.microsoft.sqlserver.jdbc.SQLServerException: One or more values is out of range of values for the datetime2 SQL Server data type
On the source database the SQL data type is datetime2(7).
The value in the Kafka event is 1549461754650000000.
The schema type is INT64.
The schema name is io.debezium.time.NanoTimestamp.
I can't find a way to tell the TimestampConverter that the value isn't expressed in millis or micros, but in nanoseconds (it would not work with microseconds anyway).
Here is my connector configuration:
{
"name": "cdc.swip.bi.ods.sink.contract",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "swip.swip_core.contract",
"connection.url": "jdbc:sqlserver://someip:1234;database=DB",
"connection.user": "loloolololo",
"connection.password": "muahahahahaha",
"dialect.name": "SqlServerDatabaseDialect",
"auto.create": "false",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schemas.enable": "true",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://localhost:8081",
"transforms": "unwrap,created_date,modified_date",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.created_date.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.created_date.target.type": "Timestamp",
"transforms.created_date.field": "created_date",
"transforms.modified_date.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.modified_date.target.type": "Timestamp",
"transforms.modified_date.field": "modified_date",
"insert.mode": "insert",
"delete.enabled": "false",
"pk.fields": "id",
"pk.mode": "record_value",
"schema.registry.url": "http://localhost:8081",
"table.name.format": "ODS.swip.contract"
}
}
There is a missing feature in the SQL Server connector - DBZ-1419.
You can work around the problem by writing your own SMT that does the field conversion on the sink side, before the record is processed by the JDBC connector.
I forgot to post the answer.
The property "time.precision.mode": "connect" does the trick:
https://debezium.io/documentation/reference/connectors/sqlserver.html#sqlserver-property-time-precision-mode
{
"name":"debezium-connector-sqlserver",
"config": {
"connector.class":"io.debezium.connector.sqlserver.SqlServerConnector",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
"database.hostname":"someHost",
"database.port":"somePort",
"database.user":"someUser",
"database.password":"somePassword",
"database.dbname":"someDb",
"database.server.name":"xxx.xxx",
"database.history.kafka.topic":"xxx.xxx.history",
"time.precision.mode":"connect",
"database.history.kafka.bootstrap.servers":"example.com:9092"
}
}