MongoDbConnector publish multiple collections to only topic Kafka - mongodb

Below is my MongoDbConnector configuration:
{
"connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
"collection.include.list": "dbname.messages,dbname.comments",
"mongodb.password": "mongodbpassword",
"tasks.max": "1",
"database.history.kafka.topic": "dev.dbhistory.unwrap_with_key_id_8",
"mongodb.user": "mongodbuser",
"heartbeat.interval.ms": "90000",
"mongodb.name": "analytics",
"snapshot.delay.ms": "120000",
"key.converter.schemas.enable": "false",
"poll.interval.ms": "3000",
"value.converter.schemas.enable": "false",
"mongodb.authsource": "admin",
"errors.tolerance": "all",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"mongodb.hosts": "rs0/ip:27017",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.include.list": "dbname",
"snapshot.mode": "initial"
}
I need this publish to only topic, but it create two topic is analytics.dbname.messages and analytics.dbname.messages. How can i do?
My english is not good! Thanks!

Related

How to configure kafka connect use Avro Schema?

I have starting to learn Avro. i want to implement it in kafka connect. I use a configuration like the following. Is this the right configuration?
{
"name": "surveyWawancara-connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"key.deserializer": "org.apache.kafka.connect.json.JsonDeserializer",
"database.user": "Acquisition.ro",
"database.dbname": "acquisition",
"value.deserializer": "org.apache.kafka.connect.json.JsonDeserializer",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"database.history.kafka.bootstrap.servers": "beta-kafka-brokers.amq-streams-beta.svc:9092",
"database.history.kafka.topic": "schema-changes.sl.surveyWawancara",
"time.precision.mode": "connect",
"database.server.name": "beta-sl-bn",
"database.port": "1433",
"table.whitelist": "dbo.SurveyWawancara",
"key.converter.schemas.enable": "true",
"database.hostname": "10.7.76.62",
"database.password": "Acquisition_ro231!",
"value.converter.schemas.enable": "true",
"name": "surveyWawancara-connector",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter"
}
}
You've duplicated the converter fields, but these properties are correct, yes
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
Avro always has a schema, so these do nothing *.schemas.enable and can be removed. Similarly, the deserializer class configs are not applicable to Connect and are encompassed by the converter configs, so should be removed as well
Worth mentioning that the key format does not have to (and often doesn't) match the value's

What kind of data got routed to a dead letter queue topic?

I Have implemented Dead Letter Queues error handling in Kafka. It works and the data are sent to DLQ topics. I am not understanding what types of data got routed in DLQ topics.
1st picture is the data that got routed into DLQ Topics and the second one is the normal data that got sunk into databases.
Does anyone have any idea how does that key got changed as I have used id as a key?
Here is my source and sink properties:
"name": "jdbc_source_postgresql_analytics",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://192.168.5.40:5432/abc",
"connection.user": "abc",
"connection.password": "********",
"topic.prefix": "test_",
"mode": "timestamp+incrementing",
"incrementing.column.name": "id",
"timestamp.column.name": "updatedAt",
"validate.non.null": true,
"table.whitelist": "test",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": false,
"value.converter.schemas.enable": false,
"catalog.pattern": "public",
"transforms": "createKey,extractInt",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field": "id",
"errors.tolerance": "all"
}
}
sink properties:
{
"name": "es_sink_analytics",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"type.name": "_doc",
"key.converter.schemas.enable": "false",
"topics": "TEST",
"topic.index.map": "TEST:te_test",
"value.converter.schemas.enable": "false",
"connection.url": "http://192.168.10.40:9200",
"connection.username": "******",
"connection.password": "********",
"key.ignore": "false",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "dlq-error-es",
"errors.deadletterqueue.topic.replication.factor": "1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"schema.ignore": "true",
"error.tolerance":"all"
}
}

Debezium stops after initial sync

The initial sync works as expected but then the connector just stops and does not care about further table changes. There are no errors thrown and the connector is still marked as active and running.
Database: Amazon Postgres v10.7
Debezium config:
"name": "postgres_cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "...",
"database.port": "5432",
"database.user": "...",
"database.password": "...",
"database.dbname": "...",
"database.server.name": "...",
"table.whitelist": "public.table1,public.table2,public.table3",
"plugin.name": "pgoutput",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "unwrap, route, extractId",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": false,
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "[^.]+\\.[^.]+\\.(.+)",
"transforms.route.replacement": "postgres_$1",
"transforms.extractId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractId.field": "id"
}
}
Any thoughts about what the problem could be?
Edit:
Log-Errors:
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to flush, timed out while waiting for producer to flush outstanding 75687 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
ERROR WorkerSourceTask{id=postgres_cdc-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)

Adding new Debezium Connector to Apache Kafka restarts snapshot

I'm using debezium/Kafka(v1.0) that uses Apache Kafka 2.4.
In addition, I have deployed a debezium mysql connector that monitor some tables configured to take snapshot at beginning, at this point is all good.
After a time I have the need to start monitoring other tables, so I create another connector, this time without snapshot because is not needed.
This produces that the first connector start to take the snapshot again.
Is this an expected behavior?
How is the procedure to start monitoring new tables without the others connector take the snapshot again?
thanks in advance.
EDIT configs added:
First Connector
{
"name": "orion_connector_con_snapshot_prod_v1",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "my_host",
"database.port": "3306",
"database.user": "my_db",
"database.password": "*********************",
"database.server.name": "orion_kafka",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "history_orion",
"database.history.skip.unparseable.ddl": "true",
"database.history.store.only.monitored.tables.ddl": "true",
"table.whitelist": "my_db.my_table_1,my_db.my_table_2,my_db.my_table_3",
"snapshot.mode": "when_needed",
"snapshot.locking.mode": "none"
}
The second connector that start the problem
{
"name": "nexo_impactos_connector_sin_snapshot_v1",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.hostname": "my_host",
"database.port": "3306",
"database.user": "my_db",
"database.password": "*********************",
"database.server.name": "nexo_kafka",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "nexo_impactos",
"database.history.skip.unparseable.ddl": "true",
"database.history.store.only.monitored.tables.ddl": "true",
"table.whitelist": "other_db.other_table",
"snapshot.mode": "schema_only",
"snapshot.locking.mode": "none"
}

Multiple replication slot for debezium connector

I want to create multiple debezium connector with different replication slot. But I am Unable to create multiple replication slot for postgres debezium connector.
I am using docker container for Postgres & kafka. I tried setting up max_replication_slots = 2 in postgressql.conf file & also different slot.name. but still it did not create 2 replication slot for me.
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702771",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium1",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702771",
"tasks": [],
"type": "source"
}
{
"config": {
"batch.size": "49152",
"buffer.memory": "100663296",
"compression.type": "lz4",
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.dbname": "Db1",
"database.hostname": "DBhost",
"database.password": "dbpwd",
"database.port": "5432",
"database.server.name": "serve_name",
"database.user": "usename",
"decimal.handling.mode": "double",
"hstore.handling.mode": "json",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"name": "debezium-702772",
"plugin.name": "wal2json",
"schema.refresh.mode": "columns_diff_exclude_unchanged_toast",
"slot.drop_on_stop": "true",
"slot.name": "debezium2",
"table.whitelist": "tabel1",
"time.precision.mode": "adaptive_time_microseconds",
"transforms": "Reroute",
"transforms.Reroute.topic.regex": "(.*).public.(.*)",
"transforms.Reroute.topic.replacement": "$1.$2",
"transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
},
"name": "debezium-702772",
"tasks": [],
"type": "source"
}
It creates multiple connector but not multiple replication slot even after giving different slot name. Do I need to do anything over here.