Kafka MongoDB sink Connector exception handling - apache-kafka

I have created a sink connector from Kafka to MongoDB. In some cases I get bad data on my topic, and when that data is sunk to the database it causes a duplicate key error because of a unique index I created.
In that case I want the record to be moved to the DLQ, but the connector is not moving it.
This is my connector; can anyone please help me with this?
{
"name": "test_1",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "test",
"connection.uri": "xxx",
"database": "test",
"collection": "test_record",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://xxx:8081",
"document.id.strategy.overwrite.existing": "true",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.ProvidedInKeyStrategy",
"transforms": "hk",
"transforms.hk.type": "org.apache.kafka.connect.transforms.HoistField$Key",
"transforms.hk.field": "_id",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
"write.method": "upsert",
"errors.tolerance":"all",
"errors.deadletterqueue.topic.name":"dlq_sink",
"errors.deadletterqueue.context.headers.enable":true,
"errors.retry.delay.max.ms": 60000,
"errors.retry.timeout": 300000
}
}
Thanks,
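For what it's worth, here is a minimal sketch of the error-handling portion of such a config, assuming the official MongoDB Kafka sink connector version 1.4 or later. The framework-level errors.* settings only cover failures in the converter and transform stages; a duplicate key error happens during the write itself, so it only reaches the DLQ if the connector is told to tolerate write errors via its own mongo.-prefixed properties (treat the exact property names as an assumption to verify against your connector version):
{
...
"errors.tolerance": "all",
"mongo.errors.tolerance": "all",
"mongo.errors.log.enable": "true",
"errors.deadletterqueue.topic.name": "dlq_sink",
"errors.deadletterqueue.context.headers.enable": "true"
...
}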

Related

Only Map objects supported in absence of schema for record conversion to BigQuery format

I'm streaming data from Postgres through Kafka to BigQuery. Most tables in Postgres have a primary key, so most tables/topics have an Avro key and value schema; these all go to BigQuery fine.
I do have a couple of tables that do not have a PK and consequently have no Avro key schema.
When I create a sink connector for those tables, the connector errors with:
Caused by: com.wepay.kafka.connect.bigquery.exception.ConversionConnectException: Only Map objects supported in absence of schema for record conversion to BigQuery format.
If I remove the 'key.converter' config then I get a 'Top-level Kafka Connect schema must be of type 'struct'' error.
How do I handle this?
Here's the connector config for reference:
{
"project": "staging",
"defaultDataset": "data_lake",
"keyfile": "<redacted>",
"keySource": "JSON",
"sanitizeTopics": "true",
"kafkaKeyFieldName": "_kid",
"autoCreateTables": "true",
"allowNewBigQueryFields": "true",
"upsertEnabled": "false",
"bigQueryRetry": "5",
"bigQueryRetryWait": "120000",
"bigQueryPartitionDecorator": "false",
"name": "hd-sink-bq",
"connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "<redacted>",
"key.converter.basic.auth.credentials.source": "USER_INFO",
"key.converter.schema.registry.basic.auth.user.info": "<redacted>",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "<redacted>",
"value.converter.basic.auth.credentials.source": "USER_INFO",
"value.converter.schema.registry.basic.auth.user.info": "<redacted>",
"topics": "public.event_issues",
"errors.tolerance": "all",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "connect.bq-sink.deadletter",
"errors.deadletterqueue.topic.replication.factor": "1",
"errors.deadletterqueue.context.headers.enable": "true",
"transforms": "tombstoneHandler",
"offset.flush.timeout.ms": "300000",
"transforms.dropNullRecords.predicate": "isNullRecord",
"transforms.dropNullRecords.type": "org.apache.kafka.connect.transforms.Filter",
"transforms.tombstoneHandler.behavior": "drop_warn",
"transforms.tombstoneHandler.type": "io.aiven.kafka.connect.transforms.TombstoneHandler"
}
In my case, I used to handle this by using a predicate, as follows:
{
...
"predicates.isTombstone.type":
"org.apache.kafka.connect.transforms.predicates.RecordIsTombstone",
"predicates": "isTombstone",
"transforms.x.predicate":"isTombstone",
"transforms.x.negate":true
...
}
This is as per the docs here; with transforms.x.negate set, transform x will skip such tombstone records.
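Applied to the connector above, a minimal sketch of the complete predicate wiring might look like the following, assuming Kafka Connect 2.6+ (which introduced predicates and the Filter SMT). Note that the original config declares transforms.dropNullRecords.* but never lists dropNullRecords under transforms and never defines the isNullRecord predicate:
{
...
"transforms": "dropNullRecords,tombstoneHandler",
"transforms.dropNullRecords.type": "org.apache.kafka.connect.transforms.Filter",
"transforms.dropNullRecords.predicate": "isNullRecord",
"predicates": "isNullRecord",
"predicates.isNullRecord.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone",
...
}
With this wiring the Filter transform only applies to records matching the RecordIsTombstone predicate, so tombstones are dropped before they reach the converter and the BigQuery writer.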

What kind of data got routed to a dead letter queue topic?

I have implemented dead letter queue error handling in Kafka Connect. It works and the data is sent to the DLQ topics, but I do not understand what types of data get routed to the DLQ topics.
The first picture is the data that got routed into the DLQ topic, and the second one is the normal data that got sunk into the database.
Does anyone have any idea how that key got changed, given that I used id as the key?
Here are my source and sink properties:
"name": "jdbc_source_postgresql_analytics",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://192.168.5.40:5432/abc",
"connection.user": "abc",
"connection.password": "********",
"topic.prefix": "test_",
"mode": "timestamp+incrementing",
"incrementing.column.name": "id",
"timestamp.column.name": "updatedAt",
"validate.non.null": true,
"table.whitelist": "test",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": false,
"value.converter.schemas.enable": false,
"catalog.pattern": "public",
"transforms": "createKey,extractInt",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field": "id",
"errors.tolerance": "all"
}
}
sink properties:
{
"name": "es_sink_analytics",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"type.name": "_doc",
"key.converter.schemas.enable": "false",
"topics": "TEST",
"topic.index.map": "TEST:te_test",
"value.converter.schemas.enable": "false",
"connection.url": "http://192.168.10.40:9200",
"connection.username": "******",
"connection.password": "********",
"key.ignore": "false",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "dlq-error-es",
"errors.deadletterqueue.topic.replication.factor": "1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"schema.ignore": "true",
"error.tolerance":"all"
}
}
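A hedged way to see why a given record was routed (and what error it carried) is to enable the DLQ context headers and error logging on the sink, a sketch assuming Kafka Connect 2.0+ (KIP-298):
{
...
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "dlq-error-es",
"errors.deadletterqueue.context.headers.enable": "true",
"errors.log.enable": "true",
"errors.log.include.messages": "true"
...
}
Each DLQ record then carries __connect.errors.* headers (original topic/partition/offset, the failing stage such as VALUE_CONVERTER, and the exception class and message), which usually makes it clear whether a record failed during conversion, in a transform, or in the sink write.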

Delete document with MongoSinkConnector

I'm able to insert/update documents in MongoDB, but I'm struggling to delete documents.
This is how the data is recorded in the Kafka topic by a Debezium connector from a SQL Server source (the last row is what a DELETE operation looks like):
{"user_code":1001} {"user_code":1001,"first_name":"Sally","last_name":"Thomas","email":"sally.thomas#acme.com"}
{"user_code":1002} {"user_code":1002,"first_name":"George","last_name":"Bailey","email":"gbailey#foobar.com"}
{"user_code":1003} {"user_code":1003,"first_name":"Edward","last_name":"Walker","email":"ed#walker.com"}
{"user_code":1003} null
In this example, even after the NULL value, the document with user_code 1003 is still in MongoDB.
Below is how I'm configuring my MongoSinkConnector (I've already tried both mongodb.delete.on.null.values and delete.on.null.values, but neither of them worked):
{
"name": "inventory-connector-sink-2",
"config": {
"connector.class" : "com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max" : "1",
"topics": "server1.dbo.customers",
"connection.uri": "mongodb://root:root#mongo:27017/",
"database": "testDB",
"collection": "customers_2",
"database.history.kafka.bootstrap.servers" : "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": false,
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false,
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.FullKeyStrategy",
"writemodel.strategy":"com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy",
"mongodb.delete.on.null.values": true
}
}
I've also tried using PartialValueStrategy but no luck.
PS: I'm working with confluentinc/cp-kafka-connect docker image for my sink connector.
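For reference, a minimal sketch of the delete-related settings, assuming the official MongoDB Kafka connector, where the documented property is delete.on.null.values without the mongodb. prefix (that prefix belongs to the older community connector), and an id strategy driven purely by the record key so the tombstone's key can identify the document to delete. Treat this as something to verify against your connector version rather than a confirmed fix for the case above:
{
...
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.FullKeyStrategy",
"delete.on.null.values": "true"
...
}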

kafka connect sql server incremental changes from cdc

I am very new to Kafka (I started reading about it and setting it up in my sandbox environment only a week ago) and I am trying to set up a SQL Server JDBC connector.
I have set up Confluent Community as per the Confluent guide and installed io.debezium.connector.sqlserver.SqlServerConnector using confluent-hub.
I enabled CDC on the SQL Server database and the required table, and it is working fine.
I have tried the following connectors (one at a time):
io.debezium.connector.sqlserver.SqlServerConnector
io.confluent.connect.jdbc.JdbcSourceConnector
Both load fine, with the connector and task status showing as running with no errors, as seen below:
Here is my io.confluent.connect.jdbc.JdbcSourceConnector configuration
{
"name": "mssql-connector",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"mode": "timestamp",
"timestamp.column.name": "CreatedDateTime",
"query": "select * from dbo.sampletable",
"tasks.max": "1",
"table.types": "TABLE",
"key.converter.schemas.enable": "false",
"topic.prefix": "data_",
"value.converter.schemas.enable": "false",
"connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=sampledb",
"connection.user": "kafka",
"connection.password": "kafkaPassword#789",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"poll.interval.ms": "5000",
"table.poll.interval.ms": "120000"
}
}
Here is my io.debezium.connector.sqlserver.SqlServerConnector configuration
{
"name": "mssql-connector",
"config": {
"connector.class" : "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max" : "1",
"database.server.name" : "SQL2016",
"database.hostname" : "SQL2016",
"database.port" : "1433",
"database.user" : "kafka",
"database.password" : "kafkaPassword#789",
"database.dbname" : "sampleDb",
"database.history.kafka.bootstrap.servers" : "kafkanode1:9092",
"database.history.kafka.topic": "schema-changes.sampleDb"
}
}
Both connectors create a snapshot of the table in a topic (meaning they pull all the rows initially), but when I make changes to the table "sampletable" (insert/update/delete), those changes are not being pulled into Kafka.
Can someone please help me understand how to get CDC working with Kafka?
Thanks
This seems to have worked 100%. I am posting the answer just in case someone gets stuck on the JDBC source connector like I did.
{
"name": "piilog-connector",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"mode": "incrementing",
"value.converter.schemas.enable": "false",
"connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=SIAudit",
"connection.user": "kafka",
"connection.password": "kafkaPassword#789",
"query": "select * from dbo.sampletable",
"incrementing.column.name": "Id",
"validate.non.null": false,
"topic.prefix": "data_",
"tasks.max": "1",
"poll.interval.ms": "5000",
"table.poll.interval.ms": "5000"
}
}
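One hedged follow-up: incrementing mode only detects new rows with a larger Id, so updates to existing rows will not be pulled. If the table also has a reliable last-modified column, timestamp+incrementing mode can capture both inserts and updates; the UpdatedDateTime column name below is a hypothetical placeholder:
{
...
"mode": "timestamp+incrementing",
"incrementing.column.name": "Id",
"timestamp.column.name": "UpdatedDateTime",
...
}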

Kafka Connect date handling of debezium generated events

I'm using the Debezium SQL Server connector to track changes on a production database.
The topic is created and CDC is working like a charm, but when I try to use the JDBC sink connector to dump the data into another SQL Server DB, I'm encountering the following error:
com.microsoft.sqlserver.jdbc.SQLServerException: One or more values is out of range of values for the datetime2 SQL Server data type
On the source database the SQL data type is datetime2(7).
The Kafka event value is 1549461754650000000.
The schema type is INT64.
The schema name is io.debezium.time.NanoTimestamp.
I can't find a way to tell the TimestampConverter that the value isn't expressed in millis or micros, but in nanoseconds (it would not work with microseconds anyway).
Here is my connector configuration:
{
"name": "cdc.swip.bi.ods.sink.contract",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "swip.swip_core.contract",
"connection.url": "jdbc:sqlserver://someip:1234;database=DB",
"connection.user": "loloolololo",
"connection.password": "muahahahahaha",
"dialect.name": "SqlServerDatabaseDialect",
"auto.create": "false",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schemas.enable": "true",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://localhost:8081",
"transforms": "unwrap,created_date,modified_date",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.created_date.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.created_date.target.type": "Timestamp",
"transforms.created_date.field": "created_date",
"transforms.modified_date.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.modified_date.target.type": "Timestamp",
"transforms.modified_date.field": "modified_date",
"insert.mode": "insert",
"delete.enabled": "false",
"pk.fields": "id",
"pk.mode": "record_value",
"schema.registry.url": "http://localhost:8081",
"table.name.format": "ODS.swip.contract"
}
}
There is a missing feature in the SQL Server connector - DBZ-1419.
You can work around the problem by writing your own SMT that does the field conversion on the sink side before the record is processed by the JDBC connector.
I forgot to post the answer.
The property "time.precision.mode": "connect" does the trick:
https://debezium.io/documentation/reference/connectors/sqlserver.html#sqlserver-property-time-precision-mode
{
"name":"debezium-connector-sqlserver",
"config": {
"connector.class":"io.debezium.connector.sqlserver.SqlServerConnector",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
"database.hostname":"someHost",
"database.port":"somePort",
"database.user":"someUser",
"database.password":"somePassword",
"database.dbname":"someDb",
"database.server.name":"xxx.xxx",
"database.history.kafka.topic":"xxx.xxx.history",
"time.precision.mode":"connect",
"database.history.kafka.bootstrap.servers":"example.com:9092"
}
}
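One small note on the sink config in the question: in newer Debezium versions the io.debezium.transforms.UnwrapFromEnvelope SMT is deprecated in favor of io.debezium.transforms.ExtractNewRecordState, so a hedged version of that transform entry would be:
{
...
"transforms": "unwrap,created_date,modified_date",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
...
}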