Change the topic name in an HDFS2 sink connector integrated with Hive - apache-kafka

Good morning,
When I work with an HDFS2 sink connector integrated with Hive, the Hive table gets the name of the topic. Is there a way to choose the name of the table?
This is my connector's config:
{
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"hive.integration": "true",
"hive.database": "databaseEze",
"hive.metastore.uris": "thrift://server1.dc.es.arioto:9083",
"transforms.InsertField.timestamp.field": "carga",
"flush.size": "100000",
"tasks.max": "2",
"timezone": "Europe/Paris",
"transforms": "RenameField,InsertField,carga_format",
"rotate.interval.ms": "900000",
"locale": "en-GB",
"logs.dir": "/logs",
"format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
"transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms.RenameField.renames": "var1:Test1,var2:Test2,var3:test3",
"transforms.carga_format.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.carga_format.target.type": "string",
"transforms.carga_format.format": "yyyyMMdd",
"hadoop.conf.dir": "/etc/hadoop/",
"schema.compatibility": "BACKWARD",
"topics": "Skiel-Tracking-Replicator",
"hdfs.url": "hdfs://database/user/datavaseEze/",
"transforms.InsertField.topic.field": "ds_topic",
"partition.field.name": "carga",
"transforms.InsertField.partition.field": "test_partition",
"value.converter.schema.registry.url": "http://schema-registry-eze-dev.ocjc.serv.dc.es.arioto",
"partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitioner",
"name": "KAFKA-HDFS-HIVE-TEST",
"transforms.carga_format.field": "carga",
"transforms.InsertField.offset.field": "test_offset"
}
With that config the table is named **Skiel-Tracking-Replicator**, but I want the table to be named d9nvtest.

You can use the RegexRouter Single Message Transform (SMT) to rename the topic before the connector consumes it:
{
"transforms" : "renameTopic",
"transforms.renameTopic.type" : "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.renameTopic.regex" : "Skiel-Tracking-Replicator",
"transforms.renameTopic.replacement": "d9nvtest"
}
Note that your connector already defines "transforms": "RenameField,InsertField,carga_format", so renameTopic must be appended to that list rather than replacing it.
See https://rmoff.net/2020/12/11/twelve-days-of-smt-day-4-regexrouter/

Note that using RegexRouter with kafka-connect-hdfs runs into this issue: https://github.com/confluentinc/kafka-connect-hdfs/issues/236. The last comment there states that the two are conceptually incompatible.

Related

MongoDbConnector publishes multiple collections instead of only one Kafka topic

Below is my MongoDbConnector configuration:
{
"connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
"collection.include.list": "dbname.messages,dbname.comments",
"mongodb.password": "mongodbpassword",
"tasks.max": "1",
"database.history.kafka.topic": "dev.dbhistory.unwrap_with_key_id_8",
"mongodb.user": "mongodbuser",
"heartbeat.interval.ms": "90000",
"mongodb.name": "analytics",
"snapshot.delay.ms": "120000",
"key.converter.schemas.enable": "false",
"poll.interval.ms": "3000",
"value.converter.schemas.enable": "false",
"mongodb.authsource": "admin",
"errors.tolerance": "all",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"mongodb.hosts": "rs0/ip:27017",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.include.list": "dbname",
"snapshot.mode": "initial"
}
I need this to publish to only one topic, but it creates two topics: analytics.dbname.messages and analytics.dbname.comments. How can I do this?
My English is not good! Thanks!
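One option (a sketch, not tested against your setup) is to add a RegexRouter SMT to the Debezium connector config so both collection topics are rewritten to a single topic name. The target topic name analytics.dbname.all below is made up for illustration; note the doubled backslashes required for regex escapes inside JSON:

```json
{
  "transforms": "mergeTopics",
  "transforms.mergeTopics.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.mergeTopics.regex": "analytics\\.dbname\\.(messages|comments)",
  "transforms.mergeTopics.replacement": "analytics.dbname.all"
}
```

Downstream consumers would then need to distinguish records from the two collections themselves, e.g. by inspecting the record payload.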

Kafka MongoDB sink Connector exception handling

I have created a connector from Kafka to MongoDB to sink the data. In some cases I get bad data on my topic, and when that data is sunk to the DB it causes a duplicate-key error because of an index I created.
In this case I want the record moved to the DLQ, but it is not being moved.
This is my connector; can anyone please help me with it?
{
"name": "test_1",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "test",
"connection.uri": "xxx",
"database": "test",
"collection": "test_record",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://xxx:8081",
"document.id.strategy.overwrite.existing": "true",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.ProvidedInKeyStrategy",
"transforms": "hk",
"transforms.hk.type": "org.apache.kafka.connect.transforms.HoistField$Key",
"transforms.hk.field": "_id",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
"write.method": "upsert",
"errors.tolerance":"all",
"errors.deadletterqueue.topic.name":"dlq_sink",
"errors.deadletterqueue.context.headers.enable":true,
"errors.retry.delay.max.ms": 60000,
"errors.retry.timeout": 300000
}
}
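One thing worth checking (an assumption to verify against your versions, not something stated in the question): the Connect framework DLQ only captures failures in the converter/transform stages, while a duplicate-key error happens inside the connector's write to MongoDB. Write errors can only reach the DLQ when the Connect worker is 2.6+ (which added the errant-record reporter API) and the MongoDB connector is 1.2+, and the connector's own tolerance settings may also need to be enabled, roughly:

```json
{
  "errors.tolerance": "all",
  "errors.deadletterqueue.topic.name": "dlq_sink",
  "mongo.errors.tolerance": "all",
  "mongo.errors.log.enable": "true"
}
```

Please verify the mongo.errors.* property names against the MongoDB Kafka connector documentation for your connector version.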
Thanks,

What kind of data got routed to a dead letter queue topic?

I have implemented Dead Letter Queue error handling in Kafka Connect. It works, and the data is sent to the DLQ topics, but I don't understand what types of records get routed to the DLQ topics.
The first screenshot (not shown here) is the data that got routed into the DLQ topic, and the second one is the normal data that got sunk into the database.
Does anyone have any idea how the key got changed, as I have used id as the key?
Here are my source and sink properties:
{
"name": "jdbc_source_postgresql_analytics",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://192.168.5.40:5432/abc",
"connection.user": "abc",
"connection.password": "********",
"topic.prefix": "test_",
"mode": "timestamp+incrementing",
"incrementing.column.name": "id",
"timestamp.column.name": "updatedAt",
"validate.non.null": true,
"table.whitelist": "test",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": false,
"value.converter.schemas.enable": false,
"catalog.pattern": "public",
"transforms": "createKey,extractInt",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field": "id",
"errors.tolerance": "all"
}
}
sink properties:
{
"name": "es_sink_analytics",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"type.name": "_doc",
"key.converter.schemas.enable": "false",
"topics": "TEST",
"topic.index.map": "TEST:te_test",
"value.converter.schemas.enable": "false",
"connection.url": "http://192.168.10.40:9200",
"connection.username": "******",
"connection.password": "********",
"key.ignore": "false",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "dlq-error-es",
"errors.deadletterqueue.topic.replication.factor": "1",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"schema.ignore": "true"
}
}
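Records usually land in the DLQ because the configured converter cannot deserialize them (for example, a key that is not a serialized integer when IntegerConverter is configured). To see the failure reason for each DLQ record, the framework's error context headers and error logging can be enabled on the sink; these are standard Connect error-handling properties, sketched here to merge into the sink config above:

```json
{
  "errors.deadletterqueue.context.headers.enable": "true",
  "errors.log.enable": "true",
  "errors.log.include.messages": "true"
}
```

With the context headers enabled, each DLQ record carries headers describing the failing stage and exception, which should explain why those particular records were routed there.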

kafka connect sql server incremental changes from cdc

I am very new to Kafka (I started reading about it and setting it up in my sandbox environment just a week ago) and am trying to set up a SQL Server JDBC connector.
I have set up the Confluent Community platform per the Confluent guide and installed io.debezium.connector.sqlserver.SqlServerConnector using confluent-hub.
I enabled CDC on the SQL Server database and the required table, and it is working fine.
I have tried following connectors (one at-a-time):
io.debezium.connector.sqlserver.SqlServerConnector
io.confluent.connect.jdbc.JdbcSourceConnector
Both load fine, with the connector and task statuses showing as running with no errors.
Here is my io.confluent.connect.jdbc.JdbcSourceConnector configuration:
{
"name": "mssql-connector",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"mode": "timestamp",
"timestamp.column.name": "CreatedDateTime",
"query": "select * from dbo.sampletable",
"tasks.max": "1",
"table.types": "TABLE",
"key.converter.schemas.enable": "false",
"topic.prefix": "data_",
"value.converter.schemas.enable": "false",
"connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=sampledb",
"connection.user": "kafka",
"connection.password": "kafkaPassword#789",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"poll.interval.ms": "5000",
"table.poll.interval.ms": "120000"
}
}
Here is my io.debezium.connector.sqlserver.SqlServerConnector configuration:
{
"name": "mssql-connector",
"config": {
"connector.class" : "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max" : "1",
"database.server.name" : "SQL2016",
"database.hostname" : "SQL2016",
"database.port" : "1433",
"database.user" : "kafka",
"database.password" : "kafkaPassword#789",
"database.dbname" : "sampleDb",
"database.history.kafka.bootstrap.servers" : "kafkanode1:9092",
"database.history.kafka.topic": "schema-changes.sampleDb"
}
}
Both connectors create a snapshot of the table in a topic (i.e., they pull all the rows initially),
but when I make changes to the table "sampletable" (insert/update/delete), those changes are not pulled into Kafka.
Can someone please help me understand how to make CDC work with Kafka?
Thanks
This seems to have worked 100%. I am posting the answer just in case someone like me gets stuck on the JDBC source connector.
{
"name": "piilog-connector",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"mode": "incrementing",
"value.converter.schemas.enable": "false",
"connection.url": "jdbc:sqlserver://SQL2016:1433;databaseName=SIAudit",
"connection.user": "kafka",
"connection.password": "kafkaPassword#789",
"query": "select * from dbo.sampletable",
"incrementing.column.name": "Id",
"validate.non.null": false,
"topic.prefix": "data_",
"tasks.max": "1",
"poll.interval.ms": "5000",
"table.poll.interval.ms": "5000"
}
}
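For context (a sketch, not from the original answer): "mode": "incrementing" only picks up newly inserted rows with a larger Id, so updates to existing rows are missed, while plain "mode": "timestamp" only picks up rows whose timestamp column changes. The JDBC source connector can combine both, assuming the table has a column that is updated on every change (LastModifiedDateTime below is a hypothetical column name):

```json
{
  "mode": "timestamp+incrementing",
  "incrementing.column.name": "Id",
  "timestamp.column.name": "LastModifiedDateTime",
  "validate.non.null": "false"
}
```

If only true change data capture (including deletes) is needed, the Debezium SQL Server connector reading the CDC tables is the more complete option.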

Kafka Connect date handling of debezium generated events

I'm using the Debezium SQL Server connector to track changes on a production database.
The topic is created and CDC is working like a charm, but when trying to use the JdbcSinkConnector to dump the data into another SQL Server DB, I'm encountering the following error:
com.microsoft.sqlserver.jdbc.SQLServerException: One or more values is out of range of values for the datetime2 SQL Server data type
On the source database the SQL data type is datetime2(7).
The Kafka event value is 1549461754650000000.
The schema type is INT64.
The schema name is io.debezium.time.NanoTimestamp.
I can't find a way to tell the TimestampConverter that the value is expressed in nanoseconds rather than milliseconds or microseconds (microseconds would not work anyway).
Here is my connector configuration:
{
"name": "cdc.swip.bi.ods.sink.contract",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "swip.swip_core.contract",
"connection.url": "jdbc:sqlserver://someip:1234;database=DB",
"connection.user": "loloolololo",
"connection.password": "muahahahahaha",
"dialect.name": "SqlServerDatabaseDialect",
"auto.create": "false",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schemas.enable": "true",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://localhost:8081",
"transforms": "unwrap,created_date,modified_date",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.created_date.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.created_date.target.type": "Timestamp",
"transforms.created_date.field": "created_date",
"transforms.modified_date.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.modified_date.target.type": "Timestamp",
"transforms.modified_date.field": "modified_date",
"insert.mode": "insert",
"delete.enabled": "false",
"pk.fields": "id",
"pk.mode": "record_value",
"schema.registry.url": "http://localhost:8081",
"table.name.format": "ODS.swip.contract"
}
}
There is a missing feature in the SQL Server connector - DBZ-1419.
You can work around the problem by writing your own SMT that does the field conversion on the sink side before the record is processed by the JDBC connector.
I forgot to post the answer.
The property "time.precision.mode": "connect" does the trick:
https://debezium.io/documentation/reference/connectors/sqlserver.html#sqlserver-property-time-precision-mode
{
"name":"debezium-connector-sqlserver",
"config": {
"connector.class":"io.debezium.connector.sqlserver.SqlServerConnector",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
"database.hostname":"someHost",
"database.port":"somePort",
"database.user":"someUser",
"database.password":"somePassword",
"database.dbname":"someDb",
"database.server.name":"xxx.xxx",
"database.history.kafka.topic":"xxx.xxx.history",
"time.precision.mode":"connect",
"database.history.kafka.bootstrap.servers":"example.com:9092"
}
}