Unable to trigger Custom Producer Interceptor via JDBC Source Connector - apache-kafka

I have created a custom Producer Interceptor (AuditProducerInterceptor) which accepts some custom configs (application_id, type, etc.). I have generated a jar from the AuditProducerInterceptor project and placed the jar inside Kafka Connect at /usr/share/java/monitoring-interceptors. When I try to post the JDBC source connector with the configuration below, my audit interceptor is not triggered.
{
  "name": "jdbc-source-xx-xxxx-xxx-xxx",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://{{ip}}:1433;databaseName=XX;useNTLMv2=true",
    "connection.user": "SA",
    "connection.password": "Admin1234",
    "producer.interceptor.classes": "com.optum.payer.common.kafka.audit.interceptor.AuditProducerInterceptor",
    "topic.prefix": "MyTestTopic",
    "query": "SELECT ID, chart_id, request_id, UpdatedDate FROM xxx.xxx WITH (NOLOCK)",
    "mode": "timestamp",
    "timestamp.column.name": "UpdatedDate",
    "producer.audit.application.id": "HelloApplication",
    "producer.audit.type": "test type",
    "poll.interval.ms": "10",
    "tasks.max": "1",
    "batch.max.rows": "100",
    "validate.non.null": "false",
    "numeric.mapping": "best_fit",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://{{ip}}:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://{{ip}}:8081"
  }
}
As you can see in the configuration, I have added the props below to the connector config to trigger the custom interceptor, but I don't see any logs in Kafka Connect related to AuditProducerInterceptor.
"producer.interceptor.classes": "com.optum.payer.common.kafka.audit.interceptor.AuditProducerInterceptor"
"producer.audit.application.id": "HelloApplication",
"producer.audit.type": "test type"
I tried adding these three configs to the Kafka Connect worker config and the interceptor is triggered. But I want to trigger the interceptor via the JDBC source connector so that I can pass the custom props (application_id, type, etc.) per connector.
Please help me solve this issue.

If you have allowed client overrides in the Connect worker (enabled by default), you'll want to use the producer.override. prefix.
From the docs:
Starting with 2.3.0, client configuration overrides can be configured individually per connector by using the prefixes producer.override. and consumer.override. for Kafka sources or Kafka sinks respectively.
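Applied to the connector above, the interceptor-related keys would then look roughly like this. This is only a sketch: it assumes the worker's connector.client.config.override.policy actually permits these overrides, and that AuditProducerInterceptor reads audit.application.id and audit.type from the producer configs handed to its configure() method.
"producer.override.interceptor.classes": "com.optum.payer.common.kafka.audit.interceptor.AuditProducerInterceptor",
"producer.override.audit.application.id": "HelloApplication",
"producer.override.audit.type": "test type"
If the interceptor expects differently named keys, adjust the part after the producer.override. prefix accordingly.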

Related

Is there any way to use MongoSourceConnector for multiple databases with a single Kafka topic?

I am using MongoSourceConnector to connect a Kafka topic with a MongoDB collection. For a single database with a single Kafka topic it works fine, but is there any way to connect multiple Mongo databases to a single Kafka topic?
If you are running Kafka Connect in distributed mode, you can create another connector config file with the above-mentioned config.
I am not really sure about multiple databases going to a single Kafka topic, but you can certainly listen to multiple databases' change streams and push data to topics. Since topic creation depends on database_name.collection_name, you will end up with more topics.
You can provide a regex in the pipeline to listen to multiple databases:
"pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-names_.*/}},{\"ns.coll\":{\"$regex\":/^collection_name$/}}]}}]"
Here is the complete Kafka connector configuration.
Mongo to Kafka source connector
{
  "name": "mongo-to-kafka-connect",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "publish.full.document.only": "true",
    "tasks.max": "3",
    "key.converter.schemas.enable": "false",
    "topic.creation.enable": "true",
    "poll.await.time.ms": 1000,
    "poll.max.batch.size": 100,
    "topic.prefix": "any prefix for topic name",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
    "connection.uri": "mongodb://<username>:<password>@ip:27017,ip:27017,ip:27017,ip:27017/?authSource=admin&replicaSet=xyz&tls=true",
    "value.converter.schemas.enable": "false",
    "copy.existing": "true",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 3,
    "topic.creation.compacted.cleanup.policy": "compact",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "mongo.errors.log.enable": "true",
    "heartbeat.interval.ms": 10000,
    "pipeline": "[{\"$match\":{\"$and\":[{\"ns.db\":{\"$regex\":/^database-names_.*/}},{\"ns.coll\":{\"$regex\":/^collection_name$/}}]}}]"
  }
}
You can get more details from the official docs:
https://www.mongodb.com/docs/kafka-connector/current/source-connector/
https://docs.confluent.io/platform/current/connect/index.html

Kafka Connect: streaming changes from Postgres to topics using debezium

I'm pretty new to the Kafka and Kafka Connect world. I am trying to implement CDC using Kafka (on MSK), Kafka Connect (using the Debezium connector for PostgreSQL) and an RDS Postgres instance. Kafka Connect runs in a K8s pod in our cluster deployed in AWS.
Before diving into the details of the configuration used, I'll try to summarise the problem:
Once the connector starts, it sends messages to the topic as expected (the snapshot).
Once we make any change to a table (create, update, delete), no messages are sent to the topic. We would expect to see messages about the changes made to the table.
My connector config looks like:
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.user": "root",
  "database.dbname": "insights",
  "slot.name": "cdc_organization",
  "tasks.max": "1",
  "column.blacklist": "password, access_key, reset_token",
  "database.server.name": "insights",
  "database.port": "5432",
  "plugin.name": "wal2json_rds_streaming",
  "schema.whitelist": "public",
  "table.whitelist": "public.kafka_connect_cdc_test",
  "key.converter.schemas.enable": "false",
  "database.hostname": "de-test-sre-12373.cbplqnioxomr.eu-west-1.rds.amazonaws.com",
  "database.password": "MYSECRETPWD",
  "value.converter.schemas.enable": "false",
  "name": "source-postgres",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "snapshot.mode": "initial"
}
We have tried different values for the plugin.name property: wal2json, wal2json_streaming and wal2json_rds_streaming.
There's no connection problem between the connector and the DB, since we already saw messages flowing through as soon as the connector started.
Is there a configuration issue in the connector described above that prevents us from seeing messages for new changes in the topic?
Thanks
Your connector config looks a bit confusing. I'm pretty new to Kafka as well, so I don't really know the issue, but this is the connector config that works for me.
{
  "name": "<connector_name>",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.server.name": "<server>",
    "database.port": "5432",
    "database.hostname": "<host>",
    "database.user": "<user>",
    "database.dbname": "<dbname>",
    "tasks.max": "1",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "<kafka_topic_name>",
    "plugin.name": "pgoutput",
    "include.schema.changes": "true"
  }
}
If this configuration doesn't work either, look further up in the console log; sometimes the relevant error isn't the last thing written to the console.

How to pass data from MongoDB to a Kafka topic when a condition is met, using a source connector with a pipeline property?

I'm working on a source connector to watch for changes in a Mongo collection and push them to a Kafka topic. This works nicely until I add the requirement to only publish to the Kafka topic if a specific condition is met (name=Kathe). That is, I need to put data in the topic only if an update changes the name to Kathe.
My connector's config looks like:
{
  "connection.uri": "xxxxxx",
  "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter.schemas.enable": "false",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false",
  "topic.prefix": "qu",
  "database": "sample_analytics",
  "collection": "customers",
  "copy.existing": "true",
  "pipeline": "[{\"$match\":{\"name\":\"Kathe\"}}]",
  "publish.full.document.only": "true",
  "flush.timeout.ms": "15000"
}
I have also tried with
"pipeline":"[{\"$match\":{\"name\":{ \"$eq\":\"Kathe\"}}}]"
But it is not producing messages when the condition is met.
Am I making a mistake?
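A hedged guess at the cause: the pipeline is applied to the change-stream event, not to the collection document itself, so there is no top-level name field on update events; the document's fields sit under fullDocument. Under that assumption the match stage would need to look more like:
"pipeline": "[{\"$match\":{\"fullDocument.name\":\"Kathe\"}}]"
Depending on the connector version, change.stream.full.document may also need to be set to updateLookup so that fullDocument is populated on updates; treat both the field path and that property as assumptions to verify against the MongoDB connector docs.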

How to use the Kafka Connect JDBC to source PostgreSQL with multiple schemas that contain tables with the same name?

I need to source data from a PostgreSQL database with ~2000 schemas. All schemas contain the same tables (it is a multi-tenant application).
The connector is configured as follows:
{
  "name": "postgres-source",
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "timestamp.column.name": "updated",
  "incrementing.column.name": "id",
  "connection.password": "********",
  "tasks.max": "1",
  "mode": "timestamp+incrementing",
  "topic.prefix": "postgres-source-",
  "connection.user": "*********",
  "poll.interval.ms": "3600000",
  "numeric.mapping": "best_fit",
  "connection.url": "jdbc:postgresql://*******:5432/*******",
  "table.whitelist": "table1",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter.schemas.enable": "false"
}
With this configuration, I get this error:
"The connector uses the unqualified table name as the topic name and has detected duplicate unqualified table names. This could lead to mixed data types in the topic and downstream processing errors. To prevent such processing errors, the JDBC Source connector fails to start when it detects duplicate table name configurations"
Apparently the connector does not want to publish data from multiple tables with the same name to a single topic.
This does not matter to me; the data could go to a single topic or to multiple topics (one for each schema).
As additional info, if I add:
"schema.pattern": "schema1"
to the config, the connector works and the data from the specified schema and table is copied.
Is there a way to copy multiple schemas that contain tables with the same name?
Thank you
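Building on the schema.pattern observation above, one hedged workaround (a sketch, not a confirmed fix) is to run one connector per schema, each scoped with its own schema.pattern and given a distinct topic.prefix so the unqualified table names no longer collide:
{
  "name": "postgres-source-schema1",
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:postgresql://*******:5432/*******",
  "connection.user": "*********",
  "connection.password": "********",
  "schema.pattern": "schema1",
  "table.whitelist": "table1",
  "topic.prefix": "postgres-source-schema1-",
  "mode": "timestamp+incrementing",
  "timestamp.column.name": "updated",
  "incrementing.column.name": "id",
  "tasks.max": "1",
  "poll.interval.ms": "3600000",
  "numeric.mapping": "best_fit"
}
With ~2000 schemas this means templating ~2000 connector configs, which may or may not be acceptable; it only illustrates that the duplicate-name check is avoided once each connector sees a single schema.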

Debezium-contains no connector type

I am trying to use Debezium to connect to a MySQL database on my local machine.
I am starting Kafka Connect in standalone mode with the following command:
sudo kafka/bin/connect-standalone.sh kafka/config/connect-standalone.properties kafka/config/connector.properties
Here is the config in connector.properties:
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "127.0.0.1",
    "tasks.max": "1",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "Password#123",
    "database.server.id": "1",
    "database.server.name": "fullfillment",
    "database.whitelist": "inventory",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "dbhistory.fullfillment",
    "include.schema.changes": "true",
    "type": "null"
  }
}
Getting the following error while running the mentioned command:
[2018-12-07 10:58:17,102] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:113)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector config {"config"={, "type"="null", "database.user"="debezium",, "database.port"="3306",, "include.schema.changes"="true",, "database.server.name"="fullfillment",, "connector.class"="io.debezium.connector.mysql.MySqlConnector",, "tasks.max"="1",, "database.history.kafka.topic"="dbhistory.fullfillment",, "database.server.id"="1",, "database.whitelist"="inventory",, "name"="inventory-connector",, "database.hostname"="127.0.0.1",, {=, "database.password"="Password#123",, }=, "database.history.kafka.bootstrap.servers"="localhost:9092",} contains no connector type
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:79)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:66)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:110)
Caused by: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector config {"config"={, "type"="null", "database.user"="debezium",, "database.port"="3306",, "include.schema.changes"="true",, "database.server.name"="fullfillment",, "connector.class"="io.debezium.connector.mysql.MySqlConnector",, "tasks.max"="1",, "database.history.kafka.topic"="dbhistory.fullfillment",, "database.server.id"="1",, "database.whitelist"="inventory",, "name"="inventory-connector",, "database.hostname"="127.0.0.1",, {=, "database.password"="Password#123",, }=, "database.history.kafka.bootstrap.servers"="localhost:9092",} contains no connector type
at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:259)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:189)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:107)
Any help will be highly appreciated.
connector.properties for standalone mode requires the Java property file format, so take the config section and rewrite it like:
connector.class=io.debezium.connector.mysql.MySqlConnector
.
.
.
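As a concrete version of that rewrite, the config section from the question in key=value form would be roughly as follows (the "type": "null" entry is dropped here, since it does not appear to be a connector property):
name=inventory-connector
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=127.0.0.1
database.port=3306
database.user=debezium
database.password=Password#123
database.server.id=1
database.server.name=fullfillment
database.whitelist=inventory
database.history.kafka.bootstrap.servers=localhost:9092
database.history.kafka.topic=dbhistory.fullfillment
include.schema.changes=true
tasks.max=1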
You have a JSON file, not a property file.
This is meant to be used with connect-distributed mode and POSTed via HTTP to the Kafka Connect REST API, not passed as a CLI argument.
For connect-standalone, you provide both the Connect worker properties and the connector properties files at the same time, as Java .properties files.
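For example, in distributed mode the JSON file from the question could be submitted to the worker's REST API (assuming the default REST port 8083 on localhost) like this:
curl -X POST -H "Content-Type: application/json" --data @kafka/config/connector.properties http://localhost:8083/connectors
The file name here is kept as in the question; only its contents (the JSON with the name/config wrapper shown above) matter to the REST API.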
connector.properties for connect-standalone must be in the Java properties (key=value) format, not JSON.