Kafka Connect ignoring the Subject Strategies specified - apache-kafka

I want to publish data from multiple tables onto the same Kafka topic using the connector config below, but I am seeing the following exception.
Exception
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema being registered is incompatible with an earlier schema; error code: 409
The connector seems to ignore the subject strategy properties I set and keeps using the old ${topic}-key and ${topic}-value subjects.
[2019-04-25 22:43:45,590] INFO AvroConverterConfig values:
schema.registry.url = [http://schema-registry:8081]
basic.auth.user.info = [hidden]
auto.register.schemas = true
max.schemas.per.subject = 1000
basic.auth.credentials.source = URL
schema.registry.basic.auth.user.info = [hidden]
value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
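For reference, the subjects actually registered can be listed from the Schema Registry REST API (the URL is taken from the converter config above):
curl http://schema-registry:8081/subjects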
Connector configuration
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "two-in-one-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "xxxxxxx",
"database.port": "3306",
"database.user": "xxxxxxx",
"database.password": "xxxxxxxxx",
"database.server.id": "18405457",
"database.server.name": "xxxxxxxxxx",
"table.whitelist": "customers,phone_book",
"database.history.kafka.bootstrap.servers": "broker:9092",
"database.history.kafka.topic": "dbhistory.customer",
"transforms": "dropPrefix",
"transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex":"(.*)",
"transforms.dropPrefix.replacement":"customer",
"key.converter.key.subject.name.strategy": "io.confluent.kafka.serializers.subject.TopicRecordNameStrategy",
"value.converter.value.subject.name.strategy": "io.confluent.kafka.serializers.subject.TopicRecordNameStrategy"
}
}'

Try setting the strategy classes via the parameters below in your connector configuration (JSON) file, instead of "key.converter.key.subject.name.strategy" and "value.converter.value.subject.name.strategy":
"key.subject.name.strategy"
"value.subject.name.strategy"

Related

Debezium heartbeat table not updating

There is already a question, Debezium Heartbeat Action not firing, but it did not resolve my issue.
Here is my source connector config for Postgres. It is generating heartbeat events every 5 seconds; I have confirmed that by checking the Kafka topic. The issue is that it is not updating the row in the database heartbeat table. Any suggestions?
{
"name": "postgres-localdb-source-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "postgres",
"database.port": "5432",
"slot.name":"debezium",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname" : "postgres",
"database.server.name": "dbserver2",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.dbserver2",
"schema.include": "inventory",
"tables.include": "customers,heartbeat",
"publication.autocreate.mode" : "filtered",
"max.batch.size":"20480",
"max.queue.size":"81920",
"poll.interval.ms":"100",
"heartbeat.interval.ms": "5000",
"heartbeat.action.query" :"INSERT INTO heartbeat (id, ts) VALUES (1, NOW()) ON CONFLICT(id) DO UPDATE SET ts=EXCLUDED.ts;"
} }
Please share the DDL of your heartbeat table. Does your heartbeat table have a primary key? Debezium only tracks updates and deletes if the table has a PK defined. Also share your Debezium version, because these properties change from version to version.
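For example, a heartbeat table matching the heartbeat.action.query above and carrying a primary key could look like this (the inventory schema and the column types are assumptions):
CREATE TABLE inventory.heartbeat (
  id INTEGER PRIMARY KEY,
  ts TIMESTAMPTZ NOT NULL
);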
Try an UPDATE without a WHERE clause to test whether the problem is in your query. Check whether your heartbeat table is in the public or the inventory schema, and add the schema as a prefix in your query:
UPDATE inventory.heartbeat SET ts = NOW();
In tables.include, prefix each table with its schema:
"tables.include": "inventory.customers,inventory.heartbeat",
Also, try changing tables.include to table.include.list. Source: https://debezium.io/documentation/reference/1.6/connectors/mysql.html#:~:text=connector%20configuration%20property.-,table.include.list,-empty%20string
"table.include.list": "inventory.customers,inventory.heartbeat",

How to migrate consumer offsets using MirrorMaker 2.0?

With Kafka 2.7.0, I am using MirrorMaker 2.0 as a Kafka Connect connector to replicate all the topics from the primary Kafka cluster to the backup cluster.
All the topics are being replicated perfectly except __consumer_offsets. Below are the connect configurations:
{
"name": "test-connector",
"config": {
"connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
"topics.blacklist": "some-random-topic",
"replication.policy.separator": "",
"source.cluster.alias": "",
"target.cluster.alias": "",
"exclude.internal.topics":"false",
"tasks.max": "10",
"key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"source.cluster.bootstrap.servers": "xx.xx.xxx.xx:9094",
"target.cluster.bootstrap.servers": "yy.yy.yyy.yy:9094",
"topics": "test-topic-from-primary,primary-kafka-connect-offset,primary-kafka-connect-config,primary-kafka-connect-status,__consumer_offsets"
}
}
In a similar question here, the accepted answer says the following:
Add this in your consumer.config:
exclude.internal.topics=false
And add this in your producer.config:
client.id=__admin_client
Where do I add these in my configuration?
The Connector Configuration Properties do not include a property named client.id; I have set exclude.internal.topics to false, though.
Is there something I am missing here?
UPDATE
I learned that Kafka 2.7 and above supports automated consumer offset sync using MirrorCheckpointTask as mentioned here.
I have created a connector for this with the configuration below:
{
"name": "mirror-checkpoint-connector",
"config": {
"connector.class": "org.apache.kafka.connect.mirror.MirrorCheckpointConnector",
"sync.group.offsets.enabled": "true",
"source.cluster.alias": "",
"target.cluster.alias": "",
"exclude.internal.topics":"false",
"tasks.max": "10",
"key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"source.cluster.bootstrap.servers": "xx.xx.xxx.xx:9094",
"target.cluster.bootstrap.servers": "yy.yy.yyy.yy:9094",
"topics": "__consumer_offsets"
}
}
Still no help.
Is this the correct approach? Is there something missing?
You do not want to replicate __consumer_offsets. The offsets from the source cluster will not be the same in the destination cluster, for various reasons.
MirrorMaker 2 provides the ability to do offset translation. It will populate the destination cluster with translated offsets generated from the source cluster: https://cwiki.apache.org/confluence/display/KAFKA/KIP-545%3A+support+automated+consumer+offset+sync+across+clusters+in+MM+2.0
__consumer_offsets is ignored by default
topics.exclude = [.*[\-\.]internal, .*\.replica, __.*]
You'll need to override this config.
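If you really do want the raw topic mirrored anyway, a sketch of the override on the MirrorSourceConnector config from the question (keeping the other default exclusions) could be:
"topics": "__consumer_offsets",
"topics.exclude": ".*[\\-\\.]internal, .*\\.replica"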

How to disable JSON schema in Kafka Source Connector (e.g. Debezium)

I followed the Debezium tutorial (https://github.com/debezium/debezium-examples/tree/master/tutorial#using-postgres) and all CDC data received from Postgres is sent to the Kafka topic in JSON format with a schema - how do I get rid of the schema?
Here is the connector config (launched in a Docker container):
{
"name": "inventory-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"database.hostname": "postgres",
"database.port": "5432",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname" : "postgres",
"database.server.name": "dbserver1",
"schema.include": "inventory"
}
}
The JSON schema is still in the messages.
I managed to get rid of it only when I launched the Docker container with the following environment variables:
- CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE=false
- CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE=false
Why can't I achieve the same thing from the connector configuration?
Example of Kafka message with schema:
{"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"}],"optional":false,"name":"dbserver1.inventory.customers.Key"},"payload":{"id":1001}} {"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":false,"field":"first_name"},{"type":"string","optional":false,"field":"last_name"},{"type":"string","optional":false,"field":"email"}],"optional":true,"name":"dbserver1.inventory.customers.Value","field":"before"},{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":false,"field":"first_name"},{"type":"string","optional":false,"field":"last_name"},{"type":"string","optional":false,"field":"email"}],"optional":true,"name":"dbserver1.inventory.customers.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":false,"field":"schema"},{"type":"string","optional":false,"field":"table"},{"type":"int64","optional":true,"field":"txId"},{"type":"int64","optional":true,"field":"lsn"},{"type":"int64","optional":true,"field":"xmin"}],"optional":false,"name":"io.debezium.connector.postgresql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"dbserver1.inventory.customers.Envelope"},"payload":{"before":null,"after":{"id":1001,"first_name":"Sally","last_name":"Thomas","email":"sally.thomas#acme.com"},"source":{"version":"1.4.1.Final","connector":"postgresql","name":"dbserver1","ts_ms":1611918971029,"snapshot":"true","db":"postgres","schema":"inventory","table":"customers","txId":602,"lsn":34078720,"xmin":null},"op":"r","ts_ms":1611918971032,"transaction":null}}
Example (desired by me) w/o schema:
{"id":1001} {"before":null,"after":{"id":1001,"first_name":"Sally","last_name":"Thomas","email":"sally.thomas#acme.com"},"source":{"version":"1.4.1.Final","connector":"postgresql","name":"dbserver1","ts_ms":1611920304594,"snapshot":"true","db":"postgres","schema":"inventory","table":"customers","txId":597,"lsn":33809448,"xmin":null},"op":"r","ts_ms":1611920304596,"transaction":null}
The Debezium container is run with the following command:
docker run -it --name connect -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses -e CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE=false -e CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE=false --link zookeeper:zookeeper --link kafka:kafka --link mysql:mysql debezium/connect:1.3
or as docker-compose
connect:
image: debezium/connect:${DEBEZIUM_VERSION}
ports:
- 8083:8083
links:
- kafka
- postgres
environment:
- BOOTSTRAP_SERVERS=kafka:9092
- GROUP_ID=1
- CONFIG_STORAGE_TOPIC=my_connect_configs
- OFFSET_STORAGE_TOPIC=my_connect_offsets
- STATUS_STORAGE_TOPIC=my_connect_statuses
- CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE=false
- CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE=false
CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE=false and CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE=false were added later by me, but without them I cannot get rid of the schema.
The connect Docker container (the Kafka Connect worker cluster, if I understood it correctly) is started without any connector.
I create the connector manually.
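If I understand the debezium/connect image correctly, those CONNECT_* variables are translated into worker-level properties (the CONNECT_ prefix is stripped, the rest lowercased with underscores turned into dots), so they roughly amount to the following in the worker config:
key.converter.schemas.enable=false
value.converter.schemas.enable=false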
Logs from docker-compose for connect when the connector is created:
connect_1 | 2021-01-29 18:04:57,395 INFO || JsonConverterConfig values:
connect_1 | converter.type = key
connect_1 | decimal.format = BASE64
connect_1 | schemas.cache.size = 1000
connect_1 | schemas.enable = true
connect_1 | [org.apache.kafka.connect.json.JsonConverterConfig]
connect_1 | 2021-01-29 18:04:57,396 INFO || Set up the key converter class org.apache.kafka.connect.json.JsonConverter for task inventory-connector-0 using the worker config [org.apache.kafka.connect.runtime.Worker]
connect_1 | 2021-01-29 18:04:57,396 INFO || JsonConverterConfig values:
connect_1 | converter.type = value
connect_1 | decimal.format = BASE64
connect_1 | schemas.cache.size = 1000
connect_1 | schemas.enable = true
connect_1 | [org.apache.kafka.connect.json.JsonConverterConfig]
...
connect_1 | 2021-01-29 18:04:57,458 INFO || Starting PostgresConnectorTask with configuration: [io.debezium.connector.common.BaseSourceTask]
connect_1 | 2021-01-29 18:04:57,460 INFO || key.converter.schemas.enable = false [io.debezium.connector.common.BaseSourceTask]
connect_1 | 2021-01-29 18:04:57,460 INFO || value.converter.schemas.enable = false [io.debezium.connector.common.BaseSourceTask]
Here is the output of the GET connector command:
$ curl -i http://localhost:8083/connectors/inventory-connector
{"name":"inventory-connector","config":{"connector.class":"io.debezium.connector.postgresql.PostgresConnector",**"key.converter.schemas.enable":"false"**,"database.user":"postgres","database.dbname":"postgres","tasks.max":"1","database.hostname":"postgres","database.password":"postgres",**"value.converter.schemas.enable":"false"**,"name":"inventory-connector","database.server.name":"dbserver1","database.port":"5432","schema.include":"inventory"},"tasks":[{"connector":"inventory-connector","task":0}],"type":"source"}
I reproduced this example. As @OneCricketeer mentioned in the comments, you have to explicitly add the JsonConverter:
{
"name": "inventory-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"database.hostname": "postgres",
"database.port": "5432",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname" : "postgres",
"database.server.name": "dbserver1",
"schema.include": "inventory"
}
}
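To verify, the topic can be consumed directly and the schema envelope should be gone; the script path inside the kafka container and the topic name dbserver1.inventory.customers are assumed from the tutorial setup:
docker-compose exec kafka /kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic dbserver1.inventory.customers --from-beginning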

Kafka REST API source connector with authentication header

I need to create a Kafka source connector for a REST API with header authentication, like:
curl -H "Authorization: Basic " -H "clientID: " "https:< url for source> " .
I am using Apache Kafka, and I used the connector class com.github.castorm.kafka.connect.http.HttpSourceConnector.
Here is my JSON file for the connector:
{
"name": "rest_data6",
"config": {
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
"connector.class": "com.github.castorm.kafka.connect.http.HttpSourceConnector",
"tasks.max": "1",
"http.request.headers": "Authorization: Basic <key1>",
"http.request.headers": "clientID: <key>",
"http.request.url": "https:<url for source ?",
"kafka.topic": "mysqltopic2"
}
}
I also tried with "connector.class": "com.tm.kafka.connect.rest.RestSourceConnector". Here is that JSON file:
{
"name": "rest_data2",
"config": {
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
"connector.class": "com.tm.kafka.connect.rest.RestSourceConnector",
"rest.source.poll.interval.ms": "900",
"rest.source.method": "GET",
"rest.source.url":"URL of source ",
"tasks.max": "1",
"rest.source.headers": "Authorization: Basic <key> , clientId :<key2>",
"rest.source.topic.selector": "com.tm.kafka.connect.rest.selector.SimpleTopicSelector",
"rest.source.destination.topics": "mysql1"
}
}
But no luck. Any idea how to GET REST API data with authentication? My authentication parameters are the Authorization: Basic and clientID headers shown in the curl above.
Just to mention, both files work with the REST API without authentication; once I add the authentication parameters, either the connector status is FAILED or it produces a "Cannot route. Codebase/company is invalid" message in the topic.
Can anyone suggest a way to solve this?
I emailed the original developer, Cástor Rodríguez, and modified my JSON per his solution.
Putting the headers into a single property makes it work:
{
"name": "rest_data6",
"config": {
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
"connector.class": "com.github.castorm.kafka.connect.http.HttpSourceConnector",
"tasks.max": "1",
"http.request.headers": "Authorization: Basic <key1>, clientID: <key>"
"http.request.url": "https:<url for source ?",
"kafka.topic": "mysqltopic2"
}
}
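To confirm the fix, the connector status can be checked via the Connect REST API (assuming the worker listens on localhost:8083 as in the other examples here):
curl -s http://localhost:8083/connectors/rest_data6/status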

Can we use System Env Variables in Postman while creating Kafka Connector(s)

We have deployed a customized Confluent Kafka Connect cluster as a StatefulSet in Kubernetes, which mounts secrets from Azure KeyVault. These secrets contain the DB username and password and are meant to be used while creating connectors via the REST endpoint https://kafka.mydomain.com/connectors using Postman.
The secrets are loaded as environment variables in the container, and kubernetes-ingress-controller path-based routing is used to expose the REST endpoint.
So far, our team has been unable to use the environment variables while creating a connector through Postman.
Connector config:
{
"name": "TEST.CONNECTOR.SINK",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"errors.log.include.messages": "true",
"table.name.format": "AuditTransaction",
"connection.password": "iampassword", <------------ (1)
"flush.size": "3",
"tasks.max": "1",
"topics": "TEST.CONNECTOR.SOURCE-AuditTransaction",
"key.converter.schemas.enable": "false",
"connection.user": "iamuser", <------------ (2)
"value.converter.schemas.enable": "true",
"name": "TEST.CONNECTOR.SINK",
"errors.tolerance": "all",
"connection.url": "jdbc:sqlserver://testdb.database.windows.net:1433;databaseName=mytestdb01",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"insert.mode": "insert",
"errors.log.enable": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter"
}
}
(1) and (2) - Here we want to use system environment variables with the values $my_db_username=iamuser and $my_db_password=iampassword. We have tried using "$my_db_username" and "$my_db_password" there, but in the logs of the connector pod they do not resolve to the respective values.
Logs:
[2020-07-28 12:31:22,838] INFO Starting JDBC Sink task (io.confluent.connect.jdbc.sink.JdbcSinkTask:44)
[2020-07-28 12:31:22,839] INFO JdbcSinkConfig values:
auto.create = false
auto.evolve = false
batch.size = 3000
connection.password = [hidden]
connection.url = jdbc:sqlserver://testdb.database.windows.net:1433;databaseName=mytestdb01
connection.user = $my_db_username
db.timezone = UTC
delete.enabled = false
dialect.name =
fields.whitelist = []
insert.mode = insert
max.retries = 10
pk.fields = []
pk.mode = none
quote.sql.identifiers = ALWAYS
retry.backoff.ms = 3000
table.name.format = AuditTransaction
Is there any way to use system/container environment variables in this config, while creating connectors with Postman or something else?
Finally did it, using FileConfigProvider. All the needed information was here.
We just had to parametrize connect-secrets.properties according to our requirements and substitute the env var values into it on startup.
This doesn't allow using env vars via Postman directly, but a parametrized connect-secrets.properties tuned to our needs did the job, and FileConfigProvider did the rest by picking the values from connect-secrets.properties.
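For reference, a rough sketch of the setup (the mount path of connect-secrets.properties is an assumption; the key names match the env vars from the question). In the Connect worker configuration:
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
And then in the connector JSON sent from Postman:
"connection.user": "${file:/mnt/secrets/connect-secrets.properties:my_db_username}",
"connection.password": "${file:/mnt/secrets/connect-secrets.properties:my_db_password}"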
Update
Found a way to implement this using env vars here.