Debezium fails when using it with Kerberos - apache-kafka

I'm trying to configure the Oracle connector (Debezium 1.9) with a Kerberized Kafka cluster (Cloudera Private CDP) and am running into some strange problems.
I first tried to configure Debezium with a PLAINTEXT security protocol (using Apache Kafka 3.1.0) to validate that everything was fine (Oracle, Connect config, ...), and everything ran perfectly.
Next, I deployed the same connector, using the same Oracle DB instance, on my on-premises Cloudera CDP platform, which is Kerberized, updating the connector config by adding:
"database.history.kafka.topic": "schema-changes.oraclecdc",
"database.history.consumer.sasl.jaas.config": "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true keyTab=\"/tmp/debezium.keytab\" principal=\"debezium#MYREALM\";",
"database.history.consumer.security.protocol": "SASL_PLAINTEXT",
"database.history.consumer.sasl.kerberos.service.name": "kafka",
"database.history.producer.sasl.jaas.config": "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true keyTab=\"/tmp/debezium.keytab\" principal=\"debezium#MYREALM\";",
"database.history.producer.security.protocol": "SASL_PLAINTEXT",
"database.history.producer.sasl.kerberos.service.name": "kafka"
In this case, the topic schema-changes.oraclecdc is automatically created when the connector starts (auto-creation is enabled) and the DDL definitions are correctly reported. But that's it. So I suppose the JAAS config is OK and the producer config is correctly set, since the connector was able to create the topic and publish something to it.
But I can't get my updates/inserts/deletes published, and the corresponding topics are not created. Instead, Kafka Connect reports that the producer is disconnected as soon as the connector starts.
With TRACE-level logging enabled in Kafka Connect, I can see that the updates/inserts/deletes are correctly detected by Debezium from the redo log.
The fact that the producer is being disconnected makes me think there is an authentication problem. But if I understand the Debezium documentation correctly, the producer config is the same for both the schema changes topic and the table CDC topics. So I can't understand why the "schema changes" topic is created and receives messages, while the CDC mechanism doesn't create any topics...
What am I missing here?

Related

Apache NiFi to/from Confluent Cloud

I'm trying to publish custom DB data (derived from Microsoft SQL CDC tables, with joins on other tables -> how it's derived is a topic for another day) to a Kafka cluster.
I'm able to publish and consume messages between Apache NiFi and Apache Kafka.
But I'm unable to publish messages from Apache NiFi to Kafka in Confluent Cloud.
Is it possible to publish/consume messages from Apache NiFi (server-A) to Confluent Cloud using the API Key that's created there?
If yes, what are the corresponding properties in Apache NiFi's PublishKafkaRecord and ConsumeKafkaRecord processors?
If no, please share any other idea to overcome the constraint.
Yes, NiFi uses the plain Kafka Clients Java API; it can work with any Kafka environment.
Confluent Cloud gives you all the client properties you will need, such as SASL configs for username + password.
Using PublishKafka_2_6 as an example,
Obviously, "Kafka Brokers" is the Bootstrap Brokers, then you have "Username" and "Password" settings for the SASL connection.
Set "Security Protocol" to SASL_SSL and "SASL Mechanism" to PLAIN.
"Delivery Guarantee" will set producer acks.
For any extra properties, use the + button above the properties table to set "Dynamic Properties" (see the NiFi docs).
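Under the hood, those processor settings map onto the standard Kafka client properties that Confluent Cloud hands out; roughly (the bootstrap address below is a placeholder):
bootstrap.servers=<your-cluster>.<region>.<provider>.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
In the processor you only fill in the Username/Password (API key/secret) fields; NiFi builds the JAAS config for you.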
share any other idea to overcome the constraint
Use Debezium (Kafka Connect) instead.

Confluent Cloud Kafka - Audit Log Cluster : Sink Connector

For a Kafka cluster hosted in Confluent Cloud, an Audit Log cluster gets created alongside it. It seems possible to hook a sink connector up to this cluster and drain the events out of the "confluent-audit-log-events" topic.
However, I am running into the error below when I run the connector to do that.
org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [connect-offsets]
In my connect-distributed.properties file, I have the following settings:
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
offset.storage.partitions=3
What extra permissions need to be granted so that the connector can create the required topics in the cluster? The key/secret used in connect-distributed.properties is valid and is associated with the service account for this cluster.
Also, when I run a console consumer using the same key, I am able to read the audit log events just fine.
It's confirmed that this feature (hooking up a connector to the Audit Log cluster) is not supported at the moment in Confluent Cloud. This feature may be available later this year at some point.
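For reference, on a cluster where you do control the ACLs, the Connect worker's principal typically needs ACLs on its internal topics along these lines (the service account ID and bootstrap address are placeholders):
kafka-acls --bootstrap-server <bootstrap>:9092 --command-config client.properties \
  --add --allow-principal User:<service-account-id> \
  --operation Create --operation Read --operation Write --operation Describe \
  --topic connect-offsets
The same would apply to connect-configs and connect-status, plus Read on the worker's consumer group; on the audit log cluster you cannot grant these yourself, which lines up with the "not supported" status above.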

Kafka-MongoDB Debezium Connector : distributed mode

I am working with the Debezium MongoDB source connector. Can I run the connector on my local machine in distributed mode, pointing it at a Kafka bootstrap server on a remote machine (deployed in Kubernetes) and at a remote MongoDB URL?
I tried this and the connector starts successfully, with no errors and just a few warnings, but no data flows from MongoDB.
I am using the command below to run the connector:
./bin/connect-distributed ./etc/schema-registry/connect-avro-distributed.properties ./etc/kafka/connect-mongodb-source.properties
If not, how else can I achieve this? I do not want to install a local Kafka or MongoDB as most of the tutorials suggest; I want to use our test servers for this.
I followed this tutorial: https://medium.com/tech-that-works/cloud-kafka-connector-for-mongodb-source-8b525b779772
Here are more details on the issue.
The connector starts fine; I see the lines below at the end of the connector log:
INFO [Worker clientId=connect-1, groupId=connect-cluster] Starting connectors and tasks using config offset -1 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1000)
INFO [Worker clientId=connect-1, groupId=connect-cluster] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1021)
I have also defined the MongoDB config in /etc/kafka/connect-mongodb-source.properties as follows:
name=mongodb-source-connector
connector.class=io.debezium.connector.mongodb.MongoDbConnector
mongodb.hosts=/remoteserveraddress:27017
mongodb.name=mongo_conn
initial.sync.max.threads=1
tasks.max=1
But data is not flowing between MongoDB and Kafka. I have also posted a separate question for this: Kafka-MongoDB Debezium Connector: distributed mode.
Any pointers are appreciated.
connect-distributed only accepts a single property file.
You must use the REST API to configure Kafka Connect in Distributed mode.
https://docs.confluent.io/current/connect/references/restapi.html
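For example, with the settings from your connect-mongodb-source.properties, the connector would be created against a running worker with something like this (assuming the worker's REST interface is on localhost:8083):
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "mongodb-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "/remoteserveraddress:27017",
    "mongodb.name": "mongo_conn",
    "initial.sync.max.threads": "1",
    "tasks.max": "1"
  }
}'
The worker itself is started with only the first file: ./bin/connect-distributed ./etc/schema-registry/connect-avro-distributed.properties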
Note: by default, the consumer will read the latest data off the topic, not existing data.
You would add this to the connect-avro-distributed.properties to fix it
consumer.auto.offset.reset=earliest

Configuring Kafka connect Postgress Debezium CDC plugin

I am trying to use Kafka Connect to read changes from a Postgres DB.
I have Kafka running on my local system and I want to use the Kafka Connect API in standalone mode to read the Postgres server's DB changes.
connect-standalone.sh connect-standalone.properties dbezium.properties
I would appreciate it if someone could help me set up the configuration properties for the Debezium Postgres CDC connector:
https://www.confluent.io/connector/debezium-postgresql-cdc-connector/
I am following the docs below to construct the properties:
https://debezium.io/docs/connectors/postgresql/#how-the-postgresql-connector-works
The name of the Kafka topics takes by default the form serverName.schemaName.tableName, where serverName is the logical name of the connector as specified with the database.server.name configuration property
And here is what I have come up with for dbezium.properties:
name=cdc_demo
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
plugin.name=wal2json
slot.name=debezium
slot.drop_on_stop=false
database.hostname=localhost
database.port=5432
database.user=postgress
database.password=postgress
database.dbname=test
time.precision.mode=adaptive
database.sslmode=disable
Let's say I create a PG schema named demo and a table named suppliers.
So I need to create a topic named test.demo.suppliers for this plugin to push the data to?
Also, can someone suggest a Docker image which has the Postgres server plus a suitable replication plugin such as wal2json? I am having a hard time configuring Postgres and the CDC plugin myself.
Check out the tutorial with associated Docker Compose and sample config.
The topic you've come up with sounds correct, but if you have your Kafka broker configured to auto-create topics (which is the default behaviour IIRC) then it will get created for you and you don't need to pre-create it.
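If you just want a throwaway Postgres that already has the logical decoding plugins baked in, the Debezium project publishes a debezium/postgres image (wal2json and decoderbufs preinstalled); something like this should do (the tag and credentials are just examples):
docker run -d --name postgres-cdc -p 5432:5432 \
  -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=test \
  debezium/postgres:12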

Kafka Connect writes data to non-existing topic

Does Kafka Connect create the topic on the fly if it doesn't exist (but is provided as a destination), or does it fail to copy messages to it?
I need to create such topics on the fly or at least programmatically (Java API), not manually using scripts.
I searched for this info, but it seems topics have to already exist before migration.
Kafka Connect doesn't really control this.
There's a setting in Kafka that enables/disables automatic topic creation.
If this is turned on, Kafka Connect will create its own topics; if not, you have to create them yourself.
By default, Kafka will not create a new topic when a consumer subscribes to a non-existing topic. You should set auto.create.topics.enable=true in your Kafka server configuration file, which enables auto-creation of topics on the server.
Once you turn this feature on, Kafka will create topics on the fly: when an application tries to connect to a non-existing topic, Kafka will create it automatically.
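If you would rather not rely on broker-side auto-creation, you can also create the topic programmatically with Kafka's AdminClient before the connector (or consumer) touches it. A minimal sketch, with the broker address, topic name, partition count and replication factor as placeholder values:
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // placeholder broker address
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // example topic: 3 partitions, replication factor 1
            NewTopic topic = new NewTopic("my-destination-topic", 3, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}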