My current testing configuration looks like so:
version: '3.7'
services:
  postgres:
    image: debezium/postgres
    restart: always
    ports:
      - "5432:5432"
  zookeeper:
    image: debezium/zookeeper
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
  kafka:
    image: debezium/kafka
    restart: always
    ports:
      - "9092:9092"
    links:
      - zookeeper
    depends_on:
      - zookeeper
    environment:
      - ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_GROUP_MIN_SESSION_TIMEOUT_MS=250
  connect:
    image: debezium/connect
    restart: always
    ports:
      - "8083:8083"
    links:
      - zookeeper
      - postgres
      - kafka
    depends_on:
      - zookeeper
      - postgres
      - kafka
    environment:
      - BOOTSTRAP_SERVERS=kafka:9092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=my_connect_configs
      - OFFSET_STORAGE_TOPIC=my_connect_offsets
      - STATUS_STORAGE_TOPIC=my_source_connect_statuses
I run it with docker-compose like so:
$ docker-compose up
And I see no error messages. It seems like everything is running ok. If I do docker ps, I see that all services are running.
In order to check that Kafka is running, I wrote a Kafka producer and a Kafka consumer in Python:
# producer. I run it in one console window
from kafka import KafkaProducer
from json import dumps
from time import sleep

producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda x: dumps(x).encode('utf-8'))

for e in range(1000):
    data = {'number': e}
    producer.send('numtest', value=data)
    sleep(5)
# consumer. I run it in another console window
from kafka import KafkaConsumer
from json import loads

consumer = KafkaConsumer(
    'numtest',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-group',
    value_deserializer=lambda x: loads(x.decode('utf-8')))

for message in consumer:
    print(message)
And it works absolutely great. I see my producer publish messages and I see them being consumed in the consumer window.
Now I want to make CDC work. First of all, inside the Postgres container I set the postgres role's password to postgres:
$ su postgres
$ psql
psql> \password postgres
Enter new password: postgres
I then created a new database test:
psql> CREATE DATABASE test;
I created a table:
psql> \c test;
test=# create table mytable (id serial, name varchar(128), primary key(id));
And, finally, for my Debezium CDC stack I created a connector:
$ curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "test-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"plugin.name": "pgoutput",
"database.hostname": "postgres",
"database.port": "5432",
"database.user": "postgres",
"database.password": "postgres",
"database.dbname" : "test",
"database.server.name": "postgres",
"database.whitelist": "public.mytable",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"database.history.kafka.topic": "public.some_topic"
}
}'
{"name":"test-connector","config":{"connector.class":"io.debezium.connector.postgresql.PostgresConnector","tasks.max":"1","plugin.name":"pgoutput","database.hostname":"postgres","database.port":"5432","database.user":"postgres","database.password":"postgres","database.dbname":"test","database.server.name":"postgres","database.whitelist":"public.mytable","database.history.kafka.bootstrap.servers":"localhost:9092","database.history.kafka.topic":"public.some_topic","name":"test-connector"},"tasks":[],"type":"source"}
As you can see, my connector was created without any errors. Now I expect Debezium CDC to publish all changes to the Kafka topic public.some_topic. To check this, I create a new Kafka consumer:
from kafka import KafkaConsumer
from json import loads

consumer = KafkaConsumer(
    'public.some_topic',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-group',
    value_deserializer=lambda x: loads(x.decode('utf-8')))

for message in consumer:
    print(message)
The only difference from the first example is that I'm watching public.some_topic. I then go to the database console and make an insert:
test=# insert into mytable (name) values ('Tom Cat');
INSERT 0 1
test=#
So a new value is inserted, but nothing happens in the consumer window. In other words, Debezium does not publish events to the Kafka topic public.some_topic. What is wrong, and how can I fix it?
Using your Docker Compose I see this error in the Kafka Connect worker log when the connector is created:
Caused by: org.postgresql.util.PSQLException: ERROR: could not access file "pgoutput": No such file or directory
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2505)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2241)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:309)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:295)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:272)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:267)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.createReplicationSlot(PostgresReplicationConnection.java:288)
at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:126)
... 9 more
This is also mirrored in the status of the task if you use the Kafka Connect REST API to query it:
curl -s "http://localhost:8083/connectors?expand=info&expand=status" | jq '."test-connector".status'
{
"name": "test-connector",
"connector": {
"state": "RUNNING",
"worker_id": "192.168.16.5:8083"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "192.168.16.5:8083",
"trace": "org.apache.kafka.connect.errors.ConnectException: org.postgresql.util.PSQLException: ERROR: could not access file \"pgoutput\": No such file or directory\n\tat io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:129)\n\tat io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:49)\n\tat org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:208)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: org.postgresql.util.PSQLException: ERROR: could not access file \"pgoutput\": No such file or directory\n\tat org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2505)\n\tat org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2241)\n\tat org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)\n\tat org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)\n\tat org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)\n\tat org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:309)\n\tat org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:295)\n\tat org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:272)\n\tat org.postgresql.jdbc.PgStatement.execute(PgStatement.java:267)\n\tat io.debezium.connector.postgresql.connection.PostgresReplicationConnection.createReplicationSlot(PostgresReplicationConnection.java:288)\n\tat io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:126)\n\t... 9 more\n"
}
],
"type": "source"
The version of Postgres that you're running is
postgres=# SHOW server_version;
server_version
----------------
9.6.16
The pgoutput plugin is only available in Postgres version 10 and later.
I changed your Docker Compose to use version 10:
image: debezium/postgres:10
After bouncing the stack for a clean start and following your instructions, I get a connector that's running:
curl -s "http://localhost:8083/connectors?expand=info&expand=status" | \
jq '. | to_entries[] | [ .value.info.type, .key, .value.status.connector.state,.value.status.tasks[].state,.value.info.config."connector.class"]|join(":|:")' | \
column -s : -t| sed 's/\"//g'| sort
source | test-connector | RUNNING | RUNNING | io.debezium.connector.postgresql.PostgresConnector
and data in the Kafka topic:
$ docker exec kafkacat kafkacat -b kafka:9092 -t postgres.public.mytable -C
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":true,"field":"name"}],"optional":true,"name":"postgres.public.mytable.Value","field":"before"},{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":true,"field":"name"}],"optional":true,"name":"postgres.public.mytable.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":false,"field":"schema"},{"type":"string","optional":false,"field":"table"},{"type":"int64","optional":true,"field":"txId"},{"type":"int64","optional":true,"field":"lsn"},{"type":"int64","optional":true,"field":"xmin"}],"optional":false,"name":"io.debezium.connector.postgresql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"}],"optional":false,"name":"postgres.public.mytable.Envelope"},"payload":{"before":null,"after":{"id":1,"name":"Tom Cat"},"source":{"version":"1.0.0.Final","connector":"postgresql","name":"postgres","ts_ms":1579172192292,"snapshot":"false","db":"test","schema":"public","table":"mytable","txId":561,"lsn":24485520,"xmin":null},"op":"c","ts_ms":1579172192347}}% Reached end of topic postgres.public.mytable [0] at offset 1
I added kafkacat into your Docker Compose with:
  kafkacat:
    image: edenhill/kafkacat:1.5.0
    container_name: kafkacat
    entrypoint:
      - /bin/sh
      - -c
      - |
        while [ 1 -eq 1 ];do sleep 60;done
Edit: retaining previous answer as it's still useful & relevant:
Debezium will write messages to a topic named after the server name, schema, and table. In your example this would be postgres.public.mytable.
This is why kafkacat is useful, because you can run
kafkacat -b broker:9092 -L
to see a list of all your topics and partitions. Once you've got the topic
kafkacat -b broker:9092 -t postgres.public.mytable -C
to read from it.
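If you'd rather stay in Python instead of kafkacat, a minimal sketch using the same kafka-python client as in the question can list the topics so you can spot the one Debezium created (assuming the same localhost:9092 port mapping as in the earlier examples):
# List all topic names known to the broker (same kafka-python client as above).
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'])
for topic in sorted(consumer.topics()):
    print(topic)
consumer.close()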
Check out details on kafkacat including how to run it with Docker
There's also a demo of it all in action with Docker Compose here
Related
A Kafka topic for my table is not created when using the debezium/connect Docker image. Here's how I'm starting the container:
docker run -it --rm --name debezium -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my-connect-configs \
-e OFFSET_STORAGE_TOPIC=my-connect-offsets -e BOOTSTRAP_SERVERS=192.168.56.1:9092 \
-e CONNECT_NAME=my-connector -e CONNECT_CONNECTOR_CLASS=io.debezium.connector.postgresql.PostgresConnector \
-e CONNECT_TOPIC_PREFIX=my-prefix -e CONNECT_DATABASE_HOSTNAME=host.docker.internal -e CONNECT_DATABASE_PORT=5432 \
-e CONNECT_DATABASE_USER=postgres -e CONNECT_DATABASE_PASSWORD=root -e DATABASE_SERVER_NAME=mydb \
-e CONNECT_DATABASE_DBNAME=mydb -e CONNECT_TABLE_INCLUDE_LIST=myschema.my_table -e CONNECT_PLUGIN_NAME=pgoutput \
debezium/connect
I've tried using CONNECT__ instead of CONNECT_, but I get the same result. A topic for the table is also not created if I use the REST API:
curl -H 'Content-Type: application/json' 127.0.0.1:8083/connectors --data '
{
"name": "prism",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"topic.prefix": my-connector",
"database.hostname": "host.docker.internal",
"database.port": "5432",
"database.user": "postgres",
"database.password": "root",
"database.server.name": "mydb",
"database.dbname" : "mydb",
"table.include.list": "myschema.my_table",
"plugin.name": "pgoutput"
}
}'
The topics my-connect-configs and my-connect-offsets, specified by CONFIG_STORAGE_TOPIC and OFFSET_STORAGE_TOPIC, are created.
http://localhost:8083/connectors/my-connector/status shows this:
{"name":"my-connector","connector":{"state":"RUNNING","worker_id":"172.17.0.3:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"172.17.0.3:8083"}],"type":"source"}
I was able to create a topic when using bin/connect-standalone.sh instead of the Docker image as per this question.
Automatic topic creation is enabled and I don't see any errors/warnings in the log.
Check the Kafka Connect container logs to see what message appears when it tries to write the data to the Kafka cluster.
You need to enable automatic topic creation in the Kafka broker config; check out this doc.
Make sure that the table exists in the DB and that "table.include.list": "myschema.my_table" is correct. For experimentation you can remove this config temporarily.
You can use the UI platform created by the Redpanda team to manage topic, broker, and Kafka Connect config - here
The issue was that the underlying table didn't have any data, and therefore the topic was not created. The topic is created either if the table has data when the connector is started or if rows are added while the connector is running.
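As an illustration (my own sketch, not from the original post): with the connector running, inserting a single row is enough to make the topic appear. Here it's done with psycopg2; the connection details come from the question, and the column name is an assumption, so adjust to your schema:
# Hypothetical example: insert one row so Debezium has a change to stream.
# Host, credentials, and the column list of myschema.my_table are assumptions.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432, dbname="mydb",
                        user="postgres", password="root")
with conn, conn.cursor() as cur:  # the connection context manager commits on success
    cur.execute("INSERT INTO myschema.my_table (name) VALUES (%s)", ("test",))
conn.close()
After the insert is committed, the table's topic (named after the connector's topic prefix, schema, and table) should show up in the broker's topic list.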
I have Postgres running on a Pi inside a Docker container.
The Debezium connector is running on my local machine (the same one as ZooKeeper and Kafka).
The Kafka topic is up and running, and I can see the changes I make in Postgres arriving in the Kafka topic. So far so good.
Now I started another Docker container locally which is not from the same Docker Compose file as my other containers. THIS is supposed to be my REPLICA DATABASE.
I copied confluentinc-kafka-connect-jdbc-10.5.0 into the Docker container:
sudo docker cp confluentinc-kafka-connect-jdbc-10.5.0 CONTAINER_ID:/kafka/connect/
then changed the user and group and restarted the container:
docker exec -it --user root <container-id> /bin/bash
chown -R <username>:<groupname> <folder/file>
Now I created the jdbc-sink connector.
curl --location --request POST 'http://localhost:8083/connectors/' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "piserver.public.customers",
"connection.url": "jdbc:postgresql:192.168.128.2:5432/postgres",
"connection.user": "postgres",
"connection.password": "postgres",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"auto.create": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value"
}
}
'
I get back 201 Created.
The error I get after a few seconds of running:
curl --location --request GET 'localhost:8083/connectors/jdbc-sink/status' \
--data-raw ''
ERROR trace
{
"id": 0,
"state": "FAILED",
"worker_id": "192.168.112.4:8083",
"trace": "org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:611)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.ConnectException: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.\n\tat io.confluent.connect.jdbc.util.CachedConnectionProvider.getConnection(CachedConnectionProvider.java:59)\n\tat io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:64)\n\tat io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:84)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)\n\t... 10 more\nCaused by: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:319)\n\tat org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)\n\tat org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)\n\tat org.postgresql.Driver.makeConnection(Driver.java:400)\n\tat org.postgresql.Driver.connect(Driver.java:259)\n\tat java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)\n\tat java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)\n\tat io.confluent.connect.jdbc.dialect.GenericDatabaseDialect.getConnection(GenericDatabaseDialect.java:250)\n\tat io.confluent.connect.jdbc.dialect.PostgreSqlDatabaseDialect.getConnection(PostgreSqlDatabaseDialect.java:103)\n\tat io.confluent.connect.jdbc.util.CachedConnectionProvider.newConnection(CachedConnectionProvider.java:80)\n\tat io.confluent.connect.jdbc.util.CachedConnectionProvider.getConnection(CachedConnectionProvider.java:52)\n\t... 
13 more\nCaused by: java.net.ConnectException: Connection refused (Connection refused)\n\tat java.base/java.net.PlainSocketImpl.socketConnect(Native Method)\n\tat java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)\n\tat java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)\n\tat java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)\n\tat java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)\n\tat java.base/java.net.Socket.connect(Socket.java:609)\n\tat org.postgresql.core.PGStream.createSocket(PGStream.java:241)\n\tat org.postgresql.core.PGStream.<init>(PGStream.java:98)\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:109)\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:235)\n\t... 23 more\n"
}
In short:
Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
The hosts I tried in my config:
"connection.url": "jdbc:postgresql:192.168.128.2:5432/postgres" // got this IP from docker inspect POSTGRES_CONTAINER
"connection.url": "jdbc:postgresql:host.docker.internal:5432/postgres"
"connection.url": "jdbc:postgresql:localhost:5432/postgres"
None of these worked; I always got the same error about not being able to access localhost:5432.
I also tried connecting the replica Postgres Docker container to my docker-compose network.
Any thoughts on this? Thanks.
A short summary:
POSTGRES (on Pi) -> DEBEZIUM connector (locally) -> KAFKA -> JDBC SINK (from within Kafka) -> POSTGRES (the replica, running locally)
Don't use IP addresses between containers, and don't use localhost within containers to try to reach other containers - https://docs.docker.com/network/bridge/
Ideally, you'd use Docker Compose to start all services; otherwise you need to create the network bridge yourself:
docker network create database-bridge
docker run --network=database-bridge --name=postgres ...
docker run --network=database-bridge ... # repeat for zookeeper, kafka, and debezium
Or look at the networks that compose created, and attach the new container to that, since you say
started another docker container locally which is not from the same docker compose file
docker network ls # look for a name that matches the folder where you ran docker-compose
docker run --network=<name> ... jdbc-connector
Then use jdbc:postgresql://postgres:5432/postgres to connect to that container by its hostname.
If the JDBC Connector is running with connect-distributed.sh and not Docker, only then can you use localhost:5432, but you need a port mapping from the Postgres container to the host.
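To sanity-check which hostname is actually reachable, here's a small sketch (it assumes Python is available in a container attached to the same Docker network; the postgres service name follows the example above):
# Reachability check, run from a container on the shared Docker network.
# "postgres" resolves via Docker's embedded DNS; "localhost" is the container itself.
import socket

for host in ("postgres", "localhost"):
    try:
        with socket.create_connection((host, 5432), timeout=3):
            print(host + ":5432 reachable")
    except OSError as err:
        print(host + ":5432 not reachable: " + str(err))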
I'm trying to set up a new ElasticsearchSink job on our Kafka Connect cluster. The cluster has been working smoothly for a couple of months with a SASL-SSL secured connection to Kafka and HTTPS to an Elastic instance on host A. The KC cluster normally runs in Kubernetes, but for testing purposes I also run it locally using Docker (an image based on Confluent's KC image v6.0.0); Kafka resides in a test environment and the job is started using REST calls.
The Docker Compose file used for running it locally looks like this:
version: '3.7'
services:
  connect:
    build:
      dockerfile: Dockerfile.local
      context: ./
    container_name: kafka-connect
    ports:
      - "8083:8083"
    environment:
      KAFKA_OPTS: -Djava.security.krb5.conf=/<path-to>/secrets/krb5.conf
        -Djava.security.auth.login.config=/<path-to>/rest-basicauth-jaas.conf
      CONNECT_BOOTSTRAP_SERVERS: <KAFKA-INSTANCE-1>:2181,<KAFKA-INSTANCE-2>:2181,<KAFKA-INSTANCE-3>:2181
      CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect
      CONNECT_REST_PORT: 8083
      CONNECT_REST_EXTENSION_CLASSES: org.apache.kafka.connect.rest.basic.auth.extension.BasicAuthSecurityRestExtension
      CONNECT_GROUP_ID: <kc-group>
      CONNECT_CONFIG_STORAGE_TOPIC: service-assurance.test.internal.connect.configs
      CONNECT_OFFSET_STORAGE_TOPIC: service-assurance.test.internal.connect.offsets
      CONNECT_STATUS_STORAGE_TOPIC: service-assurance.test.internal.connect.status
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.converters.IntegerConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_ZOOKEEPER_CONNECT: <KAFKA-INSTANCE-1>:2181,<KAFKA-INSTANCE-2>:2181,<KAFKA-INSTANCE-3>:2181
      CONNECT_SECURITY_PROTOCOL: SASL_SSL
      CONNECT_SASL_KERBEROS_SERVICE_NAME: "kafka"
      CONNECT_SASL_JAAS_CONFIG: com.sun.security.auth.module.Krb5LoginModule required \
        useKeyTab=true \
        storeKey=true \
        keyTab="/<path-to>/kafka-connect.keytab" \
        principal="<AD-USER>";
      CONNECT_SASL_MECHANISM: GSSAPI
      CONNECT_SSL_TRUSTSTORE_LOCATION: "/<path-to>/truststore.jks"
      CONNECT_SSL_TRUSTSTORE_PASSWORD: <pwd>
      CONNECT_CONSUMER_SECURITY_PROTOCOL: SASL_SSL
      CONNECT_CONSUMER_SASL_KERBEROS_SERVICE_NAME: "kafka"
      CONNECT_CONSUMER_SASL_JAAS_CONFIG: com.sun.security.auth.module.Krb5LoginModule required \
        useKeyTab=true \
        storeKey=true \
        keyTab="/<path-to>/kafka-connect.keytab" \
        principal="<AD-USER>";
      CONNECT_CONSUMER_SASL_MECHANISM: GSSAPI
      CONNECT_CONSUMER_SSL_TRUSTSTORE_LOCATION: "/<path-to>/truststore.jks"
      CONNECT_CONSUMER_SSL_TRUSTSTORE_PASSWORD: <pwd>
      CONNECT_PLUGIN_PATH: "/usr/share/java,/etc/kafka-connect/jars"
With a similar Kubernetes configuration.
The connector is started using something like:
curl -X POST -H "Content-Type: application/json" --data '{
"name": "connector-name",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": 2,
"batch.size": 200,
"max.buffered.records": 1500,
"flush.timeout.ms": 120000,
"topics": "topic.connector",
"auto.create.indices.at.start": false,
"key.ignore": true,
"value.converter.schemas.enable": false,
"key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"schema.ignore": true,
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"behavior.on.malformed.documents" : "ignore",
"behavior.on.null.values": "ignore",
"connection.url": "https://<elastic-host>",
"connection.username": "<user>",
"connection.password": "<pwd>",
"type.name": "_doc"
}
}' <host>/connectors/
Now I've been tasked with setting up yet another connector, this time against an Elastic instance hosted on host B. The problem I am experiencing is the infamous:
sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
I have modified the working truststore to include the CA root certificate for host B as well. I believe the truststore is working, as I am able to use it from a Java code snippet (SSLPoke.class, found on an Atlassian page) to connect to both A and B successfully.
The connectors connecting to host A still work with the newly updated truststore, but not the connector connecting to host B.
I have scanned the internet for clues on how to solve this and came across suggestions to explicitly add:
"elastic.https.ssl.truststore.location": "/<pathto>/truststore.jks",
"elastic.https.ssl.truststore.password": "<pwd>",
to the connector configuration. Some other page suggested adding the truststore to the KC configuration's KAFKA_OPTS, like this:
KAFKA_OPTS: -Djava.security.krb5.conf=/<path-to>/secrets/krb5.conf
-Djava.security.auth.login.config=/<path-to>/rest-basicauth-jaas.conf
-Djavax.net.ssl.trustStore=/<path-to>/truststore.jks
Following these suggestions I can actually get the connector connecting to host B to start successfully. But now comes the annoying part: with the extra param added to KAFKA_OPTS, my old connectors connecting to A stop working - with the exact same error! So now I can have either the connectors connecting to A OR the connectors connecting to B working, but not both at the same time.
Please, if anyone could give me some pointers or ideas on how to fix this, it would be much appreciated, because this is driving me nuts.
I'm new to Kafka and I'm trying to use the Debezium Postgres connector, but even using Postgres version 11 with the standard plugin I get this error:
org.apache.kafka.connect.errors.ConnectException: org.postgresql.util.PSQLException: ERROR: could not access file "decoderbufs": No such file or directory
To run Kafka/Debezium I'm using the fast-data-dev Docker image, as you can see below:
# this is our kafka cluster.
kafka-cluster:
  image: landoop/fast-data-dev:latest
  environment:
    ADV_HOST: 127.0.0.1       # Change to 192.168.99.100 if using Docker Toolbox
    RUNTESTS: 0               # Disable Running tests so the cluster starts faster
  ports:
    - 2181:2181               # Zookeeper
    - 3030:3030               # Landoop UI
    - 8081-8083:8081-8083     # REST Proxy, Schema Registry, Kafka Connect ports
    - 9581-9585:9581-9585     # JMX Ports
    - 9092:9092               # Kafka Broker
After running it, I can open localhost:3030 to choose the Debezium connector. I configured it this way:
I'm using AWS Postgres RDS, version 11.5.
I saw several tutorials using wal2json, but I didn't find it in rds.extensions and didn't see a way to add it. Anyway, as of version 10, Debezium can use pgoutput, and apparently no extra configuration is necessary.
The rds.logical_replication property is set to 1.
When executing SHOW wal_level; in the terminal, I see that it returns logical.
The documentation says you have to set max_wal_senders = 1 and max_replication_slots = 1, but in RDS the minimum is 5, so I left the default, which is 10.
I did not define the REPLICATION role because, from what I understand, there is no way to do that in RDS.
In this image you can see that the version used is 11.5, but I still get the error shown above.
You haven't set the "plugin.name" property to "pgoutput" in your Debezium connector properties, which you've already figured out. This answer is for others who don't know where to set this option, and for more clarity.
If you don't set the plugin.name option, it takes the default value, which is decoderbufs, and that's why you were getting the "could not access file decoderbufs" error.
Setting "plugin.name" explicitly to "pgoutput" in the connector properties, as shown below, should solve the issue:
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.user": "postgres",
"database.dbname": "xxxxx",
"tasks.max": "1",
"database.hostname": "xxxx.rds.amazonaws.com",
"database.password": "xxxx",
"database.server.name": "database-1",
"database.port": "5432",
"plugin.name": "pgoutput" --> this property
}
I'm trying to sink table data from one DB to another DB using Kafka and Debezium (Kafka streaming), with the help of Docker.
The DB stream is working fine, but sinking the streamed data into another MySQL DB fails with an error.
My sink connector configuration is below:
{
"name": "mysql_sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"topics": "mysql-connect.kafka_test.employee",
"connection.url": "jdbc:mysql://localhost/kafka_test_1&user=debezium&password=xxxxx",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"pk.fields": "id",
"pk.mode": "record_value",
"errors.tolerance": "all",
"errors.log.enable":"true",
"errors.log.include.messages":"true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"name": "mysql_sink"
}
}
But I'm getting this error:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:560)
org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)\nCaused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost/kafka_test_1&user=debezium&password=xxxxx
io.confluent.connect.jdbc.util.CachedConnectionProvider.getValidConnection(CachedConnectionProvider.java:59)
io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:52)
io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:66)
org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)\n\t... 10 more\nCaused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost/kafka_test_1&user=debezium&password=xxxxx
java.sql.DriverManager.getConnection(DriverManager.java:689)
java.sql.DriverManager.getConnection(DriverManager.java:247)
io.confluent.connect.jdbc.util.CachedConnectionProvider.newConnection(CachedConnectionProvider.java:66)
io.confluent.connect.jdbc.util.CachedConnectionProvider.getValidConnection(CachedConnectionProvider.java:52)\n\t... 13 more
I'm using Docker:
version: '3'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    command: [start-kafka.sh]
    ports:
      - "9092:9092"
    links:
      - zookeeper
    environment:
      KAFKA_LISTENERS: PLAINTEXT://:9092,
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://:9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - zookeeper
  connect:
    build:
      context: debezium-jdbc
    ports:
      - "8083:8083"
    links:
      - kafka
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: my_connect_configs
      OFFSET_STORAGE_TOPIC: my_connect_offsets
      CLASSPATH: /kafka/connect/kafka-connect-jdbc-5.3.1.jar
I tried so many things, and I don't know why I'm getting this error. One more thing: I don't have any knowledge of Java.
Thanks in advance.
You're getting this error because the JDBC Sink (and JDBC Source) connectors use JDBC (as the name implies) to connect to the database, and you have not made the JDBC driver for MySQL available to the connector.
The best way to fix this is to copy the MySQL JDBC driver into the same folder as kafka-connect-jdbc (which on the Docker image is /usr/share/java/kafka-connect-jdbc/).
If you're using Docker Compose then you have three options:
1. Build a custom Docker image with the driver installed.
2. Download the driver locally:
# Download to host machine
mkdir local-jdbc-drivers
cd local-jdbc-drivers
curl https://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-8.0.18.tar.gz | tar xz
and mount it into the container at the Kafka Connect JDBC plugin path:
volumes:
  - ${PWD}/local-jdbc-drivers:/usr/share/java/kafka-connect-jdbc/driver-jars/
3. Install it at runtime like this:
command:
  - /bin/bash
  - -c
  - |
    # JDBC Drivers
    # ------------
    # MySQL
    cd /usr/share/java/kafka-connect-jdbc/
    curl https://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-8.0.18.tar.gz | tar xz
    # Now launch Kafka Connect
    sleep infinity &
    /etc/confluent/docker/run
For more details see this blog.
I struggled a lot with the same No suitable driver found error when trying to load a MySQL table using Kafka Connect.
I am using plain Kafka (not the Confluent Platform) and found out that you can have one of two problems:
the JDBC URL is malformed, or
the driver chosen for your Kafka version is not the right one.
I used the latest driver, mysql-connector-java-8.0.21, and received the "no suitable driver" error. However, when I switched to version mysql-connector-java-5.1.49 (released this year, 2020), everything worked like a charm.
You can get the driver versions from the Maven repo:
https://mvnrepository.com/artifact/mysql/mysql-connector-java
Copy the driver to the classpath; in my case I downloaded Kafka and copied it into the kafka_2.12-2.3.1/libs/ directory.
My problem was actually a little funny. I had the necessary jar file in my plugin path, so everything was fine up to that point. But I had three copies of the same jar file located in different folders. So I searched for them using:
find /kafka/ -name 'ojdbc*.jar'
and removed two of them. After restarting the service, everything started to work normally. A small probability, but you may have the same problem :p