Create a Kafka Connect cluster with Docker Compose to be used by ksqlDB

What I am essentially trying to do is run multiple Kafka Connect instances with Docker Compose and have ksqlDB use this cluster. For now they all run on a single machine, but eventually I want to deploy this to a multi-node environment. My problem is that ksqlDB apparently can't find the Kafka Connect cluster. There is KSQL_KSQL_CONNECT_URL, which takes the URL of a single Kafka Connect instance. If this variable is not provided, the default value localhost:8083 is used.
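The link between KSQL_KSQL_CONNECT_URL and the ksql.connect.url property comes from the naming convention that, as we understand Confluent's Docker image documentation, the images use: strip the component prefix, lower-case the rest, and read "_" as ".", "__" as "-" and "___" as "_". A minimal Python sketch of that reverse mapping, useful for checking which property an environment variable sets (env_to_property is a hypothetical helper, not part of any image):

```python
def env_to_property(env_name: str, prefix: str) -> str:
    """Map a Confluent-image environment variable name back to a property name.

    Assumed convention: '___' -> '_', '__' -> '-', '_' -> '.', lower-cased.
    """
    if not env_name.startswith(prefix):
        raise ValueError(f"{env_name!r} does not start with {prefix!r}")
    body = env_name[len(prefix):].lower()
    out = []
    i = 0
    while i < len(body):
        if body.startswith("___", i):  # triple underscore: literal underscore
            out.append("_")
            i += 3
        elif body.startswith("__", i):  # double underscore: dash
            out.append("-")
            i += 2
        elif body[i] == "_":  # single underscore: period
            out.append(".")
            i += 1
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for name in ("KSQL_KSQL_CONNECT_URL", "KSQL_KSQL_SCHEMA_REGISTRY_URL", "KSQL_BOOTSTRAP_SERVERS"):
        print(name, "->", env_to_property(name, "KSQL_"))
```

The same function works for the Connect workers with the prefix "CONNECT_", e.g. CONNECT_REST_ADVERTISED_HOST_NAME maps to rest.advertised.host.name.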
I found this docker-compose file, which I think does what I want: ksqlDB plus multiple Kafka Connect instances. Unfortunately, it didn't help me much, since it uses an old version of KSQL Server. Here is my docker-compose file:
---
version: '3'
services:
  ksqldb-server-connect-test:
    image: confluentinc/ksqldb-server:0.15.0
    hostname: ksqldb-server-connect-test
    container_name: ksqldb-server-connect-test
    #ports:
    #  - "8088:8088"
    network_mode: "host"
    environment:
      KSQL_KSQL_SERVICE_ID: "default_"
      KSQL_LISTENERS: http://0.0.0.0:8088
      KSQL_BOOTSTRAP_SERVERS: localhost:9092
      KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
      KSQL_KSQL_SCHEMA_REGISTRY_URL: http://localhost:8081
      #KSQL_KSQL_CONNECT_URL: http://localhost:8083
  ksqldb-cli-connect-test:
    image: confluentinc/ksqldb-cli:0.15.0
    container_name: ksqldb-cli-connect-test
    network_mode: "host"
    depends_on:
      - ksqldb-server-connect-test
    entrypoint: /bin/sh
    tty: true
  schema-registry-connect-test:
    image: confluentinc/cp-schema-registry:6.0.1
    container_name: schema-registry-connect-test
    network_mode: "host"
    #ports:
    #  - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: localhost:9092
    restart: always
  kafka-connect-1:
    image: confluentinc/cp-kafka-connect-base:6.0.1
    container_name: kafka-connect-1
    network_mode: "host"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: "localhost:9092"
      CONNECT_REST_PORT: 8082
      CONNECT_GROUP_ID: kafka-connect-test
      CONNECT_CONFIG_STORAGE_TOPIC: _connect-configs-test
      CONNECT_OFFSET_STORAGE_TOPIC: _connect-offsets-test
      CONNECT_STATUS_STORAGE_TOPIC: _connect-status-test
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: 'http://localhost:8081'
      CONNECT_REST_ADVERTISED_HOST_NAME: "localhost"
      CONNECT_LOG4J_APPENDER_STDOUT_LAYOUT_CONVERSIONPATTERN: "[%d] %p %X{connector.context}%m (%c:%L)%n"
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_OFFSET_STORAGE_PARTITIONS: "25"
      CONNECT_STATUS_STORAGE_PARTITIONS: "5"
      CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components,/data/connect-jars
    volumes:
      - $PWD/data/connect-jars/:/usr/share/java/kafka-connect-jdbc/jars/
      - $PWD/jmx:/usr/app/
  kafka-connect-2:
    image: confluentinc/cp-kafka-connect-base:6.0.1
    container_name: kafka-connect-2
    network_mode: "host"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: "localhost:9092"
      CONNECT_REST_PORT: 8084
      CONNECT_GROUP_ID: kafka-connect-test
      CONNECT_CONFIG_STORAGE_TOPIC: _connect-configs-test
      CONNECT_OFFSET_STORAGE_TOPIC: _connect-offsets-test
      CONNECT_STATUS_STORAGE_TOPIC: _connect-status-test
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: 'http://localhost:8081'
      CONNECT_REST_ADVERTISED_HOST_NAME: "localhost"
      CONNECT_LOG4J_APPENDER_STDOUT_LAYOUT_CONVERSIONPATTERN: "[%d] %p %X{connector.context}%m (%c:%L)%n"
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_OFFSET_STORAGE_PARTITIONS: "25"
      CONNECT_STATUS_STORAGE_PARTITIONS: "5"
      CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components,/data/connect-jars
    volumes:
      - $PWD/data/connect-jars/:/usr/share/java/kafka-connect-jdbc/jars/
      - $PWD/jmx:/usr/app/
Note that I use network_mode: "host" because the Kafka cluster itself does not run in a Docker container, so this simplifies communication with Kafka in my case.
Does anybody have an idea or a solution on how to get ksqlDB connected to a Kafka Connect cluster using only docker-compose?

What I need to achieve is fault tolerance.
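OK, so what you need is more than one Kafka Connect worker within a single Kafka Connect group. This is what you've got, since both workers are configured with the same storage topics and group.id 👍
So the question is how to get ksqlDB to connect to a cluster of Kafka Connect workers. Since Kafka Connect uses Kafka itself to hold its configuration, it doesn't matter which worker ksqlDB connects to. ksql.connect.url (and thus the KSQL_KSQL_CONNECT_URL environment variable in Docker) is the correct way to do this, but it's not clear from the docs whether you can specify multiple values. Note also that neither worker in your compose file listens on the default port 8083 (CONNECT_REST_PORT is 8082 and 8084), so leaving the variable unset cannot work; it has to point at one of them, e.g. http://localhost:8082.
If you can't specify multiple values, then I'm guessing you'd need to put a stateless load balancer in front of the workers and point ksqlDB at that.
Also, once the workers are no longer all on host networking on one machine, the advertised hostname has to be something the other workers can reach, such as the container name (kafka-connect-1 / kafka-connect-2) on a bridge network, not localhost.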
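Because every worker in the group shares the same config, offset and status topics, a load balancer in front of them only has to send each request to some healthy worker. A minimal sketch of the round-robin selection such a balancer might perform, assuming a caller-supplied health check (for example an HTTP GET on a worker's REST root "/", which returns version information). RoundRobinPicker is a hypothetical class, not part of ksqlDB or Kafka Connect:

```python
from typing import Callable, List


class RoundRobinPicker:
    """Cycle through Kafka Connect worker URLs, skipping unhealthy ones."""

    def __init__(self, urls: List[str], is_healthy: Callable[[str], bool]):
        if not urls:
            raise ValueError("need at least one worker URL")
        self._urls = list(urls)
        self._is_healthy = is_healthy  # assumed probe, e.g. GET <url>/ returns 200
        self._next = 0

    def pick(self) -> str:
        # Try each worker at most once per call, starting after the previous pick.
        for _ in range(len(self._urls)):
            url = self._urls[self._next]
            self._next = (self._next + 1) % len(self._urls)
            if self._is_healthy(url):
                return url
        raise RuntimeError("no healthy Kafka Connect worker available")
```

With the two workers from the question, RoundRobinPicker(["http://localhost:8082", "http://localhost:8084"], check).pick() alternates between them, and keeps returning the surviving worker while the other one is down.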

Related

ksqldb not failing over to standby schema registry

I am trying to test a failover scenario for the Kafka Schema Registry.
I spun up two Schema Registry Docker containers (primary and standby), and I have a ksqlDB server running in a Docker container pointing to the primary Schema Registry. A source Kafka connector streams data from the database to Kafka topics. The ksqlDB server is able to validate the schema of the Kafka messages using the primary Schema Registry. Now I shut down the primary Schema Registry. The ksqlDB server does not fail over to the standby Schema Registry to validate the schema, so it stops receiving data from the Kafka topics.
How should the ksqlDB server know which standby Schema Registry to connect to when the primary is down?
Below is the docker-compose.yml file that I used:
schema-registry:
  image: confluentinc/cp-schema-registry:${CP_VERSION}
  depends_on:
    - zookeeper
    - kafka
  ports:
    - "8081:8081"
  container_name: schema-registry
  environment:
    SCHEMA_REGISTRY_HOST_NAME: schema-registry
    SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092
    SCHEMA_REGISTRY_ACCESS_CONTROL_ALLOW_ORIGIN: '*'
    SCHEMA_REGISTRY_ACCESS_CONTROL_ALLOW_METHODS: 'GET,POST,PUT,OPTIONS'
    SCHEMA_REGISTRY_LEADER_ELIGIBILITY: "true"
    SCHEMA_REGISTRY_GROUP_ID: "schema-registry-group"
    SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
schema-registry-2:
  image: confluentinc/cp-schema-registry:${CP_VERSION}
  depends_on:
    - kafka
    - schema-registry
  ports:
    - "8082:8082"
  container_name: schema-registry-2
  environment:
    SCHEMA_REGISTRY_HOST_NAME: schema-registry-2
    SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092
    SCHEMA_REGISTRY_ACCESS_CONTROL_ALLOW_ORIGIN: '*'
    SCHEMA_REGISTRY_ACCESS_CONTROL_ALLOW_METHODS: 'GET,POST,PUT,OPTIONS'
    SCHEMA_REGISTRY_LEADER_ELIGIBILITY: "true"
    SCHEMA_REGISTRY_GROUP_ID: "schema-registry-group"
    SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8082
primary-ksqldb-server:
  image: ${KSQL_IMAGE_BASE}confluentinc/ksqldb-server:${KSQL_VERSION}
  hostname: primary-ksqldb-server
  container_name: primary-ksqldb-server
  depends_on:
    - kafka
    - schema-registry
  ports:
    - "8088:8088"
  environment:
    KSQL_CONFIG_DIR: "/etc/ksql"
    KSQL_LISTENERS: http://0.0.0.0:8088
    KSQL_BOOTSTRAP_SERVERS: kafka:9092
    KSQL_KSQL_ADVERTISED_LISTENER: http://localhost:8088
    KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081
    KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
    KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
    KSQL_KSQL_EXTENSION_DIR: "/usr/ksqldb/ext/"
    KSQL_KSQL_SERVICE_ID: "nrt_"
    KSQL_KSQL_STREAMS_NUM_STANDBY_REPLICAS: 1
    KSQL_KSQL_QUERY_PULL_ENABLE_STANDBY_READS: "true"
    KSQL_KSQL_HEARTBEAT_ENABLE: "true"
    KSQL_KSQL_LAG_REPORTING_ENABLE: "true"
    KSQL_KSQL_QUERY_PULL_MAX_ALLOWED_OFFSET_LAG: 100
    KSQL_LOG4J_APPENDER_KAFKA_APPENDER: "org.apache.kafka.log4jappender.KafkaLog4jAppender"
    KSQL_LOG4J_APPENDER_KAFKA_APPENDER_LAYOUT: "io.confluent.common.logging.log4j.StructuredJsonLayout"
    KSQL_LOG4J_APPENDER_KAFKA_APPENDER_BROKERLIST: localhost:9092
    KSQL_LOG4J_APPENDER_KAFKA_APPENDER_TOPIC: KSQL_LOG
    KSQL_LOG4J_LOGGER_IO_CONFLUENT_KSQL: INFO,kafka_appender
    KSQL_KSQL_QUERY_PULL_METRICS_ENABLED: "true"
    KSQL_JMX_OPTS: >
      -Djava.rmi.server.hostname=localhost
      -Dcom.sun.management.jmxremote
      -Dcom.sun.management.jmxremote.port=1099
      -Dcom.sun.management.jmxremote.authenticate=false
      -Dcom.sun.management.jmxremote.ssl=false
      -Dcom.sun.management.jmxremote.rmi.port=1099
When I stop the primary Schema Registry, ksqlDB is supposed to connect to the standby Schema Registry.
How would it know the other is available if you don't provide it?
KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081,http://schema-registry-2:8082
In other words, you shut down the schema-registry container, so it simply does not respond. It will not forward requests or tell clients to talk to another server. So you need to either provide a list of URLs, or set up an external reverse proxy that round-robins the requests to the active instance.
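With a comma-separated list, a client has somewhere else to go when one instance stops answering. A minimal sketch of that idea using only Python's standard library, assuming the list format shown above and plain HTTP GETs; get_with_failover is hypothetical and is not how the Schema Registry client is actually implemented:

```python
import urllib.error
import urllib.request


def get_with_failover(url_list: str, path: str, timeout: float = 2.0) -> bytes:
    """GET path from the first base URL in a comma-separated list that answers."""
    failures = []
    for base in (u.strip() for u in url_list.split(",")):
        if not base:
            continue
        try:
            with urllib.request.urlopen(base.rstrip("/") + path, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if 400 <= exc.code < 500:
                raise  # a client error would not be fixed by asking another instance
            failures.append(f"{base}: HTTP {exc.code}")
        except (urllib.error.URLError, OSError) as exc:
            # Connection refused, DNS failure or timeout: try the next instance.
            failures.append(f"{base}: {exc}")
    raise ConnectionError("no instance answered: " + "; ".join(failures))
```

For example, get_with_failover("http://schema-registry:8081,http://schema-registry-2:8082", "/subjects") would still return the subject list after the first container is stopped, provided the second one is up.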

Failed to collect cluster Default info java.lang.IllegalStateException: Error while creating AdminClient for Cluster Default

I would like to use network_mode: bridge for Kafka so that I can reach it through localhost:9092 from another service.
I'm trying to use provectus/kafka-ui, but when I open the Consumers menu I get the following error.
My docker-compose.yml file:
kafka-ui:
  container_name: kafka-ui
  image: provectuslabs/kafka-ui:latest
  ports:
    - 8080:8080
  depends_on:
    - kafka
  environment:
    KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092
    KAFKA_CLUSTERS_0_JMXPORT: 9997
kafka:
  image: johnnypark/kafka-zookeeper
  ports:
    - "2181:2181"
    - "9092:9092"
  network_mode: bridge
  environment:
    ADVERTISED_HOST: 127.0.0.1
    NUM_PARTITIONS: 1
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
Log error:
2022-01-13 09:16:50,014 ERROR [parallel-5] c.p.k.u.s.MetricsService: Failed to collect cluster Default info
java.lang.IllegalStateException: Error while creating AdminClient for Cluster Default
I was using the johnnypark/kafka-zookeeper image for both Kafka and ZooKeeper. I solved the problem by using two separate images, as in the example below:
zookeeper1:
  image: confluentinc/cp-zookeeper:5.2.4
  environment:
    ZOOKEEPER_CLIENT_PORT: 2181
    ZOOKEEPER_TICK_TIME: 2000
kafka1:
  image: confluentinc/cp-kafka:5.3.1
  depends_on:
    - zookeeper1
  ports:
    - 9093:9093
    - 9998:9998
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: zookeeper1:2181
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:29092,PLAINTEXT_HOST://localhost:9093
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    JMX_PORT: 9998
    KAFKA_JMX_OPTS: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka1 -Dcom.sun.management.jmxremote.rmi.port=9998
being able to reach kafka through localhost:9092 from another service
You can't use localhost to reach Kafka since that would be the Kafka UI container itself.
Changing ADVERTISED_HOST to kafka and using kafka:9092 from other containers is correct for a bridge network. However, this has the side effect of preventing any access to Kafka from outside the Docker network, such as from clients running directly on the host machine.
Internal and external clients can be configured separately (see bitnami/bitnami-docker-kafka).
Here's an example using Bitnami's Kafka image. It lets clients on the host connect on port 9093, while kafka-ui connects on the default port 9092.
version: "3"
services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    ports:
      - '2181:2181'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: 'bitnami/kafka:latest'
    ports:
      - '9092:9092'
      - '9093:9093'
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CLIENT:PLAINTEXT,EXTERNAL:PLAINTEXT
      - KAFKA_CFG_LISTENERS=CLIENT://:9092,EXTERNAL://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=CLIENT://kafka:9092,EXTERNAL://localhost:9093
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=CLIENT
      - ALLOW_PLAINTEXT_LISTENER=yes
    depends_on:
      - zookeeper
  kafka-ui:
    image: provectuslabs/kafka-ui
    container_name: kafka-ui
    ports:
      - "8081:8081"
    restart: always
    environment:
      - KAFKA_CLUSTERS_0_NAME=local
      - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=kafka:9092
      - SERVER_PORT=8081
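Why two listeners help: a broker answers a client's metadata request with the advertised address of the listener the client connected through, and the client uses that address from then on. Containers on the Docker network connect to kafka:9092 (CLIENT) and are told kafka:9092; host processes connect to localhost:9093 (EXTERNAL) and are told localhost:9093. A minimal sketch of just that lookup, using the KAFKA_CFG_ADVERTISED_LISTENERS value above; parse_listeners and advertised_address are hypothetical helpers, not Kafka APIs:

```python
from typing import Dict, Tuple


def parse_listeners(spec: str) -> Dict[str, Tuple[str, int]]:
    """Parse 'NAME://host:port,...' into {NAME: (host, port)}; host may be empty."""
    listeners: Dict[str, Tuple[str, int]] = {}
    for entry in spec.split(","):
        name, sep, address = entry.strip().partition("://")
        if not sep:
            raise ValueError(f"missing '://' in listener {entry!r}")
        host, _, port = address.rpartition(":")
        listeners[name] = (host, int(port))
    return listeners


def advertised_address(advertised: str, listener_name: str) -> Tuple[str, int]:
    """Address handed back to a client that connected via listener_name."""
    return parse_listeners(advertised)[listener_name]


if __name__ == "__main__":
    adv = "CLIENT://kafka:9092,EXTERNAL://localhost:9093"
    print(advertised_address(adv, "CLIENT"))    # ('kafka', 9092): other containers
    print(advertised_address(adv, "EXTERNAL"))  # ('localhost', 9093): clients on the host
```

This is also why a single listener advertised as kafka:9092 breaks host clients: the name kafka does not resolve outside the Docker network.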

Kafka connector "Unable to connect to the server" - dockerized kafka-connect worker that connects to confluent cloud

I'm following an example similar to the one in this blog post:
https://rmoff.net/2019/11/12/running-dockerised-kafka-connect-worker-on-gcp/
except that I'm running the Kafka Connect worker locally rather than on GCP.
Everything looks fine: I run docker-compose up and Kafka Connect starts, but when I try to create an instance of the source connector via curl I get the following ambiguous message (note: nothing at all is written to the Kafka Connect logs):
{"error_code":400,"message":"Connector configuration is invalid and contains the following 1 error(s):\nUnable to connect to the server.\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}
I know I can connect to Confluent Cloud because I can see that these topics were created:
docker-connect-configs
docker-connect-offsets
docker-connect-status
My docker-compose.yml looks like this:
---
version: '2'
services:
  kafka-connect-01:
    image: confluentinc/cp-kafka-connect:5.4.0
    container_name: kafka-connect-01
    restart: always
    depends_on:
      # - zookeeper
      # - kafka
      - schema-registry
    ports:
      - 8083:8083
    environment:
      CONNECT_LOG4J_APPENDER_STDOUT_LAYOUT_CONVERSIONPATTERN: "[%d] %p %X{connector.context}%m (%c:%L)%n"
      CONNECT_BOOTSTRAP_SERVERS: "my-server-name.confluent.cloud:9092"
      CONNECT_REST_PORT: 8083
      CONNECT_REST_ADVERTISED_HOST_NAME: "kafka-connect-01"
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      #CONNECT_KEY_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: 'http://my-server-name.confluent.cloud:8081'
      #CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: 'http://my-server-name.confluent.cloud:8081'
      CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_LOG4J_ROOT_LOGLEVEL: "INFO"
      CONNECT_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR"
      CONNECT_REPLICATION_FACTOR: "3"
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "3"
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "3"
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "3"
      CONNECT_PLUGIN_PATH: '/usr/share/java'
      CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      #ENV VARS FOR CCLOUD CONNECTION
      CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
      CONNECT_SASL_MECHANISM: PLAIN
      CONNECT_SECURITY_PROTOCOL: SASL_SSL
      CONNECT_SASL_JAAS_CONFIG: "${SASL_JAAS_CONFIG}"
      CONNECT_CONSUMER_SECURITY_PROTOCOL: SASL_SSL
      CONNECT_CONSUMER_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: https
      CONNECT_CONSUMER_SASL_MECHANISM: PLAIN
      CONNECT_CONSUMER_SASL_JAAS_CONFIG: "${SASL_JAAS_CONFIG}"
      CONNECT_PRODUCER_SECURITY_PROTOCOL: SASL_SSL
      CONNECT_PRODUCER_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: https
      CONNECT_PRODUCER_SASL_MECHANISM: PLAIN
      CONNECT_PRODUCER_SASL_JAAS_CONFIG: "${SASL_JAAS_CONFIG}"
    volumes:
      - db-leach:/db-leach/
      - $PWD/connectors:/usr/share/java/kafka-connect-jdbc/jars/
    command:
      - /bin/bash
      - -c
I have Dockerized Mongo instances running and I want to create a Mongo source connector. This is my curl request:
curl -X PUT http://localhost:8083/connectors/my-mongo-source-connector/config -H "Content-Type: application/json" -d '{
  "tasks.max":"1",
  "connector.class":"com.mongodb.kafka.connect.MongoSourceConnector",
  "connection.uri":"mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
  "topic.prefix":"topic.prefix",
  "topic.suffix":"mySuffix",
  "database":"myMongoDB",
  "collection":"myMongoCollection",
  "copy.existing": "true",
  "output.format.key": "json",
  "output.format.value": "json",
  "change.stream.full.document": "updateLookup",
  "publish.full.document.only": "false",
  "confluent.topic.bootstrap.servers" : "'${CCLOUD_BROKER_HOST}':9092",
  "confluent.topic.sasl.jaas.config" : "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"'${CCLOUD_API_KEY}'\" password=\"'${CCLOUD_API_SECRET}'\";",
  "confluent.topic.security.protocol": "SASL_SSL",
  "confluent.topic.ssl.endpoint.identification.algorithm": "https",
  "confluent.topic.sasl.mechanism": "PLAIN"
}';
What am I missing?
I managed to get it working; this is a correct configuration.
The message "Unable to connect to the server" was caused by an incorrectly deployed Mongo instance, so it was not related to Kafka Connect or Confluent Cloud.
I'm leaving this question as an example in case somebody struggles with this in the future. It took me a while to figure out how to configure docker-compose for a Kafka Connect worker that connects to Confluent Cloud.
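The error text points at the validate endpoint, which in the Kafka Connect REST API is PUT /connector-plugins/{connectorType}/config/validate. Its response lists every config field with an "errors" array, so filtering for non-empty arrays shows which setting produced a message like "Unable to connect to the server". A minimal sketch, assuming that response shape (configs[].value.name and configs[].value.errors); validate_config and field_errors are hypothetical helpers:

```python
import json
import urllib.request
from typing import Dict, List


def validate_config(connect_url: str, connector_type: str, config: Dict[str, str]) -> dict:
    """PUT a connector config to the validate endpoint and return the parsed JSON.

    The config map must include "connector.class", just as for creating a connector.
    """
    request = urllib.request.Request(
        f"{connect_url.rstrip('/')}/connector-plugins/{connector_type}/config/validate",
        data=json.dumps(config).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def field_errors(validation: dict) -> Dict[str, List[str]]:
    """Map each config field name to its non-empty list of error messages."""
    errors: Dict[str, List[str]] = {}
    for entry in validation.get("configs", []):
        value = entry.get("value", {})
        if value.get("errors"):
            errors[value.get("name", "<unknown>")] = list(value["errors"])
    return errors
```

For example, field_errors(validate_config("http://localhost:8083", "MongoSourceConnector", config)), with config holding the same key/value pairs as the curl request, should name the field(s) whose check failed.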

How do I configure kafka-connect w/ "securityMechanism=9, encryptionAlgorithm=2" for a db2 database connection in my docker-compose file?

QUESTION:
How do I configure "securityMechanism=9, encryptionAlgorithm=2" for a db2 database connection in my docker-compose file?
NOTE: When running my local Kafka installation (kafka_2.13-2.6.0) to connect to a DB2 database on the network, I only had to modify the bin/connect-standalone.sh file, changing the existing "EXTRA_ARGS=" line like this:
(...)
EXTRA_ARGS=${EXTRA_ARGS-'-name connectStandalone -Ddb2.jcc.securityMechanism=9 -Ddb2.jcc.encryptionAlgorithm=2'}
(...)
and it worked fine.
However, when I tried the same idea with a containerized Kafka broker service (docker-compose.yml), mounting a volume with the modified connect-standalone file to replace /usr/bin/connect-standalone in the container, it did not work.
I did verify that the file in the container was changed.
I receive this exception when I attempt to use a Kafka JDBC source connector to connect to the database:
Caused by: com.ibm.db2.jcc.am.SqlInvalidAuthorizationSpecException: [jcc][t4][201][11237][4.25.13] Connection authorization failure occurred.
Reason: Security mechanism not supported. ERRORCODE=-4214, SQLSTATE=28000
So, again, how do I configure the securityMechanism/encryptionAlgorithm setting in a docker-compose.yml?
Thanks for any help.
-sairn
Here is my docker-compose.yml. You can see that I tried mounting a volume with the modified connect-standalone file in both the broker (kafka) service and the kafka-connect service; neither achieved the desired effect.
version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.0
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-enterprise-kafka:6.0.0
    container_name: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://kafka:9092
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 100
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: kafka:29092
      CONFLUENT_METRICS_REPORTER_ZOOKEEPER_CONNECT: zookeeper:2181
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'true'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'
      JVM_OPTS: "-Ddb2.jcc.securityMechanism=9 -Ddb2.jcc.encryptionAlgorithm=2"
    volumes:
      - ./connect-standalone:/usr/bin/connect-standalone
  schema-registry:
    image: confluentinc/cp-schema-registry:6.0.0
    container_name: schema-registry
    hostname: schema-registry
    depends_on:
      - zookeeper
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
      SCHEMA_REGISTRY_LISTENERS: http://schema-registry:8081
  kafka-connect:
    image: confluentinc/cp-kafka-connect:6.0.0
    container_name: kafka-connect
    hostname: kafka-connect
    depends_on:
      - kafka
      - schema-registry
    ports:
      - "8083:8083"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: "kafka:29092"
      CONNECT_REST_ADVERTISED_HOST_NAME: "kafka-connect"
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: kafka-connect
      CONNECT_CONFIG_STORAGE_TOPIC: kafka-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: kafka-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: kafka-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      CONNECT_PLUGIN_PATH: "/usr/share/java,/usr/share/confluent-hub-components"
      CONNECT_LOG4J_LOGGERS: org.apache.zookeeper=ERROR,org.I0Itec.zkclient=ERROR,org.reflections=ERROR
      JVM_OPTS: "-Ddb2.jcc.securityMechanism=9 -Ddb2.jcc.encryptionAlgorithm=2"
    volumes:
      - ./kafka-connect-jdbc-10.0.1.jar:/usr/share/java/kafka-connect-jdbc/kafka-connect-jdbc-10.0.1.jar
      - ./db2jcc-db2jcc4.jar:/usr/share/java/kafka-connect-jdbc/db2jcc-db2jcc4.jar
      - ./connect-standalone:/usr/bin/connect-standalone
FWIW, the connector looks similar to this:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "CONNECTOR01",
  "config": {
    "connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url":"jdbc:db2://THEDBURL:50000/XXXXX",
    "connection.user":"myuserid",
    "connection.password":"mypassword",
    "poll.interval.ms":"15000",
    "table.whitelist":"YYYYY.TABLEA",
    "topic.prefix":"tbl-",
    "mode":"timestamp",
    "timestamp.initial":"-1",
    "timestamp.column.name":"TIME_UPD"
  }
}'
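Try using KAFKA_OPTS instead of JVM_OPTS. The Kafka start scripts (kafka-run-class) pass KAFKA_OPTS to the JVM but do not read JVM_OPTS, and the setting belongs on the kafka-connect service, since that is the JVM that loads the DB2 driver. Mounting a modified connect-standalone has no effect either, because the cp-kafka-connect image starts the worker in distributed mode rather than through connect-standalone.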
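An alternative we have not tested with this setup: the IBM JCC driver also accepts driver properties appended to the JDBC URL after the database name, in the form ":property=value;property=value;". That would carry securityMechanism and encryptionAlgorithm in the connector's connection.url instead of in JVM flags. A small sketch that builds such a URL; db2_jdbc_url is a hypothetical helper, and the host and database are the placeholders from the question:

```python
from typing import Mapping


def db2_jdbc_url(host: str, port: int, database: str, props: Mapping[str, str]) -> str:
    """Build a DB2 JCC URL with driver properties appended after the database name."""
    # Assumed JCC syntax: jdbc:db2://host:port/database:prop1=value1;prop2=value2;
    base = f"jdbc:db2://{host}:{port}/{database}"
    suffix = "".join(f"{key}={value};" for key, value in props.items())
    return f"{base}:{suffix}" if suffix else base


if __name__ == "__main__":
    print(db2_jdbc_url("THEDBURL", 50000, "XXXXX",
                       {"securityMechanism": "9", "encryptionAlgorithm": "2"}))
```

Note the property names here are the per-connection ones (securityMechanism), not the global system-property form (db2.jcc.securityMechanism) used with -D.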

Kafka connect is sending a malformed json

I'm trying to build a proof of concept using Kafka Connect with a RabbitMQ connector. Basically, I have two simple Spring Boot applications: a RabbitMQ producer and a Kafka consumer. The consumer cannot handle the messages from the connector because the connector is somehow transforming my JSON message: RabbitMQ sends {"transaction": "PAYMENT", "amount": "$125.0"} and Kafka Connect prints X{"transaction": "PAYMENT", "amount": "$125.0"}. Note the X at the beginning. If I add a field, say "foo": "bar", then that letter becomes a t or something else.
Dockerfile (connector):
FROM confluentinc/cp-kafka-connect-base:5.3.2
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-rabbitmq:latest
Build the image with docker build . -t rabbit-connector, so that you can reference it in the docker-compose file as rabbit-connector.
docker-compose.yml:
version: '2'
networks:
  kafka-connect-network:
    driver: bridge
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.3.2
    networks:
      - kafka-connect-network
    ports:
      - '31000:31000'
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      KAFKA_JMX_HOSTNAME: "localhost"
      KAFKA_JMX_PORT: 31000
  kafka:
    image: confluentinc/cp-enterprise-kafka:5.3.2
    networks:
      - kafka-connect-network
    ports:
      - '9092:9092'
      - '31001:31001'
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 100
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: kafka:29092
      CONFLUENT_METRICS_REPORTER_ZOOKEEPER_CONNECT: zookeeper:2181
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'false'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'
      KAFKA_JMX_HOSTNAME: "localhost"
      KAFKA_JMX_PORT: 31001
  schema-registry:
    image: confluentinc/cp-schema-registry:5.3.2
    depends_on:
      - zookeeper
      - kafka
    networks:
      - kafka-connect-network
    ports:
      - '8081:8081'
      - '31002:31002'
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: zookeeper:2181
      SCHEMA_REGISTRY_JMX_HOSTNAME: "localhost"
      SCHEMA_REGISTRY_JMX_PORT: 31002
  rabbitmq:
    image: rabbitmq
    environment:
      RABBITMQ_DEFAULT_USER: guest
      RABBITMQ_DEFAULT_PASS: guest
      RABBITMQ_DEFAULT_VHOST: "/"
    networks:
      - kafka-connect-network
    ports:
      - '15672:15672'
      - '5672:5672'
  kafka-connect:
    image: rabbit-connector
    networks:
      - kafka-connect-network
    ports:
      - '8083:8083'
      - '31004:31004'
    environment:
      CONNECT_BOOTSTRAP_SERVERS: "kafka:29092"
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: 'http://schema-registry:8081'
      CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
      CONNECT_REST_ADVERTISED_HOST_NAME: "kafka-connect"
      CONNECT_LOG4J_ROOT_LOGLEVEL: "ERROR"
      CONNECT_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR"
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components
      KAFKA_JMX_HOSTNAME: "localhost"
      KAFKA_JMX_PORT: 31004
    depends_on:
      - zookeeper
      - kafka
      - schema-registry
      - rabbitmq
  rest-proxy:
    image: confluentinc/cp-kafka-rest:5.3.2
    depends_on:
      - zookeeper
      - kafka
      - schema-registry
    networks:
      - kafka-connect-network
    ports:
      - '8082:8082'
      - '31005:31005'
    environment:
      KAFKA_REST_HOST_NAME: rest-proxy
      KAFKA_REST_BOOTSTRAP_SERVERS: 'kafka:29092'
      KAFKA_REST_LISTENERS: "http://0.0.0.0:8082"
      KAFKA_REST_SCHEMA_REGISTRY_URL: 'http://schema-registry:8081'
      KAFKAREST_JMX_HOSTNAME: "localhost"
      KAFKAREST_JMX_PORT: 31005
schema.avsc:
{
  "type": "record",
  "name": "CustomMessage",
  "namespace": "com.poc.model",
  "fields": [
    {
      "name": "transaction",
      "type": "string"
    },
    {
      "name": "amount",
      "type": "string"
    }
  ]
}
So here I am using a StringConverter for my key (which I don't really care about) and the AvroConverter for the value. Maybe I am missing something, or I'm misconfiguring my Kafka Connect worker.
My connector configuration is (connector-config.json):
{
  "name" : "rabbit_to_kafka_poc",
  "config" : {
    "connector.class" : "io.confluent.connect.rabbitmq.RabbitMQSourceConnector",
    "tasks.max" : "1",
    "key.converter":"org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "kafka.topic" : "spectrum-message",
    "rabbitmq.queue" : "spectrum-queue",
    "rabbitmq.username": "guest",
    "rabbitmq.password": "guest",
    "rabbitmq.host": "rabbitmq",
    "rabbitmq.port": "5672",
    "rabbitmq.virtual.host": "/"
  }
}
To register my connector I run curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @connector-config.json.
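With curl, -d @connector-config.json sends the contents of the file as the body of the POST to /connectors. The same registration can be sketched with Python's standard library, assuming the Connect REST port 8083 from the compose file; register_connector is a hypothetical helper:

```python
import json
import urllib.request


def register_connector(connect_url: str, config_path: str) -> int:
    """POST a {"name": ..., "config": {...}} file to Kafka Connect; return the HTTP status."""
    with open(config_path, "rb") as f:
        payload = f.read()
    json.loads(payload)  # fail fast on malformed JSON instead of sending it
    request = urllib.request.Request(
        connect_url.rstrip("/") + "/connectors",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling register_connector("http://localhost:8083", "connector-config.json") should return 201 when the connector is created; an error status (for example 409 if the name already exists) surfaces as urllib.error.HTTPError.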
Once I configure everything, I run the following command to print out my messages:
kafka-avro-console-consumer --bootstrap-server localhost:9092 \
  --topic spectrum-message \
  --from-beginning
And the JSON starts with a letter, so my question is: why is this happening? I think something is encoding my message, but my RabbitMQ producer sends a plain JSON message. I confirmed this by testing with a RabbitMQ consumer and by debugging my application up to the point where the message is sent.
You need to use the ByteArrayConverter. The connector pulls plain bytes from RabbitMQ and does not try to coerce them to a schema. Even if you serialise them to Avro, the schema is just a single field of bytes:
$ curl -s -XGET localhost:8081/subjects/rabbit-test-avro-00-value/versions/1 | jq '.'
{
  "subject": "rabbit-test-avro-00-value",
  "version": 1,
  "id": 1,
  "schema": "\"bytes\""
}
If you want to write the data to a topic in Avro (which is a good idea) with a proper schema, use something like Kafka Streams or ksqlDB, applying a stream processor to the source topic that Kafka Connect writes to with the ByteArrayConverter.
For example in ksqlDB you would do:
-- Inspect the topic - ksqlDB recognises the format as JSON
ksql> PRINT 'rabbit-test-00' FROM BEGINNING;
Format:JSON
{"ROWTIME":1578477403591,"ROWKEY":"null","transaction":"PAYMENT","amount":"$125.0"}
{"ROWTIME":1578477598555,"ROWKEY":"null","transaction":"PAYMENT","amount":"$125.0"}
-- Declare the schema
CREATE STREAM rabbit (transaction VARCHAR,
                      amount VARCHAR)
       WITH (KAFKA_TOPIC='rabbit-test-00',
             VALUE_FORMAT='JSON');
-- Reserialise to Avro
CREATE STREAM TRANSACTIONS WITH (VALUE_FORMAT='AVRO',
                                 KAFKA_TOPIC='reserialised_data') AS
       SELECT *
       FROM rabbit
       EMIT CHANGES;
For more details, see this blog post that I've written.
You don't have JSON messages; you have Avro messages coming out of Kafka, because you are using the AvroConverter.
That letter is not actually a letter, but your terminal showing the UTF-8 representation of the first 5 bytes of the binary data. That commonly happens when using the regular console consumer rather than kafka-avro-console-consumer, which parses the Avro bytes out of the topic correctly.
If you want JSON throughout, use the JsonConverter instead.
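To see what ends up on the topic, here is a minimal sketch of the Confluent wire framing for a value whose Avro schema is the plain "bytes" schema shown earlier: a zero magic byte, the 4-byte big-endian schema ID, then the Avro encoding of bytes, which is a zigzag varint length followed by the raw data. The function names are hypothetical; this illustrates the format and is not the serializer's actual code. Note that the length prefix is itself a byte that can be printable: zigzag_varint(44) is b"X" (0x58) and zigzag_varint(58) is b"t" (0x74).

```python
import struct
from typing import Tuple


def zigzag_varint(n: int) -> bytes:
    """Encode an Avro long: zigzag-map the sign, then emit a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an Avro long starting at pos; return (value, next_pos)."""
    shift = 0
    raw = 0
    while True:
        b = buf[pos]
        pos += 1
        raw |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


def frame_bytes_value(schema_id: int, payload: bytes) -> bytes:
    """Magic byte 0 + 4-byte big-endian schema ID + Avro 'bytes' (length, data)."""
    return b"\x00" + struct.pack(">I", schema_id) + zigzag_varint(len(payload)) + payload


def unframe_bytes_value(message: bytes) -> Tuple[int, bytes]:
    """Inverse of frame_bytes_value; returns (schema_id, payload)."""
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not a Confluent-framed Avro message")
    (schema_id,) = struct.unpack(">I", message[1:5])
    length, pos = read_varint(message, 5)
    return schema_id, message[pos:pos + length]
```

Decoding a raw record value with unframe_bytes_value returns the original JSON bytes untouched, which is why writing them with the ByteArrayConverter and applying a schema downstream avoids the extra bytes altogether.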