I can't fix an issue that appears after submitting a Spark Structured Streaming job that reads from Kafka.
Code of the Spark job:
object KafkaStructuredStreaming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName(getClass.getName)
      .master("spark://spark-master:7077")
      .getOrCreate()

    val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092")
      .option("startingOffsets", "earliest")
      .option("subscribe", "tweet-upload-6")
      .option("enable.auto.commit", false)
      .option("group.id", "Structured-Streaming-Examples")
      .option("failOnDataLoss", false)
      .load()

    df.printSchema()

    val consoleOutput = df.writeStream
      .outputMode("append")
      .format("console")
      .start()

    consoleOutput.awaitTermination()
  }
}
One note: the Kafka and Spark nodes are in the same Docker network. Everything worked before with Spark Streaming (DStreams), but I switched to Structured Streaming because I had an issue fanning out a single input stream into multiple output streams.
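For context, the fan-out I'm after looks roughly like this (a sketch, not my real code; the Parquet sink, its paths, and option values are illustrative):

```scala
// Sketch: one Kafka input stream fanned out to two sinks.
// In Structured Streaming each writeStream.start() creates an
// independent query, so one DataFrame can feed several outputs.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka:9092")
  .option("subscribe", "tweet-upload-6")
  .load()

val toConsole = df.writeStream
  .outputMode("append")
  .format("console")
  .start()

val toParquet = df.writeStream
  .outputMode("append")
  .format("parquet")
  .option("path", "/tmp/tweets")                  // illustrative path
  .option("checkpointLocation", "/tmp/tweets-ck") // each query needs its own checkpoint
  .start()

// Block until any of the queries terminates.
spark.streams.awaitAnyTermination()
```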
I am now getting this error:
07721793-driver-0] Error connecting to node kafka:9092 (id: 1001 rack: null)
submit-spark-job | java.net.UnknownHostException: kafka
submit-spark-job | at java.net.InetAddress.getAllByName0(InetAddress.java:1282)
submit-spark-job | at java.net.InetAddress.getAllByName(InetAddress.java:1194)
submit-spark-job | at java.net.InetAddress.getAllByName(InetAddress.java:1128)
submit-spark-job | at org.apache.kafka.clients.DefaultHostResolver.resolve(DefaultHostResolver.java:27)
submit-spark-job | at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:111)
submit-spark-job | at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:512)
submit-spark-job | at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:466)
submit-spark-job | at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:172)
submit-spark-job | at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:985)
submit-spark-job | at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:311)
submit-spark-job | at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:498)
submit-spark-job | at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
submit-spark-job | at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:306)
submit-spark-job | at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1367)
submit-spark-job | 21/10/18 22:44:41 WARN NetworkClient: [Consumer clientId=consumer-spark-kafka-source-4d4a80ae-01a7-4679-8393-42aaa0e35e6a-307721793-driver-0-1, groupId=spark-kafka-source-4d4a80ae-01a7-4679-8393-42aaa0e35e6a-307721793-driver-0] Error connecting to node kafka:9092 (id: 1001 rack: null)
submit-spark-job | java.net.UnknownHostException: kafka
and also:
85949862d6d3-1705030665-driver-0] Group coordinator kafka:9092 (id: 2147482646 rack: null) is unavailable or invalid due to cause: null.isDisconnected: true. Rediscovery will be attempted.
submit-spark-job | 21/10/18 23:03:46 WARN NetworkClient: [Consumer clientId=consumer-spark-kafka-source-aafbb352-744d-438a-bd45-85949862d6d3-1705030665-driver-0-1, groupId=spark-kafka-source-aafbb352-744d-438a-bd45-85949862d6d3-1705030665-driver-0] Connection to node 1001 (kafka/172.18.0.4:9092) could not be established. Broker may not be available.
submit-spark-job | 21/10/18 23:03:47 WARN NetworkClient: [Consumer clientId=consumer-spark-kafka-source-aafbb352-744d-438a-bd45-85949862d6d3-1705030665-driver-0-1, groupId=spark-kafka-source-aafbb352-744d-438a-bd45-85949862d6d3-1705030665-driver-0] Connection to node 1001 (kafka/172.18.0.4:9092) could not be established. Broker may not be available.
submit-spark-job | 21/10/18 23:03:47 WARN NetworkClient: [Consumer clientId=consumer-spark-kafka-source-aafbb352-744d-438a-bd45-85949862d6d3-1705030665-driver-0-1, groupId=spark-kafka-source-aafbb352-744d-438a-bd45-85949862d6d3-1705030665-driver-0] Connection to node 1001 (kafka/172.18.0.4:9092) could not be established. Broker may not be available.
submit-spark-job | 21/10/18 23:03:48 WARN NetworkClient: [Consumer clientId=consumer-spark-kafka-source-aafbb352-744d-438a-bd45-85949862d6d3-1705030665-driver-0-1, groupId=spark-kafka-source-aafbb352-744d-438a-bd45-85949862d6d3-1705030665-driver-0] Connection to node 1001 (kafka/172.18.0.4:9092) could not be established. Broker may not be available.
submit-spark-job | 21/10/18 23:03:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (172.18.0.7 executor 0): org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
submit-spark-job | at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:823)
submit-spark-job | at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:665)
submit-spark-job | at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:613)
submit-spark-job | at org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumer.createConsumer(KafkaDataConsumer.scala:124)
submit-spark-job | at org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumer.<init>(KafkaDataConsumer.scala:61)
submit-spark-job | at org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$ObjectFactory.create(InternalKafkaConsumerPool.scala:206)
submit-spark-job | at org.apache.spark.sql.kafka010.consumer.InternalKafkaConsumerPool$ObjectFactory.create(InternalKafkaConsumerPool.scala:201)
submit-spark-job | at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactor
Docker Compose files:
version: '2'
services:
  zookeeper:
    image: zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: linuxkitpoc/kafka:latest
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_ADVERTISED_PORT: "9092"
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  tweet-producer:
    image: mkovacevic/tweet-producer-app:latest
    ports:
      - "8080:8080"
    tty: true
    depends_on:
      - kafka
(created in the default network: twitter-streaming_default)
and
version: '3'
services:
  spark-master:
    image: bde2020/spark-master:3.1.1-hadoop3.2
    container_name: spark-master
    hostname: spark-master
    ports:
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark
  spark-worker-1:
    image: bde2020/spark-worker:3.1.1-hadoop3.2
    container_name: spark-worker-1
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
  spark-worker-2:
    image: bde2020/spark-worker:3.1.1-hadoop3.2
    container_name: spark-worker-2
    depends_on:
      - spark-master
    ports:
      - "8082:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
  submit-spark:
    image: mkovacevic/spark-streaming-app:latest
    container_name: submit-spark-job
    depends_on:
      - spark-master
      - spark-worker-1
      - spark-worker-2
networks:
  default:
    external: true
    name: twitter-streaming_default
Any suggestions?
Each Compose file creates its own isolated default bridge network.
If you want to use two files, you have to explicitly attach the shared network to every service. Otherwise, put all the services in a single file.
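A sketch of what that per-service wiring could look like in the Spark compose file (service names taken from your files; adjust as needed):

```yaml
# Spark compose file: each service explicitly joins the network that
# the Kafka compose file created, so the hostname "kafka" resolves.
services:
  submit-spark:
    image: mkovacevic/spark-streaming-app:latest
    networks:
      - twitter-streaming_default
  # ...repeat the same networks: block for spark-master and both workers...

networks:
  twitter-streaming_default:
    external: true
```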
By the way, the linuxkitpoc/kafka image hasn't been updated in years. I suggest you use something else for Kafka; my personal recommendation would be the Bitnami images.
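For example, a minimal Bitnami-based replacement for the kafka service might look like this (environment variable names are from the bitnami/kafka documentation; verify them against the image version you pick):

```yaml
kafka:
  image: bitnami/kafka:latest
  ports:
    - "9092:9092"
  depends_on:
    - zookeeper
  environment:
    KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
    # Advertise the compose service name so other containers can resolve it.
    KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
    KAFKA_CFG_LISTENERS: PLAINTEXT://0.0.0.0:9092
    ALLOW_PLAINTEXT_LISTENER: "yes"
```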
Related
I have created a simple Python script that generates user_id, receipent_id, and amount, plus a Kafka producer and consumer. The Python code returns the data as JSON. Now I am trying to connect my data to Neo4j through Kafka, but I am unable to do it.
https://neo4j.com/docs/kafka/quickstart-connect/
I started with the documentation, but I get an error when I directly copy its docker-compose.yml:
---
version: '2'
services:
  neo4j:
    image: neo4j:4.0.3-enterprise
    hostname: neo4j
    container_name: neo4j
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_kafka_bootstrap_servers: broker:9093
      NEO4J_AUTH: neo4j/connect
      NEO4J_dbms_memory_heap_max__size: 8G
      NEO4J_ACCEPT_LICENSE_AGREEMENT: yes
  zookeeper:
    image: confluentinc/cp-zookeeper
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  broker:
    image: confluentinc/cp-enterprise-kafka
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    expose:
      - "9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9093,OUTSIDE://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: broker:9093
      # workaround: if we change to a custom name the schema_registry fails to start
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      CONFLUENT_METRICS_REPORTER_ZOOKEEPER_CONNECT: zookeeper:2181
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'true'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'
  schema_registry:
    image: confluentinc/cp-schema-registry
    hostname: schema_registry
    container_name: schema_registry
    depends_on:
      - zookeeper
      - broker
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema_registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
  connect:
    image: confluentinc/cp-kafka-connect
    hostname: connect
    container_name: connect
    depends_on:
      - zookeeper
      - broker
      - schema_registry
    ports:
      - "8083:8083"
    volumes:
      - ./plugins:/tmp/connect-plugins
    environment:
      CONNECT_BOOTSTRAP_SERVERS: 'broker:9093'
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: 'http://schema_registry:8081'
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: 'http://schema_registry:8081'
      CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      CONNECT_PLUGIN_PATH: /usr/share/java,/tmp/connect-plugins
      CONNECT_LOG4J_LOGGERS: org.apache.zookeeper=DEBUG,org.I0Itec.zkclient=DEBUG,org.reflections=ERROR
  control-center:
    image: confluentinc/cp-enterprise-control-center
    hostname: control-center
    container_name: control-center
    depends_on:
      - zookeeper
      - broker
      - schema_registry
      - connect
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'broker:9093'
      CONTROL_CENTER_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      CONTROL_CENTER_CONNECT_CLUSTER: 'connect:8083'
      CONTROL_CENTER_REPLICATION_FACTOR: 1
      CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
      CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
      CONFLUENT_METRICS_TOPIC_REPLICATION: 1
      PORT: 9021
Docker containers
I get the following error from the schema_registry container:
===> User
uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
===> Configuring ...
===> Running preflight checks ...
===> Check if Zookeeper is healthy ...
[2022-12-14 12:18:38,319] INFO Client environment:zookeeper.version=3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:host.name=schema_registry (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:java.version=11.0.16.1 (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:java.vendor=Azul Systems, Inc. (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:java.home=/usr/lib/jvm/zulu11-ca (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:java.class.path=/usr/share/java/cp-base-new/disk-usage-agent-7.3.0.jar:/usr/share/java/cp-base-new/reload4j-1.2.19.jar:/usr/share/java/cp-base-new/kafka-server-common-7.3.0-ccs.jar:/usr/share/java/cp-base-new/jopt-simple-5.0.4.jar:/usr/share/java/cp-base-new/scala-logging_2.13-3.9.4.jar:/usr/share/java/cp-base-new/scala-java8-compat_2.13-1.0.2.jar:/usr/share/java/cp-base-new/zookeeper-3.6.3.jar:/usr/share/java/cp-base-new/json-simple-1.1.1.jar:/usr/share/java/cp-base-new/metrics-core-2.2.0.jar:/usr/share/java/cp-base-new/audience-annotations-0.5.0.jar:/usr/share/java/cp-base-new/kafka-storage-api-7.3.0-ccs.jar:/usr/share/java/cp-base-new/kafka-clients-7.3.0-ccs.jar:/usr/share/java/cp-base-new/slf4j-reload4j-1.7.36.jar:/usr/share/java/cp-base-new/snappy-java-1.1.8.4.jar:/usr/share/java/cp-base-new/commons-cli-1.4.jar:/usr/share/java/cp-base-new/scala-collection-compat_2.13-2.6.0.jar:/usr/share/java/cp-base-new/jackson-core-2.13.2.jar:/usr/share/java/cp-base-new/jmx_prometheus_javaagent-0.14.0.jar:/usr/share/java/cp-base-new/kafka-raft-7.3.0-ccs.jar:/usr/share/java/cp-base-new/jackson-module-scala_2.13-2.13.2.jar:/usr/share/java/cp-base-new/re2j-1.6.jar:/usr/share/java/cp-base-new/jose4j-0.7.9.jar:/usr/share/java/cp-base-new/snakeyaml-1.30.jar:/usr/share/java/cp-base-new/logredactor-metrics-1.0.10.jar:/usr/share/java/cp-base-new/logredactor-1.0.10.jar:/usr/share/java/cp-base-new/jackson-dataformat-yaml-2.13.2.jar:/usr/share/java/cp-base-new/kafka_2.13-7.3.0-ccs.jar:/usr/share/java/cp-base-new/kafka-storage-7.3.0-ccs.jar:/usr/share/java/cp-base-new/utility-belt-7.3.0.jar:/usr/share/java/cp-base-new/jackson-annotations-2.13.2.jar:/usr/share/java/cp-base-new/minimal-json-0.9.5.jar:/usr/share/java/cp-base-new/lz4-java-1.8.0.jar:/usr/share/java/cp-base-new/zookeeper-jute-3.6.3.jar:/usr/share/java/cp-base-new/zstd-jni-1.5.2-1.jar:/usr/share/java/cp-base-new/jackson-dataformat-csv-2.13.2.jar:/usr/share/java/cp-base-new/slf4j-
api-1.7.36.jar:/usr/share/java/cp-base-new/jackson-databind-2.13.2.2.jar:/usr/share/java/cp-base-new/jolokia-jvm-1.7.1.jar:/usr/share/java/cp-base-new/paranamer-2.8.jar:/usr/share/java/cp-base-new/gson-2.9.0.jar:/usr/share/java/cp-base-new/metrics-core-4.1.12.1.jar:/usr/share/java/cp-base-new/kafka-metadata-7.3.0-ccs.jar:/usr/share/java/cp-base-new/jackson-datatype-jdk8-2.13.2.jar:/usr/share/java/cp-base-new/common-utils-7.3.0.jar:/usr/share/java/cp-base-new/scala-reflect-2.13.5.jar:/usr/share/java/cp-base-new/scala-library-2.13.5.jar:/usr/share/java/cp-base-new/argparse4j-0.7.0.jar:/usr/share/java/cp-base-new/jolokia-core-1.7.1.jar (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:os.version=5.10.104-linuxkit (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:user.name=appuser (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:user.home=/home/appuser (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:user.dir=/home/appuser (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:os.memory.free=51MB (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:os.memory.max=952MB (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,320] INFO Client environment:os.memory.total=60MB (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,326] INFO Initiating client connection, connectString=zookeeper:2181 sessionTimeout=40000 watcher=io.confluent.admin.utils.ZookeeperConnectionWatcher#3c0a50da (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,332] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2022-12-14 12:18:38,341] INFO jute.maxbuffer value is 1048575 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2022-12-14 12:18:38,351] INFO zookeeper.request.timeout value is 0. feature enabled=false (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:38,372] INFO Opening socket connection to server zookeeper/172.18.0.2:2181. (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:38,375] INFO SASL config status: Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:38,388] INFO Socket connection established, initiating session, client: /172.18.0.5:47172, server: zookeeper/172.18.0.2:2181 (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:38,542] INFO Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x10000250f890000, negotiated timeout = 40000 (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:38,587] WARN An exception was thrown while closing send thread for session 0x10000250f890000. (org.apache.zookeeper.ClientCnxn)
EndOfStreamException: Unable to read additional data from server sessionid 0x10000250f890000, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290)
[2022-12-14 12:18:38,699] INFO Session: 0x10000250f890000 closed (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:38,699] INFO EventThread shut down for session: 0x10000250f890000 (org.apache.zookeeper.ClientCnxn)
Using log4j config /etc/schema-registry/log4j.properties
===> Check if Kafka is healthy ...
[2022-12-14 12:18:39,567] INFO Client environment:zookeeper.version=3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,567] INFO Client environment:host.name=schema_registry (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:java.version=11.0.16.1 (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:java.vendor=Azul Systems, Inc. (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:java.home=/usr/lib/jvm/zulu11-ca (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:java.class.path=/usr/share/java/cp-base-new/disk-usage-agent-7.3.0.jar:/usr/share/java/cp-base-new/reload4j-1.2.19.jar:/usr/share/java/cp-base-new/kafka-server-common-7.3.0-ccs.jar:/usr/share/java/cp-base-new/jopt-simple-5.0.4.jar:/usr/share/java/cp-base-new/scala-logging_2.13-3.9.4.jar:/usr/share/java/cp-base-new/scala-java8-compat_2.13-1.0.2.jar:/usr/share/java/cp-base-new/zookeeper-3.6.3.jar:/usr/share/java/cp-base-new/json-simple-1.1.1.jar:/usr/share/java/cp-base-new/metrics-core-2.2.0.jar:/usr/share/java/cp-base-new/audience-annotations-0.5.0.jar:/usr/share/java/cp-base-new/kafka-storage-api-7.3.0-ccs.jar:/usr/share/java/cp-base-new/kafka-clients-7.3.0-ccs.jar:/usr/share/java/cp-base-new/slf4j-reload4j-1.7.36.jar:/usr/share/java/cp-base-new/snappy-java-1.1.8.4.jar:/usr/share/java/cp-base-new/commons-cli-1.4.jar:/usr/share/java/cp-base-new/scala-collection-compat_2.13-2.6.0.jar:/usr/share/java/cp-base-new/jackson-core-2.13.2.jar:/usr/share/java/cp-base-new/jmx_prometheus_javaagent-0.14.0.jar:/usr/share/java/cp-base-new/kafka-raft-7.3.0-ccs.jar:/usr/share/java/cp-base-new/jackson-module-scala_2.13-2.13.2.jar:/usr/share/java/cp-base-new/re2j-1.6.jar:/usr/share/java/cp-base-new/jose4j-0.7.9.jar:/usr/share/java/cp-base-new/snakeyaml-1.30.jar:/usr/share/java/cp-base-new/logredactor-metrics-1.0.10.jar:/usr/share/java/cp-base-new/logredactor-1.0.10.jar:/usr/share/java/cp-base-new/jackson-dataformat-yaml-2.13.2.jar:/usr/share/java/cp-base-new/kafka_2.13-7.3.0-ccs.jar:/usr/share/java/cp-base-new/kafka-storage-7.3.0-ccs.jar:/usr/share/java/cp-base-new/utility-belt-7.3.0.jar:/usr/share/java/cp-base-new/jackson-annotations-2.13.2.jar:/usr/share/java/cp-base-new/minimal-json-0.9.5.jar:/usr/share/java/cp-base-new/lz4-java-1.8.0.jar:/usr/share/java/cp-base-new/zookeeper-jute-3.6.3.jar:/usr/share/java/cp-base-new/zstd-jni-1.5.2-1.jar:/usr/share/java/cp-base-new/jackson-dataformat-csv-2.13.2.jar:/usr/share/java/cp-base-new/slf4j-
api-1.7.36.jar:/usr/share/java/cp-base-new/jackson-databind-2.13.2.2.jar:/usr/share/java/cp-base-new/jolokia-jvm-1.7.1.jar:/usr/share/java/cp-base-new/paranamer-2.8.jar:/usr/share/java/cp-base-new/gson-2.9.0.jar:/usr/share/java/cp-base-new/metrics-core-4.1.12.1.jar:/usr/share/java/cp-base-new/kafka-metadata-7.3.0-ccs.jar:/usr/share/java/cp-base-new/jackson-datatype-jdk8-2.13.2.jar:/usr/share/java/cp-base-new/common-utils-7.3.0.jar:/usr/share/java/cp-base-new/scala-reflect-2.13.5.jar:/usr/share/java/cp-base-new/scala-library-2.13.5.jar:/usr/share/java/cp-base-new/argparse4j-0.7.0.jar:/usr/share/java/cp-base-new/jolokia-core-1.7.1.jar (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:os.version=5.10.104-linuxkit (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:user.name=appuser (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:user.home=/home/appuser (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,568] INFO Client environment:user.dir=/home/appuser (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,569] INFO Client environment:os.memory.free=50MB (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,569] INFO Client environment:os.memory.max=952MB (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,569] INFO Client environment:os.memory.total=60MB (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,574] INFO Initiating client connection, connectString=zookeeper:2181 sessionTimeout=40000 watcher=io.confluent.admin.utils.ZookeeperConnectionWatcher#221af3c0 (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,578] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2022-12-14 12:18:39,587] INFO jute.maxbuffer value is 1048575 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2022-12-14 12:18:39,597] INFO zookeeper.request.timeout value is 0. feature enabled=false (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,621] INFO Opening socket connection to server zookeeper/172.18.0.2:2181. (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,623] INFO SASL config status: Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,639] INFO Socket connection established, initiating session, client: /172.18.0.5:47176, server: zookeeper/172.18.0.2:2181 (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,658] INFO Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x10000250f890001, negotiated timeout = 40000 (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,678] WARN An exception was thrown while closing send thread for session 0x10000250f890001. (org.apache.zookeeper.ClientCnxn)
EndOfStreamException: Unable to read additional data from server sessionid 0x10000250f890001, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290)
[2022-12-14 12:18:39,785] INFO Session: 0x10000250f890001 closed (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,785] INFO EventThread shut down for session: 0x10000250f890001 (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,785] INFO Initiating client connection, connectString=zookeeper:2181 sessionTimeout=40000 watcher=io.confluent.admin.utils.ZookeeperConnectionWatcher#55a1c291 (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,786] INFO jute.maxbuffer value is 1048575 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2022-12-14 12:18:39,786] INFO zookeeper.request.timeout value is 0. feature enabled=false (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,787] INFO Opening socket connection to server zookeeper/172.18.0.2:2181. (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,787] INFO SASL config status: Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,788] INFO Socket connection established, initiating session, client: /172.18.0.5:47178, server: zookeeper/172.18.0.2:2181 (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,799] INFO Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x10000250f890002, negotiated timeout = 40000 (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:39,979] INFO Session: 0x10000250f890002 closed (org.apache.zookeeper.ZooKeeper)
[2022-12-14 12:18:39,979] INFO EventThread shut down for session: 0x10000250f890002 (org.apache.zookeeper.ClientCnxn)
[2022-12-14 12:18:40,122] INFO AdminClientConfig values:
bootstrap.servers = [broker:9093]
client.dns.lookup = use_all_dns_ips
client.id =
connections.max.idle.ms = 300000
default.api.timeout.ms = 60000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.connect.timeout.ms = null
sasl.login.read.timeout.ms = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.login.retry.backoff.max.ms = 10000
sasl.login.retry.backoff.ms = 100
sasl.mechanism = GSSAPI
sasl.oauthbearer.clock.skew.seconds = 30
sasl.oauthbearer.expected.audience = null
sasl.oauthbearer.expected.issuer = null
sasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000
sasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000
sasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100
sasl.oauthbearer.jwks.endpoint.url = null
sasl.oauthbearer.scope.claim.name = scope
sasl.oauthbearer.sub.claim.name = sub
sasl.oauthbearer.token.endpoint.url = null
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
(org.apache.kafka.clients.admin.AdminClientConfig)
[2022-12-14 12:18:40,312] INFO Kafka version: 7.3.0-ccs (org.apache.kafka.common.utils.AppInfoParser)
[2022-12-14 12:18:40,312] INFO Kafka commitId: b8341813ae2b0444 (org.apache.kafka.common.utils.AppInfoParser)
[2022-12-14 12:18:40,312] INFO Kafka startTimeMs: 1671020320310 (org.apache.kafka.common.utils.AppInfoParser)
Using log4j config /etc/schema-registry/log4j.properties
===> Launching ...
===> Launching schema-registry ...
[2022-12-14 12:18:41,910] INFO SchemaRegistryConfig values:
access.control.allow.headers =
access.control.allow.methods =
access.control.allow.origin =
access.control.skip.options = true
authentication.method = NONE
authentication.realm =
authentication.roles = [*]
authentication.skip.paths = []
avro.compatibility.level =
compression.enable = true
connector.connection.limit = 0
csrf.prevention.enable = false
csrf.prevention.token.endpoint = /csrf
csrf.prevention.token.expiration.minutes = 30
csrf.prevention.token.max.entries = 10000
debug = false
dos.filter.delay.ms = 100
dos.filter.enabled = false
dos.filter.insert.headers = true
dos.filter.ip.whitelist = []
dos.filter.managed.attr = false
dos.filter.max.idle.tracker.ms = 30000
dos.filter.max.requests.ms = 30000
dos.filter.max.requests.per.connection.per.sec = 25
dos.filter.max.requests.per.sec = 25
dos.filter.max.wait.ms = 50
dos.filter.throttle.ms = 30000
dos.filter.throttled.requests = 5
host.name = schema_registry
http2.enabled = true
idle.timeout.ms = 30000
inter.instance.headers.whitelist = []
inter.instance.protocol = http
kafkastore.bootstrap.servers = []
kafkastore.checkpoint.dir = /tmp
kafkastore.checkpoint.version = 0
kafkastore.connection.url = zookeeper:2181
kafkastore.group.id =
kafkastore.init.timeout.ms = 60000
kafkastore.sasl.kerberos.kinit.cmd = /usr/bin/kinit
kafkastore.sasl.kerberos.min.time.before.relogin = 60000
kafkastore.sasl.kerberos.service.name =
kafkastore.sasl.kerberos.ticket.renew.jitter = 0.05
kafkastore.sasl.kerberos.ticket.renew.window.factor = 0.8
kafkastore.sasl.mechanism = GSSAPI
kafkastore.security.protocol = PLAINTEXT
kafkastore.ssl.cipher.suites =
kafkastore.ssl.enabled.protocols = TLSv1.2,TLSv1.1,TLSv1
kafkastore.ssl.endpoint.identification.algorithm =
kafkastore.ssl.key.password = [hidden]
kafkastore.ssl.keymanager.algorithm = SunX509
kafkastore.ssl.keystore.location =
kafkastore.ssl.keystore.password = [hidden]
kafkastore.ssl.keystore.type = JKS
kafkastore.ssl.protocol = TLS
kafkastore.ssl.provider =
kafkastore.ssl.trustmanager.algorithm = PKIX
kafkastore.ssl.truststore.location =
kafkastore.ssl.truststore.password = [hidden]
kafkastore.ssl.truststore.type = JKS
kafkastore.timeout.ms = 500
kafkastore.topic = _schemas
kafkastore.topic.replication.factor = 3
kafkastore.topic.skip.validation = false
kafkastore.update.handlers = []
kafkastore.write.max.retries = 5
leader.eligibility = true
listener.protocol.map = []
listeners = []
master.eligibility = null
metric.reporters = []
metrics.jmx.prefix = kafka.schema.registry
metrics.num.samples = 2
metrics.sample.window.ms = 30000
metrics.tag.map = []
mode.mutability = true
nosniff.prevention.enable = false
port = 8081
proxy.protocol.enabled = false
reject.options.request = false
request.logger.name = io.confluent.rest-utils.requests
request.queue.capacity = 2147483647
request.queue.capacity.growby = 64
request.queue.capacity.init = 128
resource.extension.class = []
resource.extension.classes = []
resource.static.locations = []
response.http.headers.config =
response.mediatype.default = application/vnd.schemaregistry.v1+json
response.mediatype.preferred = [application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json]
rest.servlet.initializor.classes = []
schema.cache.expiry.secs = 300
schema.cache.size = 1000
schema.canonicalize.on.consume = []
schema.compatibility.level = backward
schema.providers = []
schema.registry.group.id = schema-registry
schema.registry.inter.instance.protocol =
schema.registry.resource.extension.class = []
server.connection.limit = 0
shutdown.graceful.ms = 1000
ssl.cipher.suites = []
ssl.client.auth = false
ssl.client.authentication = NONE
ssl.enabled.protocols = []
ssl.endpoint.identification.algorithm = null
ssl.key.password = [hidden]
ssl.keymanager.algorithm =
ssl.keystore.location =
ssl.keystore.password = [hidden]
ssl.keystore.reload = false
ssl.keystore.type = JKS
ssl.keystore.watch.location =
ssl.protocol = TLS
ssl.provider =
ssl.trustmanager.algorithm =
ssl.truststore.location =
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
suppress.stack.trace.response = true
thread.pool.max = 200
thread.pool.min = 8
websocket.path.prefix = /ws
websocket.servlet.initializor.classes = []
(io.confluent.kafka.schemaregistry.rest.SchemaRegistryConfig)
[2022-12-14 12:18:42,007] INFO Logging initialized #879ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log)
[2022-12-14 12:18:42,066] INFO Initial capacity 128, increased by 64, maximum capacity 2147483647. (io.confluent.rest.ApplicationServer)
[2022-12-14 12:18:42,172] WARN DEPRECATION warning: `listeners` configuration is not configured. Falling back to the deprecated `port` configuration. (io.confluent.rest.ApplicationServer)
[2022-12-14 12:18:42,175] INFO Adding listener with HTTP/2: http://0.0.0.0:8081 (io.confluent.rest.ApplicationServer)
[2022-12-14 12:18:42,589] WARN DEPRECATION warning: `listeners` configuration is not configured. Falling back to the deprecated `port` configuration. (io.confluent.rest.ApplicationServer)
[2022-12-14 12:18:42,744] ERROR Server died unexpectedly: (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain)
org.apache.kafka.common.config.ConfigException: No supported Kafka endpoints are configured. kafkastore.bootstrap.servers must have at least one endpoint matching kafkastore.security.protocol.
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryConfig.endpointsToBootstrapServers(SchemaRegistryConfig.java:666)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryConfig.bootstrapBrokers(SchemaRegistryConfig.java:615)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.kafkaClusterId(KafkaSchemaRegistry.java:1566)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.<init>(KafkaSchemaRegistry.java:171)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:71)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:90)
at io.confluent.rest.Application.configureHandler(Application.java:285)
at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:270)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:44)
The server dies unexpectedly because no supported Kafka endpoints are configured. I found similar questions asked around six years ago, but they did not help.
I have searched the Confluent docs and tried several different versions.
The error is informing you that you need to remove the (deprecated) property SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL and use SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS instead, and set it to broker:9093
You could start with a working compose file then add Neo4j to that.
Note: The Schema Registry is not a requirement to use Kafka Connect with or without Neo4j, or any Python library.
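The fix above boils down to a config migration: drop the Zookeeper-based setting and point the registry at the brokers directly. A minimal stdlib-only sketch of that migration (the helper name `migrate_sr_env` is illustrative, not a Confluent API, and `broker:9093` is the address from this answer's setup):

```python
# Hypothetical helper: replace the deprecated Zookeeper setting with the
# broker bootstrap setting the Schema Registry actually requires now.
def migrate_sr_env(env: dict) -> dict:
    env = dict(env)  # do not mutate the caller's mapping
    zk = env.pop("SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL", None)
    if zk and "SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS" not in env:
        # The registry talks to the brokers directly, not to Zookeeper.
        env["SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS"] = "broker:9093"
    return env

fixed = migrate_sr_env({"SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL": "zookeeper:2181"})
print(fixed)
```

The same rename applies whether the settings live in a properties file or, as here, in docker-compose environment variables.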
I am trying to send data to MongoDB through Kafka using the Schema Registry.
My docker-compose looks like:
schema-registry:
image: confluentinc/cp-schema-registry:latest
hostname: schema-registry
container_name: schema-registry
depends_on:
- zookeeper
- broker
ports:
- "8081:8081"
networks:
- localnet
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS:
"PLAINTEXT://broker:29092"
SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
connect:
image: quickstart-connect-1.7.0:1.0
build:
context: .
dockerfile: Dockerfile-MongoConnect
hostname: connect
container_name: connect
depends_on:
- zookeeper
- broker
- schema-registry
networks:
- localnet
environment:
CONNECT_BOOTSTRAP_SERVERS: "broker:29092"
CONNECT_REST_ADVERTISED_HOST_NAME: connect
CONNECT_REST_PORT: 8083
CONNECT_GROUP_ID: connect-cluster-group
CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
CONNECT_KEY_CONVERTER: "io.confluent.connect.avro.AvroConverter"
CONNECT_VALUE_CONVERTER: "io.confluent.connect.avro.AvroConverter"
CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: "http://localhost:8081"
CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: "http://localhost:8081"
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
CONNECT_ZOOKEEPER_CONNECT: "zookeeper:2181"
CONNECT_PLUGIN_PATH: "/usr/share/java,/usr/share/confluent-hub-components"
CONNECT_AUTO_CREATE_TOPICS_ENABLE: "true"
CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
and my sink connector is :
curl -X POST \
-H "Content-Type: application/json" \
--data '
{"name": "mongo-sink-dinos",
"config": {
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"connection.uri":"mongodb://mongo1:27017/?replicaSet=rs0",
"database":"quickstart",
"collection":"abcd",
"topics":"abcdf",
"key.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://localhost:8081",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://localhost:8081"
}
}
' \
http://connect:8083/connectors -w "\n"
I use the command:
kafka-avro-console-producer --topic dinosaurs \
  --broker-list broker:29092 \
  --property schema.registry.url="http://localhost:8081" \
  --property value.schema="$(< abc.avsc)"
(I have a file abc.avsc containing the schema)
But the data is not pushed to MongoDB, although it is received by a console consumer. When I check the Connect logs, I see:
org.apache.kafka.common.config.ConfigException: Missing required configuration "schema.registry.url" which has no default value.
What might be the reason that the data is not pushed to MongoDB?
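This error usually means the AvroConverter never received a `schema.registry.url`: Connect strips the `key.converter.` / `value.converter.` prefix before handing settings to each converter, so each converter needs its own prefixed URL (and, inside Docker, it should point at `http://schema-registry:8081` rather than `localhost`). A stdlib-only sketch of that check, with a hypothetical helper name:

```python
# Sketch: report which converter-prefixed schema.registry.url settings are
# missing from a connector config that uses the AvroConverter.
def missing_sr_urls(config: dict) -> list:
    missing = []
    for prefix in ("key.converter", "value.converter"):
        cls = config.get(prefix, "")
        if cls.endswith("AvroConverter") and f"{prefix}.schema.registry.url" not in config:
            missing.append(f"{prefix}.schema.registry.url")
    return missing

cfg = {
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
}
print(missing_sr_urls(cfg))  # the key converter is missing its URL
```

In the sink config above, `key.converter.schema.registry.url` is never set (the `value.converter.schema.registry.url` key is just repeated twice), which matches the exception in the logs.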
I'm trying to write Kafka topic data to a local DynamoDB. However, the connector is always in a degraded state. Below are my connector config properties.
{
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"name": "dynamo-sink-connector",
"connector.class": "io.confluent.connect.aws.dynamodb.DynamoDbSinkConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"topics": [
"KAFKA_STOCK"
],
"aws.dynamodb.pk.hash": "value.companySymbol",
"aws.dynamodb.pk.sort": "value.txTime",
"aws.dynamodb.endpoint": "http://localhost:8000",
"confluent.topic.bootstrap.servers": [
"broker:29092"
]
}
I was referring to https://github.com/RWaltersMA/mongo-source-sink and replaced the Mongo sink with the DynamoDB sink.
Could someone provide a simple working example, please?
Below is the full example for using the AWS DynamoDB Sink connector.
Thanks to @OneCricketeer for his suggestions!
Dockerfile-DynamoDBConnect, which is referenced in docker-compose.yml below:
FROM confluentinc/cp-kafka-connect:latest
ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-aws-dynamodb:latest
docker-compose.yml
version: '3.6'
services:
zookeeper:
image: confluentinc/cp-zookeeper:latest
hostname: zookeeper
container_name: zookeeper
ports:
- "2181:2181"
networks:
- localnet
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
broker:
image: confluentinc/cp-kafka:latest
hostname: broker
container_name: broker
depends_on:
- zookeeper
ports:
- "19092:19092"
- "9092:9092"
networks:
- localnet
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:19092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'
schema-registry:
image: confluentinc/cp-schema-registry:latest
hostname: schema-registry
container_name: schema-registry
depends_on:
- zookeeper
- broker
ports:
- "8081:8081"
networks:
- localnet
environment:
SCHEMA_REGISTRY_HOST_NAME: localhost
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: 'zookeeper:2181'
dynamodb-local:
command: "-jar DynamoDBLocal.jar -sharedDb -dbPath ./data"
image: "amazon/dynamodb-local:latest"
container_name: dynamodb-local
ports:
- "8000:8000"
networks:
- localnet
volumes:
- "./docker/dynamodb:/home/dynamodblocal/data"
working_dir: /home/dynamodblocal
connect:
image: confluentinc/cp-kafka-connect-base:latest
build:
context: .
dockerfile: Dockerfile-DynamoDBConnect
hostname: connect
container_name: connect
depends_on:
- zookeeper
- broker
- schema-registry
ports:
- "8083:8083"
networks:
- localnet
environment:
CONNECT_BOOTSTRAP_SERVERS: 'broker:19092'
CONNECT_REST_ADVERTISED_HOST_NAME: connect
CONNECT_REST_PORT: 8083
CONNECT_GROUP_ID: compose-connect-group
CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.storage.StringConverter"
CONNECT_VALUE_CONVERTER: "io.confluent.connect.avro.AvroConverter"
CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: 'http://schema-registry:8081'
CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_LOG4J_ROOT_LOGLEVEL: "INFO"
CONNECT_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR"
CONNECT_PLUGIN_PATH: "/usr/share/java,/usr/share/confluent-hub-components"
CONNECT_ZOOKEEPER_CONNECT: 'zookeeper:2181'
# Assumes image is based on confluentinc/kafka-connect-datagen:latest which is pulling 5.3.0 Connect image
CLASSPATH: "/usr/share/java/monitoring-interceptors/monitoring-interceptors-5.3.0.jar"
CONNECT_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
CONNECT_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
command: "bash -c 'if [ ! -d /usr/share/confluent-hub-components/confluentinc-kafka-connect-datagen ]; then echo \"WARNING: Did not find directory for kafka-connect-datagen (did you remember to run: docker-compose up -d --build ?)\"; fi ; /etc/confluent/docker/run'"
volumes:
- ../build/confluent/kafka-connect-aws-dynamodb:/usr/share/confluent-hub-components/confluentinc-kafka-connect-aws-dynamodb
- $HOME/.aws/credentialstest:/home/appuser/.aws/credentials
- $HOME/.aws/configtest:/home/appuser/.aws/config
rest-proxy:
image: confluentinc/cp-kafka-rest:5.3.0
depends_on:
- zookeeper
- broker
- schema-registry
ports:
- "8082:8082"
hostname: rest-proxy
container_name: rest-proxy
networks:
- localnet
environment:
KAFKA_REST_HOST_NAME: rest-proxy
KAFKA_REST_BOOTSTRAP_SERVERS: 'broker:19092'
KAFKA_REST_LISTENERS: "http://0.0.0.0:8082"
KAFKA_REST_SCHEMA_REGISTRY_URL: 'http://schema-registry:8081'
control-center:
image: confluentinc/cp-enterprise-control-center:6.0.0
hostname: control-center
container_name: control-center
networks:
- localnet
depends_on:
- broker
- schema-registry
- connect
- dynamodb-local
ports:
- "9021:9021"
environment:
CONTROL_CENTER_BOOTSTRAP_SERVERS: PLAINTEXT://broker:19092
CONTROL_CENTER_KAFKA_CodeCamp_BOOTSTRAP_SERVERS: PLAINTEXT://broker:19092
CONTROL_CENTER_REPLICATION_FACTOR: 1
CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_REPLICATION: 1
CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
CONTROL_CENTER_METRICS_TOPIC_REPLICATION: 1
CONTROL_CENTER_METRICS_TOPIC_PARTITIONS: 1
# Amount of heap to use for internal caches. Increase for better throughput
CONTROL_CENTER_STREAMS_CACHE_MAX_BYTES_BUFFERING: 100000000
CONTROL_CENTER_STREAMS_CONSUMER_REQUEST_TIMEOUT_MS: "960032"
CONTROL_CENTER_STREAMS_NUM_STREAM_THREADS: 1
# HTTP and HTTPS to Control Center UI
CONTROL_CENTER_REST_LISTENERS: http://0.0.0.0:9021
PORT: 9021
# Connect
CONTROL_CENTER_CONNECT_CONNECT1_CLUSTER: http://connect:8083
# Schema Registry
CONTROL_CENTER_SCHEMA_REGISTRY_URL: http://schema-registry:8081
networks:
localnet:
attachable: true
Java Example to publish Messages in Avro Format
public class AvroProducer {
public static void main(String[] args) {
// Variables (bootstrap server, topic name, logger)
final Logger logger = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
final String bootstrapServers = "127.0.0.1:9092";
final String topicName = "stockdata";
// Properties declaration (bootstrap server, key serializer, value serializer
// Note use of the ProducerConfig object
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class.getName());
properties.setProperty(AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
// Create Producer object
// Note the generics
KafkaProducer<String, Stock> producer = new KafkaProducer<>(properties);
Stock stock = Stock.newBuilder()
.setStockCode("APPL")
.setStockName("Apple")
.setStockPrice(150.0)
.build();
// Create ProducerRecord object
ProducerRecord<String, Stock> rec = new ProducerRecord<>(topicName, stock);
// Send data to the producer (optional callback)
producer.send(rec);
// Call producer flush() and/or close()
producer.flush();
producer.close();
}
}
I use Mysql database. Suppose I have a table for orders. And using debezium mysql connect for Kafka, the order topic has been created. But I have trouble creating a stream in ksqldb.
CREATE STREAM orders WITH (
kafka_topic = 'myserver.mydatabase.orders',
value_format = 'avro'
);
my docker-compose file look like this
zookeeper:
image: confluentinc/cp-zookeeper:latest
container_name: zookeeper
privileged: true
ports:
- "2181:2181"
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
kafka:
image: confluentinc/cp-kafka:latest
container_name: kafka
depends_on:
- zookeeper
ports:
- '9092:9092'
environment:
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
schema-registry:
image: confluentinc/cp-schema-registry:latest
container_name: schema-registry
depends_on:
- kafka
- zookeeper
ports:
- "8081:8081"
environment:
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: "zookeeper:2181"
SCHEMA_REGISTRY_HOST_NAME: schema-registry
kafka-connect:
hostname: kafka-connect
image: confluentinc/cp-kafka-connect:latest
container_name: kafka-connect
ports:
- 8083:8083
depends_on:
- schema-registry
environment:
CONNECT_BOOTSTRAP_SERVERS: kafka:9092
CONNECT_REST_PORT: 8083
CONNECT_GROUP_ID: "quickstart-avro"
CONNECT_CONFIG_STORAGE_TOPIC: "quickstart-avro-config"
CONNECT_OFFSET_STORAGE_TOPIC: "quickstart-avro-offsets"
CONNECT_STATUS_STORAGE_TOPIC: "quickstart-avro-status"
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_REST_ADVERTISED_HOST_NAME: "kafka-connect"
CONNECT_LOG4J_ROOT_LOGLEVEL: DEBUG
CONNECT_PLUGIN_PATH: "/usr/share/java,/etc/kafka-connect/jars"
volumes:
- $PWD/kafka/jars:/etc/kafka-connect/jars
ksqldb-server:
image: confluentinc/ksqldb-server:latest
hostname: ksqldb-server
container_name: ksqldb-server
depends_on:
- kafka
ports:
- "8088:8088"
environment:
KSQL_LISTENERS: http://0.0.0.0:8088
KSQL_BOOTSTRAP_SERVERS: "kafka:9092"
KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
KSQL_CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
ksqldb-cli:
image: confluentinc/ksqldb-cli:latest
container_name: ksqldb-cli
depends_on:
- kafka
- ksqldb-server
- schema-registry
entrypoint: /bin/sh
tty: true
The error says a subject must be created for this table first. What is the difference between Avro and JSON here?
Using debezium mysql connect for Kafka
You can set that to use AvroConverter, then the subject will be created automatically
Otherwise, you can have KSQL use VALUE_FORMAT=JSON and you need to manually specify all the field names. Unclear what difference you're asking about (they are different serialization formats), but from a KSQL perspective, JSON alone is seen as plain-text (similar to DELIMITED) and needs to be parsed, as compared to the other formats like Avro where the schema+fields are already known.
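The subject lookup described above can be sketched: with the default TopicNameStrategy, the Avro serializer registers subjects named `<topic>-key` and `<topic>-value`, and KSQL looks up the value subject to infer the stream's columns. The helper below is illustrative only, not a Schema Registry API:

```python
# Sketch: the subject names the Schema Registry expects for a topic under
# the default TopicNameStrategy.
def expected_subjects(topic: str) -> tuple:
    return (f"{topic}-key", f"{topic}-value")

print(expected_subjects("myserver.mydatabase.orders"))
```

If Debezium writes with the JsonConverter, no such subjects exist, which is why the `CREATE STREAM ... value_format = 'avro'` statement fails until the connector is switched to the AvroConverter.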
I solved the issue. With the following configuration, you can send the MySQL table to the topic without the before/after state wrapper.
CREATE SOURCE CONNECTOR final_connector WITH (
'connector.class' = 'io.debezium.connector.mysql.MySqlConnector',
'database.hostname' = 'mysql',
'database.port' = '3306',
'database.user' = 'root',
'database.password' = 'mypassword',
'database.allowPublicKeyRetrieval' = 'true',
'database.server.id' = '184055',
'database.server.name' = 'db',
'database.whitelist' = 'mydb',
'database.history.kafka.bootstrap.servers' = 'kafka:9092',
'database.history.kafka.topic' = 'mydb',
'table.whitelist' = 'mydb.user',
'include.schema.changes' = 'false',
'transforms'= 'unwrap,extractkey',
'transforms.unwrap.type'= 'io.debezium.transforms.ExtractNewRecordState',
'transforms.extractkey.type'= 'org.apache.kafka.connect.transforms.ExtractField$Key',
'transforms.extractkey.field'= 'id',
'key.converter'= 'org.apache.kafka.connect.converters.IntegerConverter',
'value.converter'= 'io.confluent.connect.avro.AvroConverter',
'value.converter.schema.registry.url'= 'http://schema-registry:8081'
);
Then create your stream simply!
This video can help you a lot:
https://www.youtube.com/watch?v=2fUOi9wJPhk&t=1550s
I have set up a single-broker instance of Kafka along with Zookeeper, kafka-tools, Schema Registry, and Control Center. The setup is done using docker-compose with the Confluent-provided images. Here is what the docker-compose looks like:
zookeeper:
image: confluentinc/cp-zookeeper:latest
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
ports:
- 2181:2181
broker:
image: confluentinc/cp-server:latest
depends_on:
- zookeeper
ports:
- 9092:9092
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: broker:29092
CONFLUENT_METRICS_REPORTER_ZOOKEEPER_CONNECT: zookeeper:2181
CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
CONFLUENT_METRICS_ENABLE: "true"
CONFLUENT_SUPPORT_CUSTOMER_ID: "anonymous"
kafka-tools:
image: confluentinc/cp-kafka:latest
hostname: kafka-tools
container_name: kafka-tools
command: ["tail", "-f", "/dev/null"]
network_mode: "host"
schema-registry:
image: confluentinc/cp-schema-registry:5.5.0
hostname: schema-registry
container_name: schema-registry
depends_on:
- zookeeper
- broker
ports:
- "8081:8081"
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: "zookeeper:2181"
control-center:
image: confluentinc/cp-enterprise-control-center:latest
hostname: control-center
container_name: control-center
depends_on:
- zookeeper
- broker
- schema-registry
ports:
- "9021:9021"
environment:
CONTROL_CENTER_BOOTSTRAP_SERVERS: 'broker:29092'
CONTROL_CENTER_ZOOKEEPER_CONNECT: 'zookeeper:2181'
CONTROL_CENTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
CONTROL_CENTER_REPLICATION_FACTOR: 1
CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
CONFLUENT_METRICS_TOPIC_REPLICATION: 1
PORT: 9021
I am trying to use the confluent-kafka Python client to create the producer and consumer applications. The following is the code of my Kafka producer:
from datetime import datetime
import os
import json
from uuid import uuid4
from confluent_kafka import SerializingProducer
from confluent_kafka.serialization import StringSerializer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
class BotExecutionProducer(object):
"""
Class represents the
Bot execution Stats
"""
def __init__(self,ticketId,accountId,executionTime,status):
self.ticketId = ticketId
self.accountId = accountId
self.executionTime = executionTime
self.timestamp = str(datetime.now())
self.status = status
def botexecution_to_dict(self,botexecution,ctx):
"""
Returns a Dict representation of the
KafkaBotExecution instance
botexecution : KafkaBotExecution instance
ctx: SerializaionContext
"""
return dict(ticketId=self.ticketId,
accountId=self.accountId,
executionTime=self.executionTime,
timestamp=self.timestamp,
status=self.status
)
def delivery_report(self,err, msg):
"""
Reports the failure or success of a message delivery.
"""
if err is not None:
print("Delivery failed for User record {}: {}".format(msg.key(), err))
return
print('User record {} successfully produced to {} [{}] at offset {}'.format(
msg.key(), msg.topic(), msg.partition(), msg.offset()))
def send(self):
"""
Will connect to Kafka Broker
validate and send the message
"""
topic = "bots.execution"
schema_str = """
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "BotExecutions",
"description": "BotExecution Stats",
"type": "object",
"properties": {
"ticketId": {
"description": "Ticket ID",
"type": "string"
},
"accountId": {
"description": "Customer's AccountID",
"type": "string"
},
"executionTime": {
"description": "Bot Execution time in seconds",
"type": "number"
},
"timestamp": {
"description": "Timestamp",
"type": "string"
},
"status": {
"description": "Execution Status",
"type": "string"
}
},
"required": [ "ticketId", "accountId", "executionTime", "timestamp", "status"]
}
"""
schema_registry_conf = {'url': 'http://localhost:8081'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)
json_serializer = JSONSerializer(schema_str,schema_registry_client,self.botexecution_to_dict)
producer_conf = {
'bootstrap.servers': "localhost:9092",
'key.serializer': StringSerializer('utf_8'),
'value.serializer': json_serializer,
'acks': 0,
}
producer = SerializingProducer(producer_conf)
print(f'Producing records to topic {topic}')
producer.poll(0.0)
try:
print(self)
producer.produce(topic=topic,key=str(uuid4()),partition=1,
value=self,on_delivery=self.delivery_report)
except ValueError:
print("Invalid Input,discarding record.....")
Now when I execute the code, it should create a Kafka topic and push the JSON data to that topic, but that does not seem to work: it keeps showing an error that a replication factor of 3 has been specified when there is only one broker. Is there a way to define the replication factor in the above code? The Kafka broker works perfectly when I do the same using the Kafka CLI.
Is there something that I am missing?
I'm not sure of the full difference between the cp-server and cp-kafka images, but you can add a variable for the default replication factor of automatically created topics:
KAFKA_DEFAULT_REPLICATION_FACTOR: 1
If that doesn't work, import and use AdminClient to create the topic explicitly before producing.
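A sketch of that approach: build the topic spec with a replication factor a single-broker cluster can satisfy, then create the topic before producing. The helper is stdlib-only; the commented lines show how it would plug into confluent-kafka's AdminClient, assuming a broker on localhost:9092 and the topic name from the producer above:

```python
# Sketch: spec for an explicitly created topic with replication factor 1.
def topic_spec(name: str, partitions: int = 1, replication: int = 1) -> dict:
    # One replica is the most a single-broker cluster can satisfy.
    return {"topic": name, "num_partitions": partitions,
            "replication_factor": replication}

spec = topic_spec("bots.execution")
print(spec)

# With confluent-kafka installed and a broker running:
# from confluent_kafka.admin import AdminClient, NewTopic
# admin = AdminClient({"bootstrap.servers": "localhost:9092"})
# futures = admin.create_topics([NewTopic(**spec)])
# futures["bots.execution"].result()  # raises if creation failed
```

Creating the topic up front also sidesteps whatever server-side default (such as a replication factor of 3) the auto-creation path is applying.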