How to setup local Kafka to validate schema? - apache-kafka

I read that schema validation is available in Kafka Confluent Server only. But maybe since then someone has found a solution? Does anyone know how can I test message validity running Kafka locally?

how to enforce broker to validate producers' input
Kafka doesn't do this. Confluent Server can, yes, but it is enterprise licensed. There is no alternative to server-side validation without forking Kafka like Confluent did. Otherwise, you would need to write your own Serializer class, but that won't scale to every producer client you may use.
You can still use Docker as the other answer shows, and you can use confluentinc/cp-server image, then you simply add an environment variable to enable schema valdiation.
https://docs.confluent.io/platform/current/schema-registry/schema-validation.html
Otherwise, simply using JsonSchemaSerializer, for example, will validate each record does adhere to a schema (client-side validation), but it won't stop anyone else from sending garbage data into the topic (server-side validation).

If you know a little bite docker-compose, you can deploy a Kafka ecosystem for testing purposes.
---
version: '2'
services:
zookeeper:
image: confluentinc/cp-zookeeper:7.1.0
hostname: zookeeper
container_name: zookeeper
ports:
- "2181:2181"
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
broker:
image: confluentinc/cp-kafka:7.1.0
hostname: broker
container_name: broker
depends_on:
- zookeeper
ports:
- "29092:29092"
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
schema-registry:
image: confluentinc/cp-schema-registry:7.1.0
hostname: schema-registry
container_name: schema-registry
depends_on:
- broker
ports:
- "8081:8081"
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'broker:9092'
I got the docker-compose file from https://github.com/confluentinc/kafka-tutorials/blob/master/_includes/tutorials/aggregating-sum/ksql/code/docker-compose.yml, but for your testing, it is not needed the KSQL components and you can delete those.

Related

ksqldb not failing over to standby schema registry

I am trying to test failover scenario for kafka schema registry.
I spanned up two Schema registry docker containers(Primary and standby) and I have a KSQLDB server running in a docker container pointing to primary schema registry. The source kafka connecter is streaming the data from the database to kafka topics. The ksqlDB server is able to validate the schema of the kafka message using primary schema registry. Now I shutdown the primary schema registry. The ksqldb server is not failing over to the stand by schema registry to validate the schema, causing ksqldb server not receiving the data from kafka topics.
How should ksqldb server should know what is the standby schema-registry that it need to connect to when primary is down.
Below is docker-compose.yml file that I have used
schema-registry:
image: confluentinc/cp-schema-registry:${CP_VERSION}
depends_on:
- zookeeper
- kafka
ports:
- "8081:8081"
container_name: schema-registry
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092
SCHEMA_REGISTRY_ACCESS_CONTROL_ALLOW_ORIGIN: '*'
SCHEMA_REGISTRY_ACCESS_CONTROL_ALLOW_METHODS: 'GET,POST,PUT,OPTIONS'
SCHEMA_REGISTRY_LEADER_ELIGIBILITY : "true"
SCHEMA_REGISTRY_GROUP_ID : "schema-registry-group"
SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
schema-registry-2:
image: confluentinc/cp-schema-registry:${CP_VERSION}
depends_on:
- kafka
- schema-registry
ports:
- "8082:8082"
container_name: schema-registry-2
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry-2
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092
SCHEMA_REGISTRY_ACCESS_CONTROL_ALLOW_ORIGIN: '*'
SCHEMA_REGISTRY_ACCESS_CONTROL_ALLOW_METHODS: 'GET,POST,PUT,OPTIONS'
SCHEMA_REGISTRY_LEADER_ELIGIBILITY : "true"
SCHEMA_REGISTRY_GROUP_ID : "schema-registry-group"
SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8082
primary-ksqldb-server:
image: ${KSQL_IMAGE_BASE}confluentinc/ksqldb-server:${KSQL_VERSION}
hostname: primary-ksqldb-server
container_name: primary-ksqldb-server
depends_on:
- kafka
- schema-registry
ports:
- "8088:8088"
environment:
KSQL_CONFIG_DIR: "/etc/ksql"
KSQL_LISTENERS: http://0.0.0.0:8088
KSQL_BOOTSTRAP_SERVERS: kafka:9092
KSQL_KSQL_ADVERTISED_LISTENER : http://localhost:8088
KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081
KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
KSQL_KSQL_EXTENSION_DIR: "/usr/ksqldb/ext/"
KSQL_KSQL_SERVICE_ID: "nrt_"
KSQL_KSQL_STREAMS_NUM_STANDBY_REPLICAS: 1
KSQL_KSQL_QUERY_PULL_ENABLE_STANDBY_READS: "true"
KSQL_KSQL_HEARTBEAT_ENABLE: "true"
KSQL_KSQL_LAG_REPORTING_ENABLE : "true"
KSQL_KSQL_QUERY_PULL_MAX_ALLOWED_OFFSET_LAG : 100
KSQL_LOG4J_APPENDER_KAFKA_APPENDER: "org.apache.kafka.log4jappender.KafkaLog4jAppender"
KSQL_LOG4J_APPENDER_KAFKA_APPENDER_LAYOUT: "io.confluent.common.logging.log4j.StructuredJsonLayout"
KSQL_LOG4J_APPENDER_KAFKA_APPENDER_BROKERLIST: localhost:9092
KSQL_LOG4J_APPENDER_KAFKA_APPENDER_TOPIC: KSQL_LOG
KSQL_LOG4J_LOGGER_IO_CONFLUENT_KSQL: INFO,kafka_appender
KSQL_KSQL_QUERY_PULL_METRICS_ENABLED: "true"
KSQL_JMX_OPTS: >
-Djava.rmi.server.hostname=localhost
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=1099
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.rmi.port=1099
When I stop primary schema registry, ksqldb is supposed to connect to standy schema registry
How would it know the other is available if you don't provide it?
KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081,http://schema-registry-2:8082
In other words, you shut down schema-registry container, so it will simply not respond. It will not forward requests or update the clients to talk to another server... So, you need to provide a URL-list, or you need to setup an external reverse proxy to round-robin the requests to the active instance.

Unable to connect to Kafka(in docker) from Internet

I am attempting to make it so I can connect to Kafka from over the internet. I have opened Ports: 9092 and 2181 on my router.
I have had no luck at all! I am using OffsetExplorer and I am able to ping kafka from another network. The IP of the system that is is running on is: 10.0.1.104
I AM able to connect to kafka on the local network from another computer though.
Here is my Kafka Docker-Compose:
version: '3.7'
services:
zookeeper:
image: wurstmeister/zookeeper:3.4.6
environment:
JVMFLAGS: "-Djava.security.auth.login.config=/etc/zookeeper/zookeeper_jaas.conf"
volumes:
- ./zookeeper_jaas.conf:/etc/zookeeper/zookeeper_jaas.conf
ports:
- 2181:2181
kafka:
image: wurstmeister/kafka:2.13-2.8.1
depends_on:
- zookeeper
ports:
- 9092:9092
- 29092:29092
environment:
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
KAFKA_LISTENERS: INTERNAL://:9093,EXTERNAL://:9092
KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:9093,EXTERNAL://10.0.1.104:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:SASL_PLAINTEXT,EXTERNAL:SASL_PLAINTEXT
ALLOW_PLAINTEXT_LISTENER: 'yes'
KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
KAFKA_SASL_ENABLED_MECHANISMS: PLAIN
KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: PLAIN
KAFKA_OPTS: "-Djava.security.auth.login.config=/etc/kafka/kafka_jaas.conf"
volumes:
- ./kafka_server_jaas.conf:/etc/kafka/kafka_jaas.conf
Connection attempts result in this on Kafka's output
Thank you very much!

Kafka Consumer is not receiving Messages on docker

I'm a begginer on kafka as well as docker, I have been doing a course and working with kafka producer and consumer but for some reason it is not working.
When I do use of the producer the message are saved in the topic (I have already checked it) but when I try to get the message using the consumer it is not working and I have no idea why.
It had worked previously but not anymore.
The unique difference I have in this case is that I'm using the confluentinc image instead of the bitnami image.
So, if anyone has any idea or solution I would really appreciate it.
I share my compose and an screenshot so you can see it.
version: "3.2"
services:
###############################################################
zookeeper:
image: 'confluentinc/cp-zookeeper:latest'
container_name: zookeeper
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
ports:
- 2181:2181
###############################################################
broker:
image: 'confluentinc/cp-kafka:latest'
container_name: broker
depends_on:
- zookeeper
ports:
- 9092:9092
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
# Exposes 9092 for external connections to the broker
# Use kafka:29092 for connections internal on the docker network
# See https://rmoff.net/2018/08/02/kafka-listeners-explained/ for details
KAFKA_LISTENERS: "PLAINTEXT://:29092,PLAINTEXT_HOST://:9092"
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
KAFKA_DELETE_TOPIC_ENABLE: "true"
Producer and Consumer
All is running in my local machine.
Take a look at docker-compose logs broker...
You should see a lot of Error processing create topic request CreatableTopic(name='__consumer_offsets', numPartitions=50, replicationFactor=3
Without a valid __consumer_offsets topic, no consumer will be able to run and commit offsets. Similarly, transactions won't work either (which are enabled by default in latest Kafka)
Add these variables and re-create the containers
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1

Failed to collect cluster Default info java.lang.IllegalStateException: Error while creating AdminClient for Cluster Default

I would like to use network_mode: bridge for kafka for being able to reach kafka through localhost:9092 from another service
I'm trying to use the provectus/kafka-ui but when I open the consumers menu I get the following error
my docker-compose.yml file :
kafka-ui:
container_name: kafka-ui
image: provectuslabs/kafka-ui:latest
ports:
- 8080:8080
depends_on:
- kafka
environment:
KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092
KAFKA_CLUSTERS_0_JMXPORT: 9997
kafka:
image: johnnypark/kafka-zookeeper
ports:
- "2181:2181"
- "9092:9092"
network_mode: bridge
environment:
ADVERTISED_HOST: 127.0.0.1
NUM_PARTITIONS: 1
volumes:
- /var/run/docker.sock:/var/run/docker.sock
log error:
2022-01-13 09:16:50,014 ERROR [parallel-5] c.p.k.u.s.MetricsService: Failed to collect cluster Default info
java.lang.IllegalStateException: Error while creating AdminClient for Cluster Default
provectus/kafka-ui
I was using the johnnypark/kafka-zookeeper library for both kafka and zookeeper. I was able to solve this problem by using two separate libraries as in the example below
zookeeper1:
image: confluentinc/cp-zookeeper:5.2.4
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
kafka1:
image: confluentinc/cp-kafka:5.3.1
depends_on:
- zookeeper1
ports:
- 9093:9093
- 9998:9998
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: zookeeper1:2181
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:29092,PLAINTEXT_HOST://localhost:9093
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
JMX_PORT: 9998
KAFKA_JMX_OPTS: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka1 -Dcom.sun.management.jmxremote.rmi.port=9998
being able to reach kafka through localhost:9092 from another service
You can't use localhost to reach Kafka since that would be the Kafka UI container itself.
Changing ADVERTISED_HOST to kafka and using kafka:9092 from other containers is correct for a bridge network. However, this have the side effect of preventing any access to Kafka outside the Docker network, such as clients directly on the host machine.
Internal and External clients can be configured separately. bitnami/bitnami-docker-kafka
Here's an example using Bitnami's Kafka Image - this allows host clients to connect on port 9093 while allowing kafka-ui to connect with the default port.
version: "3"
services:
zookeeper:
image: 'bitnami/zookeeper:latest'
ports:
- '2181:2181'
environment:
- ALLOW_ANONYMOUS_LOGIN=yes
kafka:
image: 'bitnami/kafka:latest'
ports:
- '9092:9092'
- '9093:9093'
environment:
- KAFKA_BROKER_ID=1
- KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
- KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CLIENT:PLAINTEXT,EXTERNAL:PLAINTEXT
- KAFKA_CFG_LISTENERS=CLIENT://:9092,EXTERNAL://:9093
- KAFKA_CFG_ADVERTISED_LISTENERS=CLIENT://kafka:9092,EXTERNAL://localhost:9093
- KAFKA_CFG_INTER_BROKER_LISTENER_NAME=CLIENT
- ALLOW_PLAINTEXT_LISTENER=yes
depends_on:
- zookeeper
kafka-ui:
image: provectuslabs/kafka-ui
container_name: kafka-ui
ports:
- "8081:8081"
restart: always
environment:
- KAFKA_CLUSTERS_0_NAME=local
- KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=kafka:9092
- SERVER_PORT=8081

Zipkin Docker not consuming kafka "zipkin" topic

Currently following the https://github.com/openzipkin/zipkin quickstart for the docker startup of zipkin and kafka.
I set up a docker-compose that starts several services: kafka, zookeeper, neo4j and zipkin.
I gave zipkin through the "KAFKA.BOOTSTRAP.SERVERS: broker:19092" the same address as the functioning neo4j.
zipkin:
image: ghcr.io/openzipkin/zipkin:${TAG:-latest}
container_name: zipkin
depends_on:
- broker
environment:
KAFKA.BOOTSTRAP.SERVERS: broker:19092
ports:
# Port used for the Zipkin UI and HTTP Api
- 9411:9411
my Kafka docker-compose part looks like this:
broker:
image: confluentinc/cp-kafka:latest
hostname: broker
container_name: broker
ports:
- "9092:9092"
- "9101:9101"
depends_on:
- zookeeper
environment:
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://broker:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
KAFKA_BROKER_ID: 1
KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO"
KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
KAFKA_CONFLUENT_BALANCER_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
The topic "zipkin" gets pushed to kafka, since I can consume it with my spring application.
The zipkin settings in my application.properties of the Spring implementation looks like this.
spring.zipkin.baseUrl= localhost:9411
spring.zipkin.sender.type= kafka
I couldn't find anything about how I can actually debug zipkin itself. It starts up without errors in my docker-compose.
I actually found it myself. In case anyone encounters this too.
KAFKA.BOOTSTRAP.SERVERS: broker:19092
is not a valid environment variable. it needs to be in the style of this:
KAFKA_BOOTSTRAP_SERVERS: broker:19092