Client app tries to produce to a topic on Kafka but gets stuck and returns neither an error message nor a 200 OK - kubernetes

We deploy Kafka and ZooKeeper pods on a Kubernetes cluster. The two are connected to each other properly, but when we try to produce to a topic through a client app, the PUT request gets stuck in pending and, even after a long time, no response is returned. How can I debug this situation?
The .yaml files for Kafka, ZooKeeper, and the client app are shown below:
kafka.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.18.0 (06a2e56)
  creationTimestamp: null
  labels:
    io.kompose.service: kafka
  name: kafka
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: kafka
    spec:
      containers:
      - env:
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: kafka
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_CREATE_TOPICS
          value: newsrawdata:1:1
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: 192.168.88.42:30573
        - name: KAFKA_PORT
          value: "9092"
        - name: KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS
          value: "1000"
        image: wurstmeister/kafka
        name: kafka
        ports:
        - containerPort: 9092
        - containerPort: 9094
        resources: {}
        volumeMounts:
        - mountPath: /var/run/docker.sock
          name: kafka-claim0
      hostname: kafka
      restartPolicy: Always
      volumes:
      - name: kafka-claim0
        persistentVolumeClaim:
          claimName: kafka-claim0
status: {}
zookeeper.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.18.0 (06a2e56)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper
  name: zookeeper
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: zookeeper
    spec:
      containers:
      - env:
        - name: ALLOW_ANONYMOUS_LOGIN
          value: "yes"
        image: wurstmeister/zookeeper
        name: zookeeper
        ports:
        - containerPort: 2181
        resources: {}
      hostname: zookeeper
      restartPolicy: Always
status: {}
app.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    service: broker-service
  name: broker-app
spec:
  replicas: 1
  template:
    metadata:
      labels:
        service: broker-service
    spec:
      imagePullSecrets:
      - name: pullsecret
      containers:
      - env:
        - name: OHH_COMMON_REDEPLOY
          value: THIS_WILL_BE_REPLACED
        - name: ASPNETCORE_ENVIRONMENT
          value: docker
        image: localgitlabregistry/broker.app:v0.01
        name: broker-app
        imagePullPolicy: "Always"
        ports:
        - containerPort: 80
        - containerPort: 443
      nodeSelector:
        role: slave1
      restartPolicy: Always
And the services are like below:
kafka-service.yaml:
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.18.0 (06a2e56)
  creationTimestamp: null
  labels:
    io.kompose.service: kafka
  name: kafka
spec:
  ports:
  - name: "9092"
    port: 9092
    targetPort: 9092
  - name: "9094"
    port: 9094
    targetPort: 9094
  clusterIP: None
  # type: NodePort
  selector:
    io.kompose.service: kafka
status:
  loadBalancer: {}
zookeeper-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.18.0 (06a2e56)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper
  name: zookeeper
spec:
  ports:
  - name: "2181"
    port: 2181
    targetPort: 2181
  selector:
    io.kompose.service: zookeeper
status:
  loadBalancer: {}
app-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    service: broker-service
  name: broker-service
spec:
  ports:
  - name: "57270"
    port: 80
    targetPort: 80
  - name: "44348"
    port: 443
    targetPort: 443
  selector:
    service: broker-service
  type: NodePort
The log from the kafka pod is shown below:
waiting for kafka to be ready
[Configuring] 'advertised.port' in '/opt/kafka/config/server.properties'
Excluding KAFKA_HOME from broker config
[Configuring] 'advertised.host.name' in '/opt/kafka/config/server.properties'
[Configuring] 'port' in '/opt/kafka/config/server.properties'
[Configuring] 'broker.id' in '/opt/kafka/config/server.properties'
Excluding KAFKA_VERSION from broker config
[Configuring] 'zookeeper.connect' in '/opt/kafka/config/server.properties'
[Configuring] 'log.dirs' in '/opt/kafka/config/server.properties'
[Configuring] 'zookeeper.connect.timeout.ms' in '/opt/kafka/config/server.properties'
[2019-09-29 08:06:56,783] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-09-29 08:06:57,767] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2019-09-29 08:06:57,768] INFO starting (kafka.server.KafkaServer)
[2019-09-29 08:06:57,769] INFO Connecting to zookeeper on 192.168.88.42:30573 (kafka.server.KafkaServer)
[2019-09-29 08:06:57,796] INFO [ZooKeeperClient Kafka server] Initializing a new session to
.
.
.
[2019-09-29 08:06:57,804] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:os.version=4.4.0-116-generic (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,806] INFO Initiating client connection, connectString=192.168.88.42:30573 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$#2667f029 (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,822] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-09-29 08:06:57,847] INFO Opening socket connection to server 192.168.88.42/192.168.88.42:30573. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-09-29 08:06:57,865] INFO Socket connection established to 192.168.88.42/192.168.88.42:30573, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-09-29 08:06:57,880] INFO Session establishment complete on server 192.168.88.42/192.168.88.42:30573, sessionid = 0x10005366a620042, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2019-09-29 08:06:57,886] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2019-09-29 08:06:58,448] INFO Cluster ID = b8bTvrC2T6iidAcNqD482A (kafka.server.KafkaServer)
[2019-09-29 08:06:58,455] WARN No meta.properties file under dir /kafka/kafka-logs-kafka/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2019-09-29 08:06:58,632] INFO KafkaConfig values:
advertised.host.name = kafka
advertised.listeners = null
advertised.port = 9092
alter.config.policy.class.name = null
alter.log.dirs.replication.quota.window.num = 11
alter.log.dirs.replication.quota.window.size.seconds = 1
authorizer.class.name =
auto.create.topics.enable = true
auto.leader.rebalance.enable = true
.
.
.
zookeeper.connect = 192.168.88.42:30573
zookeeper.connection.timeout.ms = 6000
zookeeper.max.in.flight.requests = 10
zookeeper.session.timeout.ms = 6000
zookeeper.set.acl = false
zookeeper.sync.time.ms = 2000
(kafka.server.KafkaConfig)
[2019-09-29 08:06:58,659] INFO KafkaConfig values:
advertised.host.name = kafka
advertised.listeners = null
advertised.port = 9092
alter.config.policy.class.name = null
alter.log.dirs.replication.quota.window.num = 11
alter.log.dirs.replication.quota.window.size.seconds = 1
authorizer.class.name =
auto.create.topics.enable = true
auto.leader.rebalance.enable = true
kafka.metrics.reporters = []
leader.imbalance.check.interval.seconds = 300
leader.imbalance.per.broker.percentage = 10
listener.security.protocol.map = PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners = null
log.cleaner.backoff.ms = 15000
log.cleaner.dedupe.buffer.size = 134217728
log.cleaner.delete.retention.ms = 86400000
log.cleaner.enable = true
log.cleaner.io.buffer.load.factor = 0.9
log.cleaner.io.buffer.size = 524288
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
log.cleaner.max.compaction.lag.ms = 9223372036854775807
log.cleaner.min.cleanable.ratio = 0.5
log.cleaner.min.compaction.lag.ms = 0
log.cleaner.threads = 1
log.cleanup.policy = [delete]
log.dir = /tmp/kafka-logs
.
.
.
unclean.leader.election.enable = false
zookeeper.connect = 192.168.88.42:30573
zookeeper.connection.timeout.ms = 6000
zookeeper.max.in.flight.requests = 10
zookeeper.session.timeout.ms = 6000
zookeeper.set.acl = false
zookeeper.sync.time.ms = 2000
(kafka.server.KafkaConfig)
[2019-09-29 08:06:58,721] INFO [ThrottledChannelReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-09-29 08:06:58,722] INFO [ThrottledChannelReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-09-29 08:06:58,724] INFO [ThrottledChannelReaper-Request]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-09-29 08:06:58,797] INFO Log directory /kafka/kafka-logs-kafka not found, creating it. (kafka.log.LogManager)
[2019-09-29 08:06:58,814] INFO Loading logs. (kafka.log.LogManager)
[2019-09-29 08:06:58,834] INFO Logs loading complete in 20 ms. (kafka.log.LogManager)
[2019-09-29 08:06:58,869] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2019-09-29 08:06:58,877] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2019-09-29 08:06:59,505] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2019-09-29 08:06:59,549] INFO [SocketServer brokerId=1033] Created data-plane acceptor and processors for endpoint : EndPoint(null,9092,ListenerName(PLAINTEXT),PLAINTEXT) (kafka.network.SocketServer)
[2019-09-29 08:06:59,550] INFO [SocketServer brokerId=1033] Started 1 acceptor threads for data-plane (kafka.network.SocketServer)
[2019-09-29 08:06:59,587] INFO [ExpirationReaper-1033-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,590] INFO [ExpirationReaper-1033-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,590] INFO [ExpirationReaper-1033-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,600] INFO [ExpirationReaper-1033-ElectPreferredLeader]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,614] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2019-09-29 08:06:59,716] INFO Creating /brokers/ids/1033 (is it secure? false) (kafka.zk.KafkaZkClient)
[2019-09-29 08:06:59,743] INFO Stat of the created znode at /brokers/ids/1033 is: 776,776,1569744419734,1569744419734,1,0,0,72063325309108290,180,0,776
(kafka.zk.KafkaZkClient)
[2019-09-29 08:06:59,745] INFO Registered broker 1033 at path /brokers/ids/1033 with addresses: ArrayBuffer(EndPoint(kafka,9092,ListenerName(PLAINTEXT),PLAINTEXT)), czxid (broker epoch): 776 (kafka.zk.KafkaZkClient)
[2019-09-29 08:06:59,748] WARN No meta.properties file under dir /kafka/kafka-logs-kafka/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2019-09-29 08:06:59,882] INFO [ExpirationReaper-1033-topic]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,888] INFO [ExpirationReaper-1033-Heartbeat]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,895] INFO [ExpirationReaper-1033-Rebalance]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,940] INFO [GroupCoordinator 1033]: Starting up. (kafka.coordinator.group.GroupCoordinator)
[2019-09-29 08:06:59,949] INFO [GroupCoordinator 1033]: Startup complete. (kafka.coordinator.group.GroupCoordinator)
[2019-09-29 08:06:59,961] INFO [GroupMetadataManager brokerId=1033] Removed 0 expired offsets in 17 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-09-29 08:06:59,990] INFO [ProducerId Manager 1033]: Acquired new producerId block (brokerId:1033,blockStartProducerId:21000,blockEndProducerId:21999) by writing to Zk with path version 22 (kafka.coordinator.transaction.ProducerIdManager)
[2019-09-29 08:07:00,044] INFO [TransactionCoordinator id=1033] Starting up. (kafka.coordinator.transaction.TransactionCoordinator)
[2019-09-29 08:07:00,056] INFO [Transaction Marker Channel Manager 1033]: Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager)
[2019-09-29 08:07:00,061] INFO [TransactionCoordinator id=1033] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator)
[2019-09-29 08:07:00,207] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2019-09-29 08:07:00,289] INFO [SocketServer brokerId=1033] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)
[2019-09-29 08:07:00,326] INFO Kafka version: 2.3.0 (org.apache.kafka.common.utils.AppInfoParser)
[2019-09-29 08:07:00,326] INFO Kafka commitId: fc1aaa116b661c8a (org.apache.kafka.common.utils.AppInfoParser)
[2019-09-29 08:07:00,326] INFO Kafka startTimeMs: 1569744420299 (org.apache.kafka.common.utils.AppInfoParser)
[2019-09-29 08:07:00,341] INFO [KafkaServer id=1033] started (kafka.server.KafkaServer)
creating topics: newsrawdata:1:1
The log from the zookeeper pod:
2019-09-29 08:06:58,003 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#653] - Got user-level KeeperException when processing sessionid:0x10005366a620042 type:create cxid:0xd zxid:0x306 txntype:-1 reqpath:n/a Error Path:/config/brokers Error:KeeperErrorCode = NodeExists for /config/brokers
2019-09-29 08:07:00,421 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#596] - Got user-level KeeperException when processing sessionid:0x10005366a620042 type:multi cxid:0x3f zxid:0x30d txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election
2019-09-29 08:07:07,512 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted socket connection from /10.44.0.0:39244
2019-09-29 08:07:07,519 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /10.44.0.0:39244
2019-09-29 08:07:07,521 [myid:] - INFO [SyncThread:0:ZooKeeperServer#694] - Established session 0x10005366a620043 with negotiated timeout 30000 for client /10.44.0.0:39244
2019-09-29 08:07:08,034 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#487] - Processed session termination for sessionid: 0x10005366a620043
2019-09-29 08:07:08,045 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /10.44.0.0:39244 which had sessionid 0x10005366a620043
2019-09-29 08:07:13,180 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask#138] - Purge task started.
2019-09-29 08:07:13,181 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask#144] - Purge task completed.
2019-09-29 09:07:13,180 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask#138] - Purge task started.
2019-09-29 09:07:13,182 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask#144] - Purge task completed.
Log from the client app:
Kafka Ip Server:kafka:9092
warn: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[35]
No XML encryptor configured. Key {16b9a9aa-732a-47ab-bd31-ce341be7f812} may be persisted to storage in unencrypted form.
Hosting environment: docker
Content root path: /app
Now listening on: http://[::]:80
Application started. Press Ctrl+C to shut down.
We set the "BootstrapServers" to "kafka:9092" in the client app. Seems that client cat resolve the kafka in the cluster and see the IP of the kafka pod but no event occurs when we send PUT request. It worth noting that by using this config out of the kubernetes cluster with docker-compose it works as expected! what is wrong with this configuration?

First, make sure that your nodes have the proper label: the node for the broker deployment has to carry the role: slave1 label so its nodeSelector can match. Otherwise, just delete the nodeSelector lines from the broker deployment file.
Then add these lines to the spec of your deployment configuration files:
selector:
  matchLabels:
    io.kompose.service: kafka
This one is for kafka.yaml.
You don't have to label your services; specifying selectors is enough, so you can delete the labels field from the service configuration files.
Then, in the kafka deployment configuration file, change these lines:
- name: KAFKA_ZOOKEEPER_CONNECT
  # value: 192.168.88.42:30573
  value: your_zookeeper_service_ip:2181
The value line should contain the IP of your zookeeper service and port 2181; if your zookeeper service has the IP 192.168.88.42, the value above is correct.
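Alternatively (a sketch, assuming the zookeeper service defined above lives in the same namespace), you can avoid hard-coding an IP by pointing at the service's DNS name:
- name: KAFKA_ZOOKEEPER_CONNECT
  # kube-dns resolves the service name to its ClusterIP
  value: zookeeper:2181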

Actually, the problem is related to the Kafka listener configuration (KAFKA_ADVERTISED_LISTENERS, KAFKA_INTER_BROKER_LISTENER_NAME, KAFKA_LISTENERS). Anyway, the deployment.yaml below is working properly for me now, without any change to the service files.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
  labels:
    io.kompose.service: kafka
  name: kafka
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        io.kompose.service: kafka
    spec:
      containers:
      - env:
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: kafka
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_LISTENERS
          value: INSIDE://:9092
        - name: KAFKA_CREATE_TOPICS
          value: newsrawdata:1:1
        - name: KAFKA_INTER_BROKER_LISTENER_NAME
          value: INSIDE
        - name: KAFKA_LISTENERS
          value: INSIDE://:9092
        - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
          value: INSIDE:PLAINTEXT
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: 192.168.88.207:30573
        - name: KAFKA_PORT
          value: "9092"
        - name: KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS
          value: "1000"
        image: wurstmeister/kafka
        name: kafka
        ports:
        - containerPort: 9092
        - containerPort: 9094
      hostname: kafka
      restartPolicy: Always
The Kafka instance now connects to ZooKeeper, and client apps can produce to and consume from Kafka properly.
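For anyone debugging a similar hang, a quick in-cluster check (a sketch; it assumes the wurstmeister/kafka image and the kafka service on port 9092 as above, and the pod name is arbitrary) is to list topics from a throwaway pod:
apiVersion: v1
kind: Pod
metadata:
  name: kafka-debug   # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: kafka-debug
    image: wurstmeister/kafka
    # if this also hangs, the problem is the broker's advertised
    # listeners rather than the client app
    command: ["/opt/kafka/bin/kafka-topics.sh"]
    args: ["--list", "--bootstrap-server", "kafka:9092"]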

Related

ksql server failed to start with error - not a valid KSQL server

I installed standalone ksqlDB on my local computer.
While running the KSQL server, I am getting the following error:
Remote server at http://ksqldb-server:8088 does not appear to be a valid KSQL
server. Please ensure that the URL provided is for an active KSQL server.
The server responded with the following error:
Error issuing GET to KSQL server. path:/info
Caused by: java.net.UnknownHostException: failed to resolve 'ksqldb-server'
after 2 queries
Caused by: failed to resolve 'ksqldb-server' after 2 queries
PS: my docker-compose.yml looks as follows
---
version: '2'
services:
  ksqldb-server:
    image: confluentinc/ksqldb-server:0.23.1
    hostname: ksqldb-server
    container_name: ksqldb-server
    ports:
      - "8088:8088"
    environment:
      KSQL_LISTENERS: http://0.0.0.0:8088
      KSQL_BOOTSTRAP_SERVERS: localhost:9092
      KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
  ksqldb-cli:
    image: confluentinc/ksqldb-cli:0.23.1
    container_name: ksqldb-cli
    depends_on:
      - ksqldb-server
    entrypoint: /bin/sh
    tty: true
While running docker-compose up, it gives the following error:
[2022-01-11 07:21:54,280] INFO [AdminClient clientId=adminclient-1] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:235)
ksqldb-server | org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: fetchMetadata
ksqldb-server | [2022-01-11 07:21:54,282] ERROR Failed to start KSQL (io.confluent.ksql.rest.server.KsqlServerMain:68)
ksqldb-server | java.lang.RuntimeException: Failed to get Kafka cluster information
ksqldb-server | at io.confluent.ksql.services.KafkaClusterUtil.getKafkaClusterId(KafkaClusterUtil.java:107)
ksqldb-server | at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:669)
ksqldb-server | at io.confluent.ksql.rest.server.KsqlServerMain.createExecutable(KsqlServerMain.java:154)
ksqldb-server | at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:61)
ksqldb-server | Caused by: java.util.concurrent.TimeoutException
ksqldb-server | at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
ksqldb-server | at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
ksqldb-server | at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180)
ksqldb-server | at io.confluent.ksql.services.KafkaClusterUtil.getKafkaClusterId(KafkaClusterUtil.java:105)
ksqldb-server | ... 3 more
ksqldb-server exited with code 255
^CGracefully stopping... (press Ctrl+C again to force)
Stopping ksqldb-cli ... done
In the server.properties file, I added
listeners=PLAINTEXT://localhost:9092
and restarted Kafka, but it did not solve the problem.
The Kafka consumer and producer are working fine; the problem is only with KSQL.
Any help is appreciated.
There are two things that are wrong in this config:
You need to specify, in the ksqldb-server yaml, the name of the service you used to expose the kafka cluster; if you used strimzi, it usually ends with "kafka-bootstrap" (see the sketch below).
The second error is that you also need to expose the ksqldb-server itself so that the ksql-cli can connect to it; the server finds the brokers through the KSQL_BOOTSTRAP_SERVERS environment variable.
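For example, the environment of the ksqldb-server could look like this (a sketch; my-cluster-kafka-bootstrap is an assumption, i.e. a strimzi cluster named my-cluster in the same namespace):
env:
- name: KSQL_BOOTSTRAP_SERVERS
  # strimzi exposes the brokers behind <cluster-name>-kafka-bootstrap
  value: my-cluster-kafka-bootstrap:9092
- name: KSQL_LISTENERS
  value: http://0.0.0.0:8088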
If you want, here is a template ksqldb service from the Helm chart you can use:
apiVersion: v1
kind: Service
metadata:
  name: my-ksql-db-service
  namespace: default
spec:
  ports:
  - name: http
    port: 8088
    protocol: TCP
    targetPort: ksqldb-server
  selector:
    app.kubernetes.io/name: ksqldb-server
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
You need to do a similar thing for kafka, if you haven't already, exposing port 9092.
However, this process can be error-prone since you have to write a lot of YAML by hand. There is no need to reinvent the wheel: use Helm charts and operators!
Install kafka using strimzi (a minimal example resource is sketched after this list)
Install ksql using Helm chart
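For reference, a minimal strimzi Kafka resource might look like this (a sketch, assuming the strimzi operator is installed and the v1beta2 API; the name and sizing are placeholders):
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster        # placeholder name
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}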
You can use this job template to execute the query:
apiVersion: batch/v1
kind: Job
metadata:
  namespace: default
  name: first-query
spec:
  template:
    spec:
      containers:
      - name: ksqldb-query
        imagePullPolicy: IfNotPresent
        image: confluentinc/cp-ksqldb-cli
        command: ['/bin/bash']
        args:
        - -c
        - >-
          ksql --execute "select * from your_source emit changes" --config-file ksql-server.properties <KSQL_BOOTSTRAP_SERVERS>
        envFrom:
        - configMapRef:
            name: my-ksql-db-4e876644-ksqldb
      restartPolicy: 'Never'
You could template the job YAML better, but I did not want to complicate things.
Let me know if something is not clear.

Kubernetes: WARN datanode.DataNode: Problem connecting to server: namenode:9001

I am a novice to Kubernetes and I am currently trying to deploy Hadoop in Kubernetes. I followed this docker-compose file to deploy it: https://github.com/big-data-europe/docker-hadoop/blob/master/docker-compose-v3.yml Below is my YAML for the namenode:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hadoop-namenode
  labels:
    traefik.docker.network: hbase-namenode
    traefik.port: "9870"
  namespace: hdfs
spec:
  selector:
    matchLabels:
      app: hadoop-namenode
      traefik.docker.network: hbase-namenode
      traefik.port: "9870"
  template:
    metadata:
      labels:
        app: hadoop-namenode
        traefik.docker.network: hbase-namenode
        traefik.port: "9870"
    spec:
      nodeSelector:
        kubernetes.io/hostname: data-mesh-hdfs
      containers:
      - name: hadoop-namenode
        image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
        ports:
        - containerPort: 9870
        env:
        - name: CLUSTER_NAME
          value: test
        envFrom:
        - secretRef:
            name: hadoop-secret
        volumeMounts:
        - name: data-namenode
          mountPath: /hadoop/dfs/name
      volumes:
      - name: data-namenode
        persistentVolumeClaim:
          claimName: namenode-pvc
I created the PV and PVC for my pod and attached my service to the pod:
apiVersion: v1
kind: Service
metadata:
  name: service-namenode
  namespace: hdfs
spec:
  selector:
    traefik.docker.network: hbase-namenode
  type: NodePort
  ports:
  - name: namenode
    protocol: TCP
    port: 9870
    targetPort: 9870
When I deploy my hadoop-namenode pod, it is up and running correctly on my cluster, with these logs:
2021-12-14 15:43:25,893 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-12-14 15:43:26,044 INFO namenode.NameNode: createNameNode []
2021-12-14 15:43:26,202 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-12-14 15:43:26,367 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-12-14 15:43:26,368 INFO impl.MetricsSystemImpl: NameNode metrics system started
2021-12-14 15:43:26,422 INFO namenode.NameNodeUtils: fs.defaultFS is hdfs://namenode:9001
2021-12-14 15:43:26,423 INFO namenode.NameNode: Clients should use namenode:9001 to access this namenode/service.
2021-12-14 15:43:26,634 INFO util.JvmPauseMonitor: Starting JVM pause monitor
2021-12-14 15:43:26,676 INFO hdfs.DFSUtil: Starting Web-server for hdfs at: http://0.0.0.0:9870
And finally, I set up my datanode and service using this file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hadoop-datanode
  labels:
    traefik.docker.network: hbase-datanode
    traefik.port: "9864"
  namespace: hdfs
spec:
  selector:
    matchLabels:
      app: hadoop-datanode
      traefik.docker.network: hbase-datanode
      traefik.port: "9864"
  template:
    metadata:
      labels:
        app: hadoop-datanode
        traefik.docker.network: hbase-datanode
        traefik.port: "9864"
    spec:
      nodeSelector:
        kubernetes.io/hostname: data-mesh-hdfs
      containers:
      - name: hadoop-datanode
        image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
        ports:
        - containerPort: 9864
        env:
        - name: SERVICE_PRECONDITION
          value: "10.47.0.1:9870"
        envFrom:
        - secretRef:
            name: hadoop-secret
        volumeMounts:
        - name: datanode
          mountPath: /hadoop/dfs/data
      volumes:
      - name: datanode
        persistentVolumeClaim:
          claimName: datanode-pvc
All my pods are up and running, but I get this error on my datanode:
kubectl get pod -n hdfs
NAME READY STATUS RESTARTS AGE
hadoop-datanode-7f9bdb4f54-4hh6t 1/1 Running 0 2d21h
hadoop-namenode-6bddbc7b6-h8bqk 1/1 Running 0 2d22h
2021-12-17 14:46:57,697 INFO server.AbstractConnector: Started ServerConnector#4b5c304e{HTTP/1.1,[http/1.1]}{localhost:41327}
2021-12-17 14:46:57,698 INFO server.Server: Started #2486ms
2021-12-17 14:46:57,943 INFO web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:9864
2021-12-17 14:46:57,952 INFO util.JvmPauseMonitor: Starting JVM pause monitor
2021-12-17 14:46:57,958 INFO datanode.DataNode: dnUserName = root
2021-12-17 14:46:57,958 INFO datanode.DataNode: supergroup = supergroup
2021-12-17 14:46:58,026 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 1000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2021-12-17 14:46:58,043 INFO ipc.Server: Starting Socket Reader #1 for port 9867
2021-12-17 14:46:58,310 INFO datanode.DataNode: Opened IPC server at /0.0.0.0:9867
2021-12-17 14:46:58,332 INFO datanode.DataNode: Refresh request received for nameservices: null
2021-12-17 14:46:58,359 WARN hdfs.DFSUtilClient: Namenode for null remains unresolved for ID null. Check your hdfs-site.xml file to ensure namenodes are configured properly.
2021-12-17 14:46:58,363 INFO datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2021-12-17 14:46:58,389 INFO datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to namenode:9001 starting to offer service
2021-12-17 14:46:58,408 INFO ipc.Server: IPC Server Responder: starting
2021-12-17 14:46:58,409 INFO ipc.Server: IPC Server listener on 9867: starting
2021-12-17 14:46:58,480 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:03,481 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:08,483 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:13,484 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:18,486 WARN datanode.DataNode: Problem connecting to server: namenode:9001
I use a Secret resource to set up the environment variables for the datanode and namenode, which gives this output for my datanode at startup:
Configuring core
- Setting hadoop.proxyuser.hue.hosts=*
- Setting fs.defaultFS=hdfs://namenode:9001
- Setting hadoop.http.staticuser.user=root
- Setting io.compression.codecs=org.apache.hadoop.io.compress.SnappyCodec
- Setting hadoop.proxyuser.hue.groups=*
Configuring hdfs
- Setting dfs.datanode.data.dir=file:///hadoop/dfs/data
- Setting dfs.namenode.datanode.registration.ip-hostname-check=false
- Setting dfs.webhdfs.enabled=true
- Setting dfs.permissions.enabled=false
Configuring yarn
- Setting yarn.timeline-service.enabled=true
- Setting yarn.scheduler.capacity.root.default.maximum-allocation-vcores=4
- Setting yarn.resourcemanager.system-metrics-publisher.enabled=true
- Setting yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
- Setting yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage=98.5
- Setting yarn.log.server.url=http://historyserver:8188/applicationhistory/logs/
- Setting yarn.resourcemanager.fs.state-store.uri=/rmstate
- Setting yarn.timeline-service.generic-application-history.enabled=true
- Setting yarn.log-aggregation-enable=true
- Setting yarn.resourcemanager.hostname=resourcemanager
- Setting yarn.scheduler.capacity.root.default.maximum-allocation-mb=8192
- Setting yarn.nodemanager.aux-services=mapreduce_shuffle
- Setting yarn.resourcemanager.resource_tracker.address=resourcemanager:8031
- Setting yarn.timeline-service.hostname=historyserver
- Setting yarn.resourcemanager.scheduler.address=resourcemanager:8030
- Setting yarn.resourcemanager.address=resourcemanager:8032
- Setting mapred.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
- Setting yarn.nodemanager.remote-app-log-dir=/app-logs
- Setting yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
- Setting mapreduce.map.output.compress=true
- Setting yarn.nodemanager.resource.memory-mb=16384
- Setting yarn.resourcemanager.recovery.enabled=true
- Setting yarn.nodemanager.resource.cpu-vcores=8
Configuring httpfs
Configuring kms
Configuring mapred
- Setting mapreduce.map.java.opts=-Xmx3072m
- Setting mapreduce.reduce.memory.mb=8192
- Setting mapreduce.reduce.java.opts=-Xmx6144m
- Setting yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
- Setting mapreduce.map.memory.mb=4096
- Setting mapred.child.java.opts=-Xmx4096m
- Setting mapreduce.reduce.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
- Setting mapreduce.framework.name=yarn
- Setting mapreduce.map.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
Configuring for multihomed network
[1/100] 10.47.0.1:9001 is available.
2021-12-17 14:46:55,883 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = hadoop-datanode-7f9bdb4f54-4mb8r/10.47.0.6
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.2.1
If I look at the open ports on my namenode, I see this:
netstat -lpten
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 0.0.0.0:9001 0.0.0.0:* LISTEN 0 13617204 361/java
tcp 0 0 0.0.0.0:9870 0.0.0.0:* LISTEN 0 13617098 361/java
In the core-site.xml file on the datanode, I have the data below:
<configuration>
<property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
<property><name>fs.defaultFS</name><value>hdfs://namenode:9001</value></property>
<property><name>hadoop.http.staticuser.user</name><value>root</value></property>
<property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
</configuration>
Same content in the core-site.xml file of the namenode:
<configuration>
<property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
<property><name>fs.defaultFS</name><value>hdfs://namenode:9001</value></property>
<property><name>hadoop.http.staticuser.user</name><value>root</value></property>
<property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
</configuration>
Can anyone help me identify the issue?
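Note: fs.defaultFS points at the bare hostname namenode, while the Service above is named service-namenode, so cluster DNS has nothing that resolves namenode. A sketch of a Service whose name matches that hostname (assuming this mismatch is the cause):
apiVersion: v1
kind: Service
metadata:
  name: namenode          # must match the host in fs.defaultFS
  namespace: hdfs
spec:
  selector:
    app: hadoop-namenode
  ports:
  - name: rpc
    protocol: TCP
    port: 9001
    targetPort: 9001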

Failed to send HTTP request to schema-registry

I am trying to set up a local kafka-connect stack with docker-compose, and I have a problem with my Scala producer that is supposed to send Avro messages to a Kafka topic using the schema registry.
In my producer (Scala) code I do the following:
val kafkaBootstrapServer = "kafka:9092"
val schemaRegistryUrl = "http://schema-registry:8081"
val topicName = "test"
val props = new Properties()
props.put("bootstrap.servers", kafkaBootstrapServer)
props.put("schema.registry.url", schemaRegistryUrl)
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
props.put("acks", "1")
and my docker-compose script reads:
---
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.5.0
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:5.5.0
    hostname: kafka
    container_name: kafka
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka:9092"
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_CREATE_TOPICS: "test:1:1"
  schema-registry:
    image: confluentinc/cp-schema-registry:5.5.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - zookeeper
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: zookeeper:2181
  producer:
    image: producer-app:1.0
    depends_on:
      - schema-registry
      - kafka
EDIT: now the schema registry seems to be up:
schema-registry | [2021-01-17 22:27:27,704] INFO HV000001: Hibernate Validator 6.0.17.Final (org.hibernate.validator.internal.util.Version)
kafka | [2021-01-17 22:27:27,918] INFO [Controller id=1001] Processing automatic preferred replica leader election (kafka.controller.KafkaController)
kafka | [2021-01-17 22:27:27,919] TRACE [Controller id=1001] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
kafka | [2021-01-17 22:27:27,923] DEBUG [Controller id=1001] Topics not in preferred replica for broker 1001 Map() (kafka.controller.KafkaController)
kafka | [2021-01-17 22:27:27,924] TRACE [Controller id=1001] Leader imbalance ratio for broker 1001 is 0.0 (kafka.controller.KafkaController)
schema-registry | [2021-01-17 22:27:28,010] INFO JVM Runtime does not support Modules (org.eclipse.jetty.util.TypeUtil)
schema-registry | [2021-01-17 22:27:28,011] INFO Started o.e.j.s.ServletContextHandler#22d6f11{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
schema-registry | [2021-01-17 22:27:28,035] INFO Started o.e.j.s.ServletContextHandler#15eebbff{/ws,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
schema-registry | [2021-01-17 22:27:28,058] INFO Started NetworkTrafficServerConnector#2698dc7{HTTP/1.1,[http/1.1]}{0.0.0.0:8081} (org.eclipse.jetty.server.AbstractConnector)
schema-registry | [2021-01-17 22:27:28,059] INFO Started #4137ms (org.eclipse.jetty.server.Server)
schema-registry | [2021-01-17 22:27:28,059] INFO Server started, listening for requests... (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain)
but prior to this, during the execution of the script I get:
schema-registry | ===> Launching ...
schema-registry | ===> Launching schema-registry ...
producer_1 | [main] ERROR io.confluent.kafka.schemaregistry.client.rest.RestService - Failed to send HTTP request to endpoint: http://schema-registry:8081/subjects/test-value/versions
producer_1 | java.net.ConnectException: Connection refused (Connection refused)
Could this be due to some dependency issue? It looks like the producer is run before the schema-registry has completely started! I did put depends_on: - schema-registry for the producer ...
It looks like the cause here was that your app was trying to call the Schema Registry before it had finished starting up. Perhaps your app should include some error handling for this condition, and maybe retry after a backoff period on the first x failures? (An alternative at the Compose level is sketched below.)
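At the Compose level, another option is to gate the producer on the registry's health (a sketch; this needs Compose file format 2.1+ for depends_on conditions and assumes curl is available in the cp-schema-registry image):
services:
  schema-registry:
    image: confluentinc/cp-schema-registry:5.5.0
    healthcheck:
      # /subjects answers 200 once the registry is actually serving
      test: ["CMD", "curl", "-f", "http://localhost:8081/subjects"]
      interval: 10s
      retries: 20
  producer:
    image: producer-app:1.0
    depends_on:
      schema-registry:
        condition: service_healthy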
For anyone else going through this: I was getting a connection refused error with my schema registry for the longest time. I have my schema-registry running in a docker container, with a separate container for a REST API that is trying to use the schema-registry container. What fixed it for me was changing my connection URL in the REST container from http://localhost:8081 to http://schema-registry-server:8081.
Connecting to the Schema Registry in the REST container:
schema_registry = SchemaRegistryClient(
    url={
        "url": "http://schema-registry-server:8081"
    },
)
Here's the schema registry part of my docker compose file
# docker-compose.yml
schema-registry-server:
  image: confluentinc/cp-schema-registry
  hostname: schema-registry-server
  container_name: schema-registry-server
  depends_on:
    - kafka
    - zookeeper
  ports:
    - 8081:8081
  environment:
    - SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS=kafka:9092
    - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper:32181
    - SCHEMA_REGISTRY_HOST_NAME=schema-registry-server
    - SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081
    - SCHEMA_REGISTRY_DEBUG=true

How to use AWS Elasticsearch in Zeebe Operate

I have deployed the Zeebe cluster and Zeebe Operate separately. In the Zeebe cluster, Elasticsearch is configured against AWS Elasticsearch, but I am facing an issue with Zeebe Operate.
Configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "zeebe-operate.fullname" . }}
data:
  application.yml: |
    # Operate configuration file
    camunda.operate:
      elasticsearch:
        host: {{ .Values.global.elasticsearch.host }}
        port: {{ .Values.global.elasticsearch.port }}
        username: {{ .Values.global.elasticsearch.username }}
        password: {{ .Values.global.elasticsearch.password }}
        prefix: zeebe-record-operate
ERROR:
2020-06-29 05:35:10.397 ERROR 6 --- [ main] o.c.o.e.ElasticsearchConnector : Error occurred while connecting to Elasticsearch: clustername [elasticsearch], https://xx.xx.xx.x.x.x.x.ap-south-1.es.amazonaws.com:443. Will be retried...
java.io.IOException: https://xxxxx-xxxx-xxx-xx-xxx-x..ap-south-1.es.amazonaws.com: Name or service not known
at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:964) ~[elasticsearch-rest-client-6.8.7.jar!/:6.8.7]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233) ~[elasticsearch-rest-client-6.8.7.jar!/:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1764) ~[elasticsearch-rest-high-level-client-6.8.8.jar!/:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.8.8.jar!/:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.8.8.jar!/:6.8.7]
at org.elasticsearch.client.ClusterClient.health(ClusterClient.java:146) ~[elasticsearch-rest-high-level-client-6.8.8.jar!/:6.8.7]
at org.camunda.operate.es.ElasticsearchConnector.checkHealth(ElasticsearchConnector.java:89) ~[camunda-operate-common-0.23.0.jar!/:?]
at org.camunda.operate.es.ElasticsearchConnector.createEsClient(ElasticsearchConnector.java:75) ~[camunda-operate-common-0.23.0.jar!/:?]
at org.camunda.operate.es.ElasticsearchConnector.esClient(ElasticsearchConnector.java:51) ~[camunda-operate-common-0.23.0.jar!/:?]
at org.camunda.operate.es.ElasticsearchConnector$$EnhancerBySpringCGLIB$$670b527.CGLIB$esClient$0() ~[camunda-operate-common-0.23.0.jar!/:?]
at org.camunda.operate.es.ElasticsearchConnector$$EnhancerBySpringCGLIB$$670b527$$FastClassBySpringCGLIB$$af2d84c1.invoke() ~[camunda-operate-common-0.23.0.jar!/:?]
at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244) ~[spring-core-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:331) ~[spring-context-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.camunda.operate.es.ElasticsearchConnector$$EnhancerBySpringCGLIB$$670b527.esClient() ~[camunda-o
The URL is accessible from the Zeebe cluster, but it is not working for Zeebe Operate, as you can see from the Zeebe broker logs below.
Logs of Broker:
2020-06-23 08:23:29.759 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [7/10]: topology manager started in 1 ms
2020-06-23 08:23:29.761 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [8/10]: metric's server
2020-06-23 08:23:29.820 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [8/10]: metric's server started in 58 ms
2020-06-23 08:23:29.821 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [9/10]: leader management request handler
2020-06-23 08:23:29.826 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [9/10]: leader management request handler started in 2 ms
2020-06-23 08:23:29.826 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [10/10]: zeebe partitions
2020-06-23 08:23:29.832 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 partitions [1/1]: partition 1
2020-06-23 08:23:30.231 [] [main] DEBUG io.zeebe.broker.exporter - Exporter configured with ElasticsearchExporterConfiguration{url='https://xxx.xxx.xxx.xx.xx.xx.xx.x.ap-south-1.es.amazonaws.com:443', index=IndexConfiguration{indexPrefix='zeebe-record', createTemplate=true, command=false, event=true, rejection=false, error=true, deployment=true, incident=true, job=true, message=false, messageSubscription=false, variable=true, variableDocument=true, workflowInstance=true, workflowInstanceCreation=false, workflowInstanceSubscription=false, ignoreVariablesAbove=8191}, bulk=BulkConfiguration{delay=5, size=1000}, authentication=AuthenticationConfiguration{username='kibana'}}
2020-06-23 08:23:30.236 [Broker-0-ZeebePartition-1] [Broker-0-zb-actors-1] DEBUG io.zeebe.broker.system - Removing follower partition service for partition PartitionId{id=1, group=raft-partition}
2020-06-23 08:23:30.325 [Broker-0-ZeebePartition-1] [Broker-0-zb-actors-1] DEBUG io.zeebe.broker.system - Partition role transitioning from null to LEADER
Thanks in Advance!!
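Judging from the error above, Operate appears to be treating the full https://... URL as a hostname, which would explain the "Name or service not known" failure. A sketch of the elasticsearch block with a bare hostname (an assumption; the endpoint below is a placeholder for the real AWS ES host):
camunda.operate:
  elasticsearch:
    # assumption: host takes a hostname only, with no scheme
    host: my-domain.ap-south-1.es.amazonaws.com
    port: 443
    prefix: zeebe-record-operate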

Istio blocking connection to MySQL

I would like to deploy the Java Pet Store on Kubernetes. In order to achieve this I have 2 simple deployments. The first one is the Java web app and the second one is a MySQL database.
When Istio is disabled, the connection between the app and the DB works well.
Unfortunately, when the Istio sidecar is injected, the communication between the two stops working.
Here is the deployment file of the web app:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jpetstoreweb
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jpetstoreweb
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: jpetstoreweb
        image: wingardiumleviosa/petstore:v7
        env:
        - name: VERSION
          value: "1"
        - name: DB_URL
          value: "jpetstoredb-service"
        - name: DB_PORT
          value: "3306"
        - name: DB_NAME
          value: "jpetstore"
        - name: DB_USERNAME
          value: "jpetstore"
        - name: DB_PASSWORD
          value: "foobar"
        ports:
        - containerPort: 9080
        readinessProbe:
          httpGet:
            path: /
            port: 9080
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: jpetstoreweb-service
spec:
  selector:
    app: jpetstoreweb
  ports:
  - port: 80
    targetPort: 9080
---
And next, the deployment file of the MySQL database:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jpetstoredb
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jpetstoredb
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: jpetstoredb
        image: wingardiumleviosa/petstoredb:v1
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "foobar"
        - name: MYSQL_DATABASE
          value: "jpetstore"
        - name: MYSQL_USER
          value: "jpetstore"
        - name: MYSQL_PASSWORD
          value: "foobar"
---
apiVersion: v1
kind: Service
metadata:
  name: jpetstoredb-service
spec:
  selector:
    app: jpetstoredb
  ports:
  - port: 3306
    targetPort: 3306
Finally, the error logs from the web app trying to connect to the DB:
Exception thrown by application class 'org.springframework.web.servlet.FrameworkServlet.processRequest:488'
org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.sql.SQLException: Communication link failure: java.io.EOFException, underlying cause: null ** BEGIN NESTED EXCEPTION ** java.io.EOFException STACKTRACE: java.io.EOFException at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1395) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1539) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1930) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1168) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1279) at com.mysql.jdbc.MysqlIO.sqlQuery(MysqlIO.java:1225) at com.mysql.jdbc.Connection.execSQL(Connection.java:2278) at com.mysql.jdbc.Connection.execSQL(Connection.java:2237) at com.mysql.jdbc.Connection.execSQL(Connection.java:2218) at com.mysql.jdbc.Connection.setAutoCommit(Connection.java:548) at org.apache.commons.dbcp.DelegatingConnection.setAutoCommit(DelegatingConnection.java:331) at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.setAutoCommit(PoolingDataSource.java:317) at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:221) at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:350) at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:261) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:101) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:89) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at com.sun.proxy.$Proxy28.getCategory(Unknown Source) at org.springframework.samples.jpetstore.web.spring.ViewCategoryController.handleRequest(ViewCategoryController.java:31) at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:874) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:808) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:476) at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:431) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1255) at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:743) at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:440) at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:182) at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:93) at com.ibm.ws.security.jaspi.JaspiServletFilter.doFilter(JaspiServletFilter.java:56) at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:201) at 
com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:90) at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:996) at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1134) at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1005) at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:75) at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:927) at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:279) at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:1023) at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.wrapHandlerAndExecute(HttpDispatcherLink.java:417) at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.ready(HttpDispatcherLink.java:376) at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:532) at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleNewRequest(HttpInboundLink.java:466) at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.processRequest(HttpInboundLink.java:331) at com.ibm.ws.http.channel.internal.inbound.HttpICLReadCallback.complete(HttpICLReadCallback.java:70) at com.ibm.ws.tcpchannel.internal.WorkQueueManager.requestComplete(WorkQueueManager.java:501) at com.ibm.ws.tcpchannel.internal.WorkQueueManager.attemptIO(WorkQueueManager.java:571) at com.ibm.ws.tcpchannel.internal.WorkQueueManager.workerRun(WorkQueueManager.java:926) at com.ibm.ws.tcpchannel.internal.WorkQueueManager$Worker.run(WorkQueueManager.java:1015) at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:232) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.lang.Thread.run(Thread.java:812) ** END NESTED EXCEPTION **
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:488)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:431)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
Extract: Could not open JDBC Connection for transaction
Additional info:
1) I can curl the DB from the web app container and it answers correctly.
2) I use Cilium instead of Calico
3) I installed Istio using HELM
4) Kubernetes is installed on bare metal (no cloud provider)
5) kubectl get pods -n istio-system all istio pods are running
6) kubectl get pods -n kube-system all cilium pods are running
7) Istio is injected using kubectl apply -f <(~/istio-1.0.5/bin/istioctl kube-inject -f ~/jpetstore.yaml) -n foo. If I use any other method, Istio does not inject itself into the Web pod (but works for the DB pod, god knows why)
8) The DB pod is always happy and working well
9) Logs of the istio-proxy container inside the WebApp pod: kubectl logs jpetstoreweb-84c7d8964-s642k istio-proxy -n myns
2018-12-28T03:52:30.610101Z info Version root#6f6ea1061f2b-docker.io/istio-1.0.5-c1707e45e71c75d74bf3a5dec8c7086f32f32fad-Clean
2018-12-28T03:52:30.610167Z info Proxy role: model.Proxy{ClusterID:"", Type:"sidecar", IPAddress:"10.233.72.142", ID:"jpetstoreweb-84c7d8964-s642k.myns", Domain:"myns.svc.cluster.local", Metadata:map[string]string(nil)}
2018-12-28T03:52:30.611217Z info Effective config: binaryPath: /usr/local/bin/envoy
configPath: /etc/istio/proxy
connectTimeout: 10s
discoveryAddress: istio-pilot.istio-system:15007
discoveryRefreshDelay: 1s
drainDuration: 45s
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: jpetstoreweb
zipkinAddress: zipkin.istio-system:9411
2018-12-28T03:52:30.611249Z info Monitored certs: []envoy.CertSource{envoy.CertSource{Directory:"/etc/certs/", Files:[]string{"cert-chain.pem", "key.pem", "root-cert.pem"}}}
2018-12-28T03:52:30.611829Z info Starting proxy agent
2018-12-28T03:52:30.611902Z info Received new config, resetting budget
2018-12-28T03:52:30.611912Z info Reconciling configuration (budget 10)
2018-12-28T03:52:30.611926Z info Epoch 0 starting
2018-12-28T03:52:30.613236Z info Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster jpetstoreweb --service-node sidecar~10.233.72.142~jpetstoreweb-84c7d8964-s642k.myns~myns.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warn --v2-config-only]
[2018-12-28 03:52:30.630][20][info][main] external/envoy/source/server/server.cc:190] initializing epoch 0 (hot restart version=10.200.16384.256.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=4882536)
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:192] statically linked extensions:
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:194] access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:197] filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash,istio_authn,jwt-auth,mixer
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:200] filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:203] filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.rbac,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy,mixer
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:205] stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:207] tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:210] transport_sockets.downstream: alts,envoy.transport_sockets.capture,raw_buffer,tls
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:213] transport_sockets.upstream: alts,envoy.transport_sockets.capture,raw_buffer,tls
[2018-12-28 03:52:30.634][20][info][config] external/envoy/source/server/configuration_impl.cc:50] loading 0 static secret(s)
[2018-12-28 03:52:30.638][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, no healthy upstream
[2018-12-28 03:52:30.638][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:41] Unable to establish new stream
[2018-12-28 03:52:30.638][20][info][config] external/envoy/source/server/configuration_impl.cc:60] loading 1 listener(s)
[2018-12-28 03:52:30.640][20][info][config] external/envoy/source/server/configuration_impl.cc:94] loading tracing configuration
[2018-12-28 03:52:30.640][20][info][config] external/envoy/source/server/configuration_impl.cc:103] loading tracing driver: envoy.zipkin
[2018-12-28 03:52:30.640][20][info][config] external/envoy/source/server/configuration_impl.cc:116] loading stats sink configuration
[2018-12-28 03:52:30.640][20][info][main] external/envoy/source/server/server.cc:432] starting main dispatch loop
[2018-12-28 03:52:32.010][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, no healthy upstream
[2018-12-28 03:52:32.011][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:41] Unable to establish new stream
[2018-12-28 03:52:34.691][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, no healthy upstream
[2018-12-28 03:52:34.691][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:41] Unable to establish new stream
[2018-12-28 03:52:38.483][20][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:130] cm init: initializing cds
[2018-12-28 03:53:01.596][20][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:494] add/update cluster outbound|443||kubernetes.default.svc.cluster.local during init
...
[2018-12-28T04:09:09.561Z] - 115 1548 6 "127.0.0.1:9080" inbound|80||jpetstoreweb-service.myns.svc.cluster.local 127.0.0.1:40318 10.233.72.142:9080 10.233.72.1:43098
[2018-12-28T04:09:14.555Z] - 115 1548 8 "127.0.0.1:9080" inbound|80||jpetstoreweb-service.myns.svc.cluster.local 127.0.0.1:40350 10.233.72.142:9080 10.233.72.1:43130
[2018-12-28T04:09:19.556Z] - 115 1548 5 "127.0.0.1:9080" inbound|80||jpetstoreweb-service.myns.svc.cluster.local 127.0.0.1:40364 10.233.72.142:9080 10.233.72.1:43144
[2018-12-28T04:09:24.558Z] - 115 1548 6 "127.0.0.1:9080" inbound|80||jpetstoreweb-service.myns.svc.cluster.local 127.0.0.1:40378 10.233.72.142:9080 10.233.72.1:43158
10) Using Istio 1.0.5 and kubernetes 1.13.0
All ideas are welcome ;-)
Thx
So there really is an issue with Istio 1.0.5 and the MySQL JDBC driver.
The temporary solution is to delete the mesh policy resource in the following way:
kubectl delete meshpolicies.authentication.istio.io default
As stated here and referencing this.
(FYI: I deleted the resource BEFORE deploying my petstore app.)
As of Istio 1.1.1 there is more data on this problem in the FAQ
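If deleting the mesh-wide policy is too drastic, a narrower workaround for Istio 1.0.x (a sketch, reusing the service name from the manifests above) is to disable mTLS only for the MySQL service with a DestinationRule:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: jpetstoredb-no-mtls
spec:
  host: jpetstoredb-service
  trafficPolicy:
    tls:
      # plain TCP to the DB so the MySQL handshake is not wrapped in TLS
      mode: DISABLE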