How to use AWS Elasticsearch in Zeebe Operate

I have deployed the Zeebe cluster and Zeebe Operate separately. In the Zeebe cluster, Elasticsearch is configured with the AWS Elasticsearch service, but I am facing an issue with Zeebe Operate.
Configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "zeebe-operate.fullname" . }}
data:
  application.yml: |
    # Operate configuration file
    camunda.operate:
      elasticsearch:
        host: {{ .Values.global.elasticsearch.host }}
        port: {{ .Values.global.elasticsearch.port }}
        username: {{ .Values.global.elasticsearch.username }}
        password: {{ .Values.global.elasticsearch.password }}
        prefix: zeebe-record-operate
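For reference, a sketch of the Helm values this template expects (the endpoint, username, and password below are placeholders): the AWS Elasticsearch endpoint should be the bare hostname, with no https:// scheme, and the managed service listens on port 443. Note that the error below shows a hostname containing a double dot (...-x..ap-south-1...), which suggests the rendered host value itself is malformed.

# values.yaml (sketch; all values are placeholders)
global:
  elasticsearch:
    host: vpc-mydomain-abc123.ap-south-1.es.amazonaws.com  # bare hostname, no scheme
    port: 443
    username: kibana
    password: changeme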
ERROR:
2020-06-29 05:35:10.397 ERROR 6 --- [ main] o.c.o.e.ElasticsearchConnector : Error occurred while connecting to Elasticsearch: clustername [elasticsearch], https://xx.xx.xx.x.x.x.x.ap-south-1.es.amazonaws.com:443. Will be retried...
java.io.IOException: https://xxxxx-xxxx-xxx-xx-xxx-x..ap-south-1.es.amazonaws.com: Name or service not known
at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:964) ~[elasticsearch-rest-client-6.8.7.jar!/:6.8.7]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233) ~[elasticsearch-rest-client-6.8.7.jar!/:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1764) ~[elasticsearch-rest-high-level-client-6.8.8.jar!/:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.8.8.jar!/:6.8.7]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.8.8.jar!/:6.8.7]
at org.elasticsearch.client.ClusterClient.health(ClusterClient.java:146) ~[elasticsearch-rest-high-level-client-6.8.8.jar!/:6.8.7]
at org.camunda.operate.es.ElasticsearchConnector.checkHealth(ElasticsearchConnector.java:89) ~[camunda-operate-common-0.23.0.jar!/:?]
at org.camunda.operate.es.ElasticsearchConnector.createEsClient(ElasticsearchConnector.java:75) ~[camunda-operate-common-0.23.0.jar!/:?]
at org.camunda.operate.es.ElasticsearchConnector.esClient(ElasticsearchConnector.java:51) ~[camunda-operate-common-0.23.0.jar!/:?]
at org.camunda.operate.es.ElasticsearchConnector$$EnhancerBySpringCGLIB$$670b527.CGLIB$esClient$0() ~[camunda-operate-common-0.23.0.jar!/:?]
at org.camunda.operate.es.ElasticsearchConnector$$EnhancerBySpringCGLIB$$670b527$$FastClassBySpringCGLIB$$af2d84c1.invoke() ~[camunda-operate-common-0.23.0.jar!/:?]
at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244) ~[spring-core-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:331) ~[spring-context-5.2.5.RELEASE.jar!/:5.2.5.RELEASE]
at org.camunda.operate.es.ElasticsearchConnector$$EnhancerBySpringCGLIB$$670b527.esClient() ~[camunda-o
The URL is reachable from the Zeebe cluster, but it is not working for Zeebe Operate, as you can see from the Zeebe broker logs below.
Logs of Broker:
2020-06-23 08:23:29.759 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [7/10]: topology manager started in 1 ms
2020-06-23 08:23:29.761 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [8/10]: metric's server
2020-06-23 08:23:29.820 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [8/10]: metric's server started in 58 ms
2020-06-23 08:23:29.821 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [9/10]: leader management request handler
2020-06-23 08:23:29.826 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [9/10]: leader management request handler started in 2 ms
2020-06-23 08:23:29.826 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [10/10]: zeebe partitions
2020-06-23 08:23:29.832 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 partitions [1/1]: partition 1
2020-06-23 08:23:30.231 [] [main] DEBUG io.zeebe.broker.exporter - Exporter configured with ElasticsearchExporterConfiguration{url='https://xxx.xxx.xxx.xx.xx.xx.xx.x.ap-south-1.es.amazonaws.com:443', index=IndexConfiguration{indexPrefix='zeebe-record', createTemplate=true, command=false, event=true, rejection=false, error=true, deployment=true, incident=true, job=true, message=false, messageSubscription=false, variable=true, variableDocument=true, workflowInstance=true, workflowInstanceCreation=false, workflowInstanceSubscription=false, ignoreVariablesAbove=8191}, bulk=BulkConfiguration{delay=5, size=1000}, authentication=AuthenticationConfiguration{username='kibana'}}
2020-06-23 08:23:30.236 [Broker-0-ZeebePartition-1] [Broker-0-zb-actors-1] DEBUG io.zeebe.broker.system - Removing follower partition service for partition PartitionId{id=1, group=raft-partition}
2020-06-23 08:23:30.325 [Broker-0-ZeebePartition-1] [Broker-0-zb-actors-1] DEBUG io.zeebe.broker.system - Partition role transitioning from null to LEADER
Thanks in Advance!!

Related

Why is the Flink Kafka client trying to connect to localhost:9092 when it is set up to connect to 172.17.0.1:9092?

I am trying to set up a Flink jobmanager/taskmanager pair with docker-compose using this config:
version: "3.7"
services:
jobmanagerconfig:
image: flink:1.13.2-scala_2.12
expose:
- "6133"
- "6123"
ports:
- "8085:8081"
command: standalone-job --job-classname net.mongerbot.configManager.App
volumes:
- ./usrlib/:/opt/flink/usrlib
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanagerconfig
parallelism.default: 2
taskmanager.numberOfTaskSlots: 4
- KAFKA_URI=${KAFKA_URI}
- KAFKA_PORT=${KAFKA_PORT}
- KAFKA_groupId=${KAFKA_groupId}
taskmanagerconfig:
image: flink:1.13.2-scala_2.12
depends_on:
- jobmanagerconfig
links:
- jobmanagerconfig
command: taskmanager
# scale: 1
volumes:
- ./usrlib/:/opt/flink/usrlib
environment:
- KAFKA_URI=${KAFKA_URI}
- KAFKA_PORT=${KAFKA_PORT}
- KAFKA_groupId=${KAFKA_groupId}
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanagerconfig
parallelism.default: 2
taskmanager.numberOfTaskSlots: 4
volumes:
usrlib:
networks:
default:
external:
name: mongerbot_network
The environment variables have the correct values in both containers, and as the log says, the Kafka client is configured to connect to 172.17.0.1:9092 as well:
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,065 INFO org.apache.kafka.clients.consumer.ConsumerConfig [] - ConsumerConfig values:
docker-taskmanagerconfig-1 | allow.auto.create.topics = true
docker-taskmanagerconfig-1 | auto.commit.interval.ms = 5000
docker-taskmanagerconfig-1 | auto.offset.reset = latest
docker-taskmanagerconfig-1 | bootstrap.servers = [172.17.0.1:9092]
docker-taskmanagerconfig-1 | check.crcs = true
docker-taskmanagerconfig-1 | client.dns.lookup = default
docker-taskmanagerconfig-1 | client.id =
docker-taskmanagerconfig-1 | client.rack =
docker-taskmanagerconfig-1 | connections.max.idle.ms = 540000
docker-taskmanagerconfig-1 | default.api.timeout.ms = 60000
docker-taskmanagerconfig-1 | enable.auto.commit = true
docker-taskmanagerconfig-1 | exclude.internal.topics = true
...
but these are the next lines of the log, immediately after the Kafka client config dump:
docker-taskmanagerconfig-1 | value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
docker-taskmanagerconfig-1 |
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,084 INFO org.apache.kafka.clients.consumer.KafkaConsumer [] - [Consumer clientId=consumer-configManager-7, groupId=configManager] Subscribed to partition(s): config.subscribe-0, config.subscribe-2
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,090 INFO org.apache.kafka.clients.Metadata [] - [Consumer clientId=consumer-configManager-7, groupId=configManager] Cluster ID: s2iVODWcQ2Kbw4R5jL6RCw
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,091 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator [] - [Consumer clientId=consumer-configManager-7, groupId=configManager] Discovered group coordinator localhost:9092 (id: 2147483646 rack: null)
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,094 WARN org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=consumer-configManager-7, groupId=configManager] Connection to node 2147483646 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,094 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator [] - [Consumer clientId=consumer-configManager-7, groupId=configManager] Group coordinator localhost:9092 (id: 2147483646 rack: null) is unavailable or invalid, will attempt rediscovery
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,094 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka version: 2.4.1
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,095 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka commitId: c57222ae8cd7866b
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,095 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka startTimeMs: 1670492216094
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,096 INFO org.apache.kafka.clients.consumer.KafkaConsumer [] - [Consumer clientId=consumer-configManager-8, groupId=configManager] Subscribed to partition(s): config.subscribe-1
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,101 INFO org.apache.kafka.clients.Metadata [] - [Consumer clientId=consumer-configManager-8, groupId=configManager] Cluster ID: s2iVODWcQ2Kbw4R5jL6RCw
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,102 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator [] - [Consumer clientId=consumer-configManager-8, groupId=configManager] Discovered group coordinator localhost:9092 (id: 2147483646 rack: null)
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,103 WARN org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=consumer-configManager-8, groupId=configManager] Connection to node 2147483646 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,104 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator [] - [Consumer clientId=consumer-configManager-8, groupId=configManager] Group coordinator localhost:9092 (id: 2147483646 rack: null) is unavailable or invalid, will attempt rediscovery
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,197 WARN org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=consumer-configManager-7, groupId=configManager] Connection to node 1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
docker-taskmanagerconfig-1 | 2022-12-08 09:36:56,207 WARN org.apache.kafka.clients.NetworkClient
and as you can see it is trying to connect to localhost:9092.
There's actually no problem with what you're doing, as indicated in the logs:
Discovered group coordinator localhost:9092
So it is able to connect successfully. Now, why do you see 172.17.0.1 in the first place, and then localhost inside your Kafka client logs? Well, localhost is simply what the runtime configuration hands to the client, and the IP is what that raw name resolves to: a raw hostname like localhost needs to be resolved into some IP address, and you're not running natively on your own machine, you're using Docker. And 172.17.0.1 happens to be the Docker host address of your machine's Docker daemon. You can verify this in many ways; I'll link a post here to read more.
It turned out this problem was not related to Flink or the Kafka consumer; it was related to the Kafka server itself.
The server should have been configured to accept connections from 172.17.0.1, but it was set up to accept incoming requests only on the kafka and localhost addresses:
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
I changed PLAINTEXT_HOST://localhost:9092 to PLAINTEXT_HOST://172.17.0.1:9092 and that fixed it.
(It was confusing because other clients, such as Conduktor, could connect to Kafka at 172.17.0.1:9092 even though this address was not in KAFKA_ADVERTISED_LISTENERS.)
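For clarity, the corrected environment entry from the fix above (only the PLAINTEXT_HOST advertised address changes):

KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://172.17.0.1:9092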

Kubernetes : WARN datanode.DataNode: Problem connecting to server: namenode:9001

I am a novice to Kubernetes and I am currently trying to deploy Hadoop in Kubernetes. I followed this docker-compose file to deploy it: https://github.com/big-data-europe/docker-hadoop/blob/master/docker-compose-v3.yml. Below is my YAML for the namenode:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hadoop-namenode
  labels:
    traefik.docker.network: hbase-namenode
    traefik.port: "9870"
  namespace: hdfs
spec:
  selector:
    matchLabels:
      app: hadoop-namenode
      traefik.docker.network: hbase-namenode
      traefik.port: "9870"
  template:
    metadata:
      labels:
        app: hadoop-namenode
        traefik.docker.network: hbase-namenode
        traefik.port: "9870"
    spec:
      nodeSelector:
        kubernetes.io/hostname: data-mesh-hdfs
      containers:
        - name: hadoop-namenode
          image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
          ports:
            - containerPort: 9870
          env:
            - name: CLUSTER_NAME
              value: test
          envFrom:
            - secretRef:
                name: hadoop-secret
          volumeMounts:
            - name: data-namenode
              mountPath: /hadoop/dfs/name
      volumes:
        - name: data-namenode
          persistentVolumeClaim:
            claimName: namenode-pvc
I created the PV and PVC for my pod and attached my service to the pod.
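(The PV/PVC manifests are not shown in the question; the claim referenced above would look roughly like this sketch, where the size and storage class are hypothetical.)

# Sketch only: size is a placeholder
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: namenode-pvc
  namespace: hdfs
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

The Service attached to the pod is below: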
apiVersion: v1
kind: Service
metadata:
  name: service-namenode
  namespace: hdfs
spec:
  selector:
    traefik.docker.network: hbase-namenode
  type: NodePort
  ports:
    - name: namenode
      protocol: TCP
      port: 9870
      targetPort: 9870
When I deploy my hadoop-namenode pod, it comes up and runs correctly on my cluster, with these logs:
2021-12-14 15:43:25,893 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-12-14 15:43:26,044 INFO namenode.NameNode: createNameNode []
2021-12-14 15:43:26,202 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-12-14 15:43:26,367 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-12-14 15:43:26,368 INFO impl.MetricsSystemImpl: NameNode metrics system started
2021-12-14 15:43:26,422 INFO namenode.NameNodeUtils: fs.defaultFS is hdfs://namenode:9001
2021-12-14 15:43:26,423 INFO namenode.NameNode: Clients should use namenode:9001 to access this namenode/service.
2021-12-14 15:43:26,634 INFO util.JvmPauseMonitor: Starting JVM pause monitor
2021-12-14 15:43:26,676 INFO hdfs.DFSUtil: Starting Web-server for hdfs at: http://0.0.0.0:9870
And finally, I set up my datanode and service using this file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hadoop-datanode
  labels:
    traefik.docker.network: hbase-datanode
    traefik.port: "9864"
  namespace: hdfs
spec:
  selector:
    matchLabels:
      app: hadoop-datanode
      traefik.docker.network: hbase-datanode
      traefik.port: "9864"
  template:
    metadata:
      labels:
        app: hadoop-datanode
        traefik.docker.network: hbase-datanode
        traefik.port: "9864"
    spec:
      nodeSelector:
        kubernetes.io/hostname: data-mesh-hdfs
      containers:
        - name: hadoop-datanode
          image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
          ports:
            - containerPort: 9864
          env:
            - name: SERVICE_PRECONDITION
              value: "10.47.0.1:9870"
          envFrom:
            - secretRef:
                name: hadoop-secret
          volumeMounts:
            - name: datanode
              mountPath: /hadoop/dfs/data
      volumes:
        - name: datanode
          persistentVolumeClaim:
            claimName: datanode-pvc
All my pods are up and running, but I get this error on my datanode.
kubectl get pod -n hdfs
NAME READY STATUS RESTARTS AGE
hadoop-datanode-7f9bdb4f54-4hh6t 1/1 Running 0 2d21h
hadoop-namenode-6bddbc7b6-h8bqk 1/1 Running 0 2d22h
2021-12-17 14:46:57,697 INFO server.AbstractConnector: Started ServerConnector#4b5c304e{HTTP/1.1,[http/1.1]}{localhost:41327}
2021-12-17 14:46:57,698 INFO server.Server: Started #2486ms
2021-12-17 14:46:57,943 INFO web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:9864
2021-12-17 14:46:57,952 INFO util.JvmPauseMonitor: Starting JVM pause monitor
2021-12-17 14:46:57,958 INFO datanode.DataNode: dnUserName = root
2021-12-17 14:46:57,958 INFO datanode.DataNode: supergroup = supergroup
2021-12-17 14:46:58,026 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 1000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2021-12-17 14:46:58,043 INFO ipc.Server: Starting Socket Reader #1 for port 9867
2021-12-17 14:46:58,310 INFO datanode.DataNode: Opened IPC server at /0.0.0.0:9867
2021-12-17 14:46:58,332 INFO datanode.DataNode: Refresh request received for nameservices: null
2021-12-17 14:46:58,359 WARN hdfs.DFSUtilClient: Namenode for null remains unresolved for ID null. Check your hdfs-site.xml file to ensure namenodes are configured properly.
2021-12-17 14:46:58,363 INFO datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2021-12-17 14:46:58,389 INFO datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to namenode:9001 starting to offer service
2021-12-17 14:46:58,408 INFO ipc.Server: IPC Server Responder: starting
2021-12-17 14:46:58,409 INFO ipc.Server: IPC Server listener on 9867: starting
2021-12-17 14:46:58,480 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:03,481 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:08,483 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:13,484 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:18,486 WARN datanode.DataNode: Problem connecting to server: namenode:9001
I use a Secret resource to set up the environment variables for the datanode and namenode, which gives this output for my datanode at startup:
Configuring core
- Setting hadoop.proxyuser.hue.hosts=*
- Setting fs.defaultFS=hdfs://namenode:9001
- Setting hadoop.http.staticuser.user=root
- Setting io.compression.codecs=org.apache.hadoop.io.compress.SnappyCodec
- Setting hadoop.proxyuser.hue.groups=*
Configuring hdfs
- Setting dfs.datanode.data.dir=file:///hadoop/dfs/data
- Setting dfs.namenode.datanode.registration.ip-hostname-check=false
- Setting dfs.webhdfs.enabled=true
- Setting dfs.permissions.enabled=false
Configuring yarn
- Setting yarn.timeline-service.enabled=true
- Setting yarn.scheduler.capacity.root.default.maximum-allocation-vcores=4
- Setting yarn.resourcemanager.system-metrics-publisher.enabled=true
- Setting yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
- Setting yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage=98.5
- Setting yarn.log.server.url=http://historyserver:8188/applicationhistory/logs/
- Setting yarn.resourcemanager.fs.state-store.uri=/rmstate
- Setting yarn.timeline-service.generic-application-history.enabled=true
- Setting yarn.log-aggregation-enable=true
- Setting yarn.resourcemanager.hostname=resourcemanager
- Setting yarn.scheduler.capacity.root.default.maximum-allocation-mb=8192
- Setting yarn.nodemanager.aux-services=mapreduce_shuffle
- Setting yarn.resourcemanager.resource_tracker.address=resourcemanager:8031
- Setting yarn.timeline-service.hostname=historyserver
- Setting yarn.resourcemanager.scheduler.address=resourcemanager:8030
- Setting yarn.resourcemanager.address=resourcemanager:8032
- Setting mapred.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
- Setting yarn.nodemanager.remote-app-log-dir=/app-logs
- Setting yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
- Setting mapreduce.map.output.compress=true
- Setting yarn.nodemanager.resource.memory-mb=16384
- Setting yarn.resourcemanager.recovery.enabled=true
- Setting yarn.nodemanager.resource.cpu-vcores=8
Configuring httpfs
Configuring kms
Configuring mapred
- Setting mapreduce.map.java.opts=-Xmx3072m
- Setting mapreduce.reduce.memory.mb=8192
- Setting mapreduce.reduce.java.opts=-Xmx6144m
- Setting yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
- Setting mapreduce.map.memory.mb=4096
- Setting mapred.child.java.opts=-Xmx4096m
- Setting mapreduce.reduce.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
- Setting mapreduce.framework.name=yarn
- Setting mapreduce.map.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
Configuring for multihomed network
[1/100] 10.47.0.1:9001 is available.
2021-12-17 14:46:55,883 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = hadoop-datanode-7f9bdb4f54-4mb8r/10.47.0.6
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.2.1
If I look at the open ports on my namenode, I see this:
netstat -lpten
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 0.0.0.0:9001 0.0.0.0:* LISTEN 0 13617204 361/java
tcp 0 0 0.0.0.0:9870 0.0.0.0:* LISTEN 0 13617098 361/java
In the core-site.xml file on the datanode, I have the data below:
<configuration>
<property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
<property><name>fs.defaultFS</name><value>hdfs://namenode:9001</value></property>
<property><name>hadoop.http.staticuser.user</name><value>root</value></property>
<property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
</configuration>
The core-site.xml file on the namenode has the same content:
<configuration>
<property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
<property><name>fs.defaultFS</name><value>hdfs://namenode:9001</value></property>
<property><name>hadoop.http.staticuser.user</name><value>root</value></property>
<property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
</configuration>
Can anyone please help me identify the issue?
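One thing worth checking, based on the logs above: fs.defaultFS is hdfs://namenode:9001 and the datanode repeatedly fails to reach the name namenode, yet the only Service defined is named service-namenode and exposes only port 9870. Below is a sketch of a Service that would make that hostname resolvable inside the cluster; the names are taken from the manifests above, but the fix itself is an assumption, not a confirmed answer.

# Sketch: a Service literally named "namenode", exposing the RPC port 9001,
# so cluster DNS can resolve the hostname used in fs.defaultFS.
apiVersion: v1
kind: Service
metadata:
  name: namenode
  namespace: hdfs
spec:
  selector:
    app: hadoop-namenode
  ports:
    - name: rpc
      protocol: TCP
      port: 9001
      targetPort: 9001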

Exception on spring application startup with spring-cloud-kubernetes config maps dependencies present

I have a few Spring services which have both the Eureka client and spring-cloud-starter-kubernetes-fabric8-all dependencies. By default, Eureka is enabled and Kubernetes is disabled.
management:
  endpoints:
    web:
      exposure:
        include: '*'
eureka:
  client:
    enabled: true
    serviceUrl:
      defaultZone: http://localhost:8080/eureka/
spring:
  main:
    banner-mode: off
  application:
    name: service-one
  cloud:
    kubernetes:
      enabled: false
      config:
        enable-api: false
        enabled: false
      reload:
        enabled: false
  zipkin:
    enabled: false
When I start up the app, I get the following exception as a warning, even though Kubernetes is disabled.
Running - docker run --rm dhananjay12/demo-service-one:latest
2021-05-02 23:35:51.458 INFO [,,] 1 --- [ main] ubernetesProfileEnvironmentPostProcessor : Did not find service account namespace at: [/var/run/secrets/kubernetes.io/serviceaccount/namespace]. Ignoring.
2021-05-02 23:35:51.464 WARN [,,] 1 --- [ main] ubernetesProfileEnvironmentPostProcessor : Not running inside kubernetes. Skipping 'kubernetes' profile activation.
2021-05-02 23:35:51.980 WARN [,,] 1 --- [ main] io.fabric8.kubernetes.client.Config : Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
2021-05-02 23:35:51.984 WARN [,,] 1 --- [ main] io.fabric8.kubernetes.client.Config : Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
2021-05-02 23:35:51.987 WARN [,,] 1 --- [ main] o.s.c.k.f.Fabric8AutoConfiguration : No namespace has been detected. Please specify KUBERNETES_NAMESPACE env var, or use a later kubernetes version (1.3 or later)
2021-05-02 23:35:52.441 WARN [service-one,,] 1 --- [ main] s.c.k.f.c.Fabric8ConfigMapPropertySource : Can't read configMap with name: [service-one] in namespace:[null]. Ignoring.
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [ConfigMap] with name: [service-one] in namespace: [null] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) ~[kubernetes-client-4.13.2.jar:na]
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) ~[kubernetes-client-4.13.2.jar:na]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225) ~[kubernetes-client-4.13.2.jar:na]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:186) ~[kubernetes-client-4.13.2.jar:na]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:84) ~[kubernetes-client-4.13.2.jar:na]
at org.springframework.cloud.kubernetes.fabric8.config.Fabric8ConfigMapPropertySource.getData(Fabric8ConfigMapPropertySource.java:61) ~[spring-cloud-kubernetes-fabric8-config-2.0.2.jar:2.0.2]
at org.springframework.cloud.kubernetes.fabric8.config.Fabric8ConfigMapPropertySource.<init>(Fabric8ConfigMapPropertySource.java:50) ~[spring-cloud-kubernetes-fabric8-config-2.0.2.jar:2.0.2]
at org.springframework.cloud.kubernetes.fabric8.config.Fabric8ConfigMapPropertySourceLocator.getMapPropertySource(Fabric8ConfigMapPropertySourceLocator.java:51) ~[spring-cloud-kubernetes-fabric8-config-2.0.2.jar:2.0.2]
at org.springframework.cloud.kubernetes.commons.config.ConfigMapPropertySourceLocator.getMapPropertySourceForSingleConfigMap(ConfigMapPropertySourceLocator.java:81) ~[spring-cloud-kubernetes-commons-2.0.2.jar:2.0.2]
at org.springframework.cloud.kubernetes.commons.config.ConfigMapPropertySourceLocator.lambda$locate$0(ConfigMapPropertySourceLocator.java:67) ~[spring-cloud-kubernetes-commons-2.0.2.jar:2.0.2]
at java.base/java.util.Collections$SingletonList.forEach(Unknown Source) ~[na:na]
at org.springframework.cloud.kubernetes.commons.config.ConfigMapPropertySourceLocator.locate(ConfigMapPropertySourceLocator.java:67) ~[spring-cloud-kubernetes-commons-2.0.2.jar:2.0.2]
at org.springframework.cloud.bootstrap.config.PropertySourceLocator.locateCollection(PropertySourceLocator.java:51) ~[spring-cloud-context-3.0.2.jar:3.0.2]
at org.springframework.cloud.bootstrap.config.PropertySourceLocator.locateCollection(PropertySourceLocator.java:47) ~[spring-cloud-context-3.0.2.jar:3.0.2]
at org.springframework.cloud.bootstrap.config.PropertySourceBootstrapConfiguration.initialize(PropertySourceBootstrapConfiguration.java:95) ~[spring-cloud-context-3.0.2.jar:3.0.2]
at org.springframework.boot.SpringApplication.applyInitializers(SpringApplication.java:650) ~[spring-boot-2.4.5.jar:2.4.5]
at org.springframework.boot.SpringApplication.prepareContext(SpringApplication.java:403) ~[spring-boot-2.4.5.jar:2.4.5]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:338) ~[spring-boot-2.4.5.jar:2.4.5]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1340) ~[spring-boot-2.4.5.jar:2.4.5]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1329) ~[spring-boot-2.4.5.jar:2.4.5]
at com.mynotes.microservices.demo.serviceone.ServiceOneApplication.main(ServiceOneApplication.java:15) ~[classes/:na]
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Name or service not known
The exception is not there if I put the same property in an environment variable.
Running - docker run --rm -e spring.cloud.kubernetes.config.enable-api=false dhananjay12/demo-service-one:latest
While the app does spin up and the rest of it works, can someone explain why this exception is there in the first place?
Code - https://github.com/dhananjay12/spring-microservice-demo/blob/master/service-one/src/main/resources/application.yml
Spring boot - 2.4.5, Spring cloud - 2020.0.2
On further analysis and going through the docs, disabling these features must be done in bootstrap.yml: https://docs.spring.io/spring-cloud-kubernetes/docs/current/reference/html/index.html#kubernetes-ecosystem-awareness.
Of course, an environment variable takes precedence, which is why passing it with -e worked.
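A minimal sketch of the corresponding bootstrap.yml, assuming the same flags shown in application.yml above, moved into the bootstrap context so they take effect before the config map locator runs:

# bootstrap.yml (sketch): same flags as application.yml, evaluated early
# enough to stop the Fabric8 ConfigMap lookup at startup.
spring:
  cloud:
    kubernetes:
      enabled: false
      config:
        enable-api: false
        enabled: false
      reload:
        enabled: false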

Can Flink on k8s use a PV backed by NFS, with a PVC, as the high-availability storageDir?

I want to deploy Flink 1.12.0 on k8s in HA mode, but I don't want to deploy an HDFS cluster. I have an NFS server, so I created a PV that uses NFS as the backend storage, and a PVC that the Deployment mounts.
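(The PV/PVC manifests are not included in the question; a sketch of the setup described, with a hypothetical NFS server address, export path, and size, might look like this.)

# Sketch only: server, path, and size are placeholders; only the PVC name
# mta-flink-nfs-pvc is referenced by the jobmanager deployment below.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mta-flink-nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  nfs:
    server: 192.168.0.100
    path: /exports/flink
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mta-flink-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: mta-flink-nfs-pv
  resources:
    requests:
      storage: 10Gi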
This is my Flink ConfigMap:
kubernetes.cluster-id: mta-flink
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: file:///opt/flink/nfs/ha
And this is my jobmanager YAML file:
volumeMounts:
  - name: flink-config-volume
    mountPath: /opt/flink/conf
  - name: flink-nfs
    mountPath: /opt/flink/nfs
securityContext:
  runAsUser: 9999 # refers to user flink from official flink image, change if necessary
  #fsGroup: 9999
volumes:
  - name: flink-config-volume
    configMap:
      name: mta-flink-config
      items:
        - key: flink-conf.yaml
          path: flink-conf.yaml
        - key: log4j-console.properties
          path: log4j-console.properties
  - name: flink-nfs
    persistentVolumeClaim:
      claimName: mta-flink-nfs-pvc
It deploys successfully, but if I browse the jobmanager:8081 website, I get the result below:
{"errors": ["Service temporarily unavailable due to an ongoing leader election. Please refresh."]}
Can the PVC be used as high-availability.storageDir? If it can, how can I fix this error?
The jobmanager logs show that a leader has already been elected:
2020-12-29T06:45:54.177850394Z 2020-12-29 14:45:54,177 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader election started
2020-12-29T06:45:54.177855303Z 2020-12-29 14:45:54,177 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to acquire leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
2020-12-29T06:45:54.178668055Z 2020-12-29 14:45:54,178 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
2020-12-29T06:45:54.178895963Z 2020-12-29 14:45:54,178 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='mta-flink-resourcemanager-leader'}.
2020-12-29T06:45:54.179327491Z 2020-12-29 14:45:54,179 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager#6d303498
2020-12-29T06:45:54.230081993Z 2020-12-29 14:45:54,229 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
2020-12-29T06:45:54.230202329Z 2020-12-29 14:45:54,230 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='mta-flink-dispatcher-leader'}.
2020-12-29T06:45:54.230219281Z 2020-12-29 14:45:54,229 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
2020-12-29T06:45:54.230353912Z 2020-12-29 14:45:54,230 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Starting DefaultLeaderElectionService with KubernetesLeaderElectionDriver{configMapName='mta-flink-resourcemanager-leader'}.
2020-12-29T06:45:54.237004177Z 2020-12-29 14:45:54,236 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
2020-12-29T06:45:54.237024655Z 2020-12-29 14:45:54,236 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-restserver-leader.
2020-12-29T06:45:54.237027811Z 2020-12-29 14:45:54,236 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
2020-12-29T06:45:54.237297376Z 2020-12-29 14:45:54,237 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender http://mta-flink-jobmanager:8081 with session ID 9587e13f-322f-4cd5-9fff-b4941462be0f.
2020-12-29T06:45:54.237353551Z 2020-12-29 14:45:54,237 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - http://mta-flink-jobmanager:8081 was granted leadership with leaderSessionID=9587e13f-322f-4cd5-9fff-b4941462be0f
2020-12-29T06:45:54.237440354Z 2020-12-29 14:45:54,237 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID 9587e13f-322f-4cd5-9fff-b4941462be0f for leader http://mta-flink-jobmanager:8081.
2020-12-29T06:45:54.254555127Z 2020-12-29 14:45:54,254 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
2020-12-29T06:45:54.254588299Z 2020-12-29 14:45:54,254 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-resourcemanager-leader.
2020-12-29T06:45:54.254628053Z 2020-12-29 14:45:54,254 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
2020-12-29T06:45:54.254871569Z 2020-12-29 14:45:54,254 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender LeaderContender: StandaloneResourceManager with session ID b1730dc6-0f94-49f4-b519-56917f3027b7.
2020-12-29T06:45:54.256608291Z 2020-12-29 14:45:54,256 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
2020-12-29T06:45:54.259155793Z 2020-12-29 14:45:54,258 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
2020-12-29T06:45:54.259176091Z 2020-12-29 14:45:54,258 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for mta-flink-dispatcher-leader.
2020-12-29T06:45:54.25918096Z 2020-12-29 14:45:54,259 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Successfully Acquired leader lease 'ConfigMapLock: default - mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
2020-12-29T06:45:54.259362149Z 2020-12-29 14:45:54,259 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant leadership to contender LeaderContender: DefaultDispatcherRunner with session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1.
2020-12-29T06:45:54.260301799Z 2020-12-29 14:45:54,260 DEBUG org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - Create new DispatcherLeaderProcess with leader session id fbbaa883-69f6-43df-9ca0-c646bc1baad1.
2020-12-29T06:45:54.266724597Z 2020-12-29 14:45:54,266 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Start SessionDispatcherLeaderProcess.
2020-12-29T06:45:54.267718418Z 2020-12-29 14:45:54,267 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
2020-12-29T06:45:54.26786349Z 2020-12-29 14:45:54,267 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Recover all persisted job graphs.
2020-12-29T06:45:54.267976912Z 2020-12-29 14:45:54,267 DEBUG org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieving all stored job ids from KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}.
2020-12-29T06:45:54.277681598Z 2020-12-29 14:45:54,277 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink#mta-flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token b51956917f3027b7b1730dc60f9449f4
2020-12-29T06:45:54.280411279Z 2020-12-29 14:45:54,280 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
2020-12-29T06:45:54.281367931Z 2020-12-29 14:45:54,281 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=http://mta-flink-jobmanager:8081, session ID=9587e13f-322f-4cd5-9fff-b4941462be0f.
2020-12-29T06:45:54.281528772Z 2020-12-29 14:45:54,281 DEBUG io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Attempting to renew leader lease 'ConfigMapLock: default - mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
2020-12-29T06:45:54.286191344Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:45:54.286304807Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:45:54.286438227Z 2020-12-29 14:45:54,286 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID b1730dc6-0f94-49f4-b519-56917f3027b7 for leader akka.tcp://flink#mta-flink-jobmanager:6123/user/rpc/resourcemanager_0.
2020-12-29T06:45:54.309361096Z 2020-12-29 14:45:54,309 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=akka.tcp://flink#mta-flink-jobmanager:6123/user/rpc/resourcemanager_0, session ID=b1730dc6-0f94-49f4-b519-56917f3027b7.
2020-12-29T06:45:54.320673232Z 2020-12-29 14:45:54,320 INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}
2020-12-29T06:45:54.3206989Z 2020-12-29 14:45:54,320 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs.
2020-12-29T06:45:54.324829616Z 2020-12-29 14:45:54,324 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting FencedAkkaRpcActor with name dispatcher_1.
2020-12-29T06:45:54.325343659Z 2020-12-29 14:45:54,325 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
2020-12-29T06:45:54.33778039Z 2020-12-29 14:45:54,337 DEBUG org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Confirm leader session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1 for leader akka.tcp://flink#mta-flink-jobmanager:6123/user/rpc/dispatcher_1.
2020-12-29T06:45:54.36249763Z 2020-12-29 14:45:54,362 DEBUG org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Successfully wrote leader information: Leader=akka.tcp://flink#mta-flink-jobmanager:6123/user/rpc/dispatcher_1, session ID=fbbaa883-69f6-43df-9ca0-c646bc1baad1.
2020-12-29T06:46:04.298366262Z 2020-12-29 14:46:04,297 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:04.298442695Z 2020-12-29 14:46:04,298 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:14.318174464Z 2020-12-29 14:46:14,317 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:14.318256849Z 2020-12-29 14:46:14,318 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:24.337694477Z 2020-12-29 14:46:24,337 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:24.337816516Z 2020-12-29 14:46:24,337 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
2020-12-29T06:46:26.044624193Z 2020-12-29 14:46:26,044 DEBUG org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf [] - -Dorg.apache.flink.shaded.netty4.io.netty.buffer.checkAccessible: true

Client app tries to produce to a topic on Kafka but gets stuck, returning neither an error message nor 200 OK

We deploy Kafka and ZooKeeper pods on a Kubernetes cluster. The two are connected to each other properly, but when we try to produce to a topic through a client app, the PUT request stays stuck in pending and, after a long time, no message is returned! How can I debug this situation?
The .yaml files for Kafka, ZooKeeper, and the client app are below:
kafka.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.18.0 (06a2e56)
  creationTimestamp: null
  labels:
    io.kompose.service: kafka
  name: kafka
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: kafka
    spec:
      containers:
        - env:
            - name: KAFKA_ADVERTISED_HOST_NAME
              value: kafka
            - name: KAFKA_ADVERTISED_PORT
              value: "9092"
            - name: KAFKA_CREATE_TOPICS
              value: newsrawdata:1:1
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: 192.168.88.42:30573
            - name: KAFKA_PORT
              value: "9092"
            - name: KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS
              value: "1000"
          image: wurstmeister/kafka
          name: kafka
          ports:
            - containerPort: 9092
            - containerPort: 9094
          resources: {}
          volumeMounts:
            - mountPath: /var/run/docker.sock
              name: kafka-claim0
      hostname: kafka
      restartPolicy: Always
      volumes:
        - name: kafka-claim0
          persistentVolumeClaim:
            claimName: kafka-claim0
status: {}
zookeeper.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.18.0 (06a2e56)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper
  name: zookeeper
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: zookeeper
    spec:
      containers:
        - env:
            - name: ALLOW_ANONYMOUS_LOGIN
              value: "yes"
          image: wurstmeister/zookeeper
          name: zookeeper
          ports:
            - containerPort: 2181
          resources: {}
      hostname: zookeeper
      restartPolicy: Always
status: {}
app.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    service: broker-service
  name: broker-app
spec:
  replicas: 1
  template:
    metadata:
      labels:
        service: broker-service
    spec:
      imagePullSecrets:
        - name: pullsecret
      containers:
        - env:
            - name: OHH_COMMON_REDEPLOY
              value: THIS_WILL_BE_REPLACED
            - name: ASPNETCORE_ENVIRONMENT
              value: docker
          image: localgitlabregistry/broker.app:v0.01
          name: broker-app
          imagePullPolicy: "Always"
          ports:
            - containerPort: 80
            - containerPort: 443
      nodeSelector:
        role: slave1
      restartPolicy: Always
And the Services are below:
kafka-service.yaml:
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.18.0 (06a2e56)
  creationTimestamp: null
  labels:
    io.kompose.service: kafka
  name: kafka
spec:
  ports:
    - name: "9092"
      port: 9092
      targetPort: 9092
    - name: "9094"
      port: 9094
      targetPort: 9094
  clusterIP: None
  # type: NodePort
  selector:
    io.kompose.service: kafka
status:
  loadBalancer: {}
zookeeper-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.18.0 (06a2e56)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper
  name: zookeeper
spec:
  ports:
    - name: "2181"
      port: 2181
      targetPort: 2181
  selector:
    io.kompose.service: zookeeper
status:
  loadBalancer: {}
app-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    service: broker-service
  name: broker-service
spec:
  ports:
    - name: "57270"
      port: 80
      targetPort: 80
    - name: "44348"
      port: 443
      targetPort: 443
  selector:
    service: broker-service
  type: NodePort
The log from the Kafka pod is below:
waiting for kafka to be ready
[Configuring] 'advertised.port' in '/opt/kafka/config/server.properties'
Excluding KAFKA_HOME from broker config
[Configuring] 'advertised.host.name' in '/opt/kafka/config/server.properties'
[Configuring] 'port' in '/opt/kafka/config/server.properties'
[Configuring] 'broker.id' in '/opt/kafka/config/server.properties'
Excluding KAFKA_VERSION from broker config
[Configuring] 'zookeeper.connect' in '/opt/kafka/config/server.properties'
[Configuring] 'log.dirs' in '/opt/kafka/config/server.properties'
[Configuring] 'zookeeper.connect.timeout.ms' in '/opt/kafka/config/server.properties'
[2019-09-29 08:06:56,783] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-09-29 08:06:57,767] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2019-09-29 08:06:57,768] INFO starting (kafka.server.KafkaServer)
[2019-09-29 08:06:57,769] INFO Connecting to zookeeper on 192.168.88.42:30573 (kafka.server.KafkaServer)
[2019-09-29 08:06:57,796] INFO [ZooKeeperClient Kafka server] Initializing a new session to
.
.
.
[2019-09-29 08:06:57,804] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:os.version=4.4.0-116-generic (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,804] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,806] INFO Initiating client connection, connectString=192.168.88.42:30573 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$#2667f029 (org.apache.zookeeper.ZooKeeper)
[2019-09-29 08:06:57,822] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-09-29 08:06:57,847] INFO Opening socket connection to server 192.168.88.42/192.168.88.42:30573. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-09-29 08:06:57,865] INFO Socket connection established to 192.168.88.42/192.168.88.42:30573, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-09-29 08:06:57,880] INFO Session establishment complete on server 192.168.88.42/192.168.88.42:30573, sessionid = 0x10005366a620042, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2019-09-29 08:06:57,886] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2019-09-29 08:06:58,448] INFO Cluster ID = b8bTvrC2T6iidAcNqD482A (kafka.server.KafkaServer)
[2019-09-29 08:06:58,455] WARN No meta.properties file under dir /kafka/kafka-logs-kafka/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2019-09-29 08:06:58,632] INFO KafkaConfig values:
advertised.host.name = kafka
advertised.listeners = null
advertised.port = 9092
alter.config.policy.class.name = null
alter.log.dirs.replication.quota.window.num = 11
alter.log.dirs.replication.quota.window.size.seconds = 1
authorizer.class.name =
auto.create.topics.enable = true
auto.leader.rebalance.enable = true
.
.
.
zookeeper.connect = 192.168.88.42:30573
zookeeper.connection.timeout.ms = 6000
zookeeper.max.in.flight.requests = 10
zookeeper.session.timeout.ms = 6000
zookeeper.set.acl = false
zookeeper.sync.time.ms = 2000
(kafka.server.KafkaConfig)
[2019-09-29 08:06:58,659] INFO KafkaConfig values:
advertised.host.name = kafka
advertised.listeners = null
advertised.port = 9092
alter.config.policy.class.name = null
alter.log.dirs.replication.quota.window.num = 11
alter.log.dirs.replication.quota.window.size.seconds = 1
authorizer.class.name =
auto.create.topics.enable = true
auto.leader.rebalance.enable = true
kafka.metrics.reporters = []
leader.imbalance.check.interval.seconds = 300
leader.imbalance.per.broker.percentage = 10
listener.security.protocol.map = PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners = null
log.cleaner.backoff.ms = 15000
log.cleaner.dedupe.buffer.size = 134217728
log.cleaner.delete.retention.ms = 86400000
log.cleaner.enable = true
log.cleaner.io.buffer.load.factor = 0.9
log.cleaner.io.buffer.size = 524288
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
log.cleaner.max.compaction.lag.ms = 9223372036854775807
log.cleaner.min.cleanable.ratio = 0.5
log.cleaner.min.compaction.lag.ms = 0
log.cleaner.threads = 1
log.cleanup.policy = [delete]
log.dir = /tmp/kafka-logs
.
.
.
unclean.leader.election.enable = false
zookeeper.connect = 192.168.88.42:30573
zookeeper.connection.timeout.ms = 6000
zookeeper.max.in.flight.requests = 10
zookeeper.session.timeout.ms = 6000
zookeeper.set.acl = false
zookeeper.sync.time.ms = 2000
(kafka.server.KafkaConfig)
[2019-09-29 08:06:58,721] INFO [ThrottledChannelReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-09-29 08:06:58,722] INFO [ThrottledChannelReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-09-29 08:06:58,724] INFO [ThrottledChannelReaper-Request]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-09-29 08:06:58,797] INFO Log directory /kafka/kafka-logs-kafka not found, creating it. (kafka.log.LogManager)
[2019-09-29 08:06:58,814] INFO Loading logs. (kafka.log.LogManager)
[2019-09-29 08:06:58,834] INFO Logs loading complete in 20 ms. (kafka.log.LogManager)
[2019-09-29 08:06:58,869] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2019-09-29 08:06:58,877] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2019-09-29 08:06:59,505] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2019-09-29 08:06:59,549] INFO [SocketServer brokerId=1033] Created data-plane acceptor and processors for endpoint : EndPoint(null,9092,ListenerName(PLAINTEXT),PLAINTEXT) (kafka.network.SocketServer)
[2019-09-29 08:06:59,550] INFO [SocketServer brokerId=1033] Started 1 acceptor threads for data-plane (kafka.network.SocketServer)
[2019-09-29 08:06:59,587] INFO [ExpirationReaper-1033-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,590] INFO [ExpirationReaper-1033-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,590] INFO [ExpirationReaper-1033-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,600] INFO [ExpirationReaper-1033-ElectPreferredLeader]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,614] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2019-09-29 08:06:59,716] INFO Creating /brokers/ids/1033 (is it secure? false) (kafka.zk.KafkaZkClient)
[2019-09-29 08:06:59,743] INFO Stat of the created znode at /brokers/ids/1033 is: 776,776,1569744419734,1569744419734,1,0,0,72063325309108290,180,0,776
(kafka.zk.KafkaZkClient)
[2019-09-29 08:06:59,745] INFO Registered broker 1033 at path /brokers/ids/1033 with addresses: ArrayBuffer(EndPoint(kafka,9092,ListenerName(PLAINTEXT),PLAINTEXT)), czxid (broker epoch): 776 (kafka.zk.KafkaZkClient)
[2019-09-29 08:06:59,748] WARN No meta.properties file under dir /kafka/kafka-logs-kafka/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2019-09-29 08:06:59,882] INFO [ExpirationReaper-1033-topic]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,888] INFO [ExpirationReaper-1033-Heartbeat]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,895] INFO [ExpirationReaper-1033-Rebalance]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-09-29 08:06:59,940] INFO [GroupCoordinator 1033]: Starting up. (kafka.coordinator.group.GroupCoordinator)
[2019-09-29 08:06:59,949] INFO [GroupCoordinator 1033]: Startup complete. (kafka.coordinator.group.GroupCoordinator)
[2019-09-29 08:06:59,961] INFO [GroupMetadataManager brokerId=1033] Removed 0 expired offsets in 17 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-09-29 08:06:59,990] INFO [ProducerId Manager 1033]: Acquired new producerId block (brokerId:1033,blockStartProducerId:21000,blockEndProducerId:21999) by writing to Zk with path version 22 (kafka.coordinator.transaction.ProducerIdManager)
[2019-09-29 08:07:00,044] INFO [TransactionCoordinator id=1033] Starting up. (kafka.coordinator.transaction.TransactionCoordinator)
[2019-09-29 08:07:00,056] INFO [Transaction Marker Channel Manager 1033]: Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager)
[2019-09-29 08:07:00,061] INFO [TransactionCoordinator id=1033] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator)
[2019-09-29 08:07:00,207] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2019-09-29 08:07:00,289] INFO [SocketServer brokerId=1033] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)
[2019-09-29 08:07:00,326] INFO Kafka version: 2.3.0 (org.apache.kafka.common.utils.AppInfoParser)
[2019-09-29 08:07:00,326] INFO Kafka commitId: fc1aaa116b661c8a (org.apache.kafka.common.utils.AppInfoParser)
[2019-09-29 08:07:00,326] INFO Kafka startTimeMs: 1569744420299 (org.apache.kafka.common.utils.AppInfoParser)
[2019-09-29 08:07:00,341] INFO [KafkaServer id=1033] started (kafka.server.KafkaServer)
creating topics: newsrawdata:1:1
The log from zookeeper pod:
2019-09-29 08:06:58,003 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#653] - Got user-level KeeperException when processing sessionid:0x10005366a620042 type:create cxid:0xd zxid:0x306 txntype:-1 reqpath:n/a Error Path:/config/brokers Error:KeeperErrorCode = NodeExists for /config/brokers
2019-09-29 08:07:00,421 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#596] - Got user-level KeeperException when processing sessionid:0x10005366a620042 type:multi cxid:0x3f zxid:0x30d txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election
2019-09-29 08:07:07,512 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted socket connection from /10.44.0.0:39244
2019-09-29 08:07:07,519 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /10.44.0.0:39244
2019-09-29 08:07:07,521 [myid:] - INFO [SyncThread:0:ZooKeeperServer#694] - Established session 0x10005366a620043 with negotiated timeout 30000 for client /10.44.0.0:39244
2019-09-29 08:07:08,034 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#487] - Processed session termination for sessionid: 0x10005366a620043
2019-09-29 08:07:08,045 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /10.44.0.0:39244 which had sessionid 0x10005366a620043
2019-09-29 08:07:13,180 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask#138] - Purge task started.
2019-09-29 08:07:13,181 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask#144] - Purge task completed.
2019-09-29 09:07:13,180 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask#138] - Purge task started.
2019-09-29 09:07:13,182 [myid:] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask#144] - Purge task completed.
log from client app:
Kafka Ip Server:kafka:9092
warn: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[35]
No XML encryptor configured. Key {16b9a9aa-732a-47ab-bd31-ce341be7f812} may be persisted to storage in unencrypted form.
Hosting environment: docker
Content root path: /app
Now listening on: http://[::]:80
Application started. Press Ctrl+C to shut down.
We set the "BootstrapServers" to "kafka:9092" in the client app. Seems that client cat resolve the kafka in the cluster and see the IP of the kafka pod but no event occurs when we send PUT request. It worth noting that by using this config out of the kubernetes cluster with docker-compose it works as expected! what is wrong with this configuration?
Make sure that your nodes carry the proper selector: the node for the broker deployment has to have the role: slave1 label. Otherwise, just delete the nodeSelector lines from the broker deployment file.
Then add these lines to the spec of your deployment configuration file:
selector:
  matchLabels:
    io.kompose.service: kafka
(this one is for kafka.yaml).
You don't have to label your Services; specifying selectors is enough, so you can delete the labels field from the Service configuration files.
Then, in the Kafka deployment configuration file, change these lines:
- name: KAFKA_ZOOKEEPER_CONNECT
  #value: 192.168.88.42:30573
  value: your_zookeeper_service_ip:2181
The value should contain the IP of your ZooKeeper Service and port 2181; if your ZooKeeper Service has IP 192.168.88.42, the value is correct.
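An equivalent option, assuming the ZooKeeper Service shown above is named zookeeper and runs in the same namespace as the Kafka deployment, is to let cluster DNS resolve the Service name instead of hard-coding an IP:

- name: KAFKA_ZOOKEEPER_CONNECT
  value: zookeeper:2181  # Service name resolved by cluster DNS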
Actually, the problem was related to the Kafka configuration (KAFKA_ADVERTISED_LISTENERS, KAFKA_INTER_BROKER_LISTENER_NAME, KAFKA_LISTENERS). Anyway, the deployment.yaml config below is now working properly for me, without any change to the service files.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
  labels:
    io.kompose.service: kafka
  name: kafka
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        io.kompose.service: kafka
    spec:
      containers:
        - env:
            - name: KAFKA_ADVERTISED_HOST_NAME
              value: kafka
            - name: KAFKA_ADVERTISED_PORT
              value: "9092"
            - name: KAFKA_ADVERTISED_LISTENERS
              value: INSIDE://:9092
            - name: KAFKA_CREATE_TOPICS
              value: newsrawdata:1:1
            - name: KAFKA_INTER_BROKER_LISTENER_NAME
              value: INSIDE
            - name: KAFKA_LISTENERS
              value: INSIDE://:9092
            - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
              value: INSIDE:PLAINTEXT
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: 192.168.88.207:30573
            - name: KAFKA_PORT
              value: "9092"
            - name: KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS
              value: "1000"
          image: wurstmeister/kafka
          name: kafka
          ports:
            - containerPort: 9092
            - containerPort: 9094
      hostname: kafka
      restartPolicy: Always
The Kafka instance now connects to ZooKeeper, and client apps can produce to and consume from Kafka properly.