Kafka Connect Connector configuration is invalid and contains the following 2 error(s): Invalid value ... could not be found - kubernetes

I'm trying to use the Schema Registry Transfer SMT plugin, but I get the following error:
PUT /connectors/kafka-connector-sb-16-sb-16/config returned 400 (Bad Request): Connector configuration is invalid and contains the following 2 error(s):
      Invalid value cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer for configuration transforms.AvroSchemaTransfer.type: Class cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer could not be found.
      Invalid value null for configuration transforms.AvroSchemaTransfer.type: Not a Transformation
      You can also find the above list of errors at the endpoint `/connector-plugins/{connectorType}/config/validate`
I use the Strimzi operator to manage my Kafka objects in Kubernetes.
Kafka Connect config:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: {{.Release.Name}}-{{.Values.ENVNAME}}
  annotations:
    strimzi.io/use-connector-resources: "true"
  labels:
    app: {{.Release.Name}}
    # owner: {{ .Values.owner | default "NotSet" }}
    # branch: {{ .Values.branch | default "NotSet" | trunc 56 | trimSuffix "-" }}
spec:
  logging:
    type: inline
    loggers:
      connect.root.logger.level: "DEBUG"
  template:
    pod:
      imagePullSecrets:
        - name: wecapacrdev001-azurecr-io-pull-secret
      metadata:
        annotations:
          prometheus.io/path: /metrics
          prometheus.io/port: "9404"
          prometheus.io/scrape: "true"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - arm64
  authentication:
    type: scram-sha-512
    username: {{.Values.ENVNAME}}
    passwordSecret:
      secretName: {{.Values.ENVNAME}}
      password: password
  bootstrapServers: {{.Values.ENVNAME}}-kafka-bootstrap:9093
  tls:
    trustedCertificates:
      - secretName: {{.Values.ENVNAME}}-cluster-ca-cert
        certificate: ca.crt
  config:
    group.id: {{.Values.ENVNAME}}-connect-cluster
    offset.storage.topic: {{.Values.ENVNAME}}-connect-cluster-offsets
    config.storage.topic: {{.Values.ENVNAME}}-connect-cluster-configs
    status.storage.topic: {{.Values.ENVNAME}}-connect-cluster-status
    config.storage.replication.factor: -1
    offset.storage.replication.factor: -1
    status.storage.replication.factor: -1
    key.converter: org.apache.kafka.connect.converters.ByteArrayConverter
    value.converter: org.apache.kafka.connect.converters.ByteArrayConverter
  image: capcr.azurecr.io/devops/kafka:0.27.1-kafka-2.8.0-arm64-v2.7
  jvmOptions:
    -Xms: 3g
    -Xmx: 3g
  livenessProbe:
    initialDelaySeconds: 30
    timeoutSeconds: 15
  metricsConfig:
    type: jmxPrometheusExporter
    valueFrom:
      configMapKeyRef:
        key: metrics-config.yml
        name: {{.Release.Name}}-{{.Values.ENVNAME}}
  readinessProbe:
    initialDelaySeconds: 300
    timeoutSeconds: 15
  replicas: 3
  resources:
    limits:
      cpu: 1000m
      memory: 4Gi
    requests:
      cpu: 100m
      memory: 3Gi
Connector config:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: {{.Release.Name}}-{{.Values.ENVNAME}}
  labels:
    strimzi.io/cluster: kafka-connect-cluster-{{.Values.ENVNAME}}
    app: {{.Release.Name}}
    # owner: {{ .Values.owner | default "NotSet" }}
    # branch: {{ .Values.branch | default "NotSet" | trunc 56 | trimSuffix "-" }}
spec:
  class: org.apache.kafka.connect.mirror.MirrorSourceConnector
  config:
    offset-syncs.topic.replication.factor: "3"
    refresh.topics.interval.seconds: "60"
    replication.factor: "3"
    key.converter: org.apache.kafka.connect.converters.ByteArrayConverter
    value.converter: org.apache.kafka.connect.converters.ByteArrayConverter
    source.cluster.alias: SPLITTED
    source.cluster.auto.commit.interval.ms: "3000"
    source.cluster.auto.offset.reset: earliest
    source.cluster.bootstrap.servers: {{.Values.ENVNAME}}-kafka-bootstrap:9093
    source.cluster.fetch.max.bytes: "60502835"
    source.cluster.group.id: {{.Values.ENVNAME}}-mirroring
    source.cluster.max.poll.records: "100"
    source.cluster.max.request.size: "60502835"
    source.cluster.offset.storage.topic: {{.Values.ENVNAME}}-connect-cluster-offsets
    source.cluster.config.storage.topic: {{.Values.ENVNAME}}-connect-cluster-configs
    source.cluster.status.storage.topic: {{.Values.ENVNAME}}-connect-cluster-status
    source.cluster.producer.compression.type: gzip
    source.cluster.replica.fetch.max.bytes: "60502835"
    source.cluster.sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule
      required username="{{.Values.ENVNAME}}" password="{{.Values.kafka.password}}";
    source.cluster.sasl.mechanism: SCRAM-SHA-512
    source.cluster.security.protocol: SASL_SSL
    source.cluster.ssl.keystore.location: /opt/kafka/keystore/kafka_keystore.p12
    source.cluster.ssl.keystore.password: password
    source.cluster.ssl.keystore.type: PKCS12
    source.cluster.ssl.truststore.location: /opt/kafka/keystore/kafka_keystore.p12
    source.cluster.ssl.truststore.password: password
    source.cluster.ssl.truststore.type: PKCS12
    sync.topic.acls.enabled: "false"
    target.cluster.alias: AGGREGATED
    target.cluster.auto.commit.interval.ms: "3000"
    target.cluster.auto.offset.reset: earliest
    target.cluster.bootstrap.servers: {{.Values.ENVNAME}}-kafka-bootstrap:9093
    target.cluster.fetch.max.bytes: "60502835"
    target.cluster.group.id: {{.Values.ENVNAME}}-test
    target.cluster.max.poll.records: "100"
    target.cluster.max.request.size: "60502835"
    target.cluster.offset.storage.topic: {{.Values.ENVNAME}}-connect-cluster-offsets
    target.cluster.config.storage.topic: {{.Values.ENVNAME}}-connect-cluster-configs
    target.cluster.status.storage.topic: {{.Values.ENVNAME}}-connect-cluster-status
    target.cluster.producer.compression.type: gzip
    target.cluster.replica.fetch.max.bytes: "60502835"
    target.cluster.sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule
      required username="{{.Values.ENVNAME}}" password="{{.Values.kafka.password}}";
    target.cluster.sasl.mechanism: SCRAM-SHA-512
    target.cluster.security.protocol: SASL_SSL
    target.cluster.ssl.keystore.location: /opt/kafka/keystore/kafka_keystore.p12
    target.cluster.ssl.keystore.password: password
    target.cluster.ssl.keystore.type: PKCS12
    target.cluster.ssl.truststore.location: /opt/kafka/keystore/kafka_keystore.p12
    target.cluster.ssl.truststore.password: password
    target.cluster.ssl.truststore.type: PKCS12
    tasks.max: "4"
    topics: ^topic-to-be-replicated$
    topics.blacklist: ""
    transforms: AvroSchemaTransfer
    transforms.AvroSchemaTransfer.dest.schema.registry.url: http://schemaregistry-{{.Values.ENVNAME}}-cp-schema-registry:8081
    transforms.AvroSchemaTransfer.src.schema.registry.url: http://schemaregistry-{{.Values.ENVNAME}}-cp-schema-registry:8081
    transforms.AvroSchemaTransfer.transfer.message.keys: "false"
    transforms.AvroSchemaTransfer.type: cricket.jmoore.kafka.connect.transforms.SchemaRegistryTransfer
    transforms.TopicRename.regex: (.*)
    transforms.TopicRename.replacement: replica.$1
    transforms.TopicRename.type: org.apache.kafka.connect.transforms.RegexRouter
  tasksMax: 4
I built the plugin and the Docker image as described here:
ARG STRIMZI_BASE_IMAGE=0.29.0-kafka-3.1.1-arm64
FROM quay.io/strimzi/kafka:$STRIMZI_BASE_IMAGE
USER root:root
COPY schema-registry-transfer-smt/target/schema-registry-transfer-smt-0.2.1.jar /opt/kafka/plugins/
USER 1001
I can see in the logs that the plugin is loaded:
INFO Registered loader: PluginClassLoader{pluginLocation=file:/opt/kafka/plugins/schema-registry-transfer-smt-0.2.1.jar} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader) [main]
INFO Loading plugin from: /opt/kafka/plugins/schema-registry-transfer-smt-0.2.1.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader) [main]
DEBUG Loading plugin urls: [file:/opt/kafka/plugins/schema-registry-transfer-smt-0.2.1.jar] (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader) [main]
However, when I execute kubectl describe KafkaConnect ..., I get the following status, which suggests the plugin is not loaded:
Status:
  Conditions:
    Last Transition Time:  2022-10-31T07:22:43.749290Z
    Status:                True
    Type:                  Ready
  Connector Plugins:
    Class:    org.apache.kafka.connect.file.FileStreamSinkConnector
    Type:     sink
    Version:  2.8.0
    Class:    org.apache.kafka.connect.file.FileStreamSourceConnector
    Type:     source
    Version:  2.8.0
    Class:    org.apache.kafka.connect.mirror.MirrorCheckpointConnector
    Type:     source
    Version:  1
    Class:    org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
    Type:     source
    Version:  1
    Class:    org.apache.kafka.connect.mirror.MirrorSourceConnector
    Type:     source
    Version:  1
The log prints the classpath (jvm.classpath = ...) and it does not include schema-registry-transfer-smt-0.2.1.jar.
This looks like a configuration error. I tried creating and using a fat jar, compiling with Java 11, changing the scope of the provided dependencies, and more. The repository's last commit is from 2019. Does it make sense to bump all dependency versions? The error message doesn't suggest that.
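One thing worth ruling out is the plugin layout itself. Kafka Connect scans each immediate child of the plugin path as an isolated plugin; a bare jar at the top level is normally accepted too, but giving the SMT its own subdirectory, together with any runtime dependencies that are not packaged inside the jar (the schema-registry client, for instance, if it was built with provided scope), rules out missing-dependency classloading failures. A sketch of that layout, not a confirmed fix (the directory name is illustrative):

```dockerfile
ARG STRIMZI_BASE_IMAGE=0.29.0-kafka-3.1.1-arm64
FROM quay.io/strimzi/kafka:$STRIMZI_BASE_IMAGE
USER root:root
# One directory per plugin: Connect treats each immediate child of
# /opt/kafka/plugins as a separate plugin with its own classloader.
RUN mkdir -p /opt/kafka/plugins/schema-registry-transfer-smt
# Place the SMT jar (and, if needed, its dependency jars) in that directory.
COPY schema-registry-transfer-smt/target/schema-registry-transfer-smt-0.2.1.jar \
     /opt/kafka/plugins/schema-registry-transfer-smt/
USER 1001
```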

Related

Grafana alert provisioning issue

I'd like to be able to provision alerts, and I tried to follow these instructions, but no luck.
Open-source Grafana, version 9.2.0, running in Kubernetes.
Steps I take:
Created a new alert rule from the UI.
Extracted the alert rule from the API:
curl -k https://<my-grafana-url>/api/v1/provisioning/alert-rules/-4pMuQFVk -u admin:<my-admin-password>
It returns the following:
---
id: 18
uid: "-4pMuQFVk"
orgID: 1
folderUID: 3i72aQKVk
ruleGroup: cpu_alert_group
title: my_cpu_alert
condition: B
data:
  - refId: A
    queryType: ''
    relativeTimeRange:
      from: 600
      to: 0
    datasourceUid: _SaubQF4k
    model:
      editorMode: code
      expr: system_cpu_usage
      hide: false
      intervalMs: 1000
      legendFormat: __auto
      maxDataPoints: 43200
      range: true
      refId: A
  - refId: B
    queryType: ''
    relativeTimeRange:
      from: 0
      to: 0
    datasourceUid: "-100"
    model:
      conditions:
        - evaluator:
            params:
              - 3
            type: gt
          operator:
            type: and
          query:
            params:
              - A
          reducer:
            params: []
            type: last
          type: query
      datasource:
        type: __expr__
        uid: "-100"
      expression: A
      hide: false
      intervalMs: 1000
      maxDataPoints: 43200
      refId: B
      type: classic_conditions
updated: '2022-12-07T20:01:47Z'
noDataState: NoData
execErrState: Error
for: 5m
Deleted the alert rule from the UI.
Made a ConfigMap from the above alert rule, as such:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-alerting
data:
  alerting.yaml: |-
    apiVersion: 1
    groups:
      - id: 18
        uid: "-4pMuQFVk"
        orgID: 1
        folderUID: 3i72aQKVk
        ruleGroup: cpu_alert_group
        title: my_cpu_alert
        condition: B
        data:
          - refId: A
            queryType: ''
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: _SaubQF4k
            model:
              editorMode: code
              expr: system_cpu_usage
              hide: false
              intervalMs: 1000
              legendFormat: __auto
              maxDataPoints: 43200
              range: true
              refId: A
          - refId: B
            queryType: ''
            relativeTimeRange:
              from: 0
              to: 0
            datasourceUid: "-100"
            model:
              conditions:
                - evaluator:
                    params:
                      - 3
                    type: gt
                  operator:
                    type: and
                  query:
                    params:
                      - A
                  reducer:
                    params: []
                    type: last
                  type: query
              datasource:
                type: __expr__
                uid: "-100"
              expression: A
              hide: false
              intervalMs: 1000
              maxDataPoints: 43200
              refId: B
              type: classic_conditions
        updated: '2022-12-07T20:01:47Z'
        noDataState: NoData
        execErrState: Error
        for: 5m
I mount the above ConfigMap in the Grafana container (at /etc/grafana/provisioning/alerting).
The full manifest of the deployment is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: grafana
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2022-12-08T18:31:30Z"
  generation: 4
  labels:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    app.kubernetes.io/version: 9.3.0
    helm.sh/chart: grafana-6.46.1
  name: grafana
  namespace: monitoring
  resourceVersion: "648617"
  uid: dc06b802-5281-4f31-a2b2-fef3cf53a70b
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: grafana
      app.kubernetes.io/name: grafana
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/config: 98cac51656714db48a116d3109994ee48c401b138bc8459540e1a497f994d197
        checksum/dashboards-json-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
        checksum/sc-dashboard-provider-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
        checksum/secret: 203f0e4d883461bdd41fe68515fc47f679722dc2fdda49b584209d1d288a5f07
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: grafana
        app.kubernetes.io/name: grafana
    spec:
      automountServiceAccountToken: true
      containers:
        - env:
            - name: GF_SECURITY_ADMIN_USER
              valueFrom:
                secretKeyRef:
                  key: admin-user
                  name: grafana
            - name: GF_SECURITY_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: admin-password
                  name: grafana
            - name: GF_PATHS_DATA
              value: /var/lib/grafana/
            - name: GF_PATHS_LOGS
              value: /var/log/grafana
            - name: GF_PATHS_PLUGINS
              value: /var/lib/grafana/plugins
            - name: GF_PATHS_PROVISIONING
              value: /etc/grafana/provisioning
          image: grafana/grafana:9.3.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 10
            httpGet:
              path: /api/health
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          name: grafana
          ports:
            - containerPort: 3000
              name: grafana
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /api/health
              port: 3000
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /etc/grafana/grafana.ini
              name: config
              subPath: grafana.ini
            - mountPath: /etc/grafana/provisioning/alerting
              name: grafana-alerting
            - mountPath: /var/lib/grafana
              name: storage
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 472
        runAsGroup: 472
        runAsUser: 472
      serviceAccount: grafana
      serviceAccountName: grafana
      terminationGracePeriodSeconds: 30
      volumes:
        - configMap:
            defaultMode: 420
            name: grafana-alerting
          name: grafana-alerting
        - configMap:
            defaultMode: 420
            name: grafana
          name: config
        - emptyDir: {}
          name: storage
However, Grafana fails to start with this error:
Failed to start grafana. error: failure to map file alerting.yaml: failure parsing rules: rule group has no name set
I fixed the above error by adding a group name, but similar errors about other missing elements kept showing up again and again (to the point that I stopped chasing them, as I couldn't figure out what exactly the correct schema is). Digging in, it looks like the format/schema returned from the API in step 2 is different from the schema described in the documentation.
Why is the schema of the alert rule returned from the API different? Am I supposed to convert it, and if so, how? Otherwise, what am I doing wrong?
Edit: Replaced StatefulSet with Deployment, since I was able to reproduce this in a normal/minimal Grafana deployment too.
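For comparison, the file-provisioning format is not the same document the HTTP API returns: provisioning files wrap rules in a named group. A minimal sketch of the documented shape, with field names per Grafana's alerting file-provisioning docs and values carried over from the rule above (the folder title is a placeholder):

```yaml
apiVersion: 1
groups:
  - orgId: 1
    name: cpu_alert_group   # required; this is the "rule group has no name" field
    folder: my_alerts       # folder *title*, not the API's folderUID
    interval: 60s           # evaluation interval, required per group
    rules:
      - uid: "-4pMuQFVk"
        title: my_cpu_alert
        condition: B
        for: 5m
        noDataState: NoData
        execErrState: Error
        data:
          - refId: A
            datasourceUid: _SaubQF4k
            relativeTimeRange:
              from: 600
              to: 0
            model:
              expr: system_cpu_usage
              refId: A
          # ... the B (classic_conditions) expression follows the same pattern
```

So rather than pasting the API response verbatim, the rule body has to be re-nested under groups[].rules[] with the group-level fields added by hand.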

TLS error Nginx-ingress-controller fails to start after AKS upgrade to v1.23.5 from 1.21- traefik still tries to get from *v1beta1.Ingress

We deploy the service with Helm. The ingress template looks like this:
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ui-app-ingress
  {{- with .Values.ingress.annotations}}
  annotations:
    {{- toYaml . | nindent 4}}
  {{- end}}
spec:
  rules:
    - host: {{ .Values.ingress.hostname }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ include "ui-app-chart.fullname" . }}
                port:
                  number: 80
  tls:
    - hosts:
        - {{ .Values.ingress.hostname }}
      secretName: {{ .Values.ingress.certname }}
As you can see, we already use networking.k8s.io/v1, but if I watch the Traefik logs, I find this error:
1 reflector.go:138] pkg/mod/k8s.io/client-go#v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.Ingress: failed to list *v1beta1.Ingress: the server could not find the requested resource (get ingresses.extensions)
what results in tls cert error:
time="2022-06-07T15:40:35Z" level=debug msg="Serving default certificate for request: \"example.de\""
time="2022-06-07T15:40:35Z" level=debug msg="http: TLS handshake error from 10.1.0.4:57484: remote error: tls: unknown certificate"
time="2022-06-07T15:40:35Z" level=debug msg="Serving default certificate for request: \"example.de\""
time="2022-06-07T15:53:06Z" level=debug msg="Serving default certificate for request: \"\""
time="2022-06-07T16:03:31Z" level=debug msg="Serving default certificate for request: \"<ip-adress>\""
time="2022-06-07T16:03:32Z" level=debug msg="Serving default certificate for request: \"<ip-adress>\""
I already found out that networking.k8s.io/v1beta1 is no longer served, but networking.k8s.io/v1 has been defined as the apiVersion in the template the whole time.
Why does it still try to get v1beta1, and how can I fix this?
We use this TLSOption:
apiVersion: traefik.containo.us/v1alpha1
kind: TLSOption
metadata:
  name: default
  namespace: default
spec:
  minVersion: VersionTLS12
  maxVersion: VersionTLS13
  cipherSuites:
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
We use the Traefik Helm chart rolled out with Terraform:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: traefik
    meta.helm.sh/release-namespace: traefik
  creationTimestamp: "2021-06-12T10:06:11Z"
  generation: 2
  labels:
    app.kubernetes.io/instance: traefik
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: traefik
    helm.sh/chart: traefik-9.19.1
  name: traefik
  namespace: traefik
  resourceVersion: "86094434"
  uid: 903a6f54-7698-4290-bc59-d234a191965c
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: traefik
      app.kubernetes.io/name: traefik
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: traefik
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: traefik
        helm.sh/chart: traefik-9.19.1
    spec:
      containers:
        - args:
            - --global.checknewversion
            - --global.sendanonymoususage
            - --entryPoints.traefik.address=:9000/tcp
            - --entryPoints.web.address=:8000/tcp
            - --entryPoints.websecure.address=:8443/tcp
            - --api.dashboard=true
            - --ping=true
            - --providers.kubernetescrd
            - --providers.kubernetesingress
            - --providers.file.filename=/etc/traefik/traefik.yml
            - --accesslog=true
            - --accesslog.format=json
            - --log.level=DEBUG
            - --entrypoints.websecure.http.tls
            - --entrypoints.web.http.redirections.entrypoint.to=websecure
            - --entrypoints.web.http.redirections.entrypoint.scheme=https
            - --entrypoints.web.http.redirections.entrypoint.permanent=true
            - --entrypoints.web.http.redirections.entrypoint.to=:443
          image: traefik:2.4.8
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /ping
              port: 9000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 2
          name: traefik
          ports:
            - containerPort: 9000
              name: traefik
              protocol: TCP
            - containerPort: 8000
              name: web
              protocol: TCP
            - containerPort: 8443
              name: websecure
              protocol: TCP
          readinessProbe:
            failureThreshold: 1
            httpGet:
              path: /ping
              port: 9000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 2
          resources: {}
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            readOnlyRootFilesystem: true
            runAsGroup: 0
            runAsNonRoot: false
            runAsUser: 0
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /data
              name: data
            - mountPath: /tmp
              name: tmp
            - mountPath: /etc/traefik
              name: traefik-cm
              readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65532
      serviceAccount: traefik
      serviceAccountName: traefik
      terminationGracePeriodSeconds: 60
      tolerations:
        - effect: NoSchedule
          key: env
          operator: Equal
          value: conhub
      volumes:
        - emptyDir: {}
          name: data
        - emptyDir: {}
          name: tmp
        - configMap:
            defaultMode: 420
            name: traefik-cm
          name: traefik-cm
status:
  availableReplicas: 3
  conditions:
    - lastTransitionTime: "2022-06-07T09:19:58Z"
      lastUpdateTime: "2022-06-07T09:19:58Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: "2021-06-12T10:06:11Z"
      lastUpdateTime: "2022-06-07T16:39:01Z"
      message: ReplicaSet "traefik-84c6f5f98b" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
  observedGeneration: 2
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3
resource "helm_release" "traefik" {
  name             = "traefik"
  namespace        = "traefik"
  create_namespace = true
  repository       = "https://helm.traefik.io/traefik"
  chart            = "traefik"

  set {
    name  = "service.spec.loadBalancerIP"
    value = azurerm_public_ip.pub_ip.ip_address
  }

  set {
    name  = "service.annotations.service\\.beta\\.kubernetes\\.io/azure-load-balancer-resource-group"
    value = var.resource_group_aks
  }

  set {
    name  = "additionalArguments"
    value = "{--accesslog=true,--accesslog.format=json,--log.level=DEBUG,--entrypoints.websecure.http.tls,--entrypoints.web.http.redirections.entrypoint.to=websecure,--entrypoints.web.http.redirections.entrypoint.scheme=https,--entrypoints.web.http.redirections.entrypoint.permanent=true,--entrypoints.web.http.redirections.entrypoint.to=:443}"
  }

  set {
    name  = "deployment.replicas"
    value = 3
  }

  timeout = 600

  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]
}
I found out that the problem was the version of the Traefik image.
I quick-fixed it by setting a newer image:
kubectl set image deployment/traefik traefik=traefik:2.7.0 -n traefik
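A more durable variant of that quick fix is to pin the image tag in the helm_release itself, so the next terraform apply doesn't roll it back. The value name image.tag follows the Traefik chart's values file; verify it against the chart version in use:

```hcl
  set {
    name  = "image.tag"
    value = "2.7.0"
  }
```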

can't get custom metrics for hpa from datadog

I'm trying to set up Datadog as a custom-metrics source for my Kubernetes HPA using the official guide:
https://docs.datadoghq.com/agent/cluster_agent/external_metrics/?tab=helm
I'm running EKS 1.18 and the Datadog Cluster Agent (v1.10.0).
The problem is that I can't get the external metrics for my HPA:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hibob-hpa
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: something
  metrics:
    - type: External
      external:
        metricName: kubernetes_state.container.cpu_limit
        metricSelector:
          matchLabels:
            pod: something-54c4bd4db7-pm9q5
        targetAverageValue: 9
The horizontal-pod-autoscaler is unable to get the external metric:
canary/nginx.net.request_per_s/&LabelSelector{MatchLabels:map[string]string{kube_app_name: nginx,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: the server is currently unable to handle the request (get nginx.net.request_per_s.external.metrics.k8s.io)
These are the errors I'm getting inside the cluster agent:
datadog-cluster-agent-585897dc8d-x8l82 cluster-agent 2021-08-20 06:46:14 UTC | CLUSTER | ERROR | (pkg/clusteragent/externalmetrics/metrics_retriever.go:77 in retrieveMetricsValues) | Unable to fetch external metrics: [Error while executing metric query avg:nginx.net.request_per_s{kubea_app_name:ingress-nginx}.rollup(30): API error 403 Forbidden: {"status":********#datadoghq.com"}, strconv.Atoi: parsing "": invalid syntax]
# datadog-cluster-agent status
Getting the status from the agent.
2021-08-19 15:28:21 UTC | CLUSTER | WARN | (pkg/util/log/log.go:541 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
===============================
Datadog Cluster Agent (v1.10.0)
===============================
Status date: 2021-08-19 15:28:21.519850 UTC
Agent start: 2021-08-19 12:11:44.266244 UTC
Pid: 1
Go Version: go1.14.12
Build arch: amd64
Agent flavor: cluster_agent
Check Runners: 4
Log Level: INFO
Paths
=====
Config File: /etc/datadog-agent/datadog-cluster.yaml
conf.d: /etc/datadog-agent/conf.d
Clocks
======
System UTC time: 2021-08-19 15:28:21.519850 UTC
Hostnames
=========
ec2-hostname: ip-10-30-162-8.eu-west-1.compute.internal
hostname: i-00d0458844a597dec
instance-id: i-00d0458844a597dec
socket-fqdn: datadog-cluster-agent-585897dc8d-x8l82
socket-hostname: datadog-cluster-agent-585897dc8d-x8l82
hostname provider: aws
unused hostname providers:
configuration/environment: hostname is empty
gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
Metadata
========
Leader Election
===============
Leader Election Status: Running
Leader Name is: datadog-cluster-agent-585897dc8d-x8l82
Last Acquisition of the lease: Thu, 19 Aug 2021 12:13:14 UTC
Renewed leadership: Thu, 19 Aug 2021 15:28:07 UTC
Number of leader transitions: 17 transitions
Custom Metrics Server
=====================
External metrics provider uses DatadogMetric - Check status directly from Kubernetes with: `kubectl get datadogmetric`
Admission Controller
====================
Disabled: The admission controller is not enabled on the Cluster Agent
=========
Collector
=========
Running Checks
==============
kubernetes_apiserver
--------------------
Instance ID: kubernetes_apiserver [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
Total Runs: 787
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 660
Service Checks: Last Run: 3, Total: 2,343
Average Execution Time : 1.898s
Last Execution Date : 2021-08-19 15:28:17.000000 UTC
Last Successful Execution Date : 2021-08-19 15:28:17.000000 UTC
=========
Forwarder
=========
Transactions
============
Deployments: 350
Dropped: 0
DroppedOnInput: 0
Nodes: 497
Pods: 3
ReplicaSets: 576
Requeued: 0
Retried: 0
RetryQueueSize: 0
Services: 263
Transaction Successes
=====================
Total number: 3442
Successes By Endpoint:
check_run_v1: 786
intake: 181
orchestrator: 1,689
series_v1: 786
==========
Endpoints
==========
https://app.datadoghq.eu - API Key ending with:
- f295b
=====================
Orchestrator Explorer
=====================
ClusterID: f7b4f97a-3cf2-11ea-aaa8-0a158f39909c
ClusterName: production
ContainerScrubbing: Enabled
======================
Orchestrator Endpoints
======================
===============
Forwarder Stats
===============
Pods: 3
Deployments: 350
ReplicaSets: 576
Services: 263
Nodes: 497
===========
Cache Stats
===========
Elements in the cache: 393
Pods:
Last Run: (Hits: 0 Miss: 0) | Total: (Hits: 7 Miss: 5)
Deployments:
Last Run: (Hits: 36 Miss: 1) | Total: (Hits: 40846 Miss: 2444)
ReplicaSets:
Last Run: (Hits: 297 Miss: 1) | Total: (Hits: 328997 Miss: 19441)
Services:
Last Run: (Hits: 44 Miss: 0) | Total: (Hits: 49520 Miss: 2919)
Nodes:
Last Run: (Hits: 9 Miss: 0) | Total: (Hits: 10171 Miss: 755)
And this is what I get from the DatadogMetric resource:
Name:         dcaautogen-2f116f4425658dca91a33dd22a3d943bae5b74
Namespace:    datadog
Labels:       <none>
Annotations:  <none>
API Version:  datadoghq.com/v1alpha1
Kind:         DatadogMetric
Metadata:
  Creation Timestamp:  2021-08-19T15:14:14Z
  Generation:          1
  Managed Fields:
    API Version:  datadoghq.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
      f:status:
        .:
        f:autoscalerReferences:
        f:conditions:
          .:
          k:{"type":"Active"}:
            .:
            f:lastTransitionTime:
            f:lastUpdateTime:
            f:status:
            f:type:
          k:{"type":"Error"}:
            .:
            f:lastTransitionTime:
            f:lastUpdateTime:
            f:message:
            f:reason:
            f:status:
            f:type:
          k:{"type":"Updated"}:
            .:
            f:lastTransitionTime:
            f:lastUpdateTime:
            f:status:
            f:type:
          k:{"type":"Valid"}:
            .:
            f:lastTransitionTime:
            f:lastUpdateTime:
            f:status:
            f:type:
        f:currentValue:
    Manager:         datadog-cluster-agent
    Operation:       Update
    Time:            2021-08-19T15:14:44Z
  Resource Version:  164942235
  Self Link:         /apis/datadoghq.com/v1alpha1/namespaces/datadog/datadogmetrics/dcaautogen-2f116f4425658dca91a33dd22a3d943bae5b74
  UID:               6e9919eb-19ca-4131-b079-4a8a9ac577bb
Spec:
  External Metric Name:  nginx.net.request_per_s
  Query:                 avg:nginx.net.request_per_s{kube_app_name:nginx}.rollup(30)
Status:
  Autoscaler References:  canary/hibob-hpa
  Conditions:
    Last Transition Time:  2021-08-19T15:14:14Z
    Last Update Time:      2021-08-19T15:53:14Z
    Status:                True
    Type:                  Active
    Last Transition Time:  2021-08-19T15:14:14Z
    Last Update Time:      2021-08-19T15:53:14Z
    Status:                False
    Type:                  Valid
    Last Transition Time:  2021-08-19T15:14:14Z
    Last Update Time:      2021-08-19T15:53:14Z
    Status:                True
    Type:                  Updated
    Last Transition Time:  2021-08-19T15:14:44Z
    Last Update Time:      2021-08-19T15:53:14Z
    Message:               Global error (all queries) from backend
    Reason:                Unable to fetch data from Datadog
    Status:                True
    Type:                  Error
  Current Value:           0
Events:                    <none>
This is my cluster agent deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "18"
    meta.helm.sh/release-name: datadog
    meta.helm.sh/release-namespace: datadog
  creationTimestamp: "2021-02-05T07:36:39Z"
  generation: 18
  labels:
    app.kubernetes.io/instance: datadog
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: datadog
    app.kubernetes.io/version: "7"
    helm.sh/chart: datadog-2.7.0
  name: datadog-cluster-agent
  namespace: datadog
  resourceVersion: "164881216"
  selfLink: /apis/apps/v1/namespaces/datadog/deployments/datadog-cluster-agent
  uid: ec52bb4b-62af-4007-9bab-d5d16c48e02c
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: datadog-cluster-agent
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        ad.datadoghq.com/cluster-agent.check_names: '["prometheus"]'
        ad.datadoghq.com/cluster-agent.init_configs: '[{}]'
        ad.datadoghq.com/cluster-agent.instances: |
          [{
            "prometheus_url": "http://%%host%%:5000/metrics",
            "namespace": "datadog.cluster_agent",
            "metrics": [
              "go_goroutines", "go_memstats_*", "process_*",
              "api_requests",
              "datadog_requests", "external_metrics", "rate_limit_queries_*",
              "cluster_checks_*"
            ]
          }]
        checksum/api_key: something
        checksum/application_key: something
        checksum/clusteragent_token: something
        checksum/install_info: something
      creationTimestamp: null
      labels:
        app: datadog-cluster-agent
      name: datadog-cluster-agent
    spec:
      containers:
        - env:
            - name: DD_HEALTH_PORT
              value: "5555"
            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  key: api-key
                  name: datadog
                  optional: true
            - name: DD_APP_KEY
              valueFrom:
                secretKeyRef:
                  key: app-key
                  name: datadog-appkey
            - name: DD_EXTERNAL_METRICS_PROVIDER_ENABLED
              value: "true"
            - name: DD_EXTERNAL_METRICS_PROVIDER_PORT
              value: "8443"
            - name: DD_EXTERNAL_METRICS_PROVIDER_WPA_CONTROLLER
              value: "false"
            - name: DD_EXTERNAL_METRICS_PROVIDER_USE_DATADOGMETRIC_CRD
              value: "true"
            - name: DD_EXTERNAL_METRICS_AGGREGATOR
              value: avg
            - name: DD_CLUSTER_NAME
              value: production
            - name: DD_SITE
              value: datadoghq.eu
            - name: DD_LOG_LEVEL
              value: INFO
            - name: DD_LEADER_ELECTION
              value: "true"
            - name: DD_COLLECT_KUBERNETES_EVENTS
              value: "true"
            - name: DD_CLUSTER_AGENT_KUBERNETES_SERVICE_NAME
              value: datadog-cluster-agent
            - name: DD_CLUSTER_AGENT_AUTH_TOKEN
              valueFrom:
                secretKeyRef:
                  key: token
                  name: datadog-cluster-agent
            - name: DD_KUBE_RESOURCES_NAMESPACE
              value: datadog
            - name: DD_ORCHESTRATOR_EXPLORER_ENABLED
              value: "true"
            - name: DD_ORCHESTRATOR_EXPLORER_CONTAINER_SCRUBBING_ENABLED
              value: "true"
            - name: DD_COMPLIANCE_CONFIG_ENABLED
              value: "false"
          image: gcr.io/datadoghq/cluster-agent:1.10.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 6
            httpGet:
              path: /live
              port: 5555
              scheme: HTTP
            initialDelaySeconds: 15
            periodSeconds: 15
            successThreshold: 1
            timeoutSeconds: 5
          name: cluster-agent
          ports:
            - containerPort: 5005
              name: agentport
              protocol: TCP
            - containerPort: 8443
              name: metricsapi
              protocol: TCP
          readinessProbe:
            failureThreshold: 6
            httpGet:
              path: /ready
              port: 5555
              scheme: HTTP
            initialDelaySeconds: 15
            periodSeconds: 15
            successThreshold: 1
            timeoutSeconds: 5
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /etc/datadog-agent/install_info
              name: installinfo
              readOnly: true
              subPath: install_info
      dnsConfig:
        options:
          - name: ndots
            value: "3"
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: datadog-cluster-agent
      serviceAccountName: datadog-cluster-agent
      terminationGracePeriodSeconds: 30
      volumes:
        - configMap:
            defaultMode: 420
            name: datadog-installinfo
          name: installinfo
status:
  availableReplicas: 1
  conditions:
    - lastTransitionTime: "2021-05-13T15:46:33Z"
      lastUpdateTime: "2021-05-13T15:46:33Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: "2021-02-05T07:36:39Z"
      lastUpdateTime: "2021-08-19T12:12:06Z"
      message: ReplicaSet "datadog-cluster-agent-585897dc8d" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
  observedGeneration: 18
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
For the record, I got this sorted.
According to the Helm chart's default values file, you must set the app key in order to use the metrics provider:
# datadog.appKey -- Datadog APP key required to use metricsProvider
## If you are using clusterAgent.metricsProvider.enabled = true, you must set
## a Datadog application key for read access to your metrics.
appKey: # <DATADOG_APP_KEY>
I guess this is missing from the docs, and a check for it is also missing at cluster-agent startup. I'm going to open an issue about it.
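For reference, the relevant fragment of the chart values ends up looking like the sketch below (key names are from the official Datadog chart; the placeholder keys are yours to substitute):

```yaml
datadog:
  apiKey: <DATADOG_API_KEY>
  appKey: <DATADOG_APP_KEY>   # required for the metrics provider
clusterAgent:
  metricsProvider:
    enabled: true
```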
From the official documentation on troubleshooting the agent here, you have:
If you see the following error when describing the HPA manifest:
Warning FailedComputeMetricsReplicas 3s (x2 over 33s) horizontal-pod-autoscaler failed to get nginx.net.request_per_s external metric: unable to get external metric default/nginx.net.request_per_s/&LabelSelector{MatchLabels:map[string]string{kube_container_name: nginx,},MatchExpressions:[],}: unable to fetch metrics from external metrics API: the server is currently unable to handle the request (get nginx.net.request_per_s.external.metrics.k8s.io)
Make sure the Datadog Cluster Agent is running, and the service exposing the port 8443, whose name is registered in the APIService, is up.
I believe the key phrase here is whose name is registered in the APIService. Did you perform the APIService registration for your external metrics service? This source should provide some details on how to set it up. Since you're getting 403 (Forbidden) errors, it likely means the TLS setup is causing issues.
Perhaps you can follow the guide in general and ensure that your node agent is functioning correctly and has the token environment variable correctly configured.
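To illustrate, an external-metrics APIService registration typically looks like the sketch below; the service name and namespace here are assumptions, so match them to the actual Service exposing the Cluster Agent's metrics API:

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true   # or configure caBundle for proper TLS
  service:
    name: datadog-cluster-agent-metrics-api   # assumed service name
    namespace: datadog
    port: 8443
```

You can then check whether it is registered and healthy with kubectl get apiservice v1beta1.external.metrics.k8s.io.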

HELM UPGRADE ISSUE: spec.template.spec.containers[0].volumeMounts[2].name: Not found: "NAME"

I have been trying to create a POD with HELM UPGRADE:
helm upgrade --values=$(System.DefaultWorkingDirectory)/_NAME-deploy-CI/drop/values-NAME.yaml --namespace sda-NAME-pro --install --reset-values --debug --wait NAME .
but running into below error:
2020-07-08T12:51:28.0678161Z upgrade.go:367: [debug] warning: Upgrade "NAME" failed: failed to create resource: Deployment.apps "NAME" is invalid: [spec.template.spec.volumes[1].secret.secretName: Required value, spec.template.spec.containers[0].volumeMounts[2].name: Not found: "secretvol"]
2020-07-08T12:51:28.0899772Z Error: UPGRADE FAILED: failed to create resource: Deployment.apps "NAME" is invalid: [spec.template.spec.volumes[1].secret.secretName: Required value, spec.template.spec.containers[0].volumeMounts[2].name: Not found: "secretvol"]
YML part
volumeMounts:
  - name: secretvol
    mountPath: "/etc/secret-vol"
    readOnly: true
volumes:
  - name: jks
    secret:
      secretName: {{ .Values.secret.jks }}
  - name: secretvol
    secret:
      secretName: {{ .Values.secret.secretvol }}
Maybe the first deploy needs another command the first time? How can I specify these values to test it?
TL;DR
The issue you've encountered:
2020-07-08T12:51:28.0899772Z Error: UPGRADE FAILED: failed to create resource: Deployment.apps "NAME" is invalid: [spec.template.spec.volumes[1].secret.secretName: Required value, spec.template.spec.containers[0].volumeMounts[2].name: Not found: "secretvol"]
is caused by the fact that the value {{ .Values.secret.secretvol }} is missing.
To fix it you will need to set this value in either:
Helm command that you are using
The file that stores your values in the Helm chart.
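For example, directly on the command line (my-secret is a placeholder for your actual Secret name):

```shell
helm upgrade --install --set secret.secretvol=my-secret NAME .
```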
A tip!
You can run your Helm command with --debug --dry-run to output generated YAML's. This should show you where the errors could be located.
There is an official documentation about values in Helm. Please take a look here:
Helm.sh: Docs: Chart template guide: Values files
Based on:
I have been trying to create a POD with HELM UPGRADE:
I've made an example based on your issue and how you can fix it.
Steps:
Create a helm chart with correct values
Edit the values to reproduce the error
Create a helm chart
For simplicity, I created a basic Helm chart.
Below is the structure of files and directories:
❯ tree helm-dir
helm-dir
├── Chart.yaml
├── templates
│ └── pod.yaml
└── values.yaml
1 directory, 3 files
Create Chart.yaml file
Below is the Chart.yaml file:
apiVersion: v2
name: helm-pod
description: A Helm chart for spawning pod with volumeMount
version: 0.1.0
Create a values.yaml file
Below is the simple values.yaml file, which will be used by default by the $ helm install command:
usedImage: ubuntu
confidentialName: secret-password # name of the secret in Kubernetes
Create a template for a pod
This template is stored in templates directory with a name pod.yaml
Below YAML definition will be a template for spawned pod:
apiVersion: v1
kind: Pod
metadata:
  name: {{ .Values.usedImage }} # value from "values.yaml"
  labels:
    app: {{ .Values.usedImage }} # value from "values.yaml"
spec:
  restartPolicy: Never
  containers:
    - name: {{ .Values.usedImage }} # value from "values.yaml"
      image: {{ .Values.usedImage }} # value from "values.yaml"
      imagePullPolicy: Always
      command:
        - sleep
        - infinity
      volumeMounts:
        - name: secretvol # same name as in spec.volumes.name
          mountPath: "/etc/secret-vol"
          readOnly: true
  volumes:
    - name: secretvol # same name as in spec.containers.volumeMounts.name
      secret:
        secretName: {{ .Values.confidentialName }} # value from "values.yaml"
With the above example you should be able to run $ helm install --name test-pod .
You should get output similar to this:
NAME: test-pod
LAST DEPLOYED: Thu Jul 9 14:47:46 2020
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Pod
NAME READY STATUS RESTARTS AGE
ubuntu 0/1 ContainerCreating 0 0s
Disclaimer!
The ubuntu pod is in the ContainerCreating state as there is no secret named secret-password in the cluster.
You can get more information about your pods by running:
$ kubectl describe pod POD_NAME
Edit the values to reproduce the error
The error you got, as described earlier, is most probably caused by the value {{ .Values.secret.secretvol }} being missing.
If you were to edit the values.yaml file to:
usedImage: ubuntu
# confidentialName: secret-password # name of the secret in Kubernetes
Notice the added #.
You should get below error when trying to deploy this chart:
Error: release test-pod failed: Pod "ubuntu" is invalid: [spec.volumes[0].secret.secretName: Required value, spec.containers[0].volumeMounts[0].name: Not found: "secretvol"]
I previously mentioned the --debug --dry-run parameters for Helm.
If you run:
$ helm install --name test-pod --debug --dry-run .
You should get output similar to this (this is only part of it):
---
# Source: helm-pod/templates/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu # value from "values.yaml"
  labels:
    app: ubuntu # value from "values.yaml"
spec:
  restartPolicy: Never
  containers:
    - name: ubuntu # value from "values.yaml"
      image: ubuntu # value from "values.yaml"
      imagePullPolicy: Always
      command:
        - sleep
        - infinity
      volumeMounts:
        - name: secretvol # same name as in spec.volumes.name
          mountPath: "/etc/secret-vol"
          readOnly: true
  volumes:
    - name: secretvol # same name as in spec.containers.volumeMounts.name
      secret:
        secretName: # value from "values.yaml"
As you can see, the value of secretName was missing. That's the reason the above error was showing up.
Thank you Dawik, here is the output:
2020-07-10T11:34:26.3090526Z LAST DEPLOYED: Fri Jul 10 11:34:25 2020
2020-07-10T11:34:26.3091661Z NAMESPACE: sda-NAME
2020-07-10T11:34:26.3092410Z STATUS: pending-upgrade
2020-07-10T11:34:26.3092796Z REVISION: 13
2020-07-10T11:34:26.3093182Z TEST SUITE: None
2020-07-10T11:34:26.3093781Z USER-SUPPLIED VALUES:
2020-07-10T11:34:26.3105880Z affinity: {}
2020-07-10T11:34:26.3106801Z containers:
2020-07-10T11:34:26.3107446Z port: 8080
2020-07-10T11:34:26.3108124Z portName: http
2020-07-10T11:34:26.3108769Z protocol: TCP
2020-07-10T11:34:26.3109440Z env:
2020-07-10T11:34:26.3110613Z APP_NAME: NAME
2020-07-10T11:34:26.3112959Z JAVA_OPTS_EXT: -Djava.security.egd=file:/dev/./urandom -Dcom.sun.net.ssl.checkRevocation=*** -Djavax.net.ssl.trustStore=/etc/truststore/jssecacerts
2020-07-10T11:34:26.3115219Z -Djavax.net.ssl.trustStorePassword=changeit
2020-07-10T11:34:26.3116160Z SPRING_CLOUD_CONFIG_PROFILE: pro
2020-07-10T11:34:26.3116974Z TZ: Europe/Madrid
2020-07-10T11:34:26.3117647Z WILY_MOM_PORT: 5001
2020-07-10T11:34:26.3119640Z spring_application_name: NAME
2020-07-10T11:34:26.3121048Z spring_cloud_config_uri: URI
2020-07-10T11:34:26.3122038Z envSecrets: {}
2020-07-10T11:34:26.3122789Z fullnameOverride: ""
2020-07-10T11:34:26.3123489Z image:
2020-07-10T11:34:26.3124470Z pullPolicy: Always
2020-07-10T11:34:26.3125908Z repository: NAME-REPO
2020-07-10T11:34:26.3126955Z imagePullSecrets: []
2020-07-10T11:34:26.3127675Z ingress:
2020-07-10T11:34:26.3128727Z enabled: ***
2020-07-10T11:34:26.3129509Z livenessProbe: {}
2020-07-10T11:34:26.3130143Z nameOverride: ""
2020-07-10T11:34:26.3131148Z nameSpace: sda-NAME
2020-07-10T11:34:26.3131820Z nodeSelector: {}
2020-07-10T11:34:26.3132444Z podSecurityContext: {}
2020-07-10T11:34:26.3133135Z readinessProbe: {}
2020-07-10T11:34:26.3133742Z replicaCount: 1
2020-07-10T11:34:26.3134636Z resources:
2020-07-10T11:34:26.3135362Z limits:
2020-07-10T11:34:26.3135865Z cpu: 150m
2020-07-10T11:34:26.3136404Z memory: 1444Mi
2020-07-10T11:34:26.3137257Z requests:
2020-07-10T11:34:26.3137851Z cpu: 100m
2020-07-10T11:34:26.3138391Z memory: 1024Mi
2020-07-10T11:34:26.3138942Z route:
2020-07-10T11:34:26.3139486Z alternateBackends: []
2020-07-10T11:34:26.3140087Z annotations: null
2020-07-10T11:34:26.3140642Z enabled: true
2020-07-10T11:34:26.3141226Z fullnameOverride: ""
2020-07-10T11:34:26.3142695Z host:HOST-NAME
2020-07-10T11:34:26.3143480Z labels: null
2020-07-10T11:34:26.3144217Z nameOverride: ""
2020-07-10T11:34:26.3145137Z path: ""
2020-07-10T11:34:26.3145637Z service:
2020-07-10T11:34:26.3146439Z name: NAME
2020-07-10T11:34:26.3147049Z targetPort: http
2020-07-10T11:34:26.3147607Z weight: 100
2020-07-10T11:34:26.3148121Z status: ""
2020-07-10T11:34:26.3148623Z tls:
2020-07-10T11:34:26.3149162Z caCertificate: null
2020-07-10T11:34:26.3149820Z certificate: null
2020-07-10T11:34:26.3150467Z destinationCACertificate: null
2020-07-10T11:34:26.3151091Z enabled: true
2020-07-10T11:34:26.3151847Z insecureEdgeTerminationPolicy: None
2020-07-10T11:34:26.3152483Z key: null
2020-07-10T11:34:26.3153032Z termination: edge
2020-07-10T11:34:26.3154104Z wildcardPolicy: None
2020-07-10T11:34:26.3155687Z secret:
2020-07-10T11:34:26.3156714Z jks: NAME-jks
2020-07-10T11:34:26.3157408Z jssecacerts: jssecacerts
2020-07-10T11:34:26.3157962Z securityContext: {}
2020-07-10T11:34:26.3158490Z service:
2020-07-10T11:34:26.3159127Z containerPort: 8080
2020-07-10T11:34:26.3159627Z port: 8080
2020-07-10T11:34:26.3160103Z portName: http
2020-07-10T11:34:26.3160759Z targetPort: 8080
2020-07-10T11:34:26.3161219Z type: ClusterIP
2020-07-10T11:34:26.3161694Z serviceAccount:
2020-07-10T11:34:26.3162482Z create: ***
2020-07-10T11:34:26.3162990Z name: null
2020-07-10T11:34:26.3163451Z tolerations: []
2020-07-10T11:34:26.3163836Z
2020-07-10T11:34:26.3164534Z COMPUTED VALUES:
2020-07-10T11:34:26.3165022Z affinity: {}
2020-07-10T11:34:26.3165474Z containers:
2020-07-10T11:34:26.3165931Z port: 8080
2020-07-10T11:34:26.3166382Z portName: http
2020-07-10T11:34:26.3166861Z protocol: TCP
2020-07-10T11:34:26.3167284Z env:
2020-07-10T11:34:26.3168046Z APP_NAME: NAME
2020-07-10T11:34:26.3169887Z JAVA_OPTS_EXT: -Djava.security.egd=file:/dev/./urandom -Dcom.sun.net.ssl.checkRevocation=*** -Djavax.net.ssl.trustStore=/etc/truststore/jssecacerts
2020-07-10T11:34:26.3175782Z -Djavax.net.ssl.trustStorePassword=changeit
2020-07-10T11:34:26.3176587Z SPRING_CLOUD_CONFIG_PROFILE: pro
2020-07-10T11:34:26.3177184Z TZ: Europe/Madrid
2020-07-10T11:34:26.3177683Z WILY_MOM_PORT: 5001
2020-07-10T11:34:26.3178559Z spring_application_name: NAME
2020-07-10T11:34:26.3179807Z spring_cloud_config_uri: https://URL
2020-07-10T11:34:26.3181055Z envSecrets: {}
2020-07-10T11:34:26.3181569Z fullnameOverride: ""
2020-07-10T11:34:26.3182077Z image:
2020-07-10T11:34:26.3182707Z pullPolicy: Always
2020-07-10T11:34:26.3184026Z repository: REPO
2020-07-10T11:34:26.3185001Z imagePullSecrets: []
2020-07-10T11:34:26.3185461Z ingress:
2020-07-10T11:34:26.3186215Z enabled: ***
2020-07-10T11:34:26.3186709Z livenessProbe: {}
2020-07-10T11:34:26.3187187Z nameOverride: ""
2020-07-10T11:34:26.3188416Z nameSpace: sda-NAME
2020-07-10T11:34:26.3189008Z nodeSelector: {}
2020-07-10T11:34:26.3189522Z podSecurityContext: {}
2020-07-10T11:34:26.3190056Z readinessProbe: {}
2020-07-10T11:34:26.3190552Z replicaCount: 1
2020-07-10T11:34:26.3191030Z resources:
2020-07-10T11:34:26.3191686Z limits:
2020-07-10T11:34:26.3192320Z cpu: 150m
2020-07-10T11:34:26.3192819Z memory: 1444Mi
2020-07-10T11:34:26.3193319Z requests:
2020-07-10T11:34:26.3193797Z cpu: 100m
2020-07-10T11:34:26.3194463Z memory: 1024Mi
2020-07-10T11:34:26.3194975Z route:
2020-07-10T11:34:26.3195470Z alternateBackends: []
2020-07-10T11:34:26.3196028Z enabled: true
2020-07-10T11:34:26.3196556Z fullnameOverride: ""
2020-07-10T11:34:26.3197601Z host: HOST-NAME
2020-07-10T11:34:26.3198314Z nameOverride: ""
2020-07-10T11:34:26.3198828Z path: ""
2020-07-10T11:34:26.3199285Z service:
2020-07-10T11:34:26.3200023Z name: NAME
2020-07-10T11:34:26.3233791Z targetPort: http
2020-07-10T11:34:26.3234697Z weight: 100
2020-07-10T11:34:26.3235283Z status: ""
2020-07-10T11:34:26.3235819Z tls:
2020-07-10T11:34:26.3236787Z enabled: true
2020-07-10T11:34:26.3237479Z insecureEdgeTerminationPolicy: None
2020-07-10T11:34:26.3238168Z termination: edge
2020-07-10T11:34:26.3238800Z wildcardPolicy: None
2020-07-10T11:34:26.3239421Z secret:
2020-07-10T11:34:26.3240502Z jks: NAME-servers-jks
2020-07-10T11:34:26.3241249Z jssecacerts: jssecacerts
2020-07-10T11:34:26.3241901Z securityContext: {}
2020-07-10T11:34:26.3242534Z service:
2020-07-10T11:34:26.3243157Z containerPort: 8080
2020-07-10T11:34:26.3243770Z port: 8080
2020-07-10T11:34:26.3244543Z portName: http
2020-07-10T11:34:26.3245190Z targetPort: 8080
2020-07-10T11:34:26.3245772Z type: ClusterIP
2020-07-10T11:34:26.3246343Z serviceAccount:
2020-07-10T11:34:26.3247308Z create: ***
2020-07-10T11:34:26.3247993Z tolerations: []
2020-07-10T11:34:26.3248511Z
2020-07-10T11:34:26.3249065Z HOOKS:
2020-07-10T11:34:26.3249600Z MANIFEST:
2020-07-10T11:34:26.3250504Z ---
2020-07-10T11:34:26.3252176Z # Source: NAME/templates/service.yaml
2020-07-10T11:34:26.3253107Z apiVersion: v1
2020-07-10T11:34:26.3253715Z kind: Service
2020-07-10T11:34:26.3254487Z metadata:
2020-07-10T11:34:26.3255338Z name: NAME
2020-07-10T11:34:26.3256318Z namespace: sda-NAME
2020-07-10T11:34:26.3256883Z labels:
2020-07-10T11:34:26.3257666Z helm.sh/chart: NAME-1.0.0
2020-07-10T11:34:26.3258533Z app.kubernetes.io/name: NAME
2020-07-10T11:34:26.3259785Z app.kubernetes.io/instance: NAME
2020-07-10T11:34:26.3260503Z app.kubernetes.io/version: "latest"
2020-07-10T11:34:26.3261383Z app.kubernetes.io/managed-by: Helm
2020-07-10T11:34:26.3261955Z spec:
2020-07-10T11:34:26.3262427Z type: ClusterIP
2020-07-10T11:34:26.3263292Z ports:
2020-07-10T11:34:26.3264086Z - port: 8080
2020-07-10T11:34:26.3264659Z targetPort: 8080
2020-07-10T11:34:26.3265359Z protocol: TCP
2020-07-10T11:34:26.3265900Z name: http
2020-07-10T11:34:26.3266361Z selector:
2020-07-10T11:34:26.3267220Z app.kubernetes.io/name: NAME
2020-07-10T11:34:26.3268298Z app.kubernetes.io/instance: NAME
2020-07-10T11:34:26.3269380Z ---
2020-07-10T11:34:26.3270539Z # Source: NAME/templates/deployment.yaml
2020-07-10T11:34:26.3271606Z apiVersion: apps/v1
2020-07-10T11:34:26.3272400Z kind: Deployment
2020-07-10T11:34:26.3273326Z metadata:
2020-07-10T11:34:26.3274457Z name: NAME
2020-07-10T11:34:26.3275511Z namespace: sda-NAME
2020-07-10T11:34:26.3276177Z labels:
2020-07-10T11:34:26.3277219Z helm.sh/chart: NAME-1.0.0
2020-07-10T11:34:26.3278322Z app.kubernetes.io/name: NAME
2020-07-10T11:34:26.3279447Z app.kubernetes.io/instance: NAME
2020-07-10T11:34:26.3280249Z app.kubernetes.io/version: "latest"
2020-07-10T11:34:26.3281398Z app.kubernetes.io/managed-by: Helm
2020-07-10T11:34:26.3282289Z spec:
2020-07-10T11:34:26.3282881Z replicas: 1
2020-07-10T11:34:26.3283505Z selector:
2020-07-10T11:34:26.3284469Z matchLabels:
2020-07-10T11:34:26.3285628Z app.kubernetes.io/name: NAME
2020-07-10T11:34:26.3286815Z app.kubernetes.io/instance: NAME
2020-07-10T11:34:26.3287549Z template:
2020-07-10T11:34:26.3288192Z metadata:
2020-07-10T11:34:26.3288826Z labels:
2020-07-10T11:34:26.3289909Z app.kubernetes.io/name: NAME
2020-07-10T11:34:26.3291596Z app.kubernetes.io/instance: NAME
2020-07-10T11:34:26.3292439Z spec:
2020-07-10T11:34:26.3293109Z serviceAccountName: default
2020-07-10T11:34:26.3293774Z securityContext:
2020-07-10T11:34:26.3294666Z {}
2020-07-10T11:34:26.3295217Z containers:
2020-07-10T11:34:26.3296338Z - name: NAME
2020-07-10T11:34:26.3297240Z securityContext:
2020-07-10T11:34:26.3297859Z {}
2020-07-10T11:34:26.3299353Z image: "REGISTRY-IMAGE"
2020-07-10T11:34:26.3300638Z imagePullPolicy: Always
2020-07-10T11:34:26.3301358Z ports:
2020-07-10T11:34:26.3302491Z - name:
2020-07-10T11:34:26.3303380Z containerPort: 8080
2020-07-10T11:34:26.3304479Z protocol: TCP
2020-07-10T11:34:26.3305325Z env:
2020-07-10T11:34:26.3306418Z - name: APP_NAME
2020-07-10T11:34:26.3307576Z value: "NAME"
2020-07-10T11:34:26.3308757Z - name: JAVA_OPTS_EXT
2020-07-10T11:34:26.3311974Z value: "-Djava.security.egd=file:/dev/./urandom -Dcom.sun.net.ssl.checkRevocation=*** -Djavax.net.ssl.trustStore=/etc/truststore/jssecacerts -Djavax.net.ssl.trustStorePassword=changeit"
2020-07-10T11:34:26.3313760Z - name: SPRING_CLOUD_CONFIG_PROFILE
2020-07-10T11:34:26.3314842Z value: "pro"
2020-07-10T11:34:26.3315890Z - name: TZ
2020-07-10T11:34:26.3316777Z value: "Europe/Madrid"
2020-07-10T11:34:26.3317863Z - name: WILY_MOM_PORT
2020-07-10T11:34:26.3318485Z value: "5001"
2020-07-10T11:34:26.3319421Z - name: spring_application_name
2020-07-10T11:34:26.3320679Z value: "NAME"
2020-07-10T11:34:26.3321858Z - name: spring_cloud_config_uri
2020-07-10T11:34:26.3323093Z value: "https://config.sda-NAME-pro.svc.cluster.local"
2020-07-10T11:34:26.3324190Z resources:
2020-07-10T11:34:26.3324905Z limits:
2020-07-10T11:34:26.3325439Z cpu: 150m
2020-07-10T11:34:26.3325985Z memory: 1444Mi
2020-07-10T11:34:26.3326739Z requests:
2020-07-10T11:34:26.3327305Z cpu: 100m
2020-07-10T11:34:26.3327875Z memory: 1024Mi
2020-07-10T11:34:26.3328436Z volumeMounts:
2020-07-10T11:34:26.3329476Z - name: jks
2020-07-10T11:34:26.3330147Z mountPath: "/etc/jks"
2020-07-10T11:34:26.3331153Z readOnly: true
2020-07-10T11:34:26.3332053Z - name: jssecacerts
2020-07-10T11:34:26.3332739Z mountPath: "/etc/truststore"
2020-07-10T11:34:26.3333356Z readOnly: true
2020-07-10T11:34:26.3334402Z - name: secretvol
2020-07-10T11:34:26.3335565Z mountPath: "/etc/secret-vol"
2020-07-10T11:34:26.3336302Z readOnly: true
2020-07-10T11:34:26.3336935Z volumes:
2020-07-10T11:34:26.3338100Z - name: jks
2020-07-10T11:34:26.3338724Z secret:
2020-07-10T11:34:26.3339946Z secretName: NAME-servers-jks
2020-07-10T11:34:26.3340817Z - name: secretvol
2020-07-10T11:34:26.3341347Z secret:
2020-07-10T11:34:26.3341870Z secretName:
2020-07-10T11:34:26.3342633Z - name: jssecacerts
2020-07-10T11:34:26.3343444Z secret:
2020-07-10T11:34:26.3344103Z secretName: jssecacerts
2020-07-10T11:34:26.3344866Z ---
2020-07-10T11:34:26.3345846Z # Source: NAME/templates/route.yaml
2020-07-10T11:34:26.3346641Z apiVersion: route.openshift.io/v1
2020-07-10T11:34:26.3347112Z kind: Route
2020-07-10T11:34:26.3347568Z metadata:
2020-07-10T11:34:26.3354831Z name: NAME
2020-07-10T11:34:26.3357144Z labels:
2020-07-10T11:34:26.3358020Z helm.sh/chart: NAME-1.0.0
2020-07-10T11:34:26.3359360Z app.kubernetes.io/name: NAME
2020-07-10T11:34:26.3360306Z app.kubernetes.io/instance: NAME
2020-07-10T11:34:26.3361002Z app.kubernetes.io/version: "latest"
2020-07-10T11:34:26.3361888Z app.kubernetes.io/managed-by: Helm
2020-07-10T11:34:26.3362463Z spec:
2020-07-10T11:34:26.3363374Z host: HOST
2020-07-10T11:34:26.3364364Z path:
2020-07-10T11:34:26.3364940Z wildcardPolicy: None
2020-07-10T11:34:26.3365630Z port:
2020-07-10T11:34:26.3366080Z targetPort: http
2020-07-10T11:34:26.3366496Z tls:
2020-07-10T11:34:26.3367144Z termination: edge
2020-07-10T11:34:26.3367630Z insecureEdgeTerminationPolicy: None
2020-07-10T11:34:26.3368072Z to:
2020-07-10T11:34:26.3368572Z kind: Service
2020-07-10T11:34:26.3369571Z name: NAME
2020-07-10T11:34:26.3369919Z weight: 100
2020-07-10T11:34:26.3370115Z status:
2020-07-10T11:34:26.3370287Z ingress: []
2020-07-10T11:34:26.3370419Z
2020-07-10T11:34:26.3370579Z NOTES:
2020-07-10T11:34:26.3370833Z 1. Get the application URL by running these commands:
2020-07-10T11:34:26.3371698Z export POD_NAME=$(kubectl get pods --namespace sda-NAME -l "app.kubernetes.io/name=NAME,app.kubernetes.io/instance=NAME" -o jsonpath="{.items[0].metadata.name}")
2020-07-10T11:34:26.3372278Z echo "Visit http://127.0.0.1:8080 to use your application"
2020-07-10T11:34:26.3373358Z kubectl --namespace sda-NAME port-forward $POD_NAME 8080:80
2020-07-10T11:34:26.3373586Z
2020-07-10T11:34:26.3385047Z ##[section]Finishing: Helm Install/Upgrade NAME
It looks fine and doesn't show any error... but if I run it without --dry-run, it crashes at the same point...
On the other hand, if I try it without this volume and secret, it works perfectly! I don't understand it.
Thank you for your patience and guidance.
UPDATE & FIX:
Finally, the problem was in the file values-NAME.yml:
secret:
  jks: VALUE
  jssecacerts: VALUE
It needed the following line under secret:
  secretvol: VALUE
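So the complete secret block ends up as shown below (the VALUE placeholders stand for the real secret names):

```yaml
secret:
  jks: VALUE
  jssecacerts: VALUE
  secretvol: VALUE   # this key was missing
```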

Accessing bitnami/kafka outside the kubernetes cluster

I am currently using the bitnami/kafka image (https://hub.docker.com/r/bitnami/kafka) and deploying it on Kubernetes.
kubernetes master: 1
kubernetes workers: 3
Within the cluster, the other applications are able to find Kafka. The problem occurs when trying to access the Kafka container from outside the cluster. From some reading I learned that we need to set the property "advertised.listeners=PLAINTEXT://hostname:port_number" for external Kafka clients.
I am currently referencing "https://github.com/bitnami/charts/tree/master/bitnami/kafka". Inside my values.yaml file I have added
values.yaml
advertisedListeners1: 10.21.0.191
and statefulset.yaml
- name: KAFKA_CFG_ADVERTISED_LISTENERS
  value: 'PLAINTEXT://{{ .Values.advertisedListeners }}:9092'
For a single kafka instance it is working fine.
But for 3 node kafka cluster, I changed some configuration like below:
values.yaml
advertisedListeners1: 10.21.0.191
advertisedListeners2: 10.21.0.192
advertisedListeners3: 10.21.0.193
and statefulset.yaml
- name: KAFKA_CFG_ADVERTISED_LISTENERS
  {{- if $MY_POD_NAME := "kafka-0" }}
  value: 'PLAINTEXT://{{ .Values.advertisedListeners1 }}:9092'
  {{- else if $MY_POD_NAME := "kafka-1" }}
  value: 'PLAINTEXT://{{ .Values.advertisedListeners2 }}:9092'
  {{- else if $MY_POD_NAME := "kafka-2" }}
  value: 'PLAINTEXT://{{ .Values.advertisedListeners3 }}:9092'
  {{- end }}
The expected result is that all 3 Kafka instances get the advertised.listeners property set to the worker nodes' IP addresses.
example:
kafka-0 --> "PLAINTEXT://10.21.0.191:9092"
kafka-1 --> "PLAINTEXT://10.21.0.192:9092"
kafka-2 --> "PLAINTEXT://10.21.0.193:9092"
Currently only one Kafka pod is up and running; the other two are going into CrashLoopBackOff state, showing this error:
[2019-10-20 13:09:37,753] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2019-10-20 13:09:37,786] ERROR [KafkaServer id=1002] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.IllegalArgumentException: requirement failed: Configured end points 10.21.0.191:9092 in advertised listeners are already registered by broker 1001
at scala.Predef$.require(Predef.scala:224)
at kafka.server.KafkaServer$$anonfun$createBrokerInfo$2.apply(KafkaServer.scala:399)
at kafka.server.KafkaServer$$anonfun$createBrokerInfo$2.apply(KafkaServer.scala:397)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.server.KafkaServer.createBrokerInfo(KafkaServer.scala:397)
at kafka.server.KafkaServer.startup(KafkaServer.scala:261)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:84)
at kafka.Kafka.main(Kafka.scala)
That means the logic applied in statefulset.yaml is not working.
Can anyone help me resolve this?
Any help would be appreciated.
The output of kubectl get statefulset kafka -o yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
creationTimestamp: "2019-10-29T07:04:12Z"
generation: 1
labels:
app.kubernetes.io/component: kafka
app.kubernetes.io/instance: kafka
app.kubernetes.io/managed-by: Tiller
app.kubernetes.io/name: kafka
helm.sh/chart: kafka-6.0.1
name: kafka
namespace: default
resourceVersion: "12189730"
selfLink: /apis/apps/v1/namespaces/default/statefulsets/kafka
uid: d40cfd5f-46a6-49d0-a9d3-e3a851356063
spec:
podManagementPolicy: Parallel
replicas: 3
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/component: kafka
app.kubernetes.io/instance: kafka
app.kubernetes.io/name: kafka
serviceName: kafka-headless
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/component: kafka
app.kubernetes.io/instance: kafka
app.kubernetes.io/managed-by: Tiller
app.kubernetes.io/name: kafka
helm.sh/chart: kafka-6.0.1
name: kafka
spec:
containers:
- env:
- name: MY_POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: MY_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: KAFKA_CFG_ZOOKEEPER_CONNECT
value: kafka-zookeeper
- name: KAFKA_PORT_NUMBER
value: "9092"
- name: KAFKA_CFG_LISTENERS
value: PLAINTEXT://:$(KAFKA_PORT_NUMBER)
- name: KAFKA_CFG_ADVERTISED_LISTENERS
value: PLAINTEXT://10.21.0.191:9092
- name: ALLOW_PLAINTEXT_LISTENER
value: "yes"
- name: KAFKA_CFG_BROKER_ID
value: "-1"
- name: KAFKA_CFG_DELETE_TOPIC_ENABLE
value: "false"
- name: KAFKA_HEAP_OPTS
value: -Xmx1024m -Xms1024m
- name: KAFKA_CFG_LOG_FLUSH_INTERVAL_MESSAGES
value: "10000"
- name: KAFKA_CFG_LOG_FLUSH_INTERVAL_MS
value: "1000"
- name: KAFKA_CFG_LOG_RETENTION_BYTES
value: "1073741824"
- name: KAFKA_CFG_LOG_RETENTION_CHECK_INTERVALS_MS
value: "300000"
- name: KAFKA_CFG_LOG_RETENTION_HOURS
value: "168"
- name: KAFKA_CFG_LOG_MESSAGE_FORMAT_VERSION
- name: KAFKA_CFG_MESSAGE_MAX_BYTES
value: "1000012"
- name: KAFKA_CFG_LOG_SEGMENT_BYTES
value: "1073741824"
- name: KAFKA_CFG_LOG_DIRS
value: /bitnami/kafka/data
- name: KAFKA_CFG_DEFAULT_REPLICATION_FACTOR
value: "1"
- name: KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR
value: "1"
- name: KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR
value: "1"
- name: KAFKA_CFG_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM
value: https
- name: KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR
value: "1"
- name: KAFKA_CFG_NUM_IO_THREADS
value: "8"
- name: KAFKA_CFG_NUM_NETWORK_THREADS
value: "3"
- name: KAFKA_CFG_NUM_PARTITIONS
value: "1"
- name: KAFKA_CFG_NUM_RECOVERY_THREADS_PER_DATA_DIR
value: "1"
- name: KAFKA_CFG_SOCKET_RECEIVE_BUFFER_BYTES
value: "102400"
- name: KAFKA_CFG_SOCKET_REQUEST_MAX_BYTES
value: "104857600"
- name: KAFKA_CFG_SOCKET_SEND_BUFFER_BYTES
value: "102400"
- name: KAFKA_CFG_ZOOKEEPER_CONNECTION_TIMEOUT_MS
value: "6000"
image: docker.io/bitnami/kafka:2.3.0-debian-9-r88
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 2
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: kafka
timeoutSeconds: 5
name: kafka
ports:
- containerPort: 9092
name: kafka
protocol: TCP
readinessProbe:
failureThreshold: 6
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: kafka
timeoutSeconds: 5
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /bitnami/kafka
name: data
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1001
runAsUser: 1001
terminationGracePeriodSeconds: 30
updateStrategy:
type: RollingUpdate
volumeClaimTemplates:
- metadata:
creationTimestamp: null
name: data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 8Gi
volumeMode: Filesystem
status:
phase: Pending
status:
collisionCount: 0
currentReplicas: 3
currentRevision: kafka-56ff499d74
observedGeneration: 1
readyReplicas: 1
replicas: 3
updateRevision: kafka-56ff499d74
updatedReplicas: 3
I see you are having some trouble passing different environment variables to different pods in a StatefulSet.
You are trying to achieve this using helm templates:
- name: KAFKA_CFG_ADVERTISED_LISTENERS
{{- if $MY_POD_NAME := "kafka-0" }}
value: 'PLAINTEXT://{{ .Values.advertisedListeners1 }}:9092'
{{- else if $MY_POD_NAME := "kafka-1" }}
value: 'PLAINTEXT://{{ .Values.advertisedListeners2 }}:9092'
{{- else if $MY_POD_NAME := "kafka-2" }}
value: 'PLAINTEXT://{{ .Values.advertisedListeners3 }}:9092'
{{- end }}
In the Helm template guide documentation you can find this explanation:
In Helm templates, a variable is a named reference to another object.
It follows the form $name. Variables are assigned with a special assignment operator: :=.
Now let's look at your code:
{{- if $MY_POD_NAME := "kafka-0" }}
This is a variable assignment, not a comparison. After this assignment, the if statement evaluates the expression to true (a non-empty string is truthy), and that's why in your statefulset YAML manifest you see this as the output:
- name: KAFKA_CFG_ADVERTISED_LISTENERS
value: PLAINTEXT://10.21.0.191:9092
To make it work as expected, you shouldn't use Helm templating; it's not going to work, because templates are rendered once at deploy time, not once per pod. One way to do it would be to create a separate environment variable for every Kafka node and pass all of these variables to all pods, like this:
- env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: KAFKA_0
      value: 10.21.0.191
    - name: KAFKA_1
      value: 10.21.0.192
    - name: KAFKA_2
      value: 10.21.0.193
    # - name: KAFKA_CFG_ADVERTISED_LISTENERS
    #   value: PLAINTEXT://$MY_POD_NAME:9092
and also create your own Docker image with a modified startup script that exports the KAFKA_CFG_ADVERTISED_LISTENERS variable with the appropriate value depending on MY_POD_NAME.
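The exported-variable logic could be sketched like this (a sketch, not the image's real entrypoint; it assumes the KAFKA_0..KAFKA_2 variables from the env block above):

```shell
#!/bin/bash
# Pick the advertised listener IP for this broker from the pod ordinal.
# MY_POD_NAME and KAFKA_0..KAFKA_2 are injected via the env block above;
# the defaults here are only for testing the script standalone.
: "${MY_POD_NAME:=kafka-1}"
: "${KAFKA_0:=10.21.0.191}"
: "${KAFKA_1:=10.21.0.192}"
: "${KAFKA_2:=10.21.0.193}"

ordinal="${MY_POD_NAME##*-}"   # kafka-1 -> 1
ip_var="KAFKA_${ordinal}"      # KAFKA_1
export KAFKA_CFG_ADVERTISED_LISTENERS="PLAINTEXT://${!ip_var}:9092"
echo "$KAFKA_CFG_ADVERTISED_LISTENERS"

# ...then hand over to the original entrypoint, e.g.:
# exec /entrypoint.sh /run.sh
```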
If you don't want to create your own image, you can create a ConfigMap with a modified entrypoint.sh and mount it in place of the old entrypoint.sh (you can also use any other file; just take a look here for more information on how the Kafka image is built).
Mounting ConfigMap looks like this:
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
    - name: test-container
      image: docker.io/bitnami/kafka:2.3.0-debian-9-r88
      volumeMounts:
        - name: config-volume
          mountPath: /entrypoint.sh
          subPath: entrypoint.sh
  volumes:
    - name: config-volume
      configMap:
        # Provide the name of the ConfigMap containing the files you want
        # to add to the container
        name: kafka-entrypoint-config
        defaultMode: 0744 # remember to add proper (executable) permissions
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-entrypoint-config
  namespace: default
data:
  entrypoint.sh: |
    #!/bin/bash
    # Here add modified entrypoint script
Please let me know if it helped.
I think the Helm chart doesn't whitelist your external (to Kubernetes) network for advertised.listeners. I solved a similar issue by reconfiguring the Helm values.yaml like this. In my case 127.0.0.1 is my Mac; yours might be different:
externalAccess:
  enabled: true
  autoDiscovery:
    enabled: false
    image:
      registry: docker.io
      repository: bitnami/kubectl
      tag: 1.23.4-debian-10-r17
      pullPolicy: IfNotPresent
      pullSecrets: []
    resources:
      limits: {}
      requests: {}
  service:
    type: NodePort
    port: 9094
    loadBalancerIPs: []
    loadBalancerSourceRanges: []
    nodePorts:
      - 30000
      - 30001
      - 30002
    useHostIPs: false
    annotations: {}
    domain: 127.0.0.1