I am trying to deploy AIrflow on EKS Fargate using Helm. I have the EKS cluster, SC, PV, and PVC, along with namespace and fargate-profile(dev) all set up.
My problem comes when I do helm install:
helm upgrade --install airflow apache-airflow/airflow -n dev --values values.yaml --set volumePermissions.enbled=true --debug
[![list of pods][1]][1]
Above is the list of pods. The last 3 keep going into Crashloopbackoff.
Here is the describe of webserver pod:
C:\Users\tanma>kubectl describe pods -n dev airflow-webserver-775d548b98-wd5x8
Name: airflow-webserver-775d548b98-wd5x8
Namespace: dev
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: airflow-webserver
Node: fargate-ip-192-168-161-147.us-west-2.compute.internal/192.168.161.147
Start Time: Thu, 13 Oct 2022 17:12:54 -0400
Labels: component=webserver
eks.amazonaws.com/fargate-profile=dev
pod-template-hash=775d548b98
release=airflow
tier=airflow
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
checksum/airflow-config: 978d20ff42d3de620bee24f2e35b1769f20ebd948890bf474bd940624e39f150
checksum/extra-configmaps: 2e44e493035e2f6a255d08f8104087ff10d30aef6f63176f1b18f75f73295598
checksum/extra-secrets: bb91ef06ddc31c0c5a29973832163d8b0b597812a793ef911d33b622bc9d1655
checksum/metadata-secret: d9bd679df96f2631a8559d02cc528fd78c3d73c06289be9816d83fb332e05b5e
checksum/pgbouncer-config-secret: da52bd1edfe820f0ddfacdebb20a4cc6407d296ee45bcb500a6407e2261a5ba2
checksum/webserver-config: 4a2281a4e3ed0cc5e89f07aba3c1bb314ea51c17cb5d2b41e9b045054a6b5c72
checksum/webserver-secret-key: a1e18ebcc73a51b6bafe52d95eee84dcdf132559cac0248fff6e58e409b4505e
kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.161.147
IPs:
IP: 192.168.161.147
Controlled By: ReplicaSet/airflow-webserver-775d548b98
Init Containers:
wait-for-airflow-migrations:
Container ID: containerd://bf4919f7a268bbeaf1a8f8779e4da1551d76f622d9ce970f18a3f2a1f14c24d7
Image: apache/airflow:2.4.1
Image ID: docker.io/apache/airflow#sha256:e077b68d81d56d773bddbcdc8941b7a2c16a2087a641005dfc5f1b8dcadec90a
Port: <none>
Host Port: <none>
Args:
airflow
db
check-migrations
--migration-wait-timeout=60
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 13 Oct 2022 17:14:40 -0400
Finished: Thu, 13 Oct 2022 17:15:12 -0400
Ready: True
Restart Count: 0
Environment:
AIRFLOW__CORE__FERNET_KEY: <set to the key 'fernet-key' in secret 'airflow-fernet-key'> Optional: false
AIRFLOW__CORE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW_CONN_AIRFLOW_DB: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__WEBSERVER__SECRET_KEY: <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'> Optional: false
Mounts:
/opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pntv6 (ro)
Containers:
webserver:
Container ID: containerd://e479b50af8eefc8c99971cc9cc9b6345f826c09d5f770276b33518340298359d
Image: apache/airflow:2.4.1
Image ID: docker.io/apache/airflow#sha256:e077b68d81d56d773bddbcdc8941b7a2c16a2087a641005dfc5f1b8dcadec90a
Port: 8080/TCP
Host Port: 0/TCP
Args:
bash
-c
exec airflow webserver
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Thu, 13 Oct 2022 17:40:25 -0400
Finished: Thu, 13 Oct 2022 17:42:19 -0400
Ready: False
Restart Count: 9
Liveness: http-get http://:8080/health delay=15s timeout=30s period=5s #success=1 #failure=20
Readiness: http-get http://:8080/health delay=15s timeout=30s period=5s #success=1 #failure=20
Environment:
AIRFLOW__CORE__FERNET_KEY: <set to the key 'fernet-key' in secret 'airflow-fernet-key'> Optional: false
AIRFLOW__CORE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW_CONN_AIRFLOW_DB: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__WEBSERVER__SECRET_KEY: <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'> Optional: false
Mounts:
/opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
/opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
/opt/airflow/logs from logs (rw)
/opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pntv6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: airflow-airflow-config
Optional: false
logs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: af-efs-fargate-1
ReadOnly: false
kube-api-access-pntv6:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 31m fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Normal Scheduled 30m fargate-scheduler Successfully assigned dev/airflow-webserver-775d548b98-wd5x8 to fargate-ip-192-168-161-147.us-west-2.compute.internal
Normal Pulling 30m kubelet Pulling image "apache/airflow:2.4.1"
Normal Pulled 28m kubelet Successfully pulled image "apache/airflow:2.4.1" in 1m43.155801441s
Normal Created 28m kubelet Created container wait-for-airflow-migrations
Normal Started 28m kubelet Started container wait-for-airflow-migrations
Normal Pulled 28m kubelet Container image "apache/airflow:2.4.1" already present on machine
Normal Created 28m kubelet Created container webserver
Normal Started 28m kubelet Started container webserver
Warning Unhealthy 27m (x9 over 27m) kubelet Readiness probe failed: Get "http://192.168.161.147:8080/health": dial tcp 192.168.161.147:8080: connect: connection refused
Warning Unhealthy 10m (x156 over 27m) kubelet Liveness probe failed: Get "http://192.168.161.147:8080/health": dial tcp 192.168.161.147:8080: connect: connection refused
Warning BackOff 10s (x44 over 14m) kubelet Back-off restarting failed container
Any thoughts on why the pods keep restarting?
Appreciate your help here.
Thanks
[1]: https://i.stack.imgur.com/IPocP.png
Your host port is 0. I guess that could cause the webserver not to be able to expose its port.
However, you'd have to check the logs of the webserver pod itself to make sure this is the problem.
You need to make sure that this endpoint is available (which is not currently); http://192.168.161.147:8080/health
Ended up increasing the resources for webserver and this solved the problem.
THanks
Related
I am getting a 0/1 ready state when i run kubectl get pod -n mongodb
To install binami/mongo I ran:
helm repo add bitnami https://charts.bitnami.com/bitnami
kubectl create namespace mongodb
helm install mongodb -n mongodb bitnami/mongodb --values ./factory/yaml/mongodb/values.yaml
when I run kubectl get pod -n mongodb
NAME READY STATUS RESTARTS AGE
mongodb-79bf77f485-8bdm6 0/1 CrashLoopBackOff 69 (55s ago) 4h17m
Here I want the ready state to be 1/1 and status running
thenI ran kubectl describe pod -n mongodb to view the log, and I got
Name: mongodb-79bf77f485-8bdm6
Namespace: mongodb
Priority: 0
Node: ip-192-168-58-58.ec2.internal/192.168.58.58
Start Time: Tue, 19 Jul 2022 07:31:32 +0000
Labels: app.kubernetes.io/component=mongodb
app.kubernetes.io/instance=mongodb
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=mongodb
helm.sh/chart=mongodb-12.1.26
pod-template-hash=79bf77f485
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.47.246
IPs:
IP: 192.168.47.246
Controlled By: ReplicaSet/mongodb-79bf77f485
Containers:
mongodb:
Container ID: docker://11999c2e13e382ceb0a8ba2ea8255ed3d4dc07ca18659ee5a1fe1a8d071b10c0
Image: docker.io/bitnami/mongodb:4.4.2-debian-10-r0
Image ID: docker-pullable://bitnami/mongodb#sha256:add0ef947bc26d25b12ee1b01a914081e08b5e9242d2f9e34e2881b5583ce102
Port: 27017/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 19 Jul 2022 11:46:12 +0000
Finished: Tue, 19 Jul 2022 11:47:32 +0000
Ready: False
Restart Count: 69
Liveness: exec [/bitnami/scripts/ping-mongodb.sh] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [/bitnami/scripts/readiness-probe.sh] delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
MONGODB_ROOT_USER: root
MONGODB_ROOT_PASSWORD: <set to the key 'mongodb-root-password' in secret 'mongodb'> Optional: false
ALLOW_EMPTY_PASSWORD: no
MONGODB_SYSTEM_LOG_VERBOSITY: 0
MONGODB_DISABLE_SYSTEM_LOG: no
MONGODB_DISABLE_JAVASCRIPT: no
MONGODB_ENABLE_JOURNAL: yes
MONGODB_PORT_NUMBER: 27017
MONGODB_ENABLE_IPV6: no
MONGODB_ENABLE_DIRECTORY_PER_DB: no
Mounts:
/bitnami/mongodb from datadir (rw)
/bitnami/scripts from common-scripts (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nsr69 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
common-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: mongodb-common-scripts
Optional: false
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongodb
ReadOnly: false
kube-api-access-nsr69:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 30m (x530 over 4h19m) kubelet Readiness probe failed: /bitnami/scripts/readiness-probe.sh: line 9: mongosh: command not found
Warning BackOff 9m59s (x760 over 4h10m) kubelet Back-off restarting failed container
Warning Unhealthy 5m6s (x415 over 4h19m) kubelet Liveness probe failed: /bitnami/scripts/ping-mongodb.sh: line 2: mongosh: command not found.
Don't understand the log or where the problem is coming from
How can I make the ready state 1/1 and a running status
I am running an Argo workflow and getting the following error in the pod's log:
error: a container name must be specified for pod <name>, choose one of: [wait main]
This error only happens some of the time and only with some of my templates, but when it does, it is a template that is run later in the workflow (i.e. not the first template run). I have not yet been able to identify the parameters that will run successfully, so I will be happy with tips for debugging. I have pasted the output of describe below.
Based on searches, I think the solution is simply that I need to attach "-c main" somewhere, but I do not know where and cannot find information in the Argo docs.
Describe:
Name: message-passing-1-q8jgn-607612432
Namespace: argo
Priority: 0
Node: REDACTED
Start Time: Wed, 17 Mar 2021 17:16:37 +0000
Labels: workflows.argoproj.io/completed=false
workflows.argoproj.io/workflow=message-passing-1-q8jgn
Annotations: cni.projectcalico.org/podIP: 192.168.40.140/32
cni.projectcalico.org/podIPs: 192.168.40.140/32
workflows.argoproj.io/node-name: message-passing-1-q8jgn.e
workflows.argoproj.io/outputs: {"exitCode":"6"}
workflows.argoproj.io/template:
{"name":"egress","arguments":{},"inputs":{...
Status: Failed
IP: 192.168.40.140
IPs:
IP: 192.168.40.140
Controlled By: Workflow/message-passing-1-q8jgn
Containers:
wait:
Container ID: docker://26d6c30440777add2af7ef3a55474d9ff36b8c562d7aecfb911ce62911e5fda3
Image: argoproj/argoexec:v2.12.10
Image ID: docker-pullable://argoproj/argoexec#sha256:6edb85a84d3e54881404d1113256a70fcc456ad49c6d168ab9dfc35e4d316a60
Port: <none>
Host Port: <none>
Command:
argoexec
wait
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 17 Mar 2021 17:16:43 +0000
Finished: Wed, 17 Mar 2021 17:17:03 +0000
Ready: False
Restart Count: 0
Environment:
ARGO_POD_NAME: message-passing-1-q8jgn-607612432 (v1:metadata.name)
Mounts:
/argo/podmetadata from podmetadata (rw)
/mainctrfs/mnt/logs from log-p1-vol (rw)
/mainctrfs/mnt/processed from processed-p1-vol (rw)
/var/run/docker.sock from docker-sock (ro)
/var/run/secrets/kubernetes.io/serviceaccount from argo-token-v2w56 (ro)
main:
Container ID: docker://67e6d6d3717ab1080f14cac6655c90d990f95525edba639a2d2c7b3170a7576e
Image: REDACTED
Image ID: REDACTED
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
State: Terminated
Reason: Error
Exit Code: 6
Started: Wed, 17 Mar 2021 17:16:43 +0000
Finished: Wed, 17 Mar 2021 17:17:03 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt/logs/ from log-p1-vol (rw)
/mnt/processed/ from processed-p1-vol (rw)
/var/run/secrets/kubernetes.io/serviceaccount from argo-token-v2w56 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
podmetadata:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
docker-sock:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType: Socket
processed-p1-vol:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: message-passing-1-q8jgn-processed-p1-vol
ReadOnly: false
log-p1-vol:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: message-passing-1-q8jgn-log-p1-vol
ReadOnly: false
argo-token-v2w56:
Type: Secret (a volume populated by a Secret)
SecretName: argo-token-v2w56
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m35s default-scheduler Successfully assigned argo/message-passing-1-q8jgn-607612432 to ack1
Normal Pulled 7m31s kubelet Container image "argoproj/argoexec:v2.12.10" already present on machine
Normal Created 7m31s kubelet Created container wait
Normal Started 7m30s kubelet Started container wait
Normal Pulled 7m30s kubelet Container image already present on machine
Normal Created 7m30s kubelet Created container main
Normal Started 7m30s kubelet Started container main
This happens when you try to see logs for a pod with multiple containers and not specify for what container you want to see the log. Typical command to see logs:
kubectl logs <podname>
But your Pod has two container, one named "wait" and one named "main". You can see the logs from the container named "main" with:
kubectl logs <podname> -c main
or you can see the logs from all containers with
kubectl logs <podname> --all-containers
My pod get red status and show this error:
Usage of EmptyDir volume "agent" exceeds the limit "100Mi".
after I fix this problem, the error did not disappear.
how to make the error tips disappear? This is the pod info:
dolphin#dolphins-MacBook-Pro ~ % kubectl describe pods soa-task-745d48d955-bd4j8
Name: soa-task-745d48d955-bd4j8
Namespace: dabai-fat
Priority: 0
Node: azshara-k8s03/172.19.150.82
Start Time: Tue, 17 Aug 2021 15:23:18 +0800
Labels: k8s-app=soa-task
pod-template-hash=745d48d955
Annotations: kubectl.kubernetes.io/restartedAt: 2021-04-20T07:42:03Z
Status: Running
IP: 172.30.184.3
IPs: <none>
Controlled By: ReplicaSet/soa-task-745d48d955
Init Containers:
init-agent:
Container ID: docker://25d947147300edba8bc1861d40cea314047674b74f82d7de9013eead41f1f20f
Image: registry.cn-hangzhou.aliyuncs.com/dabai_app_k8s/dabai_fat/skywalking-agent:6.5.0
Image ID: docker-pullable://registry.cn-hangzhou.aliyuncs.com/dabai_app_k8s/dabai_fat/skywalking-agent#sha256:eda5426bc7bc06fc184e740f4783f263f151ae25e55aae37eec8b67e5dbb2fb0
Port: <none>
Host Port: <none>
Command:
sh
-c
set -ex;mkdir -p /skywalking/agent;cp -r /opt/skywalking/agent/* /skywalking/agent;
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 17 Aug 2021 15:23:19 +0800
Finished: Tue, 17 Aug 2021 15:23:19 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/skywalking/agent from agent (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-xnrwt (ro)
Containers:
soa-task:
Container ID: docker://5406a90606e0a3905fa8fa4827e19db0d8d58609c06c2fd4e756b718df5db3b9
Image: registry.cn-hangzhou.aliyuncs.com/dabai_app_k8s/dabai-fat/soa-task:v1.0.0
Image ID: docker-pullable://registry.cn-hangzhou.aliyuncs.com/dabai_app_k8s/dabai-fat/soa-task#sha256:aece36589aae4fdedcfb82d7e64e451e32ebb1169dccd2485f8fe4bd451944a8
Port: <none>
Host Port: <none>
State: Running
Started: Tue, 17 Aug 2021 15:23:34 +0800
Ready: True
Restart Count: 0
Liveness: http-get http://:11028/actuator/liveness delay=120s timeout=30s period=10s #success=1 #failure=3
Readiness: http-get http://:11028/actuator/health delay=90s timeout=30s period=10s #success=1 #failure=3
Environment:
SKYWALKING_ADDR: dabai-skywalking-skywalking-oap.apm.svc.cluster.local:11800
APOLLO_META: <set to the key 'apollo.meta' of config map 'fat-config'> Optional: false
ENV: <set to the key 'env' of config map 'fat-config'> Optional: false
Mounts:
/opt/skywalking/agent from agent (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-xnrwt (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
agent:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: 100Mi
default-token-xnrwt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-xnrwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 360s
node.kubernetes.io/unreachable:NoExecute op=Exists for 360s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13m default-scheduler Successfully assigned dabai-fat/soa-task-745d48d955-bd4j8 to azshara-k8s03
Normal Pulled 13m kubelet Container image "registry.cn-hangzhou.aliyuncs.com/dabai_app_k8s/dabai_fat/skywalking-agent:6.5.0" already present on machine
Normal Created 13m kubelet Created container init-agent
Normal Started 13m kubelet Started container init-agent
Normal Pulling 13m kubelet Pulling image "registry.cn-hangzhou.aliyuncs.com/dabai_app_k8s/dabai-fat/soa-task:v1.0.0"
Normal Pulled 13m kubelet Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/dabai_app_k8s/dabai-fat/soa-task:v1.0.0"
Normal Created 13m kubelet Created container soa-task
Normal Started 13m kubelet Started container soa-task
I am stuck with docker-resgitry deployment to K8S. Here I show detail what I did. Hope you can give me any ideas.
My K8S version:
ii kubeadm 1.14.1-00 amd64 Kubernetes Cluster Bootstrapping Tool
ii kubectl 1.14.1-00 amd64 Kubernetes Command Line Tool
ii kubelet 1.14.1-00 amd64 Kubernetes Node Agent
ii kubernetes-cni 0.7.5-00 amd64 Kubernetes CNI
What I did?
Create selfcert
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.crt
Import selfcert to K8S
$ kubectl create secret tls registry-cert-secret --key cert.key --cert cert.crt
$ vim chart_values.yaml
ingress:
enabled: true
hosts:
- registry.mgmt.home.local
annotations:
kubernetes.io/ingress.class: traefik
tls:
- secretName: registry-cert-secret
hosts:
- registry.mgmt.home.local
secrets:
htpasswd: "admin:$2y$05$f95dCd6fRxQdDoPJ6mJIb.YMvR0qfhddSl3NSL1wCk1ZMl4JyFBDW"
s3:
accessKey: "admin"
secretKey: "admin2019"
storage: s3
s3:
region: us-east-1
regionEndpoint: http://minio.home.local:9000
secure: true
bucket: registry
then install with helm
$ helm install stable/docker-registry -f chart_values.yaml --name docker-registry
NAME: docker-registry
LAST DEPLOYED: Thu Oct 31 16:29:31 2019
NAMESPACE: default
STATUS: DEPLOYED
show the kubectl deployments
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
docker-registry 0/1 1 0 35m
get pods
$ kubectl get pods --namespace default
NAME READY STATUS RESTARTS AGE
docker-registry-6989668db6-78d84 0/1 **CrashLoopBackOff** 7 13m
docker-registry-6989668db6-jttrz 1/1 Terminating 0 37m
describe pod
$ kubectl describe pod docker-registry-6989668db6-78d84 --namespace default
Name: docker-registry-6989668db6-78d84
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s-worker-promox/10.102.11.223
Start Time: Thu, 31 Oct 2019 18:03:13 +0800
Labels: app=docker-registry
pod-template-hash=6989668db6
release=docker-registry
Annotations: checksum/config: 89b20bb43a348d6b8dedacac583a596ccef4e570a935e7c5b464ba746eb88307
Status: Running
IP: 10.244.52.10
Controlled By: ReplicaSet/docker-registry-6989668db6
Containers:
docker-registry:
Container ID: docker://9a40c5e100711b122ddd78439c9fa21790f04f5a442b704140639f8fbfbd8929
Image: registry:2.7.1
Image ID: docker-pullable://registry#sha256:8004747f1e8cd820a148fb7499d71a76d45ff66bac6a29129bfdbfdc0154d146
Port: 5000/TCP
Host Port: 0/TCP
Command:
/bin/registry
serve
/etc/docker/registry/config.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 31 Oct 2019 18:14:21 +0800
Finished: Thu, 31 Oct 2019 18:15:19 +0800
Ready: False
Restart Count: 7
Liveness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
REGISTRY_AUTH: htpasswd
REGISTRY_AUTH_HTPASSWD_REALM: Registry Realm
REGISTRY_AUTH_HTPASSWD_PATH: /auth/htpasswd
REGISTRY_HTTP_SECRET: <set to the key 'haSharedSecret' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_ACCESSKEY: <set to the key 's3AccessKey' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_SECRETKEY: <set to the key 's3SecretKey' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_REGION: us-east-1
REGISTRY_STORAGE_S3_REGIONENDPOINT: http://10.102.11.218:9000
REGISTRY_STORAGE_S3_BUCKET: registry
REGISTRY_STORAGE_S3_SECURE: true
Mounts:
/auth from auth (ro)
/etc/docker/registry from docker-registry-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qfwkm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
auth:
Type: Secret (a volume populated by a Secret)
SecretName: docker-registry-secret
Optional: false
docker-registry-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: docker-registry-config
ingress:
Optional: false
default-token-qfwkm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qfwkm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/docker-registry-6989668db6-78d84 to k8s-worker-promox
Normal Pulled 12m (x3 over 14m) kubelet, k8s-worker-promox Container image "registry:2.7.1" already present on machine
Normal Created 12m (x3 over 14m) kubelet, k8s-worker-promox Created container docker-registry
Normal Started 12m (x3 over 14m) kubelet, k8s-worker-promox Started container docker-registry
Normal Killing 12m (x2 over 13m) kubelet, k8s-worker-promox Container docker-registry failed liveness probe, will be restarted
Warning Unhealthy 12m (x7 over 14m) kubelet, k8s-worker-promox Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 9m8s (x15 over 13m) kubelet, k8s-worker-promox Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 4m26s (x18 over 8m40s) kubelet, k8s-worker-promox Back-off restarting failed container
I see the issue related to Liveness and Readiness. So they made the pod is trying to start/ restart many times, then it gets "Back-off".
Following the troubleshooting, I see that should be related to DNS. But, DNS should not have any issues. I tried to lookup at K8S host.
$ nslookup minio.home.local
Server: 10.102.11.201
Address: 10.102.11.201#53
Non-authoritative answer:
Name: minio.home.local
Address: 10.101.12.213
Updated November 1st. I went into another pod, then nslookup, this pod could not find minio.home.local. Is that related this issue? also I tried to replace minio.home.local to IP in *.yaml, but also get the same issue.
$ kubectl exec -it net-utils-5b5f89f777-2cwgq bash
root#net-utils-5b5f89f777-2cwgq:/#
root#net-utils-5b5f89f777-2cwgq:/#
root#net-utils-5b5f89f777-2cwgq:/#
root#net-utils-5b5f89f777-2cwgq:/# nslookup minio.home.local
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find minio.skylab.local: NXDOMAIN
root#net-utils-5b5f89f777-2cwgq:/# ping minio.home.local
ping: unknown host
Googled/ Github discussion, but I still could not fix it. Do you have any ideas?
Thank you so much.
I am installing kong ingress controller on my AKS cluster, but I don't want to have the postgres Statefulset service inside my cluster. Instead, I have a postgres database in my azure infrastructure, and I want connect it from my kong-ingress-controller deplopyment creating the postgres credentials like secrets in my aks cluster and store it in an environment variables.
I've create the secret
⟩ kubectl create secret generic az-pg-db-user-pass --from-literal=username='az-pg-username' --from-literal=password='az-pg-password' --namespace kong
secret/az-pg-db-user-pass created
And in my kongwithingress.yaml file, I have the deployment manifest declarations, which I did want to present from this gist link in order to don't fill the body question of a lot of yaml code lines.
This gist is based in this AKS deployment all in one, but removing postgres like Statefulset and Service due to the previous reasons, my objective is setup connection with my own azure managed postgres service
I've configured the az-pg-db-user-pass generic secret created in the kong-ingress-controller deployment and my kong deployment and my kong-migrations job presents in my whole gist script on order to create an environment variables such as follow:
KONG_PG_USERNAME
KONG_PG_PASSWORD
These environment variables has been created and referenced as a secrets in the kong-ingress-controller deployment and kong deployment and kong-migrations job which need access or connect with the postgres database
When I execute the kubectl apply -f kongwithingres.yaml command I get the following output:
The kong-ingress-controller deployment, kong deployment and kong-migrations job were created successfully.
⟩ kubectl apply -f kongwithingres.yaml
namespace/kong unchanged
customresourcedefinition.apiextensions.k8s.io/kongplugins.configuration.konghq.com unchanged
customresourcedefinition.apiextensions.k8s.io/kongconsumers.configuration.konghq.com unchanged
customresourcedefinition.apiextensions.k8s.io/kongcredentials.configuration.konghq.com unchanged
customresourcedefinition.apiextensions.k8s.io/kongingresses.configuration.konghq.com unchanged
serviceaccount/kong-serviceaccount unchanged
clusterrole.rbac.authorization.k8s.io/kong-ingress-clusterrole unchanged
role.rbac.authorization.k8s.io/kong-ingress-role unchanged
rolebinding.rbac.authorization.k8s.io/kong-ingress-role-nisa-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/kong-ingress-clusterrole-nisa-binding unchanged
service/kong-ingress-controller created
deployment.extensions/kong-ingress-controller created
service/kong-proxy created
deployment.extensions/kong created
job.batch/kong-migrations created
[I]
But their respective pods have the CrashLoopBackOff status
NAME READY STATUS RESTARTS AGE
pod/kong-d8b88df99-j6hvl 0/1 Init:CrashLoopBackOff 5 4m24s
pod/kong-ingress-controller-984fc9666-cd2b5 0/2 Init:CrashLoopBackOff 5 4m24s
pod/kong-migrations-t6n7p 0/1 CrashLoopBackOff 5 4m24s
I am checking the respective logs of each pod and I found this:
The pod/kong-d8b88df99-j6hvl:
⟩ kubectl logs pod/kong-d8b88df99-j6hvl -p -n kong
Error from server (BadRequest): previous terminated container "kong-proxy" in pod "kong-d8b88df99-j6hvl" not found
And in their describe information this pod is getting the environment variables and the image
⟩ kubectl describe pod/kong-d8b88df99-j6hvl -n kong
Name: kong-d8b88df99-j6hvl
Namespace: kong
Status: Pending
IP: 10.244.1.18
Controlled By: ReplicaSet/kong-d8b88df99
Init Containers:
wait-for-migrations:
Container ID: docker://7007a89ada215daf853ec103d79dca60ccc5fb3a14c51ac6c5c56655da6da62f
Image: kong:1.0.0
Image ID: docker-pullable://kong#sha256:8fd6a312d7715a9cc85c49625a4c2f53951f6e4422926091e4d2ae67c480b6d5
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
kong migrations list
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 26 Feb 2019 16:25:01 +0100
Finished: Tue, 26 Feb 2019 16:25:01 +0100
Ready: False
Restart Count: 6
Environment:
KONG_ADMIN_LISTEN: off
KONG_PROXY_LISTEN: off
KONG_PROXY_ACCESS_LOG: /dev/stdout
KONG_ADMIN_ACCESS_LOG: /dev/stdout
KONG_PROXY_ERROR_LOG: /dev/stderr
KONG_ADMIN_ERROR_LOG: /dev/stderr
KONG_PG_HOST: zcrm365-postgresql1.postgres.database.azure.com
KONG_PG_USERNAME: <set to the key 'username' in secret 'az-pg-db-user-pass'> Optional: false
KONG_PG_PASSWORD: <set to the key 'password' in secret 'az-pg-db-user-pass'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gnkjq (ro)
Containers:
kong-proxy:
Container ID:
Image: kong:1.0.0
Image ID:
Ports: 8000/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
KONG_PG_USERNAME: <set to the key 'username' in secret 'az-pg-db-user-pass'> Optional: false
KONG_PG_PASSWORD: <set to the key 'password' in secret 'az-pg-db-user-pass'> Optional: false
KONG_PG_HOST: zcrm365-postgresql1.postgres.database.azure.com
KONG_PROXY_ACCESS_LOG: /dev/stdout
KONG_PROXY_ERROR_LOG: /dev/stderr
KONG_ADMIN_LISTEN: off
KUBERNETES_PORT_443_TCP_ADDR: zcrm365-d73ab78d.hcp.westeurope.azmk8s.io
KUBERNETES_PORT: tcp://zcrm365-d73ab78d.hcp.westeurope.azmk8s.io:443
KUBERNETES_PORT_443_TCP: tcp://zcrm365-d73ab78d.hcp.westeurope.azmk8s.io:443
KUBERNETES_SERVICE_HOST: zcrm365-d73ab78d.hcp.westeurope.azmk8s.io
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gnkjq (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-gnkjq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-gnkjq
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m44s default-scheduler Successfully assigned kong/kong-d8b88df99-j6hvl to aks-default-75800594-1
Normal Pulled 7m9s (x5 over 8m40s) kubelet, aks-default-75800594-1 Container image "kong:1.0.0" already present on machine
Normal Created 7m8s (x5 over 8m40s) kubelet, aks-default-75800594-1 Created container
Normal Started 7m7s (x5 over 8m40s) kubelet, aks-default-75800594-1 Started container
Warning BackOff 3m34s (x26 over 8m38s) kubelet, aks-default-75800594-1 Back-off restarting failed container
The pod/kong-ingress-controller-984fc9666-cd2b5:
kubectl logs pod/kong-ingress-controller-984fc9666-cd2b5 -p -n kong
Error from server (BadRequest): a container name must be specified for pod kong-ingress-controller-984fc9666-cd2b5, choose one of: [admin-api ingress-controller] or one of the init containers: [wait-for-migrations]
[I]
And their respective description
⟩ kubectl describe pod/kong-ingress-controller-984fc9666-cd2b5 -n kong
Name: kong-ingress-controller-984fc9666-cd2b5
Namespace: kong
Status: Pending
IP: 10.244.2.18
Controlled By: ReplicaSet/kong-ingress-controller-984fc9666
Init Containers:
wait-for-migrations:
Container ID: docker://8eb035f755322b3ac72792d922974811933ba9a71afb1f4549cfe7e0a6519619
Image: kong:1.0.0
Image ID: docker-pullable://kong#sha256:8fd6a312d7715a9cc85c49625a4c2f53951f6e4422926091e4d2ae67c480b6d5
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
kong migrations list
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 26 Feb 2019 16:29:56 +0100
Finished: Tue, 26 Feb 2019 16:29:56 +0100
Ready: False
Restart Count: 7
Environment:
KONG_ADMIN_LISTEN: off
KONG_PROXY_LISTEN: off
KONG_PROXY_ACCESS_LOG: /dev/stdout
KONG_ADMIN_ACCESS_LOG: /dev/stdout
KONG_PROXY_ERROR_LOG: /dev/stderr
KONG_ADMIN_ERROR_LOG: /dev/stderr
KONG_PG_HOST: zcrm365-postgresql1.postgres.database.azure.com
KONG_PG_USERNAME: <set to the key 'username' in secret 'az-pg-db-user-pass'> Optional: false
KONG_PG_PASSWORD: <set to the key 'password' in secret 'az-pg-db-user-pass'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kong-serviceaccount-token-rc4sp (ro)
Containers:
admin-api:
Container ID:
Image: kong:1.0.0
Image ID:
Port: 8001/TCP
Host Port: 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Liveness: http-get http://:8001/status delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8001/status delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
KONG_PG_USERNAME: <set to the key 'username' in secret 'az-pg-db-user-pass'> Optional: false
KONG_PG_PASSWORD: <set to the key 'password' in secret 'az-pg-db-user-pass'> Optional: false
KONG_PG_HOST: zcrm365-postgresql1.postgres.database.azure.com
KONG_ADMIN_ACCESS_LOG: /dev/stdout
KONG_ADMIN_ERROR_LOG: /dev/stderr
KONG_ADMIN_LISTEN: 0.0.0.0:8001, 0.0.0.0:8444 ssl
KONG_PROXY_LISTEN: off
KUBERNETES_PORT_443_TCP_ADDR: zcrm365-d73ab78d.hcp.westeurope.azmk8s.io
KUBERNETES_PORT: tcp://zcrm365-d73ab78d.hcp.westeurope.azmk8s.io:443
KUBERNETES_PORT_443_TCP: tcp://zcrm365-d73ab78d.hcp.westeurope.azmk8s.io:443
KUBERNETES_SERVICE_HOST: zcrm365-d73ab78d.hcp.westeurope.azmk8s.io
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kong-serviceaccount-token-rc4sp (ro)
ingress-controller:
Container ID:
Image: kong-docker-kubernetes-ingress-controller.bintray.io/kong-ingress-controller:0.3.0
Image ID:
Port: <none>
Host Port: <none>
Args:
/kong-ingress-controller
--kong-url=https://localhost:8444
--admin-tls-skip-verify
--default-backend-service=kong/kong-proxy
--publish-service=kong/kong-proxy
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Liveness: http-get http://:10254/healthz delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: kong-ingress-controller-984fc9666-cd2b5 (v1:metadata.name)
POD_NAMESPACE: kong (v1:metadata.namespace)
KUBERNETES_PORT_443_TCP_ADDR: zcrm365-d73ab78d.hcp.westeurope.azmk8s.io
KUBERNETES_PORT: tcp://zcrm365-d73ab78d.hcp.westeurope.azmk8s.io:443
KUBERNETES_PORT_443_TCP: tcp://zcrm365-d73ab78d.hcp.westeurope.azmk8s.io:443
KUBERNETES_SERVICE_HOST: zcrm365-d73ab78d.hcp.westeurope.azmk8s.io
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kong-serviceaccount-token-rc4sp (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
kong-serviceaccount-token-rc4sp:
Type: Secret (a volume populated by a Secret)
SecretName: kong-serviceaccount-token-rc4sp
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned kong/kong-ingress-controller-984fc9666-cd2b5 to aks-default-75800594-2
Normal Pulled 10m (x5 over 12m) kubelet, aks-default-75800594-2 Container image "kong:1.0.0" already present on machine
Normal Created 10m (x5 over 12m) kubelet, aks-default-75800594-2 Created container
Normal Started 10m (x5 over 12m) kubelet, aks-default-75800594-2 Started container
Warning BackOff 2m14s (x49 over 12m) kubelet, aks-default-75800594-2 Back-off restarting failed container
[I]
~/workspace/ZCRM365/Deployments/Kubernetes/kong · (Deployments±)
⟩
I unknown the reason by which teh CrashLoopBackOff status and their status respective is Waiting: PodInitiazing
How to can I debug this behavior?
Is possible that Kong cannot talk to the Postgres database?
My AKS cluster is on Azure and also my postgres database and they have communication as a services.
UPDATE
These are the logs of my container pods created:
⟩ kubectl logs pod/kong-ingress-controller-984fc9666-w4vvn -p -n kong -c ingress-controller
Error from server (BadRequest): previous terminated container "ingress-controller" in pod "kong-ingress-controller-984fc9666-w4vvn" not found
[I]
⟩ kubectl logs pod/kong-d8b88df99-qsq4j -p -n kong -c kong-proxy
Error from server (BadRequest): previous terminated container "kong-proxy" in pod "kong-d8b88df99-qsq4j" not found
[I]
~/workspace/ZCRM365/Deployments/Kubernetes/kong · (Deployments±)
⟩
My kong-ingress-controller deployment pods are CrashLoopBackOff and some times in Waiting: PodInitiazing because I don't had in mind some things such as follow:
The main reason, such as says #Amityo, the kong-ingress-controller and kong have init-container called - wait-for-migrations which waits for the kong-migrations job before to be executed. Here, I can identify that is necessary perform my kong migrations
But my kong-migrations job was not working because I don't had the KONG_DATABASE environment variable parameter to setup the connection.
Other reason by which my deployment was not working is because kong internally to connect with postgres maybe wait that the user environment variable defined in the container to be called KONG_PG_USER. I was called KONG_PG_USERNAME and it was other reason to fail the execution of my script. (I am not sure completely about this)
⟩ kubectl create -f kongwithingres.yaml
namespace/kong created
secret/az-pg-db-user-pass created
customresourcedefinition.apiextensions.k8s.io/kongplugins.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongconsumers.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongcredentials.configuration.konghq.com created
customresourcedefinition.apiextensions.k8s.io/kongingresses.configuration.konghq.com created
serviceaccount/kong-serviceaccount created
clusterrole.rbac.authorization.k8s.io/kong-ingress-clusterrole created
role.rbac.authorization.k8s.io/kong-ingress-role created
rolebinding.rbac.authorization.k8s.io/kong-ingress-role-nisa-binding created
clusterrolebinding.rbac.authorization.k8s.io/kong-ingress-clusterrole-nisa-binding created
service/kong-ingress-controller created
deployment.extensions/kong-ingress-controller created
service/kong-proxy created
deployment.extensions/kong created
job.batch/kong-migrations created
[I]
~/workspace/ZCRM365/Deployments/Kubernetes/kong · (Deployments)
By the way, to start with kong I've recommend install konga which is a front-end dashboard tool to manage kong and check the things that we can make via yaml files.
We have this konga.yaml script to be installed like deployment in our kubernetes clusters
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: konga
namespace: kong
spec:
replicas: 1
template:
metadata:
labels:
app: konga
spec:
containers:
- env:
- name: NODE_TLS_REJECT_UNAUTHORIZED
value: "0"
image: pantsel/konga:latest
name: konga
ports:
- containerPort: 1337
And, we can start the service locally on our machines, via kubectl port-forward command
⟩ kubectl port-forward pod/konga-85b66cffff-mxq85 1337:1337 -n kong
Forwarding from 127.0.0.1:1337 -> 1337
Forwarding from [::1]:1337 -> 1337
We are going to http://localhost:1337/ and is necessary to register an admin user to use konga.
Create a kong connection (Connections in the dashboard) with the following kong admin URL http://kong-ingress-controller:8001/