Kubernetes CoreDNS in CrashLoopBackOff on worker node - kubernetes

I've searched CoreDns in CrashLoopBackOff but nothing has helped me through.
My Set
k8s - v1.20.2
CoreDns-1.7.0
Installed by kubespray with this one https://kubernetes.io/ko/docs/setup/production-environment/tools/kubespray
My Problem
CoreDNS pods on master Node are in a running state
but on worker Node coreDns pods are in crashLoopBackOff state.
enter image description here
kubectl logs -f coredns-847f564ccf-msbvp -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
CoreDns container runs a command "/coredns -conf /etc/resolv.conf" for a while
and then it is destroyed.
enter image description here
Here is Corefile
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf {
prefer_udp
}
cache 30
loop
reload
loadbalance
}
And one of crashed pod's event
kubectl get event --namespace kube-system --field-selector involvedObject.name=coredns-847f564ccf-lqnxs
LAST SEEN TYPE REASON OBJECT MESSAGE
4m55s Warning Unhealthy pod/coredns-847f564ccf-lqnxs Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
9m59s Warning BackOff pod/coredns-847f564ccf-lqnxs Back-off restarting failed container
And Here is CoreDns Description
Containers:
coredns:
Container ID: docker://a174cb3a3800181d1c7b78831bfd37bbf69caf60a82051d6fb29b4b9deeacce9
Image: k8s.gcr.io/coredns:1.7.0
Image ID: docker-pullable://k8s.gcr.io/coredns#sha256:73ca82b4ce829766d4f1f10947c3a338888f876fbed0540dc849c89ff256e90c
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 21 Apr 2021 21:51:44 +0900
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 21 Apr 2021 21:44:42 +0900
Finished: Wed, 21 Apr 2021 21:46:32 +0900
Ready: False
Restart Count: 9943
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=0s timeout=5s period=10s #success=1 #failure=10
Readiness: http-get http://:8181/ready delay=0s timeout=5s period=10s #success=1 #failure=10
Environment: <none>
Mounts:
/etc/coredns from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-qqhn6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 18m (x9940 over 30d) kubelet Container image "k8s.gcr.io/coredns:1.7.0" already present on machine
Warning Unhealthy 8m37s (x99113 over 30d) kubelet Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning BackOff 3m35s (x121901 over 30d) kubelet Back-off restarting failed container
At this point, any suggestion at all will be helpful
I found something weird.
I test in node1, I can access Coredns pod in the node2, but I can not access Coredns pod in the node1.
I use calico for cni
in node1, coredns1 - 1.1.1.1
in node2, coredns2 - 2.2.2.2
in node1.
access 1.1.1.1:8080/health -> timeout
access 2.2.2.2:8080/health -> ok
in node2.
access 1.1.1.1:8080/health -> ok
access 2.2.2.2:8080/health -> timeout

If Containerd, and Kubelet are under Proxy, please add private IP range: 10.0.0.0/8 into NO_PROXY configuration to make sure they can pull the images.
E.g:
[root#dev-systrdemo301z phananhtuan01]# cat /etc/systemd/system/containerd.service.d/proxy.conf
[Service]
Environment="HTTP_PROXY=dev-proxy.prod.xx.local:8300"
Environment="HTTPS_PROXY=dev-proxy.prod.xx.local:8300"
Environment="NO_PROXY=localhost,127.0.0.0/8,100.67.253.157/24,10.0.0.0/8"
Please refer to this article.

Related

How to resolve this error that nginx-ingress-controller start fail in my k8s cluster?

Rancher v2.4.2
kubernetes version: v1.17.4
In my k8s cluster,nginx-ingress-controller doesn't work and restart always.I don't get anything useful information in the logs, thanks for your help.
cluster nodes:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready controlplane,etcd,worker 18d v1.17.4
master2 Ready controlplane,etcd,worker 17d v1.17.4
node1 Ready worker 17d v1.17.4
node2 Ready worker 17d v1.17.4
cluster pods in ingress-nginx namespace
> kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
default-http-backend-5bb77998d7-k7gdh 1/1 Running 1 17d
nginx-ingress-controller-6l4jh 0/1 Running 10 27m
nginx-ingress-controller-bh2pg 1/1 Running 0 63m
nginx-ingress-controller-drtzx 1/1 Running 0 63m
nginx-ingress-controller-qndbw 1/1 Running 0 63m
the pod logs of nginx-ingress-controller-6l4jh
> kubectl logs nginx-ingress-controller-6l4jh -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: nginx-0.25.1-rancher1
Build:
Repository: https://github.com/rancher/ingress-nginx.git
nginx version: openresty/1.15.8.1
-------------------------------------------------------------------------------
>
describe info
> kubectl describe pod nginx-ingress-controller-6l4jh -n ingress-nginx
Name: nginx-ingress-controller-6l4jh
Namespace: ingress-nginx
Priority: 0
Node: node2/172.26.13.11
Start Time: Tue, 19 Apr 2022 07:12:16 +0000
Labels: app=ingress-nginx
controller-revision-hash=758cb9dbbc
pod-template-generation=8
Annotations: cattle.io/timestamp: 2022-04-19T07:08:51Z
field.cattle.io/ports:
[[{"containerPort":80,"dnsName":"nginx-ingress-controller-hostport","hostPort":80,"kind":"HostPort","name":"http","protocol":"TCP","source...
field.cattle.io/publicEndpoints:
[{"addresses":["172.26.13.130"],"nodeId":"c-wv692:m-d5802d05bbf0","port":80,"protocol":"TCP"},{"addresses":["172.26.13.130"],"nodeId":"c-w...
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Running
IP: 172.26.13.11
IPs:
IP: 172.26.13.11
Controlled By: DaemonSet/nginx-ingress-controller
Containers:
nginx-ingress-controller:
Container ID: docker://09a6248edb921b9c9cbab678c793fe1cc3d28322ea6abbb8f15c899351ce4b40
Image: 172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
Image ID: docker-pullable://172.26.13.133:5000/rancher/nginx-ingress-controller#sha256:fe50ceea3d1a0bc9a7ccef8d5845c9a30b51f608e411467862dff590185a47d2
Ports: 80/TCP, 443/TCP
Host Ports: 80/TCP, 443/TCP
Args:
/nginx-ingress-controller
--default-backend-service=$(POD_NAMESPACE)/default-http-backend
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Tue, 19 Apr 2022 07:40:12 +0000
Finished: Tue, 19 Apr 2022 07:41:32 +0000
Ready: False
Restart Count: 11
Liveness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-6l4jh (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-2kdbj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-2kdbj:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-2kdbj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: :NoExecute
:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-6l4jh to node2
Normal Pulled 27m (x3 over 30m) kubelet, node2 Container image "172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1" already present on machine
Normal Created 27m (x3 over 30m) kubelet, node2 Created container nginx-ingress-controller
Normal Started 27m (x3 over 30m) kubelet, node2 Started container nginx-ingress-controller
Normal Killing 27m (x2 over 28m) kubelet, node2 Container nginx-ingress-controller failed liveness probe, will be restarted
Warning Unhealthy 25m (x10 over 29m) kubelet, node2 Liveness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning Unhealthy 10m (x21 over 29m) kubelet, node2 Readiness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning BackOff 8s (x69 over 20m) kubelet, node2 Back-off restarting failed container
>
It sounds like the ingress controller pod fails the liveness/readiness checks but looks like only on a certain node. You could try:
check the node for firewall on that port
update to newer version than nginx-0.25.1

GCP AI Platform - Pipelines - Clusters - Does not have minimum availability

I can't create pipelines. I can't even load the samples / tutorials on the AI Platform Pipelines Dashboard because it doesn't seem to be able to proxy to whatever it needs to.
An error occurred
Error occured while trying to proxy to: ...
I looked into the cluster's details and found 3 components with errors:
Deployment metadata-grpc-deployment Does not have minimum availability
Deployment ml-pipeline Does not have minimum availability
Deployment ml-pipeline-persistenceagent Does not have minimum availability
Creating the clusters involve approx. 3 clicks in GCP Kubernetes Engine so I don't think I messed up this step.
Anyone have an idea of how to achieve "minimum availability"?
UPDATE 1
Nodes have adequate resources and are Ready.
YAML file looks good.
I have 2 clusters in diff regions/zones and both have the deployment errors listed above.
2 Pods are not ok.
Name: ml-pipeline-65479485c8-mcj9x
Namespace: default
Priority: 0
Node: gke-cluster-3-default-pool-007784cb-qcsn/10.150.0.2
Start Time: Thu, 17 Sep 2020 22:15:19 +0000
Labels: app=ml-pipeline
app.kubernetes.io/name=kubeflow-pipelines-3
pod-template-hash=65479485c8
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container ml-pipeline-api-server
Status: Running
IP: 10.4.0.8
IPs:
IP: 10.4.0.8
Controlled By: ReplicaSet/ml-pipeline-65479485c8
Containers:
ml-pipeline-api-server:
Container ID: ...
Image: ...
Image ID: ...
Ports: 8888/TCP, 8887/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Fri, 18 Sep 2020 10:27:31 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Fri, 18 Sep 2020 10:20:38 +0000
Finished: Fri, 18 Sep 2020 10:27:31 +0000
Ready: False
Restart Count: 98
Requests:
cpu: 100m
Liveness: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
Readiness: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
Environment:
HAS_DEFAULT_BUCKET: true
BUCKET_NAME:
PROJECT_ID: <set to the key 'project_id' of config map 'gcp-default-config'> Optional: false
POD_NAMESPACE: default (v1:metadata.namespace)
DEFAULTPIPELINERUNNERSERVICEACCOUNT: pipeline-runner
OBJECTSTORECONFIG_SECURE: false
OBJECTSTORECONFIG_BUCKETNAME:
DBCONFIG_DBNAME: kubeflow_pipelines_3_pipeline
DBCONFIG_USER: <set to the key 'username' in secret 'mysql-credential'> Optional: false
DBCONFIG_PASSWORD: <set to the key 'password' in secret 'mysql-credential'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from ml-pipeline-token-77xl8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
ml-pipeline-token-77xl8:
Type: Secret (a volume populated by a Secret)
SecretName: ml-pipeline-token-77xl8
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 52m (x409 over 11h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Back-off restarting failed container
Warning Unhealthy 31m (x94 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Readiness probe failed:
Warning Unhealthy 31m (x29 over 10h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn (combined from similar events): Readiness probe failed: c
annot exec in a stopped state: unknown
Warning Unhealthy 17m (x95 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Liveness probe failed:
Normal Pulled 7m26s (x97 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Container image "gcr.io/cloud-marketplace/google-cloud-ai
-platform/kubeflow-pipelines/apiserver:1.0.0" already present on machine
Warning Unhealthy 75s (x78 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Liveness probe errored: rpc error: code = DeadlineExceede
d desc = context deadline exceeded
And the other pod:
Name: ml-pipeline-persistenceagent-67db8b8964-mlbmv
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 32s (x2238 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Back-off restarting failed container
SOLUTION
Do not let google handle any storage. Uncheck "Use managed storage" and set up your own artifact collections manually. You don't actually need to enter anything in these fields since the pipeline will be launched anyway.
The Does not have minimum availability error is generic. There could be many issues that trigger it. You need to analyse more in-depth in order to find the actual problem. Here are some possible causes:
Insufficient resources: check if your Node has adequate resources (CPU/Memory). If Node is ok than check the Pod's status.
Liveliness probe and/or Readiness probe failure: execute kubectl describe pod <pod-name> to check if they failed and why.
Deployment misconfiguration: review your deployment yaml file to see if there are any errors or leftovers from previous configurations.
You can also try to wait a bit as sometimes it takes some time in order to deploy everything and/or try changing your Region/Zone.

Readiness probe failed: timeout: failed to connect service ":8080" within 1s

I am trying to build and deploy microservices images to a single-node Kubernetes cluster running on my development machine using minikube. I am using the cloud-native microservices demo application Online Boutique by Google to understand the use of technologies like Kubernetes, Istio etc.
Link to github repo: microservices-demo
I have followed all the installation process to locally build and deploy the microservices, and am able to access the web frontend through my browser. However, when I click on any of the product images say, I see this error page.
HTTP Status: 500 Internal Server Error
On doing a check using kubectl get pods
I realize that one of my pods( Recommendation service) has status CrashLoopBackOff.
Running kubectl describe pods recommendationservice-55b4d6c477-kxv8r:
Namespace: default
Priority: 0
Node: minikube/192.168.99.116
Start Time: Thu, 23 Jul 2020 19:58:38 +0530
Labels: app=recommendationservice
app.kubernetes.io/managed-by=skaffold-v1.11.0
pod-template-hash=55b4d6c477
skaffold.dev/builder=local
skaffold.dev/cleanup=true
skaffold.dev/deployer=kubectl
skaffold.dev/docker-api-version=1.40
skaffold.dev/run-id=49913ced-e8df-40a7-9336-a227b56bcb5f
skaffold.dev/tag-policy=git-commit
Annotations: <none>
Status: Running
IP: 172.17.0.14
IPs:
IP: 172.17.0.14
Controlled By: ReplicaSet/recommendationservice-55b4d6c477
Containers:
server:
Container ID: docker://2d92aa966a82fbe58c8f40f6ecf9d6d55c29f8081cb40e0423a2397e1419350f
Image: recommendationservice:2216d526d249cc8363129aed9a09d752f9ad8f458e61e50a2a99c59d000606cb
Image ID: docker://sha256:2216d526d249cc8363129aed9a09d752f9ad8f458e61e50a2a99c59d000606cb
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 23 Jul 2020 21:09:33 +0530
Finished: Thu, 23 Jul 2020 21:09:53 +0530
Ready: False
Restart Count: 29
Limits:
cpu: 200m
memory: 450Mi
Requests:
cpu: 100m
memory: 220Mi
Liveness: exec [/bin/grpc_health_probe -addr=:8080] delay=0s timeout=1s period=5s #success=1 #failure=3
Readiness: exec [/bin/grpc_health_probe -addr=:8080] delay=0s timeout=1s period=5s #success=1 #failure=3
Environment:
PORT: 8080
PRODUCT_CATALOG_SERVICE_ADDR: productcatalogservice:3550
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-sbpcx (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-sbpcx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-sbpcx
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 44m (x15 over 74m) kubelet, minikube Container image "recommendationservice:2216d526d249cc8363129aed9a09d752f9ad8f458e61e50a2a99c59d000606cb" already present on machine
Warning Unhealthy 9m33s (x99 over 74m) kubelet, minikube Readiness probe failed: timeout: failed to connect service ":8080" within 1s
Warning BackOff 4m25s (x294 over 72m) kubelet, minikube Back-off restarting failed container
In Events, I see Readiness probe failed: timeout: failed to connect service ":8080" within 1s.
What is the reason and how can I resolve this?
Thanks for the help!
Answer
The timeout of the Readiness Probe (1 second) was too short.
More Info
The relevant Readiness Probe is defined such that /bin/grpc_health_probe -addr=:8080 is run inside the server container.
You would expect a 1 second timeout to be sufficient for such a probe but this is running on Minikube so that could be impacting the timeout of the probe.

Istio Prometheus pod in CrashLoopBackOff State

I am trying to setup Istio (1.5.4) for the bookinfo example provided on their website. I have used the demo configuration profile. But on verifying istio installation it fails since Prometheus pod has entered a CrashLoopBackOff state.
NAME READY STATUS RESTARTS AGE
grafana-5f6f8cbf75-psk78 1/1 Running 0 21m
istio-egressgateway-7f9f45c966-g7k9j 1/1 Running 0 21m
istio-ingressgateway-968d69c8b-bhxk5 1/1 Running 0 21m
istio-tracing-9dd6c4f7c-7fm79 1/1 Running 0 21m
istiod-86884c8c45-sw96x 1/1 Running 0 21m
kiali-869c6894c5-wqgjb 1/1 Running 0 21m
prometheus-589c44dbfc-xkwmj 1/2 CrashLoopBackOff 8 21m
The logs for the prometheus pod:
level=warn ts=2020-05-15T09:07:53.113Z caller=main.go:283 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.1, branch=HEAD, revision=8744510c6391d3ef46d8294a7e1f46e57407ab13)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:331 build_context="(go=go1.13.5, user=root#4b1e33c71b9d, date=20191225-01:04:15)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:332 host_details="(Linux 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 prometheus-589c44dbfc-xkwmj (none))"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:107 component=activeQueryTracker msg="Failed to create directory for logging active queries"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: no such file or directory"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x24dda5b, 0x5, 0x14, 0x2c62100, 0xc0005f63c0, 0x2c62100)
/app/promql/query_logger.go:115 +0x48c
main.main()
/app/cmd/prometheus/main.go:362 +0x5229
Describe pod output:
Name: prometheus-589c44dbfc-xkwmj
Namespace: istio-system
Priority: 0
Node: inspiron-7577/192.168.0.9
Start Time: Fri, 15 May 2020 14:21:14 +0530
Labels: app=prometheus
pod-template-hash=589c44dbfc
release=istio
Annotations: sidecar.istio.io/inject: false
Status: Running
IP: 172.17.0.11
IPs:
IP: 172.17.0.11
Controlled By: ReplicaSet/prometheus-589c44dbfc
Containers:
prometheus:
Container ID: docker://b6820a000ab67a5ce31d3a38f6f0d510bd150794b2792147fc17ef8f730c03bb
Image: docker.io/prom/prometheus:v2.15.1
Image ID: docker-pullable://prom/prometheus#sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention=6h
--config.file=/etc/prometheus/prometheus.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 15 May 2020 14:37:50 +0530
Finished: Fri, 15 May 2020 14:37:53 +0530
Ready: False
Restart Count: 8
Requests:
cpu: 10m
Liveness: http-get http://:9090/-/healthy delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/istio-certs from istio-certs (rw)
/etc/prometheus from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
istio-proxy:
Container ID: docker://fa756c93510b6f402d7d88c31a5f5f066d4c254590eab70886e7835e7d3871be
Image: docker.io/istio/proxyv2:1.5.4
Image ID: docker-pullable://istio/proxyv2#sha256:e16e2801b7fd93154e8fcb5f4e2fb1240d73349d425b8be90691d48e8b9bb944
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
istio-proxy-prometheus
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system.svc:15012
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--connectTimeout
10s
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
--dnsRefreshRate
300s
--statusPort
15020
--trust-domain=cluster.local
--controlPlaneBootstrap=false
State: Running
Started: Fri, 15 May 2020 14:21:31 +0530
Ready: True
Restart Count: 0
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
OUTPUT_CERTS: /etc/istio-certs
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istio-pilot.istio-system.svc:15012
POD_NAME: prometheus-589c44dbfc-xkwmj (v1:metadata.name)
POD_NAMESPACE: istio-system (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
ISTIO_META_POD_NAME: prometheus-589c44dbfc-xkwmj (v1:metadata.name)
ISTIO_META_CONFIG_NAMESPACE: istio-system (v1:metadata.namespace)
ISTIO_META_MESH_ID: cluster.local
ISTIO_META_CLUSTER_ID: Kubernetes
Mounts:
/etc/istio-certs/ from istio-certs (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus
Optional: false
istio-certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
prometheus-token-cgqbc:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-token-cgqbc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned istio-system/prometheus-589c44dbfc-xkwmj to inspiron-7577
Warning FailedMount 17m kubelet, inspiron-7577 MountVolume.SetUp failed for volume "prometheus-token-cgqbc" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 17m kubelet, inspiron-7577 MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulled 17m kubelet, inspiron-7577 Container image "docker.io/istio/proxyv2:1.5.4" already present on machine
Normal Created 17m kubelet, inspiron-7577 Created container istio-proxy
Normal Started 17m kubelet, inspiron-7577 Started container istio-proxy
Warning Unhealthy 17m kubelet, inspiron-7577 Readiness probe failed: HTTP probe failed with statuscode: 503
Normal Pulled 16m (x4 over 17m) kubelet, inspiron-7577 Container image "docker.io/prom/prometheus:v2.15.1" already present on machine
Normal Created 16m (x4 over 17m) kubelet, inspiron-7577 Created container prometheus
Normal Started 16m (x4 over 17m) kubelet, inspiron-7577 Started container prometheus
Warning BackOff 2m24s (x72 over 17m) kubelet, inspiron-7577 Back-off restarting failed container
It is unable to create directory for logging. Please help with any ideas.
As istio 1.5.4 has been just released there are some issues with prometheus on minikube installed with istioctl manifest apply.
I checked it on a gcp and everything works fine there.
As a workaround, you can use istio operator which was tested by me and OP and as he mentioned in comments, it's working.
Thanks a lot #jt97! It did work.
Steps to install istio operator
Install the istioctl command.
Deploy the Istio operator: istioctl operator init.
Install istio
To install the Istio demo configuration profile using the operator, run the following command:
kubectl create ns istio-system
kubectl apply -f - <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
namespace: istio-system
name: example-istiocontrolplane
spec:
profile: demo
EOF
Could you tell me why the normal installation failed?
As I mentioned in comments, I don't know yet. If I found a reason I will update this question.

docker-registry deploys to K8S get an issue "CrashLoopBackOff"

I am stuck with docker-resgitry deployment to K8S. Here I show detail what I did. Hope you can give me any ideas.
My K8S version:
ii kubeadm 1.14.1-00 amd64 Kubernetes Cluster Bootstrapping Tool
ii kubectl 1.14.1-00 amd64 Kubernetes Command Line Tool
ii kubelet 1.14.1-00 amd64 Kubernetes Node Agent
ii kubernetes-cni 0.7.5-00 amd64 Kubernetes CNI
What I did?
Create selfcert
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.crt
Import selfcert to K8S
$ kubectl create secret tls registry-cert-secret --key cert.key --cert cert.crt
$ vim chart_values.yaml
ingress:
enabled: true
hosts:
- registry.mgmt.home.local
annotations:
kubernetes.io/ingress.class: traefik
tls:
- secretName: registry-cert-secret
hosts:
- registry.mgmt.home.local
secrets:
htpasswd: "admin:$2y$05$f95dCd6fRxQdDoPJ6mJIb.YMvR0qfhddSl3NSL1wCk1ZMl4JyFBDW"
s3:
accessKey: "admin"
secretKey: "admin2019"
storage: s3
s3:
region: us-east-1
regionEndpoint: http://minio.home.local:9000
secure: true
bucket: registry
then install with helm
$ helm install stable/docker-registry -f chart_values.yaml --name docker-registry
NAME: docker-registry
LAST DEPLOYED: Thu Oct 31 16:29:31 2019
NAMESPACE: default
STATUS: DEPLOYED
show the kubectl deployments
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
docker-registry 0/1 1 0 35m
get pods
$ kubectl get pods --namespace default
NAME READY STATUS RESTARTS AGE
docker-registry-6989668db6-78d84 0/1 **CrashLoopBackOff** 7 13m
docker-registry-6989668db6-jttrz 1/1 Terminating 0 37m
describe pod
$ kubectl describe pod docker-registry-6989668db6-78d84 --namespace default
Name: docker-registry-6989668db6-78d84
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s-worker-promox/10.102.11.223
Start Time: Thu, 31 Oct 2019 18:03:13 +0800
Labels: app=docker-registry
pod-template-hash=6989668db6
release=docker-registry
Annotations: checksum/config: 89b20bb43a348d6b8dedacac583a596ccef4e570a935e7c5b464ba746eb88307
Status: Running
IP: 10.244.52.10
Controlled By: ReplicaSet/docker-registry-6989668db6
Containers:
docker-registry:
Container ID: docker://9a40c5e100711b122ddd78439c9fa21790f04f5a442b704140639f8fbfbd8929
Image: registry:2.7.1
Image ID: docker-pullable://registry#sha256:8004747f1e8cd820a148fb7499d71a76d45ff66bac6a29129bfdbfdc0154d146
Port: 5000/TCP
Host Port: 0/TCP
Command:
/bin/registry
serve
/etc/docker/registry/config.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 31 Oct 2019 18:14:21 +0800
Finished: Thu, 31 Oct 2019 18:15:19 +0800
Ready: False
Restart Count: 7
Liveness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
REGISTRY_AUTH: htpasswd
REGISTRY_AUTH_HTPASSWD_REALM: Registry Realm
REGISTRY_AUTH_HTPASSWD_PATH: /auth/htpasswd
REGISTRY_HTTP_SECRET: <set to the key 'haSharedSecret' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_ACCESSKEY: <set to the key 's3AccessKey' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_SECRETKEY: <set to the key 's3SecretKey' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_REGION: us-east-1
REGISTRY_STORAGE_S3_REGIONENDPOINT: http://10.102.11.218:9000
REGISTRY_STORAGE_S3_BUCKET: registry
REGISTRY_STORAGE_S3_SECURE: true
Mounts:
/auth from auth (ro)
/etc/docker/registry from docker-registry-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qfwkm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
auth:
Type: Secret (a volume populated by a Secret)
SecretName: docker-registry-secret
Optional: false
docker-registry-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: docker-registry-config
ingress:
Optional: false
default-token-qfwkm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qfwkm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/docker-registry-6989668db6-78d84 to k8s-worker-promox
Normal Pulled 12m (x3 over 14m) kubelet, k8s-worker-promox Container image "registry:2.7.1" already present on machine
Normal Created 12m (x3 over 14m) kubelet, k8s-worker-promox Created container docker-registry
Normal Started 12m (x3 over 14m) kubelet, k8s-worker-promox Started container docker-registry
Normal Killing 12m (x2 over 13m) kubelet, k8s-worker-promox Container docker-registry failed liveness probe, will be restarted
Warning Unhealthy 12m (x7 over 14m) kubelet, k8s-worker-promox Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 9m8s (x15 over 13m) kubelet, k8s-worker-promox Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 4m26s (x18 over 8m40s) kubelet, k8s-worker-promox Back-off restarting failed container
I see the issue related to Liveness and Readiness. So they made the pod is trying to start/ restart many times, then it gets "Back-off".
Following the troubleshooting, I see that should be related to DNS. But, DNS should not have any issues. I tried to lookup at K8S host.
$ nslookup minio.home.local
Server: 10.102.11.201
Address: 10.102.11.201#53
Non-authoritative answer:
Name: minio.home.local
Address: 10.101.12.213
Updated November 1st. I went into another pod, then nslookup, this pod could not find minio.home.local. Is that related this issue? also I tried to replace minio.home.local to IP in *.yaml, but also get the same issue.
$ kubectl exec -it net-utils-5b5f89f777-2cwgq bash
root#net-utils-5b5f89f777-2cwgq:/#
root#net-utils-5b5f89f777-2cwgq:/#
root#net-utils-5b5f89f777-2cwgq:/#
root#net-utils-5b5f89f777-2cwgq:/# nslookup minio.home.local
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find minio.skylab.local: NXDOMAIN
root#net-utils-5b5f89f777-2cwgq:/# ping minio.home.local
ping: unknown host
Googled/ Github discussion, but I still could not fix it. Do you have any ideas?
Thank you so much.