Kubernetes ingress-controller CrashLoopBackOff Error

I've set up a Kubernetes (1.17.11) cluster (Azure), and I've installed nginx-ingress-controller via
helm install nginx-ingress --namespace z1 stable/nginx-ingress --set controller.publishService.enabled=true
The setup seems to be OK and it's working, but every now and then it fails: when I check the running pods (kubectl get pod -n z1) I see a number of restarts for the ingress-controller pod.
I thought maybe there is a heavy load, so better to increase the replicas. I ran helm upgrade --namespace z1 stable/ingress --set controller.replicasCount=3, but still only one of the pods (out of 3) seems to be in use, and one sometimes fails with CrashLoopBackOff (not constantly).
One thing worth mentioning: the installed nginx-ingress version is 0.34.1, but 0.41.2 is also available. Do you think the upgrade will help, and how can I upgrade the installed version to the new one? (AFAIK helm upgrade won't replace the chart with a newer version, but I may be wrong.)
Any ideas?
kubectl describe pod result:
Name: nginx-ingress-controller-58467bccf7-jhzlx
Namespace: z1
Priority: 0
Node: aks-agentpool-41415378-vmss000000/10.240.0.4
Start Time: Thu, 19 Nov 2020 09:01:30 +0100
Labels: app=nginx-ingress
app.kubernetes.io/component=controller
component=controller
pod-template-hash=58467bccf7
release=nginx-ingress
Annotations: <none>
Status: Running
IP: 10.244.1.18
IPs:
IP: 10.244.1.18
Controlled By: ReplicaSet/nginx-ingress-controller-58467bccf7
Containers:
nginx-ingress-controller:
Container ID: docker://719655d41c1c8cdb8c9e88c21adad7643a44d17acbb11075a1a60beb7553e2cf
Image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
Image ID: docker-pullable://us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller@sha256:0e072dddd1f7f8fc8909a2ca6f65e76c5f0d2fcfb8be47935ae3457e8bbceb20
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--default-backend-service=z1/nginx-ingress-default-backend
--election-id=ingress-controller-leader
--ingress-class=nginx
--configmap=z1/nginx-ingress-controller
State: Running
Started: Thu, 19 Nov 2020 09:54:07 +0100
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Thu, 19 Nov 2020 09:50:41 +0100
Finished: Thu, 19 Nov 2020 09:51:12 +0100
Ready: True
Restart Count: 8
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-58467bccf7-jhzlx (v1:metadata.name)
POD_NAMESPACE: z1 (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-token-7rmtk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
nginx-ingress-token-7rmtk:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-token-7rmtk
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned z1/nginx-ingress-controller-58467bccf7-jhzlx to aks-agentpool-41415378-vmss000000
Normal Killing 58m kubelet, aks-agentpool-41415378-vmss000000 Container nginx-ingress-controller failed liveness probe, will be restarted
Warning Unhealthy 57m (x4 over 58m) kubelet, aks-agentpool-41415378-vmss000000 Readiness probe failed: HTTP probe failed with statuscode: 500
Warning Unhealthy 57m kubelet, aks-agentpool-41415378-vmss000000 Readiness probe failed: Get http://10.244.1.18:10254/healthz: read tcp 10.244.1.1:54126->10.244.1.18:10254: read: connection reset by peer
Normal Pulled 57m (x2 over 59m) kubelet, aks-agentpool-41415378-vmss000000 Container image "us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1" already present on machine
Normal Created 57m (x2 over 59m) kubelet, aks-agentpool-41415378-vmss000000 Created container nginx-ingress-controller
Normal Started 57m (x2 over 59m) kubelet, aks-agentpool-41415378-vmss000000 Started container nginx-ingress-controller
Warning Unhealthy 57m kubelet, aks-agentpool-41415378-vmss000000 Liveness probe failed: Get http://10.244.1.18:10254/healthz: dial tcp 10.244.1.18:10254: connect: connection refused
Warning Unhealthy 56m kubelet, aks-agentpool-41415378-vmss000000 Liveness probe failed: HTTP probe failed with statuscode: 500
Warning Unhealthy 23m (x10 over 58m) kubelet, aks-agentpool-41415378-vmss000000 Liveness probe failed: Get http://10.244.1.18:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 14m (x6 over 57m) kubelet, aks-agentpool-41415378-vmss000000 Readiness probe failed: Get http://10.244.1.18:10254/healthz: dial tcp 10.244.1.18:10254: connect: connection refused
Warning BackOff 9m28s (x12 over 12m) kubelet, aks-agentpool-41415378-vmss000000 Back-off restarting failed container
Warning Unhealthy 3m51s (x24 over 58m) kubelet, aks-agentpool-41415378-vmss000000 Readiness probe failed: Get http://10.244.1.18:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Some logs from the controller
NGINX Ingress controller
Release: v0.34.1
Build: v20200715-ingress-nginx-2.11.0-8-gda5fa45e2
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.1
-------------------------------------------------------------------------------
I1119 08:54:07.267185 6 main.go:275] Running in Kubernetes cluster version v1.17 (v1.17.11) - git (clean) commit 3a3612132641768edd7f7e73d07772225817f630 - platform linux/amd64
I1119 08:54:07.276120 6 main.go:87] Validated z1/nginx-ingress-default-backend as the default backend.
I1119 08:54:07.430459 6 main.go:105] SSL fake certificate created /etc/ingress-controller/ssl/default-fake-certificate.pem
W1119 08:54:07.497816 6 store.go:659] Unexpected error reading configuration configmap: configmaps "nginx-ingress-controller" not found
I1119 08:54:07.617458 6 nginx.go:263] Starting NGINX Ingress controller
I1119 08:54:08.748938 6 backend_ssl.go:66] Adding Secret "z1/z1-tls-secret" to the local store
I1119 08:54:08.801385 6 event.go:278] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"z2", Name:"zalenium", UID:"8d395a18-811b-4852-8dd5-3cdd682e2e6e", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"13667218", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress z2/zalenium
I1119 08:54:08.801908 6 backend_ssl.go:66] Adding Secret "z2/z2-tls-secret" to the local store
I1119 08:54:08.802837 6 event.go:278] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"z1", Name:"zalenium", UID:"244ae6f5-897e-432e-8ec3-fd142f0255dc", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"13667219", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress z1/zalenium
I1119 08:54:08.839946 6 nginx.go:307] Starting NGINX process
I1119 08:54:08.840375 6 leaderelection.go:242] attempting to acquire leader lease z1/ingress-controller-leader-nginx...
I1119 08:54:08.845041 6 controller.go:141] Configuration changes detected, backend reload required.
I1119 08:54:08.919965 6 status.go:86] new leader elected: nginx-ingress-controller-58467bccf7-5thwb
I1119 08:54:09.084800 6 controller.go:157] Backend successfully reloaded.
I1119 08:54:09.096999 6 controller.go:166] Initial sync, sleeping for 1 second.

As the OP confirmed in the comment section, I am posting the solution to this issue.
Yes, I tried it and replaced the deprecated version with the latest one; it completely solved the nginx issue.
In this setup the OP used the helm chart from the stable repository. On the GitHub page dedicated to stable/nginx-ingress there is a notice that this specific chart is DEPRECATED. It was updated 12 days ago, so this is a fresh change.
This chart is deprecated as we have moved to the upstream repo ingress-nginx The chart source can be found here: https://github.com/kubernetes/ingress-nginx/tree/master/charts/ingress-nginx
In the NGINX Ingress Controller deployment guide, the Helm installation option already uses the new repository.
To list the repositories currently configured on the cluster, use the command $ helm repo list.
$ helm repo list
NAME URL
stable https://kubernetes-charts.storage.googleapis.com
ingress-nginx https://kubernetes.github.io/ingress-nginx
If you don't have the new ingress-nginx repository, you have to:
Add the new repository:
$ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
Update it:
$ helm repo update
Deploy the NGINX Ingress Controller:
$ helm install my-release ingress-nginx/ingress-nginx
Disclaimer!
The above commands are specific to Helm v3.
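For completeness, migrating the existing release from the question to the new chart might look roughly like the following. This is a sketch only: it assumes the release name (nginx-ingress), the namespace (z1) and the publishService setting from the original install should be preserved, and note that uninstalling the old release briefly interrupts ingress traffic:
$ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
$ helm repo update
$ helm uninstall nginx-ingress --namespace z1
$ helm install nginx-ingress ingress-nginx/ingress-nginx --namespace z1 --set controller.publishService.enabled=true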

Related

How to resolve this error where nginx-ingress-controller fails to start in my k8s cluster?

Rancher v2.4.2
kubernetes version: v1.17.4
In my k8s cluster, nginx-ingress-controller doesn't work and restarts constantly. I can't find anything useful in the logs; thanks for your help.
cluster nodes:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready controlplane,etcd,worker 18d v1.17.4
master2 Ready controlplane,etcd,worker 17d v1.17.4
node1 Ready worker 17d v1.17.4
node2 Ready worker 17d v1.17.4
cluster pods in ingress-nginx namespace
> kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
default-http-backend-5bb77998d7-k7gdh 1/1 Running 1 17d
nginx-ingress-controller-6l4jh 0/1 Running 10 27m
nginx-ingress-controller-bh2pg 1/1 Running 0 63m
nginx-ingress-controller-drtzx 1/1 Running 0 63m
nginx-ingress-controller-qndbw 1/1 Running 0 63m
the pod logs of nginx-ingress-controller-6l4jh
> kubectl logs nginx-ingress-controller-6l4jh -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: nginx-0.25.1-rancher1
Build:
Repository: https://github.com/rancher/ingress-nginx.git
nginx version: openresty/1.15.8.1
-------------------------------------------------------------------------------
>
describe info
> kubectl describe pod nginx-ingress-controller-6l4jh -n ingress-nginx
Name: nginx-ingress-controller-6l4jh
Namespace: ingress-nginx
Priority: 0
Node: node2/172.26.13.11
Start Time: Tue, 19 Apr 2022 07:12:16 +0000
Labels: app=ingress-nginx
controller-revision-hash=758cb9dbbc
pod-template-generation=8
Annotations: cattle.io/timestamp: 2022-04-19T07:08:51Z
field.cattle.io/ports:
[[{"containerPort":80,"dnsName":"nginx-ingress-controller-hostport","hostPort":80,"kind":"HostPort","name":"http","protocol":"TCP","source...
field.cattle.io/publicEndpoints:
[{"addresses":["172.26.13.130"],"nodeId":"c-wv692:m-d5802d05bbf0","port":80,"protocol":"TCP"},{"addresses":["172.26.13.130"],"nodeId":"c-w...
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Running
IP: 172.26.13.11
IPs:
IP: 172.26.13.11
Controlled By: DaemonSet/nginx-ingress-controller
Containers:
nginx-ingress-controller:
Container ID: docker://09a6248edb921b9c9cbab678c793fe1cc3d28322ea6abbb8f15c899351ce4b40
Image: 172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
Image ID: docker-pullable://172.26.13.133:5000/rancher/nginx-ingress-controller@sha256:fe50ceea3d1a0bc9a7ccef8d5845c9a30b51f608e411467862dff590185a47d2
Ports: 80/TCP, 443/TCP
Host Ports: 80/TCP, 443/TCP
Args:
/nginx-ingress-controller
--default-backend-service=$(POD_NAMESPACE)/default-http-backend
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Tue, 19 Apr 2022 07:40:12 +0000
Finished: Tue, 19 Apr 2022 07:41:32 +0000
Ready: False
Restart Count: 11
Liveness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-6l4jh (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-2kdbj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-2kdbj:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-2kdbj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: :NoExecute
:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-6l4jh to node2
Normal Pulled 27m (x3 over 30m) kubelet, node2 Container image "172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1" already present on machine
Normal Created 27m (x3 over 30m) kubelet, node2 Created container nginx-ingress-controller
Normal Started 27m (x3 over 30m) kubelet, node2 Started container nginx-ingress-controller
Normal Killing 27m (x2 over 28m) kubelet, node2 Container nginx-ingress-controller failed liveness probe, will be restarted
Warning Unhealthy 25m (x10 over 29m) kubelet, node2 Liveness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning Unhealthy 10m (x21 over 29m) kubelet, node2 Readiness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning BackOff 8s (x69 over 20m) kubelet, node2 Back-off restarting failed container
>
It sounds like the ingress controller pod is failing its liveness/readiness checks, but apparently only on one particular node. You could try:
checking the node for a firewall blocking that port (see the sketch below)
updating to a newer version than nginx-0.25.1
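For the firewall check, a couple of quick diagnostics run on the affected node (node2); the pod IP 172.26.13.11 and the probe port 10254 are taken from the describe output above:
$ curl -v http://172.26.13.11:10254/healthz
$ sudo iptables -L -n | grep 10254
If the connection is refused from the node itself as well, the controller process inside the pod is not listening at all, which points at the container crashing rather than at a host firewall.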

GCP AI Platform - Pipelines - Clusters - Does not have minimum availability

I can't create pipelines. I can't even load the samples / tutorials on the AI Platform Pipelines Dashboard because it doesn't seem to be able to proxy to whatever it needs to.
An error occurred
Error occured while trying to proxy to: ...
I looked into the cluster's details and found 3 components with errors:
Deployment metadata-grpc-deployment Does not have minimum availability
Deployment ml-pipeline Does not have minimum availability
Deployment ml-pipeline-persistenceagent Does not have minimum availability
Creating the clusters involves approx. 3 clicks in GCP Kubernetes Engine, so I don't think I messed up this step.
Anyone have an idea of how to achieve "minimum availability"?
UPDATE 1
Nodes have adequate resources and are Ready.
YAML file looks good.
I have 2 clusters in diff regions/zones and both have the deployment errors listed above.
2 Pods are not ok.
Name: ml-pipeline-65479485c8-mcj9x
Namespace: default
Priority: 0
Node: gke-cluster-3-default-pool-007784cb-qcsn/10.150.0.2
Start Time: Thu, 17 Sep 2020 22:15:19 +0000
Labels: app=ml-pipeline
app.kubernetes.io/name=kubeflow-pipelines-3
pod-template-hash=65479485c8
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container ml-pipeline-api-server
Status: Running
IP: 10.4.0.8
IPs:
IP: 10.4.0.8
Controlled By: ReplicaSet/ml-pipeline-65479485c8
Containers:
ml-pipeline-api-server:
Container ID: ...
Image: ...
Image ID: ...
Ports: 8888/TCP, 8887/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Fri, 18 Sep 2020 10:27:31 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Fri, 18 Sep 2020 10:20:38 +0000
Finished: Fri, 18 Sep 2020 10:27:31 +0000
Ready: False
Restart Count: 98
Requests:
cpu: 100m
Liveness: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
Readiness: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
Environment:
HAS_DEFAULT_BUCKET: true
BUCKET_NAME:
PROJECT_ID: <set to the key 'project_id' of config map 'gcp-default-config'> Optional: false
POD_NAMESPACE: default (v1:metadata.namespace)
DEFAULTPIPELINERUNNERSERVICEACCOUNT: pipeline-runner
OBJECTSTORECONFIG_SECURE: false
OBJECTSTORECONFIG_BUCKETNAME:
DBCONFIG_DBNAME: kubeflow_pipelines_3_pipeline
DBCONFIG_USER: <set to the key 'username' in secret 'mysql-credential'> Optional: false
DBCONFIG_PASSWORD: <set to the key 'password' in secret 'mysql-credential'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from ml-pipeline-token-77xl8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
ml-pipeline-token-77xl8:
Type: Secret (a volume populated by a Secret)
SecretName: ml-pipeline-token-77xl8
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 52m (x409 over 11h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Back-off restarting failed container
Warning Unhealthy 31m (x94 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Readiness probe failed:
Warning Unhealthy 31m (x29 over 10h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn (combined from similar events): Readiness probe failed: cannot exec in a stopped state: unknown
Warning Unhealthy 17m (x95 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Liveness probe failed:
Normal Pulled 7m26s (x97 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Container image "gcr.io/cloud-marketplace/google-cloud-ai-platform/kubeflow-pipelines/apiserver:1.0.0" already present on machine
Warning Unhealthy 75s (x78 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded
And the other pod:
Name: ml-pipeline-persistenceagent-67db8b8964-mlbmv
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 32s (x2238 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Back-off restarting failed container
SOLUTION
Do not let Google handle any storage. Uncheck "Use managed storage" and set up your own artifact collections manually. You don't actually need to enter anything in those fields, since the pipeline will be launched anyway.
The Does not have minimum availability error is generic. There could be many issues that trigger it, so you need to analyse more in-depth to find the actual problem. Here are some possible causes:
Insufficient resources: check whether your Node has adequate resources (CPU/memory). If the Node is OK, then check the Pod's status (see the commands below).
Liveness probe and/or readiness probe failure: execute kubectl describe pod <pod-name> to check whether they failed and why.
Deployment misconfiguration: review your deployment YAML file for errors or leftovers from previous configurations.
You can also wait a bit, as it sometimes takes a while to deploy everything, and/or try changing your Region/Zone.
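For the first two points, the evidence can usually be gathered with a few standard commands; this sketch uses the pod name from the question, and kubectl top requires the cluster's metrics server to be available:
$ kubectl get nodes
$ kubectl top nodes
$ kubectl describe pod ml-pipeline-65479485c8-mcj9x -n default
$ kubectl logs ml-pipeline-65479485c8-mcj9x -n default --previous
The --previous flag prints the logs of the last terminated container, which in a CrashLoopBackOff is usually more informative than the logs of the freshly restarted one.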

Istio Prometheus pod in CrashLoopBackOff State

I am trying to set up Istio (1.5.4) for the bookinfo example provided on their website. I have used the demo configuration profile, but verifying the Istio installation fails because the Prometheus pod has entered a CrashLoopBackOff state.
NAME READY STATUS RESTARTS AGE
grafana-5f6f8cbf75-psk78 1/1 Running 0 21m
istio-egressgateway-7f9f45c966-g7k9j 1/1 Running 0 21m
istio-ingressgateway-968d69c8b-bhxk5 1/1 Running 0 21m
istio-tracing-9dd6c4f7c-7fm79 1/1 Running 0 21m
istiod-86884c8c45-sw96x 1/1 Running 0 21m
kiali-869c6894c5-wqgjb 1/1 Running 0 21m
prometheus-589c44dbfc-xkwmj 1/2 CrashLoopBackOff 8 21m
The logs for the prometheus pod:
level=warn ts=2020-05-15T09:07:53.113Z caller=main.go:283 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.1, branch=HEAD, revision=8744510c6391d3ef46d8294a7e1f46e57407ab13)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:331 build_context="(go=go1.13.5, user=root@4b1e33c71b9d, date=20191225-01:04:15)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:332 host_details="(Linux 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 prometheus-589c44dbfc-xkwmj (none))"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:107 component=activeQueryTracker msg="Failed to create directory for logging active queries"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: no such file or directory"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x24dda5b, 0x5, 0x14, 0x2c62100, 0xc0005f63c0, 0x2c62100)
/app/promql/query_logger.go:115 +0x48c
main.main()
/app/cmd/prometheus/main.go:362 +0x5229
Describe pod output:
Name: prometheus-589c44dbfc-xkwmj
Namespace: istio-system
Priority: 0
Node: inspiron-7577/192.168.0.9
Start Time: Fri, 15 May 2020 14:21:14 +0530
Labels: app=prometheus
pod-template-hash=589c44dbfc
release=istio
Annotations: sidecar.istio.io/inject: false
Status: Running
IP: 172.17.0.11
IPs:
IP: 172.17.0.11
Controlled By: ReplicaSet/prometheus-589c44dbfc
Containers:
prometheus:
Container ID: docker://b6820a000ab67a5ce31d3a38f6f0d510bd150794b2792147fc17ef8f730c03bb
Image: docker.io/prom/prometheus:v2.15.1
Image ID: docker-pullable://prom/prometheus@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention=6h
--config.file=/etc/prometheus/prometheus.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 15 May 2020 14:37:50 +0530
Finished: Fri, 15 May 2020 14:37:53 +0530
Ready: False
Restart Count: 8
Requests:
cpu: 10m
Liveness: http-get http://:9090/-/healthy delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/istio-certs from istio-certs (rw)
/etc/prometheus from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
istio-proxy:
Container ID: docker://fa756c93510b6f402d7d88c31a5f5f066d4c254590eab70886e7835e7d3871be
Image: docker.io/istio/proxyv2:1.5.4
Image ID: docker-pullable://istio/proxyv2@sha256:e16e2801b7fd93154e8fcb5f4e2fb1240d73349d425b8be90691d48e8b9bb944
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
istio-proxy-prometheus
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system.svc:15012
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--connectTimeout
10s
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
--dnsRefreshRate
300s
--statusPort
15020
--trust-domain=cluster.local
--controlPlaneBootstrap=false
State: Running
Started: Fri, 15 May 2020 14:21:31 +0530
Ready: True
Restart Count: 0
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
OUTPUT_CERTS: /etc/istio-certs
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istio-pilot.istio-system.svc:15012
POD_NAME: prometheus-589c44dbfc-xkwmj (v1:metadata.name)
POD_NAMESPACE: istio-system (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
ISTIO_META_POD_NAME: prometheus-589c44dbfc-xkwmj (v1:metadata.name)
ISTIO_META_CONFIG_NAMESPACE: istio-system (v1:metadata.namespace)
ISTIO_META_MESH_ID: cluster.local
ISTIO_META_CLUSTER_ID: Kubernetes
Mounts:
/etc/istio-certs/ from istio-certs (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus
Optional: false
istio-certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
prometheus-token-cgqbc:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-token-cgqbc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned istio-system/prometheus-589c44dbfc-xkwmj to inspiron-7577
Warning FailedMount 17m kubelet, inspiron-7577 MountVolume.SetUp failed for volume "prometheus-token-cgqbc" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 17m kubelet, inspiron-7577 MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulled 17m kubelet, inspiron-7577 Container image "docker.io/istio/proxyv2:1.5.4" already present on machine
Normal Created 17m kubelet, inspiron-7577 Created container istio-proxy
Normal Started 17m kubelet, inspiron-7577 Started container istio-proxy
Warning Unhealthy 17m kubelet, inspiron-7577 Readiness probe failed: HTTP probe failed with statuscode: 503
Normal Pulled 16m (x4 over 17m) kubelet, inspiron-7577 Container image "docker.io/prom/prometheus:v2.15.1" already present on machine
Normal Created 16m (x4 over 17m) kubelet, inspiron-7577 Created container prometheus
Normal Started 16m (x4 over 17m) kubelet, inspiron-7577 Started container prometheus
Warning BackOff 2m24s (x72 over 17m) kubelet, inspiron-7577 Back-off restarting failed container
It is unable to create the directory for logging. Please help with any ideas.
As Istio 1.5.4 has only just been released, there are some issues with Prometheus on minikube installed with istioctl manifest apply.
I checked it on GCP and everything works fine there.
As a workaround, you can use the Istio operator, which was tested by me and the OP, and as he mentioned in the comments, it's working.
Thanks a lot @jt97! It did work.
Steps to install the Istio operator:
Install the istioctl command.
Deploy the Istio operator: istioctl operator init.
Install Istio.
To install the Istio demo configuration profile using the operator, run the following command:
kubectl create ns istio-system
kubectl apply -f - <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: example-istiocontrolplane
spec:
  profile: demo
EOF
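Once the resource is applied, you can watch the operator bring up the control plane; all pods in the namespace should eventually reach Running:
$ kubectl get pods -n istio-system -w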
Could you tell me why the normal installation failed?
As I mentioned in the comments, I don't know yet. If I find the reason, I will update this answer.

docker-registry deployment to K8S gets a "CrashLoopBackOff" issue

I am stuck with a docker-registry deployment to K8S. Here I show in detail what I did; I hope you can give me some ideas.
My K8S version:
ii kubeadm 1.14.1-00 amd64 Kubernetes Cluster Bootstrapping Tool
ii kubectl 1.14.1-00 amd64 Kubernetes Command Line Tool
ii kubelet 1.14.1-00 amd64 Kubernetes Node Agent
ii kubernetes-cni 0.7.5-00 amd64 Kubernetes CNI
What I did:
Create a self-signed certificate:
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.crt
Import the certificate into K8S as a TLS secret:
$ kubectl create secret tls registry-cert-secret --key cert.key --cert cert.crt
$ vim chart_values.yaml
ingress:
  enabled: true
  hosts:
    - registry.mgmt.home.local
  annotations:
    kubernetes.io/ingress.class: traefik
  tls:
    - secretName: registry-cert-secret
      hosts:
        - registry.mgmt.home.local
secrets:
  htpasswd: "admin:$2y$05$f95dCd6fRxQdDoPJ6mJIb.YMvR0qfhddSl3NSL1wCk1ZMl4JyFBDW"
  s3:
    accessKey: "admin"
    secretKey: "admin2019"
storage: s3
s3:
  region: us-east-1
  regionEndpoint: http://minio.home.local:9000
  secure: true
  bucket: registry
Then install with helm:
$ helm install stable/docker-registry -f chart_values.yaml --name docker-registry
NAME: docker-registry
LAST DEPLOYED: Thu Oct 31 16:29:31 2019
NAMESPACE: default
STATUS: DEPLOYED
Show the deployments:
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
docker-registry 0/1 1 0 35m
Get the pods:
$ kubectl get pods --namespace default
NAME READY STATUS RESTARTS AGE
docker-registry-6989668db6-78d84 0/1 CrashLoopBackOff 7 13m
docker-registry-6989668db6-jttrz 1/1 Terminating 0 37m
Describe the pod:
$ kubectl describe pod docker-registry-6989668db6-78d84 --namespace default
Name: docker-registry-6989668db6-78d84
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s-worker-promox/10.102.11.223
Start Time: Thu, 31 Oct 2019 18:03:13 +0800
Labels: app=docker-registry
pod-template-hash=6989668db6
release=docker-registry
Annotations: checksum/config: 89b20bb43a348d6b8dedacac583a596ccef4e570a935e7c5b464ba746eb88307
Status: Running
IP: 10.244.52.10
Controlled By: ReplicaSet/docker-registry-6989668db6
Containers:
docker-registry:
Container ID: docker://9a40c5e100711b122ddd78439c9fa21790f04f5a442b704140639f8fbfbd8929
Image: registry:2.7.1
Image ID: docker-pullable://registry@sha256:8004747f1e8cd820a148fb7499d71a76d45ff66bac6a29129bfdbfdc0154d146
Port: 5000/TCP
Host Port: 0/TCP
Command:
/bin/registry
serve
/etc/docker/registry/config.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 31 Oct 2019 18:14:21 +0800
Finished: Thu, 31 Oct 2019 18:15:19 +0800
Ready: False
Restart Count: 7
Liveness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
REGISTRY_AUTH: htpasswd
REGISTRY_AUTH_HTPASSWD_REALM: Registry Realm
REGISTRY_AUTH_HTPASSWD_PATH: /auth/htpasswd
REGISTRY_HTTP_SECRET: <set to the key 'haSharedSecret' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_ACCESSKEY: <set to the key 's3AccessKey' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_SECRETKEY: <set to the key 's3SecretKey' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_REGION: us-east-1
REGISTRY_STORAGE_S3_REGIONENDPOINT: http://10.102.11.218:9000
REGISTRY_STORAGE_S3_BUCKET: registry
REGISTRY_STORAGE_S3_SECURE: true
Mounts:
/auth from auth (ro)
/etc/docker/registry from docker-registry-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qfwkm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
auth:
Type: Secret (a volume populated by a Secret)
SecretName: docker-registry-secret
Optional: false
docker-registry-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: docker-registry-config
ingress:
Optional: false
default-token-qfwkm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qfwkm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/docker-registry-6989668db6-78d84 to k8s-worker-promox
Normal Pulled 12m (x3 over 14m) kubelet, k8s-worker-promox Container image "registry:2.7.1" already present on machine
Normal Created 12m (x3 over 14m) kubelet, k8s-worker-promox Created container docker-registry
Normal Started 12m (x3 over 14m) kubelet, k8s-worker-promox Started container docker-registry
Normal Killing 12m (x2 over 13m) kubelet, k8s-worker-promox Container docker-registry failed liveness probe, will be restarted
Warning Unhealthy 12m (x7 over 14m) kubelet, k8s-worker-promox Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 9m8s (x15 over 13m) kubelet, k8s-worker-promox Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 4m26s (x18 over 8m40s) kubelet, k8s-worker-promox Back-off restarting failed container
I see the issue is related to the liveness and readiness probes: they make the pod try to start/restart many times, until it gets "Back-off".
Following the troubleshooting, I see that it should be related to DNS. But DNS should not have any issues; I tried a lookup on the K8S host:
$ nslookup minio.home.local
Server: 10.102.11.201
Address: 10.102.11.201#53
Non-authoritative answer:
Name: minio.home.local
Address: 10.101.12.213
Updated November 1st. I went into another pod and ran nslookup; this pod could not find minio.home.local. Is that related to this issue? I also tried replacing minio.home.local with the IP in the *.yaml, but I get the same issue.
$ kubectl exec -it net-utils-5b5f89f777-2cwgq bash
root@net-utils-5b5f89f777-2cwgq:/# nslookup minio.home.local
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find minio.skylab.local: NXDOMAIN
root@net-utils-5b5f89f777-2cwgq:/# ping minio.home.local
ping: unknown host
I googled and looked through GitHub discussions, but I still could not fix it. Do you have any ideas?
Thank you so much.
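Since the name resolves on the host (to 10.101.12.213) but not inside pods, one possible workaround, offered here as a sketch rather than a confirmed fix, is to pin the hostname at the pod level with hostAliases in the registry deployment:
spec:
  template:
    spec:
      hostAliases:
        - ip: "10.101.12.213"
          hostnames:
            - "minio.home.local"
Depending on the cluster's DNS setup, making the domain resolvable cluster-wide via the CoreDNS configuration would be the more general fix.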

My pod is in ContainerCreating state, showing TLS handshake timeout

I can pull the image correctly with the docker pull command, but when I use the kubectl run command my pod gets stuck in the ContainerCreating state. How can I fix it?
[root@centos-master etc]# kubectl run my-nginx --image=nginx
deployment "my-nginx" created
[root@centos-master etc]# kubectl get pods
NAME READY STATUS RESTARTS AGE
my-nginx-2723453542-5s33f 0/1 ContainerCreating 0 7s
[root@centos-master etc]# kubectl describe pod my-nginx-2723453542-5s33f
Name: my-nginx-2723453542-5s33f
Namespace: default
Node: centos-minion-2/104.21.51.35
Start Time: Fri, 30 Aug 2019 16:11:57 +0800
Labels: pod-template-hash=2723453542
run=my-nginx
Status: Pending
IP:
Controllers: ReplicaSet/my-nginx-2723453542
Containers:
my-nginx:
Container ID:
Image: nginx
Image ID:
Port:
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
No volumes.
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
5m 5m 1 {default-scheduler } Normal Scheduled Successfully assigned my-nginx-2723453542-5s33f to centos-minion-2
<invalid> <invalid> 5 {kubelet centos-minion-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (Get https://registry.access.redhat.com/v1/_ping: proxyconnect tcp: net/http: TLS handshake timeout)"
<invalid> <invalid> 11 {kubelet centos-minion-2} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"registry.access.redhat.com/rhel7/pod-infrastructure:latest\""
As recommended by @char and @prometherion, in order to sort out this issue you probably need to supply the KUBELET_ARGS parameters with an appropriate --pod-infra-container-image flag, as per the link provided:
KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest"
You can also take into consideration the solution mentioned by @Matthew: installing the subscription-manager package and subscribing the host OS, as described here.
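On the legacy CentOS packaging shown in the question, that variable typically lives in the kubelet sysconfig file; the exact path below is an assumption and may differ on your system:
# /etc/kubernetes/kubelet (path assumed for the old CentOS kubernetes packages)
KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest"
After editing, restart the kubelet on the affected node so the new flag takes effect:
$ systemctl restart kubelet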