Istio Prometheus pod in CrashLoopBackOff State - kubernetes

I am trying to set up Istio (1.5.4) for the bookinfo example provided on their website, using the demo configuration profile. But verifying the Istio installation fails because the Prometheus pod has entered a CrashLoopBackOff state.
NAME READY STATUS RESTARTS AGE
grafana-5f6f8cbf75-psk78 1/1 Running 0 21m
istio-egressgateway-7f9f45c966-g7k9j 1/1 Running 0 21m
istio-ingressgateway-968d69c8b-bhxk5 1/1 Running 0 21m
istio-tracing-9dd6c4f7c-7fm79 1/1 Running 0 21m
istiod-86884c8c45-sw96x 1/1 Running 0 21m
kiali-869c6894c5-wqgjb 1/1 Running 0 21m
prometheus-589c44dbfc-xkwmj 1/2 CrashLoopBackOff 8 21m
The logs for the prometheus pod:
level=warn ts=2020-05-15T09:07:53.113Z caller=main.go:283 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.1, branch=HEAD, revision=8744510c6391d3ef46d8294a7e1f46e57407ab13)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:331 build_context="(go=go1.13.5, user=root@4b1e33c71b9d, date=20191225-01:04:15)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:332 host_details="(Linux 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 prometheus-589c44dbfc-xkwmj (none))"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-15T09:07:53.114Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:107 component=activeQueryTracker msg="Failed to create directory for logging active queries"
level=error ts=2020-05-15T09:07:53.157Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: no such file or directory"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x24dda5b, 0x5, 0x14, 0x2c62100, 0xc0005f63c0, 0x2c62100)
/app/promql/query_logger.go:115 +0x48c
main.main()
/app/cmd/prometheus/main.go:362 +0x5229
Describe pod output:
Name: prometheus-589c44dbfc-xkwmj
Namespace: istio-system
Priority: 0
Node: inspiron-7577/192.168.0.9
Start Time: Fri, 15 May 2020 14:21:14 +0530
Labels: app=prometheus
pod-template-hash=589c44dbfc
release=istio
Annotations: sidecar.istio.io/inject: false
Status: Running
IP: 172.17.0.11
IPs:
IP: 172.17.0.11
Controlled By: ReplicaSet/prometheus-589c44dbfc
Containers:
prometheus:
Container ID: docker://b6820a000ab67a5ce31d3a38f6f0d510bd150794b2792147fc17ef8f730c03bb
Image: docker.io/prom/prometheus:v2.15.1
Image ID: docker-pullable://prom/prometheus@sha256:169b743ceb4452266915272f9c3409d36972e41cb52f3f28644e6c0609fc54e6
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention=6h
--config.file=/etc/prometheus/prometheus.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 15 May 2020 14:37:50 +0530
Finished: Fri, 15 May 2020 14:37:53 +0530
Ready: False
Restart Count: 8
Requests:
cpu: 10m
Liveness: http-get http://:9090/-/healthy delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/istio-certs from istio-certs (rw)
/etc/prometheus from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
istio-proxy:
Container ID: docker://fa756c93510b6f402d7d88c31a5f5f066d4c254590eab70886e7835e7d3871be
Image: docker.io/istio/proxyv2:1.5.4
Image ID: docker-pullable://istio/proxyv2@sha256:e16e2801b7fd93154e8fcb5f4e2fb1240d73349d425b8be90691d48e8b9bb944
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
istio-proxy-prometheus
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system.svc:15012
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--connectTimeout
10s
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
--dnsRefreshRate
300s
--statusPort
15020
--trust-domain=cluster.local
--controlPlaneBootstrap=false
State: Running
Started: Fri, 15 May 2020 14:21:31 +0530
Ready: True
Restart Count: 0
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
OUTPUT_CERTS: /etc/istio-certs
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istio-pilot.istio-system.svc:15012
POD_NAME: prometheus-589c44dbfc-xkwmj (v1:metadata.name)
POD_NAMESPACE: istio-system (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
ISTIO_META_POD_NAME: prometheus-589c44dbfc-xkwmj (v1:metadata.name)
ISTIO_META_CONFIG_NAMESPACE: istio-system (v1:metadata.namespace)
ISTIO_META_MESH_ID: cluster.local
ISTIO_META_CLUSTER_ID: Kubernetes
Mounts:
/etc/istio-certs/ from istio-certs (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-cgqbc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus
Optional: false
istio-certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
prometheus-token-cgqbc:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-token-cgqbc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned istio-system/prometheus-589c44dbfc-xkwmj to inspiron-7577
Warning FailedMount 17m kubelet, inspiron-7577 MountVolume.SetUp failed for volume "prometheus-token-cgqbc" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 17m kubelet, inspiron-7577 MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulled 17m kubelet, inspiron-7577 Container image "docker.io/istio/proxyv2:1.5.4" already present on machine
Normal Created 17m kubelet, inspiron-7577 Created container istio-proxy
Normal Started 17m kubelet, inspiron-7577 Started container istio-proxy
Warning Unhealthy 17m kubelet, inspiron-7577 Readiness probe failed: HTTP probe failed with statuscode: 503
Normal Pulled 16m (x4 over 17m) kubelet, inspiron-7577 Container image "docker.io/prom/prometheus:v2.15.1" already present on machine
Normal Created 16m (x4 over 17m) kubelet, inspiron-7577 Created container prometheus
Normal Started 16m (x4 over 17m) kubelet, inspiron-7577 Started container prometheus
Warning BackOff 2m24s (x72 over 17m) kubelet, inspiron-7577 Back-off restarting failed container
It is unable to create the directory for logging active queries. Please help with any ideas.

As Istio 1.5.4 has only just been released, there are some issues with Prometheus on minikube when installed with istioctl manifest apply.
I checked it on GCP and everything works fine there.
As a workaround, you can use the Istio operator, which was tested by me and the OP; as he mentioned in the comments, it's working.
Thanks a lot @jt97! It did work.
Steps to install the Istio operator
Install the istioctl command.
Deploy the Istio operator: istioctl operator init.
Install Istio
To install the Istio demo configuration profile using the operator, run the following command:
kubectl create ns istio-system
kubectl apply -f - <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: example-istiocontrolplane
spec:
  profile: demo
EOF
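Once the operator has reconciled the control plane, re-checking the pods should show Prometheus running. A minimal check, assuming the components keep the same names as in the question:
kubectl get pods -n istio-system
# if prometheus still restarts, its container log is the first place to look
kubectl -n istio-system logs deploy/prometheus -c prometheus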
Could you tell me why the normal installation failed?
As I mentioned in the comments, I don't know yet. If I find the reason, I will update this question.

Related

How to resolve this error where nginx-ingress-controller fails to start in my k8s cluster?

Rancher v2.4.2
kubernetes version: v1.17.4
In my k8s cluster, nginx-ingress-controller doesn't work and keeps restarting. I don't get any useful information from the logs. Thanks for your help.
cluster nodes:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready controlplane,etcd,worker 18d v1.17.4
master2 Ready controlplane,etcd,worker 17d v1.17.4
node1 Ready worker 17d v1.17.4
node2 Ready worker 17d v1.17.4
cluster pods in ingress-nginx namespace
> kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
default-http-backend-5bb77998d7-k7gdh 1/1 Running 1 17d
nginx-ingress-controller-6l4jh 0/1 Running 10 27m
nginx-ingress-controller-bh2pg 1/1 Running 0 63m
nginx-ingress-controller-drtzx 1/1 Running 0 63m
nginx-ingress-controller-qndbw 1/1 Running 0 63m
the pod logs of nginx-ingress-controller-6l4jh
> kubectl logs nginx-ingress-controller-6l4jh -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: nginx-0.25.1-rancher1
Build:
Repository: https://github.com/rancher/ingress-nginx.git
nginx version: openresty/1.15.8.1
-------------------------------------------------------------------------------
>
describe info
> kubectl describe pod nginx-ingress-controller-6l4jh -n ingress-nginx
Name: nginx-ingress-controller-6l4jh
Namespace: ingress-nginx
Priority: 0
Node: node2/172.26.13.11
Start Time: Tue, 19 Apr 2022 07:12:16 +0000
Labels: app=ingress-nginx
controller-revision-hash=758cb9dbbc
pod-template-generation=8
Annotations: cattle.io/timestamp: 2022-04-19T07:08:51Z
field.cattle.io/ports:
[[{"containerPort":80,"dnsName":"nginx-ingress-controller-hostport","hostPort":80,"kind":"HostPort","name":"http","protocol":"TCP","source...
field.cattle.io/publicEndpoints:
[{"addresses":["172.26.13.130"],"nodeId":"c-wv692:m-d5802d05bbf0","port":80,"protocol":"TCP"},{"addresses":["172.26.13.130"],"nodeId":"c-w...
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Running
IP: 172.26.13.11
IPs:
IP: 172.26.13.11
Controlled By: DaemonSet/nginx-ingress-controller
Containers:
nginx-ingress-controller:
Container ID: docker://09a6248edb921b9c9cbab678c793fe1cc3d28322ea6abbb8f15c899351ce4b40
Image: 172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
Image ID: docker-pullable://172.26.13.133:5000/rancher/nginx-ingress-controller@sha256:fe50ceea3d1a0bc9a7ccef8d5845c9a30b51f608e411467862dff590185a47d2
Ports: 80/TCP, 443/TCP
Host Ports: 80/TCP, 443/TCP
Args:
/nginx-ingress-controller
--default-backend-service=$(POD_NAMESPACE)/default-http-backend
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Tue, 19 Apr 2022 07:40:12 +0000
Finished: Tue, 19 Apr 2022 07:41:32 +0000
Ready: False
Restart Count: 11
Liveness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-6l4jh (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-2kdbj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-2kdbj:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-2kdbj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: :NoExecute
:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-6l4jh to node2
Normal Pulled 27m (x3 over 30m) kubelet, node2 Container image "172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1" already present on machine
Normal Created 27m (x3 over 30m) kubelet, node2 Created container nginx-ingress-controller
Normal Started 27m (x3 over 30m) kubelet, node2 Started container nginx-ingress-controller
Normal Killing 27m (x2 over 28m) kubelet, node2 Container nginx-ingress-controller failed liveness probe, will be restarted
Warning Unhealthy 25m (x10 over 29m) kubelet, node2 Liveness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning Unhealthy 10m (x21 over 29m) kubelet, node2 Readiness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning BackOff 8s (x69 over 20m) kubelet, node2 Back-off restarting failed container
>
It sounds like the ingress controller pod fails the liveness/readiness checks, but apparently only on one node. You could try:
check that node for a firewall blocking the probe port (a quick manual check is sketched below)
update to a newer version than nginx-0.25.1
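For example, a rough way to reproduce the probe by hand on the failing node (the IP and port are taken from the describe output above; adjust if yours differ):
curl -v http://172.26.13.11:10254/healthz   # should return 200 once the controller is healthy
sudo iptables -S | grep 10254               # look for rules dropping or rejecting the probe port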

Kubernetes CoreDNS in CrashLoopBackOff on worker node

I've searched for CoreDNS in CrashLoopBackOff, but nothing has helped me through.
My Setup
k8s - v1.20.2
CoreDNS - 1.7.0
Installed by kubespray, following https://kubernetes.io/ko/docs/setup/production-environment/tools/kubespray
My Problem
CoreDNS pods on the master node are in a Running state,
but on the worker node the CoreDNS pods are in a CrashLoopBackOff state.
kubectl logs -f coredns-847f564ccf-msbvp -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
The CoreDNS container runs the command "/coredns -conf /etc/resolv.conf" for a while
and is then destroyed.
Here is Corefile
Corefile: |
  .:53 {
      errors
      health {
          lameduck 5s
      }
      ready
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf {
          prefer_udp
      }
      cache 30
      loop
      reload
      loadbalance
  }
And the events of one of the crashed pods:
kubectl get event --namespace kube-system --field-selector involvedObject.name=coredns-847f564ccf-lqnxs
LAST SEEN TYPE REASON OBJECT MESSAGE
4m55s Warning Unhealthy pod/coredns-847f564ccf-lqnxs Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
9m59s Warning BackOff pod/coredns-847f564ccf-lqnxs Back-off restarting failed container
And Here is CoreDns Description
Containers:
coredns:
Container ID: docker://a174cb3a3800181d1c7b78831bfd37bbf69caf60a82051d6fb29b4b9deeacce9
Image: k8s.gcr.io/coredns:1.7.0
Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:73ca82b4ce829766d4f1f10947c3a338888f876fbed0540dc849c89ff256e90c
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 21 Apr 2021 21:51:44 +0900
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 21 Apr 2021 21:44:42 +0900
Finished: Wed, 21 Apr 2021 21:46:32 +0900
Ready: False
Restart Count: 9943
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=0s timeout=5s period=10s #success=1 #failure=10
Readiness: http-get http://:8181/ready delay=0s timeout=5s period=10s #success=1 #failure=10
Environment: <none>
Mounts:
/etc/coredns from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-qqhn6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 18m (x9940 over 30d) kubelet Container image "k8s.gcr.io/coredns:1.7.0" already present on machine
Warning Unhealthy 8m37s (x99113 over 30d) kubelet Liveness probe failed: Get "http://10.216.50.2:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning BackOff 3m35s (x121901 over 30d) kubelet Back-off restarting failed container
At this point, any suggestion at all will be helpful
I found something weird.
Testing from node1, I can access the CoreDNS pod on node2, but I cannot access the CoreDNS pod on node1.
I use Calico for CNI (a follow-up check is sketched after the results below).
in node1, coredns1 - 1.1.1.1
in node2, coredns2 - 2.2.2.2
in node1.
access 1.1.1.1:8080/health -> timeout
access 2.2.2.2:8080/health -> ok
in node2.
access 1.1.1.1:8080/health -> ok
access 2.2.2.2:8080/health -> timeout
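One way to dig further into that pattern (each node timing out only against its own local pod) is to compare Calico's state on both nodes; a rough sketch, assuming calicoctl is installed on the nodes:
kubectl get pods -n kube-system -o wide | grep calico-node   # both calico-node pods should be Running
sudo calicoctl node status                                   # run on node1 and node2; BGP peers should show "Established"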
If containerd and kubelet are running behind a proxy, please add the private IP range 10.0.0.0/8 to the NO_PROXY configuration to make sure they can pull the images.
E.g.:
[root@dev-systrdemo301z phananhtuan01]# cat /etc/systemd/system/containerd.service.d/proxy.conf
[Service]
Environment="HTTP_PROXY=dev-proxy.prod.xx.local:8300"
Environment="HTTPS_PROXY=dev-proxy.prod.xx.local:8300"
Environment="NO_PROXY=localhost,127.0.0.0/8,100.67.253.157/24,10.0.0.0/8"
Please refer to this article.
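For the drop-in to take effect, systemd has to reload the unit and the services need a restart; roughly:
sudo systemctl daemon-reload
sudo systemctl restart containerd kubelet
systemctl show --property=Environment containerd   # confirm NO_PROXY now contains 10.0.0.0/8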

docker-registry deployment to K8S gets a "CrashLoopBackOff" issue

I am stuck with a docker-registry deployment to K8S. Here I show in detail what I did. Hope you can give me some ideas.
My K8S version:
ii kubeadm 1.14.1-00 amd64 Kubernetes Cluster Bootstrapping Tool
ii kubectl 1.14.1-00 amd64 Kubernetes Command Line Tool
ii kubelet 1.14.1-00 amd64 Kubernetes Node Agent
ii kubernetes-cni 0.7.5-00 amd64 Kubernetes CNI
What I did:
Create a self-signed certificate
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.crt
Import the self-signed certificate into K8S
$ kubectl create secret tls registry-cert-secret --key cert.key --cert cert.crt
$ vim chart_values.yaml
ingress:
  enabled: true
  hosts:
    - registry.mgmt.home.local
  annotations:
    kubernetes.io/ingress.class: traefik
  tls:
    - secretName: registry-cert-secret
      hosts:
        - registry.mgmt.home.local
secrets:
  htpasswd: "admin:$2y$05$f95dCd6fRxQdDoPJ6mJIb.YMvR0qfhddSl3NSL1wCk1ZMl4JyFBDW"
  s3:
    accessKey: "admin"
    secretKey: "admin2019"
storage: s3
s3:
  region: us-east-1
  regionEndpoint: http://minio.home.local:9000
  secure: true
  bucket: registry
then install with helm
$ helm install stable/docker-registry -f chart_values.yaml --name docker-registry
NAME: docker-registry
LAST DEPLOYED: Thu Oct 31 16:29:31 2019
NAMESPACE: default
STATUS: DEPLOYED
show the kubectl deployments
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
docker-registry 0/1 1 0 35m
get pods
$ kubectl get pods --namespace default
NAME READY STATUS RESTARTS AGE
docker-registry-6989668db6-78d84 0/1 CrashLoopBackOff 7 13m
docker-registry-6989668db6-jttrz 1/1 Terminating 0 37m
describe pod
$ kubectl describe pod docker-registry-6989668db6-78d84 --namespace default
Name: docker-registry-6989668db6-78d84
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s-worker-promox/10.102.11.223
Start Time: Thu, 31 Oct 2019 18:03:13 +0800
Labels: app=docker-registry
pod-template-hash=6989668db6
release=docker-registry
Annotations: checksum/config: 89b20bb43a348d6b8dedacac583a596ccef4e570a935e7c5b464ba746eb88307
Status: Running
IP: 10.244.52.10
Controlled By: ReplicaSet/docker-registry-6989668db6
Containers:
docker-registry:
Container ID: docker://9a40c5e100711b122ddd78439c9fa21790f04f5a442b704140639f8fbfbd8929
Image: registry:2.7.1
Image ID: docker-pullable://registry@sha256:8004747f1e8cd820a148fb7499d71a76d45ff66bac6a29129bfdbfdc0154d146
Port: 5000/TCP
Host Port: 0/TCP
Command:
/bin/registry
serve
/etc/docker/registry/config.yml
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 31 Oct 2019 18:14:21 +0800
Finished: Thu, 31 Oct 2019 18:15:19 +0800
Ready: False
Restart Count: 7
Liveness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
REGISTRY_AUTH: htpasswd
REGISTRY_AUTH_HTPASSWD_REALM: Registry Realm
REGISTRY_AUTH_HTPASSWD_PATH: /auth/htpasswd
REGISTRY_HTTP_SECRET: <set to the key 'haSharedSecret' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_ACCESSKEY: <set to the key 's3AccessKey' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_SECRETKEY: <set to the key 's3SecretKey' in secret 'docker-registry-secret'> Optional: false
REGISTRY_STORAGE_S3_REGION: us-east-1
REGISTRY_STORAGE_S3_REGIONENDPOINT: http://10.102.11.218:9000
REGISTRY_STORAGE_S3_BUCKET: registry
REGISTRY_STORAGE_S3_SECURE: true
Mounts:
/auth from auth (ro)
/etc/docker/registry from docker-registry-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qfwkm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
auth:
Type: Secret (a volume populated by a Secret)
SecretName: docker-registry-secret
Optional: false
docker-registry-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: docker-registry-config
ingress:
Optional: false
default-token-qfwkm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qfwkm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/docker-registry-6989668db6-78d84 to k8s-worker-promox
Normal Pulled 12m (x3 over 14m) kubelet, k8s-worker-promox Container image "registry:2.7.1" already present on machine
Normal Created 12m (x3 over 14m) kubelet, k8s-worker-promox Created container docker-registry
Normal Started 12m (x3 over 14m) kubelet, k8s-worker-promox Started container docker-registry
Normal Killing 12m (x2 over 13m) kubelet, k8s-worker-promox Container docker-registry failed liveness probe, will be restarted
Warning Unhealthy 12m (x7 over 14m) kubelet, k8s-worker-promox Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 9m8s (x15 over 13m) kubelet, k8s-worker-promox Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 4m26s (x18 over 8m40s) kubelet, k8s-worker-promox Back-off restarting failed container
I see the issue is related to the liveness and readiness probes. They make the pod restart many times, and then it gets "Back-off".
Following the troubleshooting, I suspect it is related to DNS. But DNS should not have any issues. I tried a lookup on the K8S host.
$ nslookup minio.home.local
Server: 10.102.11.201
Address: 10.102.11.201#53
Non-authoritative answer:
Name: minio.home.local
Address: 10.101.12.213
Updated November 1st: I went into another pod and ran nslookup; this pod could not find minio.home.local. Is that related to this issue? I also tried to replace minio.home.local with its IP in the *.yaml, but I get the same issue.
$ kubectl exec -it net-utils-5b5f89f777-2cwgq bash
root@net-utils-5b5f89f777-2cwgq:/#
root@net-utils-5b5f89f777-2cwgq:/#
root@net-utils-5b5f89f777-2cwgq:/#
root@net-utils-5b5f89f777-2cwgq:/# nslookup minio.home.local
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find minio.skylab.local: NXDOMAIN
root@net-utils-5b5f89f777-2cwgq:/# ping minio.home.local
ping: unknown host
I googled and looked through GitHub discussions, but I still could not fix it. Do you have any ideas?
Thank you so much.
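(A quick check that can separate a cluster-DNS problem from a registry/S3 configuration problem is to test resolution from a throwaway pod; a sketch using the busybox image commonly used for DNS debugging, with names taken from the question:)
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
kubectl run dns-test-ext --rm -it --restart=Never --image=busybox:1.28 -- nslookup minio.home.local
# if the first succeeds but the second fails, cluster DNS works for in-cluster names but cannot resolve the external home.local zone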

Improper cni install preventing coredns pods from starting

Just installed a single master cluster using kubeadm v1.15.0. However, coredns seems stuck in pending mode:
coredns-5c98db65d4-4pm65 0/1 Pending 0 2m17s <none> <none> <none> <none>
coredns-5c98db65d4-55hcc 0/1 Pending 0 2m2s <none> <none> <none> <none>
the following is what shows up for the pod:
kubectl describe pods coredns-5c98db65d4-4pm65 --namespace=kube-system
Name: coredns-5c98db65d4-4pm65
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=kube-dns
pod-template-hash=5c98db65d4
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/coredns-5c98db65d4
Containers:
coredns:
Image: k8s.gcr.io/coredns:1.3.1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-5t2wn (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-5t2wn:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-5t2wn
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 61s (x4 over 5m21s) default-scheduler 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
I removed the taint on the master node, to no avail. Shouldn't I be able to create a single-node master without problems like this? I know scheduling pods on the master is not possible without removing the taint, but this is odd.
I tried adding the latest Calico CNI, to no avail, too.
I get the following running journalctl (systemctl shows no errors):
sudo journalctl -xn --unit kubelet.service
[sudo] password for gms:
-- Logs begin at Fri 2019-07-12 04:31:34 CDT, end at Tue 2019-07-16 16:58:17 CDT. --
Jul 16 16:57:54 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:57:54.122355 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:57:54 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:57:54.400606 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:57:59 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:57:59.124863 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:57:59 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:57:59.400924 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:04 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:04.127120 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:04 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:04.401266 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:09 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:09.129287 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:09 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:09.401520 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:14 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:14.133059 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:14 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:14.402008 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Indeed, when I look in /etc/cni/net.d there is nothing there -> yes, I ran kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml... this is the output when I apply this:
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
I ran the following on the pod for calico-node, which is stuck in the following state:
calico-node-tcfhw 0/1 Init:0/3 0 11m 10.32.3.158
describe pods calico-node-tcfhw --namespace=kube-system
Name: calico-node-tcfhw
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: thalia0.ahc.umn.edu/10.32.3.158
Start Time: Tue, 16 Jul 2019 18:08:25 -0500
Labels: controller-revision-hash=844ddd97c6
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP: 10.32.3.158
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://1e1bf9e65cb182656f6f06a1bb8291237562f0f5a375e557a454942e81d32063
Image: calico/cni:v3.8.0
Image ID: docker-pullable://docker.io/calico/cni@sha256:decba0501ab0658e6e7da2f5625f1eabb8aba5690f9206caba3bf98caca5094c
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Running
Started: Tue, 16 Jul 2019 18:08:26 -0500
Ready: False
Restart Count: 0
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
install-cni:
Container ID:
Image: calico/cni:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
flexvol-driver:
Container ID:
Image: calico/pod2daemon-flexvol:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
Containers:
calico-node:
Container ID:
Image: calico/node:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 250m
Liveness: http-get http://localhost:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -bird-ready -felix-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-b9c6p:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-b9c6p
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m15s default-scheduler Successfully assigned kube-system/calico-node-tcfhw to thalia0.ahc.umn.edu
Normal Pulled 9m14s kubelet, thalia0.ahc.umn.edu Container image "calico/cni:v3.8.0" already present on machine
Normal Created 9m14s kubelet, thalia0.ahc.umn.edu Created container upgrade-ipam
Normal Started 9m14s kubelet, thalia0.ahc.umn.edu Started container upgrade-ipam
I tried Flannel as a cni, but that was even worse. The kube-proxy wouldn't even start due to a taint!
EDIT ADDENDUM
Should the kube-controller-manager and kube-scheduler not have defined endpoints?
[gms@thalia0 ~]$ kubectl get ep --namespace=kube-system -o wide
NAME ENDPOINTS AGE
kube-controller-manager <none> 19h
kube-dns <none> 19h
kube-scheduler <none> 19h
[gms@thalia0 ~]$ kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-nmn4g 0/1 Pending 0 19h
coredns-5c98db65d4-qv8fm 0/1 Pending 0 19h
etcd-thalia0.x.x.edu. 1/1 Running 0 19h
kube-apiserver-thalia0.x.x.edu 1/1 Running 0 19h
kube-controller-manager-thalia0.x.x.edu 1/1 Running 0 19h
kube-proxy-4hrdc 1/1 Running 0 19h
kube-proxy-vb594 1/1 Running 0 19h
kube-proxy-zwrst 1/1 Running 0 19h
kube-scheduler-thalia0.x.x.edu 1/1 Running 0 19h
Lastly, for sanity's sake, I tried v1.13.1, and voila! Success:
NAME READY STATUS RESTARTS AGE
calico-node-pbrps 2/2 Running 0 15s
coredns-86c58d9df4-g5944 1/1 Running 0 2m40s
coredns-86c58d9df4-zntjl 1/1 Running 0 2m40s
etcd-thalia0.ahc.umn.edu 1/1 Running 0 110s
kube-apiserver-thalia0.ahc.umn.edu 1/1 Running 0 105s
kube-controller-manager-thalia0.ahc.umn.edu 1/1 Running 0 103s
kube-proxy-qxh2h 1/1 Running 0 2m39s
kube-scheduler-thalia0.ahc.umn.edu 1/1 Running 0 117s
EDIT 2
I tried sudo kubeadm upgrade plan and got an error about the api-server's health and bad certs.
Ran this on the api-server:
kubectl logs kube-apiserver-thalia0.x.x.edu --namespace=kube-system
and got a ton of errors of the sort TLS handshake error from 10.x.x.157:52384: remote error: tls: bad certificate, coming from nodes that were deleted from the cluster long ago, even after several kubeadm resets on the master and an uninstall/reinstall of kubelet, kubeadm, etc.
Why are these old nodes showing up? Don't the certs get recreated on kubeadm init?
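(Note: kubeadm init reuses any certificates already present under /etc/kubernetes/pki instead of regenerating them, so it can be worth inspecting what the API server is actually serving; a sketch, with paths being the kubeadm defaults:)
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 "Subject Alternative Name"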
This issue https://github.com/projectcalico/calico/issues/2699 had similar symptoms and indicates that deleting /var/lib/cni/ fixed the issue. You could see if it exists and delete it if so.
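A minimal sketch of that check, run on the affected node (this throws away local CNI/IPAM state, so only do it if that is acceptable):
ls /var/lib/cni/
sudo rm -rf /var/lib/cni/
sudo systemctl restart kubelet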
CoreDNS doesn't start until Calico is started; check whether your worker nodes are ready with these commands:
kubectl get nodes -owide
kubectl describe node <your-node>
or
kubectl get node <your-node> -oyaml
Another thing to check is the following message in the log:
"Unable to update cni config: No networks found in /etc/cni/net.d"
What do you have in that directory?
Maybe the CNI isn't configured properly.
The directory /etc/cni/net.d should contain 2 files:
10-calico.conflist and calico-kubeconfig
Below is the content of these two files; check whether you have files like this in your directory.
[root@master net.d]# cat 10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "master",
      "mtu": 1440,
      "ipam": {
          "type": "host-local",
          "subnet": "usePodCidr"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    }
  ]
}
[root@master net.d]# cat calico-kubeconfig
# Kubeconfig file for Calico CNI plugin.
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    server: https://[10.20.0.1]:443
    certificate-authority-data: LSRt.... tLQJ=
users:
- name: calico
  user:
    token: "eUJh .... ZBoIA"
contexts:
- name: calico-context
  context:
    cluster: local
    user: calico
current-context: calico-context
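If those two files are missing, the Calico init containers that write them never finished. In the describe output in the question, upgrade-ipam is still Running and install-cni has not started, so their logs are the first place to look; a sketch, reusing the pod name from the question:
kubectl -n kube-system logs calico-node-tcfhw -c upgrade-ipam   # this init container appears stuck in Running
kubectl -n kube-system logs calico-node-tcfhw -c install-cni    # will only have output once upgrade-ipam finishes
ls -l /etc/cni/net.d/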

Istio BookInfo sample pods not starting on Minishift 3.11.0 - Init:CrashLoopBackOff - message: 'containers with incomplete status: [istio-init]'

I have a fresh installation of minishift (v1.32.0+009893b) running on MacOS Mojave.
I start minishift with 4 CPUs and 8GB RAM: minishift start --cpus 4 --memory 8GB
I have followed the instructions to prepare the Openshift (minishift) environment described here: https://istio.io/docs/setup/kubernetes/prepare/platform-setup/openshift/
I've installed Istio following the documentation without any error: https://istio.io/docs/setup/kubernetes/install/kubernetes/
istio-system namespace pods
$> kubectl get pod -n istio-system
grafana-7b46bf6b7c-27pn8 1/1 Running 1 26m
istio-citadel-5878d994cc-5tsx2 1/1 Running 1 26m
istio-cleanup-secrets-1.1.1-vwzq5 0/1 Completed 0 26m
istio-egressgateway-976f94bd-pst7g 1/1 Running 1 26m
istio-galley-7855cc97dc-s7wvt 1/1 Running 0 1m
istio-grafana-post-install-1.1.1-nvdvl 0/1 Completed 0 26m
istio-ingressgateway-794cfcf8bc-zkfnc 1/1 Running 1 26m
istio-pilot-746995884c-6l8jm 2/2 Running 2 26m
istio-policy-74c95b5657-g2cvq 2/2 Running 10 26m
istio-security-post-install-1.1.1-f4524 0/1 Completed 0 26m
istio-sidecar-injector-59fc9d6f7d-z48rc 1/1 Running 1 26m
istio-telemetry-6c5d7b55bf-cmnvp 2/2 Running 10 26m
istio-tracing-75dd89b8b4-pp9c5 1/1 Running 2 26m
kiali-5d68f4c676-5lsj9 1/1 Running 1 26m
prometheus-89bc5668c-rbrd7 1/1 Running 1 26m
I deploy the BookInfo sample in my istio-test namespace: istioctl kube-inject -f bookinfo.yaml | kubectl -n istio-test apply -f - but the pods don't start.
oc command info
$> oc get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
details 172.30.204.102 <none> 9080/TCP 21m
productpage 172.30.72.33 <none> 9080/TCP 21m
ratings 172.30.10.155 <none> 9080/TCP 21m
reviews 172.30.169.6 <none> 9080/TCP 21m
$> kubectl get pods
NAME READY STATUS RESTARTS AGE
details-v1-5c879644c7-vtb6g 0/2 Init:CrashLoopBackOff 12 21m
productpage-v1-59dff9bdf9-l2r2d 0/2 Init:CrashLoopBackOff 12 21m
ratings-v1-89485cb9c-vk58r 0/2 Init:CrashLoopBackOff 12 21m
reviews-v1-5db4f45f5d-ddqrm 0/2 Init:CrashLoopBackOff 12 21m
reviews-v2-575959b5b7-8gppt 0/2 Init:CrashLoopBackOff 12 21m
reviews-v3-79b65d46b4-zs865 0/2 Init:CrashLoopBackOff 12 21m
For some reason the init containers (istio-init) are crashing:
oc describe pod details-v1-5c879644c7-vtb6g
Name: details-v1-5c879644c7-vtb6g
Namespace: istio-test
Node: localhost/192.168.64.13
Start Time: Sat, 30 Mar 2019 14:38:49 +0100
Labels: app=details
pod-template-hash=1743520073
version=v1
Annotations: openshift.io/scc=privileged
sidecar.istio.io/status={"version":"b83fa303cbac0223b03f9fc5fbded767303ad2f7992390bfda6b9be66d960332","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs...
Status: Pending
IP: 172.17.0.24
Controlled By: ReplicaSet/details-v1-5c879644c7
Init Containers:
istio-init:
Container ID: docker://0d8b62ad72727f39d8a4c9278592c505ccbcd52ed8038c606b6256056a3a8d12
Image: docker.io/istio/proxy_init:1.1.1
Image ID: docker-pullable://docker.io/istio/proxy_init@sha256:5008218de88915f0b45930d69c5cdd7cd4ec94244e9ff3cfe3cec2eba6d99440
Port: <none>
Args:
-p
15001
-u
1337
-m
REDIRECT
-i
*
-x
-b
9080
-d
15020
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 30 Mar 2019 14:58:18 +0100
Finished: Sat, 30 Mar 2019 14:58:19 +0100
Ready: False
Restart Count: 12
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 10m
memory: 10Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-58j6f (ro)
Containers:
details:
Container ID:
Image: istio/examples-bookinfo-details-v1:1.10.1
Image ID:
Port: 9080/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-58j6f (ro)
istio-proxy:
Container ID:
Image: docker.io/istio/proxyv2:1.1.1
Image ID:
Port: 15090/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
details.$(POD_NAMESPACE)
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:15010
--zipkinAddress
zipkin.istio-system:9411
--connectTimeout
10s
--proxyAdminPort
15000
--concurrency
2
--controlPlaneAuthPolicy
NONE
--statusPort
15020
--applicationPorts
9080
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 2
memory: 128Mi
Requests:
cpu: 10m
memory: 40Mi
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
POD_NAME: details-v1-5c879644c7-vtb6g (v1:metadata.name)
POD_NAMESPACE: istio-test (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
ISTIO_META_POD_NAME: details-v1-5c879644c7-vtb6g (v1:metadata.name)
ISTIO_META_CONFIG_NAMESPACE: istio-test (v1:metadata.namespace)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_METAJSON_LABELS: {"app":"details","version":"v1"}
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-58j6f (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
default-token-58j6f:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-58j6f
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
23m 23m 1 default-scheduler Normal Scheduled Successfully assigned istio-test/details-v1-5c879644c7-vtb6g to localhost
23m 23m 1 kubelet, localhost spec.initContainers{istio-init} Normal Pulling pulling image "docker.io/istio/proxy_init:1.1.1"
22m 22m 1 kubelet, localhost spec.initContainers{istio-init} Normal Pulled Successfully pulled image "docker.io/istio/proxy_init:1.1.1"
22m 21m 5 kubelet, localhost spec.initContainers{istio-init} Normal Created Created container
22m 21m 5 kubelet, localhost spec.initContainers{istio-init} Normal Started Started container
22m 21m 4 kubelet, localhost spec.initContainers{istio-init} Normal Pulled Container image "docker.io/istio/proxy_init:1.1.1" already present on machine
22m 17m 24 kubelet, localhost spec.initContainers{istio-init} Warning BackOff Back-off restarting failed container
9m 9m 1 kubelet, localhost Normal SandboxChanged Pod sandbox changed, it will be killed and re-created.
9m 8m 4 kubelet, localhost spec.initContainers{istio-init} Normal Pulled Container image "docker.io/istio/proxy_init:1.1.1" already present on machine
9m 8m 4 kubelet, localhost spec.initContainers{istio-init} Normal Created Created container
9m 8m 4 kubelet, localhost spec.initContainers{istio-init} Normal Started Started container
9m 3m 31 kubelet, localhost spec.initContainers{istio-init} Warning BackOff Back-off restarting failed container
I can't see any info that gives any hint apart from Exit Code: 1 and
status:
conditions:
- lastProbeTime: null
lastTransitionTime: '2019-03-30T13:38:50Z'
message: 'containers with incomplete status: [istio-init]'
reason: ContainersNotInitialized
status: 'False'
type: Initialized
UPDATE:
This is the istio-init Init container log:
kubectl -n istio-test logs -f details-v1-5c879644c7-m9k6q istio-init
Environment:
------------
ENVOY_PORT=
ISTIO_INBOUND_INTERCEPTION_MODE=
ISTIO_INBOUND_TPROXY_MARK=
ISTIO_INBOUND_TPROXY_ROUTE_TABLE=
ISTIO_INBOUND_PORTS=
ISTIO_LOCAL_EXCLUDE_PORTS=
ISTIO_SERVICE_CIDR=
ISTIO_SERVICE_EXCLUDE_CIDR=
Variables:
----------
PROXY_PORT=15001
INBOUND_CAPTURE_PORT=15001
PROXY_UID=1337
INBOUND_INTERCEPTION_MODE=REDIRECT
INBOUND_TPROXY_MARK=1337
INBOUND_TPROXY_ROUTE_TABLE=133
INBOUND_PORTS_INCLUDE=9080
INBOUND_PORTS_EXCLUDE=15020
OUTBOUND_IP_RANGES_INCLUDE=*
OUTBOUND_IP_RANGES_EXCLUDE=
KUBEVIRT_INTERFACES=
ENABLE_INBOUND_IPV6=
# Generated by iptables-save v1.6.0 on Sat Mar 30 22:21:52 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:ISTIO_REDIRECT - [0:0]
COMMIT
# Completed on Sat Mar 30 22:21:52 2019
# Generated by iptables-save v1.6.0 on Sat Mar 30 22:21:52 2019
*filter
:INPUT ACCEPT [3:180]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [3:120]
COMMIT
# Completed on Sat Mar 30 22:21:52 2019
+ iptables -t nat -N ISTIO_REDIRECT
+ iptables -t nat -A ISTIO_REDIRECT -p tcp -j REDIRECT --to-port 15001
iptables: No chain/target/match by that name.
+ dump
+ iptables-save
+ ip6tables-save
I solved the problem by adding privileged: true to the istio-init container's securityContext configuration:
name: istio-init
resources:
  limits:
    cpu: 100m
    memory: 50Mi
  requests:
    cpu: 10m
    memory: 10Mi
securityContext:
  capabilities:
    add:
      - NET_ADMIN
  privileged: true
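After adding that securityContext to each istio-init container in the injected manifest, re-applying it the same way as in the question and watching the pods should show the init containers completing; roughly:
istioctl kube-inject -f bookinfo.yaml > bookinfo-injected.yaml
# edit bookinfo-injected.yaml: add privileged: true under each istio-init securityContext
kubectl -n istio-test apply -f bookinfo-injected.yaml
kubectl -n istio-test get pods -w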