I made a simple deployment of an nginx pod and afterwards edited the deployment to add a readinessProbe and a livenessProbe via TCP like in the official docs.
Once I save it, the deployment creates a new ReplicaSet and starts the new pod, but the probes never succeed.
Here is the deployment YAML output:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: "2020-09-21T18:51:13Z"
generation: 2
labels:
app: dep1
name: dep1
namespace: default
resourceVersion: "1683893"
selfLink: /apis/apps/v1/namespaces/default/deployments/dep1
uid: b23bceff-aca5-4c89-84c0-5882cf2df217
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: dep1
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: dep1
spec:
containers:
- image: nginx
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 15
periodSeconds: 20
successThreshold: 1
tcpSocket:
port: 8080
timeoutSeconds: 1
name: nginx
ports:
- containerPort: 8080
protocol: TCP
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 8080
timeoutSeconds: 1
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2020-09-21T18:51:16Z"
lastUpdateTime: "2020-09-21T18:51:16Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2020-09-21T18:51:13Z"
lastUpdateTime: "2020-09-21T19:16:07Z"
message: ReplicaSet "dep1-5d66c67794" is progressing.
reason: ReplicaSetUpdated
status: "True"
type: Progressing
observedGeneration: 2
readyReplicas: 1
replicas: 2
unavailableReplicas: 1
updatedReplicas: 1
And here are the events of the pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/dep1-5d66c67794-qd48q to docker-desktop
Normal Pulling 13m (x2 over 14m) kubelet, docker-desktop Pulling image "nginx"
Normal Killing 13m kubelet, docker-desktop Container nginx failed liveness probe, will be restarted
Normal Pulled 13m (x2 over 14m) kubelet, docker-desktop Successfully pulled image "nginx"
Normal Created 13m (x2 over 14m) kubelet, docker-desktop Created container nginx
Normal Started 13m (x2 over 14m) kubelet, docker-desktop Started container nginx
Warning Unhealthy 12m (x5 over 14m) kubelet, docker-desktop Liveness probe failed: dial tcp 10.1.0.174:8080: connect: connection refused
Warning Unhealthy 9m48s (x30 over 14m) kubelet, docker-desktop Readiness probe failed: dial tcp 10.1.0.174:8080: connect: connection refused
Warning BackOff 4m42s (x11 over 8m36s) kubelet, docker-desktop Back-off restarting failed container
Why is the connection refused when I opened the ports with the following?
ports:
- containerPort: 8080
protocol: TCP
By default, the nginx web server listens on port 80, so not only are your health checks failing, your application will never be reachable on port 8080. The Docker image used in the official docs example is k8s.gcr.io/goproxy:0.1, whereas you are using nginx. Either use the following config (probing port 80) or change your deployment image to k8s.gcr.io/goproxy:0.1:
spec:
containers:
- image: nginx
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 15
periodSeconds: 20
successThreshold: 1
tcpSocket:
port: 80
timeoutSeconds: 1
name: nginx
ports:
- containerPort: 80
protocol: TCP
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 80
timeoutSeconds: 1
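For a quick sanity check, here is a sketch of how you could confirm which port nginx is actually listening on inside the running pod. It assumes a kubectl recent enough to accept the deploy/<name> form and the stock Debian-based nginx image, whose default vhost config lives in /etc/nginx/conf.d/default.conf:
# Should print "listen 80;" for the stock nginx image
kubectl exec deploy/dep1 -c nginx -- grep listen /etc/nginx/conf.d/default.conf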
I'm deploying a simple application image with readiness, startup and liveness probes through the Docker Desktop app. I searched for similar issues, but none of them matched the one I'm facing, so I created this post.
Image : rahulwagh17/kubernetes:jhooq-k8s-springboot
Below is the deployment manifest used.
apiVersion: apps/v1
kind: Deployment
metadata:
name: jhooq-springboot
spec:
replicas: 2
selector:
matchLabels:
app: jhooq-springboot
template:
metadata:
labels:
app: jhooq-springboot
spec:
containers:
- name: springboot
image: rahulwagh17/kubernetes:jhooq-k8s-springboot
resources:
requests:
memory: "128Mi"
cpu: "512m"
limits:
memory: "128Mi"
cpu: "512m"
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /hello
port: 8080
initialDelaySeconds: 60
periodSeconds: 10
livenessProbe:
httpGet:
path: /hello
port: 8080
initialDelaySeconds: 60
periodSeconds: 10
startupProbe:
httpGet:
path: /hello
port: 8080
failureThreshold: 60
periodSeconds: 10
env:
- name: PORT
value: "8080"
---
apiVersion: v1
kind: Service
metadata:
name: jhooq-springboot
spec:
type: NodePort
ports:
- port: 80
targetPort: 8080
selector:
app: jhooq-springboot
After deploying, the pod status is CrashLoopBackOff due to Startup probe failed: Get "http://10.1.0.36:8080/hello": dial tcp 10.1.0.36:8080: connect: connection refused
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned default/jhooq-springboot-6dbc755d48-4pqcz to docker-desktop
Warning Unhealthy 7m22s (x7 over 8m42s) kubelet, docker-desktop Startup probe failed: Get "http://10.1.0.36:8080/hello": dial tcp 10.1.0.36:8080: connect: connection refused
Normal Pulled 6m56s (x4 over 8m51s) kubelet, docker-desktop Container image "rahulwagh17/kubernetes:jhooq-k8s-springboot" already present on machine
Normal Created 6m56s (x4 over 8m51s) kubelet, docker-desktop Created container springboot
Normal Started 6m56s (x4 over 8m51s) kubelet, docker-desktop Started container springboot
Warning BackOff 3m40s (x19 over 8m6s) kubelet, docker-desktop Back-off restarting failed container
I have a multi-container pod running on AWS EKS. One web app container running on port 80 and a Redis container running on port 6379.
Once the deployment goes through, manual curl probes against the pod's IP address and port from within the cluster always return good responses.
The ingress to the service is fine as well.
However, the kubelet's probes are failing, leading to restarts, and I'm not sure how to reproduce that probe failure or fix it yet.
Thanks for reading!
Here are the events:
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Readiness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/app-7cddfb865b-gsvbg Liveness probe failed: Get http://10.10.14.199:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Normal Killing pod/app-7cddfb865b-gsvbg Container app failed liveness probe, will be restarted
0s Normal Pulling pod/app-7cddfb865b-gsvbg Pulling image "registry/app:latest"
0s Normal Pulled pod/app-7cddfb865b-gsvbg Successfully pulled image "registry/app:latest"
0s Normal Created pod/app-7cddfb865b-gsvbg Created container app
Making things generic, this is my deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "16"
creationTimestamp: "2021-05-26T22:01:19Z"
generation: 19
labels:
app: app
chart: app-1.0.0
environment: production
heritage: Helm
owner: acme
release: app
name: app
namespace: default
resourceVersion: "234691173"
selfLink: /apis/apps/v1/namespaces/default/deployments/app
uid: 3149acc2-031e-4719-89e6-abafb0bcdc3c
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: app
release: app
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 100%
type: RollingUpdate
template:
metadata:
annotations:
kubectl.kubernetes.io/restartedAt: "2021-09-17T09:04:49-07:00"
creationTimestamp: null
labels:
app: app
environment: production
owner: acme
release: app
spec:
containers:
- image: redis:5.0.6-alpine
imagePullPolicy: IfNotPresent
name: redis
ports:
- containerPort: 6379
hostPort: 6379
name: redis
protocol: TCP
resources:
limits:
cpu: 500m
memory: 500Mi
requests:
cpu: 500m
memory: 500Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
- env:
- name: SYSTEM_ENVIRONMENT
value: production
envFrom:
- configMapRef:
name: app-production
- secretRef:
name: app-production
image: registry/app:latest
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
initialDelaySeconds: 90
periodSeconds: 20
successThreshold: 1
timeoutSeconds: 1
name: app
ports:
- containerPort: 80
hostPort: 80
name: app
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
initialDelaySeconds: 90
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 500Mi
requests:
cpu: "1"
memory: 500Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
priorityClassName: critical-app
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2021-08-10T17:34:18Z"
lastUpdateTime: "2021-08-10T17:34:18Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2021-05-26T22:01:19Z"
lastUpdateTime: "2021-09-17T16:48:54Z"
message: ReplicaSet "app-7f7cb8fd4" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 19
readyReplicas: 1
replicas: 1
updatedReplicas: 1
This is my service yaml:
apiVersion: v1
kind: Service
metadata:
creationTimestamp: "2021-05-05T20:11:33Z"
labels:
app: app
chart: app-1.0.0
environment: production
heritage: Helm
owner: acme
release: app
name: app
namespace: default
resourceVersion: "163989104"
selfLink: /api/v1/namespaces/default/services/app
uid: 1f54cd2f-b978-485e-a1af-984ffeeb7db0
spec:
clusterIP: 172.20.184.161
externalTrafficPolicy: Cluster
ports:
- name: http
nodePort: 32648
port: 80
protocol: TCP
targetPort: 80
selector:
app: app
release: app
sessionAffinity: None
type: NodePort
status:
loadBalancer: {}
Update 10/20/2021:
So I took the advice to tweak the readiness probe and used these generous settings:
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
initialDelaySeconds: 300
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
These are the events:
5m21s Normal Scheduled pod/app-686494b58b-6cjsq Successfully assigned default/app-686494b58b-6cjsq to ip-10-10-14-127.compute.internal
5m20s Normal Created pod/app-686494b58b-6cjsq Created container redis
5m20s Normal Started pod/app-686494b58b-6cjsq Started container redis
5m20s Normal Pulling pod/app-686494b58b-6cjsq Pulling image "registry/app:latest"
5m20s Normal Pulled pod/app-686494b58b-6cjsq Successfully pulled image "registry/app:latest"
5m20s Normal Created pod/app-686494b58b-6cjsq Created container app
5m20s Normal Pulled pod/app-686494b58b-6cjsq Container image "redis:5.0.6-alpine" already present on machine
5m19s Normal Started pod/app-686494b58b-6cjsq Started container app
0s Warning Unhealthy pod/app-686494b58b-6cjsq Readiness probe failed: Get http://10.10.14.117:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I do see the readiness probe kicking into action when I request the health check page (the root page) manually, which is odd. But be that as it may, the probe failure is not because the containers aren't running fine -- they are -- but because of something else.
Let's go over your probes so you can understand what is going on and might find a way to fix it:
### Readiness probe - "waiting" for the container to be ready
### to serve traffic.
###
### Liveness and readiness probes run independently: readiness decides
### whether the pod receives traffic, liveness decides whether the
### container gets restarted, so you might want to start
### with the readinessProbe first
livenessProbe:
### - How many consecutive failed checks are allowed before the
###   container is restarted. Try to increase this number and, once
###   your pod stops restarting, reduce it back to a lower value
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
###
### Delay before executing the first test.
### As before - try to increase the delay and reduce it
### back once you have figured out the correct value
###
initialDelaySeconds: 90
### How often (in seconds) to perform the test.
periodSeconds: 20
successThreshold: 1
### Number of seconds after which the probe times out.
### Since the value is 1 I assume that you did not change it.
### Same as before - increase the value and then reduce it
### once you have figured out the correct value
timeoutSeconds: 1
### Same comments as above + `initialDelaySeconds`
### Readiness is "waiting" for the container to be ready to
### get to work.
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
### Again, nothing new here - same comments: increase the value
### and then reduce it until you figure out the desired value
### for this probe
initialDelaySeconds: 90
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
View the logs/events
If you are not sure that the probes are the root cause, view the logs and the events to figure out what the root cause of those failures is.
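For example, a few commands that usually pinpoint the cause (a sketch; the container names app and redis come from the manifest above, and <app-pod-name> is a placeholder for your actual pod name):
kubectl describe pod <app-pod-name>                  # probe failures show up under Events
kubectl logs <app-pod-name> -c app --previous        # logs of the container that was killed by the liveness probe
kubectl logs <app-pod-name> -c redis
kubectl get events --sort-by=.metadata.creationTimestamp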
Linking my answer for
liveness and readiness probe for multiple containers in a pod
I'm using microk8s on Ubuntu.
I'm trying to run a simple hello world program, but I get this error when the pod is created:
kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy
Here is my deployment.yaml file which I'm trying to apply.
apiVersion: v1
kind: Service
metadata:
name: grpc-hello
spec:
ports:
- port: 80
targetPort: 9000
protocol: TCP
name: http
selector:
app: grpc-hello
type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: grpc-hello
spec:
replicas: 1
selector:
matchLabels:
app: grpc-hello
template:
metadata:
labels:
app: grpc-hello
spec:
containers:
- name: esp
image: gcr.io/endpoints-release/endpoints-runtime:1
args: [
"--http2_port=9000",
"--backend=grpc://127.0.0.1:50051",
"--service=hellogrpc.endpoints.octa-test-123.cloud.goog",
"--rollout_strategy=managed",
]
ports:
- containerPort: 9000
- name: python-grpc-hello
image: gcr.io/octa-test-123/python-grpc-hello:1.0
ports:
- containerPort: 50051
Here is what I get when I describe the pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31s default-scheduler Successfully assigned default/grpc-hello-66869cf9fb-kpr69 to azeem-ubuntu
Normal Started 30s kubelet, azeem-ubuntu Started container python-grpc-hello
Normal Pulled 30s kubelet, azeem-ubuntu Container image "gcr.io/octa-test-123/python-grpc-hello:1.0" already present on machine
Normal Created 30s kubelet, azeem-ubuntu Created container python-grpc-hello
Normal Pulled 12s (x3 over 31s) kubelet, azeem-ubuntu Container image "gcr.io/endpoints-release/endpoints-runtime:1" already present on machine
Normal Created 12s (x3 over 31s) kubelet, azeem-ubuntu Created container esp
Normal Started 12s (x3 over 30s) kubelet, azeem-ubuntu Started container esp
Warning MissingClusterDNS 8s (x10 over 31s) kubelet, azeem-ubuntu pod: "grpc-hello-66869cf9fb-kpr69_default(19c5a870-fcf5-415c-bcb6-dedfc11f936c)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Warning BackOff 8s (x2 over 23s) kubelet, azeem-ubuntu Back-off restarting failed container
I searched a lot about this and found some answers, but none of them work for me. I also created kube-dns for this, but I don't know why it still isn't working. The kube-dns pods are running, in the kube-system namespace.
NAME READY STATUS RESTARTS AGE
kube-dns-6dbd676f7-dfbjq 3/3 Running 0 22m
And here is what I applied to create kube-dns:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.152.183.10
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-dns
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kube-dns
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: EnsureExists
data:
upstreamNameservers: |-
["8.8.8.8", "8.8.4.4"]
# Why set upstream ns: https://github.com/kubernetes/minikube/issues/2027
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
# replicas: not specified here:
# 1. In order to make Addon Manager do not reconcile this replicas parameter.
# 2. Default is 1.
# 3. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
volumes:
- name: kube-dns-config
configMap:
name: kube-dns
optional: true
containers:
- name: kubedns
image: gcr.io/google-containers/k8s-dns-kube-dns:1.15.8
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
livenessProbe:
httpGet:
path: /healthcheck/kubedns
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
# we poll on pod startup for the Kubernetes master service and
# only setup the /readiness HTTP server once that's available.
initialDelaySeconds: 3
timeoutSeconds: 5
args:
- --domain=cluster.local.
- --dns-port=10053
- --config-dir=/kube-dns-config
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
volumeMounts:
- name: kube-dns-config
mountPath: /kube-dns-config
- name: dnsmasq
image: gcr.io/google-containers/k8s-dns-dnsmasq-nanny:1.15.8
livenessProbe:
httpGet:
path: /healthcheck/dnsmasq
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- -v=2
- -logtostderr
- -configDir=/etc/k8s/dns/dnsmasq-nanny
- -restartDnsmasq=true
- --
- -k
- --cache-size=1000
- --no-negcache
- --log-facility=-
- --server=/cluster.local/127.0.0.1#10053
- --server=/in-addr.arpa/127.0.0.1#10053
- --server=/ip6.arpa/127.0.0.1#10053
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
# see: https://github.com/kubernetes/kubernetes/issues/29055 for details
resources:
requests:
cpu: 150m
memory: 20Mi
volumeMounts:
- name: kube-dns-config
mountPath: /etc/k8s/dns/dnsmasq-nanny
- name: sidecar
image: gcr.io/google-containers/k8s-dns-sidecar:1.15.8
livenessProbe:
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --v=2
- --logtostderr
- --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
- --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
memory: 20Mi
cpu: 10m
dnsPolicy: Default # Don't use cluster DNS.
serviceAccountName: kube-dns
Please let me know what I'm missing.
You have not specified how you deployed kube-dns, but with microk8s it's recommended to use CoreDNS. You should not deploy kube-dns or CoreDNS on your own; instead, enable DNS with microk8s enable dns, which deploys CoreDNS and configures the kubelet to use it.
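Roughly, assuming a default microk8s installation (the label and service name below are how the add-on exposes CoreDNS, kept as kube-dns for compatibility):
microk8s enable dns
# verify the DNS pods and service came up
microk8s kubectl get pods,svc -n kube-system -l k8s-app=kube-dns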
I had the same problem with a microk8s cluster. Even though I had already enabled the dns add-on, it didn't work.
I looked up the kube-dns service's cluster IP address with:
kubectl -nkube-system get svc/kube-dns
I stopped the microk8s cluster and edited the kubelet configuration file /var/snap/microk8s/current/args/kubelet, adding the following lines (in my case):
--resolv-conf=""
--cluster-dns=A.B.C.D
--cluster-domain=cluster.local
Afterwards I started the cluster again and the problem did not occur anymore.
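Put together, the sequence looks roughly like this (a sketch; paths and commands assume a default microk8s snap installation):
# find the cluster IP of the kube-dns service (the A.B.C.D above)
microk8s kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'
microk8s stop
# append to /var/snap/microk8s/current/args/kubelet:
#   --resolv-conf=""
#   --cluster-dns=A.B.C.D
#   --cluster-domain=cluster.local
microk8s start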
As per the kubectl docs, kubectl rollout restart is applicable to deployments, daemonsets and statefulsets. It works as expected for deployments. But for statefulsets, it restarts only one of the 2 pods.
✗ k rollout restart statefulset alertmanager-main (playground-fdp/monitoring)
statefulset.apps/alertmanager-main restarted
✗ k rollout status statefulset alertmanager-main (playground-fdp/monitoring)
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 2 pods at revision alertmanager-main-59d7ccf598...
✗ kgp -l app=alertmanager (playground-fdp/monitoring)
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 21h
alertmanager-main-1 2/2 Running 0 20s
As you can see, the pod alertmanager-main-1 has been restarted and its age is 20s, whereas the other pod in the statefulset alertmanager, i.e. pod alertmanager-main-0, has not been restarted and its age is 21h. Any idea how we can restart a statefulset after a configmap used by it has been updated?
[Update 1] Here is the statefulset configuration. As you can see the .spec.updateStrategy.rollingUpdate.partition is not set.
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"monitoring.coreos.com/v1","kind":"Alertmanager","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"main","namespace":"monitoring"},"spec":{"baseImage":"10.47.2.76:80/alm/alertmanager","nodeSelector":{"kubernetes.io/os":"linux"},"replicas":2,"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"alertmanager-main","version":"v0.19.0"}}
creationTimestamp: "2019-12-02T07:17:49Z"
generation: 4
labels:
alertmanager: main
name: alertmanager-main
namespace: monitoring
ownerReferences:
- apiVersion: monitoring.coreos.com/v1
blockOwnerDeletion: true
controller: true
kind: Alertmanager
name: main
uid: 3e3bd062-6077-468e-ac51-909b0bce1c32
resourceVersion: "521307"
selfLink: /apis/apps/v1/namespaces/monitoring/statefulsets/alertmanager-main
uid: ed4765bf-395f-4d91-8ec0-4ae23c812a42
spec:
podManagementPolicy: Parallel
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
alertmanager: main
app: alertmanager
serviceName: alertmanager-operated
template:
metadata:
creationTimestamp: null
labels:
alertmanager: main
app: alertmanager
spec:
containers:
- args:
- --config.file=/etc/alertmanager/config/alertmanager.yaml
- --cluster.listen-address=[$(POD_IP)]:9094
- --storage.path=/alertmanager
- --data.retention=120h
- --web.listen-address=:9093
- --web.external-url=http://10.47.0.234
- --web.route-prefix=/
- --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:9094
- --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:9094
env:
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
image: 10.47.2.76:80/alm/alertmanager:v0.19.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 10
httpGet:
path: /-/healthy
port: web
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
name: alertmanager
ports:
- containerPort: 9093
name: web
protocol: TCP
- containerPort: 9094
name: mesh-tcp
protocol: TCP
- containerPort: 9094
name: mesh-udp
protocol: UDP
readinessProbe:
failureThreshold: 10
httpGet:
path: /-/ready
port: web
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
resources:
requests:
memory: 200Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/alertmanager/config
name: config-volume
- mountPath: /alertmanager
name: alertmanager-main-db
- args:
- -webhook-url=http://localhost:9093/-/reload
- -volume-dir=/etc/alertmanager/config
image: 10.47.2.76:80/alm/configmap-reload:v0.0.1
imagePullPolicy: IfNotPresent
name: config-reloader
resources:
limits:
cpu: 100m
memory: 25Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/alertmanager/config
name: config-volume
readOnly: true
dnsPolicy: ClusterFirst
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccount: alertmanager-main
serviceAccountName: alertmanager-main
terminationGracePeriodSeconds: 120
volumes:
- name: config-volume
secret:
defaultMode: 420
secretName: alertmanager-main
- emptyDir: {}
name: alertmanager-main-db
updateStrategy:
type: RollingUpdate
status:
collisionCount: 0
currentReplicas: 2
currentRevision: alertmanager-main-59d7ccf598
observedGeneration: 4
readyReplicas: 2
replicas: 2
updateRevision: alertmanager-main-59d7ccf598
updatedReplicas: 2
You did not provide the whole scenario. It might depend on the readiness probe or the update strategy.
A StatefulSet restarts its pods in reverse ordinal order, from n-1 down to 0. Details can be found here.
Reason 1
StatefulSets have 4 update strategies:
On Delete
Rolling Updates
Partitions
Forced Rollback
In the Partition update strategy you can find this information:
If a partition is specified, all Pods with an ordinal that is greater
than or equal to the partition will be updated when the StatefulSet’s
.spec.template is updated. All Pods with an ordinal that is less
than the partition will not be updated, and, even if they are deleted,
they will be recreated at the previous version. If a StatefulSet’s
.spec.updateStrategy.rollingUpdate.partition is greater than its
.spec.replicas, updates to its .spec.template will not be
propagated to its Pods. In most cases you will not need to use a
partition, but they are useful if you want to stage an update, roll
out a canary, or perform a phased roll out.
So if somewhere in the StatefulSet you have set updateStrategy.rollingUpdate.partition: 1, only pods with ordinal 1 or higher will be restarted.
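For reference, this is where the field lives in a StatefulSet manifest (a minimal sketch; your manifest above uses RollingUpdate without a partition):
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3   # pods with ordinal >= 3 get the new revision; ordinals 0-2 keep the old one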
Example of partition: 3
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 30m
web-1 1/1 Running 0 30m
web-2 1/1 Running 0 31m
web-3 1/1 Running 0 2m45s
web-4 1/1 Running 0 3m
web-5 1/1 Running 0 3m13s
Reason 2
Configuration of Readiness probe.
If your values of initialDelaySeconds and periodSeconds are high, it might take a while before another one will be restarted. Details about those parameters can be found here.
In the example below, the probe waits 10 seconds after the container starts before the first check, and then checks every 2 seconds. Depending on your values, this might be the cause of the behavior.
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 2
successThreshold: 1
timeoutSeconds: 1
Reason 3
I saw that you have 2 containers in each pod.
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 21h
alertmanager-main-1 2/2 Running 0 20s
As described in the docs:
Running - The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting.
It would be good to check if everything is ok with both containers (readinessProbe/livenessProbe, restarts etc.)
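For example (a sketch; the pod, container and namespace names are taken from the manifest and output above):
kubectl -n monitoring describe pod alertmanager-main-0
kubectl -n monitoring logs alertmanager-main-0 -c alertmanager
kubectl -n monitoring logs alertmanager-main-0 -c config-reloader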
You would need to delete it. StatefulSet pods are removed in reverse ordinal order, with the highest ordinal index first.
Also, you do not need to restart a pod to re-read an updated ConfigMap; that happens automatically (after some period of time).
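If you want to verify that, one way is to watch the projected file inside the pod after changing the Secret/ConfigMap (a sketch; the mount path comes from your manifest, and note that environment variables, unlike mounted files, are not refreshed):
kubectl -n monitoring exec alertmanager-main-0 -c alertmanager -- cat /etc/alertmanager/config/alertmanager.yaml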
This might be related to your ownerReferences definition. You can try it without any owner and do the rollout again.
I'm trying to run a service exposed via ports 80 and 443. The SSL termination happens on the pod.
I specified only port 80 for the liveness probe, but for some reason Kubernetes is probing HTTPS (443) as well. Why is that, and how can I stop it probing 443?
Kubernetes config
apiVersion: v1
kind: Secret
metadata:
name: myregistrykey
namespace: default
data:
.dockerconfigjson: xxx==
type: kubernetes.io/dockerconfigjson
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: example-com
spec:
replicas: 0
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 50%
minReadySeconds: 30
template:
metadata:
labels:
app: example-com
spec:
imagePullSecrets:
- name: myregistrykey
containers:
- name: example-com
image: DOCKER_HOST/DOCKER_IMAGE_VERSION
imagePullPolicy: Always
ports:
- containerPort: 80
protocol: TCP
name: http
- containerPort: 443
protocol: TCP
name: https
livenessProbe:
httpGet:
scheme: "HTTP"
path: "/_ah/health"
port: 80
httpHeaders:
- name: Host
value: example.com
initialDelaySeconds: 35
periodSeconds: 35
readinessProbe:
httpGet:
scheme: "HTTP"
path: "/_ah/health"
port: 80
httpHeaders:
- name: Host
value: example.com
initialDelaySeconds: 35
periodSeconds: 35
resources:
requests:
cpu: 250m
limits:
cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
name: example-com
spec:
type: LoadBalancer
ports:
- port: 80
protocol: TCP
targetPort: 80
nodePort: 0
name: http
- port: 443
protocol: TCP
targetPort: 443
nodePort: 0
name: https
selector:
app: example-com
The errors/logs on the pods clearly indicate that Kubernetes is trying to access the service via HTTPS.
kubectl describe pod example-com-86876875c7-b75hr
Name: example-com-86876875c7-b75hr
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: aks-agentpool-37281605-0/10.240.0.4
Start Time: Sat, 17 Nov 2018 19:58:30 +0200
Labels: app=example-com
pod-template-hash=4243243173
Annotations: <none>
Status: Running
IP: 10.244.0.65
Controlled By: ReplicaSet/example-com-86876875c7
Containers:
example-com:
Container ID: docker://c5eeb03558adda435725a0df3cc2d15943966c3df53e9462e964108969c8317a
Image: example-com.azurecr.io/example-com:2018-11-17_19-58-05
Image ID: docker-pullable://example-com.azurecr.io/example-com#sha256:5d425187b8663ecfc5d6cc78f6c5dd29f1559d3687ba9d4c0421fd0ad109743e
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Sat, 17 Nov 2018 20:07:59 +0200
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sat, 17 Nov 2018 20:05:39 +0200
Finished: Sat, 17 Nov 2018 20:07:55 +0200
Ready: False
Restart Count: 3
Limits:
cpu: 500m
Requests:
cpu: 250m
Liveness: http-get http://:80/_ah/health delay=35s timeout=1s period=35s #success=1 #failure=3
Readiness: http-get http://:80/_ah/health delay=35s timeout=1s period=35s #success=1 #failure=3
Environment:
NABU: nabu
KUBERNETES_PORT_443_TCP_ADDR: agile-kube-b3e5753f.hcp.westeurope.azmk8s.io
KUBERNETES_PORT: tcp://agile-kube-b3e5753f.hcp.westeurope.azmk8s.io:443
KUBERNETES_PORT_443_TCP: tcp://agile-kube-b3e5753f.hcp.westeurope.azmk8s.io:443
KUBERNETES_SERVICE_HOST: agile-kube-b3e5753f.hcp.westeurope.azmk8s.io
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rcr7c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-rcr7c:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-rcr7c
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned default/example-com-86876875c7-b75hr to aks-agentpool-37281605-0
Warning Unhealthy 3m46s (x6 over 7m16s) kubelet, aks-agentpool-37281605-0 Liveness probe failed: Get https://example.com/_ah/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Pulling 3m45s (x3 over 10m) kubelet, aks-agentpool-37281605-0 pulling image "example-com.azurecr.io/example-com:2018-11-17_19-58-05"
Normal Killing 3m45s (x2 over 6m5s) kubelet, aks-agentpool-37281605-0 Killing container with id docker://example-com:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 3m44s (x3 over 10m) kubelet, aks-agentpool-37281605-0 Successfully pulled image "example-com.azurecr.io/example-com:2018-11-17_19-58-05"
Normal Created 3m42s (x3 over 10m) kubelet, aks-agentpool-37281605-0 Created container
Normal Started 3m42s (x3 over 10m) kubelet, aks-agentpool-37281605-0 Started container
Warning Unhealthy 39s (x9 over 7m4s) kubelet, aks-agentpool-37281605-0 Readiness probe failed: Get https://example.com/_ah/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
As per your comments, you are doing an HTTP-to-HTTPS redirect in the pod, so the probe cannot get a successful response over plain HTTP. If you still want to serve a probe on port 80, you should consider using TCP probes. For example:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: example-com
spec:
...
minReadySeconds: 30
template:
metadata:
labels:
app: example-com
spec:
imagePullSecrets:
- name: myregistrykey
containers:
- name: example-com
...
livenessProbe:
httpGet:
scheme: "HTTP"
path: "/_ah/health"
port: 80
httpHeaders:
- name: Host
value: example.com
initialDelaySeconds: 35
periodSeconds: 35
readinessProbe:
tcpSocket:
port: 80
initialDelaySeconds: 35
periodSeconds: 35
...
Or you can ignore some redirects in your application depending on the URL, as mentioned in #night-gold's answer.
The problem doesn't come from Kubernetes but from your web server. Kubernetes is doing exactly what you are asking: it probes the HTTP URL, but your server redirects it to HTTPS, and that is what causes the error.
If you are using Apache, look at Apache https block redirect; if you use nginx, see nginx https block redirect.
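To confirm that the redirect is what the probe is hitting, you can reproduce the request from inside the cluster (a sketch; the pod IP 10.244.0.65 comes from the describe output above, and curlimages/curl is just a convenient throwaway image):
kubectl run probe-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sS -o /dev/null -w '%{http_code}\n' -H 'Host: example.com' http://10.244.0.65/_ah/health
# a 301 or 302 here means the plain-HTTP health check is being redirected to HTTPS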