Kubectl rollout restart for statefulset - kubernetes

As per the kubectl docs, kubectl rollout restart is applicable to deployments, daemonsets and statefulsets. It works as expected for deployments, but for statefulsets it restarts only one of the 2 pods.
✗ k rollout restart statefulset alertmanager-main (playground-fdp/monitoring)
statefulset.apps/alertmanager-main restarted
✗ k rollout status statefulset alertmanager-main (playground-fdp/monitoring)
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 2 pods at revision alertmanager-main-59d7ccf598...
✗ kgp -l app=alertmanager (playground-fdp/monitoring)
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 21h
alertmanager-main-1 2/2 Running 0 20s
As you can see, the pod alertmanager-main-1 has been restarted and its age is 20s, whereas the other pod in the statefulset alertmanager, i.e., pod alertmanager-main-0, has not been restarted and its age is 21h. Any idea how we can restart a statefulset after some configmap used by it has been updated?
[Update 1] Here is the statefulset configuration. As you can see, .spec.updateStrategy.rollingUpdate.partition is not set.
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"monitoring.coreos.com/v1","kind":"Alertmanager","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"main","namespace":"monitoring"},"spec":{"baseImage":"10.47.2.76:80/alm/alertmanager","nodeSelector":{"kubernetes.io/os":"linux"},"replicas":2,"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"alertmanager-main","version":"v0.19.0"}}
creationTimestamp: "2019-12-02T07:17:49Z"
generation: 4
labels:
alertmanager: main
name: alertmanager-main
namespace: monitoring
ownerReferences:
- apiVersion: monitoring.coreos.com/v1
blockOwnerDeletion: true
controller: true
kind: Alertmanager
name: main
uid: 3e3bd062-6077-468e-ac51-909b0bce1c32
resourceVersion: "521307"
selfLink: /apis/apps/v1/namespaces/monitoring/statefulsets/alertmanager-main
uid: ed4765bf-395f-4d91-8ec0-4ae23c812a42
spec:
podManagementPolicy: Parallel
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
alertmanager: main
app: alertmanager
serviceName: alertmanager-operated
template:
metadata:
creationTimestamp: null
labels:
alertmanager: main
app: alertmanager
spec:
containers:
- args:
- --config.file=/etc/alertmanager/config/alertmanager.yaml
- --cluster.listen-address=[$(POD_IP)]:9094
- --storage.path=/alertmanager
- --data.retention=120h
- --web.listen-address=:9093
- --web.external-url=http://10.47.0.234
- --web.route-prefix=/
- --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:9094
- --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:9094
env:
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
image: 10.47.2.76:80/alm/alertmanager:v0.19.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 10
httpGet:
path: /-/healthy
port: web
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
name: alertmanager
ports:
- containerPort: 9093
name: web
protocol: TCP
- containerPort: 9094
name: mesh-tcp
protocol: TCP
- containerPort: 9094
name: mesh-udp
protocol: UDP
readinessProbe:
failureThreshold: 10
httpGet:
path: /-/ready
port: web
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
resources:
requests:
memory: 200Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/alertmanager/config
name: config-volume
- mountPath: /alertmanager
name: alertmanager-main-db
- args:
- -webhook-url=http://localhost:9093/-/reload
- -volume-dir=/etc/alertmanager/config
image: 10.47.2.76:80/alm/configmap-reload:v0.0.1
imagePullPolicy: IfNotPresent
name: config-reloader
resources:
limits:
cpu: 100m
memory: 25Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/alertmanager/config
name: config-volume
readOnly: true
dnsPolicy: ClusterFirst
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccount: alertmanager-main
serviceAccountName: alertmanager-main
terminationGracePeriodSeconds: 120
volumes:
- name: config-volume
secret:
defaultMode: 420
secretName: alertmanager-main
- emptyDir: {}
name: alertmanager-main-db
updateStrategy:
type: RollingUpdate
status:
collisionCount: 0
currentReplicas: 2
currentRevision: alertmanager-main-59d7ccf598
observedGeneration: 4
readyReplicas: 2
replicas: 2
updateRevision: alertmanager-main-59d7ccf598
updatedReplicas: 2

You did not provide the whole scenario. It might depend on the readiness probe or the update strategy.
A StatefulSet rolls its pods in reverse ordinal order, from index n-1 down to 0. Details can be found here.
Reason 1
StatefulSets have 4 update strategies:
On Delete
Rolling Updates
Partitions
Forced Rollback
In the Partitions section you can find this information:
If a partition is specified, all Pods with an ordinal that is greater
than or equal to the partition will be updated when the StatefulSet’s
.spec.template is updated. All Pods with an ordinal that is less
than the partition will not be updated, and, even if they are deleted,
they will be recreated at the previous version. If a StatefulSet’s
.spec.updateStrategy.rollingUpdate.partition is greater than its
.spec.replicas, updates to its .spec.template will not be
propagated to its Pods. In most cases you will not need to use a
partition, but they are useful if you want to stage an update, roll
out a canary, or perform a phased roll out.
So if somewhere in the StatefulSet you have set updateStrategy.rollingUpdate.partition: 1, it will restart only pods with ordinal 1 or higher.
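For reference, a partition would appear in the StatefulSet spec roughly like this (just a sketch; the value 1 is only an example, and the field is absent from the StatefulSet in the question):
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 1   # pods with ordinal >= 1 are updated; pod 0 stays on the old revision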
Example of partition: 3
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 30m
web-1 1/1 Running 0 30m
web-2 1/1 Running 0 31m
web-3 1/1 Running 0 2m45s
web-4 1/1 Running 0 3m
web-5 1/1 Running 0 3m13s
Reason 2
Configuration of Readiness probe.
If your values of initialDelaySeconds and periodSeconds are high, it might take a while before the next pod is restarted. Details about those parameters can be found here.
In the example below, the pod waits 10 seconds after starting before the first readiness check, and the probe then runs every 2 seconds. Depending on the values, this might be the cause of the behavior.
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 2
successThreshold: 1
timeoutSeconds: 1
Reason 3
I saw that you have 2 containers in each pod.
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 21h
alertmanager-main-1 2/2 Running 0 20s
As described in the docs:
Running - The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting.
It would be good to check whether everything is OK with both containers (readinessProbe/livenessProbe, restarts, etc.).
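A couple of read-only commands along these lines can help verify both containers (namespace and pod name taken from the question):
# Container states, restart counts and recent events for the pod that did not roll
kubectl describe pod alertmanager-main-0 -n monitoring
# Logs of the sidecar, in case the config reloader is failing
kubectl logs alertmanager-main-0 -n monitoring -c config-reloader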

You would need to delete it. StatefulSet pods are removed in order of their ordinal index, with the highest ordinal index first.
Also, you do not need to restart a pod to re-read an updated ConfigMap. This happens automatically (after some period of time).
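A minimal sketch of the delete approach, using the names from the question:
# Delete the pod that was not rolled; the StatefulSet controller recreates it at the current revision
kubectl delete pod alertmanager-main-0 -n monitoring
# Wait until the StatefulSet reports all pods ready again
kubectl rollout status statefulset alertmanager-main -n monitoring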

This might be related to your ownerReferences definition. You can try it without any owner and do the rollout again.
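Before removing anything, a read-only check of who owns the StatefulSet (names from the question) could look like this:
kubectl get statefulset alertmanager-main -n monitoring -o jsonpath='{.metadata.ownerReferences}'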

Related

Why my GKE node pool does not auto-scale down?

I've got a preemptible node pool which is clearly under-utilized:
The node pool hosts a deployment with HPA with the following setup:
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend
labels:
app: backend
spec:
replicas: 1
selector:
matchLabels:
app: backend
template:
metadata:
labels:
app: backend
spec:
initContainers:
- name: wait-for-database
image: ### IMAGE ###
command: ['bash', 'init.sh']
containers:
- name: backend
image: ### IMAGE ###
command: ["bash", "entrypoint.sh"]
imagePullPolicy: Always
resources:
requests:
memory: "200M"
cpu: "50m"
ports:
- name: probe-port
containerPort: 8080
hostPort: 8080
volumeMounts:
- name: static-shared-data
mountPath: /static
readinessProbe:
httpGet:
path: /readiness/
port: probe-port
failureThreshold: 5
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
- name: nginx
image: nginx:alpine
resources:
requests:
memory: "400M"
cpu: "20m"
ports:
- containerPort: 80
volumeMounts:
- name: nginx-proxy-config
mountPath: /etc/nginx/conf.d/default.conf
subPath: app.conf
- name: static-shared-data
mountPath: /static
volumes:
- name: nginx-proxy-config
configMap:
name: backend-nginx
- name: static-shared-data
emptyDir: {}
nodeSelector:
cloud.google.com/gke-nodepool: app-dev
tolerations:
- effect: NoSchedule
key: workload
operator: Equal
value: dev
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: backend
namespace: default
spec:
maxReplicas: 12
minReplicas: 8
scaleTargetRef:
apiVersion: extensions/v1beta1
kind: Deployment
name: backend
metrics:
- resource:
name: cpu
targetAverageUtilization: 50
type: Resource
---
The node pool also has the toleration label.
The HPA utilization shows this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
backend-develop Deployment/backend-develop 10%/50% 8 12 8 38d
But the node pool does not scale down for about a day. No heavy load on this deployment:
NAME STATUS ROLES AGE VERSION
gke-dev-app-dev-fee1a901-fvw9 Ready <none> 22h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-gls7 Ready <none> 22h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-lf3f Ready <none> 24h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-lgw9 Ready <none> 3d10h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-qxkz Ready <none> 3h35m v1.14.10-gke.36
gke-dev-app-dev-fee1a901-s10l Ready <none> 22h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-sj4d Ready <none> 22h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-vdnw Ready <none> 27h v1.14.10-gke.36
There are no affinity settings for this deployment and node pool. Some of the nodes easily pack several identical pods, but others just hold one pod for hours and no scale-down happens.
What could be wrong?
The issue was:
hostPort: 8080
This led to FailedScheduling events with the message "didn't have free ports".
That's why the nodes were kept online.
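In other words, removing the hostPort so the probe port is exposed only as a containerPort lets several replicas share a node, which gives the autoscaler room to consolidate; a sketch of the changed fragment (values from the question):
ports:
- name: probe-port
  containerPort: 8080   # no hostPort, so multiple pods can be scheduled on the same node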

K8s pod priority & outOfPods

We had the situation that the k8s cluster was running out of pods after an update (Kubernetes, or more specifically: ICP), resulting in "OutOfPods" error messages. The reason was a low "podsPerCore" setting, which we corrected afterwards. Until then there were pods with a priorityClass (1000000) that could not be scheduled, while others without a priorityClass (0) were scheduled. I expected a different behaviour: I thought that the K8s scheduler would kill pods with no priority so that a pod with priority could be scheduled. Was I wrong?
That's just a question for understanding, because I want to guarantee that the priority pods are running, no matter what.
Thanks
Pod with Prio:
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: ibm-anyuid-hostpath-psp
creationTimestamp: "2019-12-16T13:39:21Z"
generateName: dms-config-server-555dfc56-
labels:
app: config-server
pod-template-hash: 555dfc56
release: dms-config-server
name: dms-config-server-555dfc56-2ssxb
namespace: dms
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: dms-config-server-555dfc56
uid: c29c40e1-1da7-11ea-b646-005056a72568
resourceVersion: "65065735"
selfLink: /api/v1/namespaces/dms/pods/dms-config-server-555dfc56-2ssxb
uid: 7758e138-2009-11ea-9ff4-005056a72568
spec:
containers:
- env:
- name: CONFIG_SERVER_GIT_USERNAME
valueFrom:
secretKeyRef:
key: username
name: dms-config-server-git
- name: CONFIG_SERVER_GIT_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: dms-config-server-git
envFrom:
- configMapRef:
name: dms-config-server-app-env
- configMapRef:
name: dms-config-server-git
image: docker.repository..../infra/config-server:2.0.8
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /actuator/health
port: 8080
scheme: HTTP
initialDelaySeconds: 90
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: config-server
ports:
- containerPort: 8080
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /actuator/health
port: 8080
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: 250m
memory: 600Mi
requests:
cpu: 10m
memory: 300Mi
securityContext:
capabilities:
drop:
- MKNOD
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-v7tpv
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: kub-test-worker-02
priority: 1000000
priorityClassName: infrastructure
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: node.kubernetes.io/memory-pressure
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-v7tpv
secret:
defaultMode: 420
secretName: default-token-v7tpv
Pod without Prio (just an example within the same namespace):
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: ibm-anyuid-hostpath-psp
creationTimestamp: "2019-09-10T09:09:28Z"
generateName: produkt-service-57d448979d-
labels:
app: produkt-service
pod-template-hash: 57d448979d
release: dms-produkt-service
name: produkt-service-57d448979d-4x5qs
namespace: dms
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: produkt-service-57d448979d
uid: 4096ab97-5cee-11e9-97a2-005056a72568
resourceVersion: "65065755"
selfLink: /api/v1/namespaces/dms/pods/produkt-service-57d448979d-4x5qs
uid: b112c5f7-d3aa-11e9-9b1b-005056a72568
spec:
containers:
- image: docker-snapshot.repository..../dms/produkt-service:0b6e0ecc88a28d2a91ffb1db61f8ca99c09a9d92
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /actuator/health
port: 8080
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: produkt-service
ports:
- containerPort: 8080
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /actuator/health
port: 8080
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
capabilities:
drop:
- MKNOD
procMount: Default
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-v7tpv
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: kub-test-worker-02
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-v7tpv
secret:
defaultMode: 420
secretName: default-token-v7tpv
There could be a lot of circumstances that alter the behaviour of the scheduler. There is documentation talking about it: Pod priority and preemption.
Be aware that this feature was deemed stable in version 1.14.0.
From the IBM perspective, please keep in mind that version 1.13.9 will be supported until 19 February 2020.
You are correct that pods with lower priority should be replaced with higher priority pods.
Let me elaborate on that with an example:
Example
Let's assume a Kubernetes cluster with 3 nodes (1 master and 2 workers):
By default you cannot schedule normal pods on the master node.
The only worker node that can schedule the pods has 8GB of RAM.
The 2nd worker node has a taint that disables scheduling.
This example is based on RAM usage, but the same reasoning applies to CPU time.
Priority Class
There are 2 priority classes:
zero-priority (0)
high-priority (1 000 000)
YAML definition of zero priority class:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: zero-priority
value: 0
globalDefault: false
description: "This is priority class for hello pod"
globalDefault controls whether this class should be used for pods that do not have a priority class assigned; only a class with globalDefault: true is used as the default, and here it is set to false.
YAML definition of high priority class:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000000
globalDefault: false
description: "This is priority class for goodbye pod"
To apply these priority classes you will need to invoke:
$ kubectl apply -f FILE.yaml
Deployments
With the above objects you can create two deployments:
Hello - deployment with low priority
Goodbye - deployment with high priority
YAML definition of hello deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: hello
spec:
selector:
matchLabels:
app: hello
version: 1.0.0
replicas: 10
template:
metadata:
labels:
app: hello
version: 1.0.0
spec:
containers:
- name: hello
image: "gcr.io/google-samples/hello-app:1.0"
env:
- name: "PORT"
value: "50001"
resources:
requests:
memory: "128Mi"
priorityClassName: zero-priority
Please take a close look at this fragment:
resources:
requests:
memory: "128Mi"
priorityClassName: zero-priority
The requested resources will limit the number of pods that can be scheduled, and the deployment is assigned a low priority.
YAML definition of goodbye deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: goodbye
spec:
selector:
matchLabels:
app: goodbye
version: 2.0.0
replicas: 3
template:
metadata:
labels:
app: goodbye
version: 2.0.0
spec:
containers:
- name: goodbye
image: "gcr.io/google-samples/hello-app:2.0"
env:
- name: "PORT"
value: "50001"
resources:
requests:
memory: "6144Mi"
priorityClassName: high-priority
Again, please take a close look at this fragment:
resources:
requests:
memory: "6144Mi"
priorityClassName: high-priority
These pods have a much higher RAM request and a high priority.
Testing and troubleshooting
There is not enough information to properly troubleshoot issues like this without extensive logs of many components, from the kubelet to the pods, nodes, and the deployments themselves.
Apply the hello deployment and see what happens:
$ kubectl apply -f hello.yaml
Get basic information about the deployment with command:
$ kubectl get deployments hello
The output should look like this after a while:
NAME READY UP-TO-DATE AVAILABLE AGE
hello 10/10 10 10 9s
As you can see all of the pods are ready and available. The requested resources were assigned to them.
To get more details for troubleshooting purposes you can invoke:
$ kubectl describe deployment hello
$ kubectl describe node NAME_OF_THE_NODE
Example information about allocated resources from the above command:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 250m (12%) 0 (0%)
memory 1280Mi (17%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
Apply goodbye deployment and see what happens:
$ kubectl apply -f goodbye.yaml
Get basic information about the deployments with command:
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
goodbye 1/3 3 1 25s
hello 9/10 10 9 11m
As you can see, the goodbye deployment exists but only 1 pod is available, and despite goodbye having a much higher priority, the hello pods are still running.
Why is that?
$ kubectl describe node NAME_OF_THE_NODE
Non-terminated Pods: (13 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default goodbye-575968c8d6-bnrjc 0 (0%) 0 (0%) 6Gi (83%) 0 (0%) 15m
default hello-fdfb55c96-6hkwp 0 (0%) 0 (0%) 128Mi (1%) 0 (0%) 27m
default hello-fdfb55c96-djrwf 0 (0%) 0 (0%) 128Mi (1%) 0 (0%) 27m
Take a look at the requested memory for the goodbye pod: it is 6Gi, as described above.
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 250m (12%) 0 (0%)
memory 7296Mi (98%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
The memory usage is near 100%.
Describing the goodbye pod that is in the Pending state ($ kubectl describe pod NAME_OF_THE_POD_IN_PENDING_STATE) will yield some more information:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 38s (x3 over 53s) default-scheduler 0/3 nodes are available: 1 Insufficient memory, 2 node(s) had taints that the pod didn't tolerate.
The goodbye pod was not scheduled because its resource request could not be satisfied, but there were still some resources left for the hello pods.
It is possible to create a situation in which lower priority pods are killed so that higher priority pods can be scheduled.
Change the requested memory for the goodbye pods to 2304Mi. This will allow the scheduler to assign all of the required pods (3):
resources:
requests:
memory: "2304Mi"
You can delete the previous deployment and apply a new one with the memory parameter changed.
Invoke command: $ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
goodbye 3/3 3 3 5m59s
hello 3/10 10 3 48m
As you can see, all of the goodbye pods are available.
The hello pods were reduced to make space for the higher-priority goodbye pods.
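If you want to confirm that preemption happened, the events are a reasonable place to look (label values taken from the example deployments above):
# Recent cluster events, including scheduling and preemption activity
kubectl get events --sort-by=.metadata.creationTimestamp
# State and events of the remaining hello pods
kubectl describe pods -l app=hello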

Pods not replaced if MaxUnavailable set to 0 in Kubernetes

I want rolling deployments for my pods. I'm updating my pods using set image in a CI environment. When I set maxUnavailable on the Deployment/web file to 1, I get downtime, but when I set maxUnavailable to 0, the pods do not get replaced and the container/app is not restarted.
Also, I have a single node in the Kubernetes cluster. Here's its info:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
881m (93%) 396m (42%) 909712Ki (33%) 1524112Ki (56%)
Events: <none>
Here's the complete YAML file. I do have a readiness probe set.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "10"
kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe
convert
kompose.version: 1.14.0 (fa706f2)
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{"kompose.cmd":"C:\\ProgramData\\chocolatey\\lib\\kubernetes-kompose\\tools\\kompose.exe convert","kompose.version":"1.14.0 (fa706f2)"},"creationTimestamp":null,"labels":{"io.kompose.service":"dev-web"},"name":"dev-web","namespace":"default"},"spec":{"replicas":1,"strategy":{},"template":{"metadata":{"labels":{"io.kompose.service":"dev-web"}},"spec":{"containers":[{"env":[{"name":"JWT_KEY","value":"ABCD"},{"name":"PORT","value":"2000"},{"name":"GOOGLE_APPLICATION_CREDENTIALS","value":"serviceaccount/quick-pay.json"},{"name":"mongoCon","value":"mongodb://quickpayadmin:quickpay1234#ds121343.mlab.com:21343/quick-pay-db"},{"name":"PGHost","value":"173.255.206.177"},{"name":"PGUser","value":"postgres"},{"name":"PGDatabase","value":"quickpay"},{"name":"PGPassword","value":"z33shan"},{"name":"PGPort","value":"5432"}],"image":"gcr.io/quick-pay-208307/quickpay-dev-node:latest","imagePullPolicy":"Always","name":"dev-web-container","ports":[{"containerPort":2000}],"readinessProbe":{"failureThreshold":3,"httpGet":{"path":"/","port":2000,"scheme":"HTTP"},"initialDelaySeconds":5,"periodSeconds":5,"successThreshold":1,"timeoutSeconds":1},"resources":{"requests":{"cpu":"20m"}}}]}}}}
creationTimestamp: 2018-12-24T12:13:48Z
generation: 12
labels:
io.kompose.service: dev-web
name: dev-web
namespace: default
resourceVersion: "9631122"
selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/web
uid: 5e66f7b3-0775-11e9-9653-42010a80019d
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
io.kompose.service: web
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
io.kompose.service: web
spec:
containers:
- env:
- name: PORT
value: "2000"
image: gcr.io/myimagepath/web-node
imagePullPolicy: Always
name: web-container
ports:
- containerPort: 2000
protocol: TCP
readinessProbe:
failureThreshold: 10
httpGet:
path: /
port: 2000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: 10m
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 2
conditions:
- lastTransitionTime: 2019-01-03T05:49:46Z
lastUpdateTime: 2019-01-03T05:49:46Z
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: 2018-12-24T12:13:48Z
lastUpdateTime: 2019-01-03T06:04:24Z
message: ReplicaSet "dev-web-7bd498fc74" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 12
readyReplicas: 2
replicas: 2
updatedReplicas: 2
I've tried with 1 replica and it still doesn't work.
In the first scenario, Kubernetes deletes one pod (maxUnavailable: 1) and starts a pod with the new image, then waits ~110 seconds (based on your readiness probe) to check whether the new pod is able to serve requests. The new pod isn't able to serve requests, but it is in the Running state, so Kubernetes deletes the second old pod and starts it with the new image, and again the second pod waits for the readiness probe to complete. This is why there is a window in which neither container is ready to serve requests, hence the downtime.
In the second scenario, where you have maxUnavailable: 0, Kubernetes first brings up a pod with the new image, and it isn't able to serve requests within ~110 seconds (based on your readiness probe), so the readiness probe keeps failing and the new pod is never considered ready. The same happens with the second pod, so neither of your pods gets updated.
So the reason is that you are not giving your application enough time to come up and start serving requests. If you increase the value of failureThreshold in your readiness probe while keeping maxUnavailable: 0, it will work.
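A sketch of that combination with the values from the question; the failureThreshold of 30 is only an example and should be sized to however long the app actually needs to start:
# Keep the old pods serving while the replacement starts
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
# Give the new pod more time before it is marked as failing readiness
readinessProbe:
  httpGet:
    path: /
    port: 2000
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 30   # roughly 10 + 30*10 = 310 seconds to become ready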

How to change running pods limits in Kubernetes?

I have a self-made Kubernetes cluster consisting of VMs. My problem is that the coredns pods always go into the CrashLoopBackOff state, and after a while they go back to Running as if nothing happened. One solution that I found but could not try yet is changing the default memory limit from 170Mi to something higher. As I'm not an expert in this, I thought this would not be a hard thing, but I don't know how to change a running pod's configuration. It may be impossible, but there must be a way to recreate them with a new configuration. I tried kubectl patch, and looked up rolling-update too, but I just can't figure it out. How can I change the limit?
Here is the relevant part of the pod's data:
apiVersion: v1
kind: Pod
metadata:
annotations:
cni.projectcalico.org/podIP: 176.16.0.12/32
creationTimestamp: 2018-11-18T10:29:53Z
generateName: coredns-78fcdf6894-
labels:
k8s-app: kube-dns
pod-template-hash: "3497892450"
name: coredns-78fcdf6894-gnlqw
namespace: kube-system
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: coredns-78fcdf6894
uid: e3349719-eb1c-11e8-9000-080027bbdf83
resourceVersion: "73564"
selfLink: /api/v1/namespaces/kube-system/pods/coredns-78fcdf6894-gnlqw
uid: e34930db-eb1c-11e8-9000-080027bbdf83
spec:
containers:
- args:
- -conf
- /etc/coredns/Corefile
image: k8s.gcr.io/coredns:1.1.3
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: coredns
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9153
name: metrics
protocol: TCP
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
EDIT:
It turned out that on Ubuntu the NetworkManager's dnsmasq drives the CoreDNS pods crazy, so in /etc/NetworkManager/NetworkManager.conf I commented out the dnsmasq line, rebooted, and everything is okay.
You must edit the coredns pod template in the coredns deployment definition:
kubectl edit deployment -n kube-system coredns
Once your default editor opens with the coredns deployment, look in the pod template spec for the part that is responsible for setting the memory and CPU limits.
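Since kubectl patch was already mentioned, a one-shot alternative could look roughly like this (the 256Mi value is only an example):
kubectl -n kube-system patch deployment coredns --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"256Mi"}]'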

Kubernetes - Old pod not being deleted after update

I am using Deployments to control my pods in my K8S cluster.
My original deployment file looks like :
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: websocket-backend-deployment
spec:
replicas: 2
selector:
matchLabels:
name: websocket-backend
template:
metadata:
labels:
name: websocket-backend
spec:
containers:
- name: websocket-backend
image: armdock.se/proj/websocket_backend:3.1.4
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
livenessProbe:
httpGet:
port: 8080
path: /websocket/health
initialDelaySeconds: 300
timeoutSeconds: 30
readinessProbe:
httpGet:
port: 8080
path: /websocket/health
initialDelaySeconds: 25
timeoutSeconds: 5
This config is working as planned.
# kubectl get po | grep websocket
websocket-backend-deployment-4243571618-mreef 1/1 Running 0 31s
websocket-backend-deployment-4243571618-qjo6q 1/1 Running 0 31s
Now I plan to do a live/rolling update on the image file.
The command that I am using is :
kubectl set image deployment websocket-backend-deployment websocket-backend=armdock.se/proj/websocket_backend:3.1.5
I am only updating the docker image tag.
Now I'm expecting my pods to remain at 2 after the update. I am getting the 2 new pods with the new version, but there is one pod that still exists carrying the old version.
# kubectl get po | grep websocket
websocket-backend-deployment-4243571618-qjo6q 1/1 Running 0 2m
websocket-backend-deployment-93242275-kgcmw 1/1 Running 0 51s
websocket-backend-deployment-93242275-kwmen 1/1 Running 0 51s
As you can see, 1 pod uses the old tag 3.1.4
# kubectl describe po websocket-backend-deployment-4243571618-qjo6q | grep Image:
Image: armdock.se/proj/websocket_backend:3.1.4
The other 2 pods are on the new tag 3.1.5.
# kubectl describe po websocket-backend-deployment-93242275-kgcmw | grep Image:
Image: armdock.se/proj/websocket_backend:3.1.5
# kubectl describe po websocket-backend-deployment-93242275-kwmen | grep Image:
Image: armdock.se/proj/websocket_backend:3.1.5
Why does 1 old pod still stay there and not get deleted? Am I missing some config?
When I check the rollout command, it's just stuck on:
# kubectl rollout status deployment/websocket-backend-deployment
Waiting for rollout to finish: 1 old replicas are pending termination...
My K8S version is :
# kubectl --version
Kubernetes v1.5.2
I would suggest setting maxSurge to 0 in the RollingUpdate strategy so that the number of pods stays the same after the rollout. The maxSurge parameter is the maximum number of pods that can be scheduled above the original number of pods.
Example:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: webserver
spec:
replicas: 2
selector:
matchLabels:
name: webserver
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
template:
metadata:
labels:
name: webserver
spec:
containers:
- name: webserver
image: nginx:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
Maybe k8s can't distinguish the images and treats them as if they were different. Check whether you are fast-forwarding your commits, or whether the hash of the last commit in the branch you are deploying differs from the hash of the last commit you actually made.
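Either way, listing the ReplicaSets shows which revision still owns the leftover pod and which image each revision runs:
# -o wide includes the image used by each ReplicaSet
kubectl get rs -o wide | grep websocket
kubectl describe deployment websocket-backend-deployment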