Why is this Kubernetes pod not triggering our autoscaler to scale up?

We're running a Kubernetes cluster with an autoscaler which, as far as I can tell, works perfectly most of the time. When we change the replica count of a given deployment to something that would exceed the resources of our cluster, the autoscaler catches it and scales up. Likewise, we get a scale-down if we need fewer resources.
That worked until today, when some of the pods in our Airflow deployment stopped working because they can't get the resources they require. Rather than triggering a cluster scale-up, the pods immediately fail or are evicted for asking for, or using, more resources than are available. See the YAML output of one of the failing pods below. The pods also never appear as Pending: they skip immediately from launch to their failed state.
Is there something I'm missing, such as some kind of retry tolerance, that would cause the pod to stay Pending and thus wait for a scale-up?
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: eks.privileged
creationTimestamp: "2019-12-02T22:41:19Z"
name: ingest-customer-ff06ae4d
namespace: airflow
resourceVersion: "32545690"
selfLink: /api/v1/namespaces/airflow/pods/ingest-customer-ff06ae4d
uid: dba8b4c1-1554-11ea-ac6b-12ff56d05229
spec:
affinity: {}
containers:
- args:
- scripts/fetch_and_run.sh
env:
- name: COMPANY
value: acme
- name: ENVIRONMENT
value: production
- name: ELASTIC_BUCKET
value: customer
- name: ELASTICSEARCH_HOST
value: <redacted>
- name: PATH_TO_EXEC
value: tools/storage/store_elastic.py
- name: PYTHONWARNINGS
value: ignore:Unverified HTTPS request
- name: PATH_TO_REQUIREMENTS
value: tools/requirements.txt
- name: GIT_REPO_URL
value: <redacted>
- name: GIT_COMMIT
value: <redacted>
- name: SPARK
value: "true"
image: dkr.ecr.us-east-1.amazonaws.com/spark-runner:dev
imagePullPolicy: IfNotPresent
name: base
resources:
limits:
memory: 28Gi
requests:
memory: 28Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /mnt/ssd
name: tmp-disk
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-cgpcc
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostNetwork: true
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- emptyDir: {}
name: tmp-disk
- name: default-token-cgpcc
secret:
defaultMode: 420
secretName: default-token-cgpcc
status:
conditions:
- lastProbeTime: "2019-12-02T22:41:19Z"
lastTransitionTime: "2019-12-02T22:41:19Z"
message: '0/9 nodes are available: 9 Insufficient memory.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
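For completeness, these are the commands I have been using to see what the scheduler and the cluster autoscaler report for this pod (the ConfigMap name and the kube-system namespace assume a fairly standard cluster-autoscaler install; adjust for your setup):
# Scheduling events for the failing pod (FailedScheduling / NotTriggerScaleUp)
kubectl describe pod ingest-customer-ff06ae4d -n airflow
# Cluster-autoscaler status ConfigMap, written by the autoscaler by default
kubectl describe configmap cluster-autoscaler-status -n kube-system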

Related

terminationGracePeriodSeconds not shown in kubectl describe result

When creating a Pod with terminationGracePeriodSeconds specified in its spec, I can't check whether this setting has been applied using kubectl describe. How can I check whether the terminationGracePeriodSeconds option has been applied successfully? I'm running Kubernetes version 1.19.
apiVersion: v1
kind: Pod
metadata:
name: mysql-client
spec:
serviceAccountName: test
terminationGracePeriodSeconds: 60
containers:
- name: mysql-cli
image: blah
command: ["/bin/sh", "-c"]
args:
- sleep 2000
restartPolicy: OnFailure
Assuming the pod is running successfully, you should be able to see the setting in the pod's manifest.
terminationGracePeriodSeconds is available in v1.19, as per the following page. Search for "terminationGracePeriodSeconds" here:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/
Now try this command:
kubectl get pod mysql-client -o yaml | grep terminationGracePeriodSeconds -A10 -B10
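If you only want that one field, a jsonpath query avoids the grep context flags entirely:
kubectl get pod mysql-client -o jsonpath='{.spec.terminationGracePeriodSeconds}'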
How can I check whether terminationGracePeriodSeconds option has been successfully applied?
First of all, you need to make sure your pod has been created correctly. I will show this with an example. I have deployed a very simple pod from the following YAML:
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
terminationGracePeriodSeconds: 60
containers:
- name: nginx
image: nginx:1.14.2
ports:
- containerPort: 80
Then I ran the command kubectl get pods:
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 9m1s
Everything is fine.
I can't check whether this spec has been applied successfully using kubectl describe.
That is also correct, because this command doesn't return information about the termination grace period. To find this information you need to run the kubectl get pod <your pod name> -o yaml command. The result will be similar to the one below:
apiVersion: v1
kind: Pod
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"containers":[{"image":"nginx:1.14.2","name":"nginx","ports":[{"containerPort":80}]}],"terminationGracePeriodSeconds":60}}
creationTimestamp: "2022-01-11T11:34:58Z"
name: nginx
namespace: default
resourceVersion: "57260566"
uid: <MY-UID>
spec:
containers:
- image: nginx:1.14.2
imagePullPolicy: IfNotPresent
name: nginx
ports:
- containerPort: 80
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: <name>
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: <my-node-name>
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 60
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: kube-api-access-nj88r
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2022-01-11T11:35:01Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2022-01-11T11:35:07Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2022-01-11T11:35:07Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2022-01-11T11:35:01Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://<ID>
image: docker.io/library/nginx:1.14.2
imageID: docker.io/library/nginx#sha256:<sha256>
lastState: {}
name: nginx
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2022-01-11T11:35:06Z"
hostIP: <IP>
phase: Running
podIP: <IP>
podIPs:
- ip: <IP>
qosClass: BestEffort
startTime: "2022-01-11T11:35:01Z"
The most important part will be here:
{"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"containers":[{"image":"nginx:1.14.2","name":"nginx","ports":[{"containerPort":80}]}],"terminationGracePeriodSeconds":60}}
and here
terminationGracePeriodSeconds: 60
At this point you can be sure that terminationGracePeriodSeconds has been applied successfully.

Kubernetes Pod Termination doesn't happen immediately, have to wait until grace period expires

I have a helm chart that has one deployment/pod and one service. I set the deployment terminationGracePeriodSeconds to 300s.
I don't have any pod lifecycle hook, so if I terminate the pod, it should terminate immediately. However, the pod does not terminate until my grace period ends!
Below is the deployment template for my pod:
$ kubectl get pod hpa-poc---jcc-7dbbd66d86-xtfc5 -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: eks.privileged
creationTimestamp: "2021-02-01T18:12:34Z"
generateName: hpa-poc-jcc-7dbbd66d86-
labels:
app.kubernetes.io/instance: hpa-poc
app.kubernetes.io/name: -
pod-template-hash: 7dbbd66d86
name: hpa-poc-jcc-7dbbd66d86-xtfc5
namespace: default
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: hpa-poc-jcc-7dbbd66d86
uid: 66db29d8-9e2d-4097-94fc-b0b827466e10
resourceVersion: "127938945"
selfLink: /api/v1/namespaces/default/pods/hpa-poc-jcc-7dbbd66d86-xtfc5
uid: 82ed4134-95de-4093-843b-438e94e408dd
spec:
containers:
- env:
- name: _CONFIG_LINK
value: xxx
- name: _USERNAME
valueFrom:
secretKeyRef:
key: username
name: hpa-jcc-poc
- name: _PASSWORD
valueFrom:
secretKeyRef:
key: password
name: hpa-jcc-poc
image: xxx
imagePullPolicy: IfNotPresent
name: -
resources:
limits:
cpu: "2"
memory: 8Gi
requests:
cpu: 500m
memory: 2Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-hzmwh
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: xxx
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 300
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-hzmwh
secret:
defaultMode: 420
secretName: default-token-hzmwh
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2021-02-01T18:12:34Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2021-02-01T18:12:36Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2021-02-01T18:12:36Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2021-02-01T18:12:34Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://c4c969ec149f43ff4494339930c8f0640d897b461060dd810c63a5d1f17fdc47
image: xxx
imageID: xxx
lastState: {}
name: -
ready: true
restartCount: 0
state:
running:
startedAt: "2021-02-01T18:12:35Z"
hostIP: 10.0.35.137
phase: Running
podIP: 10.0.21.35
qosClass: Burstable
startTime: "2021-02-01T18:12:34Z"
When I tried to terminate the pod (I used the helm delete command), you can see from the timestamps that it terminated after 5 minutes, which is the grace period.
$ helm delete hpa-poc
release "hpa-poc" uninstalled
$ kubectl get pod -w | grep hpa
hpa-poc-jcc-7dbbd66d86-xtfc5 1/1 Terminating 0 3h10m
hpa-poc-jcc-7dbbd66d86-xtfc5 0/1 Terminating 0 3h15m
hpa-poc-jcc-7dbbd66d86-xtfc5 0/1 Terminating 0 3h15m
So I suspect it's an issue with my pod/container configuration, because I have tried another simple Java app deployment and it terminates immediately once I delete the pod.
BTW, I am using an AWS EKS cluster; I'm not sure whether it's AWS-specific.
So, any suggestions?
I found the issue. When I exec'd into the container, I noticed there was one process still running: the log-tailing process.
So I need to kill that process, and I added that to a preStop hook. After that, my container can shut down immediately.
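For anyone hitting the same thing, a minimal sketch of what such a preStop hook can look like on the container spec (pkill tail is just a placeholder for whatever log-tailing process your image actually runs):
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "pkill tail || true"]  # stop the log tailer so the container exits without waiting out the grace period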

Pod is in Pending state (Error: FailedScheduling: nodes didn't match node selector)

I have a problem with one of the pods. It says that it is in a pending state.
If I describe the pod, this is what I can see:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NotTriggerScaleUp 1m (x58 over 11m) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match node selector
Warning FailedScheduling 1m (x34 over 11m) default-scheduler 0/6 nodes are available: 6 node(s) didn't match node selector.
If I check the logs, there is nothing in there (it just outputs an empty value).
--- Update ---
This is my pod YAML file:
apiVersion: v1
kind: Pod
metadata:
annotations:
checksum/config: XXXXXXXXXXX
checksum/dashboards-config: XXXXXXXXXXX
creationTimestamp: 2020-02-11T10:15:15Z
generateName: grafana-654667db5b-
labels:
app: grafana-grafana
component: grafana
pod-template-hash: "2102238616"
release: grafana
name: grafana-654667db5b-tnrlq
namespace: monitoring
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: grafana-654667db5b
uid: xxxx-xxxxx-xxxxxxxx-xxxxxxxx
resourceVersion: "98843547"
selfLink: /api/v1/namespaces/monitoring/pods/grafana-654667db5b-tnrlq
uid: xxxx-xxxxx-xxxxxxxx-xxxxxxxx
spec:
containers:
- env:
- name: GF_SECURITY_ADMIN_USER
valueFrom:
secretKeyRef:
key: xxxx
name: grafana
- name: GF_SECURITY_ADMIN_PASSWORD
valueFrom:
secretKeyRef:
key: xxxx
name: grafana
- name: GF_INSTALL_PLUGINS
valueFrom:
configMapKeyRef:
key: grafana-install-plugins
name: grafana-config
image: grafana/grafana:5.0.4
imagePullPolicy: Always
name: grafana
ports:
- containerPort: 3000
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /api/health
port: 3000
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
resources:
requests:
cpu: 200m
memory: 100Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/grafana
name: config-volume
- mountPath: /var/lib/grafana/dashboards
name: dashboard-volume
- mountPath: /var/lib/grafana
name: storage-volume
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-tqb6j
readOnly: true
dnsPolicy: ClusterFirst
initContainers:
- command:
- sh
- -c
- cp /tmp/config-volume-configmap/* /tmp/config-volume 2>/dev/null || true; cp
/tmp/dashboard-volume-configmap/* /tmp/dashboard-volume 2>/dev/null || true
image: busybox
imagePullPolicy: Always
name: copy-configs
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /tmp/config-volume-configmap
name: config-volume-configmap
- mountPath: /tmp/dashboard-volume-configmap
name: dashboard-volume-configmap
- mountPath: /tmp/config-volume
name: config-volume
- mountPath: /tmp/dashboard-volume
name: dashboard-volume
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-tqb6j
readOnly: true
nodeSelector:
nodePool: cluster
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 300
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- emptyDir: {}
name: config-volume
- emptyDir: {}
name: dashboard-volume
- configMap:
defaultMode: 420
name: grafana-config
name: config-volume-configmap
- configMap:
defaultMode: 420
name: grafana-dashs
name: dashboard-volume-configmap
- name: storage-volume
persistentVolumeClaim:
claimName: grafana
- name: default-token-tqb6j
secret:
defaultMode: 420
secretName: default-token-tqb6j
status:
conditions:
- lastProbeTime: 2020-02-11T10:45:37Z
lastTransitionTime: 2020-02-11T10:15:15Z
message: '0/6 nodes are available: 6 node(s) didn''t match node selector.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
Do you know how I should debug this further?
Solution: You can do one of two things to allow the scheduler to fulfil your pod creation request.
You can choose to remove these lines from your pod YAML and start your pod creation again from scratch (if you need the selector for a reason, go for the approach in option 2 below):
nodeSelector:
nodePool: cluster
or
You can ensure that the nodePool: cluster label is added to your nodes so the pod can be scheduled using the existing selector.
You can use this command to label a node:
kubectl label nodes <your node name> nodePool=cluster
Run the above command for each node (or only for the nodes you want this label to select), replacing the node name with the one from your cluster details.
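To verify afterwards, you can print that label as a column in the node listing (-L simply adds the label's value as an extra column):
kubectl get nodes -L nodePool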
Your pod probably uses a node selector which cannot be fulfilled by the scheduler. Check the pod description for something like this:
apiVersion: v1
kind: Pod
metadata:
name: nginx
labels:
env: test
spec:
...
nodeSelector:
disktype: ssd
And check whether your nodes are labeled accordingly.
The simplest option would be to use "nodeName" in the Pod yaml.
First, get the node where you want to run the Pod:
kubectl get nodes
Use the attribute below inside the Pod definition (YAML) so that the Pod is forced to run only on the specified node.
nodeName: seliiuvd05714

How can I see my failing jobs with Kubernetes

I have an issue with a Job in Kubernetes. When I want to debug it, I do:
kubectl describe job -n influx pamela-1578898800
Name: pamela-1578898800
Namespace: influx
Selector: controller-uid=xxx
Labels: controller-uid=xxx
job-name=pamela-1578898800
Annotations: <none>
Controlled By: CronJob/pamela
Parallelism: 1
Completions: 1
Start Time: Mon, 13 Jan 2020 08:00:04 +0100
Pods Statuses: 0 Running / 0 Succeeded / 5 Failed
Pod Template:
Labels: controller-uid=53110b24-35d2-11ea-bca1-06ecc706e86a
job-name=pamela-1578898800
Containers:
pamela:
Image: registry.gitlab.com/xxx/pamela:latest
Port: <none>
Host Port: <none>
Limits:
cpu: 800m
memory: 1000Mi
Requests:
cpu: 800m
memory: 1000Mi
Environment Variables from:
pamela-env Secret Optional: false
Environment: <none>
Mounts:
/config from pamela-keys (rw)
/log from pamela-claim (rw,path="log")
/raw from pamela-claim (rw,path="raw")
Volumes:
pamela-claim:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pamela-claim
ReadOnly: false
pamela-keys:
Type: Secret (a volume populated by a Secret)
SecretName: pamela-keys
Optional: false
Events: <none>
Here you can see 5 failed pods, but I don't know how to see the logs of those failing pods.
When I do:
kubectl get po -A
I have no "pamela-xxx" pods.
How can I see what the issue is?
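For reference, the commands I know of for finding a Job's pods are below (Job pods carry a job-name label, and kubectl logs accepts a job/ prefix), but they only help while the failed pods still exist:
kubectl get pods -n influx -l job-name=pamela-1578898800
kubectl logs -n influx job/pamela-1578898800
kubectl get events -n influx --sort-by=.lastTimestamp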
EDIT:
Here are the manifests I use to trigger the job.
# job.yaml - THIS ONE WORKS, LAUNCHING MANUALLY
apiVersion: batch/v1
kind: Job
metadata:
name: pamela-singlerun
namespace: influx
spec:
template:
spec:
containers:
- image: registry.gitlab.com/xxx/pamela:latest
envFrom:
- secretRef:
name: pamela-env
name: pamela
volumeMounts:
- mountPath: /raw
name: pamela-claim
subPath: raw
- mountPath: /log
name: pamela-claim
subPath: log
- mountPath: /config
name: pamela-keys
restartPolicy: Never
volumes:
- name: pamela-claim
persistentVolumeClaim:
claimName: pamela-claim
- name: pamela-keys
secret:
secretName: pamela-keys
items:
- key: keys.yml
path: keys.yml
nodeSelector:
kops.k8s.io/instancegroup: pamela-nodes
imagePullSecrets:
- name: gitlab-registry
And cronjob.yml, THIS ONE DOESN'T WORK
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: pamela
namespace: influx
spec:
schedule: "0 7,19 * * *"
concurrencyPolicy: Replace
jobTemplate:
spec:
template:
spec:
containers:
- image: registry.gitlab.com/xxx/pamela:latest
envFrom:
- secretRef:
name: pamela-env
name: pamela
resources:
limits:
cpu: 800m
memory: 1000Mi
requests:
cpu: 800m
memory: 1000Mi
volumeMounts:
- mountPath: /raw
name: pamela-claim
subPath: raw
- mountPath: /log
name: pamela-claim
subPath: log
- mountPath: /config
name: pamela-keys
restartPolicy: Never
volumes:
- name: pamela-claim
persistentVolumeClaim:
claimName: pamela-claim
- name: pamela-keys
secret:
secretName: pamela-keys
items:
- key: keys.yml
path: keys.yml
nodeSelector:
kops.k8s.io/instancegroup: pamela-nodes
imagePullSecrets:
- name: gitlab-registry
EDIT 2: After running the cron every 10 minutes, I can see my jobs and I get the expected results (meaning it works):
pamela-1578992400-ppgtx 0/1 Completed 0 21m
pamela-1578993000-kn8nd 0/1 Completed 0 11m
But right after this, when trying to get the logs, I get:
Error from server (NotFound): pods "pamela-1578992400-ppgtx" not found
which means the TTL must be about 10 minutes. When trying to increase the TTL, I get a feature-gates-disabled issue; I'm checking how to fix it.
It is weird; after setting the cron job to every 10 minutes, I get:
➜ kubectl get jobs -n influx
NAME COMPLETIONS DURATION AGE
pamela-1578898800 0/1 32h 32h
pamela-1579007400 1/1 99s 159m
pamela-1579011000 1/1 97s 99m
pamela-1579014600 1/1 108s 39m
I use schedule: "10 * * * *".
I don't understand what's going on here...
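As a side note on the TTL part: the field I was trying to set is ttlSecondsAfterFinished on the Job spec (under jobTemplate.spec in a CronJob); on older clusters it sits behind the TTLAfterFinished feature gate, which matches the feature-gates error above. A trimmed sketch with an illustrative value:
spec:
  schedule: "10 * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 86400  # the TTL controller deletes the finished Job and its pods this many seconds after completion
      template:
        spec:
          containers:
          - image: registry.gitlab.com/xxx/pamela:latest
            name: pamela
          restartPolicy: Never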

Kubernetes deployment does not perform a rolling update when using a single replica

I modified the deployment config (production.yaml), changing the container image value.
I then ran this: kubectl replace -f production.yaml.
While this occurred, my service did not appear to be responding. In addition:
kubectl get pods:
wordpress-2105335096-dkrvg 3/3 Running 0 47s
a while later... :
wordpress-2992233824-l4287 3/3 Running 0 14s
a while later... :
wordpress-2992233824-l4287 0/3 ContainerCreating 0 7s
It seems it has terminated the previous pod before the new pod is Running... Why?
production.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: wordpress
labels:
app: wordpress
spec:
replicas: 1
selector:
matchLabels:
app: wordpress
template:
metadata:
labels:
app: wordpress
spec:
terminationGracePeriodSeconds: 30
containers:
- image: eu.gcr.io/abcxyz/wordpress:deploy-1502463532
name: wordpress
imagePullPolicy: "Always"
env:
- name: WORDPRESS_HOST
value: localhost
- name: WORDPRESS_DB_USERNAME
valueFrom:
secretKeyRef:
name: cloudsql-db-credentials
key: username
volumeMounts:
- name: wordpress-persistent-storage
mountPath: /var/www/html
- image: eu.gcr.io/abcxyz/nginx:deploy-1502463532
name: nginx
imagePullPolicy: "Always"
ports:
- containerPort: 80
name: nginx
volumeMounts:
- name: wordpress-persistent-storage
mountPath: /var/www/html
- image: gcr.io/cloudsql-docker/gce-proxy:1.09
name: cloudsql-proxy
command: ["/cloud_sql_proxy", "--dir=/cloudsql",
"-instances=abcxyz:europe-west1:wordpressdb2=tcp:3306",
"-credential_file=/secrets/cloudsql/credentials.json"]
volumeMounts:
- name: cloudsql-instance-credentials
mountPath: /secrets/cloudsql
readOnly: true
- name: ssl-certs
mountPath: /etc/ssl/certs
- name: cloudsql
mountPath: /cloudsql
volumes:
- name: wordpress-persistent-storage
gcePersistentDisk:
pdName: wordpress-disk
fsType: ext4
- name: cloudsql-instance-credentials
secret:
secretName: cloudsql-instance-credentials
- name: ssl-certs
hostPath:
path: /etc/ssl/certs
- name: cloudsql
emptyDir:
I believe this behaviour is correct according to the Kubernetes documentation. Assuming you specify n replicas for a deployment, Kubernetes will take the following steps when updating the deployment:
Terminate old pods, while ensuring that at least n - 1 total pods are up
Create new pods until a maximum of n + 1 total pods are up
As soon as new pods are up, go back to step 1 until n new pods are up
In your case n = 1, which means that in the first step, all old pods will be terminated.
See Updating a Deployment for more information:
Deployment can ensure that only a certain number of Pods may be down while they are being updated. By default, it ensures that at least 1 less than the desired number of Pods are up (1 max unavailable).
Deployment can also ensure that only a certain number of Pods may be created above the desired number of Pods. By default, it ensures that at most 1 more than the desired number of Pods are up (1 max surge).
In a future version of Kubernetes, the defaults will change from 1-1 to 25%-25%.
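If you want a single-replica Deployment to keep the old pod until the new one is Ready, a common approach is to pin the rolling-update parameters explicitly. A minimal sketch (maxUnavailable and maxSurge are standard Deployment strategy fields; the values here are illustrative, and the containers also need readiness probes for "Ready" to be meaningful):
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # never take the old pod down before its replacement is Ready
      maxSurge: 1        # allow one extra pod during the rollout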