Delay kubernetes pod creation for zero downtime - kubernetes

I am trying to implement a rolling update of Deployments in Kubernetes. I have followed a lot of articles that say there will be zero downtime, but when I run curl continuously, a couple of my requests fail before getting a response back. Below is the deployment file.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp-container
          image: my-image
          imagePullPolicy: Always
          ports:
            - containerPort: 80
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
The next thing I did was add
minReadySeconds: 120
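For reference, minReadySeconds is a standard field at the top level of the Deployment spec, alongside strategy; a rough sketch of where it sits:

spec:
  minReadySeconds: 120   # a new pod must stay Ready this long before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1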
This takes care of the issue, but it is not an optimal solution, since we want to switch to the new pod as soon as it starts servicing requests and then kill the old pod. I have two questions:
1. Can there be a condition where both pods, the new and the old, are running and both servicing traffic? That would not be ideal either, since we want only one pod to service requests at a time.
2. Is there any other out-of-the-box solution that Kubernetes provides to do a rolling deployment?

Try this; it should work for you. Try doing an update of your image.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp-container
          image: nginx
          imagePullPolicy: Always
          ports:
            - containerPort: 80
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
For a better understanding, check this link.
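To see the behaviour during a rollout, one option is to trigger an image update and watch the Deployment while probing the endpoint from another terminal. This is only a sketch; the service address below is a placeholder for whatever exposes myapp in your cluster.

kubectl set image deployment/myapp-deployment myapp-container=nginx:1.25
kubectl rollout status deployment/myapp-deployment

# in a second terminal, log any request that does not return HTTP 200
while true; do
  code=$(curl -s -o /dev/null -w '%{http_code}' http://<service-address>/)
  [ "$code" != "200" ] && echo "$(date +%T) got $code"
  sleep 0.2
done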

Related

GKE "no.scale.down.node.pod.not.enough.pdb" log even with existing PDB

My GKE cluster is displaying a "Scale down blocked by pod" note, and clicking it and going to the Logs Explorer shows a filtered view with log entries for the pods that had the incident: no.scale.down.node.pod.not.enough.pdb. That is strange, because the pods in those log entries do have a PDB defined for them, so it seems to me that GKE is wrongly reporting the cause of the blocked node scale-down. These are the manifests for one of the pods with this issue:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ms-new-api-beta
  name: ms-new-api-beta
  namespace: beta
spec:
  ports:
    - port: 8000
      protocol: TCP
      targetPort: 8000
  selector:
    app: ms-new-api-beta
  type: NodePort
The Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ms-new-api-beta
  name: ms-new-api-beta
  namespace: beta
spec:
  selector:
    matchLabels:
      app: ms-new-api-beta
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
      labels:
        app: ms-new-api-beta
    spec:
      containers:
        - command:
            - /deploy/venv/bin/gunicorn
            - '--bind'
            - '0.0.0.0:8000'
            - 'newapi.app:app'
            - '--chdir'
            - /deploy/app
            - '--timeout'
            - '7200'
            - '--workers'
            - '1'
            - '--worker-class'
            - uvicorn.workers.UvicornWorker
            - '--log-level'
            - DEBUG
          env:
            - name: ENV
              value: BETA
          image: >-
            gcr.io/.../api:${trigger['tag']}
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /rest
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 120
            periodSeconds: 20
            timeoutSeconds: 30
          name: ms-new-api-beta
          ports:
            - containerPort: 8000
              name: http
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /rest
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 2
          resources:
            limits:
              cpu: 150m
            requests:
              cpu: 100m
          startupProbe:
            failureThreshold: 30
            httpGet:
              path: /rest
              port: 8000
            periodSeconds: 120
      imagePullSecrets:
        - name: gcp-docker-registry
The Horizontal Pod Autoscaler:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ms-new-api-beta
  namespace: beta
spec:
  maxReplicas: 5
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ms-new-api-beta
  targetCPUUtilizationPercentage: 100
And finally, the Pod Disruption Budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ms-new-api-beta
  namespace: beta
spec:
  minAvailable: 0
  selector:
    matchLabels:
      app: ms-new-api-beta
no.scale.down.node.pod.not.enough.pdb is not complaining about the lack of a PDB. It is complaining that, if the pod is scaled down, it will be in violation of the existing PDB(s).
The "budget" is how much disruption the Pod can permit. The platform will not take any intentional action which violates that budget.
There may be another PDB in place that would be violated. To check, make sure to review pdbs in the pod's namespace:
kubectl get pdb
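For example (namespace and names taken from the manifests above):

# list every budget in the pod's namespace, and across the cluster in case another one matches
kubectl get pdb -n beta
kubectl get pdb --all-namespaces

# show which pods this budget selects and how many disruptions are currently allowed
kubectl describe pdb ms-new-api-beta -n beta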

Readiness probe failing when second pod gets scheduled to the same node

I have a k8s Service which maps to a pod Deployment with 2 replicas and is exposed as a ClusterIP service. I am seeing an issue where, when the 2nd pod gets scheduled to the same node, the readiness probe (an HTTP call to an API on the container port) fails with an "unable to connect" error. Is this due to some port conflict?
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
  namespace: demo
  labels:
    app: demo
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/app-configmap.yaml") . | sha256sum }}
      labels:
        app: demo
    spec:
      containers:
        - name: demo
          image: demo-app-image:1.0.1
          ports:
            - containerPort: 8081
          livenessProbe:
            httpGet:
              path: /healthcheck
              port: 8081
            initialDelaySeconds: 30
            periodSeconds: 60
            failureThreshold: 3
            successThreshold: 1
            timeoutSeconds: 15
          readinessProbe:
            httpGet:
              path: /healthcheck
              port: 8081
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
            successThreshold: 1
            timeoutSeconds: 15
          volumeMounts:
            - name: config-volume
              mountPath: /config/app
      volumes:
        - name: config-volume
          configMap:
            name: demo-configmap
            items:
              - key: config
                path: config.json
      nodeSelector:
        usage: demo-server
Service
apiVersion: v1
kind: Service
metadata:
  name: demo-service
  namespace: demo
  labels:
    app: demo-service
spec:
  selector:
    app: demo
  ports:
    - name: admin-port
      protocol: TCP
      port: 26001
      targetPort: 8081
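A few standard commands can help narrow this down. Note that each pod gets its own IP, so two replicas listening on containerPort 8081 on the same node do not conflict unless hostPort or hostNetwork is used (neither appears in this manifest). The pod names below are placeholders.

kubectl get pods -n demo -o wide          # each replica should show a distinct pod IP
kubectl describe pod <pod-name> -n demo   # readiness failures show up under Events
# probe the failing pod directly from the healthy one (assumes curl exists in the image)
kubectl exec -n demo <healthy-pod> -- curl -sv http://<failing-pod-ip>:8081/healthcheck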


GKE not recognising jfrog docker image

I am using GKE and JFrog Artifactory. I am building an image with a tag like
cicd-docker-local.jfrog.io/stage_proj:50d3afd0
In Artifactory I can see the image at https://cicd.jfrog.io/cicd/webapp/, which is right. But GKE is not able to recognise the image, and it throws an error like
couldn't parse image reference "'cicd-docker-local.jfrog.io/stage_proj:50d3afd0'": invalid reference format: InvalidImageName
But my image exists. Is there any problem with my image name?
Deployment file portion:
containers:
  - image: "<IMAGE_NAME>"
In the YAML file:
- sed -i "s%<IMAGE_NAME>%'${STAGE_CONTAINER_IMAGE}'%g" deployment.yaml
STAGE_CONTAINER_IMAGE = cicd-docker-local.jfrog.io/stage_proj:50d3afd0
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: go
  name: hello-world-go
spec:
  progressDeadlineSeconds: 60
  replicas: 3
  selector:
    matchLabels:
      app: go
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 33%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: go
    spec:
      containers:
        - image: "<IMAGE_NAME>"
          # image: cicd-docker-local.jfrog.io/stage_proj: 50d3afd0
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 2
          name: go
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 2
If I use the sed command, I get the error in Kubernetes, but if I use cicd-docker-local.jfrog.io/stage_proj:50d3afd0 directly, there is no error. Am I doing the sed command wrongly?
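Judging by the error message, the parsed reference contains literal single quotes around the image name, which most likely come from the quotes added around ${STAGE_CONTAINER_IMAGE} in the sed replacement. A sketch of the substitution without them:

sed -i "s%<IMAGE_NAME>%${STAGE_CONTAINER_IMAGE}%g" deployment.yaml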

Kubernetes and readinessprobe

I have specified a deployment in Kubernetes:
apiVersion: v1
kind: List
items:
  - apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: quote-data
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 2
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          run: quote-data
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            run: quote-data
        spec:
          containers:
            - image: quote-data:2
              imagePullPolicy: Always
              name: quote-data
              readinessProbe:
                httpGet:
                  path: /api/data/health
                  port: 5000
                initialDelaySeconds: 5
                periodSeconds: 5
                failureThreshold: 1
                successThreshold: 1
              ports:
                - containerPort: 5000
                  protocol: TCP
Can we use the readiness probe to disable traffic to this pod (a few minutes after the start phase)?
In my test, the service (type: LoadBalancer) continues to send traffic to the bad pod even when the readiness probe fails. In the service, the pod's Ready status is false, but the service still sends requests to the wrong pod.
UPDATE:
It seems to work with curl but not with Postman.
With a simple curl command I hit different pods between two requests, but with Postman I always hit the same pod.
Yes, the traffic will be disabled if you put initialDelaySeconds: xxx on the readinessProbe. However, this does not work on all Kubernetes versions; it certainly does not on 1.15 and below.
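One way to confirm whether the not-ready pod has actually been taken out of rotation is to check the Service's endpoints; a failing readiness probe should remove the pod from that list. The Service name below is a placeholder for whatever fronts quote-data.

kubectl get pods -l run=quote-data -o wide   # READY column reflects the probe result
kubectl get endpoints <service-name>         # a not-ready pod should be absent here

If the pod is missing from the endpoints but a client still reaches it, the client may simply be reusing an existing TCP connection (Postman keeps connections alive by default), which would explain always hitting the same pod.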