How to enable startup probe on GKE 1.16? - kubernetes

I created a deployment with liveness and readiness probes and an initial delay, which works fine. If I want to replace the initial delay with a startup probe, the startupProbe key and its nested elements are never included in the deployment descriptor when created with kubectl apply, and they get deleted from the deployment YAML in the GKE deployment editor after saving.
An example:
apiVersion: v1
kind: Namespace
metadata:
  name: "test"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-sleep
  namespace: test
spec:
  selector:
    matchLabels:
      app: postgres-sleep
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 50%
  template:
    metadata:
      labels:
        app: postgres-sleep
    spec:
      containers:
        - name: postgres-sleep
          image: krichter/microk8s-startup-probe-ignored:latest
          ports:
            - name: postgres
              containerPort: 5432
          readinessProbe:
            tcpSocket:
              port: 5432
            periodSeconds: 3
          livenessProbe:
            tcpSocket:
              port: 5432
            periodSeconds: 3
          startupProbe:
            tcpSocket:
              port: 5432
            failureThreshold: 60
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-sleep
  namespace: test
spec:
  selector:
    app: postgres-sleep
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
---
with krichter/microk8s-startup-probe-ignored:latest being
FROM postgres:11
CMD sleep 30 && postgres
I'm reusing this example from the same issue with microk8s, where I could solve it by changing the kubelet and kube-apiserver configuration files (see https://github.com/ubuntu/microk8s/issues/770 in case you're interested). I assume this is not possible with GKE clusters as they don't expose these files, probably for good reasons.
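For reference, the microk8s workaround boiled down to enabling the StartupProbe feature gate on both components and restarting; roughly like this (a sketch from memory of that issue, with the args file paths being microk8s's snap defaults):
# Enable the StartupProbe feature gate (alpha in 1.16/1.17) for kubelet and kube-apiserver
echo "--feature-gates=StartupProbe=true" >> /var/snap/microk8s/current/args/kubelet
echo "--feature-gates=StartupProbe=true" >> /var/snap/microk8s/current/args/kube-apiserver
microk8s.stop && microk8s.start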
I assume that the feature needs to be enabled since it's behind a feature gate. How can I enable it on Google Kubernetes Engine (GKE) clusters with version >= 1.16? Currently I'm using the default from the regular channel, 1.16.8-gke.15.

As I mentioned in my comments, I was able to reproduce the same behavior in my test environment, and after some research I found the reason.
In GKE, feature gates are only permitted if you are using an Alpha cluster. You can see the complete list of feature gates here.
I've created an Alpha cluster and applied the same YAML, and it works for me: the startupProbe is there in place.
So you will only be able to use startupProbe in GKE Alpha clusters; follow this documentation to create a new one.
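For example, a command along these lines creates one (a sketch; the cluster name, zone and version are placeholders, and Alpha clusters require node auto-upgrade and auto-repair to be disabled):
# Hypothetical example: create a GKE Alpha cluster on a 1.16 version
gcloud container clusters create startup-probe-test \
  --zone us-central1-a \
  --cluster-version 1.16 \
  --enable-kubernetes-alpha \
  --no-enable-autoupgrade \
  --no-enable-autorepair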
Be aware of the limitations in alpha clusters:
Alpha clusters have the following limitations:
Not covered by the GKE SLA
Cannot be upgraded
Node auto-upgrade and auto-repair are disabled on alpha clusters
Automatically deleted after 30 days
Do not receive security updates
Also, Google doesn't recommend using them for production workloads:
Warning: Do not use Alpha clusters or alpha features for production workloads. Alpha clusters expire after thirty days and do not receive security updates. You must migrate your data from alpha clusters before they expire. GKE does not automatically save data stored on alpha clusters.

Related

Kubernetes service routes traffic to only one of 5 pods

I'm playing around with k8s services. I have created a simple Spring Boot app that displays its version number and pod name when curling an endpoint:
curl localhost:9000/version
1.3_car-registry-deployment-66684dd8c4-r274b
Then I dockerized it, pushed it into my local Kind cluster and deployed it with 5 replicas. Next I created a service targeting all 5 pods. Lastly, I exposed the service like so:
kubectl port-forward svc/car-registry-service 9000:9000
Now when curling my endpoint I expected to see randomly picked pod names, but instead I only get responses from a single pod. Moreover, if I kill that one pod then my service stops working, i.e. I'm getting ERR_EMPTY_RESPONSE, even though there are 4 more pods available. What am I missing? Here are my deployment and service YAMLs:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: car-registry-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: car-registry
  template:
    metadata:
      name: car-registry
      labels:
        app: car-registry
    spec:
      containers:
        - name: car-registry
          image: car-registry-database:v1.3
          ports:
            - containerPort: 9000
              protocol: TCP
              name: rest
          readinessProbe:
            exec:
              command:
                - sh
                - -c
                - curl http://localhost:9000/healthz | grep "OK"
            initialDelaySeconds: 15
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: car-registry-service
spec:
  type: ClusterIP
  selector:
    app: car-registry
  ports:
    - protocol: TCP
      port: 9000
      targetPort: 9000
You’re using TCP, so you’re probably using keep-alive. Try to hit it with your browser or a new tty.
Try:
curl -H "Connection: close" http://your-service:port/path
Otherwise, check the kube-proxy logs to see if there's any additional info. Your initial question doesn't provide much detail.
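If it helps, a quick way to check the distribution is a loop that forces a fresh connection per request, run from a pod inside the cluster where car-registry-service resolves (a sketch; the throwaway curl pod is just one way to get a shell in the cluster):
# e.g. kubectl run curl --rm -it --image=curlimages/curl -- sh
# Send 20 requests, one connection each, and count how often each pod answers.
for i in $(seq 1 20); do
  curl -s -H "Connection: close" http://car-registry-service:9000/version
  echo
done | sort | uniq -c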

GKE sticky connections make autoscaling ineffective because of limited pod ports (API to database)

I have an API to which I send requests, and this API connects to MongoDB through MongoClient in PyMongo. Here is a scheme of my system that I deployed in GKE:
The major part of the calculations needed for each request are made in the MongoDB, so I want the MongoDB pods to be autoscaled based on CPU usage. Thus I have an HPA for the MongoDB deployment, with minReplicas: 1.
When I send many requests to the Nginx Ingress, I see that my only MongoDB pod has 100% CPU usage, so the HPA creates a second pod. But this second pod isn't used.
After looking in the logs of my first MongoDB pod, I see that all the requests have this:
"remote":"${The_endpoint_of_my_API_Pod}:${PORT}", and the PORT only takes 12 different values (I counted them; they started repeating, so I guessed that there aren't others).
So my guess is that the second pod isn't used because of sticky connections, as suggested in this answer https://stackoverflow.com/a/73028316/19501779 to one of my previous questions, where there is more detail on my MongoDB deployment.
I have 2 questions:
Is the second pod not used in fact because of sticky connections between my API Pod and my first MongoDB Pod?
If this is the case, how can I overcome this issue to make the autoscaling effective?
Thanks, and if you need more info please ask me.
EDIT
Here is my MongoDB configuration:
Its Dockerfile, from which I build my MongoDB image on the VM where my original MongoDB is. A single deployment of this image works in k8s.
FROM mongo:latest
EXPOSE 27017
COPY /mdb/ /data/db
The deployment.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongodb
  labels:
    app: mongodb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongodb
          image: $My_MongoDB_image
          ports:
            - containerPort: 27017
          resources:
            requests:
              memory: "1000Mi"
              cpu: "1000m"
      imagePullSecrets: # for pulling from my Docker Hub
        - name: regcred
and the service.yml and hpa.yml:
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
  labels:
    app: mongodb
spec:
  selector:
    app: mongodb
  ports:
    - protocol: TCP
      port: 27017
      targetPort: 27017
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: mongodb-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mongodb
  minReplicas: 1
  maxReplicas: 70
  targetCPUUtilizationPercentage: 85
And I access this service from my API Pod with PyMongo:
from pymongo import MongoClient

def get_db(database: str):
    client = MongoClient(host="$Cluster_IP_of_{mongodb-service}",
                         port=27017,
                         username="...",
                         password="...",
                         authSource="admin")
    return client.get_database(database)
And moreover, when a second MongoDB Pod is created thanks to autoscaling, its endpoint appears in my mongodb-service:
the HPA created a second Pod
the new Pod's endpoint appears in the mongodb-service
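(One way to confirm this from the command line, as a sketch, is to list the endpoints registered behind the service:)
# Each MongoDB pod created by the HPA should show up as an extra <podIP>:27017 entry here.
kubectl get endpoints mongodb-service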

How to scale Websocket Connections with Azure Application Gateway and AKS

We want to dynamically scale our AKS Cluster based on the number of Websocket connections.
We use Application Gateway V2 along with Application Gateway Ingress Controller on AKS as Ingress.
I configured HorizontalPodAutoscaler to scale the deployment based on the consumed memory.
When I deploy the sample app to AKS I can connect to the websocket endpoints and communicate.
However, when any scale operation happens (pods added or removed) I see connection losses on all the clients.
How can I keep the existing connections when pods are added?
How can I gracefully drain connections when pods are removed so existing clients are not affected?
I tried activating cookie based affinity on application gateway but this had no effect on the issue.
Below is the deployment I use for testing. It is based on this sample and modified a bit so that it allows specifying the number of connections and regularly sends ping messages to the server.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wssample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wssample
  template:
    metadata:
      labels:
        app: wssample
    spec:
      containers:
        - name: wssamplecontainer
          image: marxx/websocketssample:10
          resources:
            requests:
              memory: "100Mi"
              cpu: "50m"
            limits:
              memory: "150Mi"
              cpu: "100m"
          ports:
            - containerPort: 80
              name: wssample
---
apiVersion: v1
kind: Service
metadata:
  name: wssample-service
spec:
  ports:
    - port: 80
  selector:
    app: wssample
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-ingress
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    appgw.ingress.kubernetes.io/cookie-based-affinity: "true"
    appgw.ingress.kubernetes.io/connection-draining: "true"
    appgw.ingress.kubernetes.io/connection-draining-timeout: "60"
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: wssample-service
                port:
                  number: 80
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: websocket-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wssample
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50
Update:
I am running on a 2-node cluster with autoscaler activated to scale up to 4 nodes.
There is still plenty of memory available on the nodes
At first I thought it was an issue with browsers and JavaScript, but I got the same results when I connected to the endpoint via a .NET Core based console application (the WebSockets went to state 'Aborted' after the scale operation).
Update 2:
I found a pattern. The problem occurs also without HPA and can be reproduced using the following steps:
Scale Deployment to 3 Replicas
Connect 20 Clients
Manually Scale Deployment to 6 Replicas with kubectl scale command
(existing connections are still fine and clients communicate with backend)
Connect another 20 Clients
After a few seconds all the existing connections are reset
Update 3:
The AKS cluster is using kubenet networking
Same issue with Azure CNI networking though
I made a very unpleasant discovery. The outcome of this GitHub issue basically says that the behavior is by design and AGW resets all websocket connections when any backend pool rules change (which happens during scale operations).
It's possible to vote for a feature to keep those connections in those situations.

Hostname of pods in same statefulset can not be resolved

I am configuring a StatefulSet deploying 2 Jira Data Center nodes. The StatefulSet results in 2 pods. Everything seems fine until the 2 pods try to connect to each other. They do this with their short hostnames, jira-0 and jira-1.
The jira-1 pod reports UnknownHostException when connecting to jira-0. The hostname cannot be resolved.
I read about adding a headless service, which I didn't have yet. After adding it I can resolve the FQDN, but still no luck for the short name.
Then I read this page: DNS for Services and Pods and added:
dnsConfig:
  searches:
    - jira.default.svc.cluster.local
That solves my issue but I think it shouldn't be necessary to add this?
Some extra info:
Cluster on AKS with CoreDNS
Kubernetes v1.19.9
Network plugin: Kubenet
Network policy: none
My full yaml file:
apiVersion: v1
kind: Service
metadata:
  name: jira
  labels:
    app: jira
spec:
  clusterIP: None
  selector:
    app: jira
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jira
spec:
  serviceName: jira
  replicas: 2
  selector:
    matchLabels:
      app: jira
  template:
    metadata:
      labels:
        app: jira
    spec:
      containers:
        - name: jira
          image: atlassian/jira-software:8.12.2-jdk11
          readinessProbe:
            httpGet:
              path: /jira/status
              port: 8080
            initialDelaySeconds: 120
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /jira/
              port: 8080
            initialDelaySeconds: 600
            periodSeconds: 10
          envFrom:
            - configMapRef:
                name: jira-config
          ports:
            - containerPort: 8080
      dnsConfig:
        searches:
          - jira.default.svc.cluster.local
That solves my issue but I think it shouldn't be necessary to add this?
From the StatefulSet documentation:
StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.
The example above will create three Pods named web-0,web-1,web-2. A StatefulSet can use a Headless Service to control the domain of its Pods.
The pod identity will be a subdomain of the governing service, e.g. in your case:
jira-0.jira.default.svc.cluster.local
jira-1.jira.default.svc.cluster.local
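A quick way to verify this from inside one of the pods (a sketch; it assumes getent is available in the Jira image and that the default namespace from the manifest above is used):
# The FQDN should resolve as soon as the headless service exists:
kubectl exec -it jira-1 -- getent hosts jira-0.jira.default.svc.cluster.local

# The short name jira-0 only resolves if the pod's DNS search list includes
# jira.default.svc.cluster.local, which is exactly what the dnsConfig entry adds:
kubectl exec -it jira-1 -- getent hosts jira-0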

Kubernetes RC wait until pod is ready before scaling down

I have a Ruby on Rails app on Kubernetes.
Here's what I do
kubernetes rolling-update new_file
Kubernetes began to create new pods
When the new pods are ready, Kubernetes kills the old pod.
However, although my new pods are in the ready state, they are actually still building/compressing Rails assets. They aren't really ready yet. How can I let Kubernetes know that they're not ready yet?
This sounds like a prime example for a readiness probe: it tells Kubernetes not to take a pod into load balancing until a certain condition holds, often an HTTP endpoint that responds positively. Here's an example probe defined as part of a Deployment specification:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /index.html
              port: 80
            initialDelaySeconds: 30
            timeoutSeconds: 1
See the user guide for a starter and the follow-up links it contains.
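For the Rails case specifically, one option is an exec probe that only succeeds once asset compilation has finished. This is a sketch, not part of the original answer; the marker file path and the convention that the container's entrypoint touches it after rake assets:precompile are assumptions:
readinessProbe:
  exec:
    command:
      - sh
      - -c
      # assumed convention: the entrypoint touches this file once
      # asset precompilation has completed
      - test -f /tmp/assets_ready
  initialDelaySeconds: 10
  periodSeconds: 5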