Kubernetes ClusterIP service initial delay or liveness - kubernetes

I have a Kubernetes deployment on GCP and a ClusterIP service to discover the pods in this deployment. The deployment contains multiple replica pods which come and go based on our Horizontal Pod Autoscaler configuration (based on CPU utilization).
Now, when a new replica pod is created, it takes some time for the application to start serving. But the ClusterIP service starts distributing requests to the new pod before the application is ready, which causes those requests to fail.
ClusterIP service yaml:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: service-name
    tier: backend
    environment: "dev"
    creator: internal
  name: service-name
spec:
  clusterIP: None
  ports:
  - name: https
    protocol: TCP
    port: 7070
    targetPort: 7070
  selector:
    app: dep-name
    tier: "backend"
    environment: "dev"
    creator: "ME"
  type: ClusterIP
How can the ClusterIP be told to start distributing requests to the new pod after the application starts? Can there be any initial delay or liveness probe set for this purpose?

Kubernetes provides readiness probes for this. With readiness probes, Kubernetes will not send traffic to a pod until the probe succeeds. When updating a deployment, it will also leave old replica(s) running until the probes succeed on the new replica. That means that if your new pods are broken in some way, they will never see traffic; your old pods will continue to serve all traffic for the deployment.
You need to update the deployment file with a readiness probe like the following:
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
If your application exposes an HTTP health endpoint, you can configure the readiness probe in HTTP mode as well, as in the sketch below.
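A minimal sketch, assuming the application serves a health endpoint at /healthz on container port 7070 (both the path and the port are placeholders to adjust to your application):
readinessProbe:
  httpGet:
    path: /healthz
    port: 7070
  initialDelaySeconds: 5
  periodSeconds: 5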
For more information on how to use readiness probes, refer to:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes

You should add a readiness probe as defined in the documentation at
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes.
As described there, you can tune it using initialDelaySeconds and periodSeconds.
Your current behavior is probably because the service load balancer only sees that all the containers in the pod have started, not that the application is ready. You can define your readiness checks like the example below, taken from the documentation.
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

Related

Re-route traffic in kubernetes to a working pod

Not sure if there was already such a question, so pardon me if I couldn't find one.
I have a cluster of 3 nodes; my application consists of a frontend and a backend, each running 2 replicas:
front1 - running on node1
front2 - running on node2
be1 - node1
be2 - node2
Both FE pods are served behind frontend-service
Both BE pods are served behind be-service
When I shut down node-2, the application stopped and I could see application errors in my UI.
I checked the logs and found out that my application attempted to reach the backend pods through their service and failed, since be2 wasn't running and the scheduler had not yet terminated it.
Only when the node was terminated and removed from the cluster were the pods rescheduled to the 3rd node and the application was back online.
I know a service mesh could help by removing unresponsive pods from the traffic, but I don't want to implement it yet. I am trying to understand the best way to route traffic to the healthy pods in a fast and easy way; 5 minutes of downtime is a lot of time.
Here's my be deployment spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: backend
  name: backend
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: backend
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: backend
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-Application
                operator: In
                values:
                - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - backend
            topologyKey: kubernetes.io/hostname
      containers:
      - env:
        - name: SSL_ENABLED
          value: "false"
        image: quay.io/something:latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /liveness
            port: 16006
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 10
        name: backend
        ports:
        - containerPort: 16006
          protocol: TCP
        - containerPort: 8457
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 16006
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 1500m
            memory: 8500Mi
          requests:
            cpu: 6m
            memory: 120Mi
      dnsPolicy: ClusterFirst
Here's my backend service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: identity
  name: backend
  namespace: default
spec:
  clusterIP: 10.233.34.115
  ports:
  - name: tcp
    port: 16006
    protocol: TCP
    targetPort: 16006
  - name: internal-http-rpc
    port: 8457
    protocol: TCP
    targetPort: 8457
  selector:
    app: backend
  sessionAffinity: None
  type: ClusterIP
This is a community wiki answer. Feel free to expand it.
As already mentioned by @TomerLeibovich, the main issue here was the probe configuration:
Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks:
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup probes. Minimum value is 1.
failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of a liveness probe means restarting the container. In case of a readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
Plus the proper Pod eviction configuration:
The kubelet needs to preserve node stability when available compute resources are low. This is especially important when dealing with incompressible compute resources, such as memory or disk space. If such resources are exhausted, nodes become unstable.
Changing the failureThreshold to 1 instead of 3 and reducing the pod eviction timeout solved the issue, as the Pod is now marked unhealthy and evicted sooner.
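As a sketch, the adjusted readiness probe from the deployment above would look like this (only failureThreshold changes; the remaining values are the ones from the original spec):
readinessProbe:
  failureThreshold: 1
  httpGet:
    path: /readiness
    port: 16006
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5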
EDIT:
The other possible solution in this scenario is to label other nodes for the backend app to make sure that each backend pod can be deployed on a different node. In your current situation, the one pod deployed on the failed node was removed from the endpoints and the application became unresponsive.
Also, the workaround for triggering pod eviction from the unhealthy node sooner is to add tolerations to deployment.spec.template.spec:
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60
instead of using the default value of tolerationSeconds: 300.
You can find more information in this documentation.

Pod does not communicate with other pod through service

I have 2 pods: a server pod and a client pod (basically the client hits port 8090 to interact with the server). I have created a service (which in turn creates an endpoint) but the client pod cannot reach that endpoint and therefore it crashes:
Error :Error in client :rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp :8090: connect: connection refused")
The client pod tries to access port 8090 on its host network. What I am hoping for is that whenever the client hits port 8090 through the service, it connects to the server.
I just cannot understand how I would connect these 2 pods and therefore require help.
server pod:
apiVersion: v1
kind: Pod
metadata:
  name: server-pod
  labels:
    app: grpc-app
spec:
  containers:
  - name: server-pod
    image: image
    ports:
    - containerPort: 8090
client pod:
apiVersion: v1
kind: Pod
metadata:
  name: client-pod
  labels:
    app: grpc-app
spec:
  hostNetwork: true
  containers:
  - name: client-pod
    image: image
Service:
apiVersion: v1
kind: Service
metadata:
  name: server
  labels:
    app: grpc-app
spec:
  type: ClusterIP
  ports:
  - port: 8090
    targetPort: 8090
    protocol: TCP
  selector:
    app: grpc-app
Your service is selecting both the client and the server. You should change the labels so that the server has something like app: grpc-server and the client has app: grpc-client. The service selector should then be app: grpc-server so that it exposes only the server pod. In your client app, connect to server:8090. You should also remove hostNetwork: true. A sketch of these changes is shown below.
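A minimal sketch of the relabelled server pod and service, assuming the same image and port as above (the client pod would get app: grpc-client instead and lose hostNetwork: true):
apiVersion: v1
kind: Pod
metadata:
  name: server-pod
  labels:
    app: grpc-server
spec:
  containers:
  - name: server-pod
    image: image
    ports:
    - containerPort: 8090
---
apiVersion: v1
kind: Service
metadata:
  name: server
spec:
  type: ClusterIP
  ports:
  - port: 8090
    targetPort: 8090
    protocol: TCP
  selector:
    app: grpc-server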
One thing that I feel is going wrong is that the server is not yet ready to accept connections while your client is already trying to reach it, hence the connection refused. I faced a similar problem a few days back. What I did was add readiness and liveness probes in the YAML config file. Kubernetes provides liveness and readiness probes that are used to check the health of your containers. These probes can check certain files in your containers, check a TCP socket, or make HTTP requests.
A sample looks like this:
spec:
  containers:
  - name: imagename
    image: image
    ports:
    - containerPort: 19500
      name: http
    readinessProbe:
      httpGet:
        path: /health
        port: http
      initialDelaySeconds: 120
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /health
        port: http
        scheme: HTTP
      initialDelaySeconds: 120
      timeoutSeconds: 5
So it will check whether your application is ready to accept connections before redirecting traffic to it.

Kubernetes health check outside container

Can I do a liveness or readiness kind of health check from outside the container? I mean, can I stop traffic to pods and restart containers in case the application is not accessible?
The HTTP request liveness probe and the TCP liveness probe can be used to check whether the application running inside your container is reachable from outside the container:
pods/probe/http-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
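A TCP variant is also possible; as a minimal sketch, assuming the application listens on port 8080:
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20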
See this piece of documentation on configuring probes. Does that answer your question?

kubernetes connection refused during deployment

I'm trying to achieve a zero-downtime deployment using Kubernetes, and during my tests the service doesn't load balance well.
My kubernetes manifest is:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
        version: "0.2"
    spec:
      containers:
      - name: myapp-container
        image: gcr.io/google-samples/hello-app:1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
  labels:
    app: myapp
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: myapp
If I loop over the service with the external IP, let's say:
$ kubectl get services
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
kubernetes   ClusterIP      10.35.240.1    <none>           443/TCP        1h
myapp-lb     LoadBalancer   10.35.252.91   35.205.100.174   80:30549/TCP   22m
using the bash script:
while true
do
  curl 35.205.100.174
  sleep 0.2s
done
I receive some connection refused during the deployment:
curl: (7) Failed to connect to 35.205.100.174 port 80: Connection refused
The application is the default helloapp provided by Google Cloud Platform and running on 8080.
Cluster information:
Kubernetes version: 1.8.8
Google cloud platform
Machine type: g1-small
It looks like your requests sometimes go to a pod that has not started yet. I avoided this by adding a few parameters:
a liveness probe, to be sure the app has already started
maxUnavailable: 1, to deploy pods one by one
I still have some errors, but they are acceptable because they rarely happen. During a deployment an error may occur once or twice, so with increasing load you will have a negligible amount of errors. I mean one or two errors per 2000 requests during the deployment.
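A rough sketch of those changes applied to the deployment above (the liveness probe path and timings here are illustrative, not taken from the answer):
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      containers:
      - name: myapp-container
        image: gcr.io/google-samples/hello-app:1.0
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20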

Kubernetes RC wait until pod is ready before scaling down

I have a Ruby on Rails app on Kubernetes.
Here's what I do:
kubectl rolling-update new_file
Kubernetes begins to create new pods.
When the new pods are ready, Kubernetes kills the old pod.
However, although my new pods are in the Ready state, they are actually still building/compressing the Rails assets. They aren't really ready yet. How can I let Kubernetes know that they're not ready yet?
This sounds like a prime example for a readiness probe: it tells Kubernetes not to take a pod into load balancing until a certain condition holds, often an HTTP endpoint that responds positively. Here's an example probe defined along with a Deployment specification:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /index.html
            port: 80
          initialDelaySeconds: 30
          timeoutSeconds: 1
See the user guide for a starting point and the follow-up links it contains.