Pod deletion causes errors when using NEG - kubernetes

For this example I am running the "echoheaders" nginx image in a deployment with 2 replicas. When I delete 1 pod, I sometimes get slow responses and errors for ~40 seconds.
We are running our API-gateway in Kubernetes, and need to be able to allow the Kubernetes scheduler to handle the pods as it sees fit.
We recently wanted to introduce session affinity, and for that we wanted to migrate to the new and shiny NEGs (Network Endpoint Groups):
https://cloud.google.com/load-balancing/docs/negs/
When using NEG we experience problems during failover. Without NEG we're fine.
deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: echoheaders
  labels:
    app: echoheaders
spec:
  replicas: 2
  selector:
    matchLabels:
      app: echoheaders
  template:
    metadata:
      labels:
        app: echoheaders
    spec:
      containers:
      - image: brndnmtthws/nginx-echo-headers
        imagePullPolicy: Always
        name: echoheaders
        readinessProbe:
          httpGet:
            path: /
            port: 8080
        lifecycle:
          preStop:
            exec:
              # Hack: wait for kube-proxy to remove the endpoint before exiting, and
              # gracefully shut down
              command: ["bash", "-c", "sleep 10; nginx -s quit; sleep 40"]
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: echoheaders
  labels:
    app: echoheaders
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: echoheaders
ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.global-static-ip-name: echoheaders-staging
  name: echoheaders-staging
spec:
  backend:
    serviceName: echoheaders
    servicePort: 80
When deleting a pod, I get errors as shown in this screenshot of
$ httping -G -K 35.190.69.21
(https://i.imgur.com/u14MvHN.png)
This is new behaviour when using NEG. Disabling NEG gives the old behaviour with working failover.
Any way to use Google LB, ingress, NEG and Kubernetes without errors during pod deletion?

With GCP load balancers, a GET request is only answered with a 502 after two successive backends fail to meet the response timeout, or after a more impactful error occurs, which seems the more plausible explanation here.
What is likely happening is an interim period in which a Pod due for termination has already received its SIGTERM, but is still considered healthy by the load balancer and is still being sent requests. Because that window is so brief, the Pod cannot complete the request and closes the connection.
With a graceful service stop[1] ("lame duck" mode) on the machine, your service continues to serve in-flight requests after receiving SIGTERM but refuses new connections. This may solve your issue, but keep in mind that there is no guarantee of zero downtime.
[1] https://landing.google.com/sre/sre-book/chapters/load-balancing-datacenter/#robust_approach_lame_duck
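A sketch of how that could be combined with GKE's connection draining, assuming a BackendConfig is acceptable in your setup (the echoheaders-draining name and the 60 s value are assumptions, not tested against your cluster; on older GKE versions the apiVersion and annotation may still be the beta variants):

# Sketch: give the load balancer a drain window before the backend disappears
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: echoheaders-draining
spec:
  connectionDraining:
    drainingTimeoutSec: 60
---
# Attach it to the Service next to the NEG annotation
apiVersion: v1
kind: Service
metadata:
  name: echoheaders
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "echoheaders-draining"}'
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: echoheaders

The preStop sleep still matters here: the pod has to keep serving until the NEG controller has detached the endpoint and the drain window has elapsed.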

Related

Kubernetes service routes traffic to only one of 5 pods

I'm playing around with k8s Services. I have created a simple Spring Boot app that displays its version number and pod name when I curl its endpoint:
curl localhost:9000/version
1.3_car-registry-deployment-66684dd8c4-r274b
Then I dockerized it, pushed it into my local Kind cluster and deployed it with 5 replicas. Next I created a Service targeting all 5 pods. Lastly, I exposed the Service like so:
kubectl port-forward svc/car-registry-service 9000:9000
Now when curling my endpoint I expected to see randomly picked pod names, but instead I only get responses from a single pod. Moreover, if I kill that one pod then my service stops working, i.e. I'm getting ERR_EMPTY_RESPONSE, even though there are 4 more pods available. What am I missing? Here are my deployment and service YAMLs:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: car-registry-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: car-registry
  template:
    metadata:
      name: car-registry
      labels:
        app: car-registry
    spec:
      containers:
      - name: car-registry
        image: car-registry-database:v1.3
        ports:
        - containerPort: 9000
          protocol: TCP
          name: rest
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - curl http://localhost:9000/healthz | grep "OK"
          initialDelaySeconds: 15
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: car-registry-service
spec:
  type: ClusterIP
  selector:
    app: car-registry
  ports:
  - protocol: TCP
    port: 9000
    targetPort: 9000
You’re using TCP, so you’re probably using keep-alive. Try to hit it with your browser or a new tty.
Try:
curl -H "Connection: close" http://your-service:port/path
Otherwise, check the kube-proxy logs to see if there is any additional information. Your initial question doesn't provide much detail.
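For the kube-proxy logs, something along these lines should work on a kind cluster (this assumes kube-proxy runs as a DaemonSet in kube-system with its usual k8s-app=kube-proxy label):

# list the kube-proxy pods, then tail their logs
kubectl -n kube-system get pods -l k8s-app=kube-proxy
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=100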

Health checks for service returning 301 after updating deployment

We recently updated the deployment of a dropwizard service deployed using Docker and Kubernetes.
It was working correctly before: the readiness probe issued a health check ping to the internal cluster IP and got 200s. Since the update, the health check pings result in a 301 and the service is considered down.
I've noticed that the health check is now "Default kubernetes L7 Loadbalancing health check for NEG" (port is set to 80), whereas it was previously "Default kubernetes L7 Loadbalancing health check", where the port was configurable.
The kube file is deployed via CircleCI, but the readiness probe is:
kind: Deployment
metadata:
  name: pes-${CIRCLE_BRANCH}
  namespace: ${GKE_NAMESPACE_NAME}
  annotations:
    reloader.stakater.com/auto: 'true'
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ***
  template:
    metadata:
      labels:
        app: ***
    spec:
      containers:
      - name: ***
        image: ***
        envFrom:
        - configMapRef:
            name: ***
        - secretRef:
            name: ***
        command: ['./gradlew', 'run']
        resources: {}
        ports:
        - name: pes
          containerPort: 5000
        readinessProbe:
          httpGet:
            path: /api/healthcheck
            port: pes
          initialDelaySeconds: 15
          timeoutSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: ***
  namespace: ${GKE_NAMESPACE_NAME}
spec:
  ports:
  - name: pes
    port: 5000
    targetPort: pes
    protocol: TCP
  selector:
    app: ***
  type: LoadBalancer
Any ideas on how this needs to be configured in GCP?
I have a feeling that the new deployment has changed from the legacy health check to the non-legacy one, but I have no idea what else needs to be set up for it to work. Does the kube file handle creating firewall rules, or does that need to be done manually?
Reading the docs at https://cloud.google.com/load-balancing/docs/health-check-concepts?hl=en
EDIT:
The issue is now resolved. After the GKE version was updated, it creates a NEG health check by default. We disabled this by adding the annotation below to the service manifest.
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":false}'
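For anyone who would rather keep NEGs than disable them, the usual alternative is to point the NEG health check at the application's health path via a BackendConfig referenced from the Service. A rough sketch only, reusing the /api/healthcheck path and container port from the manifests above (the pes-hc name is a placeholder, and older GKE versions may require the beta apiVersion/annotation):

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: pes-hc
spec:
  healthCheck:
    type: HTTP
    requestPath: /api/healthcheck
    port: 5000
---
# referenced from the Service metadata
metadata:
  annotations:
    cloud.google.com/backend-config: '{"default": "pes-hc"}'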

How to contact a random pod for one service (mutli instance of the same pod) in the same node in kubernetes

I have one single node. On this node, I have 2 applications, each with multiple instances (3 pods each).
App A wants to contact App B.
My issue is: App A always contacts the same pod of App B.
I would like to alternate pods (round-robin load balancing, for example).
For example:
first request: AppPod3 responds
second request: AppPod1 responds
third request: AppPod2 responds
How can I do that?
Thank you so much for your help.
You can see my conf of App B below.
I have tried to set timeoutSeconds for sessionAffinity, but it's not working:
kind: Deployment
metadata:
  name: AppB
spec:
  selector:
    matchLabels:
      app: AppB
  replicas: 3
  template:
    metadata:
      labels:
        app: AppB
    spec:
      containers:
      - name: AppB-container
        image: image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: AppB
  labels:
    app: svc-AppB
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: AppB
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 1
The service should use backend pods in round robin fashion by default. You don't need sessionAffinity settings if the pods are stateless; otherwise you will be redirected to the same pod based on the client's source IP.
Maybe you can add logging to the pods and observe when they are accessed. Subsequent calls to the service should be redirected to pods in round robin fashion with minimal service configuration.
Update: this is the deployment I am using. It balances the pods as expected; each curl request sent to the service clusterip:port ends on a different pod. My k8s installation is on premise, v1.18.3.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: baseDeployment
  labels:
    app: baseApp
  namespace: nm-app1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: baseApp
  template:
    metadata:
      labels:
        app: baseApp
    spec:
      containers:
      - name: baseApp
        image: local-registry:5000/baseApp
        ports:
        - containerPort: 8080
---
kind: Service
apiVersion: v1
metadata:
  name: baseService
  namespace: nm-app1
spec:
  selector:
    app: baseApp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
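To observe the rotation without adding logging to the app, a quick in-cluster test could look like the sketch below; app-b and the default namespace are placeholders for the Service actually in use (note that Kubernetes object names must be lowercase, so AppB as written above would be rejected by the API server), and it assumes the curl image also ships a shell:

# Sketch: throwaway curl pod hitting the Service repeatedly; with plain kube-proxy
# load balancing (and sessionAffinity removed) responses should rotate across pods
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  sh -c 'for i in $(seq 10); do curl -s http://app-b.default.svc.cluster.local/; echo; done'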
@user3683760 you will have to apply load balancing strategies in your deployment. Kubernetes has a bunch of options for balancing traffic between your services. I would suggest playing around with a few patterns to see what best suits your needs.

Routing troubleshooting with Kubernetes Ingress

I tried to set up a GKE environment with a frontend pod (cup-fe) and a backend one used to authenticate the user upon login (cup-auth), but I can't get my ingress to work.
Below is the frontend deployment (cup-fe), running nginx with an Angular app. I also created a static IP address resolved by the "cup.xxx.it" and "cup-auth.xxx.it" DNS names:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cup-fe
  namespace: default
  labels:
    app: cup-fe
spec:
  replicas: 2
  selector:
    matchLabels:
      app: "cup-fe"
  template:
    metadata:
      labels:
        app: "cup-fe"
    spec:
      containers:
      - image: "eu.gcr.io/xxx-cup-yyyyyy/cup-fe:latest"
        name: "cup-fe"
      dnsPolicy: ClusterFirst
Then the auth deployment (cup-auth):
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cup-auth
  namespace: default
  labels:
    app: cup-auth
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cup-auth
  template:
    metadata:
      labels:
        app: cup-auth
    spec:
      containers:
      - image: "eu.gcr.io/xxx-cup-yyyyyy/cup-auth:latest"
        imagePullPolicy: Always
        name: cup-auth
        ports:
        - containerPort: 8080
          protocol: TCP
        - containerPort: 8443
          protocol: TCP
        - containerPort: 8778
          name: jolokia
          protocol: TCP
        - containerPort: 8888
          name: management
          protocol: TCP
      dnsPolicy: ClusterFirst
Then I created two NodePorts to expose the above pods:
kubectl expose deployment cup-fe --type=NodePort --port=80
kubectl expose deployment cup-auth --type=NodePort --port=8080
Last, I created an ingress to route external http requests towards services:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: http-ingress
  namespace: default
  labels:
    app: http-ingress
spec:
  rules:
  - host: cup.xxx.it
    http:
      paths:
      - path: /*
        backend:
          serviceName: cup-fe
          servicePort: 80
  - host: cup-auth.xxx.it
    http:
      paths:
      - path: /*
        backend:
          serviceName: cup-auth
          servicePort: 8080
So I can reach the frontend pod at http://cup.xxx.it, and the Angular app redirects me to http://cup-auth.xxx.it/login, but there I only get 502 Bad Gateway. With the kubectl describe ingress command, I can see an unhealthy backend for the cup-auth service.
Here is a successful output, fetching the frontend host from a pod:
$ kubectl exec -it cup-fe-7f979bb747-6lqfx wget cup.xxx.it/login
Connecting to cup.xxx.it
login 100% |********************************| 1646 0:00:00 ETA
And here is the failing output:
$ kubectl exec -it cup-fe-7f979bb747-6lqfx wget cup-auth.xxx.it/login
Connecting to cup-auth.xxx.it
wget: server returned error: HTTP/1.1 502 Bad Gateway
command terminated with exit code 1
I tried to replicate your setup as closely as I could, but did not have any issues.
I can call cup-auth.testdomain.internal/login normally from within and outside the pods.
Usually, 502 errors occur when a request received by the LB cannot be forwarded to a backend. Since you mention that you are seeing an unhealthy backend, this is likely the reason.
This could be due to a wrong configuration of the health checks or a problem with your application.
First I would look at the logs to see why the request is failing, and rule out an issue with the health checks or with the application itself.
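A starting point for that could look like the sketch below; the pod and Service names come from the question, and the note about the default health check path is general GKE behaviour rather than something verified against this cluster:

# Which backend is unhealthy, and which health check it maps to
kubectl describe ingress http-ingress

# Is the cup-auth pod Ready, and what does it log when the health check hits it?
kubectl get pods -l app=cup-auth
kubectl logs -l app=cup-auth --tail=100

# From the frontend pod (which has wget), check whether cup-auth answers 200 on /,
# since the GCP health check probes / on the serving port by default
kubectl exec -it cup-fe-7f979bb747-6lqfx -- wget -qO- http://cup-auth:8080/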

ISTIO: enable circuit breaking on egress

I am unable to get the circuit breaking configuration to work for my ELB through the egress config.
ELB
The ELB has a success rate of 25% (75% of responses are 500 errors and 25% return status 200).
The ELB has 4 instances; only 1 returns a successful response, the other instances are configured to return 500 errors for testing purposes.
Setup
k8s: v1.7.4
istio: 0.5.0
env: k8s on aws
Egress rule
apiVersion: config.istio.io/v1alpha2
kind: EgressRule
metadata:
name: elb-egress-rule
spec:
destination:
service: xxxx.us-east-1.elb.amazonaws.com
ports:
- port: 80
protocol: http
Destination Policy
kind: DestinationPolicy
metadata:
  name: elb-circuit-breaker
spec:
  destination:
    service: xxxx.us-east-1.elb.amazonaws.com
  loadBalancing:
    name: RANDOM
  circuitBreaker:
    simpleCb:
      maxConnections: 100
      httpMaxPendingRequests: 100
      sleepWindow: 3m
      httpDetectionInterval: 1s
      httpMaxEjectionPercent: 100
      httpConsecutiveErrors: 3
      httpMaxRequestsPerConnection: 10
Route rules: not set
Testing
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: sleep
    spec:
      containers:
      - name: sleep
        image: tutum/curl
        command: ["/bin/sleep","infinity"]
        imagePullPolicy: IfNotPresent
.
export SOURCE_POD=$(kubectl get pod -l app=sleep -o jsonpath={.items..metadata.name})
kubectl exec -it $SOURCE_POD -c sleep bash
Sending requests in parallel from the pod
#!/bin/sh
set -m # Enable Job Control
for i in `seq 100`; do # start 100 jobs in parallel
curl xxxx.us-east-1.elb.amazonaws.com &
done
Response
Currently, Istio considers an Egress Rule to designate a single host. This single host will not be ejected because of the panic threshold of Envoy's load balancer (Envoy is the sidecar proxy implementation of Istio). The default panic threshold of Envoy is 50%, which means that at least two hosts are required before one host can be ejected, so the single host of an Egress Rule will never be ejected.
This practically means that httpConsecutiveErrors does not affect external services. This lack of functionality should be partially resolved with the External Services feature of Istio that will replace the Egress Rules.
See the documentation of Istio External Services backed by multiple endpoints: https://github.com/istio/api/blob/master/routing/v1alpha2/external_service.proto#L113
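For reference, in later Istio releases that replacement looks roughly like the sketch below: a ServiceEntry for the external host plus a DestinationRule with outlier detection. The host, names, and field values are illustrative assumptions carried over from the question, not a configuration tested against this ELB, and exact outlier-detection field names vary across Istio versions:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: elb-external
spec:
  hosts:
  - xxxx.us-east-1.elb.amazonaws.com
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  location: MESH_EXTERNAL
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: elb-circuit-breaker
spec:
  host: xxxx.us-east-1.elb.amazonaws.com
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100

With resolution: DNS the ELB name resolves to its individual instance IPs, so Envoy has multiple endpoints and per-endpoint ejection is no longer blocked by the panic threshold described above.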