I currently have an Ingress configured on GKE (k8s 1.2) to forward requests towards my application's pods. I have a request which can take a long time (30 seconds) and timeout from my application (504). I observe that when doing so the response that i receive is not my own 504 but a 502 from what looks like the Google Loadbalancer after 60 seconds.
I have played around with different status codes and durations, exactly after 30 seconds i start receiving this weird behaviour regardless of statuscode emitted.
Anybody have a clue how i can fix this? Is there a way to reconfigure this behaviour?
Beginning with 1.11.3-gke.18, it is possible to configure timeout settings in kubernetes directly.
First add a backendConfig:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
name: my-bsc-backendconfig
spec:
timeoutSec: 40
Then add an annotation in Service to use this backendConfig:
apiVersion: v1
kind: Service
metadata:
name: my-bsc-service
labels:
purpose: bsc-config-demo
annotations:
beta.cloud.google.com/backend-config: '{"ports": {"80":"my-bsc-backendconfig"}}'
spec:
type: NodePort
selector:
purpose: bsc-config-demo
ports:
- port: 80
protocol: TCP
targetPort: 8080
And viola, your ingress load balancer now has a timeout of 40 second instead of the default 30 seconds.
See https://cloud.google.com/kubernetes-engine/docs/how-to/configure-backend-service#creating_a_backendconfig
When creating an ingress on GKE the default setup is that a GLBC HTTP load balancer will be created with the backends that you supplied. Default it is configured at a 30 second timeout for your application to handle the request.
If you need a longer timeout you have to edit this manually after setup in the backends of your HTTP Load balancer in the google cloud console.
Related
I am using HAProxy as the ingress-controller in my GKE clusters. And exposing HAProxy service as LoadBalancer service(Internal).
Recently, I experienced an issue, where the HA-Proxy service changed its EXTERNAL-IP, and traffic stopped routing to HAProxy. This issue occurred multiple times on different days(now it has stopped). I had to manually add that new External-IP to the frontend of that Loadbalancer to allow traffic to HAProxy.
There were two pods running for HAProxy, and both had been running for days, and there was nothing in their logs. I assume it was something related to Service or GCP LB and not HAProxy itself.
I am afraid that I don't have any logs related to that.
I still don't know, what caused the service IP to change. As there were no recent changes, and the cluster and all services were running for many days properly, and suddenly this occurred.
Has anyone faced a similar issue earlier? Or what can I do to avoid such issue in future?
What could have caused the IP to change?
This is how my service is configured:
---
apiVersion: v1
kind: Service
metadata:
labels:
run: haproxy-ingress
name: haproxy-ingress
namespace: haproxy-controller
annotations:
cloud.google.com/load-balancer-type: "Internal"
networking.gke.io/internal-load-balancer-allow-global-access: "true"
cloud.google.com/network-tier: "Premium"
spec:
selector:
run: haproxy-ingress
type: LoadBalancer
ports:
- name: http
port: 80
protocol: TCP
targetPort: 80
- name: https
port: 443
protocol: TCP
targetPort: 443
- name: stat
port: 1024
protocol: TCP
targetPort: 1024
Found some logs:
Warning SyncLoadBalancerFailed 30m (x3570 over 13d) service-controller Error syncing load balancer: failed to ensure load balancer: googleapi: Error 409: IP_IN_USE_BY_ANOTHER_RESOURCE - IP '10.17.129.17' is already being used by another resource.
Normal EnsuringLoadBalancer 3m33s (x3576 over 13d) service-controller Ensuring load balancer
The Short answer is: External IP for the service are ephemeral.
Because HA-Proxy controller pods are recreated the HA-Proxy service is created with an ephemeral IP.
To avoid this issue, I would recommend using a static IP that you can reference in the loadBalancerIP field.
This can be done by following steps:
Reserve a static IP. (link)
Use this IP, to create a service (link)
Example YAML:
apiVersion: v1
kind: Service
metadata:
name: helloweb
labels:
app: hello
spec:
selector:
app: hello
tier: web
ports:
- port: 80
targetPort: 8080
type: LoadBalancer
loadBalancerIP: "YOUR.IP.ADDRESS.HERE"
Unfortunately without logs it's hard to say anything for sure. You should check the audit logs that GKE ships to Cloud Logging as that might give you some idea of what happened. One option is the GCP "oops"'d the GLB and GKE recreated it, thus giving it a new IP. I've never heard of that happening with LBs though (it happens pretty often with nodes, but not LBs). A more common case would be you ran some kubectl command that inadvertently removed the Service object and then it was recreated by some management layer you have set up (Argo, Flux, Helm Operator, whatever) but delete+recreate again means it's a new LB with a new IP. The latter case should be visible in the audit logs so check those out for sure.
I have configured a web application pod exposed via apache on port 80. I'm unable to configure a service + ingress for accessing from the internet. The issue is that the backend services always report as UNHEALTHY.
Pod Config:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
name: webapp
name: webapp
namespace: my-app
spec:
replicas: 1
selector:
matchLabels:
name: webapp
template:
metadata:
labels:
name: webapp
spec:
containers:
- image: asia.gcr.io/my-app/my-app:latest
name: webapp
ports:
- containerPort: 80
name: http-server
Service Config:
apiVersion: v1
kind: Service
metadata:
name: webapp-service
spec:
type: NodePort
selector:
name: webapp
ports:
- protocol: TCP
port: 50000
targetPort: 80
Ingress Config:
kind: Ingress
metadata:
name: webapp-ingress
spec:
backend:
serviceName: webapp-service
servicePort: 50000
This results in backend services reporting as UNHEALTHY.
The health check settings:
Path: /
Protocol: HTTP
Port: 32463
Proxy protocol: NONE
Additional information: I've tried a different approach of exposing the deployment as a load balancer with external IP and that works perfectly. When trying to use a NodePort + Ingress, this issue persists.
With GKE, the health check on the Load balancer is created automatically when you create the ingress. Since the HC is created automatically, so are the firewall rules.
Since you have no readinessProbe configured, the LB has a default HC created (the one you listed). To debug this properly, you need to isolate where the point of failure is.
First, make sure your pod is serving traffic properly;
kubectl exec [pod_name] -- wget localhost:80
If the application has curl built in, you can use that instead of wget.
If the application has neither wget or curl, skip to the next step.
get the following output and keep track of the output:
kubectl get po -l name=webapp -o wide
kubectl get svc webapp-service
You need to keep the service and pod clusterIPs
SSH to a node in your cluster and run sudo toolbox bash
Install curl:
apt-get install curl`
Test the pods to make sure they are serving traffic within the cluster:
curl -I [pod_clusterIP]:80
This needs to return a 200 response
Test the service:
curl -I [service_clusterIP]:80
If the pod is not returning a 200 response, the container is either not working correctly or the port is not open on the pod.
if the pod is working but the service is not, there is an issue with the routes in your iptables which is managed by kube-proxy and would be an issue with the cluster.
Finally, if both the pod and the service are working, there is an issue with the Load balancer health checks and also an issue that Google needs to investigate.
As Patrick mentioned, the checks will be created automatically by GCP.
By default, GKE will use readinessProbe.httpGet.path for the health check.
But if there is no readinessProbe configured, then it will just use the root path /, which must return an HTTP 200 (OK) response (and that's not always the case, for example, if the app redirects to another path, then the GCP health check will fail).
I have a setup Metallb as LB with Nginx Ingress installed on K8S cluster.
I have read about session affinity and its significance but so far I do not have a clear picture.
How can I create a single service exposing multiple pods of the same application?
After creating the single service entry point, how to map the specific client IP to Pod abstracted by the service?
Is there any blog explaining this concept in terms of how the mapping between Client IP and POD is done in kubernetes?
But I do not see Client's IP in the YAML. Then, How is this service going to map the traffic to respective clients to its endpoints? this is the question I have.
kind: Service
apiVersion: v1
metadata:
name: my-service
spec:
selector:
app: my-app
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10000
Main concept of Session Affinity is to redirect traffic from one client always to specific node. Please keep in mind that session affinity is a best-effort method and there are scenarios where it will fail due to pod restarts or network errors.
There are two main types of Session Affinity:
1) Based on Client IP
This option works well for scenario where there is only one client per IP. In this method you don't need Ingress/Proxy between K8s services and client.
Client IP should be static, because each time when client will change IP he will be redirected to another pod.
To enable the session affinity in kubernetes, we can add the following to the service definition.
service.spec.sessionAffinity: ClientIP
Because community provided proper manifest to use this method I will not duplicate.
2) Based on Cookies
It works when there are multiple clients from the same IP, because it´s stored at web browser level. This method require Ingress object. Steps to apply this method with more detailed information can be found here under Session affinity based on Cookie section.
Create NGINX controller deployment
Create NGINX service
Create Ingress
Redirect your public DNS name to the NGINX service public/external IP.
About mapping ClientIP and POD, according to Documentation
kube-proxy is responsible for SessionAffinity. One of Kube-Proxy job
is writing to IPtables, more details here so thats how it is
mapped.
Articles which might help with understanding Session Affinity:
https://sookocheff.com/post/kubernetes/building-stateful-services/
https://medium.com/#diegomrtnzg/redirect-your-users-to-the-same-pod-by-using-session-affinity-on-kubernetes-baebf6a1733b
follow the service reference for session affinity
kind: Service
apiVersion: v1
metadata:
name: my-service
spec:
selector:
app: my-app
ports:
- name: http
protocol: TCP
port: 80
targetPort: 80
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10000
I have a Service on GKE of type LoadBalancer that points to a GKE deployment running nginx. My nginx has all of the timeouts set to 10 minutes, yet HTTP/HTTPS requests that have to wait on processing before receiving a response get cutoff with 500 errors after 30 seconds. My settings:
http {
proxy_read_timeout 600s;
proxy_connect_timeout 600s;
keepalive_timeout 600s;
send_timeout 600s;
}
Apparently there are default settings of 30 seconds in the LoadBalancer somewhere.
After pouring through documentation, I've only found a step-through at Google that outlines setting an Ingress with back-end service Load Balancer with a timeout but can't find how to do that on a Service that's Type=LoadBalancer for use with GKE. I've also reviewed all of the Kubernetes documentation for versions 1.7+ (we're on 1.8.7-gke.1) and nothing about setting a timeout. Is there a setting I can add to my yaml file to do this?
If it helps I found the following for AWS, which appears to be what I would need to have on GKE:
annotations:
service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
As of April 2021 you can do this via GKE/GCE configuration. Here are the instructions.
Essentially you create a BackendConfig resource similar to this:
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
name: my-backendconfig
spec:
timeoutSec: 40
connectionDraining:
drainingTimeoutSec: 60
(kubectl apply -f my-backendconfig.yaml)
and then connect it to your GKE service resource with an annotation:
apiVersion: v1
kind: Service
metadata:
name: my-service
labels:
purpose: bsc-config-demo
annotations:
cloud.google.com/backend-config: '{"ports": {"80":"my-backendconfig"}}'
cloud.google.com/neg: '{"ingress": true}'
spec:
type: ClusterIP
selector:
purpose: bsc-config-demo
ports:
- port: 80
protocol: TCP
targetPort: 8080
(kubectl apply -f my-service.yaml)
If you prefer, the BackendConfig resource (and Service) can be placed in a namespace with a metadata namespace designation in your yaml.
metadata:
namespace: my-namespace
So far you cannot do it from the YAML file.
There is a open feature request at the moment that I advise you to subscribe and to follow:
https://github.com/kubernetes/ingress-gce/issues/28
They were already discussing regarding this change in 2016: issue.
"Specific use case: GCE backends are provisioned with a default timeout of 30 seconds, which is not sufficient for some long requests. I'd like to be able to control the timeout per-backend."
However I would suggest you to check this part of the Google Cloud Documentation talking specifically regarding configurable response timeout.
UPDATE
Check the issue because they are making progresses
I see there was a v1.0.0 release 18 days ago. Was this the completion of the major refactoring you talked about #nicksardo ?
Is it possible yet to configure how long a connection can be idle before the LB closes it?
UPDATE
The issue mentioned above is now closed and documentation for setting the timeout (and other backend service settings) is available here:
https://cloud.google.com/kubernetes-engine/docs/how-to/configure-backend-service
I've got a kubernetes 1.6.2 cluster, and am creating a service like:
kind: Service
apiVersion: v1
metadata:
name: hello
namespace: myns
annotations:
service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
dns.alpha.kubernetes.io/internal: mydomain.com
spec:
selector:
app: hello-world
ports:
- protocol: "TCP"
port: 80
targetPort: 5000
type: LoadBalancer
I'd expect this to create an internal ELB (which it does) but also set up an A record on the AWS Route53 hosted zone for mydomain.com as per https://github.com/kubernetes/kops/tree/master/dns-controller (which it doesn't). Is there something I need to do to enable A record creation?
In route 53 create a type A record with Alias Yes prior to launching your cluster and Kurbernetes will audio update it with proper Alias Target which gets resolved to the correct IP upon app boot up when you issue
kubectl expose rs .....
This might not work since dns.alpha.kubernetes.io/internal requires that you use NodePort, at least that's what's written in here https://github.com/kubernetes/kops/issues/1082
Update:
There's an open issue about this A-record. CNAME record can be created but A record is not yet working. Forgot the issue number but I think you can find it from kops github issues.