Troubleshooting Ingress Service Unavailable 503 - kubernetes

Every request made to my Kubernetes node results in an Ingress Service Unavailable (503) response.
What are some different steps I should take to troubleshoot this issue?

So if you are asking for ingress debugging steps, mine usually go along the lines of:
Check whether the Service is reachable internally. You can do this by running a busybox container inside the cluster and using wget against the endpoint (busybox ships wget rather than curl)
Make sure the Ingress backend points at the right Service name and port, and that the Service's selector matches the labels on your Pods
Make sure the Pods are up and running (check the pod logs, etc.)
Make sure the ingress controller is not throwing errors (check the ingress controller's logs)
It is a bit of a vague question, as a whole host of things could be wrong. If you give us more info (e.g. show us the YAML you use to configure the Ingress), we can better understand your problem and help.
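One quick way to cover the selector and "Pods are running" checks at once is to look at the Service's Endpoints object: if it lists no ready addresses, the Ingress has nothing to route to, which is a common cause of a 503 from the ingress controller. A minimal Python sketch, assuming you saved the output of `kubectl get endpoints <service> -o json` to a file (the file name and the helper names are hypothetical). On recent Kubernetes versions the same information is also exposed through EndpointSlice objects.

```python
import json
from typing import Any


def summarize_endpoints(endpoints: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (ready_ips, not_ready_ips) from a v1 Endpoints object.

    The v1 Endpoints schema groups addresses into "subsets"; each subset has
    "addresses" (ready Pods) and "notReadyAddresses" (Pods failing readiness).
    """
    ready: list[str] = []
    not_ready: list[str] = []
    for subset in endpoints.get("subsets") or []:
        ready += [a["ip"] for a in subset.get("addresses") or []]
        not_ready += [a["ip"] for a in subset.get("notReadyAddresses") or []]
    return ready, not_ready


def report(path: str) -> str:
    """Describe Endpoints saved with:
    kubectl get endpoints <service> -o json > endpoints.json
    """
    with open(path) as f:
        ready, not_ready = summarize_endpoints(json.load(f))
    if ready:
        return f"ready endpoints: {', '.join(ready)}"
    # Nothing ready: compare the Service selector with the Pod labels and
    # check why the Pods fail readiness before debugging the Ingress itself.
    return f"no ready endpoints (not ready: {', '.join(not_ready) or 'none'})"
```

An empty `subsets` list usually points at a selector/label mismatch; addresses that only appear under `notReadyAddresses` point at failing readiness probes instead.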

Related

Having question about publishing service in Kubernetes

My cluster has one master and two slaves (not on any cloud platform). I created a Deployment with 2 replicas, so each slave has one pod; the image I'm running is tensorflow-jupyter. Then I created a NodePort Service for this Deployment. I thought I could run these two pods separately at the same time, but I was wrong.
tensorflow-jupyter requires logging in with the token it generates. Everything is fine with only 1 pod, but with 2 or more replicas I get a server error after logging in, and I'm logged out automatically when I press F5; after that I can't log in with the token anymore. A similar thing happens with WordPress, too.
I think I shouldn't use the NodePort type for this, but I don't know whether another Service type can solve the problem. I don't have a load balancer to try, and I don't know how to use ExternalName.
Is there any way to expose a Service for a Deployment with 2 or more replicas (one pod per slave)? Or can I only create many Deployments, each with 1 pod, and then expose the same number of Services, one per Deployment?
It seems the application you're trying to deploy requires sticky session support. This is not supported out of the box by a NodePort Service: you have to expose your application using an Ingress resource handled by an Ingress Controller in order to take advantage of its reverse-proxy capabilities (in this case, sticky sessions).
I'm not suggesting you use the sessionAffinity=ClientIP Service option since it's allowed only for ClusterIP Service resources and according to your question it seems the application has to be accessed outside of the cluster.
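To illustrate what sticky sessions add on top of plain load balancing, here is a minimal Python sketch of cookie-based affinity, roughly what a reverse proxy does when cookie affinity is enabled: on the first request it picks a backend and sets a cookie naming it, and later requests carrying that cookie go back to the same backend. The cookie name `route`, the backend addresses, and the function name are hypothetical. With the community ingress-nginx controller this behavior is typically switched on with the `nginx.ingress.kubernetes.io/affinity: "cookie"` annotation on the Ingress.

```python
import random
from typing import Optional

COOKIE_NAME = "route"  # hypothetical cookie name


def pick_backend(cookies: dict[str, str], backends: list[str],
                 rng: Optional[random.Random] = None) -> tuple[str, Optional[str]]:
    """Return (backend, cookie_value_to_set).

    If the client already carries a cookie naming a live backend, reuse that
    backend and set nothing. Otherwise choose a backend at random (roughly
    what happens behind a plain NodePort Service) and return the value to
    store in the cookie so the next request lands on the same Pod.
    """
    pinned = cookies.get(COOKIE_NAME)
    if pinned in backends:
        return pinned, None
    rng = rng or random.Random()
    chosen = rng.choice(backends)
    return chosen, chosen
```

Storing the raw Pod address in the cookie keeps the sketch short; a real proxy would normally store an opaque value instead of exposing internal IPs.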

K8s does strange networking stuff that breaks application designs

I discovered a strange behavior in K8s networking that can completely break some application designs.
I have two pods and one Service
Pod 1 is a stupid Reverse Proxy (I don't know the implementation)
Pod 2 is a Webserver
The mentioned Service belongs to pod 2, the webserver
After the initial start of my stack, I discovered that Pod 1 (the reverse proxy) is, for some reason, not able to reach the webserver on the first attempt, although ping and curl both work fine.
Then I tried wget mywebserver inside Pod 1 (the reverse proxy) and got back the following:
wget mywebserver
--2020-11-16 20:07:37-- http://mywebserver/
Resolving mywebserver (mywebserver)... 10.244.0.34, 10.244.0.152, 10.244.1.125, ...
Connecting to mywebserver (mywebserver)|10.244.0.34|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.0.152|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.1.125|:80... failed: Connection refused.
Connecting to mywebserver (mywebserver)|10.244.2.177|:80... connected.
Where 10.244.2.177 is the Pod IP of the Webserver.
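The wget output shows what a client does with a name that resolves to several A records: it tries each address in turn until one accepts the connection. A minimal Python sketch of that client-side pattern (the function name is hypothetical, and this illustrates the behavior rather than reproducing wget's code):

```python
import socket
from typing import Optional


def connect_first(host: str, port: int, timeout: float = 2.0) -> tuple[socket.socket, str]:
    """Try every address `host` resolves to, in order, as wget does.

    Returns the connected socket and the IP that accepted. A client that only
    tries the first resolved address (like the reverse proxy in this
    question) fails whenever that first address refuses the connection.
    """
    last_error: Optional[OSError] = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return sock, addr[0]
        except OSError as exc:  # e.g. ConnectionRefusedError
            sock.close()
            last_error = exc
    raise last_error or OSError(f"no addresses for {host}")
```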
The problem, it seems to me, is that the reverse proxy does not retry forwarding the request: it only tries once, fails as in the wget output above, and the request gets dropped because the backend is not reachable, apparently due to fancy K8s iptables handling...
If I configure the reverse proxy to use the Pod IP (10.244.2.177) as its backend instead of the Service DNS name, everything works fine and as expected.
I already tried this with a variety of CNI providers (Flannel, Calico, Canal, Weave, and also Cilium, since kube-proxy is not used with Cilium), but all of them failed, and all of them do fancy routing that nobody clearly understands out of the box. So my question is: how can I make K8s routing work on the first attempt here? I have already reimplemented my whole stack on Docker Swarm just to see if it works, and it does, flawlessly! So this issue seems to have something to do with the K8s routing scheme.
Just to rule out misconfiguration on my side, I also tried this with different ready-to-use K8s solutions, such as managed K8s from DigitalOcean and self-hosted RKE. All show the same behavior.
Does anybody have an idea what the problem might be and how to fix this behavior?
It would also be very useful to know what actually happens during the wget request, as this remains a mystery to me.
Many thanks in advance!
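It turned out that I had several misconfigurations in my K8s Deployment.
First, I removed clusterIP: None. That setting makes the Service headless, so its DNS name resolves directly to the IPs of all selected Pods instead of to one virtual IP, which led to the wget behavior shown in my question. Besides that, I had set the app: and tier: labels incorrectly in my Deployment. Anyway, everything is now working fine and wget gets a proper connection.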
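The two fixes are related: a Service selects Pods purely by label, and a headless Service's DNS name returns every selected Pod's IP. With wrong `app:`/`tier:` labels the selector can match Pods that do not listen on port 80, which would explain the refused connections. A minimal Python sketch of equality-based selector matching (Pod names and labels are hypothetical; it ignores namespaces and readiness, which the real endpoints controller also takes into account):

```python
def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """A Service's spec.selector matches a Pod when every key/value pair in
    the selector is present in the Pod's labels (extra Pod labels are fine).
    A Service with an empty selector gets no automatically managed endpoints,
    so an empty selector matches nothing here."""
    if not selector:
        return False
    return all(labels.get(k) == v for k, v in selector.items())


def selected_pods(selector: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    """Return the names of the Pods the Service would route to."""
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


pods = {  # hypothetical Pods and labels
    "webserver-7d9f": {"app": "mystack", "tier": "web"},
    "proxy-5c2a": {"app": "mystack", "tier": "proxy"},
    "db-9b1e": {"app": "mystack", "tier": "db"},
}
# A selector that is too broad picks up every Pod in the stack:
print(selected_pods({"app": "mystack"}, pods))  # ['webserver-7d9f', 'proxy-5c2a', 'db-9b1e']
# With the tier label set correctly, only the webserver is selected:
print(selected_pods({"app": "mystack", "tier": "web"}, pods))  # ['webserver-7d9f']
```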
Thanks again

How to allow only one connection per pod using nginx ingress controller

My Kubernetes cluster uses a ReplicaSet to run N similar pods. Due to resource limitations, each pod can only handle one websocket connection. My cluster uses an NGINX ingress controller.
Is there any way to make NGINX dispatch only one incoming websocket connection per pod and, when no pod is available, refuse the incoming connection?
I'm not super familiar with the Kubernetes NGINX ingress setup, but I assume it exposes some of the NGINX configuration options for defining groups of servers. The server directive of the upstream module has a parameter called max_conns that lets you limit the number of simultaneous connections to a given server. Assuming the ingress controller maps that option, it should be possible to set max_conns=1 for each server it creates and adds to the NGINX configuration under the hood.
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#server
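Edit: after a little cursory research, it looks like this is indeed possible. According to the master list of parameters here, you can set it with the max-conns ConfigMap key or the nginx.org/max-conns annotation (note that this list belongs to NGINX Inc.'s kubernetes-ingress controller, a different project from the community ingress-nginx controller): https://github.com/nginxinc/kubernetes-ingress/blob/master/docs/configmap-and-annotations.md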
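To make the max_conns=1 semantics concrete: each upstream server accepts at most one active connection, and once every server is at its limit there is nowhere left to send a new one. A minimal Python model of that bookkeeping, not NGINX's implementation (the class and method names are hypothetical). One caveat from the NGINX documentation: unless the upstream group lives in shared memory (the `zone` directive), the limit is enforced per worker process, so the cap is not global by default.

```python
from typing import Optional


class CappedUpstream:
    """Model of an upstream group where each server allows at most
    max_conns active connections (max_conns=1 means one websocket per Pod)."""

    def __init__(self, servers: list[str], max_conns: int = 1) -> None:
        self.max_conns = max_conns
        self.active = {s: 0 for s in servers}

    def acquire(self) -> Optional[str]:
        """Pick the least-loaded server still under its cap, or None when all
        are full (the proxy would then fail the new connection)."""
        candidates = [s for s, n in self.active.items() if n < self.max_conns]
        if not candidates:
            return None
        server = min(candidates, key=self.active.__getitem__)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a websocket closes, freeing that server's slot."""
        if self.active.get(server, 0) > 0:
            self.active[server] -= 1
```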
You could use a readinessProbe with a relatively low periodSeconds and, obviously, successThreshold and failureThreshold set to 1, so the Pod is marked ready or unready as quickly as possible.
Basically, you could set up a script or a simple HTTP endpoint that returns a failure status code once a connection has been established: the Pod's endpoint is then removed from the Service's endpoints list and will not be selected by the Ingress Controller.
Just keep in mind that this solution can be affected by race conditions, but it's the simplest one. A better solution could be a service mesh, but that means additional complexity.
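A minimal Python sketch of the readiness-endpoint idea: the app reports 503 on a `/ready` path while a websocket is active, so a readinessProbe pointed at it takes the Pod out of the Service endpoints. The path, the port, and the tracker class are assumptions, and the websocket handling itself is omitted; the app would call `opened()`/`closed()` from its own connection handlers.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ConnectionTracker:
    """Thread-safe count of active websocket connections (hypothetical hook)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0

    def opened(self) -> None:
        with self._lock:
            self._active += 1

    def closed(self) -> None:
        with self._lock:
            self._active = max(0, self._active - 1)

    def readiness_status(self) -> int:
        # HTTP probes treat codes >= 200 and < 400 as success,
        # so 200 means "free for a new client" and 503 means "busy".
        with self._lock:
            return 200 if self._active == 0 else 503


tracker = ConnectionTracker()


class ReadinessHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        code = tracker.readiness_status() if self.path == "/ready" else 404
        self.send_response(code)
        self.end_headers()


def serve(port: int = 8081) -> None:
    """Run the readiness endpoint; call this from the app's startup code."""
    ThreadingHTTPServer(("", port), ReadinessHandler).serve_forever()
```

The race mentioned above is visible here: between `opened()` and the next probe run, the Pod still looks ready and can receive a second connection.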

How to debug a Kubernetes service endpoint that isn't serving correctly?

I have set up a Kubernetes cluster. The cluster contains, among other things, a Service and a Deployment surfacing an API web service (based on the subway-explorer-gmaps-proxy container).
I've deployed the service externally, using the LoadBalancer service type (this is on GCP):
$ kubectl get svc subway-explorer-gmaps-proxy-service
NAME                                  TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
subway-explorer-gmaps-proxy-service   LoadBalancer   10.35.252.232   35.224.78.225   9000:31396/TCP   19h
My understanding (and correct me if I'm wrong!) is that this service should now be queryable outside of the cluster, by visiting http://35.224.78.225 in the browser.
When running the Docker container locally, I can verify things are working correctly by navigating to the following URL:
http://localhost:49161/starting_x=-73.954527&starting_y=40.587243&ending_x=-73.977756&ending_y=40.687163
Looking at the kubectl get output, I expect visiting the following URL in the browser will serve me the content I'm looking for:
http://35.224.78.225:31396/starting_x=-73.954527&starting_y=40.587243&ending_x=-73.977756&ending_y=40.687163
But when I visit this URL, nothing gets served.
I suspect there is a non-fatal error in the deployment configuration. What is an effective way of debugging this problem? Are there access logs or a stdout stream somewhere I can check to see what's wrong?
You can try running through the official docs on debugging services: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
Beyond that, have you confirmed you're querying the load balancer on the right port? I don't deploy on GCP, but when I launch a load balancer for a Kubernetes service on AWS, it accepts traffic on port 80/443 and forwards it to the service's NodePort, which I'm guessing is 31396 in your case. In other words, the NodePort is where the load balancer sends traffic, not the port clients should connect to. What are the ports listed in kubectl get svc subway-explorer-gmaps-proxy-service -o yaml?
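To see which port is which, the relevant part of that output is `spec.ports`: `port` is what the Service (and a LoadBalancer in front of it) listens on, `nodePort` is the port opened on every node, and `targetPort` is the container port. A minimal Python sketch that prints the mapping from JSON output (`-o json` instead of `-o yaml`); the sample mirrors the `9000:31396/TCP` column above, while the `targetPort` value and the function name are assumptions.

```python
import json


def port_map(service: dict) -> list[str]:
    """Summarize spec.ports of a v1 Service as one readable line per port."""
    lines = []
    for p in service["spec"]["ports"]:
        lines.append(
            f"{p.get('protocol', 'TCP')}: clients -> :{p['port']} (Service/LB port), "
            f"nodes -> :{p.get('nodePort', '-')} (NodePort), "
            f"Pods -> :{p.get('targetPort', p['port'])} (targetPort)"
        )
    return lines


# Trimmed sample shaped like `kubectl get svc ... -o json`; targetPort is assumed.
sample = json.loads("""
{"spec": {"type": "LoadBalancer",
          "ports": [{"protocol": "TCP", "port": 9000,
                     "nodePort": 31396, "targetPort": 9000}]}}
""")
for line in port_map(sample):
    print(line)
```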
What I didn't realize is that Google Cloud has a separate firewall system, which is distinct from the connection settings managed by Kubernetes. In order to expose the application to the outside world (e.g. a web browser), I also need to modify the Google Cloud firewall rules (see, for example, this answer for how to do that).
To test that the application is working on the Kubernetes side, you need not modify cloud firewall rules. Instead, run wget, curl, or some similar data retrieval command from a different pod on the cluster, pointed at the internal IP address and port number of the pod of interest.
For example, the "hello world" pod used by the Kubernetes documentation is the busybox pod (defined here). After creating this pod in my cluster, I ran the following:
kubectl exec busybox -c busybox -- wget "10.35.249.23:9000"
This confirmed that the service is functioning correctly within Kubernetes. You can also use any other pod whose OS includes wget; I just used busybox because all of my other pods use Google's Container-Optimized OS, which doesn't include it.
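If the only pod you can exec into has Python but neither wget nor curl, the same in-cluster check can be done with the standard library. A minimal sketch; the function name is hypothetical, and the address in the usage comment is the one from the kubectl exec command above.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 3.0) -> str:
    """Fetch `url` once and describe the outcome, similar to a one-off wget."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503).
        # HTTPError subclasses URLError, so it must be caught first.
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP answer at all: refused, timed out, DNS failure, ...
        return f"no response: {exc}"


# Inside the cluster, e.g. saved as probe.py and run with python3:
# print(probe("http://10.35.249.23:9000/"))
```

Distinguishing "HTTP 5xx" from "no response" is the useful part: the first means the pod is reachable and the application is failing, the second points at networking or a pod that is not listening.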
Finally, for the purposes of debugging, I went ahead and added a /status endpoint to my API application service, which serves {"status": "OK"} when the core service is working. I recommend following this pattern with other applications as well, as it gives you a simple endpoint to test that, at a minimum, the web server is responding to input. In my case, I discovered that the /status page is OK but the API calls are failing, which allowed me to narrow the issue down to unresolved Promises caused by a bad credentials secret.
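The /status pattern can be sketched in a few lines. This is a hypothetical Python version using only the standard library (the port and handler name are assumptions), not the author's actual API service. It deliberately checks nothing beyond "the web server can answer", which is what lets it separate a dead server from failing downstream calls.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

STATUS_BODY = json.dumps({"status": "OK"}).encode()


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/status":
            # Answer without touching credentials or upstream APIs, so this
            # succeeds whenever the web server itself is up.
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(STATUS_BODY)))
            self.end_headers()
            self.wfile.write(STATUS_BODY)
        else:
            self.send_error(404)


def serve(port: int = 9000) -> None:
    """Run the status endpoint on all interfaces (port is an assumption)."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```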

In Kubernetes, how do I implement session affinity using an Ingress?

I'd like to implement a sticky-session Ingress controller. Cookies or IP hashing would both be fine; I'm happy as long as the same client is generally routed to the same pod.
What I'm stuck on: it seems like the Kubernetes service model means my connections are going to be proxied randomly no matter what. I can configure my Ingress controller with session affinity, but as soon as the connection gets past that and hits a service, kube-proxy is just going to route me randomly. There's the sessionAffinity: ClientIP flag on services, but that doesn't help me: the client IP will always be the internal IP of the Ingress pod.
Am I missing something? Is this possible given Kubernetes' current architecture?
An ingress controller can completely bypass kube-proxy. The haproxy controller, for example, does this and goes straight to the endpoints. However, it doesn't use the Ingress in the typical sense.
You could do the same with the nginx controller: all you need to do is look up the endpoints and insert them instead of the DNS name it currently uses (i.e. swap this line for a pointer to an upstream that contains the endpoints).
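Bypassing kube-proxy amounts to reading the Service's Endpoints and writing the Pod IPs straight into the proxy's upstream, so the proxy itself picks the Pod and can keep a client on it. A minimal Python sketch that renders an NGINX `upstream` block from an Endpoints object; it uses NGINX's built-in `ip_hash` directive for affinity rather than a cookie-based sticky module, and the upstream name, input, and function name are hypothetical. A real controller would also watch the Endpoints and re-render and reload whenever Pods come and go.

```python
def render_upstream(name: str, endpoints: dict) -> str:
    """Build an NGINX upstream block pointing straight at Pod IPs.

    `endpoints` is a v1 Endpoints object (kubectl get endpoints <svc> -o json).
    Only ready addresses are used, and each subset's first port is assumed
    to be the one to proxy to.
    """
    servers: list[str] = []
    for subset in endpoints.get("subsets") or []:
        ports = subset.get("ports") or []
        if not ports:
            continue
        port = ports[0]["port"]
        servers += [f"    server {a['ip']}:{port};" for a in subset.get("addresses") or []]
    if not servers:
        # NGINX refuses to load an upstream block that contains no servers.
        raise ValueError(f"no ready endpoints for {name}")
    return "\n".join(
        [f"upstream {name} {{", "    ip_hash;  # same client IP -> same Pod", *servers, "}"]
    )
```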
I evaluated the haproxy controller but could not get it running reliably with session affinity. After some research I discovered the Nginx Ingress Controller, which since version 0.61 also includes the nginx-sticky-module-ng module and has been running reliably in our test environment for a couple of days now. I created a Gist that sets up the required Kubernetes pieces, since some important configuration is a bit hard to find in the existing documentation. Good luck!