What route do service requests take through Kubernetes?

Let's say that a Service in a Kubernetes cluster is mapped to a group of cloned containers that will fulfill requests made for that service from the outside world.
What are the steps in the journey that a request from the outside world will make into the Kubernetes cluster, then through the cluster to the designated container, and then back through the Kubernetes cluster out to the original requestor in the outside world?
The documentation indicates that kube-controller-manager includes the Endpoints controller, which joins services to Pods. But I have not found specific documentation illustrating the steps in the journey that each request makes through a Kubernetes cluster.
This is important because it affects how one might design security for services, including the configuration of routing around the control plane.

Assuming you are using mostly the defaults:
1. The packet comes in to your cloud load balancer of choice.
2. It gets forwarded to a random node in the cluster.
3. It is received by the kernel and run through iptables.
4. An iptables rule (written by kube-proxy) maps the packet to a container IP.
5. Unless that container happens to be on the same box, the packet then goes through your CNI network, usually some kind of overlay, possibly with encapsulation and decapsulation.
6. It eventually reaches the container IP and is delivered to whatever process is inside the container.
The Services and Endpoints system is what creates and manages the iptables rules and the cloud load balancers, so that the LB knows the right node IPs and the iptables rules know the right container IPs.
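If you want to see that plumbing for yourself, you can inspect the NAT rules on any node. This is a minimal sketch assuming kube-proxy is running in its default iptables mode; the KUBE-* chain names are the conventions kube-proxy uses:

```sh
# Top-level chain matching Service clusterIPs and NodePorts
sudo iptables -t nat -L KUBE-SERVICES -n | head

# Per-Service chains (KUBE-SVC-...) pick a backend at random;
# per-endpoint chains (KUBE-SEP-...) DNAT to a container IP
sudo iptables-save -t nat | grep KUBE-SVC
sudo iptables-save -t nat | grep KUBE-SEP
```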


Clarification about Ports in Kubernetes scaling

Let's say I have a web application backend that I want to deploy with the help of Kubernetes. How exactly does scaling work in this case?
I understand scaling in Kubernetes as follows: we have a master node that orchestrates multiple worker nodes, where each worker node runs 0-n containers from the same image. My question is, if this is correct: how does Kubernetes deal with the fact that instances of the same application use the same port within one worker node? Does the request reach the master node, which then handles this problem internally?
Does the request reach the master node, which then handles this problem internally?
No, the master nodes do not handle traffic for your apps. Typically, traffic meant for your apps arrives at a load balancer or gateway, e.g. Google Cloud Load Balancer or AWS Elastic Load Balancer, and the load balancer forwards the request to a replica of a matching service - this routing is managed by the Kubernetes Ingress resource in your cluster.
The master nodes - the control plane - are only used for management, e.g. when you deploy a new image or service.
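As a rough sketch of that routing (resource names and host are hypothetical), an Ingress ties a hostname/path to a Service, and the load balancer or ingress controller uses it to pick the right group of replicas:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-ingress
spec:
  rules:
  - host: api.example.com        # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: backend        # requests are spread over this Service's Pods
            port:
              number: 80
```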
How does Kubernetes deal with the fact that instances of the same application use the same port within one worker node?
Kubernetes uses a container runtime for your containers. You can try this on your own machine: with Docker, for example, you can create multiple containers (instances) of your app, all listening on, say, port 8080. This is a key feature of containers - they provide network isolation.
On Kubernetes, all containers are tied together by the cluster's container networking. How this works depends on which Container Network Interface (CNI) plugin you use in your cluster. Each Pod in your cluster gets its own IP address, so all your containers can listen on the same port if you want - the Pod network provides this abstraction.
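A minimal sketch of that abstraction (names and image are illustrative): three replicas all listen on containerPort 8080 without conflict, because each Pod has its own IP, and a Service gives them a single stable virtual IP and port:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 3                      # three Pods, possibly on the same node
  selector:
    matchLabels: {app: backend}
  template:
    metadata:
      labels: {app: backend}
    spec:
      containers:
      - name: app
        image: example/backend:1.0   # hypothetical image
        ports:
        - containerPort: 8080        # no conflict: each Pod has its own IP
---
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector: {app: backend}
  ports:
  - port: 80           # stable Service port
    targetPort: 8080   # Pod port
```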

Having 1 outgoing IP for kubernetes egress traffic

Current set-up
Cluster specs: Managed Kubernetes on Digital Ocean
Goal
My pods are accessing some websites but I want to use a proxy first.
Problem
The proxy I need to use accepts only one IP address in its allow-list.
My cluster uses several nodes with a node autoscaler, so I have multiple, changing IP addresses.
Solutions I am thinking about
Setting up a proxy (Squid? nginx?) outside the cluster (currently not working when I access an HTTPS website)
Istio could let me set up a gateway? (I have no knowledge of Istio)
Use GCP-managed K8s and follow the answers on Kubernetes cluster outgoing traffic IP. But all our stack is on Digital Ocean, and the pricing is better there.
I am curious to know what the best practice or easiest solution is, or whether anyone has experienced such a use case before :)
Best
You could set up all your egress traffic to go through the istio-egressgateway.
Then you could pin the istio-egressgateway so it is always deployed on the same node of the cluster, and allow-list that node's IP address (sketched below).
Pros: super easy. BUT: if you are not using Istio already, setting up Istio just for this may be killing a mosquito with a bazooka.
Cons: You need to make sure the node's IP address doesn't change. Otherwise the istio-egressgateway itself might not get deployed (if the new node lacks the required labels), and you will need to reconfigure everything for the new node (new IP address). Another con is that if traffic goes up and there is an HPA, it will deploy more replicas of the gateway, all of them on the same node. So if you are going to have lots of traffic, it may be a good idea to isolate one node just for this purpose.
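The pinning part could look roughly like this, assuming you install Istio through an IstioOperator overlay (the egress=true label is a hypothetical label you add yourself; check the IstioOperator API for your Istio version):

```yaml
# First label the chosen node, e.g.:
#   kubectl label node <node-name> egress=true
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    egressGateways:
    - name: istio-egressgateway
      enabled: true
      k8s:
        nodeSelector:
          egress: "true"   # keep the gateway Pods on the labeled node
```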
Another option would be, as you are suggesting, a proxy. I would recommend an Envoy proxy directly - Istio is going to use Envoy anyway, right? So just take the proxy directly, put it in a Pod, and do the same thing as before: node affinity, so it always runs on the same node and goes out with the same IP (sketch below).
Pros: You are not installing an entire service mesh control plane for one tiny thing.
Cons: Same as before - you still have the issue of the node IP changing if something goes wrong - plus you will need to manage your own Deployment object, HPA, Envoy proxy configuration, etc., instead of using Istio objects (like a Gateway and a VirtualService).
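The pinning itself is plain Kubernetes scheduling. A minimal sketch (image tag and label are illustrative, and the actual Envoy configuration is omitted; you would still need to mount a real envoy.yaml):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: egress-proxy
spec:
  replicas: 1
  selector:
    matchLabels: {app: egress-proxy}
  template:
    metadata:
      labels: {app: egress-proxy}
    spec:
      nodeSelector:
        egress: "true"                    # one known node => one known egress IP
      containers:
      - name: envoy
        image: envoyproxy/envoy:v1.28.0   # illustrative tag
        ports:
        - containerPort: 10000            # listener port from your envoy.yaml
        # volumes/volumeMounts for the Envoy config omitted for brevity
```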
Finally, I see a third option: set up a NAT gateway outside the cluster and configure your traffic to go through it.
Pros: You won't have to configure any Kubernetes objects, therefore no node affinity is needed, and therefore there is no node overload or IP change. Plus, you can remove the external IP addresses from your cluster, so it will be more secure (unless you have other workloads that need to reach the internet directly). Also, a single machine configured as a NAT gateway will probably be more resilient than a Kubernetes Pod running on a node.
Cons: Maybe a little bit more complicated to set up?
And there is a general con: you can allow-list only one IP address, so you will always have a single point of failure. Even a NAT gateway can still fail.
The GCP static IP won't help you. What the other post suggests is reserving an IP address so you can always re-use it; it's not that the IP address will automatically be added to a random node when one goes down - human intervention is needed. To my knowledge, there is no service that keeps a static IP on one specific node and, if that node goes down, moves the same IP to the newly created node.
Now, GCP does offer a very resilient NAT gateway. It is managed by Google, so it shouldn't fail. Not cheap, though.

Q: Efficient Kubernetes load balancing

I've been looking into Kubernetes networking, more specifically how to serve HTTPS to users most efficiently.
I was watching this talk: https://www.youtube.com/watch?v=0Omvgd7Hg1I and from 22:18 he explains the problem with a load balancer that is not Pod-aware. Kubernetes solves this by letting the nodes also act as routers, passing the request on to another node (explained at 22:46). This does not seem very efficient, but looking around, SoundCloud (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic) actually seems to do something similar, with NodePorts. They say that the overhead costs less than building a better load balancer.
From what I have read, an option might be to use an ingress controller, making sure there is no more than one ingress controller per node and routing the traffic only to the specific nodes that have one. That way no traffic re-routing is needed. However, this adds another layer of routing.
This information is all from 2017, so my question is: is there any Pod-aware load balancer out there, or is there some other method that does not involve sending the HTTP request and response over the network twice?
Thank you in advance,
Hendrik
EDIT:
A bit more information about my use case:
There is a bare-metal setup with Kubernetes. The firewall load-balances the incoming traffic between two HAProxy instances. These HAProxy instances do SSL termination and forward the traffic to a few sites, including an Exchange setup, a few internal IIS sites, and an nginx server for a static web app. The idea is to move the app servers into Kubernetes.
Now my main problem is how to get the requests from HAProxy into kubernetes. I see a few options:
Use the SoundCloud setup. The infrastructure could stay almost the same; the HAProxy servers can still operate the way they do now.
I could use an ingress controller on EACH node in the Kubernetes cluster and have the firewall load-balance between the nodes. I believe it is possible to forward traffic from the ingress controller to servers outside the cluster, e.g. Exchange.
Some magic load balancer that I do not know about, which is Pod-aware and able to operate outside of the Kubernetes cluster.
Options 1 and 2 are relatively simple and quite close in how they work, but they come with a performance penalty: when the node the firewall forwards a request to does not run the required Pod, or when another Pod is doing less work, the request gets forwarded to another node - thus using the network twice.
Is this just the price you pay when using Kubernetes, or is there something that I am missing?
How traffic reaches Pods depends on whether a managed cluster is used.
Almost all cloud providers can forward traffic in a cloud-native way in their managed K8s clusters. First, you create a managed cluster with some special network settings (e.g. a VPC-native cluster on GKE). Then the only thing you need to do is create a LoadBalancer-type Service to expose your workload. You can also create Ingresses for your L7 workloads; they will be handled by the provided ingress controllers (e.g. ALB on AWS).
In an on-premise cluster without any cloud provider (OpenStack or vSphere), the only way to expose workloads is a NodePort-type Service. That doesn't mean it can't be improved.
If your cluster is behind reverse proxies (the SoundCloud case), setting externalTrafficPolicy: Local on a Service stops traffic from being forwarded among worker nodes: traffic received through a NodePort is forwarded to local Pods, or dropped if the Pods reside on other nodes. The reverse proxy will then mark those NodePorts as unhealthy in its backend health checks and refuse to forward traffic to them. Another choice is topology-aware service routing: local Pods have priority, but traffic is still forwarded between nodes when no local Pod matches.
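For reference, that is a one-line change on the Service; a sketch with illustrative names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  externalTrafficPolicy: Local   # only deliver to Pods on the receiving node
  selector: {app: web}
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080              # illustrative fixed NodePort
```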
For ingress controllers in on-prem clusters, it is a little different. You may have some worker nodes with an EIP or public IP. To expose HTTP(S) services, an ingress controller is usually deployed on those worker nodes through a DaemonSet with hostNetwork, so that clients reach the ingress controller via the nodes' EIPs on the well-known ports. Such worker nodes usually don't accept other workloads (e.g. infra nodes in OpenShift), so one more hop on the Pod network is needed. You could also deploy the ingress controller on all worker nodes alongside other workloads, so that traffic is forwarded to a closer Pod - if the ingress controller supports topology-aware service routing, though support for this was still limited at the time.
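A hedged sketch of that edge-node pattern (the label and image below are placeholders, not any specific ingress controller's official manifest):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-controller
spec:
  selector:
    matchLabels: {app: ingress-controller}
  template:
    metadata:
      labels: {app: ingress-controller}
    spec:
      nodeSelector:
        node-role/edge: "true"   # only nodes holding the EIP/public IP
      hostNetwork: true          # bind 80/443 directly on the node
      containers:
      - name: controller
        image: example/ingress-controller:latest   # placeholder image
        ports:
        - containerPort: 80
        - containerPort: 443
```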
Hope it helps!

Access a specific pod from external

We have an old service-discovery system that requires processes to register their ip:port during startup. On a Kubernetes cluster, we exposed a Service with a NodePort. The processes within the containers can register to the old system with their Pod ip:port plus host IP. Clients within the same Kubernetes cluster can connect to the right process via the specific Pod ip:port. For an external client that knows the HostIp+NodePort as well as the specific Pod ip:port, is there an efficient way to route the client's request to that specific Pod? Running a proxy on each node to route the traffic (NodePort -> Pod) seems inefficient due to the additional proxy layer.
I guess you mean you don't want to add a Service of type NodePort, as in your case that seems like an additional proxy layer - I can see how it is one. Typically Kubernetes would be doing the orchestration, and the Service would be part of the service-discovery mechanism. It sounds like you could use hostPort instead. But if you do go this route, be aware that it's not suggested practice, as Kubernetes is intended to do the orchestration itself.
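If you do experiment with hostPort, it is a field on the container port. In this sketch (names and image are hypothetical) the Pod becomes reachable on <node IP>:9000 directly, with no Service in between:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-registered-app
spec:
  containers:
  - name: app
    image: example/app:1.0     # hypothetical image
    ports:
    - containerPort: 8080
      hostPort: 9000           # exposed on the node's own IP, no Service layer
```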

How does a Kubernetes Service work?

Of all the concepts in Kubernetes, I find the working mechanism of Services the most difficult to understand.
Here is what I imagine right now:
kube-proxy on each node listens for any new Service/Endpoints from the master's API server
If there is a new Service/Endpoints, it adds a rule to that node's iptables
For a NodePort Service, an external client has to access the new Service through one of the nodes' IPs and the NodePort. The node will forward the request to the new Service's IP
Is this correct? There are still a few things I'm not clear about:
Do Services live within nodes? If so, can we SSH into nodes and inspect how Services work?
Are Service IPs virtual IPs that are only accessible from within the nodes?
Most of the diagrams I see online draw Services as spanning all nodes, which makes this even more difficult to imagine
kube-proxy on each node listens for any new Service/Endpoints from the master's API server
Kubernetes uses etcd, behind the API server, to share the current cluster configuration with all nodes (including Pods, Services, Deployments, etc.); components such as kube-proxy watch the API server for changes rather than etcd directly.
If there is a new Service/Endpoints, it adds a rule to that node's iptables
Internally, Kubernetes has a so-called Endpoints controller that keeps track of which Pod ip:port pairs back each Service, and the cluster DNS add-on makes service endpoints available via DNS (and environment variables).
For a NodePort Service, an external client has to access the new Service through one of the nodes' IPs and the NodePort. The node will forward the request to the new Service's IP
Depending on the Service type, additional action is taken: for type NodePort, a port is made available on the nodes on top of an automatically created clusterIP Service; or an external load balancer is created with the cloud provider, etc.
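For example, a NodePort Service is declared like this (a sketch with illustrative names); Kubernetes also allocates a clusterIP for it, and kube-proxy opens the node port on every node:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector: {app: web}
  ports:
  - port: 80          # the Service's clusterIP port
    targetPort: 8080  # the Pod port
    nodePort: 30080   # opened on every node (30000-32767 by default)
```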
Do Services live within nodes? If so, can we SSH into nodes and inspect how Services work?
As explained, Services are manifested in the cluster configuration, the Endpoints controller, and additional things like the clusterIP Services, load balancers, etc. I cannot see a need to SSH into nodes to inspect Services; typically, interacting with the cluster API is sufficient to investigate or update a Service's configuration.
Are Service IPs virtual IPs that are only accessible from within the nodes?
Service IPs, like Pod IPs, are virtual and accessible from within the cluster network. There is a global allocation map in etcd that maintains the complete list and allows allocating unique new ones. For more information on the networking model, read this blog.
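You can see those virtual IPs through the cluster API without touching any node, using standard kubectl commands:

```sh
# The Service's virtual (cluster) IP:
kubectl get service <service-name> -o wide

# The real Pod ip:port pairs currently backing it:
kubectl get endpoints <service-name>
```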
For more detailed information see the docs for kubernetes components and services.