Kubernetes service discovery, CNIs and Istio differences - kubernetes

I was making some research about how K8s resolves the services using the clusterIP services and how CNIs like WeaveNet or how service meshes like Istio provide additional features to this functionality. However, I'm new on the topic and I'd like to share here what I've found to see if somebody can expand and correct my points:
Istiod has a service registry. This service registry is filled with the entries coming from K8s services clusterIPs (which in turn is the service registry of K8s) and other possible external services defined with Kind: ServiceEntry
(see seciton 5.5 of book istio in action)
This service registry is then mixed with more information about virtualservices and destination rules. These new/added K8s kinds are CRDs from Istio. They are what give the features of L7 load balancing that allow to distribute traffic by HTTP headers or URI path.
Without Istio, K8s has different (3) ways to implement the clusterIPs services concept. This services provide load balancing at L4.
https://kubernetes.io/docs/concepts/services-networking/service/
The most extended one nowadays is the iptables proxy mode. The iptables of the Linux machine are populated in bases of what theh kube-proxy provides. Kube-proxy gets those data from the kube-apiserver and (problably the core-dns). The kube-apisever will in turn consult the etcd database to know about the k8s clusterIP services. The entry of the iptables is populated with a the clusterIP->pod IP with only one pod IP out of the many pod that a deployment behind the clusterIP could be.
Any piece of code/application inside of the container could make calls directy to the kube-apiserver if using the correct authentication and get the pod address but that would be not practic
K8s can use CNIs (container network interfaces). One example of this would be Weavenet.
https://www.weave.works/docs/net/latest/overview/
Wevenet creates a new layer 2 network using Linux kernel features. One daemon sets up this L2 network and manages the routing between machines and there are various ways to attach machines to the network.
In this network the containers can be exposed to the outside world.
Weavenet implements a micro DNS server at each node. You simply name containers and the routing just can work without the use of services, including the load balancing across multiple continers with the same name.

Related

What is a Kubernetes LoadBalancer On-Prem

I understand that, in Cloud scenarios, a LoadBalancer resource refers to and provisions an external layer 4 load balancer. There is some proprietary integration by which the cluster configures this load balancing. OK.
On-prem we have no such integration so we create our own load balancer outside of the cluster which distributes traffic to all nodes - effectively, in our case to the ingress.
I see no need for the cluster to know anything about this external load balancer.
What is a LoadBalancer* then in an on-prem scenario? Why do I have them? Why do I need one? What does it do? What happens when I create one? What role does the LoadBalancer resource play in ingress traffic. What effect does the LoadBalancer have outside the cluster? Is a LoadBalancer just a way to get a new IP address? How/why/ where does the IP point to and where does the IP come from?
All questions refer to the Service of type “LoadBalancer” inside cluster and not my load balancer outside the cluster of which the cluster has no knowledge.
As pointed out in the comments a kubernetes service of type LoadBalancer can not be used by default with on-prem setups. You can use metallb to setup a service of that type in an on prem environment.
Kubernetes does not offer an implementation of network load balancers (Services of type LoadBalancer) for bare-metal clusters. [...] If you’re not running on a supported IaaS platform (GCP, AWS, Azure…), LoadBalancers will remain in the “pending” state indefinitely when created. [...] MetalLB aims to redress this imbalance by offering a network load balancer implementation that integrates with standard network equipment, so that external services on bare-metal clusters also “just work” as much as possible.
You can for example use the BGP mode to advertise the service's IP to your router, read more on that in the docs.
The project is still in beta but is promoted as production ready and used by several bigger companies.
Edit
Regarding your question in the comments:
Can I just broadcast the MAC address of my node and manually add the IP I am broadcasting to the LoadBalancer service via kubectl edit?
Yes that would work too. That's basically what metallb does, announcing the IP and updating the service.
Why need a software then? Imaging having 500 hosts that come and go with thousends of services of type LoadBalancer that come and go. You need an automation here.
Why does Kubernetes need to know this IP?
It doesn't. If you don't use an external ip, the service is still usable via it's NodePort, see for example the istio docs (with a little more details added by me):
$ kubectl get svc istio-ingressgateway -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
istio-ingressgateway LoadBalancer 192.12.129.119 <Pending> [...],80:32123/TCP,443:30994/TCP,[...]
Here the external IP is not set and stays in <Pending>. You can still use the service by pointing your traffic to <Node-IP>:32123 for plain http and to <Node-IP>:30994 for https. As you can see above those ports are mapped to 80 and 443.
If the external ip is set you can direct traffic directly to port 80 and 443 on the external load balancer. Kube-Proxy will create an iptables chain with the destination of you external ip, that basically leads from the external IP over the service ip with a load balancer configuration to a pod ip.
To investigate that set up a service of type LoadBalancer, make sure it has an external ip, connect to the host and run the iptables-save command (e.g. iptables-save | less). Search for the external ip and follow the chain until you end up at the pod.

How does Traefik / Ngnix - (Ingress Controllers) forwards request to two different services having configured with same port number.?

Basically I have Following Hdfs Cluster setup using docker-compose:
Node 1 with IP: 192.168.1.1 having service deployed as below:
Namenode1:9000
HMaster1: 8300
ZooKeeper1:1291
Node 2 with IP: 192.168.1.2 having service deployed as below:
Namenode2:9000
ZooKeeper2:1291
How does Traefik / Ngnix - (Ingress Controllers) forwards request to two different services having configured with same port number?
There are several great tutorials on how ingress and load balancing works in kubernetes, e.g. this one by Mark Betz. As a general rule, it helps to think in terms of services and workloads instead of specific nodes where your workloads are running on.
A workload deployed in Kubernetes (a so called Pod) has its own internal IP address, called a ClusterIP. That pod can have one or more ports open, just on that pod-owned ip address.
If you now have several pods to distribute the load, e.g. like 5 web server processes or backend logic, it would be hard for a client (inside the cluster) to keep track of all those pod IPs, because they also change when a pod is updated or just restarted due to a crash. This is why Kubernetes has a so called concept of services. Those provide a stable DNS name and IP which then transparently "forwards" to one of the healthy pods. So your client only needs to know the DNS name and not keep track of the specific pod IPs.
If you now want to expose such a service to the public, there are different ways. Either you set your service to type: LoadBalancer which then sets up some load balancer infrastructure on your cloud provider and routes traffic to the nodes and then to the pods - or - you already have an ingress controller in place and just define the routing based on host names and paths. An ingress controller itself is such a loadbalanced service with an attached cloud load balancer and also has some pods (with e.g. a traefik or nginx container) which then route your packets accordingly.
So coming back to your initial question: If you want to expose a service with several pods of the same kind, then you would first create a Service resource that matches your Pods using the selector and then you create one single ingress resource that provides a hostname/path and references this service. The ingress controller will pick up those ingress resources and configure the traefik or nginx accordingly. The ingress controller doesn't really care about the host IPs and port numbers, because it acts on the internal kubernetes ClusterIPs, so you even don't need (and shouldn't) expose such a service directly when you have an ingress in place.
I hope this answers your question regarding exposing two workloads over an ingress controller. For details, check the Kubernetes docs on Ingresses. Based on the services you named (zookeeper, hdfs) load balancing and ingresses might not be what you need for that case. Zookeeper instances should be internal in most cases and need to be adressed individually, so you might want to check out headless services, for this use case. Also check the Kubernetes docs for a way to run zookeeper.

Exposing service to the internet from a bare metal kubernetes cluster

I'm running a Kuberenets with 1 master and 2 slaves. I have a deployment and service pointing to it with type of NodePort. I'm able to access the service from the workers themselves, but I want to expose the service in a way it will load balance between the workers and without specifying a port. I'm running on bare-metal, so I can't expose the service as a LoadBalancer and use google/amazon load balancing.
How can I do that?
You can use metalLB which hooks into your Kubernetes cluster, and provides a network load-balancer implementation. In short, it allows you to create Kubernetes services of type LoadBalancer in clusters that don’t run on a cloud provider, and thus cannot simply hook into paid products to provide load-balancers.
It has two features that work together to provide this service: address allocation, and external announcemen
MetalLB requires the following to function:
A Kubernetes cluster, running Kubernetes 1.13.0 or later, that does not already have network load-balancing functionality.
A cluster network configuration that can coexist with MetalLB.
Some IPv4 addresses for MetalLB to hand out.
Depending on the operating mode, you may need one or more routers capable of speaking BGP.

Multiple Host Kubernetes Ingress Controller

I've been studying Kubernetes for a few weeks now, and using the kube-lego NGINX examples (https://github.com/jetstack/kube-lego) have successfully deployed services to Kubernetes cluster using Rancher on DigitalOcean.
I've deployed sample static sites, Wordpress, Laravel, Craft CMS, etc. All of which use custom Namespaces, Deployment, Secrets, Containers with external registries, Services, and Ingress Definitions.
Using the example (lego) NGINX Ingress Controller setup, I'm able to apply DNS to the exposed IP address of my K8s cluster, and have the resulting sites appear.
What I don't know, though, is how to allow for multiple hosts to have Ingress Controllers service the same deployments, and thus provide HA Ingress to the cluster. (by applying an external load balancer service, or geo-ip, or what-have-you).
Rancher (stable) allows me to add multiple hosts, I've spun up 3 to 5 at a time, and Kubernetes is configured and deployed across all Hosts. Furthermore, I'll define many replicas and/or deployments (listed above) and they will be spread over the cluster and accessible as would be expected. I've even specified multiple replicas of the Ingress Controller, but of course they all get scheduled on the same host, giving me only one IP address of Ingress.
So how do I allow multiple hosts (each with their own public facing IP address) to allow ingress into the cluster? I've also read about setting up multiple Ingress Controllers, but then you must specify what deployment/services are being serviced by what Ingress Controller, which then totally defeats the purpose.
Maybe I'm missing something, but if K8s multi-host is supposed to provide HA, and the Host with the Ingress Controller goes down, then the service will be rescheduled on the other Hosts, but the IP address that everything is pointing to will be dead, and thus an outage. Any way to have multiple IP Addresses to the same set of deployment/services?
I investigated my setup a bit more today, and I think I found out why I was having difficulty. The "LoadBalancer" is often mentioned as for use with Cloud Providers (in both docs, and what #fiunchinho describes). I was using it with a Rancher setup, which auto creates an HA-Proxy LoadBalancer ingress for you on the hosts.
By default, it will just schedule it on one of the hosts. You can specify that you want it scheduled globally buy providing an 'annotation' of io.rancher.scheduler.global: "true".
Like so:
annotations:
# Create load balancers on every host in the environment
io.rancher.scheduler.global: "true"
http://rancher.com/docs/rancher/v1.6/en/rancher-services/load-balancer/
I preferred LoadBalancer over NodePort because I wanted the ability to send port 80 (and in the future port 443) to any of the Nodes, and have them successfully fulfil my request by inspecting the Host header, and directing as-needed.
These LBs can also be setup in the Rancher UI under the "Infrastructure Stack" menu. I have successfully removed the single LB, and re-added one with an "Always run one instance of this container on every host" option enabled.
After this was configured, I could make a request to any of the Hosts for any of the Ingresses, and get a response, no matter what host the container was scheduled on.
https://rancher.com/docs/rancher/v1.6/en/rancher-services/load-balancer/
So cool!
The ingress controller is deployed like any regular pod. That means that you can have as many replicas as you'd like, which will be spread among all your nodes.
You need a Service object that group all the pods for the ingress controller.
Then you just need to expose that Service to outside the cluster. You can do that using a LoadBalancer service if you are on a cloud provider. Or you can use just a NodePort service.
The point is that the service will balance the traffic that your ingress controller receives between all the pods that are running on different kubernetes nodes. If one of the nodes goes down, it doesn't really matter, because there are other nodes containing ingress controller pods.

Routing traffic to kubernetes cluster

I have a question related to Kubernetes networking.
I have a microservice (say numcruncherpod) running in a pod which is serving requests via port 9000, and I have created a corresponding Service of type NodePort (numcrunchersvc) and node port which this service is exposed is 30900.
My cluster has 3 nodes with following IPs:
192.168.201.70,
192.168.201.71
192.168.201.72
I will be routing the traffic to my cluster via reverse proxy (nginx). As I understand in nginx I need to specify IPs of all these cluster nodes to route the traffic to the cluster, is my understanding correct ?
My worry is since nginx won't have knowledge of cluster it might not be a good judge to decide the cluster node to which the traffic should be sent to. So is there a better way to route the traffic to my kubernetes cluster ?
PS: I am not running the cluster on any cloud platform.
This answer is a little late, and a little long, so I ask for forgiveness before I begin. :)
For people not running kubernetes clusters on Cloud Providers there are 4 distinct options for exposing services running inside the cluster to the world outside.
Service of type: NodePort. This is the simplest and default. Kubernetes assigns a random port to your service. Every node in the cluster listens for traffic to this particular port and then forwards that traffic to any one of the pods backing that service. This is usually handled by kube-proxy, which leverages iptables and load balances using a round-robin strategy. Typically since the UX for this setup is not pretty, people often add an external "proxy" server, such as HAProxy, Nginx or httpd to listen to traffic on a single IP and forward it to one of these backends. This is the setup you, OP, described.
A step up from this would be using a Service of type: ExternalIP. This is identical to the NodePort service, except it also gets kubernetes to add an additional rule on all kubernetes nodes that says "All traffic that arrives for destination IP == must also be forwarded to the pods". This basically allows you to specify any arbitrary IP as the "external IP" for the service. As long as traffic destined for that IP reaches one of the nodes in the cluster, it will be routed to the correct pod. Getting that traffic to any of the nodes however, is your responsibility as the cluster administrator. The advantage here is that you no longer have to run an haproxy/nginx setup, if you specify the IP of one of the physical interfaces of one of your nodes (for example one of your master nodes). Additionally you cut down the number of hops by one.
Service of type: LoadBalancer. This service type brings baremetal clusters at parity with cloud providers. A fully functioning loadbalancer provider is able to select IP from a pre-defined pool, automatically assign it to your service and advertise it to the network, assuming it is configured correctly. This is the most "seamless" experience you'll have when it comes to kubernetes networking on baremetal. Most of LoadBalancer provider implementations use BGP to talk and advertise to an upstream L3 router. Metallb and kube-router are the two FOSS projects that fit this niche.
Kubernetes Ingress. If your requirement is limited to L7 applications, such as REST APIs, HTTP microservices etc. You can setup a single Ingress provider (nginx is one such provider) and then configure ingress resources for all your microservices, instead of service resources. You deploy your ingress provider and make sure it has an externally available and routable IP (you can pin it to a master node, and use the physical interface IP for that node for example). The advantage of using ingress over services is that ingress objects understand HTTP mircoservices natively and you can do smarter health checking, routing and management.
Often people combine one of (1), (2), (3) with (4), since the first 3 are L4 (TCP/UDP) and (4) is L7. So things like URL path/Domain based routing, SSL Termination etc is handled by the ingress provider and the IP lifecycle management and routing is taken care of by the service layer.
For your use case, the ideal setup would involve:
A deployment for your microservice, with health endpoints on your pod
An Ingress provider, so that you can tweak/customize your routing/load-balancing as well as use for SSL termination, domain matching etc.
(optional): Use a LoadBalancer provider to front your Ingress provider, so that you don't have to manually configure your Ingress's networking.
Correct. You can route traffic to any or all of the K8 minions. The K8 network layer will forward to the appropriate minion if necessary.
If you are running only a single pod for example, nginx will most likely round-robin the requests. When the requests hit a minion which does not have the pod running on it, the request will be forwarded to the minion that does have the pod running.
If you run 3 pods, one on each minion, the request will be handled by whatever minion gets the request from nginx.
If you run more than one pod on each minion, the requests will be round-robin to each minion, and then round-robin to each pod on that minion.