OpenShift/OKD, what is the difference between deployment, service, route, ingress? - kubernetes

Could you please explain use of each "Kind" of OpenShift in a short sentences?
It is okay, that deployment contains data about, image source, pod counts, limits etc.
With the route we can determine the URL for each deployment as well as Ingress, but what is the difference and when should use route and when ingress?
And what is the exact use of service?
Thanks for your help in advance!

Your question cannot be answered simply in short words or one line answers, go through the links and explore more,
Deployment: It is used to change or modify the state of the pod. A pod can be one or more running containers or a group of duplicate pods called ReplicaSets.
Service: Each pod is given an IP address when using a Kubernetes service. The service provides accessibility, connects the appropriate pod automatically, and this address may not be directly identifiable.
Route:Similar to the Kubernetes Ingress resource, OpenShift's Route was developed with a few additional features, including the ability to split traffic between multiple backends.
Ingress: It offers routing rules for controlling who can access the services in a Kubernetes cluster.
Difference between route and ingress?
OpenShift uses HAProxy to get (HTTP) traffic into the cluster. Other Kubernetes distributions use the NGINX Ingress Controller or something similar. You can find more in this doc.
when to use route and ingress: It depends on your requirements. From the image below you can find the feature of the ingress and route and you select according to your requirements.
Exact use of service:
Each pod in a Kubernetes cluster has its own unique IP address. However, the IP addresses of the Pods in a Deployment change as they move around. Therefore, using Pod IP addresses directly is illogical. Even if the IP addresses of the member Pods change, you will always have a consistent IP address with a Service.
A Service also provides load balancing. Clients call a single, dependable IP address, and the Service's Pods distribute their requests evenly.

Related

Route traffic in kubernetes based on IP address

We have need to test a pod on our production kubernetes cluster after a data migration before we expose it to our users. What we'd like to do is route traffic from our internal ip addresses to the correct pods, and all other traffic to a maintenance pod. Is there a way we can achieve this, or do we need to briefly expose an ip address on the cluster so we can access the pods directly?
What ingress controller are you using?
Generally ingress-controllers tend to support header-based routing, rather that source ip based one.
Say, ingress-nginx supports header-based canary deployments out of box.
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#canary.
Good example here.
https://medium.com/#domi.stoehr/canary-deployments-on-kubernetes-without-service-mesh-425b7e4cc862
Note, that ingress-nginx canary implementation actually requires your services to be in different namespaces.
You could try to configure your ingress-nginx with use-forwarded-headers and try to employ X-Forwarded-For header for routing.
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#use-forwarded-headers

How does Traefik / Ngnix - (Ingress Controllers) forwards request to two different services having configured with same port number.?

Basically I have Following Hdfs Cluster setup using docker-compose:
Node 1 with IP: 192.168.1.1 having service deployed as below:
Namenode1:9000
HMaster1: 8300
ZooKeeper1:1291
Node 2 with IP: 192.168.1.2 having service deployed as below:
Namenode2:9000
ZooKeeper2:1291
How does Traefik / Ngnix - (Ingress Controllers) forwards request to two different services having configured with same port number?
There are several great tutorials on how ingress and load balancing works in kubernetes, e.g. this one by Mark Betz. As a general rule, it helps to think in terms of services and workloads instead of specific nodes where your workloads are running on.
A workload deployed in Kubernetes (a so called Pod) has its own internal IP address, called a ClusterIP. That pod can have one or more ports open, just on that pod-owned ip address.
If you now have several pods to distribute the load, e.g. like 5 web server processes or backend logic, it would be hard for a client (inside the cluster) to keep track of all those pod IPs, because they also change when a pod is updated or just restarted due to a crash. This is why Kubernetes has a so called concept of services. Those provide a stable DNS name and IP which then transparently "forwards" to one of the healthy pods. So your client only needs to know the DNS name and not keep track of the specific pod IPs.
If you now want to expose such a service to the public, there are different ways. Either you set your service to type: LoadBalancer which then sets up some load balancer infrastructure on your cloud provider and routes traffic to the nodes and then to the pods - or - you already have an ingress controller in place and just define the routing based on host names and paths. An ingress controller itself is such a loadbalanced service with an attached cloud load balancer and also has some pods (with e.g. a traefik or nginx container) which then route your packets accordingly.
So coming back to your initial question: If you want to expose a service with several pods of the same kind, then you would first create a Service resource that matches your Pods using the selector and then you create one single ingress resource that provides a hostname/path and references this service. The ingress controller will pick up those ingress resources and configure the traefik or nginx accordingly. The ingress controller doesn't really care about the host IPs and port numbers, because it acts on the internal kubernetes ClusterIPs, so you even don't need (and shouldn't) expose such a service directly when you have an ingress in place.
I hope this answers your question regarding exposing two workloads over an ingress controller. For details, check the Kubernetes docs on Ingresses. Based on the services you named (zookeeper, hdfs) load balancing and ingresses might not be what you need for that case. Zookeeper instances should be internal in most cases and need to be adressed individually, so you might want to check out headless services, for this use case. Also check the Kubernetes docs for a way to run zookeeper.

two Loadbalancing in kubernetes

As I can see in below diagram I figure out in kubernetes we have two loadbalancer. One of them loadbalance between nodes and one of them loadbalance between pods.
If I use them both I have two loadbalancer.
Imagine some user want to connect to 10.32.0.5 the kubernetes send its request to node1(10.0.0.1) and after that send the request to pod (10.32.0.5) in nod3(10.0.0.3) but it is unuseful because the best route is to send request nod3(10.0.0.3) directly.
Why the NodePort is insufficient for load-balancing?
Why the NodePort is not LoadBalancer?(it LoadBalance between pods in different node but why we need another load balancer?)
note: I know that if I use NodePort and the node goes down it creates problem but I can say that I can use keepalived for it. The question is
why we need to loadbalance between nodes? keepalived attract all request to one IP.
Why we have two loadbalancer?
Wether you have two load-balancers depends on your setup.
In your example you have 3 nginx pods and 1 nginx service to access the pods. The service builds an abstraction layer, so you don't have to know how many pods there are and what IP addresses they have. You just have to talk to the service and it will loadbalance to one of the pods (docs).
It now depends on your setup how you access the service:
you might want to publish the service via NodePort. Then you can directly access the service on a node.
you might also publish it via LoadBalancer. This gives you another level of abstraction and the caller needs to know less about the actual setup of your cluster.
See docs for details.

What's the exactly flow chart of an outside request comes into k8s pod via Ingress?

all
I knew well about k8s' nodePort and ClusterIP type in services.
But I am very confused about the Ingress way, because how will a request come into a pod in k8s by this Ingress way?
Suppose K8s master IP is 1.2.3.4, after Ingress setup, and can connect to backend service(e.g, myservice) with a port(e.g, 9000)
Now, How can I visit this myservice:9000 outside? i.e, through 1.2.3.4? As there's no entry port on the 1.2.3.4 machine.
And many docs always said visit this via 'foo.com' configed in the ingress YAML file. But that is really funny, because xxx.com definitely needs DNS, it's not a magic to let you new-invent any xxx.com you like be a real website and can map your xxx.com to your machine!
The key part of the picture is the Ingress Controller. It's an instance of a proxy (could be nginx or haproxy or another ingress type) and runs inside the cluster. It acts as an entrypoint and lets you add more sophisticated routing rules. It reads Ingress Resources that are deployed with apps and which define the routing rules. This allows each app to say what the Ingress Controller needs to do for routing to it.
Because the controller runs inside the cluster, it needs to be exposed to the outside world. You can do this by NodePort but if you're using a cloud provider then it's more common to use LoadBalancer. This gives you an external IP and port that reaches the Ingress controller and you can point DNS entries at that. If you do point DNS at it then you have the option to use routing rules base on DNS (such as using different subdomains for different apps).
The article 'Kubernetes NodePort vs LoadBalancer vs Ingress? When should I use what?' has some good explanations and diagrams - here's the diagram for Ingress:

Multiple Host Kubernetes Ingress Controller

I've been studying Kubernetes for a few weeks now, and using the kube-lego NGINX examples (https://github.com/jetstack/kube-lego) have successfully deployed services to Kubernetes cluster using Rancher on DigitalOcean.
I've deployed sample static sites, Wordpress, Laravel, Craft CMS, etc. All of which use custom Namespaces, Deployment, Secrets, Containers with external registries, Services, and Ingress Definitions.
Using the example (lego) NGINX Ingress Controller setup, I'm able to apply DNS to the exposed IP address of my K8s cluster, and have the resulting sites appear.
What I don't know, though, is how to allow for multiple hosts to have Ingress Controllers service the same deployments, and thus provide HA Ingress to the cluster. (by applying an external load balancer service, or geo-ip, or what-have-you).
Rancher (stable) allows me to add multiple hosts, I've spun up 3 to 5 at a time, and Kubernetes is configured and deployed across all Hosts. Furthermore, I'll define many replicas and/or deployments (listed above) and they will be spread over the cluster and accessible as would be expected. I've even specified multiple replicas of the Ingress Controller, but of course they all get scheduled on the same host, giving me only one IP address of Ingress.
So how do I allow multiple hosts (each with their own public facing IP address) to allow ingress into the cluster? I've also read about setting up multiple Ingress Controllers, but then you must specify what deployment/services are being serviced by what Ingress Controller, which then totally defeats the purpose.
Maybe I'm missing something, but if K8s multi-host is supposed to provide HA, and the Host with the Ingress Controller goes down, then the service will be rescheduled on the other Hosts, but the IP address that everything is pointing to will be dead, and thus an outage. Any way to have multiple IP Addresses to the same set of deployment/services?
I investigated my setup a bit more today, and I think I found out why I was having difficulty. The "LoadBalancer" is often mentioned as for use with Cloud Providers (in both docs, and what #fiunchinho describes). I was using it with a Rancher setup, which auto creates an HA-Proxy LoadBalancer ingress for you on the hosts.
By default, it will just schedule it on one of the hosts. You can specify that you want it scheduled globally buy providing an 'annotation' of io.rancher.scheduler.global: "true".
Like so:
annotations:
# Create load balancers on every host in the environment
io.rancher.scheduler.global: "true"
http://rancher.com/docs/rancher/v1.6/en/rancher-services/load-balancer/
I preferred LoadBalancer over NodePort because I wanted the ability to send port 80 (and in the future port 443) to any of the Nodes, and have them successfully fulfil my request by inspecting the Host header, and directing as-needed.
These LBs can also be setup in the Rancher UI under the "Infrastructure Stack" menu. I have successfully removed the single LB, and re-added one with an "Always run one instance of this container on every host" option enabled.
After this was configured, I could make a request to any of the Hosts for any of the Ingresses, and get a response, no matter what host the container was scheduled on.
https://rancher.com/docs/rancher/v1.6/en/rancher-services/load-balancer/
So cool!
The ingress controller is deployed like any regular pod. That means that you can have as many replicas as you'd like, which will be spread among all your nodes.
You need a Service object that group all the pods for the ingress controller.
Then you just need to expose that Service to outside the cluster. You can do that using a LoadBalancer service if you are on a cloud provider. Or you can use just a NodePort service.
The point is that the service will balance the traffic that your ingress controller receives between all the pods that are running on different kubernetes nodes. If one of the nodes goes down, it doesn't really matter, because there are other nodes containing ingress controller pods.