Of all the concepts in Kubernetes, I find the way services work the most difficult to understand.
Here is what I imagine right now:
kube-proxy on each node watches the master's API server for any new service/endpoint
If there is a new service/endpoint, it adds a rule to that node's iptables
For a NodePort service, an external client has to access the new service through one of the nodes' IPs and the NodePort. The node will forward the request to the new service IP
Is that correct? There are still a few things I'm not clear about:
Do services live on the nodes? If so, can we ssh into the nodes and inspect how services work?
Are service IPs virtual IPs, and are they only accessible from within the nodes?
Most of the diagrams I see online draw services as spanning all nodes, which makes them even more difficult to picture
kube-proxy on each node watches the master's API server for any new service/endpoint
Kubernetes stores the current cluster configuration (including pods, services, deployments, etc.) in etcd. Components on every node, such as kube-proxy, see that shared state by watching the API server, which sits in front of etcd.
If there is a new service/endpoint, it adds a rule to that node's iptables
Internally, Kubernetes has a so-called Endpoints controller that keeps each service's Endpoints object in sync with the pods matching the service's selector. kube-proxy turns those endpoints into rules on each node, while the cluster DNS add-on (and environment variables injected into pods) makes services discoverable by name.
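As a rough mental model of what those per-node rules amount to, here is a small sketch in plain Python. It is not how kube-proxy is implemented; it only imitates the effect of the rules: a connection to a service's virtual IP is rewritten to one of the backing pod addresses. All IPs, ports, and function names are made up for illustration.

```python
import random

# Hypothetical view of one node's rules: service virtual IP:port -> pod endpoints.
# Every address here is invented for the example.
service_table = {
    ("10.96.0.10", 80): [("10.244.1.5", 8080), ("10.244.2.7", 8080)],
}


def on_endpoints_changed(table, service_addr, pod_addrs):
    """Imitate a node reacting to a watch event by replacing its local rules."""
    table[service_addr] = list(pod_addrs)


def route(table, dest_ip, dest_port, rng=random):
    """Rewrite a connection aimed at a service IP into one aimed at a pod.

    In this sketch, destinations that are not known services (or that have
    no backends) are passed through unchanged.
    """
    backends = table.get((dest_ip, dest_port))
    if not backends:
        return (dest_ip, dest_port)
    return rng.choice(backends)


# A third pod joins the service; each node would update its own copy of the table.
on_endpoints_changed(
    service_table,
    ("10.96.0.10", 80),
    [("10.244.1.5", 8080), ("10.244.2.7", 8080), ("10.244.3.9", 8080)],
)
print(route(service_table, "10.96.0.10", 80))
```

Because every node holds the same table, the service has no single home, which is why diagrams tend to draw it across all nodes.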
For a NodePort service, an external client has to access the new service through one of the nodes' IPs and the NodePort. The node will forward the request to the new service IP
Depending on the service type, additional action is taken. For type NodePort, a port is opened on every node and backed by an automatically allocated cluster IP. For type LoadBalancer, an external load balancer is created with the cloud provider, and so on.
Do services live on the nodes? If so, can we ssh into the nodes and inspect how services work?
As explained, services are manifested in the cluster configuration, the endpoints controller, and additional things like cluster IPs, load balancers, etc. I cannot see a need to ssh into nodes to inspect services. Typically, interacting with the cluster API should be sufficient to investigate or update the service configuration.
Are service IPs virtual IPs, and are they only accessible from within the nodes?
Service IPs, like pod IPs, are virtual and accessible from within the cluster network. A global allocation map in etcd tracks the complete list of assigned IPs, which allows each new one to be allocated uniquely. For more information on the networking model read this blog.
For more detailed information see the docs for kubernetes components and services.
Related
I have a Kubernetes cluster (v1.19.4) set up with several nodes and services. Some of these services need to access an external resource residing outside the cluster. I understand that this external resource can be defined by a service without a label selector, together with a manually created Endpoints object that sets the resource IP.
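For reference, the selector-less service plus manual Endpoints object described above could look roughly like the following sketch, written as Python that prints JSON manifests. The name external-db, the port 5432, and the IP 192.0.2.10 are placeholders, not values from my setup.

```python
import json

EXTERNAL_IP = "192.0.2.10"  # placeholder address of the external resource
PORT = 5432                 # placeholder port

# Service without a selector: Kubernetes will not manage its endpoints.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "external-db"},
    "spec": {"ports": [{"port": PORT, "targetPort": PORT}]},
}

# Endpoints object with the same name as the service, pointing outside the cluster.
endpoints = {
    "apiVersion": "v1",
    "kind": "Endpoints",
    "metadata": {"name": "external-db"},
    "subsets": [{"addresses": [{"ip": EXTERNAL_IP}], "ports": [{"port": PORT}]}],
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [service, endpoints]}
print(json.dumps(manifest, indent=2))
```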
My question is one of convenience: the external service needs to be configured to allow connections from specific IPs, so I'd like all requests to this service to be routed through a specific node. That way I know the cluster services will only communicate with this external resource via a single IP, regardless of which node the request originates from.
Is this possible by assigning nodeSelectors to the service definition or to the Endpoints object, or by some other means?
The requirement can be described with a networking analogy: I'd like to achieve what a router does for its clients, which is to present a single IP regardless of which client makes an outbound request to the WAN.
thanks
Services are a virtual thing; there is no pod backing them per se, so they are not tied to any node. However, you can run a pod on a specific node using nodeSelectors and have it use that host's IP. Then you can have a service route to that pod (and therefore that specific node) using labels.
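A minimal sketch of that idea, again as Python that prints JSON manifests. It assumes the pod is some TCP proxy that forwards to the external resource; the node name, image, labels, and port are all hypothetical, and using hostNetwork is one interpretation of "use that host's IP".

```python
import json

NODE_NAME = "egress-node-1"  # hypothetical hostname of the node to pin to

# Pod pinned to one node; hostNetwork makes outbound traffic use the node's IP.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "egress-proxy", "labels": {"app": "egress-proxy"}},
    "spec": {
        "nodeSelector": {"kubernetes.io/hostname": NODE_NAME},
        "hostNetwork": True,
        "containers": [{
            "name": "proxy",
            "image": "example.com/tcp-proxy:latest",  # placeholder image
            "ports": [{"containerPort": 5432}],
        }],
    },
}

# Service that selects the pinned pod by label.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "external-db"},
    "spec": {
        "selector": {"app": "egress-proxy"},
        "ports": [{"port": 5432, "targetPort": 5432}],
    },
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [pod, service]}
print(json.dumps(manifest, indent=2))
```

The output can be piped into kubectl apply -f -, since kubectl accepts JSON as well as YAML.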
We have an old service discovery system that requires processes to register their ip:port during startup. On a Kubernetes cluster, we exposed a service of type NodePort. The processes within a container can register with the old system using their Pod IP:port + HostIP. Clients within the same Kubernetes cluster should be able to connect to the right process via the specific Pod IP:port. An external client knows the HostIP+NodePort and the specific Pod IP:port; is there an efficient way to route that client's request to the specific Pod? Running a proxy on each node to route the traffic (nodeport -> pod) seems inefficient due to the additional proxy layer.
I guess you mean you don't want to add a Service of type NodePort because, in your case, it amounts to an additional proxy layer. I can see how that is true for you. Typically Kubernetes would be doing the orchestration and the Service would be part of the service-discovery mechanism. It sounds like you could use hostPort. But if you do go this route, you should be aware that it's not suggested practice, as Kubernetes is intended for orchestration.
I have set up an experimental local Kubernetes cluster with one master and three slave nodes. I have created a deployment for a custom service that listens on port 10001. The goal is to access an exemplary endpoint /hello with a stable IP/hostname, e.g. http://<master>:10001/hello.
After deploying the deployment, the pods are created fine and are accessible through their cluster IPs.
I understand the solution for cloud providers is to create a load balancer service for the deployment, so that you can just expose a service. However, this is apparently not supported for a local cluster. Setting up Ingress seems overkill for this purpose. Is it not?
It seems more like kubectl proxy is the way to go. However, when I run kubectl proxy --port <port> on the master node, I can access http://<master>:<port>/api/..., but not the actual pod.
There are many related questions (e.g. How to access services through kubernetes cluster ip?), but no (accepted) answers. The Kubernetes documentation on the topic is rather sparse as well, so I am not even sure about what is the right approach conceptually.
I am hence looking for a straight-forward solution and/or a good tutorial. It seems to be a very typical use case that lacks a clear path though.
If an Ingress Controller is overkill for your scenario, you may want to try using a service of type NodePort. You can specify the port, or let the system auto-assign one for you.
A NodePort service exposes your service at the same port on all Nodes in your cluster. If you have network access to your Nodes, you can access your service at the node IP and port specified in the configuration.
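As a sketch, a NodePort service for the deployment in the question might be generated like this. The service name, the app label, and the node port 30001 are assumptions; only port 10001 comes from the question. By default node ports must fall in the 30000-32767 range, so the service would be reached at http://<node>:30001/hello rather than on port 10001.

```python
import json


def nodeport_service(name, app_label, port, node_port=None):
    """Build a NodePort Service manifest.

    If node_port is None, the field is omitted and the cluster picks a port.
    """
    port_spec = {"port": port, "targetPort": port}
    if node_port is not None:
        port_spec["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": app_label},  # must match the pod labels
            "ports": [port_spec],
        },
    }


fixed = nodeport_service("hello", "hello", 10001, node_port=30001)
auto = nodeport_service("hello", "hello", 10001)
print(json.dumps(fixed, indent=2))
```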
Obviously, this does not load balance between nodes. You can add an external service to help you do this if you want to emulate what a real load balancer would do. One simple option is to run something like rocky-cli.
An Ingress is probably your simplest bet.
You can schedule the creation of an Nginx IngressController quite simply; here's a guide for that. Note that this setup uses a DaemonSet, so there is an IngressController on each node. It also uses the hostPort config option, so the IngressController will listen on the node's IP, instead of a virtual service IP that will not be stable.
Now you just need to get your HTTP traffic to any one of your nodes. You'll probably want to define an external DNS entry for each Service, each pointing to the IPs of your nodes (i.e. multiple A/AAAA records). The ingress will disambiguate and route inside the cluster based on the HTTP hostname, using name-based virtual hosting.
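To illustrate the name-based virtual hosting part, here is a sketch that builds an Ingress manifest with one rule per hostname. It assumes the current networking.k8s.io/v1 API, which may differ from the API version in the linked guide; the hostnames and service names are placeholders.

```python
import json


def ingress_for_hosts(name, host_to_service, port):
    """Build an Ingress that routes each HTTP hostname to its own Service."""
    rules = []
    for host, service_name in sorted(host_to_service.items()):
        rules.append({
            "host": host,
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {
                    "service": {"name": service_name, "port": {"number": port}},
                },
            }]},
        })
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {"rules": rules},
    }


ingress = ingress_for_hosts(
    "hello-ingress",
    {"hello.example.com": "hello", "other.example.com": "other"},
    10001,
)
print(json.dumps(ingress, indent=2))
```

Both hostnames would resolve to the same node IPs; the ingress controller picks the backend from the HTTP Host header.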
If you need to expose non-HTTP services, this gets a bit more involved, but you can look in the nginx ingress docs for more examples (e.g. UDP).
We wanted pod names to be resolved to IPs in order to configure the seed nodes in an Akka cluster. We achieved this using a headless service and StatefulSets in Kubernetes. But how do I expose a headless service externally so that I can hit an endpoint from outside?
It is hard to expose a headless Kubernetes service to the outside, since this would require some complex TCP proxies. The reason is that a headless service is only a DNS record with an IP for each pod, and these IPs are only reachable from within the cluster.
One solution is to expose this via node ports, which means the ports are opened on the host itself. Unfortunately, this makes service discovery harder, because you don't know which host has a pod scheduled on it.
You can set up node ports via:
the services: https://kubernetes.io/docs/user-guide/services/#type-nodeport
or directly in the Pod by defining spec.containers[].ports[].hostPort
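For the second option, a pod manifest with hostPort set might look like the sketch below, written as Python that prints JSON. The pod name, image, and port number are placeholders.

```python
import json

PORT = 2552  # placeholder port the application listens on

# Pod whose container port is also opened on the node it is scheduled to.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "seed-0", "labels": {"app": "seed"}},
    "spec": {
        "containers": [{
            "name": "seed",
            "image": "example.com/seed-node:latest",  # placeholder image
            "ports": [{"containerPort": PORT, "hostPort": PORT}],
        }],
    },
}

print(json.dumps(pod, indent=2))
```

Only one pod per node can bind a given hostPort, so this limits how many replicas can be scheduled on each host.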
Another alternative is to use a LoadBalancer, if your cloud provider supports that. Unfortunately you cannot address each instance itself, since they share the same IP. This might not be suitable for your application.
I have a Kubernetes cluster (1.3.2) in GKE, and I'd like to connect to it from VMs and services in my Google project, which share the same network as the cluster.
Is there a way for a VM that's internal to the subnet but not internal to the cluster itself to connect to the service without hitting the external IP?
I know there's a ton of things you can do to unambiguously determine the IP and port of services, such as the ENVs and DNS...but the clusterIP is not reachable outside of the cluster (obviously).
Is there something I'm missing? An important part of this is that the service is meant to be "public" to the project, so I don't know which VMs in the project will want to connect to it (this could rule out loadBalancerSourceRanges). I understand that the endpoint the service actually wraps is an internal IP I can hit, but the only good way to get that IP is through the Kube API or kubectl, neither of which is a prod-ideal way of hitting my service.
Check out my more thorough answer here, but the most common solution to this is to create bastion routes in your GCP project.
In the simplest form, you can create a single GCE Route to direct all traffic w/ dest_ip in your cluster's service IP range to land on one of your GKE nodes. If that SPOF scares you, you can create several routes pointing to different nodes, and traffic will round-robin between them.
If that management overhead isn't something you want to do going forward, you could write a simple controller in your GKE cluster to watch the Nodes API endpoint, and make sure that you have a live bastion route to at least N nodes at any given time.
GCP internal load balancing was just released as alpha, so in the future, kube-proxy on GCP could be implemented using that, which would eliminate the need for bastion routes to handle internal services.