Is it possible to configure a pod to prioritize using `hostNetwork` but still reference internal service endpoints? - kubernetes

I have a statefulset that I need to run using the host network, purely for performance reasons. But I also want to be able to reference service-name endpoints. Is it possible to do this? ClusterFirstWithHostNet does not work because it doesn't prioritize using the host's network. The dnsConfig configuration might be promising, but I don't know how I would configure it to do what I'm asking about.

This is a community wiki answer. Feel free to expand it.
It might be possible if the app can select a random port to listen on at startup and switch to another one if the port is busy. However, Kubernetes is not involved in selecting the port for the application.
A StatefulSet requires a headless service, so the service doesn't have its own IP and instead resolves to a set of DNS records in CoreDNS. The A records would probably contain the same IP for replicas running on the same node, but the SRV records may actually provide a proper endpoint (host and port).
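For illustration, here is a minimal sketch of that setup, assuming a StatefulSet named db with a headless service db-hs and a container port named peer (all names and ports are hypothetical). With hostNetwork: true and dnsPolicy: ClusterFirstWithHostNet the pods use the node's network stack but can still resolve cluster DNS names such as db-0.db-hs.default.svc.cluster.local, and the headless service publishes SRV records (_peer._tcp.db-hs.default.svc.cluster.local) that carry the port as well:

apiVersion: v1
kind: Service
metadata:
  name: db-hs
spec:
  clusterIP: None          # headless: no cluster IP, only DNS records
  selector:
    app: db
  ports:
  - name: peer
    port: 7000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-hs
  replicas: 3              # with hostNetwork, at most one replica per node can bind port 7000
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS resolution on the host network
      containers:
      - name: db
        image: my-db:latest                # hypothetical image
        ports:
        - name: peer
          containerPort: 7000

Clients that need the port can then look up the SRV record instead of assuming a fixed port.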
For further reference, please take a look at the below sources:
How do I get individual pod hostnames in a Deployment registered and looked up in Kubernetes?
SRV records

Related

Change Kubernetes Instance Template to open HTTPS port

I was using NodePort to host a webapp on Google Container Engine (GKE). It allows you to point your domains directly to the node IP address, instead of an expensive Google load balancer. Unfortunately, instances are created with HTTP ports blocked by default, and an update locked down manually changing the nodes, as they are now created using an Instance Group and an immutable Instance Template.
I need to open port 443 on my nodes, how do I do that with Kubernetes or GCE? Preferably in an update resistant way.
Related github question: https://github.com/nginxinc/kubernetes-ingress/issues/502
Using port 443 on your Kubernetes nodes is not a standard practice. If you look at the docs you can see the kube-apiserver option --service-node-port-range, which defaults to 30000-32767. You could change it to 443-32767 or something similar. Note that binding to any port below 1024 requires root privileges.
In summary, it's not a good idea/practice to run your Kubernetes services on port 443. A more typical scenario would be an external nginx/haproxy proxy that sends traffic to the NodePorts of your service. The other option you mentioned is using a cloud load balancer but you'd like to avoid that due to costs.
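As a rough sketch of that more typical scenario (the service name, labels and ports are assumptions, not from the question): a NodePort Service in the default 30000-32767 range that an external nginx/haproxy box forwards port 443 to. A nodePort of 443 itself would only be accepted if the apiserver were started with --service-node-port-range extended to include it.

apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  type: NodePort
  selector:
    app: webapp          # assumed pod label
  ports:
  - name: https
    port: 443            # cluster-internal service port
    targetPort: 8443     # assumed container port
    nodePort: 30443      # within the default 30000-32767 range; 443 would need the extended range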
Update: A DaemonSet with a NodePort can handle the port opening for you. nginx/k8s-ingress has a NodePort on 443 which gets exposed by a custom firewall rule. The GCE UI will not show "Allow HTTPS traffic" as checked, because it's not using the default rule.
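For reference, a minimal sketch of the DaemonSet-with-host-port idea (names and image are placeholders, not the actual nginx/k8s-ingress manifest): one proxy pod per node binds port 443 directly on the node, and a custom GCE firewall rule allowing tcp:443 to the nodes is still required.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: edge-proxy                 # hypothetical name
spec:
  selector:
    matchLabels:
      app: edge-proxy
  template:
    metadata:
      labels:
        app: edge-proxy
    spec:
      containers:
      - name: proxy
        image: nginx:stable        # stand-in for an ingress controller image
        ports:
        - containerPort: 443
          hostPort: 443            # binds port 443 on every node's IP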
You can do everything you do in the GUI Google Cloud Console using the Cloud SDK, most easily through the Google Cloud Shell. Here is the command for adding a network tag to a running instance; this works even though the GUI has disabled the ability to do so.
gcloud compute instances add-tags gke-clusty-pool-0-7696af58-52nf --zone=us-central1-b --tags https-server,http-server
This also works on the beta, meaning it should continue to work for a bit.
See https://cloud.google.com/sdk/docs/scripting-gcloud for examples on how to automate this. Perhaps consider running on a webhook when downtime is detected. Obviously none of this is ideal.
Alternatively, you can change the templates themselves. With this method you can also add a startup script to new nodes, which allows you to do things like fire a webhook with the new IP address for round-robin, low-downtime dynamic DNS.
Source (he had the opposite problem, his problem is our solution): https://stackoverflow.com/a/51866195/370238
If I understand correctly: if nodes can be destroyed and recreated at any time, how are you going to rest assured that a service behind a given port is reliably available in production without some sort of load balancer that takes care of diverting traffic to the new node(s)?

Expose each pod in a statefulset to the internet without a custom proxy

I have a StatefulSet with pods server-0, server-1, etc. I want to expose them directly to the internet with URLs like server-0.mydomain.com or like mydomain.com/server-0.
I want to be able to scale the StatefulSet and automatically be able to access the new pods from the internet. For example, if I scale it up to include a server-2, I want mydomain.com/server-2 to route requests to the new pod when it's ready. I don't want to have to also scale some other resource or create another Service to achieve that effect.
I could achieve this with a custom proxy service that just checks the request path and forwards to the correct pod internally, but this seems error-prone and wasteful.
Is there a way to cause an Ingress to automatically route to different pods within a StatefulSet, or some other built-in technique that would avoid custom code?
I don't think you can do it. Being part of the same StatefulSet, all pods up to pod-x are targeted by a single service. As you can't define which pod is going to get a request, you can't force "pod-1.yourapp.com" or "yourapp.com/pod-1" to be sent to pod-1. It will be sent to the service, and the service might send it to pod-4.
Even if you could, you would need to dynamically update your ingress rules, which can easily cause minutes of downtime.
With a custom proxy, I see it as impossible too. Note that it would basically need to replace the service in front of the pods: if your ingress controller knows that it needs to deliver a packet to a service, you now have to force it to deliver to your proxy instead. But how?
A Kubernetes service is a set of iptables (or IPVS) rules that will redirect a packet with the ServiceIP as a destination address to ONE OF THE PODS that have the same label.
from Kubernetes Services documentation
The service installs iptables rules which select a backend Pod. By default, the choice of backend is random.
This refers to the fact that a service is not able to distinguish between different pods in the same set.
Forcing the selection of a specific Pod out of the set, whether by changing the iptables rules (fairly simple) or by adding any type of proxy, is problematic:
Let's say you have pod-1 and pod-2 (1.1.1.1 and 1.1.1.2 respectively), and you configure iptables rules to DNAT requests with destination pod-1.myserver.com to 1.1.1.1, and the same for pod-2. (You may ask why the IP, and it's simply because it's the only way to distinguish between these pods.)
This approach will fail whenever a pod restarts. Let's say pod-1 failed: Kubernetes won't recreate the same pod with the same IP; it will come back with a different IP, and the cluster updates its own iptables accordingly. As a result, all the packets going toward 1.1.1.1 will be dropped until you update the proxy or the iptables rules again.
In fact, that's one of the reasons why we use a service to access pods instead of accessing them directly: the Pod IP can change, but the service IP won't.
However, since this very specific part of Kubernetes has been my work for the last 4 months, I have developed a Python script to edit the iptables rules and choose a specific pod. My conclusion from that work is that it's costly and time-consuming, and it forces the server to go offline for a couple of seconds whenever the pods change. You can take a look at the code; it definitely works, but it's not recommended.
This is a Kubernetes problem, and the solution is changing the source code of kube-proxy, which is my current work.
I suggest you read my answer explaining how kubernetes services exactly work in this question: Which service is doing load balancing between kubernetes nodes?

How can I do port discovery with Kubernetes service discovery?

I have an HPC cluster application where I am looking to replace MPI and our internal cluster management software with a combination of Kubernetes and some middleware, most likely ZMQ or RabbitMQ.
I'm trying to design how best to do peer discovery on this system using Kubernetes' service discovery.
I know Kubernetes can provide a DNS name for a given service, and that's great, but is there a way to also dynamically discover ports?
For example, assuming I replaced the MPI middleware with ZeroMQ, I would need a way for ranks (processes on the cluster) to find each other. I know I could simply have the ranks issue service creation messages to the Kubernetes discovery mechanism and get a hostname like myapp_mypid_rank_42 fairly easily, but how would I handle the port?
If possible, it would be great if I could just do:
zmqSocket.connect("tcp://myapp_mypid_rank_42");
but I don't think that would work since I have no port number information from DNS.
How can I have Kubernetes service discovery also provide a port in as simple a manner as possible to allow ranks in the cluster to discover each other?
Note: The registering process knows its port and can register it with the K8s service discovery daemon. The problem is having a quick and easy way to get that port number back for the processes that want it. The question I'm asking is whether there is a mechanism as simple as a DNS host name, or whether I will need to explicitly query both hostname and port number from the k8s daemon, rather than simply building a hostname based on some agreed-upon rule (like building a string from myapp_mypid_myrank).
Turns out the best way to do this is with a DNS SRV record:
https://kubernetes.io/docs/concepts/services-networking/service/#discovering-services
https://en.wikipedia.org/wiki/SRV_record
A DNS SRV record provides both a hostname/IP and a port for a given request.
Luckily, Kubernetes service discovery supports SRV records and provides them on the cluster's DNS.
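As an illustration (the service, namespace and port name are hypothetical), a headless Service with a named port is enough for the cluster DNS to serve SRV records of the form _zmq._tcp.ranks.hpc.svc.cluster.local, whose answers contain both the target hostname and the port number:

apiVersion: v1
kind: Service
metadata:
  name: ranks
  namespace: hpc
spec:
  clusterIP: None        # headless: one SRV/A answer per backing pod
  selector:
    app: rank
  ports:
  - name: zmq            # the port name becomes part of the SRV record name
    protocol: TCP
    port: 5555

A rank can then resolve that SRV name at startup and connect to the returned host:port instead of hard-coding a port.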
I think in the most usual case you should already know the port number used to access your services.
But if it is useful, Kubernetes adds some environment variables to every pod to ease autodiscovery of all services, for example {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT. Docs here.
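For example (the service name backend is hypothetical, and the Service is assumed to exist in the pod's namespace before the pod is created), a container can read the injected variables directly from its environment:

apiVersion: v1
kind: Pod
metadata:
  name: env-discovery-demo
spec:
  containers:
  - name: demo
    image: busybox:1.36
    # BACKEND_SERVICE_HOST / BACKEND_SERVICE_PORT are injected automatically for a
    # Service named "backend" that already exists when this pod starts
    command: ["sh", "-c", "echo connecting to $BACKEND_SERVICE_HOST:$BACKEND_SERVICE_PORT && sleep 3600"]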

How to access Kubernetes pod in local cluster?

I have set up an experimental local Kubernetes cluster with one master and three slave nodes. I have created a deployment for a custom service that listens on port 10001. The goal is to access an exemplary endpoint /hello with a stable IP/hostname, e.g. http://<master>:10001/hello.
After deploying the deployment, the pods are created fine and are accessible through their cluster IPs.
I understand the solution for cloud providers is to create a load balancer service for the deployment, so that you can just expose a service. However, this is apparently not supported for a local cluster. Setting up Ingress seems overkill for this purpose. Is it not?
It seems more like kubectl proxy is the way to go. However, when I run kubectl proxy --port <port> on the master node, I can access http://<master>:<port>/api/..., but not the actual pod.
There are many related questions (e.g. How to access services through kubernetes cluster ip?), but no (accepted) answers. The Kubernetes documentation on the topic is rather sparse as well, so I am not even sure about what is the right approach conceptually.
I am hence looking for a straight-forward solution and/or a good tutorial. It seems to be a very typical use case that lacks a clear path though.
If an Ingress Controller is overkill for your scenario, you may want to try using a service of type NodePort. You can specify the port, or let the system auto-assign one for you.
A NodePort service exposes your service at the same port on all Nodes in your cluster. If you have network access to your Nodes, you can access your service at the node IP and port specified in the configuration.
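A minimal sketch for the deployment in the question (the label app: hello and the service name are assumptions about how the deployment's pods are labelled):

apiVersion: v1
kind: Service
metadata:
  name: hello-svc
spec:
  type: NodePort
  selector:
    app: hello           # must match the deployment's pod labels
  ports:
  - port: 10001          # cluster-internal service port
    targetPort: 10001    # container port from the question
    nodePort: 30001      # optional; omit it to let Kubernetes pick one from 30000-32767

After applying it, http://<any-node-ip>:30001/hello should reach one of the pods.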
Obviously, this does not load balance between nodes. You can add an external service to help you do this if you want to emulate what a real load balancer would do. One simple option is to run something like rocky-cli.
An Ingress is probably your simplest bet.
You can schedule the creation of an Nginx IngressController quite simply; here's a guide for that. Note that this setup uses a DaemonSet, so there is an IngressController on each node. It also uses the hostPort config option, so the IngressController will listen on the node's IP, instead of a virtual service IP that will not be stable.
Now you just need to get your HTTP traffic to any one of your nodes. You'll probably want to define an external DNS entry for each Service, each pointing to the IPs of your nodes (i.e. multiple A/AAAA records). The ingress will disambiguate and route inside the cluster based on the HTTP hostname, using name-based virtual hosting.
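A hedged sketch of such name-based routing, using the current networking.k8s.io/v1 Ingress API (hostnames, service names and ports are hypothetical):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: apps
spec:
  rules:
  - host: hello.example.com        # external DNS A/AAAA records point at the node IPs
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello-svc
            port:
              number: 10001
  - host: other.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: other-svc
            port:
              number: 80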
If you need to expose non-HTTP services, this gets a bit more involved, but you can look in the nginx ingress docs for more examples (e.g. UDP).

kube2sky in kubernetes with multiple api servers

In a Kubernetes cluster where everything is highly available, the DNS is a key piece of the system; everything relies on the DNS.
The pod kube2sky has a parameter "-kube_master_url" where, afaik, you can only specify one api server node.
You might have multiple api servers for redundancy behind a service, but if the one that kube2sky is using goes down, the whole DNS system goes down too; hence, the high availability of the cluster is gone.
For other pods, you can use the internal DNS name of the api server service, but in this case, you can't since this is the actual DNS service.
Any idea how to solve this issue?
In its standard configuration, kube2sky doesn't actually rely on having a single apiserver IP address to use. Instead, it uses the virtual IP of the kubernetes service that gets auto-created in every cluster, and which the kube-proxy sets up iptables rules for. It's briefly explained in the docs on github.
Also, it's recommended that replicated masters are put behind a load balancer in such high-availability configurations to avoid problems like this with client tools.