How to add a DNS entry on a remote machine based on the assigned hostname and worker node IP in Kubernetes

I have a Kubernetes cluster with two worker nodes. Each worker node runs one pod. In the Helm chart I have configured the hostnames of those pods to be pod-0.test.com and pod-1.test.com. I have pointed CoreDNS to forward any DNS requests that match the ".com" domain to a remote machine running unbound, which takes care of the actual DNS resolution.
.com:53 {
    errors
    cache 30
    forward . <remote machine IP>
}
Let's say worker-0's node IP is 10.x.y.z, worker-1's node IP is 10.a.b.c, and pod-0.test.com sits on worker-0 while pod-1.test.com sits on worker-1. I have DNS entries configured in unbound on the remote machine which resolve as below:
pod-0.test.com -> 10.x.y.z
pod-1.test.com -> 10.a.b.c
When I uninstall the pods and reinstall them, there is a chance that pod-0.test.com will end up on worker-1 and pod-1.test.com on worker-0. If the pods get swapped between worker nodes, I have to change the unbound configuration again and restart unbound. I only find out which pod sits on which worker node after the pod is installed, but the proper DNS entries need to be configured on the remote machine before that; otherwise the pods restart when a pod hostname resolves to the wrong IP.
So what I am looking for is a way to overcome this by automating the DNS entries so that they always match the worker node IP where the pod lands. Is there any way to achieve this? Or is there a possibility that the pod or CoreDNS itself could add the proper DNS entry on the remote machine (the one configured in the forward directive of CoreDNS) before the pod comes up, like a pre-install step? The pod-hostname-to-worker-node-IP resolution needs to work correctly both on the remote machine and inside the pods.
It would be really helpful if someone has an approach to handle this issue. Thanks in advance!

There is a workaround for this issue: you can use node selectors to deploy your pods on the same nodes every time. If you don't want to do it that way and you are deploying this via a pipeline, you can add a few steps to the pipeline to make the entries; the flow goes as below.
Trigger CI/CD pipeline → pod gets deployed → execute a kubectl command to get the pods on each node → SSH into the remote machine (with sudo privileges if required) and change the required config files.
Use the command below to get details of the pods running on a particular node; a scripted sketch of the whole step follows it.
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node>
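As a rough, illustrative sketch of that pipeline step (the namespace, SSH target, zone-file path, unbound service name, and the assumption that pod names match the host part of the DNS names are all placeholders for your environment):
#!/usr/bin/env bash
# Illustrative only: regenerate unbound A records from the current pod -> node mapping.
# Assumes kubectl access, passwordless SSH/sudo on the unbound host, and that the
# zone file below is included by the main unbound.conf.
set -euo pipefail

UNBOUND_HOST="admin@remote-machine"                     # placeholder SSH target
ZONE_FILE="/etc/unbound/unbound.conf.d/test-com.conf"   # placeholder include file

# Emit one 'local-data' line per pod: "<pod-name>.test.com. A <IP of the node the pod runs on>"
RECORDS=$(kubectl get pods -n my-namespace -o \
  jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.hostIP}{"\n"}{end}' \
  | awk '{printf "    local-data: \"%s.test.com. A %s\"\n", $1, $2}')

# Push the generated records to the remote machine and reload unbound.
ssh "$UNBOUND_HOST" "sudo tee $ZONE_FILE > /dev/null && sudo systemctl restart unbound" <<EOF
server:
$RECORDS
EOF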

Have you tried using node affinity? You can always schedule a given pod onto the same node using node labels. You can simply use the kubernetes.io/hostname label key to select the node, as below:
First Pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-0
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker1
  hostname: pod-0.test.com
  containers:
  ...
  ...
...
Second Pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-1
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - worker2
  hostname: pod-1.test.com
  containers:
  ...
  ...

Related

Run Kubernetes-embedded pods only on a specific node

I have a Kubernetes cluster running 3 nodes, but I want to run my app on only two of them. So I want to ask: can I run the other pods (Kubernetes extensions) in the cluster on a single node only?
node = Only Kubernetes pods
node = my app
node = my app
Yes, you can run the application pods on only two nodes and the other Kubernetes extension pods on a single node.
By Kubernetes extension pods, consider external third-party pods such as the Nginx ingress controller, not default system pods like kube-proxy, kubelet, etc., which are required to run on each available node.
Option 1
You can use node affinity to schedule pods on specific nodes.
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - node-1
            - node-2
  containers:
  - name: with-node-affinity
    image: nginx
Option 2
You can use taints & tolerations to control which pods may be scheduled on specific nodes.
Certain kube-system pods like kube-proxy, the CNI pods (Cilium/Flannel) and other DaemonSets must run on each worker node; you cannot stop them. If that is not the case for your pods, a node can be tainted with NoSchedule using the command below.
kubectl taint nodes <node-name> type=<a_node_label>:NoSchedule
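Pods that should still be allowed onto a tainted node then need a matching toleration in their spec; a minimal sketch reusing the key and placeholder value from the command above:
spec:
  tolerations:
  - key: "type"
    operator: "Equal"
    value: "<a_node_label>"
    effect: "NoSchedule"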
For further enhancement you can explore https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/

2-Node Cluster, Master goes down, Worker fails

We have a 2-node K3S cluster with one master and one worker node and would like "reasonable availability", in that if one or the other node goes down the cluster still works, i.e. ingress reaches the services and pods which we have replicated across both nodes. We have an external load balancer (F5) which does active health checks on each node and only sends traffic to nodes that are up.
Unfortunately, if the master goes down the worker will not serve any traffic (ingress).
This is strange because all the service pods (which ingress feeds) on the worker node are running.
We suspect the reason is that key services such as the traefik ingress controller and coredns are only running on the master.
Indeed when we simulated a master failure, restoring it from a backup, none of the pods on the worker could do any DNS resolution. Only a reboot of the worker solved this.
We've tried to increase the number of replicas of the traefik and coredns deployment which helps a bit BUT:
This gets lost on the next reboot
The worker still functions when the master is down but every 2nd ingress request fails
It seems the worker still blindly (round-robin) sends traffic to a non-existent master
We would appreciate some advice and explanation:
Should not key services such as traefik and coredns be DaemonSets by default?
How can we change the service description (e.g. replica count) in a persistent way that does not get lost?
How can we get intelligent traffic routing with ingress to only "up" nodes?
Would it make sense to make this a 2-master cluster?
UPDATE: Ingress Description:
kubectl describe ingress -n msa
Name:             msa-ingress
Namespace:        msa
Address:          10.3.229.111,10.3.229.112
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
  tls-secret terminates service.ourdomain.com,node1.ourdomain.com,node2.ourdomain.com
Rules:
  Host                   Path  Backends
  ----                   ----  --------
  service.ourdomain.com
                         /     gateway:8443 (10.42.0.100:8443,10.42.1.115:8443)
  node1.ourdomain.com
                         /     gateway:8443 (10.42.0.100:8443,10.42.1.115:8443)
  node2.ourdomain.com
                         /     gateway:8443 (10.42.0.100:8443,10.42.1.115:8443)
Annotations:      kubernetes.io/ingress.class: traefik
                  traefik.ingress.kubernetes.io/router.middlewares: msa-middleware#kubernetescrd
Events:           <none>
Your goals seem achievable with a few K8s internal features (not specific to Traefik):
Ensure you have 1 replica of the ingress controller's Pod on each node => use a DaemonSet as the installation method (see the sketch after the Service example below).
To fix the error shown in the Ingress description, set the correct load balancer IP on the ingress controller's Service.
Set externalTrafficPolicy to "Local" - this ensures that traffic is routed to local endpoints only (controller Pods running on the node that accepts traffic from the load balancer).
externalTrafficPolicy - denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints. There are two available options: Cluster (default) and Local. Cluster obscures the client source IP and may cause a second hop to another node, but should have good overall load-spreading. Local preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type Services, but risks potentially imbalanced traffic spreading.
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example
  ports:
  - port: 8765
    targetPort: 9376
  externalTrafficPolicy: Local
  type: LoadBalancer
The Service used as the Ingress backend should set externalTrafficPolicy: Local too.
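For the DaemonSet point and the "survives a reboot" requirement, one K3s-specific option is a HelmChartConfig override, which is persisted as a manifest rather than an ad-hoc edit. This is a sketch under assumptions: that the bundled Traefik is managed by K3s's Helm controller and that deployment.kind and service.spec are exposed by your Traefik chart version.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    # Run one Traefik pod per node instead of a replicated Deployment.
    deployment:
      kind: DaemonSet
    # Chart-version dependent: many Traefik chart versions merge this into the Service spec.
    service:
      spec:
        externalTrafficPolicy: Local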
Running a single master node or two master nodes in a k8s cluster is not recommended, as it does not tolerate failure of master components. Consider running 3 masters in your Kubernetes cluster.
The following link would be helpful:
https://netapp-trident.readthedocs.io/en/stable-v19.01/dag/kubernetes/kubernetes_cluster_architecture_considerations.html

Kubernetes PodAffinity not able to deploy pods

So I have this problem and am trying to use podAffinity to solve it.
I have 3 nodes and want to deploy 2 pods on the same node. In the Deployment YAML files I have service: git under metadata.labels, and the following is the affinity setting:
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: service
          operator: In
          values:
          - git
      topologyKey: kubernetes.io/hostname
But the pods failed to deploy, and I got the following error:
0/3 nodes are available: 3 node(s) didn't match pod affinity rules, 3 node(s) didn't match pod affinity/anti-affinity.
Are there any problems with my configuration?
If not, I guess it may be because when the first pod is deployed, the system tries to find a node that contains a pod with the label service: git and fails (because it is the first one), and the other pod also fails for the same reason. Is this correct?
But then how to solve the problem (without resorting to workarounds)?
You are using requiredDuringSchedulingIgnoredDuringExecution, so the scheduler looks for an already-running pod with the label service: git, and it seems you do not yet have any pod with that label. The following is a quick workaround: create a test pod with the label service: git, so that the podAffinity rule can find a destination node (the node where this test pod is running).
kubectl run testpod --image=busybox --labels="service=git" -- sleep infinite
Once the above pod is up, all the pods in your deployment should also get created.
If not, delete the deployment and re-apply it.
If you need a more elegant solution, consider using preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution, for example:
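A minimal sketch of the preferred variant, reusing the service: git label and topologyKey from the question (the weight is an arbitrary example value):
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: service
            operator: In
            values:
            - git
        topologyKey: kubernetes.io/hostname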
Update Sept 2022:
The requiredDuringSchedulingIgnoredDuringExecution rule is effectively ignored when it is the first pod of the deployment, and that pod does in fact get scheduled - otherwise, if no pods existed yet, not even the first one could ever be deployed. The second pod then sees the first pod running, satisfies the rule, and so on. This has been confirmed by testing.

Kubernetes local cluster Pod hostPort - application not accessible

I am trying to access a web API deployed into my local Kubernetes cluster running on my laptop (Docker -> Settings -> Enable Kubernetes). Below is my Pod spec YAML:
kind: Pod
apiVersion: v1
metadata:
  name: test-api
  labels:
    app: test-api
spec:
  containers:
  - name: testapicontainer
    image: myprivaterepo/testapi:latest
    ports:
    - name: web
      hostPort: 55555
      containerPort: 80
      protocol: TCP
kubectl get pods shows the test-api pod running. However, when I try to connect to it using http://localhost:55555/testapi/index from my laptop, I do not get a response. But I can access the application from a container in a different pod within the cluster (I did a kubectl exec -it into a different container) using the URL http://<test-api pod cluster IP>/testapi/index. Why can't I access the application using the localhost:hostPort URL?
I'd say that this is strongly not recommended.
According to k8s docs: https://kubernetes.io/docs/concepts/configuration/overview/#services
Don't specify a hostPort for a Pod unless it is absolutely necessary. When you bind a Pod to a hostPort, it limits the number of places the Pod can be scheduled, because each <hostIP, hostPort, protocol> combination must be unique. If you don't specify the hostIP and protocol explicitly, Kubernetes will use 0.0.0.0 as the default hostIP and TCP as the default protocol.
If you only need access to the port for debugging purposes, you can use the apiserver proxy or kubectl port-forward.
If you explicitly need to expose a Pod's port on the node, consider using a NodePort Service before resorting to hostPort.
So... is the hostPort really necessary in your case, or would a NodePort Service solve it?
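A minimal NodePort sketch matching the test-api Pod above (the Service name and nodePort value are just examples; nodePort must sit in the cluster's NodePort range, 30000-32767 by default):
apiVersion: v1
kind: Service
metadata:
  name: test-api          # example name
spec:
  type: NodePort
  selector:
    app: test-api         # matches the Pod label from the question
  ports:
  - name: web
    port: 80              # in-cluster Service port
    targetPort: 80        # the Pod's containerPort
    nodePort: 30555       # example value in the default NodePort range
With Docker Desktop's Kubernetes the app should then typically be reachable at http://localhost:30555/testapi/index.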
If it is really necessary, then you could try using the IP returned by the command:
kubectl get nodes -o wide
http://ip-from-the-command:55555/testapi/index
Also, another test that may help your troubleshooting is checking whether your app is accessible on the Pod IP.
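For purely local debugging, kubectl port-forward (mentioned in the quoted docs) sidesteps hostPort entirely; for example, with the pod name from the question and an arbitrary local port:
kubectl port-forward pod/test-api 8080:80
curl http://localhost:8080/testapi/index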
UPDATE
I've done some tests locally and now understand better what the documentation is trying to explain. Let me go through my test:
First I created a Pod with hostPort: 55555; I did that with a simple nginx.
Then I listed my Pods and saw that this one was running on one specific Node.
Afterwards I tried to access the Pod on port 55555 through my master node IP and another node's IP without success, but when I accessed it through the IP of the Node where the Pod was actually running, it worked.
So the "issue" (and actually that's why this approach is not recommended) is that the Pod is accessible only through that specific Node's IP. If it restarts and starts on a different Node, the IP will also change.

Kubernetes: Dynamically identify node and taint

I have an application pod which will be deployed on a k8s cluster.
But the Kubernetes scheduler decides on which node this pod runs.
Now I want to dynamically add a NoSchedule taint to the node where my application pod is running, so that no new pods will be scheduled on that node.
I know that I can use kubectl taint node with NoSchedule if I know the node name, but I want to achieve this dynamically based on which node this application pod is running on.
The reason I want to do this is that this is a critical application pod which shouldn't have downtime, and for good reasons I have only 1 pod for this application across the cluster.
Please suggest an approach.
In addition to @Rico's answer:
You can use a feature called node affinity. This is still beta, but some functionality is already implemented.
You should add a label to your node, for example test-node-affinity: test. Once this is done, you can add the nodeAffinity field under affinity in the PodSpec.
spec:
  ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test-node-affinity
            operator: In
            values:
            - test
This means the pod will look for a node with the key test-node-affinity and the value test and will be deployed there.
I recommend reading this blog Taints and tolerations, pod and node affinities demystified by Toader Sebastian.
Also familiarise yourself with Taints and Tolerations from Kubernetes docs.
You can get the node where your pod is running with something like this:
$ kubectl get pod myapp-pod -o=jsonpath='{.spec.nodeName}'
Then you can taint it:
$ kubectl taint nodes <node-name-from-above> key=value:NoSchedule
or the whole thing in one command:
$ kubectl taint nodes $(kubectl get pod myapp-pod -o=jsonpath='{.spec.nodeName}') key=value:NoSchedule
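Note that a NoSchedule taint does not evict the pod that is already running, but if that critical pod is ever recreated it will need a matching toleration to be scheduled onto the tainted node again; a minimal sketch using the same key/value as the commands above:
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"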