In a computer cluster, I have the IP address of one of the computing nodes. This computing node has a name in Slurm configuration. How can I find the name that Slurm uses for this computing node?
The node names in slurm.conf must correspond to the hostnames returned by the hostname -s command, and Slurm expects those names to resolve to the correct IPs.
So you should be able to run
getent hosts <IP>
to get something like
$ getent hosts 10.1.1.1
10.1.1.1 node001.cluster
In the above example, the node name as known by Slurm would be node001, which you can confirm with scontrol show node node001.
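If you need the name-to-IP mapping for every node at once, a quick sketch (assuming a standard Slurm setup) is to dump the node records and keep only the relevant fields:
$ scontrol show nodes | grep -E 'NodeName=|NodeAddr='
The NodeAddr and NodeHostName fields are what Slurm uses to reach each node, so matching NodeAddr against the IP you have gives you the corresponding NodeName.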
Related
I have tried changing the NIC IP of the worker node directly. It seems that the master node automatically updates the IP information of the worker node, and this does not have any negative impact on the Kubernetes cluster. Is this the simple and correct way to change the worker node IP, or are there other important steps that I have missed?
I created a mini cluster using kubeadm with two Ubuntu 18.04 VMs on one public network.
Indeed, changing the IP address of the worker node doesn't affect the cluster at all, as long as the new IP address doesn't interfere with --pod-network-cidr.
The kubelet is responsible for node registration, and it uses several options:
The kubelet is the primary "node agent" that runs on each node. It can register the node with the apiserver using one of: the hostname; a flag to override the hostname; or specific logic for a cloud provider.
For instance, if you decide to change the hostname of a worker node, it will become unreachable.
There are two ways to change the IP address properly:
Re-join the worker node (with its IP already changed) to the cluster.
Configure the kubelet to advertise a specific IP address.
The second option can be done as follows:
modify /etc/systemd/system/kubelet.service.d/10-kubeadm.conf by adding KUBELET_EXTRA_ARGS=--node-ip %NEW_IP_ADDRESS% (see the sketch after these steps)
run sudo systemctl daemon-reload, since the config file was changed
run sudo systemctl restart kubelet.service
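As a minimal sketch (the exact layout of the drop-in varies between kubeadm versions, and 10.0.0.42 is just a placeholder for the node's new address), the relevant part of the file could look like this:
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (excerpt)
# 10.0.0.42 is a placeholder; use the node's actual new address
Environment="KUBELET_EXTRA_ARGS=--node-ip=10.0.0.42"
After the restart, kubectl get nodes -o wide should show the new address in the INTERNAL-IP column.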
Useful links:
Specify internal ip for worker nodes (a bit dated in terms of how it's done; today it should be done as described above, but the idea is the same).
CLI tools - Kubelet
I have a single-master cluster with 3 worker nodes. The master node has one network interface of 10Gb capacity, and all worker nodes have two interfaces: a 10Gb and a 40Gb interface. They are all connected via a switch.
By default, Kubernetes binds to the default network interface, eth0, which is the 10Gb interface on the worker nodes. How do I specify the 40Gb interface when joining?
The kubeadm init command has an --apiserver-advertise-address argument, but this is for the apiserver. Is there any equivalent option for the worker nodes, so that the communication between master and workers (and between workers) is carried over the 40Gb link?
Please note that this is a bare-metal on-prem installation with OSS Kubernetes v1.20.
You can use the --hostname-override flag to override the default kubelet behavior. By default, the kubelet's node name equals the hostname, and its IP address defaults to the address of the interface that carries the default route.
For more details please visit this issue.
There is nothing specific; you would have to manage this at the routing level. If you're using BGP internally, it would usually do this automatically because the faster link will have a higher metric, but if you're using a simpler static routing setup, then you may need to tweak things.
Pods live on internal virtual adapters so they don't listen on any physical interface (for all CNIs I know of anyway, except the AWS one).
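As a quick check (a sketch; 10.0.0.12 stands in for another worker's address), you can ask the kernel which interface and source address it would pick to reach a peer node, which tells you whether the 10Gb or the 40Gb link carries that traffic:
$ ip route get 10.0.0.12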
I would like to know how the internal IP of a node in a Kubernetes cluster is set. In my environment it takes the node's network IP as the internal IP. What happens if that IP address changes because the same node is reconnected from a different network? I also want to know how to set the internal IP of a node to 127.0.0.1, so that the cluster can run on a standalone server and be shipped as an OVA image. Does my question make sense? I would like to know your opinions on this.
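For reference (just an illustration; <node-name> is a placeholder), the address Kubernetes currently records as a node's internal IP can be read straight from the node object:
$ kubectl get node <node-name> -o jsonpath='{.status.addresses}'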
I have deployed a Kubernetes cluster to GCP. For this cluster, I added some deployments. Those deployments use external resources that are protected by a security policy which rejects connections from unallowed IP addresses.
So, in order for a pod to connect to the external resource, I need to manually allow the IP address of the node hosting the pod.
It is also possible for me to allow a range of IP addresses in which my nodes are expected to be running.
Until now, I have only found their internal IP address range. It looks like this:
Pod address range 10.16.0.0/14
The question is: how do I find the range of external IP addresses for my nodes?
Let's begin with the IPs that are assigned to Nodes:
When we create a Kubernetes cluster, GCP creates Compute Engine machines in the background, each with a specific internal and external IP address.
In your case, just go to the Compute Engine section of the Google Cloud Console, capture all the external IPs of the VMs whose names start with gke-, and whitelist them.
Talking about the range: in GCP only the internal IP ranges are known in advance; external IP addresses are randomly assigned from a pool, so you need to whitelist them one at a time.
To get the pod descriptions and IPs, run kubectl describe pods.
If you go to the Compute Engine instance page, it shows the instances which make up the cluster, with the external IPs on the right side. For the IPs of the actual pods, use the kubectl command.
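If you prefer the command line, here is a sketch of pulling the external IPs of the cluster VMs with gcloud (the gke- filter matches GKE's default instance naming; adjust it if your node pools are named differently):
$ gcloud compute instances list --filter="name~^gke-" \
    --format="table(name,networkInterfaces[0].accessConfigs[0].natIP)"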
I have a cluster with container range 10.101.64.0/19 on a network A and subnet SA with range 10.101.0.0/18. On the same subnet, there is a VM in GCE with IP 10.101.0.4, and it can be pinged just fine from within the cluster, e.g. from a node with 10.101.0.3. However, if I go to a pod on this node, which got the address 10.101.67.191 (which is expected; this node assigns addresses from 10.101.67.0/24 or so), I don't get a meaningful answer from the VM I want to access from this pod. Using tcpdump on ICMP, I can see that when I ping that VM from the pod, the ping gets there, but I don't receive the reply in the pod. It seems like the VM is just throwing it away.
Any idea how to resolve this? Some routes or firewall rules? I am using the same topology in the default subnet created by Kubernetes, where this works, but I cannot find anything relevant that could explain the difference (there are some routes and firewall rules which could influence it, but I wasn't successful when trying to mimic them in my subnet).
I think it is a firewall issue.
I've already provided the solution here on Stack Overflow.
It may help to solve your case.
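As a sketch of the kind of firewall rule that usually fixes this (the rule name allow-pods-to-vms is made up, the network name and the source range come from the question; scope the protocols down to what you actually need):
$ gcloud compute firewall-rules create allow-pods-to-vms \
    --network=A --direction=INGRESS --action=ALLOW \
    --rules=all --source-ranges=10.101.64.0/19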