I'm setting up a 2-node Kubernetes system, following the Docker Multi-Node instructions.
My problem is that kubectl get nodes only shows the master, not the worker node as well.
The setup appears to have worked, with all the expected containers running (as far as I can tell).
I've confirmed that networking works via flannel.
The subnet of the worker node appears in the master's subnet list.
So everything looks good, except the node isn't showing up.
My questions:
Am I right in thinking the worker node should now be visible from 'get nodes'?
Does it matter whether the MASTER_IP used for the setup was the master node's public IP address or the Docker IP? (I've tried both.)
Where do I start with debugging this?
Any pointers gratefully accepted...
Versions:
Ubuntu Trusty 14.04 LTS on both master and worker
Kubernetes v1.1.4
hyperkube:v1.0.3
Answering my own #cloudplatform question...
It turned out to be a problem in worker.sh in Kubernetes v1.1.4.
The kubelet is called with "--hostname-override=$(hostname -i)"
On this machine, that returns the IPv6 address.
The K8s code is trying to turn that into a DNS name, and fails.
So looking at the log file for the kubelet container, we see this:
I0122 15:57:33.891577 1786 kubelet.go:1942] Recording NodeReady event message for node 2001:41c9:1:41f::131
I0122 15:57:33.891599 1786 kubelet.go:790] Attempting to register node 2001:41c9:1:41f::131
I0122 15:57:33.894076 1786 kubelet.go:793] Unable to register 2001:41c9:1:41f::131 with the apiserver: Node "2001:41c9:1:41f::131" is invalid: [metadata.name: invalid value '2001:41c9:1:41f::131': must be a DNS subdomain (at most 253 characters, matching regex [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*): e.g. "example.com", metadata.labels: invalid value '2001:41c9:1:41f::131': must have at most 63 characters, matching regex (([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?: e.g. "MyValue" or ""]
So that's my problem. Take that out and it all works well.
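For anyone hitting the same thing, here is a minimal sketch of the change I mean (the exact line in worker.sh may differ between releases): either drop the flag, or pin it to the host's first IPv4 address instead of whatever hostname -i returns.
# worker.sh as shipped passes --hostname-override=$(hostname -i), which returned IPv6 on this machine.
# Sketch: pick the host's first IPv4 address instead (hostname -I lists all addresses).
HOST_IP=$(hostname -I | awk '{print $1}')
# ...then start the kubelet with --hostname-override=${HOST_IP}, or simply drop the flag.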
So in answer to my 3 questions:
Yes, the worker node should be visible immediately in 'get nodes'.
I don't think it matters for getting it to work; it may matter for security reasons.
First step after checking that the basic networking is right and the containers are running: look at the log file for the new node's kubelet container.
Update: I wrote this blog post to explain how I got it working: http://blog.willmer.org/2016/11/kubernetes-bytemark/
Related
I created a k8s cluster installed with k0s on an AWS EC2 instance. To make delivering new clusters faster, I tried to make an AMI from it.
However, when I started a new EC2 instance from the AMI, the internal IP changed and the node became NotReady:
ubuntu#ip-172-31-26-46:~$ k get node
NAME STATUS ROLES AGE VERSION
ip-172-31-18-145 NotReady <none> 95m v1.21.1-k0s1
ubuntu#ip-172-31-26-46:~$
Would it be possible to reconfigure it?
Workaround
I found a workaround to make the AWS AMI work.
Short answer
Install the node with kubelet extra args (set --hostname-override to a stable name).
Update the kube-api server address in the kubelet's kubeconfig to the new IP and restart the kubelet.
Details :: 1
In a Kubernetes cluster, the kubelet plays the role of the node agent. It tells kube-api "Hey, I am here and my name is XXX".
The name of a node is its hostname and cannot be changed after the node is created. It can be set with --hostname-override.
If you don't change the node name, kube-api will try to use the hostname, and you then get errors caused by the old node name not being found.
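As a rough sketch of the first step (I'm assuming k0s's --kubelet-extra-args flag here, so verify against k0s worker --help; worker-01 and the token path are placeholders):
# Pin the node name at install time so it no longer depends on the instance's hostname/IP.
# worker-01 and /etc/k0s/join-token are placeholders for your own values.
sudo k0s install worker --token-file /etc/k0s/join-token \
    --kubelet-extra-args="--hostname-override=worker-01"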
Details :: 2
For k0s, the kubelet's KUBECONFIG is at /var/lib/k0s/kubelet.conf, and it contains the kube-api server location:
server: https://172.31.18.9:6443
To connect to the new kube-api location, update that value.
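A rough sketch of that update (NEW_API_IP is a placeholder, and the k0sworker unit name assumes the worker was installed with k0s install worker; adjust to your setup):
# Point the kubelet's kubeconfig at the controller's new address.
sudo sed -i 's#server: https://172.31.18.9:6443#server: https://NEW_API_IP:6443#' /var/lib/k0s/kubelet.conf
# Restart the worker so the kubelet reconnects with the updated server address.
sudo systemctl restart k0sworker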
Did you check the kubelet logs? Most likely it's a problem with certificates. You cannot just turn an existing node into an AMI and hope it will work, since the certificates are signed for a specific IP.
Check out the awslabs/amazon-eks-ami repo on GitHub to see how AWS builds its k8s AMI.
There is a files/bootstrap.sh file in the repo that is run to bootstrap an instance. It does all sorts of instance-specific things, including getting certificates.
If you want to "make delivering new clusters faster", I'd recommend creating an AMI with all dependencies but without the actual k8s bootstrapping. Install k8s (or k0s in your case) after you start the instance from the AMI, not before. (Or figure out how to regenerate the certs and configs that are node specific.)
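For example, a minimal user-data sketch under that approach (the get.k0s.sh installer URL and the join-token path are assumptions; you still need to distribute the join token to the instance yourself):
#!/bin/bash
# cloud-init user data: install and join k0s at first boot instead of baking it into the AMI
set -euo pipefail
curl -sSLf https://get.k0s.sh | sh                     # fetch the k0s binary
k0s install worker --token-file /etc/k0s/join-token    # join token provisioned separately (placeholder path)
k0s start                                              # start the worker service that was just installed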
Whenever I set up a Rancher Kubernetes cluster with RKE, the cluster sets up perfectly. However, I'm getting the following warning message:
WARN[0011] [reconcile] host [host.example.com] is a control plane node without reachable Kubernetes API endpoint in the cluster
WARN[0011] [reconcile] no control plane node with reachable Kubernetes API endpoint in the cluster found
(In the above message, host.example.com is a placeholder for my actual host name; this message is given for each controlplane host specified in cluster.yml.)
How can I modify the RKE cluster.yml file or any other setting to avoid this warning?
I don't believe you can suppress this warning since as you indicate in your comments, the warning is valid on the first rke up command. It is only a warning, and a valid one at that, even though your configuration appears to have a handle on that. If you are worried about the logs, you could perhaps have your log aggregation tool ignore the warning if it is in close proximity to the initial rke up command, or even filter it out. However, I would think twice about filtering blindly on it as it would indicate a potential issue (if, for example, you thought the control plane containers were already running).
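If you only want to keep it out of the logs around that first run, a simple filter is enough (a sketch, not an endorsement of hiding the warning everywhere):
# drop just the reconcile warnings about control plane reachability, keep all other output
rke up 2>&1 | grep -v "reconcile.*control plane node"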
I have a Weave network plugin.
Inside the folder /etc/cni/net.d there is a 10-weave.conf:
{
"name": "weave",
"type": "weave-net",
"hairpinMode": true
}
My Weave pods are running and the DNS pod is also running.
But when I want to run a pod, like a simple nginx which will pull an nginx image, the pod gets stuck at ContainerCreating, and describe pod gives me the error: failed create pod sandbox.
When I run journalctl -u kubelet I get this error:
cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Is my network plugin not configured correctly?
I used this command to configure my Weave network:
kubectl apply -f https://git.io/weave-kube-1.6
After this didn't work, I also tried this command:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
I even tried flannel and that gives me the same error.
The system I am setting Kubernetes up on is a Raspberry Pi.
I am trying to build a Raspberry Pi cluster with 3 nodes and 1 master with Kubernetes.
Does anyone have ideas on this?
Thank you all for responding to my question. I have solved my problem now. For anyone who comes to this question in the future, the solution was as follows.
I cloned my Raspberry Pi images because I wanted a basicConfig.img for when I need to add a new node to my cluster or when one goes down.
Weave (the network plugin I used) got confused because on every node and on the master the OS had the same machine-id. When I deleted the machine-id and created a new one (and rebooted the nodes), my error was fixed. The commands to do this were:
sudo rm /etc/machine-id
sudo rm /var/lib/dbus/machine-id
sudo dbus-uuidgen --ensure=/etc/machine-id
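After the reboot you can confirm each board now reports its own id (just a sanity check):
# run on every node; the Machine ID line should differ between boards
hostnamectl status | grep "Machine ID"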
Once again my patience was tested, because my Kubernetes setup was normal and my Raspberry Pi OS was normal. I found this with the help of someone in the Kubernetes community. This again shows how important and great the IT community is. To the people of the future who come to this question: I hope this solution fixes your error and cuts down the time you spend searching for a stupidly small thing.
Looking at the pertinent code in Kubernetes and in CNI, the specific error you see seems to indicate that it cannot find any files ending in .json, .conf or .conflist in the directory given.
This makes me think it could be something like the conf file not being present on all the hosts, so I would verify that as a first step.
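A quick way to check that across the cluster (the host names and the pi user are placeholders for your own nodes):
# list the CNI config directory on every node; an empty listing explains the kubelet error
for h in master node1 node2 node3; do echo "== $h"; ssh pi@$h ls -l /etc/cni/net.d/; done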
I'm deploying a k8s cluster on CentOS, version v1.5.1.
Having three nodes:
kube-01(master)
kube-02
kube-03
Having a deployment with one pod, named Deployment-A, with pod IP Pod-A-IP, deployed on kube-03.
Having a deployment with two pods, named Deployment-B. Each worker node has one pod: call them Pod-B-02 on kube-02 and Pod-B-03 on kube-03.
Exposing Deployment-A using type NodePort, I have a Cluster IP Service-A-IP
Pod-B-02 access Service-A-IP, OK
Pod-B-03 access Service-A-IP, time out
kube-02 access Service-A-IP, OK
kube-03 access Service-A-IP, OK
It seems that accessing a Service from a pod on the same node as the Service's backend runs into this problem.
Updated on Mon Feb 20 16:22:47 CST 2017
I have captured the network traffic on Pod-B-03
10.244.1.10 is the pod ip of Pod-B-03
10.107.25.245 is Service-A-IP
10.244.1.2 is Pod-A-IP
I'm using flannel. I suspect there is something wrong with flannel?
The problem you describe is something that I had in the past, if I remember correctly... but I had many network problems with many different error sources. If it is really the same issue, then setting net.bridge.bridge-nf-call-iptables and net.bridge.bridge-nf-call-ip6tables to 1 may help. You could try this on all hosts first:
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
Then check the service networking again without rebooting your machine. If this helps, persist the change into /etc/sysctl.conf or /etc/sysctl.d/.
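For example, to persist it (the file name under /etc/sysctl.d/ is just a suggestion):
# keep the bridge-netfilter settings across reboots
cat <<EOF | sudo tee /etc/sysctl.d/99-k8s-bridge.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system   # reload all sysctl configuration files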
Kubernetes network problems tend to have uncountable sources of errors, making it really hard to debug without enough information. It would be good if you could provide some additional information about how you set up the cluster (kube-up, kargo, kops, kubeadm, ...), which cloud you use (or bare metal?) and which network solution you chose (weave, calico, cloud provider based, ...)
It may also help to see the output of iptables -L -t nat and the kube-proxy logs as most service related problems can be debugged with this information.
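Concretely, something along these lines on the node where the timeout happens (assuming kube-proxy runs as a systemd unit on your CentOS hosts; if it runs as a container, use docker logs on that container instead):
# NAT rules kube-proxy programmed for services (KUBE-SERVICES is the entry chain)
sudo iptables -t nat -L KUBE-SERVICES -n
# recent kube-proxy logs
sudo journalctl -u kube-proxy --since "1 hour ago"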
EDIT I've just found the Kubernetes issue where I had this solution from: https://github.com/kubernetes/kubernetes/issues/33798
I have a new setup of Kubernetes and I created a replication controller with 2 replicas. However, what I see when I do kubectl get pods is that one pod is Running and the other is Pending. Yet when I go to my 7 test nodes and run docker ps, I see that all of them are running.
What I think is happening is that I had to change the default insecure port from 8080 to 7080 (the Docker app actually runs on 8080); however, I don't know how to tell if I am right, or where else to look.
Along the same vein, is there any way to set up config for kubectl where I can specify the port? Doing kubectl --server="" is a bit annoying (yes, I know I can alias this).
If you changed the API port, did you also update the nodes to point them at the new port?
For the kubectl --server=... question, you can use kubectl config set-cluster to set cluster info in your ~/.kube/config file and avoid having to pass --server all the time. See the following docs for details, and a rough example after the links:
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_config.html
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_config_set-cluster.html
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_config_set-context.html
http://kubernetes.io/v1.0/docs/user-guide/kubectl/kubectl_config_use-context.html
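For instance, roughly (the cluster/context names and MASTER_IP are placeholders, and 7080 is the insecure port you mentioned):
# point kubectl at the apiserver on its non-default port and make that the default context
kubectl config set-cluster test-cluster --server=http://MASTER_IP:7080
kubectl config set-context test-context --cluster=test-cluster
kubectl config use-context test-context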