I created a Kubernetes cluster installed with k0s on an AWS EC2 instance. To make delivering new clusters faster, I tried to build an AMI from it.
However, when I started a new EC2 instance from that AMI, the internal IP changed and the node became NotReady:
ubuntu@ip-172-31-26-46:~$ k get node
NAME               STATUS     ROLES    AGE   VERSION
ip-172-31-18-145   NotReady   <none>   95m   v1.21.1-k0s1
ubuntu@ip-172-31-26-46:~$
Would it be possible to reconfigure it?
Workaround
I found a workaround to make the AWS AMI work.
Short answer
install the node with the kubelet's extra args (see Details :: 1)
update the kube-apiserver address to the new IP and restart the kubelet (see Details :: 2)
Details :: 1
In a Kubernetes cluster, the kubelet acts as the node agent. It tells the kube-apiserver "Hey, I am here and my name is XXX".
The name of a node defaults to its hostname and cannot be changed after the node is created. It can be set with --hostname-override.
If you don't override the node name, the new hostname will be used and you get errors because the old node name is not found.
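With k0s that can look roughly like this on the new instance (a sketch; check the flag names against your k0s version and add your join token):
# Illustrative only: keep the old node name when registering the kubelet
sudo k0s install worker --kubelet-extra-args="--hostname-override=ip-172-31-18-145"
sudo k0s start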
Details :: 2
For k0s, the kubelet's KUBECONFIG is at /var/lib/k0s/kubelet.conf; it contains the kube-apiserver location:
server: https://172.31.18.9:6443
To connect to the new kube-apiserver location, update this value.
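For example (a sketch; the new IP and the systemd unit name are placeholders for your setup):
sudo sed -i 's|server: https://172.31.18.9:6443|server: https://<new-controller-ip>:6443|' /var/lib/k0s/kubelet.conf
sudo systemctl restart k0sworker   # or whichever k0s service runs the kubelet on this node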
Did you check the kubelet logs? Most likely it's a problem with certificates. You cannot just turn an existing node into an AMI and hope it will work, since certificates are signed for a specific IP.
Check out the awslabs/amazon-eks-ami repo on GitHub. You can see how AWS builds its Kubernetes AMI.
There is a files/bootstrap.sh file in the repo that is run to bootstrap an instance. It does all sorts of things that are instance specific, including getting certificates.
If you want to "make delivery of new clusters faster", I'd recommend creating an AMI with all the dependencies but without the actual Kubernetes bootstrapping. Install Kubernetes (or k0s in your case) after you start the instance from the AMI, not before. (Or figure out how to regenerate the certs and configs that are node specific.)
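For example, you could bake only the binaries into the AMI and bootstrap at first boot via EC2 user data; a rough sketch for a single-node k0s controller (the flags are assumptions about your topology, adjust for workers and join tokens):
#!/bin/bash
# Runs on first boot of an instance started from the dependencies-only AMI
k0s install controller --single
k0s start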
Related
I have my Kubernetes nodes on different VMs; each VM has one Kubernetes node, and in total I have 7 worker nodes.
While trying to create a Pod on one node I get an ImagePullBackOff error, while docker pull on the same node is successful.
The rest of the worker nodes are working fine.
My Docker registry is already set as an insecure-registry in daemon.json.
Please help.
ImagePullBackOff is almost always a typo in the image name. Make sure you specified the name correctly.
You need to describe the Pod using: kubectl describe pod <name>. It will show a more detailed message why pulling fails.
The Kubernetes service account attached to the Pod is probably not able to pull the image. The service account must have the correct imagePullSecrets.
When no service account is configured, the Pod uses the default service account.
kubectl get sa -o yaml
This will show the imagePullSecrets attached to each service account. Check that you have created the correct secret and attached it to the service account.
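For example (a sketch; the secret name, registry address, and credentials are placeholders):
kubectl create secret docker-registry regcred --docker-server=my-registry.local:5000 --docker-username=<user> --docker-password=<password>
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "regcred"}]}'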
Resolved the issue.
The issue was the container runtime. The affected nodes were using containerd as the runtime; after I configured containerd on those nodes to access my insecure registry, everything was OK.
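For reference, on containerd 1.x the insecure registry typically needs to be configured in /etc/containerd/config.toml as well, roughly like this (the registry address is a placeholder; restart containerd after editing):
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."my-registry.local:5000"]
  endpoint = ["http://my-registry.local:5000"]
[plugins."io.containerd.grpc.v1.cri".registry.configs."my-registry.local:5000".tls]
  insecure_skip_verify = true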
Hope someone can help me.
To describe the situation in short, I have a self-managed k8s cluster running on 3 machines (1 master, 2 worker nodes). To make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add a controlPlaneEndpoint configuration to the kubeadm-config config map. So I did, with masternodeHostname:6443.
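For reference, that setting goes inside the ClusterConfiguration part of the kubeadm-config ConfigMap in kube-system, roughly like this (a sketch, not my exact file):
kubectl -n kube-system edit configmap kubeadm-config
# then, under ClusterConfiguration:
controlPlaneEndpoint: "masternodeHostname:6443"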
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443, so I cannot run any kubectl commands.
I tried recreating the .kube folder, with all the config copied there, with no luck.
I restarted kubelet and docker.
The containers running on the cluster seem OK, but I am locked out of any cluster configuration (the dashboard is down, and kubectl commands are not working).
Is there any way I can make it work again, without losing any of the configuration or the deployments already present?
Thanks! Sorry if it’s a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: (put bare-metal if not on a public cloud) bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by a short etcd timeout period. As far as I'm aware it is a hard-coded value in the source and cannot be changed (a feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version, and recreate your cluster.
Upgrade Kube-aws v1.15.5 cluster to the next version 1.16.8.
Use Case:
I want to keep the same node labels for the master and worker nodes as I'm using in v1.15.
When I tried to upgrade the cluster to v1.16, the kubelet's --node-labels flag is restricted from using the 'node-role' prefix.
If I keep the node role label "node-role.kubernetes.io/master", the kubelet fails to start after the upgrade. If I remove the label, the kubectl get node output shows <none> for the role of the upgraded node.
How do I reproduce?
Before the upgrade I took a backup with 'cp /etc/sysconfig/kubelet /etc/sysconfig/kubelet-bkup' and removed the "-role" label from the live file. Once the upgrade was completed, I restored the original kubelet sysconfig by replacing the edited file with 'mv /etc/sysconfig/kubelet-bkup /etc/sysconfig/kubelet'. Now I can see the node role as Master/Worker even after a kubelet service restart.
The problem I'm facing now:
Although I performed the upgrade on the existing cluster successfully, the cluster is running in AWS in the kube-aws model, so the ASG spins up a new node whenever the Cluster Autoscaler triggers it.
But the new node fails to join the cluster, since the node label "node-role.kubernetes.io/master" exists in the code base.
How can I add the node-role label dynamically in the ASG scaling process? Any solution would be appreciated.
Note:
kubeadm, kubelet, kubectl: v1.16.8
I have sorted out the issue. I created a Python program that watches node events. Whenever the ASG spins up a new node, after it joins the cluster the node will have the role "", and the Python code will then add an appropriate label to the node dynamically.
I have also created a Docker image based on the Python script I wrote for node labelling, and it runs as a pod. The pod is deployed into the cluster and does the job of labelling the new nodes.
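A minimal sketch of the idea with the official Kubernetes Python client (the label key and in-cluster setup are assumptions of this sketch; the actual code is linked below):
from kubernetes import client, config, watch

def main():
    # Assumes the pod's service account is allowed to list and patch nodes.
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    role_label = "node-role.kubernetes.io/worker"  # placeholder role label

    for event in watch.Watch().stream(v1.list_node):
        node = event["object"]
        labels = node.metadata.labels or {}
        # New nodes join with no role label; add one dynamically.
        if not any(key.startswith("node-role.kubernetes.io/") for key in labels):
            patch = {"metadata": {"labels": {role_label: ""}}}
            v1.patch_node(node.metadata.name, patch)
            print(f"labelled node {node.metadata.name} with {role_label}")

if __name__ == "__main__":
    main()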
See my solution on GitHub:
https://github.com/kubernetes/kubernetes/issues/91664
I have also published the Docker image, which is publicly available:
https://hub.docker.com/r/shaikjaffer/node-watcher
Thanks,
Jaffer
I'm trying to run kubelet with --cloud-provider=aws flag but it fails with the following error:
kubelet_node_status.go:107] Unable to register node "ip-172-28-68-69.eu-west-1.compute.internal" with API server: nodes "ip-172-28-68-69.eu-west-1.compute.internal" is forbidden: node "k8s-master.my.fqdn" cannot modify node "ip-172-28-68-69.eu-west-1.compute.internal"
I already tried to set the --hostname-override flag to "k8s-master.my.fqdn" with no success.
(kubectl get nodes:
NAME          STATUS   ROLES    AGE   VERSION
k8s.my.fqdn   Ready    <none>   29m   v1.8.1)
How should I start the kubelet in order to successfully register the node on AWS?
I solved my issue this way:
Don't change the default Amazon hostname to your own, because the --hostname-override flag isn't working.
Init the node like: kubeadm init --pod-network-cidr=10.233.0.0/16 --node-name=$(curl http://169.254.169.254/latest/meta-data/local-hostname), or simply use kubespray as a cluster management solution.
BTW, if you want to integrate with Amazon, it's better to leave the Amazon hostname as is. I found the same in the kubespray docs:
The next step is to make sure the hostnames in your inventory file are identical to your internal hostnames in AWS. This may look something like ip-111-222-333-444.us-west-2.compute.internal
I've installed Kubernetes via Vagrant on OS X and everything seems to be working fine, but I'm unsure how kubectl is able to communicate with the master node despite being local to the workstation filesystem.
How is this implemented?
kubectl has a configuration file that specifies the location of the Kubernetes apiserver and the client credentials to authenticate to the master. All of the commands issued by kubectl are over the HTTPS connection to the apiserver.
When you run the scripts to bring up a cluster, they typically generate this local configuration file with the parameters necessary to access the cluster you just created. By default, the file is located at ~/.kube/config.
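A minimal sketch of what such a file can look like (the server address, file paths, and names below are placeholders, not necessarily what the Vagrant scripts generate):
apiVersion: v1
kind: Config
clusters:
- name: vagrant-cluster
  cluster:
    server: https://10.245.1.2:443
    certificate-authority: /path/to/ca.crt
users:
- name: vagrant-admin
  user:
    client-certificate: /path/to/admin.crt
    client-key: /path/to/admin.key
contexts:
- name: vagrant
  context:
    cluster: vagrant-cluster
    user: vagrant-admin
current-context: vagrant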
In addition to what Robert said: the connection between your local CLI and the cluster is controlled through kubectl config set, see the docs.
The Getting started with Vagrant section of the docs should contain everything you need.