I have a stacked control-plane K8s cluster (etcd is also local/internal) with three master and nine worker nodes.
My cluster version is currently 1.12.3. While going through etcd commands, I tried listing the etcd members by executing
ETCDCTL_API=3 etcdctl member list
, and found that the client URLs of master2 and master3 are wrong.
As per my understanding, the peer and client URLs should use the same IP, but the client URL shows 127.0.0.1 in the case of master2 and master3.
When I check the endpoint status, I get the error below,
Failed to get the status of endpoint :2379 (context deadline exceeded)
while I successfully get the status for master1.
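For reference, this is roughly the command I run (the certificate paths below are the standard kubeadm stacked-etcd locations; adjust if yours differ, and the endpoint IP is a placeholder):
ETCDCTL_API=3 etcdctl endpoint status \
  --endpoints=https://<master2-ip>:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key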
Could anyone please help me out in solving this?
Things I tried:
1) Edited the etcd manifest file; the etcd pods got restarted, but still nothing changed when I listed the members.
2) I also successfully removed and re-added master3 in the etcd cluster, and this worked (the IPs got corrected and I can get the status of master3), but when I did the same for master2 I got the error below (a sketch of the remove/re-add sequence I used follows this list):
"error validating peerURLs {{ID: xyz, PeerUrls:xyz, clienturl:xyz},{&ID:xyz......}}: member count is unequal"
Editing the etcd manifest file and correcting the IP worked for me.
Previously it wasn't working because there was an etcd.yml.bkp file present in the manifests folder (I had probably taken a backup of the etcd manifest there before upgrading), and the etcd pods were referring to that yml file; removing that yml file from the manifests folder resolved the issue.
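The URLs in question live in /etc/kubernetes/manifests/etcd.yaml; the flags that typically carry the node's IP look roughly like this (the IP is a placeholder for the node's real address):
- --advertise-client-urls=https://<master2-ip>:2379
- --initial-advertise-peer-urls=https://<master2-ip>:2380
- --listen-client-urls=https://127.0.0.1:2379,https://<master2-ip>:2379
- --listen-peer-urls=https://<master2-ip>:2380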
I also found that the IP mentioned in the kube-apiserver.yml file was incorrect. To correct it I tried the two methods below, and both worked:
Manually edit the file and correct the IP.
Or generate a new manifest file for the kube-apiserver by executing:
kubeadm init phase control-plane apiserver --kubernetes-version 1.14.5
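For the manual edit, the IP appears in a few places in that manifest; the one that usually matters is the API server's advertise address, which should carry the node's real IP (placeholder below):
- --advertise-address=<correct-master-ip>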
I have my Kubernetes nodes on different VMs; each VM has one Kubernetes node, and in total I have 7 worker nodes.
While trying to create a Pod on one node I get an ImagePullBackOff error, while a docker pull on the same node is successful.
The rest of the worker nodes are working fine.
My Docker registry is already set as an insecure registry in daemon.json.
Please help.
ImagePullBackOff is very often just a typo in the image name. Make sure you specified the name correctly.
You need to describe the Pod using kubectl describe pod <name>. It will show a more detailed message about why the pull fails.
The Kubernetes service account attached to the Pod is probably not able to pull the image. The service account must have the correct imagePullSecrets.
When no service account is configured, the Pod uses the default service account.
kubectl get sa -o yaml
This will show the imagePullSecrets attached to the service account. Check that you have created the correct secret and attached it to the service account.
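If the secret doesn't exist yet, a minimal sketch of creating one and attaching it to the default service account (registry address, secret name, and credentials are placeholders):
kubectl create secret docker-registry my-registry-secret \
  --docker-server=<registry-host>:5000 \
  --docker-username=<user> \
  --docker-password=<password>
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "my-registry-secret"}]}'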
Resolved the issue.
The issue was the container runtime. The affected nodes were using containerd as the runtime, so I set up those nodes to access my insecure registry through containerd's own configuration. Everything was OK after that.
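A sketch of the kind of containerd change involved (the registry address is a placeholder; this is the older CRI-plugin "mirrors" syntax in /etc/containerd/config.toml, newer containerd releases use hosts.toml files instead):
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."my-registry.local:5000"]
  endpoint = ["http://my-registry.local:5000"]
Then restart the runtime:
sudo systemctl restart containerd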
I created a k8s cluster installed with k0s on an AWS EC2 instance. In order to deliver new clusters faster, I tried to make an AMI from it.
However, when I started a new EC2 instance from that AMI, the internal IP changed and the node became NotReady.
ubuntu@ip-172-31-26-46:~$ k get node
NAME STATUS ROLES AGE VERSION
ip-172-31-18-145 NotReady <none> 95m v1.21.1-k0s1
ubuntu@ip-172-31-26-46:~$
Would it be possible to reconfigure it?
Workaround
I found a workaround to make the AWS AMI work.
Short answer:
1) Install the node with kubelet extra args (overriding the hostname).
2) Update the kube-apiserver address to the new IP and restart the kubelet.
Details :: 1
In a Kubernetes cluster, the kubelet plays the node-agent role. It tells the kube-apiserver "Hey, I am here and my name is XXX."
The name of a node is its hostname and cannot be changed after the node is created. It can be set with --hostname-override.
If you don't override the node name, the kube-apiserver will try to use the new hostname, and you get errors because the old node name is not found.
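A sketch of what that looks like with k0s (the token path and node name are placeholders; confirm the exact flag name with k0s worker --help on your version):
sudo k0s install worker --token-file /var/lib/k0s/join-token \
  --kubelet-extra-args "--hostname-override=my-fixed-node-name"
sudo systemctl start k0sworker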
Details :: 2
For k0s, the kubelet's KUBECONFIG is at /var/lib/k0s/kubelet.conf; it contains the kube-apiserver location:
server: https://172.31.18.9:6443
In order to connect to the new kube-apiserver location, update this server line.
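Roughly (the new IP is a placeholder; on a controller node the service name is k0scontroller instead of k0sworker):
sudo sed -i 's|server: https://172.31.18.9:6443|server: https://<new-api-ip>:6443|' /var/lib/k0s/kubelet.conf
sudo systemctl restart k0sworker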
Did you check the kubelet logs? Most likely it's a problem with certificates. You cannot just turn an existing node into an AMI and hope it will work, since certificates are signed for specific IPs.
Check out the awslabs/amazon-eks-ami repo on GitHub. You can see how AWS builds its k8s AMI.
There is a files/bootstrap.sh script in the repo that is run to bootstrap an instance. It does all sorts of things that are instance specific, including getting certificates.
If you want to "deliver new clusters faster", I'd recommend creating an AMI with all dependencies but without the actual k8s bootstrapping. Install k8s (or k0s in your case) after you start the instance from the AMI, not before. (Or figure out how to regenerate the certs and configs that are node specific.)
Hope someone can help me.
To describe the situation briefly: I have a self-managed k8s cluster running on 3 machines (1 master, 2 worker nodes). In order to make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add a controlPlaneEndpoint configuration to the kubeadm-config ConfigMap. So I did, with masternodeHostname:6443.
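What I added under the ClusterConfiguration section of that ConfigMap (it lives in the kube-system namespace) was along the lines of:
controlPlaneEndpoint: "masternodeHostname:6443"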
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443, so I cannot run any kubectl commands.
I tried recreating the .kube folder, with all the config copied there; no luck.
Restarted kubelet and docker.
The containers running on the cluster seem OK, but I am locked out of any cluster configuration (the dashboard is down and kubectl commands are not working).
Is there any way I can make it work again, without losing any of the configuration or the deployments already present?
Thanks! Sorry if it’s a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: (put bare-metal if not on a public cloud) bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by a short etcd timeout period. As far as I'm aware it is a hard-coded value in the source and cannot be changed (a feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version, and recreate your cluster.
When I try to run kubeadm reset -f, it reports that the etcd member cannot be removed and that I must remove it manually.
failed to remove etcd member: error syncing endpoints with etc: etcdclient: no available endpoints. Please manually remove this etcd member using etcdctl
Is this a control-plane (master) node?
If not: simply running kubectl delete node <node_id> should suffice (see reference below). This will update etcd and take care of the rest of the cleanup. You'll still have to diagnose what caused the node to fail to reset in the first place if you're hoping to re-add it, but that's a separate problem. See the discussion on a related issue, e.g.:
If the node has hard-failed and you cannot call kubeadm reset on it, manual steps are required. You'd have to:
Remove the control-plane IP from the kubeadm-config CM ClusterStatus
Remove the etcd member using etcdctl
Delete the Node object using kubectl (if you don't want the Node around anymore)
1 and 2 apply only to control-plane nodes.
Hope this helps. If you are dealing with a control-plane (master) node, a rough sketch of the commands to run is below.
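A minimal sketch, run from a healthy control-plane node; the pod name, member ID, and node name are placeholders, and the certificate paths assume a standard kubeadm layout:
# list members from a surviving etcd pod to find the failed member's ID
kubectl -n kube-system exec etcd-<healthy-master> -- etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
# remove the failed member (on older etcd images you may need to prefix ETCDCTL_API=3)
kubectl -n kube-system exec etcd-<healthy-master> -- etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member remove <member-id>
# then edit kubeadm-config (step 1) and delete the Node object (step 3)
kubectl -n kube-system edit cm kubeadm-config
kubectl delete node <node_id>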
I'm trying to run kubelet with --cloud-provider=aws flag but it fails with the following error:
kubelet_node_status.go:107] Unable to register node "ip-172-28-68-69.eu-west-1.compute.internal" with API server: nodes "ip-172-28-68-69.eu-west-1.compute.internal" is forbidden: node "k8s-master.my.fqdn" cannot modify node "ip-172-28-68-69.eu-west-1.compute.internal"
I already tried setting the --host-override flag to "k8s-master.my.fqdn" with no success.
(kubectl get nodes:
NAME STATUS ROLES AGE VERSION
k8s.my.fqdn Ready <none> 29m v1.8.1)
How should I start the kubelet in order to successfully register the node on AWS?
I solved my issue in this way:
Don't change the default Amazon hostname to your own, because the --host-override flag isn't working.
Init the node like this: kubeadm init --pod-network-cidr=10.233.0.0/16 --node-name=$(curl http://169.254.169.254/latest/meta-data/local-hostname), or simply use kubespray as a cluster-management solution.
BTW, if you want to integrate with Amazon it's better to leave the Amazon hostname as is. I found the same in the kubespray docs:
The next step is to make sure the hostnames in your inventory file are identical to your internal hostnames in AWS. This may look something like ip-111-222-333-444.us-west-2.compute.internal
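For reference, a quick way to confirm the internal hostname on the instance itself (standard EC2 instance-metadata endpoint); the node's hostname should match what kubelet registers as:
curl -s http://169.254.169.254/latest/meta-data/local-hostname
hostname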