Kubernetes "Unable to register node" with cloud-provider=aws - kubernetes

I'm trying to run kubelet with --cloud-provider=aws flag but it fails with the following error:
kubelet_node_status.go:107] Unable to register node "ip-172-28-68-69.eu-west-1.compute.internal" with API server: nodes "ip-172-28-68-69.eu-west-1.compute.internal" is forbidden: node "k8s-master.my.fqdn" cannot modify node "ip-172-28-68-69.eu-west-1.compute.internal"
I already tried setting the --hostname-override flag to "k8s-master.my.fqdn", with no success.
kubectl get nodes currently shows:
NAME          STATUS    ROLES     AGE   VERSION
k8s.my.fqdn   Ready     <none>    29m   v1.8.1
How should I start kubelet in order to successfully register the node on AWS?

I solved my issue this way:
Don't change the default Amazon hostname to your own, because the --hostname-override flag doesn't work for this.
Initialize the node like: kubeadm init --pod-network-cidr=10.233.0.0/16 --node-name=$(curl http://169.254.169.254/latest/meta-data/local-hostname), or simply use Kubespray as a cluster management solution.
By the way, if you want to integrate with AWS it's better to leave the Amazon hostname as it is. I found the same advice in the Kubespray docs:
The next step is to make sure the hostnames in your inventory file are identical to your internal hostnames in AWS. This may look something like ip-111-222-333-444.us-west-2.compute.internal
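For the kubeadm route, a minimal sketch of that init step, assuming the EC2 instance metadata service is reachable at its standard address:

# Fetch the AWS-assigned private DNS name from the EC2 instance metadata service
NODE_NAME=$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)

# Initialize the cluster using that name so it matches what the AWS cloud provider expects
kubeadm init --pod-network-cidr=10.233.0.0/16 --node-name="${NODE_NAME}"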

Related

Not able to pull image in Pod, getting ImagePullBackOff

I have my Kubernetes nodes on different VMs; each VM has one Kubernetes node, and in total I have 7 worker nodes.
While trying to create a Pod on one node I get an ImagePullBackOff error, while docker pull on the same node is successful.
The rest of the worker nodes are working fine.
My Docker registry is already set as an insecure-registry in daemon.json.
Please help.
ImagePullBackOff is almost always a typo in the image name. Make sure you specified the name correctly.
You need to describe the Pod using: kubectl describe pod <name>. It will show a more detailed message why pulling fails.
The Kubernetes service account attached to the Pod is probably not able to pull the image. The service account must have the correct imagePullSecrets.
When no service account is configured, the Pod uses the default service account.
kubectl get sa -o yaml
This will show the imagePullSecrets attached to the service account. Check that you have created the correct secret and attached it to the service account.
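If the secret turns out to be missing, a rough sketch of creating one and attaching it to the default service account (the registry address, credentials, and secret name below are placeholders):

# Create a registry credential secret (all values here are placeholders)
kubectl create secret docker-registry my-registry-cred \
  --docker-server=registry.example.com:5000 \
  --docker-username=myuser \
  --docker-password=mypassword

# Attach it to the default service account so Pods using that account can pull from the registry
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "my-registry-cred"}]}'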
I resolved the issue: the problem was the container runtime. The affected nodes were using containerd as the runtime; once I configured those nodes to access my insecure registry for containerd, everything was OK.
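For anyone hitting the same thing, a rough sketch of that containerd change (the registry address is a placeholder, and the exact config section can differ between containerd versions):

# Add a mirror entry for the insecure (plain-HTTP) registry to containerd's CRI config
cat <<'EOF' | sudo tee -a /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.example.com:5000"]
  endpoint = ["http://registry.example.com:5000"]
EOF

# Restart containerd so the new registry configuration is picked up
sudo systemctl restart containerd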

How to reconfigure the IP of a k8s node

I created a Kubernetes cluster installed by k0s on an AWS EC2 instance. In order to deliver new clusters faster, I tried to make an AMI of it.
However, when I started a new EC2 instance from that AMI, the internal IP changed and the node became NotReady.
ubuntu@ip-172-31-26-46:~$ k get node
NAME               STATUS     ROLES    AGE   VERSION
ip-172-31-18-145   NotReady   <none>   95m   v1.21.1-k0s1
ubuntu@ip-172-31-26-46:~$
Would it be possible to reconfigure it?
Workaround
I found a workaround to make the AWS AMI work.
Short answer
Install the node with kubelet's --extra-args.
Update the kube-api address to the new IP and restart the kubelet.
Details :: 1
In a Kubernetes cluster, the kubelet plays the node-agent role. It tells kube-api: "Hey, I am here and my name is XXX."
The name of a node is its hostname and cannot be changed after the node is created. It can be set with --hostname-override.
If you don't change the node name, kube-api will try to use the hostname and you will get errors because the old node name cannot be found.
Details :: 2
With k0s, the kubelet's KUBECONFIG is placed at /var/lib/k0s/kubelet.conf, and it contains the kube-api server location:
server: https://172.31.18.9:6443
In order to connect to the new kube-api location, update it.
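A rough sketch of that update on a k0s worker (the old address is the one from this example; the new address and the service name may differ on your setup):

# Point the kubelet's kubeconfig at the new API server address
sudo sed -i 's#https://172.31.18.9:6443#https://NEW_API_SERVER_IP:6443#' /var/lib/k0s/kubelet.conf

# Restart the k0s worker service so the kubelet reconnects with the new address
sudo systemctl restart k0sworker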
Did you check the kubelet logs? Most likely it's a problem with certificates. You cannot just turn an existing node into an AMI and hope it will work, since the certificates are signed for a specific IP.
Check out the awslabs/amazon-eks-ami repo on GitHub. You can see how AWS builds its Kubernetes AMI.
There is a files/bootstrap.sh file in the repo that is run to bootstrap an instance. It does all sorts of instance-specific things, including getting certificates.
If you want to "make delivery new cluster faster", I'd recommend creating an AMI with all dependencies but without the actual Kubernetes bootstrapping. Install Kubernetes (or k0s in your case) after you start the instance from the AMI, not before. (Or figure out how to regenerate the certs and configs that are node-specific.)

Replacing dead master in Kubernetes 1.15 cluster with stacked control plane

I have a Kubernetes cluster with a 3-master stacked control plane, so each master also has its own etcd instance running locally. The problem I am trying to solve is this:
"If one master dies such that it cannot be restarted, how do I replace it?"
Currently, when I try to add the replacement master into the cluster, I get the following error while running kubeadm join:
[check-etcd] Checking that the etcd cluster is healthy
I0302 22:43:41.968068 9158 local.go:66] [etcd] Checking etcd cluster health
I0302 22:43:41.968089 9158 local.go:69] creating etcd client that connects to etcd pods
I0302 22:43:41.986715 9158 etcd.go:106] etcd endpoints read from pods: https://10.0.2.49:2379,https://10.0.225.90:2379,https://10.0.247.138:2379
error execution phase check-etcd: error syncing endpoints with etc: dial tcp 10.0.2.49:2379: connect: no route to host
The 10.0.2.49 node is the one that died. These nodes are all running in an AWS AutoScaling group, so I don't have control over the addresses.
I have drained and deleted the dead master node using kubectl drain and kubectl delete; and I have used etcdctl to make sure the dead node was not in the member list.
Why is it still trying to connect to that node's etcd?
It is still trying to connect to that member because etcd maintains a list of members in its store -- that's how it knows how to vote on quorum decisions. I don't believe etcd is unique in that way -- most distributed key-value stores know their member list.
The fine manual shows how to remove a dead member, but it also warns to add a new member before removing unhealthy ones.
There is also a project, etcdadm, that is designed to smooth over some of the rough edges of etcd cluster management, but I haven't used it enough to say what it is good at versus not.
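For the etcd side, a sketch of inspecting and removing the dead member with etcdctl, assuming the default kubeadm certificate paths for stacked etcd (the endpoint and member ID are illustrative):

# List the current members to find the ID of the dead 10.0.2.49 node
ETCDCTL_API=3 etcdctl member list \
  --endpoints=https://10.0.225.90:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key

# Remove the dead member by the hexadecimal ID shown in the listing above
ETCDCTL_API=3 etcdctl member remove <MEMBER_ID> \
  --endpoints=https://10.0.225.90:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key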
The problem turned out to be that the failed node was still listed in the ConfigMap. Further investigation led me to the following thread, which discusses the same problem:
https://github.com/kubernetes/kubeadm/issues/1300
The solution that worked for me was to edit the ConfigMap manually.
kubectl -n kube-system get cm kubeadm-config -o yaml > tmp-kubeadm-config.yaml
manually edit tmp-kubeadm-config.yaml to remove the old server
kubectl -n kube-system apply -f tmp-kubeadm-config.yaml
I believe updating the etcd member list is still necessary to ensure cluster stability, but it wasn't the full solution.
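For reference, a quick way to check which control-plane endpoints are still recorded before and after the edit (in kubeadm 1.15 the apiEndpoints list lives in the ClusterStatus document of that ConfigMap):

# Show the ClusterStatus document; its apiEndpoints map should no longer list the dead master
kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterStatus}'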

GKE : How to get number of nodes and pods using API

Currently I obtain various information from the Google Cloud Platform management console screens, but in the future I would like to obtain it using an API.
The information obtained is as follows.
Kubernetes Engine>Clusters>Cluster Size
Kubernetes Engine>Workloads>Pods
Please tell me which API corresponds to each piece of information.
Under the hood, the GKE UI calls the Kubernetes API to get this information and show it in the UI.
You can use kubectl to query Kubernetes API to get that information.
kubectl get nodes
kubectl get pods
If you turn on verbose mode in kubectl, it will show which REST API it is calling on the Kubernetes API server.
kubectl --v=8 get nodes
kubectl --v=8 get pods
The REST API endpoints for nodes and pods are:
GET https://kubernetes-api-server-endpoint:6443/api/v1/nodes?limit=500
GET https://kubernetes-api-server-endpoint:6443/api/v1/namespaces/default/pods?limit=500
Here is the doc on how to configure Kubectl to connect with GKE.
Here is the doc from kubernetes on different ways to access Kubernetes API.
You can also use kubectl proxy for trying it out.
Remember, to call the above REST APIs you need to authenticate to the Kubernetes API server, either with a certificate or with a bearer token.
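For example, one low-friction way to try those endpoints without handling credentials yourself is to go through kubectl proxy (a sketch; the port is arbitrary):

# Start a local proxy that authenticates to the API server using your kubeconfig
kubectl proxy --port=8001 &

# Query the same REST endpoints through the proxy
curl http://127.0.0.1:8001/api/v1/nodes?limit=500
curl http://127.0.0.1:8001/api/v1/namespaces/default/pods?limit=500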
You need to:
install your command line
connect to your project
connect to your cluster
retrieve the number of pods inside your cluster
Install your command line
You can use your preferred command line, or you can use Cloud Shell in your browser (the online command-line interface integrated into Google Cloud Platform).
Option A) Using your own command-line program, you need to install the Google Cloud CLI (gcloud) on your machine.
Option B) Otherwise, if you use Cloud Shell, just click on the Activate Cloud Shell button at the top of the page.
Connect to your project
(only for option A)
Login to your gcloud platform: gcloud auth login
$ gcloud auth login
Your browser has been opened to visit:
https://accounts.google.com/signin/oauth/oauthchooseaccount?client_id=65654645461.apps.googleusercontent.com&as=yJ_pR_9VSHEGFKSDhzpiw&destination=http%3A%2F%2Flocalhost%3A8085&approval_state=!ChRVVHYTE11IxY2FVbTIxb2xhbTk0SBIfczcxb2xyQ3hfSFVXNEJxcmlYbTVkb21pNVlhOF9CWQ%E2%88%99AJDr988AKKKKKky48vyl43SPBJ-gsNQf8w57Djasdasd&oauthgdpr=1&oauthriskyscope=1&xsrfsig=ChkAASDasdmanZsdasdNF9sDcdEftdfECwCAt5Eg5hcHByb3ZhbF9zdGF0ZRILZGVzdGluYXRpb24ASDfsdf1Eg9vYXV0aHJpc2t5c2NvcGU&flowName=GeneralOAuthFlow
Connect to your project: gcloud config set project your_project_id
$ gcloud projects list
PROJECT_ID              NAME          PROJECT_NUMBER
first-project-265905    My Project    117684542848
second-project-435504   test          895475526863
$ gcloud config set project first-project-265905
Connect to your cluster
Once connected to your project, you need to connect to your cluster.
gcloud container clusters get-credentials your_cluster_name
$ gcloud container clusters list
NAME            LOCATION           MASTER_VERSION   MASTER_IP      MACHINE_TYPE   NODE_VERSION     NUM_NODES   STATUS
test-cluster-1  asia-northeast1-a  1.33.33-gke.24   45.600.23.72   f1-micro       1.13.11-gke.14   3           RUNNING
$ gcloud container clusters get-credentials test-cluster-1
Fetching cluster endpoint and auth data.
kubeconfig entry generated for test-cluster-1.
Retrieve the number of nodes/pods inside your cluster
Inside a given namespace, run the command:
$ kubectl get nodes
NAME                                          STATUS     ROLES    AGE   VERSION
gke-test-cluster-1-default-pool-d85b49-2545   NotReady   <none>   24m   v1.13.11-gke.14
gke-test-cluster-1-default-pool-d85b49-2dr0   NotReady   <none>   3h    v1.13.11-gke.14
gke-test-cluster-1-default-pool-d85b49-2f31   NotReady   <none>   1d    v1.13.11-gke.14
$ kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
busybox   0/1     Pending   0          44s
nginx     0/1     Pending   0          1m
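If you only need the counts rather than the full listings, a small sketch:

# Count nodes and pods by dropping the header line and counting the remaining rows
$ kubectl get nodes --no-headers | wc -l
$ kubectl get pods --all-namespaces --no-headers | wc -l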
Speaking of Python, the Kubernetes Engine API could be used in this case.
Kubernetes Engine > Clusters > Cluster Size
In particular, the method get(projectId=None, zone=None, clusterId=None, name=None, x__xgafv=None)
returns an object that contains a "currentNodeCount" value.
Kubernetes Engine > Workloads > Pods
A code example for listing pods could be found here:
Access Clusters Using the Kubernetes API
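If you would rather stay on the command line, a rough equivalent for the node count is to read currentNodeCount straight from the cluster object (the cluster name and zone below are the ones used earlier in this thread):

# Print only the currentNodeCount field of the cluster description
$ gcloud container clusters describe test-cluster-1 \
    --zone asia-northeast1-a \
    --format="value(currentNodeCount)"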

Problem getting pods stats from kubelet and cri-o

We are running Kubernetes with the following configuration:
On-premise Kubernetes 1.11.3, cri-o 1.11.6 and CentOS7 with UEK-4.14.35
I can't make crictl stats return pod information; it only returns an empty list. Has anyone run into the same problem?
Another issue we have is that when I query the kubelet for stats/summary, it returns an empty pods list.
I think that these two issues are related, although I am not sure which one of them is the problem.
I would recommend checking the kubelet service to verify its health status and to debug any suspicious events within the cluster. I assume that the CRI-O runtime engine relies on the kubelet as the main provider of Pod information, given the kubelet's role in managing the Pod lifecycle.
systemctl status kubelet -l
journalctl -u kubelet
If you find any errors or dubious events, share them in a comment below this answer.
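It can also help to hit the kubelet's summary endpoint directly and see what it reports (a sketch; whether the read-only port is open depends on your kubelet configuration):

# Query the kubelet's stats summary on the legacy read-only port, if it is enabled
curl http://localhost:10255/stats/summary

# If the read-only port is disabled, the same data is served on the authenticated port 10250
curl -k https://localhost:10250/stats/summary -H "Authorization: Bearer <your-token>"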
Additionally, you can use metrics-server, which collects Pod metrics in the cluster; make sure the kube-apiserver flags for the Aggregation Layer are enabled. Here is a good article about Horizontal Pod Autoscaling and monitoring resources via Prometheus.
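Once metrics-server is running and the Aggregation Layer is enabled, the collected numbers are available through kubectl (a quick sketch):

# Resource usage per node and per Pod, served by metrics-server via the aggregated API
kubectl top nodes
kubectl top pods --all-namespaces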