I have a cluster that was created a long time ago without kubeadm (maybe with kubespray, but the configuration for that is also lost).
Is there any way to add nodes to that cluster, attach kubeadm to the current configuration, or extend it with kubespray without erasing it?
If kubeadm was used to create the original cluster, you can log into the master and run kubeadm token create. This creates a bootstrap token (kubeadm token generate only prints a random token string without registering it with the cluster). With this token, your worker nodes can submit an authenticated CSR to the master as part of the join request. From there, you can follow this guide to add a new node with the kubeadm join command.
We’re providing our own AMI node images for EKS using the self-managed node feature.
The challenge I’m currently having is how to fetch the kubernetes version from within the EKS node as it starts up.
I’ve tried IMDS - which unfortunately doesn’t seem to have it:
root#ip-10-5-158-249:/# curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/
ami-id
ami-launch-index
ami-manifest-path
autoscaling/
block-device-mapping/
events/
hostname
iam/
identity-credentials/
instance-action
instance-id
instance-life-cycle
instance-type
local-hostname
local-ipv4
mac
metrics/
network/
placement/
profile
reservation-id
It also doesn’t seem to be passed in by the EKS bootstrap script - seems AWS is baking a single K8s version into each AMI. (install-worker.sh).
This is different from Azure’s behaviour of baking a bunch of Kubelets into a single VHD.
I’m hoping for something like IMDS or a passed in user-data param which can be used at runtime to symlink kubelet to the correct kubelet version binary.
Assuming you build your AMI on top of the EKS optimized AMI, one option is to run kubelet --version during your custom build to capture the K8s version; as you know, each EKS AMI is coupled to a control plane version. If you are not using the EKS AMI, you will need to call aws eks describe-cluster anyway to get the cluster information required to join the cluster, and the version is provided there at cluster.version.
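A rough sketch of both options, in case it helps. The helper name kubelet_minor_version, the sample version string and the cluster name my-cluster are made up for illustration; the describe-cluster call also assumes the node's IAM role is allowed to call eks:DescribeCluster and that a region is configured.

```shell
# Hypothetical helper: reads `kubelet --version` style text on stdin
# (e.g. "Kubernetes v1.29.0-eks-5e0fdde", an illustrative string) and
# prints only MAJOR.MINOR.
kubelet_minor_version() {
  sed -n 's/^Kubernetes v\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# During the AMI build, on top of the EKS optimized AMI:
#   kubelet --version | kubelet_minor_version
#
# On a fully custom image, ask the control plane at boot instead
# ("my-cluster" is a placeholder):
#   aws eks describe-cluster --name my-cluster \
#       --query 'cluster.version' --output text
```

The value from either source could then be used to pick which kubelet binary to symlink, if several are baked into the image.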
I have set up a cluster on AWS using kops. I want to connect to the cluster from my local machine.
I have to run cat ~/.kube/config, copy the content and replace my local kube config with it to access the cluster.
The problem is that it expires after a certain amount of time. Is there a way to get permanent access to the cluster?
I'm not sure you can get truly permanent access to the cluster, but according to the official kOps documentation you can run the kops update cluster command with the --admin={duration} flag and set the expiry time to a very large value.
For example, let's set it to almost 10 years:
kops update cluster {your-cluster-name} --admin=87599h --yes
Then copy your config file to the client as usual.
According to the official release notes, to return to the previous behaviour use the value 87600h.
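For reference, 87600h is just 10 years written in hours, since the duration is given in hours here. A small sketch of the arithmetic (the function name years_to_hours is made up, and leap days are ignored):

```shell
# Hypothetical helper: convert whole years to an hour-based duration string.
# Uses 365-day years, so leap days are not counted.
years_to_hours() {
  echo "$(( $1 * 365 * 24 ))h"
}

years_to_hours 10   # prints 87600h
```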
I have the following scenario in the lab and would like to see if it's possible to recover. The cluster is broken, which is entirely expected since I was testing how far I could go with breaking the cluster and still be able to recover.
Env:
Kubernetes 1.16.3
Kubespray
I was experimenting a bit and don't have any data on this cluster but I am still very curious if it's possible to recover. I have a healthy 3 node etcd cluster with the original configuration (all namespaces, workloads, configmaps etc). I don't have the original SSL certs for the control plane.
I removed all nodes from the cluster (kubeadm reset). I have the original manifests and kubelet config and am trying to re-init the master nodes. It has been more successful than I expected, but it is not where I want it to be.
After successful kubeadm init, the kubelet and control plane containers start successfully but the corresponding pods are not created. I am able to use the kube API with kubectl and see the nodes, namespaces, deployments, etc.
In the kube-system namespace all daemonsets still exist but the pods won't start with the following message:
49m Warning FailedCreate daemonset/kube-proxy Error creating: Timeout: request did not complete within requested timeout
The kubelet logs the following regarding the control plane pods:
Jul 21 22:30:02 k8s-master-4 kubelet[13791]: E0721 22:30:02.088787 13791 kubelet.go:1664] Failed creating a mirror pod for "kube-scheduler-k8s-master-4_kube-system(3e128801ef687b022f6c8ae175c9c56d)": Timeout: request did not complete within requested timeout
Jul 21 22:30:53 k8s-master-4 kubelet[13791]: E0721 22:30:53.089517 13791 kubelet.go:1664] Failed creating a mirror pod for "kube-controller-manager-k8s-master-4_kube-system(da5cfae13814fa171a320ce0605de98f)": Timeout: request did not complete within requested timeout
During the kubeadm reset/init process I already had to take some steps to get to where I am now (deleting serviceaccounts to reset the tokens, deleting some configmaps (kubeadm etc.)).
My question is: is it possible to recover the control plane without the certs? And if it's a complicated but still possible process, I would still like to know.
All help appreciated
Henro
is it possible to recover the control plane without the certs.
Yes, you should be able to. The certs 🔏 are required, but they don't have to be the very same ones that you initially created the cluster with. All the certificates, including the CA, can be rotated across the board. The kubelet even supports certificate auto-rotation. The configurations need to match everywhere though, meaning the CA needs to be the same one that signed the CSRs, and the keys/certs need to be created from those same CSRs. 🔑
Also, all the components need to use the same CA and be able to authenticate with the API server (kube-controller-manager, kube-scheduler, etc) 🔐. I'm not entirely sure about the logs that you are seeing but it looks like the kube-controller-manager and kube-scheduler are not able to authenticate and join the cluster. So I would take a look at their cert configurations:
/etc/kubernetes/kube-controller-manager.conf
/etc/kubernetes/kube-scheduler.conf
Also, you would find every PKI component that you need to verify under /etc/kubernetes/pki
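One way to check that the PKI files agree with each other is to inspect them with openssl. This is only a sketch: the function names are made up for illustration, and the paths in the usage comments assume kubeadm's default layout under /etc/kubernetes/pki.

```shell
# Hypothetical helper: print issuer, subject and expiry of a PEM certificate.
cert_summary() {
  openssl x509 -in "$1" -noout -issuer -subject -enddate
}

# Hypothetical helper: succeeds only if certificate $2 verifies against CA $1.
signed_by_ca() {
  openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# Usage on a control-plane node (assumed kubeadm default paths):
#   cert_summary /etc/kubernetes/pki/apiserver.crt
#   signed_by_ca /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/apiserver.crt \
#       && echo "apiserver cert matches CA" \
#       || echo "apiserver cert was NOT signed by this CA"
```

The client certificates embedded in the .conf files above would need to verify against the same CA for the components to authenticate.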
✌️
Hello I am facing a kubeadm join problem on a remote server.
I want to create a multi-server, multi-node Kubernetes Cluster.
I created a vagrantfile to create a master node and N workers.
It works on a single server.
The master VM is a bridged VM, to make it accessible to the other VMs available on the network.
I chose Calico as the network provider.
For the master node, this is what I've done:
Using ansible :
Initialize Kubeadm.
Installing a network provider.
Create the join command.
For Worker node:
I execute the join command to join the running master.
I successfully created the cluster on a single hardware server.
I am now trying to create regular worker nodes on another server on the same LAN; I can ping the master successfully.
I try to join the master node using the generated command:
kubeadm join 192.168.2.50:6443 --token ecqb8f.jffj0hzau45b4ro2 \
    --ignore-preflight-errors all \
    --discovery-token-ca-cert-hash \
    sha256:94a0144fe419cfb0cb70b868cd43pbd7a7bf45432b3e586713b995b111bf134b
But it showed this error:
error execution phase preflight: couldn't validate the identity of the API Server:
could not find a JWS signature in the cluster-info ConfigMap for token ID "ecqb8f"
I am asking if there is any specific network configuration to join the remote master node.
It seems the token has expired or been removed. You can create a new token manually by running:
kubeadm token create --print-join-command
Use the output as the join command.
If you see the following output on a node while joining a k8s cluster:
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "s1isfw"
To see the stack trace of this error execute with --v=5 or higher
Reason:
This issue arises when the token has expired. By default a token has a TTL of 24 hours from the time it is generated, whether by kubeadm init or separately.
In that case, first check whether the token you're using to join the worker to the master can still be retrieved by running this command on the master:
kubeadm token list
Steps:
Case 1) If you see NO OUTPUT from the above command, the best option is to generate a new token on the master:
On the master, execute: kubeadm token create --print-join-command
Copy the whole output, reformat it onto one line if necessary, and execute it as a command on the worker node.
Check the nodes from the master. The worker should now have joined the cluster.
Case 2) If you see output with the columns
TOKEN, TTL, EXPIRES, USAGES, DESCRIPTION, EXTRA GROUPS
then check the host entries and connectivity (ping) between the nodes (master and workers).
(A firewall could also cause this.)
Use this token again on the workers,
OR go with case 1.
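The check from the two cases can be scripted roughly like this. It is a sketch under assumptions: token_listed is a made-up helper name, and it assumes the first column of kubeadm token list holds the full token in id.secret form, below a header row.

```shell
# Hypothetical helper: reads `kubeadm token list` output on stdin and
# succeeds if a token whose ID (the part before the dot) equals $1 is listed.
token_listed() {
  awk -v id="$1" '
    NR > 1 { split($1, t, "."); if (t[1] == id) found = 1 }  # skip header row
    END    { exit !found }
  '
}

# Usage on the master (token ID taken from the error message):
#   kubeadm token list | token_listed ecqb8f \
#       || kubeadm token create --print-join-command
```

If the token is missing or expired, the last command prints a fresh join command to run on the worker.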
Just wanted to add one more thing:
DO NOT USE --ignore-preflight-errors all
as commands between the nodes (master to worker) would show errors later. In my environment I do not use it.
I'm a bit confused by this, because it was working for days without issue.
I used to be able to join nodes to my cluster without issue. I would run the below on the master node:
kubeadm init .....
After that, it would generate a join command and token to issue to the other nodes I want to join. Something like this:
kubeadm join --token 99385f.7b6e7e515416a041 192.168.122.100
I would run this on the nodes, and they would join without issue. The next morning, all of a sudden this stopped working. This is what I see when I run the command now:
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for
production clusters.
[preflight] Running pre-flight checks
[tokens] Validating provided token
[discovery] Created cluster info discovery client, requesting info from "http://192.168.122.100:9898/cluster-info/v1/?token-id=99385f"
[discovery] Cluster info object received, verifying signature using given token
[discovery] Cluster info signature and contents are valid, will use API endpoints [https://192.168.122.100:6443]
[bootstrap] Trying to connect to endpoint https://192.168.122.100:6443
[bootstrap] Detected server version: v1.6.0-rc.1
[bootstrap] Successfully established connection with endpoint "https://192.168.122.100:6443"
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
failed to request signed certificate from the API server [cannot create certificate signing request: the server could not find the requested resource]
It seems like the node I'm trying to join does successfully connect to the API server on the master node, but for some reason, it now fails to request a certificate.
Any thoughts?
For me,
sudo service kubelet restart
didn't work.
What I did was the following:
I copied the contents of /etc/kubernetes/* from the master node to the same location (/etc/kubernetes) on the slave nodes.
Then I tried the "kubeadm join ..." command again. This time the nodes joined the cluster without any complaint.
I think this is a temporary hack, but it worked!
OK, I just stopped and started the kubelet on the master node as shown below, and things started working again:
sudo service kubelet stop
sudo service kubelet start
EDIT:
This only seemed to work one time for me.